anlearn.stats

anlearn.stats.IQR

class anlearn.stats.IQR(k: float = 1.5, lower_quantile: float = 0.25, upper_quantile: float = 0.75, ensure_2d: bool = True)

Interquartile range

Outlier detection method using Tukey’s fences. If the lower quantile is 0.25 (\(Q_1\), the lower quartile) and the upper quantile is 0.75 (\(Q_3\), the upper quartile), then an outlier is any observation outside the range:

\[[Q_1 - k(Q_3 - Q_1); Q_3 + k(Q_3 - Q_1)]\]

John Tukey proposed \(k=1.5\) to mark an observation as an outlier and \(k=3\) to mark it as “far out”.
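The fences can be computed by hand as a check on the formula above. The following is a minimal sketch that uses only the Python standard library, not anlearn itself; the helper name `tukey_fences` is hypothetical, and it assumes quartiles obtained by linear interpolation between order statistics, which may differ from the estimator the library uses internally. The data are the values from the class example.

```python
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) Tukey fences for the quartiles Q1 and Q3."""
    # Assumption: method="inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [-7, -4, 0, 1, 2, 3, 4, 10, 15]

lower, upper = tukey_fences(data)                 # k = 1.5: "outlier" fences
far_lower, far_upper = tukey_fences(data, k=3.0)  # k = 3: "far out" fences

# Observations strictly outside the fences are flagged.
outliers = [x for x in data if x < lower or x > upper]
far_out = [x for x in data if x < far_lower or x > far_upper]
```

With these data, an observation lying exactly on a fence is not flagged, because the range is closed.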

Parameters

k (float, optional) – Outlier threshold, by default 1.5
lower_quantile (float, optional) – Lower quantile, from (0; 1), by default 0.25
upper_quantile (float, optional) – Upper quantile, from (0; 1), by default 0.75
ensure_2d (bool, optional) – Forbid 1D input arrays, by default True

lqv_

Lower quantile value estimated from the input data

Type: float

uqv_

Upper quantile value estimated from the input data

Type: float

iqr_

Interquartile range, uqv_ - lqv_

Type: float

Example

>>> import numpy as np
>>> from anlearn.stats import IQR
>>> X = np.hstack([[-7,-4], np.arange(5), [10, 15]])
>>> iqr = IQR(ensure_2d=False)
>>> iqr.fit(X)
IQR(ensure_2d=False)
>>> iqr.predict(X)
array([-1,  1,  1,  1,  1,  1,  1,  1, -1])
>>> iqr.score_samples(X)
array([-1.75, -1.  , -0.  , -0.  , -0.  , -0.  , -0.  , -1.5 , -2.75])

Raises: ValueError – Lower quantile must be lower than upper quantile.

fit(X: anlearn._typing.ArrayLike, y: Optional[anlearn._typing.ArrayLike] = None) → anlearn.stats.IQR

Fit estimator

Parameters

X (ArrayLike) – Input data of shape (n_samples, 1) or (n_samples,) if ensure_2d is False
y (Optional[ArrayLike], optional) – Ignored, present for API consistency by convention, by default None

Returns

Fitted estimator

Return type

IQR

fit_predict(X, y=None)

Perform fit on X and return labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters

X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) – Input samples.
y (Ignored) – Not used, present for API consistency by convention.

Returns

y – 1 for inliers, -1 for outliers.

Return type

ndarray of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

predict(X: anlearn._typing.ArrayLike) → numpy.ndarray

Predict if samples are outliers or not

Scores are non-positive, so samples with a score lower than \(-k\) are considered to be outliers.

Parameters: X (ArrayLike) – Input data, shape (n_samples, n_features)
Returns: Array of shape (n_samples,) with 1 for inliers and -1 for outliers
Return type: numpy.ndarray
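The outputs in the class example are consistent with a simple threshold on the scores: a sample is labelled an inlier when its score is at least \(-k\). The sketch below illustrates that rule in plain Python; `labels_from_scores` is a hypothetical helper, not part of anlearn, and the scores are copied from the class example.

```python
def labels_from_scores(scores, k=1.5):
    """Map outlier scores to labels: 1 for inliers, -1 for outliers."""
    # Assumption: a score exactly equal to -k lies on the fence and stays an inlier.
    return [1 if score >= -k else -1 for score in scores]


# Scores taken from the score_samples output in the class example.
scores = [-1.75, -1.0, -0.0, -0.0, -0.0, -0.0, -0.0, -1.5, -2.75]
labels = labels_from_scores(scores)
```

Note that the sample with score -1.5 sits exactly on the fence for \(k=1.5\) and is labelled 1, matching the example output.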

score_samples(X: anlearn._typing.ArrayLike) → numpy.ndarray

Score samples

The score is computed as the distance from the interval \([Q_{lower}; Q_{upper}]\) divided by the interquartile range: \(score = distance(data, (lqv, uqv)) / iqr\). The score is then negated for scikit-learn compatibility, so lower scores indicate more abnormal samples.

Parameters: X (ArrayLike) – Input data of shape (n_samples, 1) or (n_samples,) if ensure_2d is False
Returns: Shape (n_samples,). The outlier score of the input samples. The lower, the more abnormal.
Return type: numpy.ndarray
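As an illustration of the scoring formula, the following plain-Python sketch computes the negated, IQR-scaled distance from the interval. The helper `iqr_scores` is hypothetical and not part of anlearn. The values `lqv=0.0` and `uqv=4.0` are an assumption: they are the quartiles of the class example data under linear interpolation.

```python
def iqr_scores(data, lqv, uqv):
    """Negated distance from [lqv, uqv], in units of the interquartile range."""
    iqr = uqv - lqv
    scores = []
    for x in data:
        # The distance is zero inside the interval and positive outside it.
        distance = max(lqv - x, 0.0, x - uqv)
        scores.append(-distance / iqr)
    return scores


data = [-7, -4, 0, 1, 2, 3, 4, 10, 15]
scores = iqr_scores(data, lqv=0.0, uqv=4.0)
```

Samples inside the interval get a score of zero (printed as `-0.0` because of the negation), and the score decreases as a sample moves further from the interval.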

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance