anlearn.stats
anlearn.stats.IQR
- class anlearn.stats.IQR(k: float = 1.5, lower_quantile: float = 0.25, upper_quantile: float = 0.75, ensure_2d: bool = True)
Interquartile range
Outlier detection method using Tukey’s fences. If the lower quantile is 0.25 (\(Q_1\), the lower quartile) and the upper quantile is 0.75 (\(Q_3\), the upper quartile), then an outlier is any observation outside the range:
\[[Q_1 - k(Q_3 - Q_1); Q_3 + k(Q_3 - Q_1)]\]
John Tukey proposed that \(k=1.5\) indicates an outlier and \(k=3\) indicates data that is “far out”.
- Parameters
Example
>>> import numpy as np
>>> from anlearn.stats import IQR
>>> X = np.hstack([[-7,-4], np.arange(5), [10, 15]])
>>> iqr = IQR(ensure_2d=False)
>>> iqr.fit(X)
IQR(ensure_2d=False)
>>> iqr.predict(X)
array([-1,  1,  1,  1,  1,  1,  1,  1, -1])
>>> iqr.score_samples(X)
array([-1.75, -1.  , -0.  , -0.  , -0.  , -0.  , -0.  , -1.5 , -2.75])
- Raises
ValueError – Lower quantile must be lower than upper quantile.
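The fences described above can be sketched in plain NumPy. This is a minimal illustration rather than the anlearn implementation: the helper name `tukey_fences` is hypothetical, and it assumes `np.quantile` with its default linear interpolation is an acceptable way to obtain the quantile values.

```python
import numpy as np

def tukey_fences(x, k=1.5, lower_quantile=0.25, upper_quantile=0.75):
    """Return the (lower, upper) Tukey fences for 1-D data (hypothetical helper)."""
    if lower_quantile >= upper_quantile:
        raise ValueError("Lower quantile must be lower than upper quantile.")
    # Quantile values, Q_1 and Q_3 for the default 0.25 / 0.75 setting
    q_low, q_high = np.quantile(x, [lower_quantile, upper_quantile])
    spread = q_high - q_low  # interquartile range for the default setting
    return q_low - k * spread, q_high + k * spread

# Same data as in the class example
X = np.hstack([[-7, -4], np.arange(5), [10, 15]])
lower, upper = tukey_fences(X)
outside = (X < lower) | (X > upper)  # observations outside the fences

print(lower, upper)  # -6.0 10.0
print(X[outside])    # [-7 15]
```

The two observations outside the fences are the same ones labelled -1 by `predict` in the example; the value 10 sits exactly on the upper fence and is not counted as an outlier.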
- fit(X: anlearn._typing.ArrayLike, y: Optional[anlearn._typing.ArrayLike] = None) → anlearn.stats.IQR
Fit estimator
- Parameters
X (ArrayLike) – Input data of shape (n_samples, 1) or (n_samples,) if ensure_2d is False
y (Optional[ArrayLike], optional) – Ignored, present for API consistency by convention, by default None
- Returns
Fitted estimator
- Return type
anlearn.stats.IQR
- fit_predict(X, y=None)
Performs fit on X and returns labels for X.
Returns -1 for outliers and 1 for inliers.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) –
y (Ignored) – Not used, present for API consistency by convention.
- Returns
y – 1 for inliers, -1 for outliers.
- Return type
ndarray of shape (n_samples,)
- get_params(deep=True)
Get parameters for this estimator.
- predict(X: anlearn._typing.ArrayLike) → numpy.ndarray
Predict if samples are outliers or not
Samples with a score lower than -k are considered to be outliers.
- Parameters
X (ArrayLike) – Input data, shape (n_samples, n_features)
- Returns
Shape (n_samples,). 1 for inliers, -1 for outliers.
- Return type
numpy.ndarray
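The labelling rule can be illustrated with a short NumPy sketch. It assumes, based on the class example (where a score of -1.5 with k=1.5 is still an inlier), that a sample is an outlier only when its score is strictly below -k; the helper name `labels_from_scores` is hypothetical and not part of anlearn.

```python
import numpy as np

def labels_from_scores(scores, k=1.5):
    """Map negated IQR scores to labels: -1 for outliers, 1 for inliers (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    # Assumption: strictly lower than -k means outlier
    return np.where(scores < -k, -1, 1)

# Scores taken from the class example output
scores = np.array([-1.75, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.5, -2.75])
labels = labels_from_scores(scores)
print(labels.tolist())  # [-1, 1, 1, 1, 1, 1, 1, 1, -1]
```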
- score_samples(X: anlearn._typing.ArrayLike) → numpy.ndarray
Score samples
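Score is computed as the distance from the interval \([Q_{lower}; Q_{upper}]\) divided by the interquartile range: \(score = distance(data, (lqv, uqv)) / iqr\), where lqv and uqv denote the lower and upper quantile values. The score is then negated for scikit-learn compatibility, so lower values are more abnormal.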
- Parameters
X (ArrayLike) – Input data of shape (n_samples, 1) or (n_samples,) if ensure_2d is False
- Returns
Shape (n_samples,). The outlier score of the input samples. The lower, the more abnormal.
- Return type
numpy.ndarray
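The scoring formula can be reproduced with a few lines of NumPy. This is a sketch under stated assumptions, not the library's code: the helper name `iqr_scores` and its `fitted_on` argument are hypothetical, the quantile values are taken with `np.quantile` (default linear interpolation), and the interquartile range is assumed to be non-zero.

```python
import numpy as np

def iqr_scores(x, fitted_on, lower_quantile=0.25, upper_quantile=0.75):
    """Negated distance from [lqv, uqv], divided by the IQR (hypothetical helper)."""
    lqv, uqv = np.quantile(fitted_on, [lower_quantile, upper_quantile])
    iqr = uqv - lqv  # assumed non-zero in this sketch
    x = np.asarray(x, dtype=float)
    # Distance to the interval: zero inside, gap to the nearest bound outside
    distance = np.maximum(lqv - x, 0.0) + np.maximum(x - uqv, 0.0)
    return -distance / iqr  # negated so that lower means more abnormal

# Same data as in the class example
X = np.hstack([[-7, -4], np.arange(5), [10, 15]])
scores = iqr_scores(X, fitted_on=X)
print(scores.tolist())
```

With this data the quantile values are 0 and 4, so the observation -7 lies 7 units below the interval and gets a score of -7 / 4 = -1.75, matching the first entry of the example output.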
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
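The <component>__<parameter> convention can be tried on any scikit-learn Pipeline. The sketch below uses StandardScaler and IsolationForest purely as stand-ins for illustration; they are not part of anlearn, and the step names "scale" and "detector" are arbitrary choices.

```python
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A nested object: each step is addressable by its name
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("detector", IsolationForest()),
])

# <component>__<parameter> updates a parameter of one step;
# set_params returns the estimator itself, so calls can be chained
returned = pipe.set_params(detector__n_estimators=50, scale__with_mean=False)

print(returned is pipe)                            # True
print(pipe.get_params()["detector__n_estimators"])  # 50
```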