anlearn.loda¶
anlearn.loda.Histogram¶
- class anlearn.loda.Histogram(bins: Union[int, str] = 'auto', return_min: bool = True)[source]¶
Histogram model
Histogram model based on
scipy.stats.rv_histogram
.- Parameters
bins (Union[int, str], optional) –
int
- number of equal-width bins in the given range.str
- method used to calculate bin width (numpy.histogram_bin_edges
).
See
numpy.histogram_bin_edges
bins for more details, by default “auto”return_min (bool, optional) – Return minimal float value instead of 0, by default True
- hist¶
Value of histogram
- Type
- bin_edges¶
Edges of histogram
- Type
- pdf¶
Probability density function
- Type
- fit(X: numpy.ndarray) → anlearn.loda.Histogram[source]¶
Fit estimator
- Parameters
X (numpy.ndarray) – Input data, shape (n_samples,)
- Returns
Fitted estimator
- Return type
- get_params(deep=True)¶
Get parameters for this estimator.
- predict_proba(X: numpy.ndarray) → numpy.ndarray[source]¶
Predict probability
Predict probability of input data X.
- Parameters
X (numpy.ndarray) – Input data, shape (n_samples,)
- Returns
Probability estimated from histogram, shape (n_samples,)
- Return type
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
anlearn.loda.LODA¶
- class anlearn.loda.LODA(n_estimators: int = 1000, bins: Union[int, str] = 'auto', q: float = 0.05, random_state: Optional[int] = None, n_jobs: Optional[int] = None, verbose: int = 0)[source]¶
LODA: Lightweight on-line detector of anomalies 1
LODA is an ensemble of histograms on random projections. See Pevný, T. Loda 1 for more details.
- Parameters
n_estimators (int, optional) – number of histograms, by default 1000
bins (Union[int, str], optional) –
int
- number of equal-width bins in the given range.str
- method used to calculate bin width (numpy.histogram_bin_edges
).
See
numpy.histogram_bin_edges
bins for more details, by default “auto”q (float, optional) – Quantile for compution threshold from training data scores. This threshold is used for predict method, by default 0.05
random_state (Optional[int], optional) – Random seed used for stochastic parts., by default None
n_jobs (Optional[int], optional) – Not implemented yet, by default None
verbose (int, optional) – Verbosity of logging, by default 0
- projections_¶
Random projections, shape (n_estimators, n_features)
- Type
Examples
>>> import numpy as np >>> from anlearn.loda import LODA >>> X = np.array([[0, 0], [0.1, -0.2], [0.3, 0.2], [0.2, 0.2], [-5, -5], [0.6, 0.7]]) >>> loda = LODA(n_estimators=10, bins=10, random_state=42) >>> loda.fit(X) LODA(bins=10, n_estimators=10, random_state=42) >>> loda.predict(X) array([ 1, 1, 1, 1, -1, 1])
References
- 1(1,2,3,4)
Pevný, T. Loda: Lightweight on-line detector of anomalies. Mach Learn 102, 275–304 (2016). <https://doi.org/10.1007/s10994-015-5521-0>
- fit(X: anlearn._typing.ArrayLike, y: Optional[anlearn._typing.ArrayLike] = None) → anlearn.loda.LODA[source]¶
Fit estimator
- Parameters
X (ArrayLike) – Input data, shape (n_samples, n_features)
y (Optional[ArrayLike], optional) – Present for API consistency by convention, by default None
- Returns
Fitted estimator
- Return type
- fit_predict(X, y=None)¶
Perform fit on X and returns labels for X.
Returns -1 for outliers and 1 for inliers.
- Parameters
X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) –
y (Ignored) – Not used, present for API consistency by convention.
- Returns
y – 1 for inliers, -1 for outliers.
- Return type
ndarray of shape (n_samples,)
- get_params(deep=True)¶
Get parameters for this estimator.
- predict(X: anlearn._typing.ArrayLike) → numpy.ndarray[source]¶
Predict if samples are outliers or not
Samples with a score lower than
anomaly_threshold_
are considered to be outliers.- Parameters
X (ArrayLike) – Input data, shape (n_samples, n_features)
- Returns
1 for inlineres, -1 for outliers, shape (n_samples,)
- Return type
- score_features(X: anlearn._typing.ArrayLike) → numpy.ndarray[source]¶
Feature importance
Feature importance is computed as a one-tailed two-sample t-test between \(-log(\hat{p}_i)\) from histograms on projections with and without a specific feature. The higher the value is, the more important feature is.
See full description in 3.3 Explaining the cause of an anomaly 1 for more details.
- Parameters
X (ArrayLike) – input data, shape (n_samples, n_features)
- Returns
Feature importance in anomaly detection.
- Return type
Notes
\[t_j = \frac{\mu_j - \bar{\mu}_j}{ \sqrt{\frac{s^2_j}{|I_j|} + \frac{\bar{s}^2_j}{|\bar{I_j}|}}}\]
- score_samples(X: anlearn._typing.ArrayLike) → numpy.ndarray[source]¶
Anomaly scores for samples
Average of the logarithm probabilities estimated of individual projections. Output is proportional to the negative log-likelihood of the sample, that means the less likely a sample is, the higher the anomaly value it receives 1. This score is reversed for scikit-learn compatibility.
- Parameters
X (ArrayLike) – Input data, shape (n_samples, n_features)
- Returns
The anomaly score of the input samples. The lower, the more abnormal. Shape (n_samples,)
- Return type
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance