anlearn.loda

anlearn.loda.Histogram

class anlearn.loda.Histogram(bins: Union[int, str] = 'auto', return_min: bool = True)[source]

Histogram model

Histogram model based on scipy.stats.rv_histogram.

Parameters
hist

Value of histogram

Type

numpy.ndarray

bin_edges

Edges of histogram

Type

numpy.ndarray

pdf

Probability density function

Type

numpy.ndarray

fit(X: numpy.ndarray)anlearn.loda.Histogram[source]

Fit estimator

Parameters

X (numpy.ndarray) – Input data, shape (n_samples,)

Returns

Fitted estimator

Return type

Histogram

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

predict_proba(X: numpy.ndarray)numpy.ndarray[source]

Predict probability

Predict probability of input data X.

Parameters

X (numpy.ndarray) – Input data, shape (n_samples,)

Returns

Probability estimated from histogram, shape (n_samples,)

Return type

numpy.ndarray

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance

anlearn.loda.LODA

class anlearn.loda.LODA(n_estimators: int = 1000, bins: Union[int, str] = 'auto', q: float = 0.05, random_state: Optional[int] = None, n_jobs: Optional[int] = None, verbose: int = 0)[source]

LODA: Lightweight on-line detector of anomalies 1

LODA is an ensemble of histograms on random projections. See Pevný, T. Loda 1 for more details.

Parameters
  • n_estimators (int, optional) – number of histograms, by default 1000

  • bins (Union[int, str], optional) –

    See numpy.histogram_bin_edges bins for more details, by default “auto”

  • q (float, optional) – Quantile for compution threshold from training data scores. This threshold is used for predict method, by default 0.05

  • random_state (Optional[int], optional) – Random seed used for stochastic parts., by default None

  • n_jobs (Optional[int], optional) – Not implemented yet, by default None

  • verbose (int, optional) – Verbosity of logging, by default 0

projections_

Random projections, shape (n_estimators, n_features)

Type

numpy.ndarray

hists_

Histograms on random projections, shape (n_estimators,)

Type

List[Histogram]

anomaly_threshold_

Treshold for predict() function

Type

float

Examples

>>> import numpy as np
>>> from anlearn.loda import LODA
>>> X = np.array([[0, 0], [0.1, -0.2], [0.3, 0.2], [0.2, 0.2], [-5, -5], [0.6, 0.7]])
>>> loda = LODA(n_estimators=10, bins=10, random_state=42)
>>> loda.fit(X)
LODA(bins=10, n_estimators=10, random_state=42)
>>> loda.predict(X)
array([ 1,  1,  1,  1, -1,  1])

References

1(1,2,3,4)

Pevný, T. Loda: Lightweight on-line detector of anomalies. Mach Learn 102, 275–304 (2016). <https://doi.org/10.1007/s10994-015-5521-0>

fit(X: anlearn._typing.ArrayLike, y: Optional[anlearn._typing.ArrayLike] = None)anlearn.loda.LODA[source]

Fit estimator

Parameters
  • X (ArrayLike) – Input data, shape (n_samples, n_features)

  • y (Optional[ArrayLike], optional) – Present for API consistency by convention, by default None

Returns

Fitted estimator

Return type

LODA

fit_predict(X, y=None)

Perform fit on X and returns labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters
  • X ({array-like, sparse matrix, dataframe} of shape (n_samples, n_features)) –

  • y (Ignored) – Not used, present for API consistency by convention.

Returns

y – 1 for inliers, -1 for outliers.

Return type

ndarray of shape (n_samples,)

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

predict(X: anlearn._typing.ArrayLike)numpy.ndarray[source]

Predict if samples are outliers or not

Samples with a score lower than anomaly_threshold_ are considered to be outliers.

Parameters

X (ArrayLike) – Input data, shape (n_samples, n_features)

Returns

1 for inlineres, -1 for outliers, shape (n_samples,)

Return type

numpy.ndarray

score_features(X: anlearn._typing.ArrayLike)numpy.ndarray[source]

Feature importance

Feature importance is computed as a one-tailed two-sample t-test between \(-log(\hat{p}_i)\) from histograms on projections with and without a specific feature. The higher the value is, the more important feature is.

See full description in 3.3 Explaining the cause of an anomaly 1 for more details.

Parameters

X (ArrayLike) – input data, shape (n_samples, n_features)

Returns

Feature importance in anomaly detection.

Return type

numpy.ndarray

Notes

\[t_j = \frac{\mu_j - \bar{\mu}_j}{ \sqrt{\frac{s^2_j}{|I_j|} + \frac{\bar{s}^2_j}{|\bar{I_j}|}}}\]
score_samples(X: anlearn._typing.ArrayLike)numpy.ndarray[source]

Anomaly scores for samples

Average of the logarithm probabilities estimated of individual projections. Output is proportional to the negative log-likelihood of the sample, that means the less likely a sample is, the higher the anomaly value it receives 1. This score is reversed for scikit-learn compatibility.

Parameters

X (ArrayLike) – Input data, shape (n_samples, n_features)

Returns

The anomaly score of the input samples. The lower, the more abnormal. Shape (n_samples,)

Return type

numpy.ndarray

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance