File: /proc/thread-self/root/opt/alt/python35/lib64/python3.5/site-packages/sklearn/ensemble/__pycache__/iforest.cpython-35.pyc
Imports: division (from __future__), warn, issparse, six, ExtraTreeRegressor, check_random_state, check_array and BaseBagging. Exported names: IsolationForest.

class IsolationForest(BaseBagging)

    Isolation Forest Algorithm

    Return the anomaly score of each sample using the IsolationForest algorithm.

    The IsolationForest 'isolates' observations by randomly selecting a feature
    and then randomly selecting a split value between the maximum and minimum
    values of the selected feature.

    Since recursive partitioning can be represented by a tree structure, the
    number of splittings required to isolate a sample is equivalent to the path
    length from the root node to the terminating node.

    This path length, averaged over a forest of such random trees, is a
    measure of normality and our decision function.

    Random partitioning produces noticeably shorter paths for anomalies.
    Hence, when a forest of random trees collectively produces shorter path
    lengths for particular samples, they are highly likely to be anomalies.

    Read more in the :ref:`User Guide <isolation_forest>`.

    .. versionadded:: 0.18

    Parameters
    ----------
    n_estimators : int, optional (default=100)
        The number of base estimators in the ensemble.

    max_samples : int or float, optional (default="auto")
        The number of samples to draw from X to train each base estimator.
            - If int, then draw `max_samples` samples.
            - If float, then draw `max_samples * X.shape[0]` samples.
            - If "auto", then `max_samples=min(256, n_samples)`.
        If max_samples is larger than the number of samples provided,
        all samples will be used for all trees (no sampling).

    contamination : float in (0., 0.5), optional (default=0.1)
        The amount of contamination of the data set, i.e. the proportion
        of outliers in the data set. Used when fitting to define the threshold
        on the decision function.
    max_features : int or float, optional (default=1.0)
        The number of features to draw from X to train each base estimator.
            - If int, then draw `max_features` features.
            - If float, then draw `max_features * X.shape[1]` features.

    bootstrap : boolean, optional (default=False)
        If True, individual trees are fit on random subsets of the training
        data sampled with replacement. If False, sampling without replacement
        is performed.

    n_jobs : integer, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        if RandomState instance, random_state is the random number generator;
        if None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the tree building process.

    Attributes
    ----------
    estimators_ : list of ExtraTreeRegressor
        The collection of fitted sub-estimators.

    estimators_samples_ : list of arrays
        The subset of drawn samples (i.e., the in-bag samples) for each base
        estimator.

    max_samples_ : integer
        The actual number of samples drawn to train each base estimator.

    References
    ----------
    .. [1] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation forest."
           Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on.
    .. [2] Liu, Fei Tony, Ting, Kai Ming and Zhou, Zhi-Hua. "Isolation-based
           anomaly detection." ACM Transactions on Knowledge Discovery from
           Data (TKDD) 6.1 (2012): 3.
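The isolation procedure described in the docstring can be illustrated without scikit-learn. The sketch below is a simplified, hypothetical re-implementation and not the library's code, which fits ExtraTreeRegressor estimators through BaseBagging. The names `path_length`, `average_path_length` and `flag_outliers` are made up for this illustration. Trees are grown until the point is alone, with no depth limit. The `contamination` fraction of rows with the shortest average paths is then flagged, loosely mirroring how that parameter sets the decision threshold.

```python
import random


def path_length(x, data, rng, depth=0):
    """Count the random splits needed to isolate point x among the rows of data.

    At each step a random feature (among those that still vary) is chosen,
    a split value is drawn uniformly between that feature's min and max,
    and only the rows on x's side of the split are kept.
    """
    if len(data) <= 1:
        return depth
    varying = [f for f in range(len(x)) if len({row[f] for row in data}) > 1]
    if not varying:
        # Remaining rows are identical, so they cannot be split further.
        return depth
    feature = rng.choice(varying)
    values = [row[feature] for row in data]
    split = rng.uniform(min(values), max(values))
    if x[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return path_length(x, side, rng, depth + 1)


def average_path_length(x, data, n_trees=100, sample_size=256, seed=0):
    """Average path length of x over n_trees random subsamples of data."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(x, sample, rng)
    return total / n_trees


def flag_outliers(data, contamination=0.1, **kwargs):
    """Indices of the `contamination` fraction of rows with the shortest paths."""
    scores = [average_path_length(row, data, **kwargs) for row in data]
    n_outliers = max(1, round(contamination * len(data)))
    ranked = sorted(range(len(data)), key=lambda i: scores[i])
    return sorted(ranked[:n_outliers])


if __name__ == "__main__":
    gen = random.Random(42)
    # 50 points around the origin plus one far-away point at index 50.
    points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(50)]
    points.append((50.0, 50.0))
    print(flag_outliers(points, contamination=0.02))
```

It is enough to follow only x's branch, because splits elsewhere in a tree never change x's path. The far-away point should need far fewer splits than the clustered points.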
    IsolationForest(n_estimators=100, max_samples="auto", contamination=0.1,
                    max_features=1.0, bootstrap=False, n_jobs=1,
                    random_state=None, verbose=0)

        The constructor passes the following arguments to BaseBagging.__init__:
        - base_estimator (an ExtraTreeRegressor with splitter="random" and the
          given random_state);
        - bootstrap;
        - bootstrap_features=False;
        - n_estimators, max_samples, max_features, n_jobs, random_state and
          verbose.
        It then stores contamination on the instance.

    _set_oob_score(self, X, y)

        Raises NotImplementedError("OOB score not supported by iforest").

    fit(self, X, y=None, sample_weight=None)

        Fit estimator.

        Parameters
        ----------
        X : array-like or sparse matrix, shape (n_samples, n_features)
            The input samples. Use ``dtype=np.float32`` for maximum
            efficiency. Sparse matrices are also supported; use sparse
            ``csc_matrix`` for maximum efficiency.

        Returns
        -------
        self : object
            Returns self.

        The method body performs these steps:
        - It validates X with check_array(X, accept_sparse=["csc"],
          ensure_2d=False) and, for sparse input, calls X.sort_indices().
        - It draws a random target y with
          check_random_state(self.random_state).uniform(size=X.shape[0]).
        - It resolves max_samples against n_samples = X.shape[0]:
            - "auto" gives min(256, n_samples).
            - Any other string raises ValueError('max_samples (%s) is not
              supported. Valid choices are: "auto", int or float').
            - An integer larger than n_samples triggers the warning
              "max_samples (%s) is greater than the total number of samples
              (%s). max_samples will be set to n_samples for estimation.",
              and n_samples is used instead.
            - A float outside (0, 1] raises ValueError("max_samples must be
              in (0, 1], got %r").
            - A valid float gives int(max_samples * X.shape[0]).
        - It stores the result in max_samples_, derives max_depth from it with
          np.ceil and np.log2, and calls BaseBagging._fit(X, y, max_samples,
          max_depth=max_depth, sample_weight=sample_weight).
        - It sets threshold_ with sp.stats.scoreatpercentile applied to
          decision_function(X), at a percentile derived from contamination.
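The `max_samples` rules and the error and warning messages in `fit` can be mirrored in a standalone function. This is a sketch, not the library code. `resolve_max_samples` and `tree_depth_limit` are hypothetical names, and `numbers.Integral` stands in for the module's `INTEGER_TYPES`. The depth formula ceil(log2(max(max_samples, 2))) is an assumption inferred from the `np.ceil` and `np.log2` names in the compiled method. It is consistent with the height limit of ceil(log2 ψ) used in the Isolation Forest paper [1].

```python
import math
import numbers
import warnings


def resolve_max_samples(max_samples, n_samples):
    """Turn the max_samples parameter into a concrete per-tree sample count."""
    if isinstance(max_samples, str):
        if max_samples == "auto":
            return min(256, n_samples)
        raise ValueError('max_samples (%s) is not supported. '
                         'Valid choices are: "auto", int or float'
                         % max_samples)
    if isinstance(max_samples, numbers.Integral):
        if max_samples > n_samples:
            warnings.warn("max_samples (%s) is greater than the total number "
                          "of samples (%s). max_samples will be set to "
                          "n_samples for estimation." % (max_samples, n_samples))
            return n_samples
        return max_samples
    # Anything else is treated as a float fraction of the data set.
    if not 0.0 < max_samples <= 1.0:
        raise ValueError("max_samples must be in (0, 1], got %r" % max_samples)
    return int(max_samples * n_samples)


def tree_depth_limit(max_samples):
    """Assumed per-tree depth cap: ceil(log2(max(max_samples, 2)))."""
    return int(math.ceil(math.log2(max(max_samples, 2))))


if __name__ == "__main__":
    for value in ("auto", 100, 0.5):
        n = resolve_max_samples(value, n_samples=1000)
        print(value, n, tree_depth_limit(n))
    # auto 256 8
    # 100 100 7
    # 0.5 500 9
```

Clamping to at least 2 keeps log2 positive, so even a one-sample tree gets a depth limit of 1.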