API

matrixprofile.analyze

matrixprofile.analyze(ts, query=None, windows=None, sample_pct=1.0, threshold=0.98, n_jobs=1, preprocessing_kwargs=None)

Runs an appropriate workflow based on the parameters passed in. The goal of this function is to compute all fundamental algorithms on the provided time series data. For now the following is computed:

  1. Matrix Profile - exact or approximate, depending on sample_pct, when a single window is provided. The exact algorithm is used by default.

  2. Top Motifs - The top 3 motifs are found.

  3. Top Discords - The top 3 discords are found.

  4. Plot MP, Motifs and Discords

When a window is not provided or more than a single window is provided, the PMP is computed:

  1. Compute the upper window when no window is provided

  2. Compute PMP for all windows

  3. Top Motifs

  4. Top Discords

  5. Plot PMP, motifs and discords.
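The choice between the two workflows listed above can be summarized in a few lines. This is an illustrative sketch only: `choose_workflow` is a hypothetical helper, not a library function, and treating a one-element list like a single int is an assumption.

```python
def choose_workflow(windows=None, sample_pct=1.0):
    """Return (profile_kind, mode) following the rules described above.

    Hypothetical helper for illustration; it is not part of matrixprofile.
    """
    if not 0 < sample_pct <= 1:
        raise ValueError("sample_pct must be in (0, 1]")
    # sample_pct == 1 selects the exact algorithm, anything lower is approximate.
    mode = "exact" if sample_pct == 1 else "approximate"
    if windows is None:
        # No window given: the upper window is estimated and the PMP is computed.
        return ("pmp", mode)
    if isinstance(windows, int):
        return ("mp", mode)
    windows = list(windows)
    # Assumption: a single-element list behaves like a single int.
    return ("mp", mode) if len(windows) == 1 else ("pmp", mode)


print(choose_workflow(windows=32))
print(choose_workflow(windows=[8, 16, 32], sample_pct=0.5))
```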

Parameters
  • ts (array_like) – The time series to analyze.

  • query (array_like, Optional) – The query to analyze. Note that when computing the PMP the query is ignored!

  • windows (int or array_like, Optional) – The window(s) to compute the MatrixProfile. Note that it may be an int for a single matrix profile computation or an array of ints for computing the pan matrix profile.

  • sample_pct (float, default = 1) – A float between 0 and 1 representing how many samples to compute for the MP or PMP. When it is 1, the exact algorithm is used.

  • threshold (float, Default 0.98) – The correlation coefficient used as the threshold. It should be between 0 and 1. This is used to compute the upper window size when no window(s) is given.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

  • preprocessing_kwargs (dict, default = None) –

    A dictionary that sets the parameters for the preprocess function. A valid preprocessing_kwargs should have the following structure:

    >>> {
    >>>     'window': The window size to compute the mean/median/minimum/maximum value,
    >>>     'method': A string indicating the data imputation method, which should be
    >>>               'mean', 'median', 'min' or 'max',
    >>>     'direction': A string indicating the data imputation direction, which should be
    >>>                 'forward', 'fwd', 'f', 'backward', 'bwd', 'b'. If the direction is
    >>>                 forward, we use previous data for imputation; if the direction is
    >>>                 backward, we use subsequent data for imputation.,
    >>>     'add_noise': A boolean value indicating whether noise needs to be added into the
    >>>                 time series
    >>> }
    

    To disable preprocessing, set preprocessing_kwargs to None, False, "" or {}.
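As an illustration, a concrete preprocessing_kwargs dictionary might look like the following. The values are arbitrary, and `check_preprocessing_kwargs` is a hypothetical helper that only mirrors the structure documented above; it assumes 'window' is required and the other keys are optional, which the library's own validation may not match.

```python
# Example values only; choose a window that suits your data.
preprocessing_kwargs = {
    "window": 4,              # number of points used for the imputation statistic
    "method": "median",       # 'mean', 'median', 'min' or 'max'
    "direction": "backward",  # 'forward', 'fwd', 'f', 'backward', 'bwd' or 'b'
    "add_noise": False,       # do not add noise to the series
}

VALID_METHODS = {"mean", "median", "min", "max"}
VALID_DIRECTIONS = {"forward", "fwd", "f", "backward", "bwd", "b"}


def check_preprocessing_kwargs(kwargs):
    """Hypothetical helper: True if kwargs matches the documented structure."""
    if not kwargs:  # None, False, "" and {} all disable preprocessing
        return True
    return (
        isinstance(kwargs.get("window"), int)
        and kwargs.get("method", "mean") in VALID_METHODS
        and kwargs.get("direction", "forward") in VALID_DIRECTIONS
        and isinstance(kwargs.get("add_noise", True), bool)
    )


print(check_preprocessing_kwargs(preprocessing_kwargs))
```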

Returns

tuple – The appropriate PMP or MP profile object and associated figures.

Return type

(profile, figures)

matrixprofile.compute

matrixprofile.compute(ts, windows=None, query=None, sample_pct=1, threshold=0.98, n_jobs=1, preprocessing_kwargs=None)

Computes the exact or approximate MatrixProfile based on the sample percent specified. Currently, MPX and SCRIMP++ are used for the exact and approximate algorithms respectively. When multiple windows are passed, the Pan-MatrixProfile is computed and returned.

By default, when only a time series (ts) is passed, the Pan-MatrixProfile is computed based on the maximum upper window algorithm with a correlation threshold of 0.98.

Notes

When multiple windows are passed and the Pan-MatrixProfile is computed, the query is ignored!

Parameters
  • ts (array_like) – The time series to analyze.

  • windows (int, array_like) – The window(s) to compute the MatrixProfile. Note that it may be an int for a single matrix profile computation or an array of ints for computing the pan matrix profile.

  • query (array_like, optional) – The query to analyze. Note that when computing the PMP the query is ignored!

  • sample_pct (float, default 1) – A float between 0 and 1 representing how many samples to compute for the MP or PMP. When it is 1, the exact algorithm is used.

  • threshold (float, default 0.98) – The correlation coefficient used as the threshold. It should be between 0 and 1. This is used to compute the upper window size when no window(s) is given.

  • n_jobs (int, default = 1) – Number of cpu cores to use.

  • preprocessing_kwargs (dict, default = None) –

    A dictionary that sets the parameters for the preprocess function. A valid preprocessing_kwargs should have the following structure:

    >>> {
    >>>     'window': The window size to compute the mean/median/minimum/maximum value,
    >>>     'method': A string indicating the data imputation method, which should be
    >>>               'mean', 'median', 'min' or 'max',
    >>>     'direction': A string indicating the data imputation direction, which should be
    >>>                 'forward', 'fwd', 'f', 'backward', 'bwd', 'b'. If the direction is
    >>>                 forward, we use previous data for imputation; if the direction is
    >>>                 backward, we use subsequent data for imputation.,
    >>>     'add_noise': A boolean value indicating whether noise needs to be added into the
    >>>                 time series
    >>> }
    

    To disable preprocessing, set preprocessing_kwargs to None, False, "" or {}.

Returns

dict – The profile computed.

Return type

profile

matrixprofile.visualize

matrixprofile.visualize(profile)

Automatically creates plots for the provided data structure. In some cases many plots are created. For example, when a MatrixProfile is passed with corresponding motifs and discords, the matrix profile, discords and motifs will be plotted.

Parameters

profile (dict_like) – A MatrixProfile, Pan-MatrixProfile or Statistics data structure.

Returns

list – A list of matplotlib figures.

Return type

figures

matrixprofile.preprocess.preprocess

matrixprofile.preprocess.preprocess(ts, window, impute_method='mean', impute_direction='forward', add_noise=True)

Preprocesses the given time series by adding noise and imputing missing data.

Parameters
  • ts (array_like) – The time series to be preprocessed.

  • window (int) – The window size to compute the mean/median/minimum value/maximum value.

  • impute_method (string, Default = 'mean') – A string indicating the data imputation method, which should be ‘mean’, ‘median’, ‘min’ or ‘max’.

  • impute_direction (string, Default = 'forward') – A string indicating the data imputation direction, which should be ‘forward’, ‘fwd’, ‘f’, ‘backward’, ‘bwd’, ‘b’. If the direction is forward, we use previous data for imputation; if the direction is backward, we use subsequent data for imputation.

  • add_noise (bool, Default = True) – A boolean value indicating whether noise needs to be added into the time series.

Returns

temp – The time series after being preprocessed.

Return type

array_like

matrixprofile.preprocess.impute_missing

matrixprofile.preprocess.impute_missing(ts, window, method='mean', direction='forward')

Imputes missing data in time series.

Parameters
  • ts (array_like) – The time series to be handled.

  • window (int) – The window size to compute the mean/median/minimum value/maximum value.

  • method (string, Default = 'mean') – A string indicating the data imputation method, which should be ‘mean’, ‘median’, ‘min’ or ‘max’.

  • direction (string, Default = 'forward') – A string indicating the data imputation direction, which should be ‘forward’, ‘fwd’, ‘f’, ‘backward’, ‘bwd’, ‘b’. If the direction is forward, we use previous data for imputation; if the direction is backward, we use subsequent data for imputation.

Returns

temp – The time series with missing data imputed.

Return type

array_like
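To make the forward direction concrete, the sketch below fills each missing value with the mean of up to `window` preceding values. It is a plain-Python illustration under stated assumptions, not the library's implementation: `impute_forward_mean` is a hypothetical name, missing values are assumed to be NaN, and edge cases (such as a NaN at the very start) may be handled differently by impute_missing.

```python
import math
import statistics


def impute_forward_mean(ts, window):
    """Replace NaN values with the mean of up to `window` preceding values.

    Illustrative only; a leading NaN with no history is left unchanged.
    """
    result = list(ts)
    for i, value in enumerate(result):
        if math.isnan(value):
            # Already-imputed values count as history, so runs of NaNs are filled.
            history = [v for v in result[max(0, i - window):i] if not math.isnan(v)]
            if history:
                result[i] = statistics.fmean(history)
    return result


series = [1.0, 2.0, float("nan"), 4.0, float("nan")]
print(impute_forward_mean(series, window=2))  # [1.0, 2.0, 1.5, 4.0, 2.75]
```

A backward variant would look at the values after the gap instead of before it.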

matrixprofile.preprocess.add_noise_to_series

matrixprofile.preprocess.add_noise_to_series(series)

Adds noise to the given time series.

Parameters

series (array_like) – The time series subsequence to add noise to.

Returns

temp – The time series subsequence with noise added.

Return type

array_like

matrixprofile.discover.motifs

matrixprofile.discover.motifs(profile, exclusion_zone=None, k=3, max_neighbors=10, radius=3, use_cmp=False)

Find the top K number of motifs (patterns) given a matrix profile or a pan matrix profile. By default the algorithm will find up to 3 motifs (k) and up to 10 of their neighbors with a radius of 3 * min_dist using the regular matrix profile. If the profile is a Matrix Profile data structure, you can also use a Corrected Matrix Profile to compute the motifs.

Parameters
  • profile (dict) – The output from one of the matrix profile algorithms.

  • exclusion_zone (int, Default to algorithm ez) – Desired number of values to exclude on both sides of the motif. This avoids trivial matches. It defaults to half of the computed window size. Setting the exclusion zone to 0 makes it not apply.

  • k (int, Default = 3) – Desired number of motifs to find.

  • max_neighbors (int, Default = 10) – The maximum number of neighbors to include for a given motif.

  • radius (int, Default = 3) – The radius is used to associate a neighbor by checking if the neighbor’s distance is less than or equal to dist * radius.

  • use_cmp (bool, Default = False) – Use the Corrected Matrix Profile to compute the motifs (only for a Matrix Profile data structure).

Returns

dict – The original input profile with the addition of the “motifs” key. The motifs key consists of the following structure.

A list of dicts containing motif indices and their corresponding neighbor indices.

>>> [
>>>     {
>>>         'motifs': [first_index, second_index],
>>>         'neighbors': [index, index, index ...max_neighbors]
>>>     }
>>> ]

The index is a single value when a MatrixProfile is passed in; otherwise, the index contains a row and column index for the Pan-MatrixProfile.

Return type

profile
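The radius rule for neighbors can be sketched as follows. This is a simplified illustration, not the library's code: `select_neighbors` is a hypothetical function, and it assumes you already have the distances from one motif subsequence to every other subsequence.

```python
def select_neighbors(distances, motif_indices, min_dist, radius=3,
                     max_neighbors=10, exclusion_zone=0):
    """Pick up to max_neighbors indices whose distance is <= min_dist * radius.

    Hypothetical helper: `distances` holds the distance from a motif
    subsequence to every subsequence, `min_dist` is the motif pair distance.
    """
    # Indices at or near the motif pair itself are not neighbors.
    excluded = set()
    for m in motif_indices:
        excluded.update(range(m - exclusion_zone, m + exclusion_zone + 1))

    # Closest candidates first.
    candidates = sorted(
        (d, i) for i, d in enumerate(distances)
        if i not in excluded and d <= min_dist * radius
    )

    neighbors = []
    for _, i in candidates:
        if len(neighbors) == max_neighbors:
            break
        # Skip trivial matches of neighbors that were already selected.
        if all(abs(i - n) > exclusion_zone for n in neighbors):
            neighbors.append(i)
    return neighbors


distances = [0.0, 2.5, 0.6, 0.9, 0.5, 3.0, 1.4, 0.7]
print(select_neighbors(distances, motif_indices=[0, 4], min_dist=0.5))
```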

matrixprofile.discover.discords

matrixprofile.discover.discords(profile, exclusion_zone=None, k=3)

Find the top K number of discords (anomalies) given an MP or PMP, exclusion zone and the desired number of discords. The exclusion zone nullifies entries on the left and right side of the first and subsequent discords to remove trivial matches. More specifically, a discord found at location X will more than likely have additional discords to the left or right of it.

Parameters
  • profile (dict) – A MatrixProfile or Pan-MatrixProfile structure.

  • exclusion_zone (int, Default mp algorithm ez) – Desired number of values to exclude on both sides of the anomaly.

  • k (int, Default = 3) – Desired number of discords to find.

Returns

dict – The original profile object with an additional ‘discords’ key. Take note that a MatrixProfile discord contains a single value while the Pan-MatrixProfile contains a row and column index.

Return type

profile
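The effect of the exclusion zone is easiest to see on a small matrix profile. The sketch below is a minimal illustration of the idea described above (repeatedly take the largest value, then nullify its surroundings); `top_discords` is a hypothetical function and the library's handling of details may differ.

```python
import math


def top_discords(mp, k=3, exclusion_zone=0):
    """Return indices of the k largest matrix profile values.

    After each pick, entries within exclusion_zone on both sides are nullified.
    """
    profile = list(mp)
    found = []
    for _ in range(k):
        idx = max(range(len(profile)), key=lambda i: profile[i])
        if profile[idx] == -math.inf:
            break  # everything left has been excluded
        found.append(idx)
        lo = max(0, idx - exclusion_zone)
        hi = min(len(profile), idx + exclusion_zone + 1)
        for j in range(lo, hi):
            profile[j] = -math.inf
    return found


mp_values = [1.0, 1.2, 5.0, 4.8, 1.1, 0.9, 3.5, 1.0]
print(top_discords(mp_values, k=2, exclusion_zone=0))  # [2, 3]
print(top_discords(mp_values, k=2, exclusion_zone=1))  # [2, 6]
```

Without an exclusion zone the second discord is simply the neighbor of the first; with it, a distinct anomaly is reported.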

matrixprofile.discover.snippets

matrixprofile.discover.snippets(ts, snippet_size, num_snippets=2, window_size=None)

The snippets algorithm is used to summarize your time series by identifying N number of representative subsequences. If you want to identify typical patterns in your time series, then this is the algorithm to use.

Parameters
  • ts (array_like) – The time series.

  • snippet_size (int) – The size of snippet desired.

  • num_snippets (int, Default 2) – The number of snippets you would like to find.

  • window_size (int, Default (snippet_size / 2)) – The window size.

Returns

list – A list of snippets as dictionary objects with the following structure.

>>> {
>>>     index: the index of the snippet,
>>>     snippet: the snippet values,
>>>     neighbors: the starting indices of all subsequences similar to the current snippet
>>>     fraction: fraction of the snippet
>>> }

Return type

snippets

matrixprofile.discover.regimes

matrixprofile.discover.regimes(profile, num_regimes=3)

Given a MatrixProfile, compute the corrected arc curve and extract the desired number of regimes. Regimes are computed with an exclusion zone of 5 * window size per the authors.

The author states:

This exclusion zone is based on an assumption that regimes will have multiple repetitions; FLUSS is not able to segment single gesture patterns.

Parameters
  • profile (dict) – Data structure from a MatrixProfile algorithm.

  • num_regimes (int) – The desired number of regimes to find.

Returns

dict – The original MatrixProfile object with the following additional keys.

>>> {
>>>     'cac': The corrected arc curve
>>>     'cac_ez': The exclusion zone used
>>>     'regimes': Array of starting indices indicating a regime.
>>> }

Return type

profile

matrixprofile.discover.statistics

matrixprofile.discover.statistics(ts, window_size)

Compute global and moving statistics for the provided 1D time series. The statistics computed include the min, max, mean, std. and median over the window specified and globally.

Parameters
  • ts (array_like) – The time series.

  • window_size (int) – The size of the window to compute moving statistics over.

Returns

dict – The global and rolling window statistics.

>>> {
>>>     ts: the original time series,
>>>     min: the global minimum,
>>>     max: the global maximum,
>>>     mean: the global mean,
>>>     std: the global standard deviation,
>>>     median: the global median,
>>>     moving_min: the moving minimum,
>>>     moving_max: the moving maximum,
>>>     moving_mean: the moving mean,
>>>     moving_std: the moving standard deviation,
>>>     moving_median: the moving median,
>>>     window_size: the window size provided,
>>>     class: Statistics
>>> }

Return type

statistics

Raises

ValueError – If window_size is not an int. If window_size > len(ts). If ts is not a list or np.array. If ts is not 1D.
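For intuition about what the moving statistics contain, here is a standard-library sketch. It is not the library's implementation: `moving_statistics` is a hypothetical name, and using the population standard deviation is an assumption.

```python
import statistics


def moving_statistics(ts, window_size):
    """Global and moving statistics for a 1D sequence (illustrative sketch)."""
    if not isinstance(window_size, int) or window_size < 1:
        raise ValueError("window_size must be a positive int")
    if window_size > len(ts):
        raise ValueError("window_size must not exceed len(ts)")
    # One slice per window position: len(ts) - window_size + 1 windows in total.
    windows = [ts[i:i + window_size] for i in range(len(ts) - window_size + 1)]
    return {
        "min": min(ts),
        "max": max(ts),
        "mean": statistics.fmean(ts),
        "std": statistics.pstdev(ts),  # population std: an assumption
        "median": statistics.median(ts),
        "moving_min": [min(w) for w in windows],
        "moving_max": [max(w) for w in windows],
        "moving_mean": [statistics.fmean(w) for w in windows],
        "moving_std": [statistics.pstdev(w) for w in windows],
        "moving_median": [statistics.median(w) for w in windows],
        "window_size": window_size,
    }


stats = moving_statistics([1.0, 3.0, 2.0, 5.0, 4.0], window_size=3)
print(stats["moving_min"])     # [1.0, 2.0, 2.0]
print(stats["moving_median"])  # [2.0, 3.0, 4.0]
```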

matrixprofile.discover.hierarchical_clusters

matrixprofile.discover.hierarchical_clusters(X, window_size, t, threshold=0.05, method='single', depth=2, criterion='distance', n_jobs=1)

Cluster M time series into hierarchical clusters using an agglomerative approach. This function is more or less a convenience wrapper around SciPy’s scipy.cluster.hierarchy functions, but uses the MPDist algorithm to compute distances between each pair of time series.

Note

Memory usage could potentially be high depending on the length of your time series and how many distances are computed!

Parameters
  • X (array_like) – An M x N matrix where M is the number of time series and N is the number of observations in each series.

  • window_size (int) – The window size used to compute the MPDist.

  • t (scalar) – For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’, this is the threshold to apply when forming flat clusters. For ‘maxclust’ criteria, this would be max number of clusters requested.

  • threshold (float, Default 0.05) – The percentile from which the MPDist is taken. By default it is set to 0.05 based on empirical research results from the paper. Generally, you should not change this unless you know what you are doing! This value must be a float greater than 0 and less than 1.

  • method (str, Default single) – The linkage algorithm to use. Options: {single, complete, average, weighted}

  • depth (int, Default 2) – A positive integer specifying the number of levels below a non-singleton cluster to allow.

  • criterion (str, Default distance) –

    Options: {inconsistent, distance, maxclust, monocrit} The criterion to use in forming flat clusters.

    inconsistent :

    If a cluster node and all its descendants have an inconsistent value less than or equal to t, then all its leaf descendants belong to the same flat cluster. When no non-singleton cluster meets this criterion, every node is assigned to its own cluster.

    distance :

    Forms flat clusters so that the original observations in each flat cluster have no greater a cophenetic distance than t.

    maxclust :

    Finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than t flat clusters are formed.

    monocrit :

    Forms a flat cluster from a cluster node c with index i when monocrit[j] <= t. For example, to threshold on the maximum mean distance as computed in the inconsistency matrix R with a threshold of 0.8 do:

    MR = maxRstat(Z, R, 3)
    cluster(Z, t=0.8, criterion='monocrit', monocrit=MR)
    

  • n_jobs (int, Default 1) – The number of cpu cores used to compute the MPDist.

Returns

clusters – Clustering statistics, distances and labels.

>>> {
>>>     pairwise_distances: MPDist between pairs of time series as
>>>                         np.ndarray,
>>>     linkage_matrix: clustering linkage matrix as np.ndarray,
>>>     inconsistency_statistics: inconsistency stats as np.ndarray,
>>>     assignments: cluster label associated with input X location as
>>>                  np.ndarray,
>>>     cophenet: float the cophenet statistic,
>>>     cophenet_distances: cophenet distances between pairs of time
>>>                         series as np.ndarray
>>>     class: hclusters
>>> }

Return type

dict

matrixprofile.algorithms.stomp

matrixprofile.algorithms.stomp(ts, window_size, query=None, n_jobs=1)

Computes matrix profiles for a single dimensional time series using the parallelized STOMP algorithm (by default). Ray or Python’s multiprocessing library may be used. When you have initialized Ray on your machine, it takes priority over using Python’s multiprocessing.

Parameters
  • ts (array_like) – The time series to compute the matrix profile for.

  • window_size (int) – The size of the window to compute the matrix profile over.

  • query (array_like) – Optionally, a query can be provided to perform a similarity join.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Returns

dict – A MatrixProfile data structure.

>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "stomp_parallel"
>>> }

Return type

profile

Raises

ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional.
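To clarify what the 'mp' and 'pi' entries hold, the sketch below computes them by brute force for a self-join. It is deliberately naive (quadratic in the number of subsequences) and is not STOMP; `naive_matrix_profile` and `znorm` are hypothetical names, and the exclusion zone of half the window is an assumption made for this example.

```python
import math
import statistics


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to zeros (assumption)."""
    mean = statistics.fmean(seq)
    std = statistics.pstdev(seq)
    if std == 0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def naive_matrix_profile(ts, window_size):
    """Brute-force self-join: nearest-neighbor distance and index per subsequence."""
    n = len(ts) - window_size + 1
    ez = window_size // 2  # assumed exclusion zone to avoid trivial matches
    subs = [znorm(ts[i:i + window_size]) for i in range(n)]
    mp = [math.inf] * n
    pi = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= ez:
                continue  # too close to i: trivial match
            d = math.dist(subs[i], subs[j])
            if d < mp[i]:
                mp[i], pi[i] = d, j
    return {"mp": mp, "pi": pi, "w": window_size, "ez": ez}


ts_example = [0.0, 1.0, 2.0, 1.0] * 3
profile = naive_matrix_profile(ts_example, window_size=4)
print(profile["mp"][0], profile["pi"][0])  # 0.0 4
```

Because the example series repeats every four points, every subsequence has an exact match and the whole matrix profile is zero.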

matrixprofile.algorithms.mpx

matrixprofile.algorithms.mpx(ts, w, query=None, cross_correlation=False, n_jobs=1)

The MPX algorithm computes the matrix profile without using the FFT.

Parameters
  • ts (array_like) – The time series to compute the matrix profile for.

  • w (int) – The window size.

  • query (array_like) – Optionally a query series.

  • cross_correlation (bool, Default=False) – Determine if cross_correlation distance should be returned. It defaults to Euclidean Distance.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Returns

dict – A MatrixProfile data structure.

>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "mpx"
>>> }

Return type

profile

matrixprofile.algorithms.skimp

matrixprofile.algorithms.skimp(ts, windows=None, show_progress=False, cross_correlation=False, pmp_obj=None, sample_pct=0.1, n_jobs=1)

Computes the Pan Matrix Profile (PMP) for the given time series. When only the time series is passed, windows start from 8 and increase in increments of 2 up to length(ts) / 2. Also, the PMP is computed using only 10% of the windows unless sample_pct is set to a different value.

Note

When windows is explicitly provided, sample_pct no longer takes effect. The MP for all windows provided will be computed.

Parameters
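  • ts (array_like) – The time series.

  • windows (array_like, default = None) – The window sizes to compute the MP for. When provided, sample_pct is ignored.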

  • show_progress (bool, default = False) – Show the progress in percent complete in increments of 5% by printing it out to the console.

  • cross_correlation (bool, default = False) – Return the MP values as Pearson Correlation instead of Euclidean distance.

  • pmp_obj (dict, default = None) – Repurpose already computed window sizes with this provided PMP. It should be the output of a PMP algorithm such as skimp or maximum subsequence.

  • sample_pct (float, default = 0.1 (10%)) – Number of window sizes to compute MPs for. Decimal percent between 0 and 1.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Returns

dict – A Pan-MatrixProfile data structure.

>>> {
>>>     'pmp': the pan matrix profile as a 2D array,
>>>     'pmpi': the pmp indices,
>>>     'data': {
>>>         'ts': time series used,
>>>     },
>>>     'windows': the windows used to compute the pmp,
>>>     'sample_pct': the sample percent used,
>>>     'metric': The distance metric computed for the pmp,
>>>     'algorithm': the algorithm used,
>>>     'class': PMP
>>> }

Return type

profile

Raises

ValueError :

  1. ts is not array_like.

  2. windows is not an iterable.

  3. show_progress is not a boolean.

  4. cross_correlation is not a boolean.

  5. sample_pct is not between 0 and 1.
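The default window range and the effect of sample_pct described for skimp can be sketched as follows. Both functions are hypothetical and rest on assumptions: the upper bound is treated as inclusive, and windows are picked at random, whereas the library may choose which windows to compute in a different order.

```python
import math
import random


def default_windows(ts_length, start=8, step=2):
    """Window sizes from `start` up to ts_length / 2 in increments of `step`."""
    return list(range(start, ts_length // 2 + 1, step))


def sample_windows(windows, sample_pct=0.1, seed=None):
    """Pick roughly sample_pct of the windows (random choice is a simplification)."""
    if not 0 < sample_pct <= 1:
        raise ValueError("sample_pct must be in (0, 1]")
    count = max(1, math.ceil(len(windows) * sample_pct))
    rng = random.Random(seed)
    return sorted(rng.sample(windows, count))


windows = default_windows(100)
print(windows[0], windows[-1], len(windows))  # 8 50 22
print(sample_windows(windows, sample_pct=0.1, seed=0))
```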

matrixprofile.algorithms.mass2

matrixprofile.algorithms.mass2(ts, query, extras=False, threshold=1e-10)

Compute the distance profile for the given query over the given time series.

Parameters
  • ts (array_like) – The time series to search.

  • query (array_like) – The query.

  • extras (boolean, default False) – Optionally return additional data used to compute the matrix profile.

Returns

np.array, dict – An array of distances np.array() or dict with extras.

With extras:

>>> {
>>>     'distance_profile': The distance profile,
>>>     'product': The FFT product between ts and query,
>>>     'data_mean': The moving average of the ts over len(query),
>>>     'query_mean': The mean of the query,
>>>     'data_std': The moving std. of the ts over len(query),
>>>     'query_std': The std. of the query
>>> }

Return type

distance_profile

Raises

ValueError – If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional.
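A distance profile is the distance from the query to every subsequence of the time series of the same length. The sketch below computes it directly with z-normalized Euclidean distance; it is a slow reference illustration rather than the FFT-based approach used by mass2, and `naive_distance_profile` and `znorm` are hypothetical names.

```python
import math
import statistics


def znorm(seq):
    """Z-normalize a sequence; constant sequences map to zeros (assumption)."""
    mean = statistics.fmean(seq)
    std = statistics.pstdev(seq)
    if std == 0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def naive_distance_profile(ts, query):
    """Distance from the query to each length-len(query) subsequence of ts."""
    m = len(query)
    q = znorm(query)
    profile = []
    for i in range(len(ts) - m + 1):
        s = znorm(ts[i:i + m])
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s))))
    return profile


ts_example = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0]
dp = naive_distance_profile(ts_example, query=[0.0, 1.0, 2.0])
print(len(dp))       # 7
print(dp[0], dp[4])  # 0.0 0.0 (exact matches of the query)
```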

matrixprofile.algorithms.mpdist

matrixprofile.algorithms.mpdist(ts, ts_b, w, threshold=0.05, n_jobs=1)

Computes the MPDist between the two series ts and ts_b. For more details refer to the paper:

Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios. Shaghayegh Gharghabi, Shima Imani, Anthony Bagnall, Amirali Darvishzadeh, Eamonn Keogh. ICDM 2018

Parameters
  • ts (array_like) – The time series to compute the matrix profile for.

  • ts_b (array_like) – The time series to compare against.

  • w (int) – The window size.

  • threshold (float, Default 0.05) – The percentile from which the distance is taken. By default it is set to 0.05 based on empirical research results from the paper. Generally, you should not change this unless you know what you are doing! This value must be a float greater than 0 and less than 1.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Returns

float – The MPDist.

Return type

mpdist
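As a rough guide to how the threshold is used, the following sketch follows the definition in the MPdist paper as best understood here: join both series in each direction, sort the combined nearest-neighbor distances, and take the k-th smallest value, with k set to threshold times the combined length. All names are hypothetical, the joins are brute force, and the exact rounding of k is an assumption.

```python
import math
import statistics


def znorm(seq):
    """Z-normalize a sequence; constant sequences map to zeros (assumption)."""
    mean = statistics.fmean(seq)
    std = statistics.pstdev(seq)
    if std == 0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def join_profile(a, b, w):
    """For each length-w subsequence of a, the distance to its nearest match in b."""
    subs_b = [znorm(b[j:j + w]) for j in range(len(b) - w + 1)]
    profile = []
    for i in range(len(a) - w + 1):
        s = znorm(a[i:i + w])
        profile.append(min(math.dist(s, t) for t in subs_b))
    return profile


def naive_mpdist(a, b, w, threshold=0.05):
    """k-th smallest value of the concatenated AB and BA join profiles."""
    abba = sorted(join_profile(a, b, w) + join_profile(b, a, w))
    k = math.ceil(threshold * (len(a) + len(b)))
    # Fall back to the largest value when fewer than k distances exist.
    return abba[min(k, len(abba)) - 1]


series_a = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0]
series_b = [1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same pattern, shifted
series_c = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
print(naive_mpdist(series_a, series_b, w=4))  # 0.0
print(naive_mpdist(series_a, series_c, w=4))
```

The shifted copy has distance zero because MPdist compares subsequences regardless of where they occur, while the differently shaped series yields a positive distance.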

matrixprofile.algorithms.pairwise_dist

matrixprofile.algorithms.pairwise_dist(X, window_size, threshold=0.05, n_jobs=1)

Utility function to compute all pairwise distances between the time series using MPDist.

Note

scipy.spatial.distance.pdist cannot be used because it does not allow for jagged arrays; however, its code was used as a reference in creating this function. https://github.com/scipy/scipy/blob/master/scipy/spatial/distance.py#L2039

Parameters
  • X (array_like) – An array_like object containing time series to compute distances for.

  • window_size (int) – The window size to use in computing the MPDist.

  • threshold (float) – The threshold used to compute MPDist.

  • n_jobs (int) – Number of CPU cores to use during computation.

Returns

Y – A condensed distance matrix. For each \(i\) and \(j\) (where \(i<j<m\) and m is the number of original observations), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

Return type

np.ndarray
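A condensed distance matrix stores only the upper triangle of the pairwise distances as a flat array. The sketch below shows the layout and the index arithmetic used by SciPy's condensed format; `toy_distance` is a hypothetical stand-in for MPDist so that the example stays self-contained.

```python
def condensed_index(i, j, m):
    """Position of the (i, j) distance, with i < j < m, in a condensed matrix."""
    if not 0 <= i < j < m:
        raise ValueError("requires 0 <= i < j < m")
    return m * i + j - ((i + 2) * (i + 1)) // 2


def toy_distance(u, v):
    """Hypothetical stand-in for MPDist: absolute difference of the series means."""
    return abs(sum(u) / len(u) - sum(v) / len(v))


# Jagged input: the series do not need to share a length.
X = [[1.0, 2.0, 3.0], [2.0, 2.0], [0.0, 4.0, 8.0, 12.0], [5.0]]
m = len(X)
Y = [toy_distance(X[i], X[j]) for i in range(m) for j in range(i + 1, m)]

print(len(Y))                       # 6 entries for 4 series: m * (m - 1) / 2
print(Y[condensed_index(1, 3, m)])  # distance between X[1] and X[3]
```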

matrixprofile.algorithms.maximum_subsequence

matrixprofile.algorithms.maximum_subsequence(ts, threshold=0.95, refine_stepsize=0.05, n_jobs=1, include_pmp=False, lower_window=8)

Finds the maximum subsequence length based on the threshold provided. Note that this threshold is domain specific, requiring some knowledge about the underlying time series in question.

The subsequence length starts at lower_window (8 by default) and iteratively doubles until the correlation threshold is no longer met.

Parameters
  • ts (array_like) – The time series to analyze.

  • threshold (float, Default 0.95) – The correlation coefficient used as the threshold. It should be between 0 and 1.

  • refine_stepsize (float, Default 0.05) – Used in the refinement step to find a more precise upper window. It should be a percentage between 0.01 and 0.99.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

  • include_pmp (bool, default False) – Include the PanMatrixProfile for the computed windows.

  • lower_window (int, default 8) – Lower bound of subsequence length that can be altered if required.

Returns

With include_pmp=False (default) int : The maximum subsequence length based on the threshold provided.

With include_pmp=True dict : A dict containing the upper window, windows and pmp.

>>> {
>>>     'upper_window': The upper window,
>>>     'windows': array_like windows used to compute the pmp,
>>>     'pmp': the pan matrix profile as a 2D array,
>>>     'pmpi': the pmp indices,
>>> }

Return type

obj

matrixprofile.algorithms.prescrimp

matrixprofile.algorithms.prescrimp(ts, window_size, query=None, step_size=0.25, sample_pct=0.1, random_state=None, n_jobs=1)

This is the PreScrimp algorithm from the SCRIMP++ paper. It is primarily used to compute the approximate matrix profile. In this case we use a sample percentage to mock “the anytime/approximate nature”.

Parameters
  • ts (np.ndarray) – The time series to compute the matrix profile for.

  • window_size (int) – The window size.

  • query (array_like) – Optionally, a query can be provided to perform a similarity join.

  • step_size (float, default 0.25) – The sampling interval for the window. The paper suggests 0.25 is the most practical. It should be a float value between 0 and 1.

  • sample_pct (float, default = 0.1 (10%)) – Number of samples to compute distances for in the MP.

  • random_state (int, default None) – Set the random seed generator for reproducible results.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Note

The matrix profiles computed from prescrimp will always be the approximate solution.

Returns

dict – A MatrixProfile data structure.

>>> {
>>>    'mp': The matrix profile,
>>>    'pi': The matrix profile 1NN indices,
>>>    'rmp': The right matrix profile,
>>>    'rpi': The right matrix profile 1NN indices,
>>>    'lmp': The left matrix profile,
>>>    'lpi': The left matrix profile 1NN indices,
>>>    'metric': The distance metric computed for the mp,
>>>    'w': The window size used to compute the matrix profile,
>>>    'ez': The exclusion zone used,
>>>    'join': Flag indicating if a similarity join was computed,
>>>    'sample_pct': Percentage of samples used in computing the MP,
>>>    'data': {
>>>        'ts': Time series data,
>>>        'query': Query data if supplied
>>>    }
>>>    'class': "MatrixProfile"
>>>    'algorithm': "prescrimp"
>>> }

Return type

profile

Raises

ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional. If sample_pct is not between 0 and 1.

matrixprofile.algorithms.scrimp_plus_plus

matrixprofile.algorithms.scrimp_plus_plus(ts, window_size, query=None, step_size=0.25, sample_pct=0.1, random_state=None, n_jobs=1)

SCRIMP++ is an anytime algorithm that computes the matrix profile for a given time series (ts) over a given window size (m). Essentially, it allows for an approximate solution to be provided for quicker analysis. In this implementation, the approximate solution is controlled by a sample percentage between 0 and 1. The default sample percentage is currently 10%.

This algorithm was created at the University of California Riverside. For further academic understanding, please review this paper:

Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar, Eamonn Keogh. ICDM 2018.

https://www.cs.ucr.edu/~eamonn/SCRIMP_ICDM_camera_ready_updated.pdf

Parameters
  • ts (np.ndarray) – The time series to compute the matrix profile for.

  • window_size (int) – The window size.

  • query (array_like) – Optionally, a query can be provided to perform a similarity join.

  • step_size (float, default 0.25) – The sampling interval for the window. The paper suggests 0.25 is the most practical. It should be a float value between 0 and 1.

  • sample_pct (float, default = 0.1 (10%)) – Number of samples to compute distances for in the MP.

  • random_state (int, default None) – Set the random seed generator for reproducible results.

  • n_jobs (int, Default = 1) – Number of cpu cores to use.

Returns

dict – A MatrixProfile data structure.

>>> {
>>>    'mp': The matrix profile,
>>>    'pi': The matrix profile 1NN indices,
>>>    'rmp': The right matrix profile,
>>>    'rpi': The right matrix profile 1NN indices,
>>>    'lmp': The left matrix profile,
>>>    'lpi': The left matrix profile 1NN indices,
>>>    'metric': The distance metric computed for the mp,
>>>    'w': The window size used to compute the matrix profile,
>>>    'ez': The exclusion zone used,
>>>    'join': Flag indicating if a similarity join was computed,
>>>    'sample_pct': Percentage of samples used in computing the MP,
>>>    'data': {
>>>        'ts': Time series data,
>>>        'query': Query data if supplied
>>>    }
>>>    'class': "MatrixProfile"
>>>    'algorithm': "scrimp++"
>>> }

Return type

profile

Raises

ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional. If sample_pct is not between 0 and 1.

matrixprofile.transform.apply_av

matrixprofile.transform.apply_av(profile, av='default', custom_av=None)

Utility function that returns a MatrixProfile data structure with a calculated annotation vector that has been applied to correct the matrix profile.

Parameters
  • profile (dict) – A MatrixProfile structure.

  • av (str, Default = "default") – The type of annotation vector to apply.

  • custom_av (array_like, Default = None) – Custom annotation vector (will only be applied if av is “custom”).

Returns

dict – A MatrixProfile data structure with a calculated annotation vector and a corrected matrix profile.

Return type

profile

Raises

ValueError – If profile is not a MatrixProfile data structure. If the custom_av parameter is not array-like when using a custom av. If the av parameter is invalid. If the lengths of the annotation vector and matrix profile are different. If values in the annotation vector are outside [0.0, 1.0].
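
The text above does not spell out how the correction is computed. A minimal sketch, assuming the commonly used rule corrected = mp + (1 - av) * max(mp) from the annotation vector literature, is shown below. The function apply_annotation_vector is a hypothetical stand-in that works on plain lists; it is not the library's implementation.

```python
def apply_annotation_vector(mp, av):
    """Hypothetical stand-in for the correction step.

    Assumes corrected[i] = mp[i] + (1 - av[i]) * max(mp), so positions with
    av == 1 are unchanged and positions with av == 0 are pushed up by max(mp).
    """
    if len(mp) != len(av):
        raise ValueError("annotation vector and matrix profile lengths differ")
    if any(a < 0.0 or a > 1.0 for a in av):
        raise ValueError("annotation vector values must lie in [0.0, 1.0]")
    peak = max(mp)
    return [m + (1.0 - a) * peak for m, a in zip(mp, av)]


corrected = apply_annotation_vector([1.0, 2.0, 4.0], [1.0, 0.5, 0.0])
```

Under this assumption, a low annotation value makes a subsequence look less like a motif, because its corrected distance grows.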

matrixprofile.transform.make_default_av

matrixprofile.transform.make_default_av(ts, window)[source]

Utility function that returns an annotation vector filled with 1s (should not change the matrix profile).

Parameters
  • ts (array_like) – The time series.

  • window (int) – The specific window size used to compute the MatrixProfile.

Returns

np.array – An annotation vector.

Return type

av

Raises

ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
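
A vector of 1s leaves the matrix profile unchanged because nothing is down-weighted. The sketch below assumes the vector has one entry per subsequence, that is len(ts) - window + 1 entries. The name default_av is hypothetical.

```python
def default_av(ts, window):
    """Hypothetical sketch: one weight of 1.0 per subsequence of length `window`."""
    if not isinstance(window, int):
        raise ValueError("window must be an integer")
    return [1.0] * (len(ts) - window + 1)


av = default_av([0.0] * 10, 4)
```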

matrixprofile.transform.make_complexity_av

matrixprofile.transform.make_complexity_av(ts, window)[source]

Utility function that returns an annotation vector where values are based on the complexity estimation of the signal.

Parameters
  • ts (array_like) – The time series.

  • window (int) – The specific window size used to compute the MatrixProfile.

Returns

np.array – An annotation vector.

Return type

av

Raises

ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
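
To make the idea concrete, the sketch below uses the usual complexity estimate for a subsequence, the square root of the summed squared differences between consecutive samples. Rescaling to [0, 1] with min-max scaling is an assumption of this sketch, and the library may normalize differently. The name complexity_av is hypothetical.

```python
import math


def complexity_av(ts, window):
    """Hypothetical sketch of a complexity-based annotation vector."""
    n = len(ts) - window + 1
    raw = []
    for i in range(n):
        sub = ts[i:i + window]
        # Complexity estimate: sqrt of summed squared consecutive differences.
        raw.append(math.sqrt(sum((b - a) ** 2 for a, b in zip(sub, sub[1:]))))
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Every subsequence is equally complex; this sketch returns all zeros.
        return [0.0] * n
    # Min-max scaling (an assumption of this sketch).
    return [(r - lo) / (hi - lo) for r in raw]


av = complexity_av([0, 0, 0, 0, 1, 0, 1, 0], 4)
```

Flat stretches receive values near 0 and rapidly changing stretches receive values near 1.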

matrixprofile.transform.make_meanstd_av

matrixprofile.transform.make_meanstd_av(ts, window)[source]

Utility function that returns an annotation vector where a value is set to 1 if the standard deviation of the corresponding subsequence is less than the mean of the standard deviations of all subsequences. Otherwise, the value is set to 0.

Parameters
  • ts (array_like) – The time series.

  • window (int) – The specific window size used to compute the MatrixProfile.

Returns

np.array – An annotation vector.

Return type

av

Raises

ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
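
The rule described for this function can be sketched in a few lines of plain Python. The sketch assumes the population standard deviation over each sliding window; whether the library uses the population or sample form is not stated here. The name meanstd_av is hypothetical.

```python
from statistics import mean, pstdev


def meanstd_av(ts, window):
    """Hypothetical sketch: 1.0 where a window's std is below the mean std, else 0.0."""
    stds = [pstdev(ts[i:i + window]) for i in range(len(ts) - window + 1)]
    threshold = mean(stds)
    return [1.0 if s < threshold else 0.0 for s in stds]


av = meanstd_av([0, 0, 0, 0, 5, -5, 5, -5], 4)
```

In this invented series the quiet start keeps a weight of 1 while the oscillating tail is set to 0.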

matrixprofile.transform.make_clipping_av

matrixprofile.transform.make_clipping_av(ts, window)[source]

Utility function that returns an annotation vector such that subsequences that have more clipping have less importance.

Parameters
  • ts (array_like) – The time series.

  • window (int) – The specific window size used to compute the MatrixProfile.

Returns

np.array – An annotation vector.

Return type

av

Raises

ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
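
One way to picture this is to count, per subsequence, how many samples sit at the extremes of the series. The sketch below treats samples equal to the global minimum or maximum of ts as clipped and scales the counts so that the most clipped subsequence gets weight 0. Both choices are assumptions of the sketch, and clipping_av is a hypothetical name.

```python
def clipping_av(ts, window):
    """Hypothetical sketch: more clipped samples in a window means a lower weight."""
    lo, hi = min(ts), max(ts)
    counts = [
        sum(1 for v in ts[i:i + window] if v == lo or v == hi)
        for i in range(len(ts) - window + 1)
    ]
    most = max(counts)  # at least 1, since the extremes occur somewhere in ts
    return [1.0 - c / most for c in counts]


av = clipping_av([0, 1, 2, 9, 9, 9, 2, 1], 3)
```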

matrixprofile.utils.empty_mp

matrixprofile.utils.empty_mp()[source]

Utility function that provides an empty MatrixProfile data structure.

Returns

dict – An empty MatrixProfile data structure.

Return type

profile

matrixprofile.utils.pick_mp

matrixprofile.utils.pick_mp(profile, window)[source]

Utility function that extracts a MatrixProfile from a Pan-MatrixProfile, placing it into the MatrixProfile data structure.

Parameters
  • profile (dict) – A Pan-MatrixProfile data structure.

  • window (int) – The specific window size used to compute the desired MatrixProfile.

Returns

dict – A MatrixProfile data structure.

Return type

profile

Raises

ValueError – If profile is not a Pan-MatrixProfile data structure. If window is not an integer. If desired MatrixProfile is not found based on window.

matrixprofile.io.to_disk

matrixprofile.io.to_disk(profile, file_path, format='json')[source]

Writes a profile object of type MatrixProfile or PMP to disk as a JSON formatted file by default.

Note

The JSON format is human-readable, whereas the mpf format is binary and cannot be read when opened in a text editor. When the file path does not include the extension, it is appended for you.

Parameters
  • profile (dict_like) – A MatrixProfile or Pan-MatrixProfile data structure.

  • file_path (str) – The path to write the file to.

  • format (str, default json) – The format of the file to be written. Options include json, mpf
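
The extension handling described in the note can be pictured with the small sketch below. resolve_path is a hypothetical helper, not a library function. How the library treats a path whose existing extension disagrees with format is not stated here, so the sketch simply leaves such a path untouched.

```python
import os


def resolve_path(file_path, fmt="json"):
    """Hypothetical helper: append '.json' or '.mpf' when the path has no extension."""
    if fmt not in ("json", "mpf"):
        raise ValueError("format must be 'json' or 'mpf'")
    _, ext = os.path.splitext(file_path)
    # Keep a path that already has an extension (an assumption of this sketch).
    return file_path if ext else f"{file_path}.{fmt}"


json_path = resolve_path("out/profile")
mpf_path = resolve_path("out/profile", "mpf")
```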

matrixprofile.io.from_disk

matrixprofile.io.from_disk(file_path, format='infer')[source]

Reads a profile object of type MatrixProfile or PMP from disk into the respective object type. By default the type is inferred by the file extension.

Parameters
  • file_path (str) – The path to read the file from.

  • format (str, default infer) – The file format type to read from disk. Options include: infer, json, mpf

Returns

profile – A MatrixProfile or Pan-MatrixProfile data structure.

Return type

dict_like, None
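
Inferring the format from the file extension could look roughly like the sketch below. infer_format is a hypothetical helper. Returning None for an unrecognized extension is an assumption made here for illustration and may not match what the library does.

```python
import os


def infer_format(file_path, fmt="infer"):
    """Hypothetical helper: pick 'json' or 'mpf' from the file extension."""
    if fmt != "infer":
        return fmt
    ext = os.path.splitext(file_path)[1].lower().lstrip(".")
    # Unknown extensions yield None in this sketch (an assumption).
    return ext if ext in ("json", "mpf") else None


detected = infer_format("results/run1.JSON")
```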

matrixprofile.io.to_json

matrixprofile.io.to_json(profile)[source]

Converts a given profile object into JSON format.

Parameters

profile (dict_like) – A MatrixProfile or Pan-MatrixProfile data structure.

Returns

The profile as a JSON formatted string.

Return type

str

matrixprofile.io.from_json

matrixprofile.io.from_json(profile)[source]

Converts a JSON formatted string into a profile data structure.

Parameters

profile (str) – The profile as a JSON formatted string.

Returns

profile – A MatrixProfile or Pan-MatrixProfile data structure.

Return type

dict_like
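
The round trip between a profile and a JSON string can be illustrated with the standard library alone. The dict below is a hand-made stand-in with invented values and plain lists. A real profile holds NumPy arrays, which the standard json module cannot serialize directly, so the library functions presumably handle that conversion.

```python
import json

# Hand-made stand-in for a profile; plain lists replace NumPy arrays.
profile = {
    "mp": [0.5, 1.25],
    "pi": [1, 0],
    "w": 4,
    "class": "MatrixProfile",
}

encoded = json.dumps(profile)   # dict -> JSON formatted string
decoded = json.loads(encoded)   # JSON formatted string -> dict
```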

matrixprofile.io.to_mpf

matrixprofile.io.to_mpf(profile)[source]

Converts a given profile object into MPF binary file format.

Parameters

profile (dict_like) – A MatrixProfile or Pan-MatrixProfile data structure.

Returns

The profile as a binary formatted string.

Return type

str

matrixprofile.io.from_mpf

matrixprofile.io.from_mpf(profile)[source]

Converts binary formatted MPFOutput message into a profile data structure.

Parameters

profile (str) – The profile as a binary formatted MPFOutput message.

Returns

profile – A MatrixProfile or Pan-MatrixProfile data structure.

Return type

dict_like

matrixprofile.datasets.fetch_available

matrixprofile.datasets.fetch_available(category=None)[source]

Fetches the available datasets found in the github.com/matrix-profile-foundation/mpf-datasets GitHub repository. Providing a category filters the datasets.

Parameters

category (str, Optional) – The desired category to retrieve datasets by.

Returns

A list of dictionaries containing details about each dataset.

Return type

list

Raises

ValueError – When a category is provided, but is not found in the listing.
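
The filtering behavior can be sketched on a hand-made listing. The entries, their 'name' and 'category' keys, and the helper filter_by_category are all invented for illustration; the real listing comes from the repository and may carry different fields.

```python
# Invented listing used only for illustration.
listing = [
    {"name": "example-a.csv", "category": "real"},
    {"name": "example-b.csv", "category": "synthetic"},
    {"name": "example-c.csv", "category": "real"},
]


def filter_by_category(datasets, category=None):
    """Hypothetical helper: return all entries, or only those in `category`."""
    if category is None:
        return list(datasets)
    matches = [d for d in datasets if d["category"] == category]
    if not matches:
        raise ValueError(f"category {category!r} not found in the listing")
    return matches


real_only = filter_by_category(listing, "real")
```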

matrixprofile.datasets.load

matrixprofile.datasets.load(name)[source]

Loads an MPF dataset by base file name or file name. The match is case-insensitive.

Note

An internet connection is required to fetch the data.

Returns

The dataset and metadata.

>>> {
>>>     'name': The file name loaded,
>>>     'category': The category the file came from,
>>>     'description': A description,
>>>     'data': The real valued data as an np.ndarray,
>>>     'datetime': The datetime as an np.ndarray
>>> }

Return type

dict
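
Matching by base file name or file name, ignoring case, can be sketched as follows. find_dataset is a hypothetical helper and the file names are invented. The sketch only shows the name comparison and does not fetch anything.

```python
import os


def find_dataset(name, file_names):
    """Hypothetical helper: case-insensitive match on file name or base file name."""
    wanted = name.lower()
    for file_name in file_names:
        lowered = file_name.lower()
        base = os.path.splitext(lowered)[0]
        if wanted in (lowered, base):
            return file_name
    return None


available = ["example-a.csv", "Example-B.csv"]
match = find_dataset("EXAMPLE-B", available)
```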