API¶
matrixprofile.analyze¶
matrixprofile.analyze(ts, query=None, windows=None, sample_pct=1.0, threshold=0.98, n_jobs=1, preprocessing_kwargs=None)[source]¶
Runs an appropriate workflow based on the parameters passed in. The goal of this function is to compute all fundamental algorithms on the provided time series data. For now the following is computed:
Matrix Profile - exact or approximate based on sample_pct, given that a window is provided. By default the exact algorithm is used.
Top Motifs - The top 3 motifs are found.
Top Discords - The top 3 discords are found.
Plot MP, Motifs and Discords
When a window is not provided or more than a single window is provided, the PMP is computed:
Compute UPPER window when no window(s) is provided
Compute PMP for all windows
Top Motifs
Top Discords
Plot PMP, motifs and discords.
- Parameters
ts (array_like) – The time series to analyze.
query (array_like, Optional) – The query to analyze. Note that when computing the PMP the query is ignored!
windows (int or array_like, Optional) – The window(s) to compute the MatrixProfile. Note that it may be an int for a single matrix profile computation or an array of ints for computing the pan matrix profile.
sample_pct (float, default = 1) – A float between 0 and 1 representing how many samples to compute for the MP or PMP. When it is 1, the exact algorithm is used.
threshold (float, Default 0.98) – The correlation coefficient used as the threshold. It should be between 0 and 1. This is used to compute the upper window size when no window(s) is given.
n_jobs (int, Default = 1) – Number of cpu cores to use.
preprocessing_kwargs (dict, default = None) –
A dictionary object that sets parameters for the preprocess function. A valid preprocessing_kwargs should have the following structure:
>>> {
>>>     'window': The window size to compute the mean/median/minimum/maximum value,
>>>     'method': A string indicating the data imputation method, which should be
>>>         'mean', 'median', 'min' or 'max',
>>>     'direction': A string indicating the data imputation direction, which should be
>>>         'forward', 'fwd', 'f', 'backward', 'bwd' or 'b'. If the direction is
>>>         forward, we use previous data for imputation; if the direction is
>>>         backward, we use subsequent data for imputation,
>>>     'add_noise': A boolean value indicating whether noise needs to be added into the
>>>         time series
>>> }
To disable the preprocessing procedure, set preprocessing_kwargs to None, False, "" or {}.
- Returns
tuple – The appropriate PMP or MP profile object and associated figures.
- Return type
(profile, figures)
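Example (an illustrative sketch; the synthetic series and window size are arbitrary choices, not library defaults):
>>> import numpy as np
>>> import matrixprofile
>>> ts = np.random.uniform(size=1000)
>>> profile, figures = matrixprofile.analyze(ts, windows=32)   # exact MP for a single window
>>> pmp_profile, pmp_figures = matrixprofile.analyze(ts)       # no window given, so the PMP is computed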
matrixprofile.compute¶
matrixprofile.compute(ts, windows=None, query=None, sample_pct=1, threshold=0.98, n_jobs=1, preprocessing_kwargs=None)[source]¶
Computes the exact or approximate MatrixProfile based on the sample percent specified. Currently, MPX and SCRIMP++ are used for the exact and approximate algorithms respectively. When multiple windows are passed, the Pan-MatrixProfile is computed and returned.
By default, when only a time series (ts) is passed in, the Pan-MatrixProfile is computed based on the maximum upper window algorithm with a correlation threshold of 0.98.
Notes
When multiple windows are passed and the Pan-MatrixProfile is computed, the query is ignored!
- Parameters
ts (array_like) – The time series to analyze.
windows (int, array_like) – The window(s) to compute the MatrixProfile. Note that it may be an int for a single matrix profile computation or an array of ints for computing the pan matrix profile.
query (array_like, optional) – The query to analyze. Note that when computing the PMP the query is ignored!
sample_pct (float, default 1) – A float between 0 and 1 representing how many samples to compute for the MP or PMP. When it is 1, the exact algorithm is used.
threshold (float, default 0.98) – The correlation coefficient used as the threshold. It should be between 0 and 1. This is used to compute the upper window size when no window(s) is given.
n_jobs (int, default = 1) – Number of cpu cores to use.
preprocessing_kwargs (dict, default = None) –
A dictionary object that sets parameters for the preprocess function. A valid preprocessing_kwargs should have the following structure:
>>> {
>>>     'window': The window size to compute the mean/median/minimum/maximum value,
>>>     'method': A string indicating the data imputation method, which should be
>>>         'mean', 'median', 'min' or 'max',
>>>     'direction': A string indicating the data imputation direction, which should be
>>>         'forward', 'fwd', 'f', 'backward', 'bwd' or 'b'. If the direction is
>>>         forward, we use previous data for imputation; if the direction is
>>>         backward, we use subsequent data for imputation,
>>>     'add_noise': A boolean value indicating whether noise needs to be added into the
>>>         time series
>>> }
To disable the preprocessing procedure, set preprocessing_kwargs to None, False, "" or {}.
- Returns
dict – The profile computed.
- Return type
profile
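Example (an illustrative sketch with synthetic data; the window sizes are arbitrary):
>>> import numpy as np
>>> import matrixprofile
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)                  # exact MP (MPX)
>>> approx = matrixprofile.compute(ts, windows=32, sample_pct=0.5)   # approximate MP (SCRIMP++)
>>> pmp = matrixprofile.compute(ts, windows=[16, 32, 64])            # multiple windows, so a PMP is returned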
matrixprofile.visualize¶
matrixprofile.visualize(profile)[source]¶
Automatically creates plots for the provided data structure. In some cases many plots are created. For example, when a MatrixProfile is passed with corresponding motifs and discords, the matrix profile, discords and motifs will be plotted.
- Parameters
profile (dict_like) – A MatrixProfile, Pan-MatrixProfile or Statistics data structure.
- Returns
list – A list of matplotlib figures.
- Return type
figures
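Example (an illustrative sketch; assumes a profile was computed as shown above):
>>> import numpy as np
>>> import matrixprofile
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> figures = matrixprofile.visualize(profile)   # list of matplotlib figures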
matrixprofile.preprocess.preprocess¶
matrixprofile.preprocess.preprocess(ts, window, impute_method='mean', impute_direction='forward', add_noise=True)[source]¶
Preprocesses the given time series by adding noise and imputing missing data.
- Parameters
ts (array_like) – The time series to be preprocessed.
window (int) – The window size to compute the mean/median/minimum value/maximum value.
impute_method (string, Default = 'mean') – A string indicating the data imputation method, which should be 'mean', 'median', 'min' or 'max'.
impute_direction (string, Default = 'forward') – A string indicating the data imputation direction, which should be 'forward', 'fwd', 'f', 'backward', 'bwd' or 'b'. If the direction is forward, we use previous data for imputation; if the direction is backward, we use subsequent data for imputation.
add_noise (bool, Default = True) – A boolean value indicating whether noise needs to be added into the time series.
- Returns
temp – The time series after being preprocessed.
- Return type
array_like
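Example (an illustrative sketch; the gap location and window size are arbitrary):
>>> import numpy as np
>>> from matrixprofile import preprocess
>>> ts = np.random.uniform(size=1000)
>>> ts[100:110] = np.nan    # introduce missing values to impute
>>> clean = preprocess.preprocess(ts, window=10, impute_method='median', impute_direction='backward')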
matrixprofile.preprocess.impute_missing¶
matrixprofile.preprocess.impute_missing(ts, window, method='mean', direction='forward')[source]¶
Imputes missing data in time series.
- Parameters
ts (array_like) – The time series to be handled.
window (int) – The window size to compute the mean/median/minimum value/maximum value.
method (string, Default = 'mean') – A string indicating the data imputation method, which should be ‘mean’, ‘median’, ‘min’ or ‘max’.
direction (string, Default = 'forward') – A string indicating the data imputation direction, which should be ‘forward’, ‘fwd’, ‘f’, ‘backward’, ‘bwd’, ‘b’. If the direction is forward, we use previous data for imputation; if the direction is backward, we use subsequent data for imputation.
- Returns
temp – The time series after missing data has been imputed.
- Return type
array_like
matrixprofile.preprocess.add_noise_to_series¶
matrixprofile.discover.motifs¶
matrixprofile.discover.motifs(profile, exclusion_zone=None, k=3, max_neighbors=10, radius=3, use_cmp=False)¶
Find the top K motifs (patterns) given a matrix profile or a pan matrix profile. By default the algorithm will find up to 3 motifs (k) and up to 10 of their neighbors with a radius of 3 * min_dist using the regular matrix profile. If the profile is a MatrixProfile data structure, you can also use a Corrected Matrix Profile to compute the motifs.
- Parameters
profile (dict) – The output from one of the matrix profile algorithms.
exclusion_zone (int, Default to algorithm ez) – Desired number of values to exclude on both sides of the motif. This avoids trivial matches. It defaults to half of the computed window size. Setting the exclusion zone to 0 disables it.
k (int, Default = 3) – Desired number of motifs to find.
max_neighbors (int, Default = 10) – The maximum number of neighbors to include for a given motif.
radius (int, Default = 3) – The radius is used to associate a neighbor by checking if the neighbor’s distance is less than or equal to dist * radius.
use_cmp (bool, Default = False) – Use the Corrected Matrix Profile to compute the motifs (only for a Matrix Profile data structure).
- Returns
dict – The original input profile with the addition of the “motifs” key. The motifs key consists of the following structure.
A list of dicts containing motif indices and their corresponding neighbor indices.
>>> [
>>>     {
>>>         'motifs': [first_index, second_index],
>>>         'neighbors': [index, index, index ...max_neighbors]
>>>     }
>>> ]
The index is a single value when a MatrixProfile is passed in otherwise the index contains a row and column index for Pan-MatrixProfile.
- Return type
profile
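Example (an illustrative sketch; assumes a MatrixProfile was computed first on synthetic data):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import discover
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> profile = discover.motifs(profile, k=3)   # adds the 'motifs' key to the profile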
matrixprofile.discover.discords¶
matrixprofile.discover.discords(profile, exclusion_zone=None, k=3)¶
Find the top K discords (anomalies) given a mp or pmp, an exclusion zone and the desired number of discords. The exclusion zone nullifies entries on the left and right side of the first and subsequent discords to remove trivial matches. More specifically, a discord found at location X will more than likely have additional discords to the left or right of it.
- Parameters
profile (dict) – A MatrixProfile or Pan-MatrixProfile structure.
exclusion_zone (int, Default mp algorithm ez) – Desired number of values to exclude on both sides of the anomaly.
k (int) – Desired number of discords to find.
- Returns
dict – The original profile object with an additional ‘discords’ key. Take note that a MatrixProfile discord contains a single value while the Pan-MatrixProfile contains a row and column index.
- Return type
profile
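Example (an illustrative sketch; assumes a MatrixProfile was computed first on synthetic data):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import discover
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> profile = discover.discords(profile, k=3)   # adds the 'discords' key to the profile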
matrixprofile.discover.snippets¶
matrixprofile.discover.snippets(ts, snippet_size, num_snippets=2, window_size=None)[source]¶
The snippets algorithm is used to summarize your time series by identifying the desired number of representative subsequences. If you want to identify typical patterns in your time series, then this is the algorithm to use.
- Parameters
ts (array_like) – The time series.
snippet_size (int) – The size of snippet desired.
num_snippets (int, Default 2) – The number of snippets you would like to find.
window_size (int, Default (snippet_size / 2)) – The window size.
- Returns
list – A list of snippets as dictionary objects with the following structure.
>>> {
>>>     index: the index of the snippet,
>>>     snippet: the snippet values,
>>>     neighbors: the starting indices of all subsequences similar to the current snippet,
>>>     fraction: fraction of the snippet
>>> }
- Return type
snippets
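Example (an illustrative sketch; the snippet size is an arbitrary choice):
>>> import numpy as np
>>> from matrixprofile import discover
>>> ts = np.random.uniform(size=2000)
>>> snips = discover.snippets(ts, snippet_size=100, num_snippets=2)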
matrixprofile.discover.regimes¶
matrixprofile.discover.regimes(profile, num_regimes=3)¶
Given a MatrixProfile, compute the corrected arc curve and extract the desired number of regimes. Regimes are computed with an exclusion zone of 5 * window size per the authors.
- The author states:
This exclusion zone is based on an assumption that regimes will have multiple repetitions; FLUSS is not able to segment single gesture patterns.
- Parameters
profile (dict) – Data structure from a MatrixProfile algorithm.
num_regimes (int) – The desired number of regimes to find.
- Returns
dict – The original MatrixProfile object with the following additional keys.
>>> {
>>>     'cac': The corrected arc curve,
>>>     'cac_ez': The exclusion zone used,
>>>     'regimes': Array of starting indices indicating a regime.
>>> }
- Return type
profile
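Example (an illustrative sketch; assumes a MatrixProfile was computed first on synthetic data):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import discover
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> profile = discover.regimes(profile, num_regimes=3)   # adds 'cac', 'cac_ez' and 'regimes'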
matrixprofile.discover.statistics¶
matrixprofile.discover.statistics(ts, window_size)[source]¶
Compute global and moving statistics for the provided 1D time series. The statistics computed include the min, max, mean, std. and median over the window specified and globally.
- Parameters
ts (array_like) – The time series.
window_size (int) – The size of the window to compute moving statistics over.
- Returns
dict – The global and rolling window statistics.
>>> {
>>>     ts: the original time series,
>>>     min: the global minimum,
>>>     max: the global maximum,
>>>     mean: the global mean,
>>>     std: the global standard deviation,
>>>     median: the global median,
>>>     moving_min: the moving minimum,
>>>     moving_max: the moving maximum,
>>>     moving_mean: the moving mean,
>>>     moving_std: the moving standard deviation,
>>>     moving_median: the moving median,
>>>     window_size: the window size provided,
>>>     class: Statistics
>>> }
- Return type
statistics
- Raises
ValueError – If window_size is not an int. If window_size > len(ts). If ts is not a list or np.array. If ts is not 1D.
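Example (an illustrative sketch with synthetic data):
>>> import numpy as np
>>> from matrixprofile import discover
>>> ts = np.random.uniform(size=1000)
>>> stats = discover.statistics(ts, window_size=32)   # dict of global and moving statistics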
matrixprofile.discover.hierarchical_clusters¶
matrixprofile.discover.hierarchical_clusters(X, window_size, t, threshold=0.05, method='single', depth=2, criterion='distance', n_jobs=1)[source]¶
Cluster M time series into hierarchical clusters using an agglomerative approach. This function is more or less a convenience wrapper around SciPy’s scipy.cluster.hierarchy functions, but uses the MPDist algorithm to compute distances between each pair of time series.
Note
Memory usage could potentially be high depending on the length of your time series and how many distances are computed!
- Parameters
X (array_like) – An M x N matrix where M is the time series and N is the observations at a given time.
window_size (int) – The window size used to compute the MPDist.
t (scalar) – For criteria ‘inconsistent’, ‘distance’ or ‘monocrit’, this is the threshold to apply when forming flat clusters. For ‘maxclust’ criteria, this would be max number of clusters requested.
threshold (float, Default 0.05) – The percentile in which the MPDist is taken from. By default it is set to 0.05 based on empirical research results from the paper. Generally, you should not change this unless you know what you are doing! This value must be a float greater than 0 and less than 1.
method (str, Default single) – The linkage algorithm to use. Options: {single, complete, average, weighted}
depth (int, Default 2) – A value greater than 0 specifying the number of levels below a non-singleton cluster to allow.
criterion (str, Default distance) –
The criterion to use in forming flat clusters. Options: {inconsistent, distance, maxclust, monocrit}
inconsistent: If a cluster node and all its descendants have an inconsistent value less than or equal to t, then all its leaf descendants belong to the same flat cluster. When no non-singleton cluster meets this criterion, every node is assigned to its own cluster.
distance: Forms flat clusters so that the original observations in each flat cluster have no greater a cophenetic distance than t.
maxclust: Finds a minimum threshold r so that the cophenetic distance between any two original observations in the same flat cluster is no more than r and no more than t flat clusters are formed.
monocrit: Forms a flat cluster from a cluster node c with index i when monocrit[j] <= t. For example, to threshold on the maximum mean distance as computed in the inconsistency matrix R with a threshold of 0.8, do: MR = maxRstat(Z, R, 3); cluster(Z, t=0.8, criterion='monocrit', monocrit=MR)
n_jobs (int, Default 1) – The number of cpu cores used to compute the MPDist.
- Returns
clusters – Clustering statistics, distances and labels.
>>> {
>>>     pairwise_distances: MPDist between pairs of time series as np.ndarray,
>>>     linkage_matrix: clustering linkage matrix as np.ndarray,
>>>     inconsistency_statistics: inconsistency stats as np.ndarray,
>>>     assignments: cluster label associated with input X location as np.ndarray,
>>>     cophenet: float the cophenet statistic,
>>>     cophenet_distances: cophenet distances between pairs of time series as np.ndarray,
>>>     class: hclusters
>>> }
- Return type
dict
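Example (an illustrative sketch; the series lengths, window size and criterion are arbitrary choices):
>>> import numpy as np
>>> from matrixprofile import discover
>>> X = [np.random.uniform(size=n) for n in (300, 310, 320)]   # time series may have different lengths
>>> clusters = discover.hierarchical_clusters(X, window_size=32, t=2, criterion='maxclust')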
matrixprofile.algorithms.stomp¶
matrixprofile.algorithms.stomp(ts, window_size, query=None, n_jobs=1)[source]¶
Computes matrix profiles for a single dimensional time series using the parallelized STOMP algorithm (by default). Ray or Python’s multiprocessing library may be used. When you have initialized Ray on your machine, it takes priority over using Python’s multiprocessing.
- Parameters
ts (array_like) – The time series to compute the matrix profile for.
window_size (int) – The size of the window to compute the matrix profile over.
query (array_like) – Optionally, a query can be provided to perform a similarity join.
n_jobs (int, Default = 1) – Number of cpu cores to use.
- Returns
dict – A MatrixProfile data structure.
>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "stomp_parallel"
>>> }
- Return type
profile
- Raises
ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional.
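Example (an illustrative sketch with synthetic data; the window size and core count are arbitrary):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> profile = algorithms.stomp(ts, window_size=32, n_jobs=4)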
matrixprofile.algorithms.mpx¶
matrixprofile.algorithms.mpx(ts, w, query=None, cross_correlation=False, n_jobs=1)[source]¶
The MPX algorithm computes the matrix profile without using the FFT.
- Parameters
ts (array_like) – The time series to compute the matrix profile for.
w (int) – The window size.
query (array_like) – Optionally a query series.
cross_correlation (bool, Default=False) – Determine if cross_correlation distance should be returned. It defaults to Euclidean Distance.
n_jobs (int, Default = 1) – Number of cpu cores to use.
- Returns
dict – A MatrixProfile data structure.
>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "mpx"
>>> }
- Return type
profile
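Example (an illustrative sketch with synthetic data; the window size is arbitrary):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> profile = algorithms.mpx(ts, w=32)                              # Euclidean distance metric
>>> profile_cc = algorithms.mpx(ts, w=32, cross_correlation=True)   # cross correlation metric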
matrixprofile.algorithms.skimp¶
matrixprofile.algorithms.skimp(ts, windows=None, show_progress=False, cross_correlation=False, pmp_obj=None, sample_pct=0.1, n_jobs=1)[source]¶
Computes the Pan Matrix Profile (PMP) for the given time series. When only the time series is passed, windows start from 8 and increase by increments of 2 up to length(ts) / 2. Also, the PMP is only computed using 10% of the windows unless sample_pct is set to a different value.
Note
When windows is explicitly provided, sample_pct no longer takes effect. The MP for all windows provided will be computed.
- Parameters
ts (array_like) – The time series.
show_progress (bool, default = False) – Show the progress in percent complete in increments of 5% by printing it out to the console.
cross_correlation (bool, default = False) – Return the MP values as Pearson Correlation instead of Euclidean distance.
pmp_obj (dict, default = None) – Repurpose already computed window sizes with this provided PMP. It should be the output of a PMP algorithm such as skimp or maximum subsequence.
sample_pct (float, default = 0.1 (10%)) – Number of window sizes to compute MPs for. Decimal percent between 0 and 1.
n_jobs (int, Default = 1) – Number of cpu cores to use.
- Returns
dict – A Pan-MatrixProfile data structure.
>>> {
>>>     'pmp': the pan matrix profile as a 2D array,
>>>     'pmpi': the pmp indices,
>>>     'data': {
>>>         'ts': time series used,
>>>     },
>>>     'windows': the windows used to compute the pmp,
>>>     'sample_pct': the sample percent used,
>>>     'metric': The distance metric computed for the pmp,
>>>     'algorithm': the algorithm used,
>>>     'class': PMP
>>> }
- Return type
profile
- Raises
ValueError – If ts is not array_like. If windows is not an iterable. If show_progress is not a boolean. If cross_correlation is not a boolean. If sample_pct is not between 0 and 1.
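Example (an illustrative sketch; the window list is an arbitrary choice):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> pmp = algorithms.skimp(ts, windows=[8, 16, 32, 64])   # sample_pct is ignored when windows is given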
matrixprofile.algorithms.mass2¶
matrixprofile.algorithms.mass2(ts, query, extras=False, threshold=1e-10)[source]¶
Compute the distance profile for the given query over the given time series.
- Parameters
ts (array_like) – The time series to search.
query (array_like) – The query.
extras (boolean, default False) – Optionally return additional data used to compute the matrix profile.
- Returns
np.array or dict – An array of distances (np.array) or, when extras is True, a dict.
With extras:
>>> {
>>>     'distance_profile': The distance profile,
>>>     'product': The FFT product between ts and query,
>>>     'data_mean': The moving average of the ts over len(query),
>>>     'query_mean': The mean of the query,
>>>     'data_std': The moving std. of the ts over len(query),
>>>     'query_std': The std. of the query
>>> }
- Return type
distance_profile
- Raises
ValueError – If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional.
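Example (an illustrative sketch; the query is simply a slice of the synthetic series):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> query = ts[100:132]
>>> dp = algorithms.mass2(ts, query)                  # np.array of distances
>>> extra = algorithms.mass2(ts, query, extras=True)  # dict with intermediate values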
matrixprofile.algorithms.mpdist¶
matrixprofile.algorithms.mpdist(ts, ts_b, w, threshold=0.05, n_jobs=1)[source]¶
Computes the MPDist between the two series ts and ts_b. For more details refer to the paper:
Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios. Shaghayegh Gharghabi, Shima Imani, Anthony Bagnall, Amirali Darvishzadeh, Eamonn Keogh. ICDM 2018
- Parameters
ts (array_like) – The time series to compute the matrix profile for.
ts_b (array_like) – The time series to compare against.
w (int) – The window size.
threshold (float, Default 0.05) – The percentile in which the distance is taken from. By default it is set to 0.05 based on empirical research results from the paper. Generally, you should not change this unless you know what you are doing! This value must be a float greater than 0 and less than 1.
n_jobs (int, Default = 1) – Number of cpu cores to use.
- Returns
float – The MPDist.
- Return type
mpdist
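Example (an illustrative sketch with two synthetic series of different lengths):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts_a = np.random.uniform(size=1000)
>>> ts_b = np.random.uniform(size=800)
>>> dist = algorithms.mpdist(ts_a, ts_b, w=32)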
matrixprofile.algorithms.pairwise_dist¶
matrixprofile.algorithms.pairwise_dist(X, window_size, threshold=0.05, n_jobs=1)[source]¶
Utility function to compute all pairwise distances between the timeseries using MPDist.
Note
scipy.spatial.distance.pdist cannot be used because it does not allow for jagged arrays; however, its code was used as a reference in creating this function. https://github.com/scipy/scipy/blob/master/scipy/spatial/distance.py#L2039
- Parameters
X (array_like) – An array_like object containing time series to compute distances for.
window_size (int) – The window size to use in computing the MPDist.
threshold (float) – The threshold used to compute MPDist.
n_jobs (int) – Number of CPU cores to use during computation.
- Returns
Y – Returns a condensed distance matrix Y. For each i and j (where i < j < m, and m is the number of original observations), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.
- Return type
np.ndarray
matrixprofile.algorithms.maximum_subsequence¶
matrixprofile.algorithms.maximum_subsequence(ts, threshold=0.95, refine_stepsize=0.05, n_jobs=1, include_pmp=False, lower_window=8)[source]¶
Finds the maximum subsequence length based on the threshold provided. Note that this threshold is domain specific, requiring some knowledge about the underlying time series in question.
The subsequence length starts at 8 and iteratively doubles until the maximum correlation coefficient is no longer met.
- Parameters
ts (array_like) – The time series to analyze.
threshold (float, Default 0.95) – The correlation coefficient used as the threshold. It should be between 0 and 1.
refine_stepsize (float, Default 0.05) – Used in the refinement step to find a more precise upper window. It should be a percentage between 0.01 and 0.99.
n_jobs (int, Default = 1) – Number of cpu cores to use.
include_pmp (bool, default False) – Include the PanMatrixProfile for the computed windows.
lower_window (int, default 8) – Lower bound of subsequence length that can be altered if required.
- Returns
With include_pmp=False (default), int – the maximum subsequence length based on the threshold provided.
With include_pmp=True, dict – a dict containing the upper window, windows and pmp.
>>> {
>>>     'upper_window': The upper window,
>>>     'windows': array_like windows used to compute the pmp,
>>>     'pmp': the pan matrix profile as a 2D array,
>>>     'pmpi': the pmp indices,
>>> }
- Return type
obj
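Example (an illustrative sketch with synthetic data):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> upper = algorithms.maximum_subsequence(ts)                     # int
>>> result = algorithms.maximum_subsequence(ts, include_pmp=True)  # dict with 'upper_window', 'windows', 'pmp', 'pmpi'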
matrixprofile.algorithms.prescrimp¶
matrixprofile.algorithms.prescrimp(ts, window_size, query=None, step_size=0.25, sample_pct=0.1, random_state=None, n_jobs=1)[source]¶
This is the PreSCRIMP algorithm from the SCRIMP++ paper. It is primarily used to compute the approximate matrix profile. In this case we use a sample percentage to mock “the anytime/approximate nature”.
- Parameters
ts (np.ndarray) – The time series to compute the matrix profile for.
window_size (int) – The window size.
query (array_like) – Optionally, a query can be provided to perform a similarity join.
step_size (float, default 0.25) – The sampling interval for the window. The paper suggests 0.25 is the most practical. It should be a float value between 0 and 1.
sample_pct (float, default = 0.1 (10%)) – Number of samples to compute distances for in the MP.
random_state (int, default None) – Set the random seed generator for reproducible results.
n_jobs (int, Default = 1) – Number of cpu cores to use.
Note
The matrix profiles computed from prescrimp will always be the approximate solution.
- Returns
dict – A MatrixProfile data structure.
>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "prescrimp"
>>> }
- Return type
profile
- Raises
ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional. If sample_pct is not between 0 and 1.
matrixprofile.algorithms.scrimp_plus_plus¶
matrixprofile.algorithms.scrimp_plus_plus(ts, window_size, query=None, step_size=0.25, sample_pct=0.1, random_state=None, n_jobs=1)[source]¶
SCRIMP++ is an anytime algorithm that computes the matrix profile for a given time series (ts) over a given window size (m). Essentially, it allows for an approximate solution to be provided for quicker analysis. In the case of this implementation, sample percentage is used. An approximate solution is given based on a sample percentage from 0 to 1. The default sample percentage is currently 10%.
This algorithm was created at the University of California Riverside. For further academic understanding, please review this paper:
Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speed. Yan Zhu, Chin-Chia Michael Yeh, Zachary Zimmerman, Kaveh Kamgar, Eamonn Keogh, ICDM 2018.
https://www.cs.ucr.edu/~eamonn/SCRIMP_ICDM_camera_ready_updated.pdf
- Parameters
ts (np.ndarray) – The time series to compute the matrix profile for.
window_size (int) – The window size.
query (array_like) – Optionally, a query can be provided to perform a similarity join.
step_size (float, default 0.25) – The sampling interval for the window. The paper suggests 0.25 is the most practical. It should be a float value between 0 and 1.
sample_pct (float, default = 0.1 (10%)) – Number of samples to compute distances for in the MP.
random_state (int, default None) – Set the random seed generator for reproducible results.
n_jobs (int, Default = 1) – Number of cpu cores to use.
- Returns
dict – A MatrixProfile data structure.
>>> {
>>>     'mp': The matrix profile,
>>>     'pi': The matrix profile 1NN indices,
>>>     'rmp': The right matrix profile,
>>>     'rpi': The right matrix profile 1NN indices,
>>>     'lmp': The left matrix profile,
>>>     'lpi': The left matrix profile 1NN indices,
>>>     'metric': The distance metric computed for the mp,
>>>     'w': The window size used to compute the matrix profile,
>>>     'ez': The exclusion zone used,
>>>     'join': Flag indicating if a similarity join was computed,
>>>     'sample_pct': Percentage of samples used in computing the MP,
>>>     'data': {
>>>         'ts': Time series data,
>>>         'query': Query data if supplied
>>>     }
>>>     'class': "MatrixProfile"
>>>     'algorithm': "scrimp++"
>>> }
- Return type
profile
- Raises
ValueError – If window_size < 4. If window_size > query length / 2. If ts is not a list or np.array. If query is not a list or np.array. If ts or query is not one dimensional. If sample_pct is not between 0 and 1.
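Example (an illustrative sketch; the sample_pct and random_state values are arbitrary):
>>> import numpy as np
>>> from matrixprofile import algorithms
>>> ts = np.random.uniform(size=1000)
>>> profile = algorithms.scrimp_plus_plus(ts, window_size=32, sample_pct=0.25, random_state=0)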
matrixprofile.transform.apply_av¶
matrixprofile.transform.apply_av(profile, av='default', custom_av=None)[source]¶
Utility function that returns a MatrixProfile data structure with a calculated annotation vector that has been applied to correct the matrix profile.
- Parameters
profile (dict) – A MatrixProfile structure.
av (str, Default = "default") – The type of annotation vector to apply.
custom_av (array_like, Default = None) – Custom annotation vector (will only be applied if av is “custom”).
- Returns
dict – A MatrixProfile data structure with a calculated annotation vector and a corrected matrix profile.
- Return type
profile
- Raises
ValueError – If profile is not a MatrixProfile data structure. If custom_av parameter is not array-like when using a custom av. If av parameter is invalid. If lengths of annotation vector and matrix profile are different. If values in annotation vector are outside [0.0, 1.0].
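Example (an illustrative sketch; only the documented 'default' and 'custom' av values are used, and the annotation vector comes from one of the make_*_av helpers below):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import transform
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> profile = transform.apply_av(profile)                             # av='default'
>>> av = transform.make_complexity_av(ts, window=32)
>>> profile = transform.apply_av(profile, av='custom', custom_av=av)  # custom annotation vector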
matrixprofile.transform.make_default_av¶
matrixprofile.transform.make_default_av(ts, window)[source]¶
Utility function that returns an annotation vector filled with 1s (should not change the matrix profile).
- Parameters
ts (array_like) – The time series.
window (int) – The specific window size used to compute the MatrixProfile.
- Returns
np.array – An annotation vector.
- Return type
av
- Raises
ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
matrixprofile.transform.make_complexity_av¶
matrixprofile.transform.make_complexity_av(ts, window)[source]¶
Utility function that returns an annotation vector where values are based on the complexity estimation of the signal.
- Parameters
ts (array_like) – The time series.
window (int) – The specific window size used to compute the MatrixProfile.
- Returns
np.array – An annotation vector.
- Return type
av
- Raises
ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
matrixprofile.transform.make_meanstd_av¶
matrixprofile.transform.make_meanstd_av(ts, window)[source]¶
Utility function that returns an annotation vector where values are set to 1 if the standard deviation is less than the mean of standard deviation. Otherwise, the values are set to 0.
- Parameters
ts (array_like) – The time series.
window (int) – The specific window size used to compute the MatrixProfile.
- Returns
np.array – An annotation vector.
- Return type
av
- Raises
ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
matrixprofile.transform.make_clipping_av¶
matrixprofile.transform.make_clipping_av(ts, window)[source]¶
Utility function that returns an annotation vector such that subsequences that have more clipping have less importance.
- Parameters
ts (array_like) – The time series.
window (int) – The specific window size used to compute the MatrixProfile.
- Returns
np.array – An annotation vector.
- Return type
av
- Raises
ValueError – If ts is not a list or np.array. If ts is not one-dimensional. If window is not an integer.
matrixprofile.utils.empty_mp¶
matrixprofile.utils.pick_mp¶
matrixprofile.utils.pick_mp(profile, window)[source]¶
Utility function that extracts a MatrixProfile from a Pan-MatrixProfile, placing it into the MatrixProfile data structure.
- Parameters
profile (dict) – A Pan-MatrixProfile data structure.
window (int) – The specific window size used to compute the desired MatrixProfile.
- Returns
dict – A MatrixProfile data structure.
- Return type
profile
- Raises
ValueError – If profile is not a Pan-MatrixProfile data structure. If window is not an integer. If desired MatrixProfile is not found based on window.
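Example (an illustrative sketch; assumes a PMP was computed over several arbitrary windows):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import utils
>>> ts = np.random.uniform(size=1000)
>>> pmp = matrixprofile.compute(ts, windows=[16, 32, 64])
>>> mp_32 = utils.pick_mp(pmp, 32)   # MatrixProfile for window size 32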
matrixprofile.io.to_disk¶
matrixprofile.io.to_disk(profile, file_path, format='json')[source]¶
Writes a profile object of type MatrixProfile or PMP to disk as a JSON formatted file by default.
Note
The JSON format is human readable whereas the mpf format is binary and cannot be read when opened in a text editor. When the file path does not include the extension, it is appended for you.
- Parameters
profile (dict_like) – A MatrixProfile or Pan-MatrixProfile data structure.
file_path (str) – The path to write the file to.
format (str, default json) – The format of the file to be written. Options include json, mpf
matrixprofile.io.from_disk¶
matrixprofile.io.from_disk(file_path, format='infer')[source]¶
Reads a profile object of type MatrixProfile or PMP from disk into the respective object type. By default the type is inferred by the file extension.
- Parameters
file_path (str) – The path to read the file from.
format (str, default infer) – The file format type to read from disk. Options include: infer, json, mpf
- Returns
profile – A MatrixProfile or Pan-MatrixProfile data structure.
- Return type
dict_like, None
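Example (an illustrative round-trip sketch; the file name is a placeholder):
>>> import numpy as np
>>> import matrixprofile
>>> from matrixprofile import io
>>> ts = np.random.uniform(size=1000)
>>> profile = matrixprofile.compute(ts, windows=32)
>>> io.to_disk(profile, 'my_profile', format='mpf')   # extension appended when missing
>>> restored = io.from_disk('my_profile.mpf')         # format inferred from the extension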
matrixprofile.io.to_json¶
matrixprofile.io.from_json¶
matrixprofile.io.to_mpf¶
matrixprofile.io.from_mpf¶
matrixprofile.datasets.fetch_available¶
matrixprofile.datasets.fetch_available(category=None)[source]¶
Fetches the available datasets found in the github.com/matrix-profile-foundation/mpf-datasets GitHub repository. Providing a category filters the datasets.
- Parameters
category (str, Optional) – The desired category to retrieve datasets by.
- Returns
A list of dictionaries containing details about each dataset.
- Return type
list
- Raises
ValueError: – When a category is provided, but is not found in the listing.
matrixprofile.datasets.load¶
matrixprofile.datasets.load(name)[source]¶
Loads an MPF dataset by base file name or file name. The match is case insensitive.
Note
An internet connection is required to fetch the data.
- Returns
The dataset and metadata.
>>> {
>>>     'name': The file name loaded,
>>>     'category': The category the file came from,
>>>     'description': A description,
>>>     'data': The real valued data as an np.ndarray,
>>>     'datetime': The datetime as an np.ndarray
>>> }
- Return type
dict
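Example (an illustrative sketch; the dataset name below is a placeholder, use fetch_available to list real names, and an internet connection is required):
>>> from matrixprofile import datasets
>>> listing = datasets.fetch_available()         # list of dicts describing available datasets
>>> data = datasets.load('some-dataset-name')    # placeholder name; pick one from the listing
>>> ts = data['data']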