# Time-Series Synthesizer

`ydata.synthesizers.TimeSeriesSynthesizer`

Bases: `BaseModel`
Unlike the RegularSynthesizer, the TimeSeriesSynthesizer is designed to capture
and replicate temporal relationships within entities over time. It learns from
sequential patterns in the data and generates synthetic time-series records that
preserve trends, seasonality, and correlations per entity.
Additionally, this synthesizer can augment datasets by increasing the number of unique entities while maintaining realistic temporal behavior.
## Key Features

- **Time-Aware Training (`fit`)**: Learns entity-level sequential dependencies and trends over time.
- **Pattern-Preserving Sampling (`sample`)**: Generates synthetic time-series data that mimics real-world time progression.
- **Entity Augmentation**: Expands the dataset by generating additional synthetic entities with realistic time patterns.
- **Time Window Processing**: Operates on an N-entity time window to model time dependencies effectively.
- **Model Persistence (`save`/`load`)**: Store and restore trained synthesizers for future use.
To define a single-entity time series, the following `Metadata` configuration is required:

```python
dataset_attrs = {
    "sortbykey": "date",
}

metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
```

For a multi-entity time series, the entity columns must also be specified:

```python
dataset_attrs = {
    "sortbykey": "date",
    "entities": ["entity", "entity_2"],
}

metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
```
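For orientation, a multi-entity frame matching the second configuration might look like the toy frame below. The column names and values are illustrative assumptions, not data from the SDK:

```python
# Toy illustration of the shape a multi-entity configuration describes:
# rows are ordered by the sort key, and each (entity, entity_2) pair
# identifies one series.
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=4).repeat(2),  # sortbykey column
    "entity": ["A", "B"] * 4,      # first entity key
    "entity_2": ["x", "y"] * 4,    # second entity key
    "value": [1.0, 2.0, 1.5, 2.5, 1.7, 2.6, 1.9, 2.8],
})
```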
## Usage Example

```python
from ydata.synthesizers import TimeSeriesSynthesizer

# Step 1: Train the model with time-series data
synth = TimeSeriesSynthesizer()
synth.fit(data, metadata)

# Step 2: Generate synthetic time-series data
synthetic_data = synth.sample(n_entities=10)

# Step 3: Save the trained model
synth.save("timeseries_model.pkl")

# Step 4: Load the trained model later
loaded_synth = TimeSeriesSynthesizer.load("timeseries_model.pkl")
```
```python
fit(X, metadata, extracted_cols=None, calculated_features=None, anonymize=None, privacy_level=PrivacyLevel.HIGH_FIDELITY, condition_on=None, anonymize_ids=False, segment_by='auto', random_state=None)
```
Train the TimeSeriesSynthesizer on real time-series data.
This method learns patterns, dependencies, and sequential behaviors from the input dataset (X)
while preserving the relationships between entities over time. The synthesizer processes time-dependent
features and constructs a generative model capable of producing realistic time-series data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Dataset` | Input dataset. | required |
| `metadata` | `Metadata` | Metadata instance. | required |
| `extracted_cols` | `list[str]` | List of columns to extract data from. | `None` |
| `calculated_features` | `list[dict[str, str \| ...]]` | Defines additional business rules to be ensured for the synthetic generated dataset. | `None` |
| `anonymize` | `Optional[dict \| AnonymizerConfigurationBuilder]` | Specifies anonymization strategies for sensitive fields while leveraging ydata's AnonymizerEngine. | `None` |
| `privacy_level` | `str \| PrivacyLevel` | Defines the trade-off between privacy and data fidelity. | `HIGH_FIDELITY` |
| `condition_on` | `Union[str, list[str]]` | Enables conditional data generation by specifying key features to condition the model on. | `None` |
| `anonymize_ids` | `bool` | If `True`, columns identified as IDs are anonymized. | `False` |
| `segment_by` | `str \| list \| 'auto'` | Defines how data should be segmented while training, based on a column or an automated decision. | `'auto'` |
| `random_state` | `Optional` | Set a seed for reproducibility. If `None`, the seed is not fixed and results may vary between runs. | `None` |
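As a sketch of how these parameters might be combined, the keyword arguments could be assembled as below. The values are illustrative assumptions, not recommendations from the SDK:

```python
# Illustrative fit() arguments only; `data` and `metadata` would come from
# the Metadata configuration shown earlier. All values are assumptions.
fit_kwargs = {
    "privacy_level": "HIGH_FIDELITY",  # trade-off between privacy and fidelity
    "segment_by": "auto",              # let the synthesizer decide segmentation
    "anonymize_ids": False,            # keep ID columns as-is
    "random_state": 42,                # fix the seed for reproducibility
}

# synth.fit(data, metadata, **fit_kwargs)  # requires the ydata SDK and a Dataset
```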
```python
sample(n_entities=None, smoothing=False, fidelity=None, sort_result=True, condition_on=None, balancing=False, random_state=None, connector=None, **kwargs)
```
Generate a time series.

This method generates a new time series. The instance should be trained via the `fit` method before calling `sample`. The generated time series has the same length as the training data. However, in the case of a multi-entity time series, it is possible to augment the number of entities by specifying the parameter `n_entities`.

For a multi-entity sample, two major arguments can be used to modify the results: `fidelity` and `smoothing`.

- **Fidelity**: Defines how close the new entities should be to the original ones. When a `float`, it represents the behavioral noise to be added to the entity, expressed as a percentage of its variance. See `ydata.synthesizer.entity_augmenter.FidelityConfig` for more details.
- **Smoothing**: Defines if and how the new entities' trajectories should be smoothed. See `ydata.synthesizer.entity_augmenter.SmoothingConfig` for more details.
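The fidelity and smoothing mechanics can be pictured with plain NumPy. This is a conceptual sketch under stated assumptions, not the SDK's implementation: noise whose variance is a fraction `fidelity` of the trajectory's variance is added, and smoothing is modeled as a simple moving average.

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.cumsum(rng.normal(size=100))  # one toy entity trajectory

def augment(traj, fidelity, rng):
    """Add behavioral noise with variance = fidelity * var(traj) (conceptual)."""
    noise = rng.normal(scale=np.sqrt(fidelity * traj.var()), size=traj.shape)
    return traj + noise

def smooth(traj, window=5):
    """Moving-average smoothing of a trajectory (conceptual)."""
    kernel = np.ones(window) / window
    return np.convolve(traj, kernel, mode="same")

new_entity = smooth(augment(original, fidelity=0.1, rng=rng))
```

A higher `fidelity` value injects proportionally more noise, so the augmented entity drifts further from the original trajectory.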
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_entities` | `Optional[int]` | Number of entities to sample. If `None`, generates as many entities as in the training data. | `None` |
| `smoothing` | `Union[bool, dict, SmoothingConfig]` | Defines how the smoothing should be done. | `False` |
| `fidelity` | `Optional[Union[float, dict, FidelityConfig]]` | Defines the fidelity policy. | `None` |
| `sort_result` | `bool` | `True` if the sample should be sorted by `sortbykey`, `False` otherwise. | `True` |
| `condition_on` | `list[ConditionalFeature] \| dict \| DataFrame \| None` | Conditional rules to be applied. | `None` |
| `balancing` | `bool` | If `True`, the categorical features included in the conditional rules have equally distributed percentages. | `False` |
Returns:

| Name | Type | Description |
|---|---|---|
| `Dataset` | `Dataset` | The generated synthetic time-series dataset. |
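An entity-augmentation call could be sketched as below. The keyword values are illustrative assumptions, not recommended defaults:

```python
# Illustrative sample() arguments for entity augmentation; all values
# are assumptions chosen for the example.
sample_kwargs = {
    "n_entities": 20,     # generate more entities than were trained on
    "fidelity": 0.2,      # behavioral noise as a fraction of entity variance
    "smoothing": True,    # smooth the augmented trajectories
    "sort_result": True,  # return rows ordered by the sortbykey column
}

# synthetic = synth.sample(**sample_kwargs)  # requires a fitted synthesizer
```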