Time-Series Synthesizer
ydata.synthesizers.TimeSeriesSynthesizer
Bases: BaseModel
Unlike the RegularSynthesizer, the TimeSeriesSynthesizer is designed to capture and replicate temporal relationships within entities over time. It learns sequential patterns in the data and generates synthetic time-series records that preserve trends, seasonality, and per-entity correlations.
Additionally, this synthesizer can augment datasets by increasing the number of unique entities while maintaining realistic temporal behavior.
Key Features
- Time-Aware Training (`fit`): Learns entity-level sequential dependencies and trends over time.
- Pattern-Preserving Sampling (`sample`): Generates synthetic time-series data that mimics real-world time progression.
- Entity Augmentation: Expands the dataset by generating additional synthetic entities with realistic time patterns.
- Time Window Processing: Operates on an N-entity time window to model time dependencies effectively.
- Model Persistence (`save` & `load`): Store and restore trained synthesizers for future use.
To define a single-entity time series, the following Metadata configuration is required:

```python
dataset_attrs = {
    "sortbykey": "date",
}

metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
```

For a multi-entity time series, the entity columns must also be specified:

```python
dataset_attrs = {
    "sortbykey": "date",
    "entities": ['entity', 'entity_2']
}

metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
```
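To illustrate what these attributes describe (plain Python, not ydata code): a multi-entity time series is a table whose rows are grouped by the entity columns and ordered within each group by the `sortbykey` column. A minimal sketch with toy data:

```python
from itertools import groupby

# Toy rows: (entity, date, value) -- a small multi-entity time series.
rows = [
    ("A", "2024-01-02", 10.0),
    ("B", "2024-01-01", 5.0),
    ("A", "2024-01-01", 9.0),
    ("B", "2024-01-02", 6.0),
]

# "entities" -> group rows per entity; "sortbykey" -> order each
# entity's rows by its time column before any sequence modeling.
rows.sort(key=lambda r: (r[0], r[1]))
series_per_entity = {
    entity: [(date, value) for _, date, value in group]
    for entity, group in groupby(rows, key=lambda r: r[0])
}
```

Each value in `series_per_entity` is one entity's ordered trajectory, which is the unit the synthesizer learns from.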
Usage Example
```python
from ydata.synthesizers import TimeSeriesSynthesizer

# Step 1: Train the model with time-series data
synth = TimeSeriesSynthesizer()
synth.fit(data, metadata)

# Step 2: Generate synthetic time-series data
synthetic_data = synth.sample(n_entities=10)

# Step 3: Save the trained model
synth.save("timeseries_model.pkl")

# Step 4: Load the trained model later
loaded_synth = TimeSeriesSynthesizer.load("timeseries_model.pkl")
```
fit(X, metadata, extracted_cols=None, calculated_features=None, anonymize=None, privacy_level=PrivacyLevel.HIGH_FIDELITY, condition_on=None, anonymize_ids=False, segment_by='auto', random_state=None)

Train the TimeSeriesSynthesizer on real time-series data.

This method learns patterns, dependencies, and sequential behaviors from the input dataset (X) while preserving the relationships between entities over time. The synthesizer processes time-dependent features and constructs a generative model capable of producing realistic time-series data.
Parameters:

Name | Type | Description | Default
---|---|---|---
`X` | `Dataset` | Input dataset. | *required*
`metadata` | `Metadata` | Metadata instance. | *required*
`extracted_cols` | `list[str]` | List of columns to extract data from. | `None`
`calculated_features` | `list[dict[str, str \| ...]]` | Defines additional business rules to be ensured for the synthetic generated dataset. | `None`
`anonymize` | `Optional[dict \| AnonymizerConfigurationBuilder]` | Specifies anonymization strategies for sensitive fields while leveraging ydata's AnonymizerEngine. | `None`
`privacy_level` | `str \| PrivacyLevel` | Defines the trade-off between privacy and data fidelity. | `HIGH_FIDELITY`
`condition_on` | `Union[str, list[str]]` | Enables conditional data generation by specifying key features to condition the model on. | `None`
`anonymize_ids` | `bool` | If True, ID columns are anonymized automatically. | `False`
`segment_by` | `str \| list \| 'auto'` | Defines how data should be segmented during training, based on a column or an automated decision. | `'auto'`
`random_state` | `Optional` | Set a seed for reproducibility. If None, no fixed seed is set. | `None`
sample(n_entities=None, smoothing=False, fidelity=None, sort_result=True, condition_on=None, balancing=False, random_state=None, connector=None, **kwargs)

Generate a time series.

This method generates a new time series. The instance should be trained via the method fit before calling sample.

The generated time series has the same length as the training data. However, in the case of a multi-entity time series, it is possible to augment the number of entities by specifying the parameter n_entities.

For a multi-entity sample, two major arguments can be used to modify the results: fidelity and smoothing.

- Fidelity: Defines how close the new entities should be to the original ones. When a float, it represents the behavioral noise to be added to the entity, expressed as a percentage of its variance. See ydata.synthesizer.entity_augmenter.FidelityConfig for more details.
- Smoothing: Defines if and how the trajectories of the new entities should be smoothed. See ydata.synthesizer.entity_augmenter.SmoothingConfig for more details.
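To make the two knobs concrete, here is a standalone sketch (plain Python, not the FidelityConfig/SmoothingConfig implementations) of noise scaled by a fraction of a series' spread, and a simple moving-average smoother:

```python
import random
import statistics

def add_fidelity_noise(series, fidelity, seed=0):
    """Illustrative only: a lower-fidelity entity gets more Gaussian
    noise, scaled as a fraction of the series' standard deviation."""
    rng = random.Random(seed)
    sigma = statistics.pstdev(series) * fidelity
    return [x + rng.gauss(0.0, sigma) for x in series]

def moving_average(series, window=3):
    """Simple trailing moving average as a stand-in for smoothing."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

original = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy = add_fidelity_noise(original, fidelity=0.1)
smoothed = moving_average(noisy)
```

A fidelity of 0 would reproduce the original trajectory exactly; larger values drift further from it, and smoothing then irons out the injected jitter.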
Parameters:

Name | Type | Description | Default
---|---|---|---
`n_entities` | `Optional[int]` | Number of entities to sample. If None, generates as many entities as in the training data. | `None`
`smoothing` | `Union[bool, dict, SmoothingConfig]` | Defines how the smoothing should be done. | `False`
`fidelity` | `Optional[Union[float, dict, FidelityConfig]]` | Defines the fidelity policy. | `None`
`sort_result` | `bool` | True if the sample should be sorted by sortbykey, False otherwise. | `True`
`condition_on` | `list[ConditionalFeature] \| dict \| DataFrame \| None` | Conditional rules to be applied. | `None`
`balancing` | `bool` | If True, the categorical features included in the conditional rules have equally distributed percentages. | `False`
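As an illustration of what `balancing` means (a sketch, not the library's sampler): when conditioning on a categorical feature, balanced sampling draws the conditioning values so each category appears an equal share of the time, regardless of its frequency in the training data:

```python
import random
from collections import Counter

def balanced_conditions(categories, n, seed=0):
    """Draw n conditioning values so each category appears an
    (almost) equal number of times, then shuffle their order."""
    rng = random.Random(seed)
    draws = [categories[i % len(categories)] for i in range(n)]
    rng.shuffle(draws)
    return draws

# Hypothetical categorical feature with three values.
conds = balanced_conditions(["card", "cash", "wire"], n=9)
counts = Counter(conds)
```

Without balancing, conditioning values would instead follow the empirical category distribution of the training data.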
Returns:

Name | Type | Description
---|---|---
Dataset | `Dataset` | The generated synthetic time-series dataset.