
Time-Series Synthesizer

ydata.synthesizers.TimeSeriesSynthesizer

Bases: BaseModel

Unlike the RegularSynthesizer, the TimeSeriesSynthesizer is designed to capture and replicate temporal relationships within entities over time. It learns from sequential patterns in the data and generates synthetic time-series records that preserve trends, seasonality, and correlations per entity.

Additionally, this synthesizer can augment datasets by increasing the number of unique entities while maintaining realistic temporal behavior.

Key Features
  • Time-Aware Training (fit): Learns entity-level sequential dependencies and trends over time.
  • Pattern-Preserving Sampling (sample): Generates synthetic time-series data that mimics real-world time progression.
  • Entity Augmentation: Expands the dataset by generating additional synthetic entities with realistic time patterns.
  • Time Window Processing: Operates on an N-entity time window to model time dependencies effectively.
  • Model Persistence (save & load): Store and restore trained synthesizers for future use.

To define a single-entity time series, the following Metadata configuration is required:

    dataset_attrs = {
        "sortbykey": "date",
    }

    metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
A multi-entity time series additionally requires the metadata dataset attributes to specify at least one column as an entity ID. For instance, the following example specifies two entity ID columns:

    dataset_attrs = {
        "sortbykey": "date",
        "entities": ["entity", "entity_2"],
    }

    metadata = Metadata(dataset, dataset_type=DatasetType.TIMESERIES, dataset_attrs=dataset_attrs)
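The dataset behind such a configuration must contain the sort-key column and every entity ID column named in dataset_attrs. A minimal sketch of a matching table, assuming a pandas DataFrame as the underlying data and purely illustrative column names:

```python
import pandas as pd

# Two entities (keyed by the pair entity/entity_2), each observed on the same dates.
data = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
    "entity": ["A", "A", "B", "B"],
    "entity_2": ["x", "x", "y", "y"],
    "value": [1.0, 1.5, 2.0, 2.4],
})

dataset_attrs = {
    "sortbykey": "date",
    "entities": ["entity", "entity_2"],
}

# Every column referenced in dataset_attrs must exist in the data.
referenced = {dataset_attrs["sortbykey"], *dataset_attrs["entities"]}
assert referenced <= set(data.columns)
```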

Usage Example
from ydata.synthesizers import TimeSeriesSynthesizer

# Step 1: Train the model with time-series data
synth = TimeSeriesSynthesizer()
synth.fit(data, metadata)

# Step 2: Generate synthetic time-series data
synthetic_data = synth.sample(n_entities=10)

# Step 3: Save the trained model
synth.save("timeseries_model.pkl")

# Step 4: Load the trained model later
loaded_synth = TimeSeriesSynthesizer.load("timeseries_model.pkl")

fit(X, metadata, extracted_cols=None, calculated_features=None, anonymize=None, privacy_level=PrivacyLevel.HIGH_FIDELITY, condition_on=None, anonymize_ids=False, segment_by='auto', random_state=None)

Train the TimeSeriesSynthesizer on real time-series data.

This method learns patterns, dependencies, and sequential behaviors from the input dataset (X) while preserving the relationships between entities over time. The synthesizer processes time-dependent features and constructs a generative model capable of producing realistic time-series data.

Parameters:

  • X (Dataset, required): Input dataset.
  • metadata (Metadata, required): Metadata instance.
  • extracted_cols (list[str], default None): List of columns to extract data from.
  • calculated_features (list[dict[str, str | …]], default None): Additional business rules to be enforced in the generated synthetic dataset.
  • anonymize (Optional[dict | AnonymizerConfigurationBuilder], default None): Anonymization strategies for sensitive fields, leveraging ydata's AnonymizerEngine.
  • privacy_level (str | PrivacyLevel, default HIGH_FIDELITY): Trade-off between privacy and data fidelity. Options: "HIGH_FIDELITY", "BALANCED_PRIVACY_FIDELITY", "HIGH_PRIVACY".
  • condition_on (Union[str, list[str]], default None): Enables conditional data generation by specifying the key features to condition the model on.
  • anonymize_ids (bool, default False): If True, automatically anonymizes columns of type ID.
  • segment_by (str | list | "auto", default "auto"): Defines how the data is segmented during training, based on a column or an automated decision.
  • random_state (Optional, default None): Seed for reproducibility. If None, randomness is used.

sample(n_entities=None, smoothing=False, fidelity=None, sort_result=True, condition_on=None, balancing=False, random_state=None, connector=None, **kwargs)

Generate a time series.

This method generates a new time series. The instance must be trained via the fit method before sample is called. The generated time series has the same length as the training data; for multi-entity time series, however, the number of entities can be increased via the n_entities parameter.

For a multi-entity sample, there are two major arguments that can be used to modify the results: fidelity and smoothing.

  1. Fidelity: Defines how close the new entities should be to the original ones. When given as a float, it represents the behavioral noise added to each entity, expressed as a percentage of its variance. See ydata.synthesizer.entity_augmenter.FidelityConfig for more details.
  2. Smoothing: Defines whether and how the trajectories of the new entities are smoothed. See ydata.synthesizer.entity_augmenter.SmoothingConfig for more details.
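The two knobs can be illustrated with plain NumPy. This is only a conceptual sketch of the documented behavior (noise proportional to an entity's variance, followed by optional smoothing of the trajectory), not the library's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
original = np.cumsum(rng.normal(size=200))  # one entity's trajectory

# Fidelity as a float: behavioral noise expressed as a fraction of the
# entity's variance (a smaller value keeps entities closer to the original).
fidelity = 0.1
noise = rng.normal(scale=np.sqrt(fidelity * original.var()), size=original.shape)
new_entity = original + noise

# Smoothing: damp the new trajectory, here with a simple moving average.
window = 5
kernel = np.ones(window) / window
smoothed = np.convolve(new_entity, kernel, mode="same")
```

Passing a dict or a FidelityConfig/SmoothingConfig instance instead of a float/bool allows finer control over these behaviors.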

Parameters:

  • n_entities (Optional[int], default None): Number of entities to sample. If None, generates as many entities as in the training data.
  • smoothing (Union[bool, dict, SmoothingConfig], default False): Defines whether and how the new trajectories are smoothed. True uses the automatic configuration.
  • fidelity (Optional[Union[float, dict, FidelityConfig]], default None): Defines the fidelity policy.
  • sort_result (bool, default True): If True, the sample is sorted by the sortbykey column.
  • condition_on (list[ConditionalFeature] | dict | DataFrame | None, default None): Conditional rules to be applied.
  • balancing (bool, default False): If True, the categorical features included in the conditional rules are generated with equally distributed frequencies.
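The effect of balancing can be sketched in plain Python: instead of drawing a conditioned categorical feature according to its skewed empirical distribution, each category is drawn with equal probability. A conceptual sketch only, with hypothetical category names, not ydata's implementation:

```python
import random
from collections import Counter

random.seed(0)
categories = ["churned", "active"]  # a hypothetical conditioned feature

# balancing=False: sample according to the (skewed) training distribution.
training_dist = {"churned": 0.1, "active": 0.9}
skewed = random.choices(
    categories, weights=[training_dist[c] for c in categories], k=10_000
)

# balancing=True: each category appears with (roughly) equal frequency.
balanced = random.choices(categories, k=10_000)

print(Counter(skewed))    # heavily tilted toward "active"
print(Counter(balanced))  # roughly 50/50
```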

Returns:

  • Dataset: The generated synthetic time-series dataset.