Multi-table Synthesizer

ydata.synthesizers.MultiTableSynthesizer

Bases: BaseModel

fit(X, metadata, anonymize=None, limit=50000000, calculated_features=None, attribute_tables=None, random_state=None, encoder_type=EncoderType.BIRCH)

Fit a MultiTable Synthesizer instance.

The synthesizer operates over a denormalized version of the dataset.

Parameters:

Name Type Description Default
X MultiDataset

Training dataset.

required
metadata MultiMetadata

Associated metadata.

required
anonymize Optional[dict]

Defines which columns to anonymize and the anonymization method. Defaults to None.

None
limit int

Limit of rows from the denormalized dataset to use for training. Defaults to 50_000_000.

50000000
calculated_features Optional[list[dict[str, str | Callable | List[str]]]]

Lists the columns to be computed from other tables/columns, together with the function used to compute each one. Defaults to None.

None
attribute_tables list | set | str

Collection of tables that contain static information. Defaults to None.

None
random_state RandomSeed

Random generator or seed for the synthesizer fit. Defaults to None.

None
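
As an illustration of the parameters above, the following is a minimal sketch of a fit call. The helper name `fit_multitable` and its argument names are hypothetical; only the `fit` keywords (`limit`, `attribute_tables`, `random_state`) come from the signature documented on this page. It assumes `dataset` is a MultiDataset, `metadata` is a MultiMetadata, and that an integer is an acceptable seed for `random_state`.

```python
# In real use the class is imported from the path documented on this page:
#   from ydata.synthesizers import MultiTableSynthesizer
# The helper below is a hypothetical wrapper, not part of the library.


def fit_multitable(synth, dataset, metadata, attribute_tables=None, seed=None,
                   row_limit=50_000_000):
    """Fit `synth` using only keyword arguments listed in the fit signature."""
    synth.fit(
        dataset,
        metadata,
        limit=row_limit,                    # cap on denormalized training rows
        attribute_tables=attribute_tables,  # tables holding static information
        random_state=seed,                  # assumed: an int seed is accepted
    )
    return synth
```

Passing the seed explicitly is a design choice for reproducibility; leaving it as None keeps the documented default.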

is_attribute_table(table)

Check if table is an attribute table.

Parameters:

Name Type Description Default
table str

Name of the table to check.

required

Returns:

Name Type Description
bool bool

True if the table is an attribute table, False otherwise.
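
One possible use of this check is to partition a list of table names into attribute tables and all other tables. The sketch below assumes `synth` is a MultiTableSynthesizer (or any object exposing `is_attribute_table`); the helper name `split_tables` is hypothetical and not part of the library.

```python
def split_tables(synth, table_names):
    """Partition table names with the documented is_attribute_table check."""
    attribute, other = [], []
    for name in table_names:
        # is_attribute_table(table) returns a bool, as documented above
        if synth.is_attribute_table(name):
            attribute.append(name)
        else:
            other.append(name)
    return attribute, other
```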

sample(n_samples=1.0, connector=None, if_exists=WriteMode.APPEND, random_state=None)

Sample from a trained multitable synthesizer.

Parameters:

Name Type Description Default
n_samples float | None

Size of the sample as a fraction of the original database, where 1.0 matches the original size. Accepted values range from 0.1 to 5. Defaults to 1.0.

1.0
connector RDBMSConnector | None

Connector used to persist the generated tables progressively. Defaults to None.

None
if_exists {'fail', 'replace', 'append'}

Defines the write behavior when the table already exists. Defaults to 'append'.
- append: add the data to the pre-existing table.
- fail: raises an error if the table exists.
- replace: drop the existing table and create a new one. Note that when using replace, if the database table has constraints that restrict deletion, the persistence can fail, leading to inconsistencies in the database.

APPEND
random_state RandomSeed

Random generator or seed for the synthesizer sample. Defaults to None.

None
Note

When this method receives a connector to an RDBMS database, it persists all generated tables and returns an empty dataset. Using a connector is recommended for better memory management.

Returns:

Type Description
MultiDataset

synthetic MultiDataset
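
The sampling call can be sketched as below. The helper `sample_database` and the early range check are hypothetical additions: the page only states that values from 0.1 to 5 are accepted, not how the library reacts to other values. The sketch assumes `synth` is a fitted MultiTableSynthesizer and leaves `if_exists` at its documented default.

```python
MIN_SAMPLES, MAX_SAMPLES = 0.1, 5.0  # accepted n_samples range per this page


def sample_database(synth, n_samples=1.0, connector=None, random_state=None):
    """Validate n_samples locally, then call the documented sample method."""
    if n_samples is not None and not MIN_SAMPLES <= n_samples <= MAX_SAMPLES:
        raise ValueError(
            f"n_samples must be between {MIN_SAMPLES} and {MAX_SAMPLES}, got {n_samples}"
        )
    # With a connector, the generated tables are persisted to the database
    # and the returned MultiDataset is empty (see the note above).
    return synth.sample(
        n_samples=n_samples,
        connector=connector,
        random_state=random_state,
    )
```

Checking the range before calling `sample` is only a convenience, so that an invalid value fails before any generation work starts.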

save(path)

Save the synthesizer and the models fitted per variable.
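
A minimal sketch of persisting a fitted synthesizer follows. The helper `save_synthesizer` is hypothetical, and the page does not say whether `save` creates missing directories or which path types it accepts, so the sketch creates the parent directory itself and passes the path as a string.

```python
from pathlib import Path


def save_synthesizer(synth, path):
    """Ensure the target directory exists, then call the documented save(path)."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    synth.save(str(target))  # assumed: a string path is accepted
    return target
```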

update_anonymized_columns_metadata(table, metadata, dataset_schema, anonymization_data)

Update metadata types after anonymization.