Multi-table Synthesizer

`ydata.synthesizers.MultiTableSynthesizer`

Bases: BaseModel

`fit(X, metadata, anonymize=None, limit=50000000, calculated_features=None, attribute_tables=None, random_state=None, encoder_type=EncoderType.BIRCH)`

Fit a MultiTable Synthesizer instance.

The synthesizer operates over a denormalized version of the dataset.

Parameters:

Name	Type	Description	Default
`X`	`MultiDataset`	Training dataset.	required
`metadata`	`MultiMetadata`	Associated metadata.	required
`anonymize`	`Optional[dict]`	Defines which columns to anonymize and the anonymization method. Defaults to None.	`None`
`limit`	`int`	Limit of rows from the denormalized dataset to use for training. Defaults to 50_000_000.	`50000000`
`calculated_features`	`Optional[list[dict[str, str \| Callable \| List[str]]]]`	List of columns to compute from other tables/columns, together with the function used to compute each one. Defaults to None.	`None`
`attribute_tables`	`list \| set \| str`	Collection of tables that contain static information.	`None`
`random_state`	`RandomSeed`	Random generator or seed for the synthesizer fit.	`None`
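To make the idea of a denormalized dataset and the `limit` parameter concrete, here is a minimal sketch in plain Python. It is not the SDK's implementation: the `denormalize` helper, the table names, and the single-key join are assumptions chosen for illustration only.

```python
from itertools import islice


def denormalize(parents, children, key, limit=None):
    """Join each child row to its parent row on `key`, producing flat rows.

    Hypothetical helper for illustration; the SDK's own denormalization
    logic is not shown in this reference.
    """
    parent_by_key = {row[key]: row for row in parents}
    joined = (
        {**parent_by_key[child[key]], **child}
        for child in children
        if child[key] in parent_by_key
    )
    # Keep at most `limit` denormalized rows (all rows when limit is None).
    return list(islice(joined, limit))


customers = [
    {"customer_id": 1, "country": "PT"},
    {"customer_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]

# Three joined rows exist, but only the first two are kept for training.
flat = denormalize(customers, orders, key="customer_id", limit=2)
```

Each flat row repeats the parent's columns next to the child's columns, which is why a row cap on the joined result can matter for large databases.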

`is_attribute_table(table)`

Check if table is an attribute table.

Parameters:

Name	Type	Description	Default
`table`	`str`	table name	required

Returns:

Name	Type	Description
`bool`	`bool`	True if the table is an attribute table, False otherwise.
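Since `attribute_tables` accepts a `list`, a `set`, or a single `str`, a membership check like this one has to treat all three forms uniformly. The sketch below shows one way to do that. Both functions are hypothetical stand-ins: the real method takes only `table`, and how it stores the registered tables is not documented here.

```python
def normalize_attribute_tables(attribute_tables):
    """Turn a str, list, set, or None into a set of table names."""
    if attribute_tables is None:
        return set()
    if isinstance(attribute_tables, str):
        # A bare string names one table; do not iterate over its characters.
        return {attribute_tables}
    return set(attribute_tables)


def is_attribute_table(table, attribute_tables):
    """Hypothetical standalone version of the membership check."""
    return table in normalize_attribute_tables(attribute_tables)


single = is_attribute_table("countries", "countries")
missing = is_attribute_table("orders", ["countries", "currencies"])
```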

`sample(n_samples=1.0, connector=None, if_exists=WriteMode.APPEND, random_state=None)`

Sample from a trained multi-table synthesizer.

Parameters:

Name	Type	Description	Default
`n_samples`	`float \| None`	Fraction of the original database size to sample, where 1.0 matches the original size. Accepted values range from 0.1 to 5. Defaults to 1.0.	`1.0`
`connector`	`RDBMSConnector \| None`	Connector used to persist the generated tables progressively.	`None`
`if_exists`	`{'fail', 'replace', 'append'}`	Defines the write behavior when the table already exists. Defaults to 'append'. append: add the data to the pre-existing table; fail: raise an error if the table exists; replace: drop the existing table and create a new one. Note that with replace, if the database table has constraints that restrict deletion, persistence can fail and leave the database in an inconsistent state.	`APPEND`
`random_state`	`RandomSeed`	Random generator or seed for sampling.	`None`
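As a rough illustration of how `n_samples` relates to output size, the sketch below validates the documented range and scales per-table row counts. The `target_row_counts` helper and the assumption that every table scales proportionally are hypothetical; the SDK may size individual tables differently (for example, attribute tables).

```python
def target_row_counts(table_sizes, n_samples=1.0):
    """Scale each table's row count by `n_samples` (hypothetical helper)."""
    # The documented accepted range is 0.1 to 5.
    if not 0.1 <= n_samples <= 5:
        raise ValueError("n_samples must be between 0.1 and 5")
    return {name: round(rows * n_samples) for name, rows in table_sizes.items()}


sizes = {"customers": 1000, "orders": 25000}
half = target_row_counts(sizes, n_samples=0.5)
double = target_row_counts(sizes, n_samples=2.0)
```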

Note

When this method receives a connector to an RDBMS, it persists all generated tables and returns an empty dataset. Using a connector is recommended for better memory management.
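The three `if_exists` modes can be illustrated with Python's standard `sqlite3` module. This is a sketch of the semantics only, not the SDK's connector code: `write_table`, `count_rows`, and the fixed two-column schema are assumptions made for the example.

```python
import sqlite3


def write_table(conn, name, rows, if_exists="append"):
    """Persist (id, value) rows into `name` following the given write mode."""
    if if_exists not in {"fail", "replace", "append"}:
        raise ValueError(f"unknown write mode: {if_exists!r}")

    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone() is not None

    if exists and if_exists == "fail":
        raise ValueError(f"table {name!r} already exists")
    if exists and if_exists == "replace":
        # Dropping can fail if other tables hold constraints on this one.
        conn.execute(f'DROP TABLE "{name}"')
        exists = False
    if not exists:
        conn.execute(f'CREATE TABLE "{name}" (id INTEGER, value TEXT)')
    conn.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', rows)
    conn.commit()


def count_rows(conn, name):
    return conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
write_table(conn, "orders", [(1, "a"), (2, "b")])             # table is created
write_table(conn, "orders", [(3, "c")], if_exists="append")   # rows are added
appended = count_rows(conn, "orders")
write_table(conn, "orders", [(4, "d")], if_exists="replace")  # table is rebuilt
replaced = count_rows(conn, "orders")
```

With 'append', repeated calls accumulate rows in the target table, so it is worth checking whether the destination already holds data from an earlier run.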

Returns:

Type	Description
`MultiDataset`	synthetic MultiDataset

`save(path)`

Save the synthesizer and the models fitted per variable.

`update_anonymized_columns_metadata(table, metadata, dataset_schema, anonymization_data)`

Update metadata types after anonymization.