Multi-table Synthesizer
ydata.synthesizers.MultiTableSynthesizer
Bases: BaseModel
fit(X, metadata, anonymize=None, limit=50000000, calculated_features=None, attribute_tables=None, random_state=None, encoder_type=EncoderType.BIRCH)
Fit a MultiTable Synthesizer instance.
The synthesizer operates over a denormalized version of the dataset.
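To make "denormalized" concrete, here is a minimal sketch in plain Python of joining a child table to its parent so that each row carries both tables' columns. It is not the library's implementation: the `denormalize` helper, the table contents, and the way `limit` caps the joined rows are assumptions made purely for illustration.

```python
from itertools import islice


def denormalize(parents, children, key, limit=None):
    """Join each child row with its parent row on `key`, producing flat rows.

    `limit` caps the number of joined rows (None keeps them all). This only
    illustrates the idea of a row limit; it is not how the library applies it.
    """
    parents_by_key = {row[key]: row for row in parents}
    joined = (
        {**parents_by_key[child[key]], **child}
        for child in children
        if child[key] in parents_by_key
    )
    return list(islice(joined, limit))


# Hypothetical two-table database: one customer has many orders.
customers = [
    {"customer_id": 1, "country": "PT"},
    {"customer_id": 2, "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.5},
]

flat = denormalize(customers, orders, key="customer_id")
capped = denormalize(customers, orders, key="customer_id", limit=2)
```

Each entry in `flat` repeats the customer columns next to the order columns, which is why the row count of a denormalized dataset can grow quickly and why a row limit is useful.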
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | MultiDataset | Training dataset. | required |
metadata | MultiMetadata | Associated metadata. | required |
anonymize | Optional[dict] | Defines which columns to anonymize and the anonymization method. | None |
limit | int | Maximum number of rows from the denormalized dataset to use for training. | 50000000 |
calculated_features | Optional[list[dict[str, str \| Callable \| List[str]]]] | Lists the columns that will be computed from other tables/columns and the function used to compute them. | None |
attribute_tables | list \| set \| str | Collection of tables that contain static information. | None |
random_state | RandomSeed | Random generator or seed for the synthesizer fit. | None |
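The call shape implied by the `fit` signature can be sketched as follows. This is only an illustration: `fit_multitable` and `RecordingSynth` are hypothetical names, the stand-in class replaces a real `MultiTableSynthesizer` so the snippet runs without the library, and the placeholder strings stand in for real `MultiDataset` and `MultiMetadata` objects.

```python
class RecordingSynth:
    """Stand-in for a synthesizer; it only records the arguments passed to fit."""

    def __init__(self):
        self.fit_kwargs = None

    def fit(self, X, metadata, **kwargs):
        self.fit_kwargs = {"X": X, "metadata": metadata, **kwargs}


def fit_multitable(synth, dataset, metadata, attribute_tables=None, seed=None,
                   limit=50_000_000):
    """Call `fit` with keyword arguments named as in the fit signature."""
    synth.fit(
        X=dataset,
        metadata=metadata,
        limit=limit,
        attribute_tables=attribute_tables,
        random_state=seed,
    )
    return synth


# Placeholder strings stand in for real MultiDataset / MultiMetadata objects.
synth = fit_multitable(
    RecordingSynth(),
    dataset="<MultiDataset>",
    metadata="<MultiMetadata>",
    attribute_tables={"countries"},  # hypothetical table holding static information
    seed=42,
    limit=1_000_000,
)
```

Passing the arguments by keyword keeps the call readable and guards against positional mix-ups between `anonymize`, `limit`, and the later parameters.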
is_attribute_table(table)
sample(n_samples=1.0, connector=None, if_exists=WriteMode.APPEND, random_state=None)
Sample from a trained multitable synthesizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_samples | float \| None | Proportion of the original database to sample. Values from 0.1 to 5 are accepted. | 1.0 |
connector | RDBMSConnector \| None | Connector used to persist the generated tables progressively. | None |
if_exists | {'fail', 'replace', 'append'} | Defines the write behavior when the table already exists. 'append': add the data to the pre-existing table; 'fail': raise an error if the table exists; 'replace': drop the existing table and create a new one. Note that with 'replace', if the database table has constraints that restrict deletion, persistence can fail and leave the database in an inconsistent state. | WriteMode.APPEND |
random_state | RandomSeed | Random generator or seed for sampling. | None |
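The three `if_exists` behaviors can be sketched with an in-memory dictionary playing the role of the database. This is an assumption-laden illustration, not the connector's code: `write_table` and the `db` dictionary are hypothetical, and a real RDBMS may additionally reject a replace because of deletion constraints.

```python
def write_table(database, name, rows, if_exists="append"):
    """Store `rows` under `database[name]` according to the write mode."""
    if if_exists not in {"fail", "replace", "append"}:
        raise ValueError(f"unknown write mode: {if_exists!r}")
    if name not in database:
        database[name] = list(rows)      # no conflict: just create the table
    elif if_exists == "fail":
        raise ValueError(f"table {name!r} already exists")
    elif if_exists == "replace":
        database[name] = list(rows)      # drop the old rows, keep only the new ones
    else:
        database[name].extend(rows)      # append to the pre-existing rows
    return database[name]


db = {"orders": [{"order_id": 1}]}

write_table(db, "orders", [{"order_id": 2}], if_exists="append")
appended = list(db["orders"])

write_table(db, "orders", [{"order_id": 3}], if_exists="replace")
replaced = list(db["orders"])

try:
    write_table(db, "orders", [], if_exists="fail")
    failed = False
except ValueError:
    failed = True
```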
Note
When this method receives a connector to an RDBMS database, it persists all generated tables and returns an empty dataset. Using a connector is recommended for better memory management.
Returns:
Type | Description |
---|---|
MultiDataset | Synthetic MultiDataset. |
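A small wrapper can check the documented `n_samples` range before calling `sample`. The sketch below is hypothetical: `sample_checked` is not part of the library, and `StubSynth` is a stand-in that returns a plain dictionary so the snippet runs without a trained synthesizer.

```python
class StubSynth:
    """Stand-in for a trained synthesizer; returns a description of the call."""

    def sample(self, n_samples=1.0, connector=None, random_state=None):
        return {"n_samples": n_samples, "persisted": connector is not None}


def sample_checked(synth, n_samples=1.0, connector=None, seed=None):
    """Validate the documented 0.1 to 5 range, then delegate to `sample`."""
    if not 0.1 <= n_samples <= 5:
        raise ValueError("n_samples must be between 0.1 and 5")
    return synth.sample(n_samples=n_samples, connector=connector, random_state=seed)


half = sample_checked(StubSynth(), n_samples=0.5, seed=7)

try:
    sample_checked(StubSynth(), n_samples=10)
    rejected = False
except ValueError:
    rejected = True
```

With a real synthesizer and a connector, the returned dataset would be empty because the tables are written to the database instead, so downstream code should read the results from the database in that case.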
save(path)
Saves the synthesizer and the models fitted per variable.
update_anonymized_columns_metadata(table, metadata, dataset_schema, anonymization_data)
Update metadata types after anonymization.