Bases: BaseSynthesizer
Source code in ydata/sdk/synthesizers/regular.py
fit(X, privacy_level=PrivacyLevel.HIGH_FIDELITY, entity_id_cols=None, generate_cols=None, exclude_cols=None, dtypes=None, target=None, name=None, anonymize=None)
Fit the synthesizer.
The synthesizer accepts either a pandas DataFrame or a YData DataSource as the training dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | `Union[DataSource, pandas.DataFrame]` | Training dataset | required |
privacy_level | `PrivacyLevel` | Synthesizer privacy level (defaults to high fidelity) | `PrivacyLevel.HIGH_FIDELITY` |
entity_id_cols | `Union[str, List[str]]` | (optional) Columns holding entity IDs | `None` |
generate_cols | `List[str]` | (optional) Columns that should be synthesized | `None` |
exclude_cols | `List[str]` | (optional) Columns that should not be synthesized | `None` |
dtypes | `Dict[str, Union[str, DataType]]` | (optional) Datatype mapping that overrides the column datatypes in the datasource metadata | `None` |
target | `Optional[str]` | (optional) Target column | `None` |
name | `Optional[str]` | (optional) Synthesizer instance name | `None` |
anonymize | `Optional[str]` | (optional) Fields to anonymize and the anonymization strategy | `None` |
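The following is a minimal sketch of how the column-related arguments of `fit` might be checked before training. `build_fit_kwargs` and `fit_regular_synthesizer` are hypothetical helpers, not part of the SDK. The sketch assumes that `RegularSynthesizer` can be imported from `ydata.sdk.synthesizers`, that it can be constructed without arguments, and that SDK credentials are already configured.

```python
from typing import Any, Dict, List, Optional, Sequence, Union


def build_fit_kwargs(
    columns: Sequence[str],
    entity_id_cols: Optional[Union[str, List[str]]] = None,
    generate_cols: Optional[List[str]] = None,
    exclude_cols: Optional[List[str]] = None,
    target: Optional[str] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: verify column names, then collect keyword arguments for fit()."""
    # Gather every column name mentioned by the arguments.
    referenced: List[str] = []
    if isinstance(entity_id_cols, str):
        referenced.append(entity_id_cols)
    elif entity_id_cols:
        referenced.extend(entity_id_cols)
    referenced.extend(generate_cols or [])
    referenced.extend(exclude_cols or [])
    if target is not None:
        referenced.append(target)

    # Fail early if a name does not exist in the training data.
    missing = sorted(set(referenced) - set(columns))
    if missing:
        raise ValueError(f"Columns not found in the training data: {missing}")

    candidates = {
        "entity_id_cols": entity_id_cols,
        "generate_cols": generate_cols,
        "exclude_cols": exclude_cols,
        "target": target,
        "name": name,
    }
    # Drop unset arguments so that fit() falls back to its documented defaults.
    return {key: value for key, value in candidates.items() if value is not None}


def fit_regular_synthesizer(df, **fit_kwargs):
    """Hypothetical wrapper around the documented fit() call."""
    # Imported lazily; the import path and no-argument constructor are assumptions.
    from ydata.sdk.synthesizers import RegularSynthesizer

    synth = RegularSynthesizer()
    synth.fit(df, **fit_kwargs)
    return synth
```

With a pandas DataFrame `df`, a call such as `fit_regular_synthesizer(df, **build_fit_kwargs(df.columns, target="label"))` would forward only the arguments that were explicitly set.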
sample(n_samples=1)
Sample from a RegularSynthesizer instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_samples | `int` | Number of rows in the sample | `1` |
Returns:
Type | Description |
---|---|
`pdDataFrame` | Synthetic data |
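Because `n_samples` defaults to 1, calling `sample()` without arguments requests a single row. The sketch below shows one way to make the requested size explicit; `sample_rows` is a hypothetical helper, and `synth` is assumed to be an already fitted `RegularSynthesizer` (or any object with the same `sample` signature).

```python
def sample_rows(synth, n_samples: int):
    """Hypothetical helper: request an explicit number of synthetic rows."""
    # Reject non-integers (including booleans) and non-positive sizes
    # before calling the synthesizer.
    if isinstance(n_samples, bool) or not isinstance(n_samples, int) or n_samples < 1:
        raise ValueError(f"n_samples must be a positive integer, got {n_samples!r}")
    # Pass the size by keyword, matching the documented signature sample(n_samples=1).
    return synth.sample(n_samples=n_samples)
```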
PrivacyLevel
Bases: Enum
Privacy level exposed to the end-user.
BALANCED_PRIVACY_FIDELITY = auto()
class-attribute
Balanced privacy/fidelity
HIGH_FIDELITY = auto()
class-attribute
High fidelity
HIGH_PRIVACY = auto()
class-attribute
High privacy
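Since the members are defined with `auto()`, their underlying values are an implementation detail, so it is safer to refer to them by name. The sketch below illustrates a name-based lookup, for example when the level comes from a configuration file. `PrivacyLevelSketch` is a local stand-in that mirrors the three documented members, not the SDK class, and `parse_privacy_level` is a hypothetical helper.

```python
from enum import Enum, auto


class PrivacyLevelSketch(Enum):
    """Local stand-in mirroring the documented members (not the SDK class)."""

    BALANCED_PRIVACY_FIDELITY = auto()
    HIGH_FIDELITY = auto()
    HIGH_PRIVACY = auto()


def parse_privacy_level(text: str, enum_cls=PrivacyLevelSketch):
    """Hypothetical helper: map a user-supplied string to an enum member by name."""
    # Normalize spacing, case, and separators so "high privacy" matches HIGH_PRIVACY.
    key = text.strip().upper().replace("-", "_").replace(" ", "_")
    try:
        return enum_cls[key]
    except KeyError:
        valid = ", ".join(member.name for member in enum_cls)
        raise ValueError(
            f"Unknown privacy level {text!r}; expected one of: {valid}"
        ) from None
```

Passing the SDK's own `PrivacyLevel` as `enum_cls` should behave the same way, provided its member names match those documented here.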