YData Synthesizers offers three privacy levels:
- high privacy: the model is optimized to maximize privacy,
- high fidelity (default): the model is optimized to maximize fidelity to the original data,
- balanced: a trade-off between privacy and fidelity.
The default privacy level is high fidelity. The privacy level can be changed at the moment a synthesizer is trained, via the `privacy_level` parameter of the `fit` method. The parameter expects a `PrivacyLevel` enum value, such as `PrivacyLevel.HIGH_PRIVACY`.
What is the difference between anonymization and privacy?
Anonymization ensures that sensitive information is hidden from the data. Privacy ensures that the original data points cannot be inferred from the synthetic data points via statistical attacks.
For data sharing, anonymization and privacy controls are therefore complementary.
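To make the distinction concrete, here is a minimal sketch in plain Python (no YData code; all column names and values are invented for illustration). Removing the direct identifier anonymizes the records, yet a unique combination of quasi-identifiers can still re-identify a person through a linkage attack, which is the kind of inference that privacy controls address:

```python
# Minimal illustration of why anonymization alone does not guarantee privacy.
# All data below is invented for this example.

# Original records: 'name' is a direct identifier.
records = [
    {"name": "Alice", "age": 34, "zip": "90210", "diagnosis": "flu"},
    {"name": "Bob",   "age": 51, "zip": "10001", "diagnosis": "asthma"},
]

# Anonymization: drop the direct identifier from every record.
anonymized = [{k: v for k, v in r.items() if k != "name"} for r in records]

# Linkage attack: an attacker who already knows Alice's age and zip code
# (quasi-identifiers) can still single out her record.
known = {"age": 34, "zip": "90210"}
matches = [r for r in anonymized
           if r["age"] == known["age"] and r["zip"] == known["zip"]]

# The quasi-identifier combination is unique, so the sensitive attribute
# is recovered despite the data being anonymized.
print(len(matches))             # 1
print(matches[0]["diagnosis"])  # flu
```

Synthetic data with a privacy-oriented training objective targets this second risk: the generated rows should not let an attacker statistically link back to any original individual.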
The example below demonstrates how to train a synthesizer configured for high privacy:
```python
import os

from ydata.sdk.dataset import get_dataset
from ydata.sdk.synthesizers import PrivacyLevel, RegularSynthesizer

# Do not forget to add your token as an environment variable
os.environ["YDATA_TOKEN"] = '<TOKEN>'  # Remove if already defined


def main():
    """In this example, we demonstrate how to train a synthesizer
    with a high-privacy setting from a pandas DataFrame.
    After training a Regular Synthesizer, we request a sample.
    """
    X = get_dataset('titanic')

    # We initialize a regular synthesizer
    # As long as the synthesizer does not call `fit`, it exists only locally
    synth = RegularSynthesizer()

    # We train the synthesizer on our dataset, setting the privacy level to high
    synth.fit(
        X,
        name="titanic_synthesizer",
        privacy_level=PrivacyLevel.HIGH_PRIVACY
    )

    # We request a synthetic dataset with 50 rows
    sample = synth.sample(n_samples=50)
    print(sample)


if __name__ == "__main__":
    main()
```