
Synthesize Relational databases

Integrate Fabric's MultiTableSynthesizer in your data flows and generate synthetic relational databases or multi-table datasets

Generating synthetic data from relational databases streamlines access to data and supports your organization's data democratization strategy. Fabric's SDK provides an easy-to-use code interface for integrating the generation of synthetic multi-table databases into your existing data flows.

How to get your datasource?

Learn how to create your multi-table data in Fabric here before creating your first multi-table synthetic data generator!

Get your datasource and connector ID

Datasource uid: You can find your datasource ID in the Fabric UI. Open your relational dataset and click the "Explore in Labs" button, then copy the uid shown in the code snippet.

Connector uid: You can find your connector ID in the Fabric UI. Open the connector tab in your Data Catalog and, in the connector's "Actions" menu, select "Explore in Lab". Copy the uid shown in the code snippet.
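Instead of pasting the uids directly into your script, you may prefer to keep them in environment variables next to your token. The sketch below is a hypothetical helper, not part of the SDK: the variable names `YDATA_DATASOURCE_UID` and `YDATA_CONNECTOR_UID` are assumptions chosen for illustration, and the helper only checks that each value is set and is not an unfilled `<...>` placeholder.

```python
import os


def read_fabric_uids(env=None):
    """Return (datasource_uid, connector_uid) read from environment variables.

    Hypothetical helper: the variable names are illustrative assumptions,
    not names defined by the ydata SDK.
    """
    env = os.environ if env is None else env
    uids = []
    for name in ("YDATA_DATASOURCE_UID", "YDATA_CONNECTOR_UID"):
        value = env.get(name, "").strip()
        # Reject missing values and unfilled placeholders such as '<CONNECTOR_UID>'
        if not value or value.startswith("<") or value.endswith(">"):
            raise ValueError(f"Set {name} to the uid copied from the Fabric UI")
        uids.append(value)
    return tuple(uids)
```

You could then call `datasource_uid, connector_uid = read_fabric_uids()` and pass the results to `DataSource.get(datasource_uid)` and `MultiTableSynthesizer(write_connector=connector_uid)`.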

Quickstart example:

import os

from ydata.sdk.datasources import DataSource
from ydata.sdk.synthesizers import MultiTableSynthesizer

# Authenticate to Fabric to leverage the SDK - https://docs.sdk.ydata.ai/latest/sdk/installation/
# Make sure your token is set as an environment variable.
os.environ["YDATA_TOKEN"] = '<TOKEN>'  # Remove if already defined

# In this example, we demonstrate how to train a synthesizer from an existing RDBMS Dataset.
# Make sure to follow the step-by-step guide to create a Dataset in Fabric's catalog: https://docs.sdk.ydata.ai/latest/get-started/create_multitable_dataset/
X = DataSource.get('<DATASOURCE_UID>')

# Initialize a multi-table synthesizer. Provide a connector so that the synthesis process
# writes the synthetic data into the destination database.
# Pass the connector ID as the write_connector argument (see above for how to get a connector ID).
synth = MultiTableSynthesizer(write_connector='<CONNECTOR_UID>')

# Start the training of your synthetic data generator
synth.fit(X)

# Once training has completed, you can sample a synthetic database.
# frac is the size of the synthetic database relative to the original database.
# Here, frac=1. requests a synthetic database with the same size as the original.
# The synthetic sample is written to the database provided in write_connector.
synth.sample(frac=1.)
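`frac` is passed as a fraction of the original database size, so `frac=1.` requests a synthetic database of the same size. The sketch below is illustrative arithmetic only: `expected_row_counts` is a hypothetical helper, the table names and row counts are made up, and it assumes every table is scaled by the same fraction, which the SDK does not necessarily guarantee for related tables.

```python
def expected_row_counts(table_sizes, frac):
    """Estimate per-table row counts for a given sampling fraction.

    Illustrative assumption: every table is scaled by the same fraction.
    The actual synthesizer may size related tables differently.
    """
    if frac <= 0:
        raise ValueError("frac must be positive")
    return {table: round(rows * frac) for table, rows in table_sizes.items()}


# Hypothetical row counts for a small three-table database
sizes = {"customers": 1_000, "orders": 4_500, "order_items": 12_000}
print(expected_row_counts(sizes, 1.0))  # same size as the original
print(expected_row_counts(sizes, 0.5))  # roughly half, under the assumption above
```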