Faker Synthesizer
Overview
YData SDK includes the Faker Synthesizer, a method for generating synthetic data from a schema or metadata only — no real dataset is required. You define column types and characteristics (such as id, name, email, phone, address, zipcode), optional regex patterns, categorical values, and date ranges. The synthesizer then generates rows that match this schema, making it ideal when you need plausible synthetic data without access to any real source data.
You can build the schema with MetadataConfigurationBuilder or a dictionary configuration, and optionally derive metadata from an existing dataset and then generate from it. Locale support allows you to control the language and format of generated values.
Key Features
- No Source Data Required: Generate data purely from a schema or metadata definition.
- Metadata- and Schema-Driven: Define columns by datatype, vartype, and characteristics (id, name, email, phone, address, zipcode, date).
- Flexible Constraints: Use regex patterns for string columns and categorical distributions for controlled variety.
- Date Control: Set min, max, and format for date columns.
- Dual Configuration: Use either
MetadataConfigurationBuilderfor programmatic setup or a dictionary config. - Locale Support: Choose a locale (e.g.
"en") for language and regional formatting. - Fit from Data or Scratch: Fit from existing data’s metadata (e.g. infer schema then generate) or from a schema built from scratch.
Use Cases
- Prototyping and Demos: Quickly create sample tables for UI demos or product mockups.
- Test Fixtures and Dev Databases: Populate test and development environments with realistic-looking data.
- Schema-Only or Masking Workflows: Generate data that respects a target schema without exposing real data.
- No Real Data Available: Produce synthetic data when data collection is impossible or not yet done.
Best Practices
- Define Clear Datatypes and Characteristics: Use the right characteristic (e.g. email, phone) so generated values match expectations.
- Use Categories and Regex for Control: Categorical distributions and regex keep variety predictable and valid.
- Set Date Ranges Where Needed: Use min/max and format for date columns to avoid invalid or inconsistent values.
- Prefer the Builder for Complex Schemas:
MetadataConfigurationBuilderkeeps large schemas readable and maintainable.
Advanced Usage
The Faker Synthesizer supports two configuration styles:
- Builder: Use
MetadataConfigurationBuilderto add columns with datatype, vartype, characteristic, and optional regex, categories, or date min/max. Then build aMetadataand pass it tofit(). - Dictionary: Pass a dictionary mapping column names to config (datatype, vartype, characteristic, regex, categories, min, max, format, etc.) into
MetadataConfigurationBuilder(config)and then intoMetadata.
You can fit from scratch (metadata built only from schema) or from existing data by computing Metadata(data) and fitting the synthesizer on that metadata to generate data that matches the same structure.
Related Materials
