Text to Dataset (LLM synthesizer)

`ydata.synthesizers.LLMSynthesizer`

Generates tabular or multi-table synthetic data from a prompt-based schema (no source dataset).

Use `fit(tables=...)` to set the schema, then `sample(sample_size=...)` to generate data.

Example (financial services):

    >>> from ydata.synthesizers import LLMSynthesizer
    >>> synth = LLMSynthesizer(model="openai/gpt-5-nano")
    >>> tables = {
    ...     "transactions": {
    ...         "prompt": "Credit card transactions",
    ...         "columns": {
    ...             "transaction_id": {"prompt": "unique id", "dtype": "string"},
    ...             "amount": {"prompt": "amount", "dtype": "float"},
    ...         },
    ...     }
    ... }
    >>> synth.fit(tables=tables)
    >>> data = synth.sample(sample_size=100)

`fit(tables, existing_data=None)`

Set the schema used for generation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tables` | `dict[str, dict]` | Map of table name -> `{"prompt": str, "columns": {col: {"prompt", "dtype"} or {"dtype": "category", "values": [...]}}}`. Optional keys per table: `"primary_key"` and `"foreign_keys"` (list of `{column, referenced_table, prompt}`). | required |
| `existing_data` | `dict[str, DataFrame] \| None` | Optional. If provided, new columns are generated for these existing rows (e.g. to enrich existing transactions). | `None` |
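
The nested `tables` structure is easier to read in a multi-table example. The sketch below builds a two-table schema with a category column, a primary key, and a foreign key, then runs a small structural check before the dict would be passed to `fit()`. The table and column names and the `check_schema` helper are hypothetical and not part of the SDK. Giving `primary_key` as a column name and declaring the foreign-key column under `"columns"` are assumptions that this reference does not confirm.

```python
# Hypothetical two-table schema; table and column names are illustrative only.
tables = {
    "customers": {
        "prompt": "Retail banking customers",
        "columns": {
            "customer_id": {"prompt": "unique customer id", "dtype": "string"},
            "segment": {"dtype": "category", "values": ["retail", "premium", "business"]},
        },
        # Assumption: primary_key is given as a column name.
        "primary_key": "customer_id",
    },
    "transactions": {
        "prompt": "Credit card transactions",
        "columns": {
            "transaction_id": {"prompt": "unique id", "dtype": "string"},
            # Assumption: the foreign-key column is also declared as a column.
            "customer_id": {"prompt": "id of the paying customer", "dtype": "string"},
            "amount": {"prompt": "amount", "dtype": "float"},
        },
        "primary_key": "transaction_id",
        "foreign_keys": [
            {
                "column": "customer_id",
                "referenced_table": "customers",
                "prompt": "each transaction belongs to one customer",
            }
        ],
    },
}


def check_schema(tables):
    """Return a list of structural problems in a tables dict (empty if none).

    Hypothetical helper: it only checks the shape described in the parameter
    table above and does not reproduce the SDK's own validation.
    """
    problems = []
    for name, spec in tables.items():
        columns = spec.get("columns", {})
        if "prompt" not in spec:
            problems.append(f"{name}: missing table prompt")
        if not columns:
            problems.append(f"{name}: no columns")
        for col, col_spec in columns.items():
            if "dtype" not in col_spec:
                problems.append(f"{name}.{col}: missing dtype")
            elif col_spec["dtype"] == "category" and not col_spec.get("values"):
                problems.append(f"{name}.{col}: category column without values")
        for fk in spec.get("foreign_keys", []):
            if fk.get("referenced_table") not in tables:
                problems.append(
                    f"{name}: unknown referenced table {fk.get('referenced_table')!r}"
                )
    return problems


problems = check_schema(tables)
```

If `problems` is empty, the dict can be passed on as `synth.fit(tables=tables)`.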

Returns:

| Type | Description |
| --- | --- |
| `'LLMSynthesizer'` | self |

`sample(sample_size=4, progress_callback=None)`

Generate rows from the schema set in fit().

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `sample_size` | `int \| dict[str, int]` | Rows per root table: an `int` (same size for every root table) or a `dict[table_name, int]`. | `4` |
| `progress_callback` | `Callable[..., Awaitable[None]] \| None` | Optional async callback for progress updates (e.g. table, rows, percentage). | `None` |
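
As a minimal sketch of the two parameters above: the callback type is `Callable[..., Awaitable[None]]`, so any `async def` that returns `None` fits. The arguments the SDK actually passes are not specified here, so the callback below accepts anything and records it. The table names `customers` and `accounts`, the `generate` wrapper, and the keyword names in the demo call are hypothetical.

```python
import asyncio

progress_events = []


async def on_progress(*args, **kwargs):
    # The exact arguments are not documented here, so accept anything and record it.
    progress_events.append((args, kwargs))


def generate(synth):
    """Sample with per-table sizes and progress reporting.

    Assumes `synth` is an LLMSynthesizer already fitted on a schema whose root
    tables are named "customers" and "accounts" (hypothetical names).
    """
    return synth.sample(
        sample_size={"customers": 20, "accounts": 5},
        progress_callback=on_progress,
    )


# Exercise the callback on its own, without calling the SDK or an LLM.
asyncio.run(on_progress(table="customers", rows=10, percentage=50.0))
```

Accepting `*args, **kwargs` keeps the callback compatible with whatever progress fields are supplied; narrow the signature once the actual arguments are known.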

Returns:

| Type | Description |
| --- | --- |
| `Dataset \| MultiDataset` | `Dataset` if the schema has one table, otherwise `MultiDataset`. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `fit()` was not called. |
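
The ordering requirement can be illustrated without the SDK or an LLM API key. In the sketch below, `StubSynthesizer` is a hypothetical stand-in that mimics only the documented contract (`fit()` returns `self`, `sample()` raises `ValueError` before `fit()`). Its error message and return value are placeholders, not the library's. Because `fit()` returns `self`, chaining the two calls guarantees the correct order.

```python
class StubSynthesizer:
    """Hypothetical stand-in that mimics only the documented fit()/sample() contract."""

    def __init__(self):
        self._tables = None

    def fit(self, tables, existing_data=None):
        self._tables = tables
        return self  # documented: fit() returns self

    def sample(self, sample_size=4, progress_callback=None):
        if self._tables is None:
            # Placeholder message; the real wording may differ.
            raise ValueError("fit() must be called before sample()")
        # Placeholder result, not a Dataset or MultiDataset.
        return {name: sample_size for name in self._tables}


def fit_and_sample(synth, tables, sample_size=4):
    # Chaining works because fit() returns the synthesizer itself.
    return synth.fit(tables=tables).sample(sample_size=sample_size)


tables = {
    "transactions": {
        "prompt": "Credit card transactions",
        "columns": {"amount": {"prompt": "amount", "dtype": "float"}},
    }
}

# Calling sample() first triggers the documented ValueError.
try:
    StubSynthesizer().sample(sample_size=10)
    raised = False
except ValueError:
    raised = True

result = fit_and_sample(StubSynthesizer(), tables, sample_size=10)
```

With the real class, the same `fit_and_sample` wrapper would take an `LLMSynthesizer` instance in place of the stub.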