Document Generator

ydata.synthesizers.text.model.document.DocumentFormat

Bases: Enum

Enum representing supported output formats for synthetic document generation.

Attributes:

Name | Description
DOCX | Microsoft Word document format (docx)
PDF | Portable Document Format (pdf)
HTML | HyperText Markup Language format (html)
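
As a rough illustration of how the three members might be selected from user input, the sketch below uses a local stand-in enum so that it runs without the SDK installed. The member names come from the list above; the string values and the `parse_format` helper are assumptions made for this example, not part of the documented API.

```python
from enum import Enum


class DocumentFormat(Enum):
    """Local stand-in for the documented enum (member values are assumed)."""

    DOCX = "docx"
    PDF = "pdf"
    HTML = "html"


def parse_format(name: str) -> DocumentFormat:
    """Map a user-supplied string such as 'pdf' or ' PDF ' to an enum member."""
    try:
        # Look up by member name, which is the part the reference documents.
        return DocumentFormat[name.strip().upper()]
    except KeyError:
        supported = ", ".join(member.name for member in DocumentFormat)
        raise ValueError(
            f"Unsupported format {name!r}; expected one of: {supported}"
        ) from None
```

With the real package, the same lookup by name should work against the imported `DocumentFormat`, provided it is a standard `Enum` as the "Bases" line indicates.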

ydata.synthesizers.text.model.document.DocumentGenerator

Bases: BaseGenerator

A class for generating synthetic documents in various formats (DOCX, PDF, HTML) based on input specifications.

Features
  • Support for multiple document formats (DOCX, PDF, HTML)
  • Configurable LLM selection
  • Template-based document generation
  • Customizable document structure and styling
  • Batch processing of multiple document specifications

Parameters:

Name | Type | Description | Default
api_key | str | API key for the LLM provider | required
provider | Union[LLMProvider, str] | The LLM provider to use | OPENAI
model_name | Optional[Union[OpenAIModel, AnthropicModel, str]] | Specific model to use | GPT_4
default_format | DocumentFormat | Default output format if not specified in request | required
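
A minimal construction sketch, assuming the import path matches the fully qualified names shown on this page and that the key is kept in an environment variable. The variable name `OPENAI_API_KEY` and the `make_generator` helper are hypothetical; `provider` and `model_name` are left at their documented defaults.

```python
import os


def make_generator(api_key=None):
    """Build a DocumentGenerator with PDF as the default output format."""
    # OPENAI_API_KEY is an assumed variable name, not one the SDK mandates.
    api_key = api_key or os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("An API key for the LLM provider is required")

    # Imported lazily so the key check above runs even without the SDK.
    from ydata.synthesizers.text.model.document import (
        DocumentFormat,
        DocumentGenerator,
    )

    # Only the two parameters marked "required" are passed explicitly.
    return DocumentGenerator(api_key=api_key, default_format=DocumentFormat.PDF)
```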

generate(document_type=None, n_docs=1, audience=None, tone=None, purpose=None, region=None, language=None, length=None, topics=None, style_guide=None, output_dir=None, **kwargs)

Generate documents based on input specifications.

Parameters:

Name | Type | Description | Default
document_type | str | Type of document to generate | None
n_docs | int | Number of documents to generate | 1
audience | str | Target audience for the document | None
tone | str | Desired tone of the document. One of: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral. | None
purpose | str | Purpose of the document | None
region | str | Target region/locale | None
language | str | Language of the document | None
length | str | Desired length of the document | None
topics | str | Key points to cover | None
style_guide | str | Style guide to follow | None
output_dir | str | Directory to store generated documents | None
**kwargs | | Additional arguments to pass to the generation process | {}
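
Since `tone` accepts only a fixed set of values, it can be convenient to check it before spending an LLM call. The sketch below is one possible pre-check, not the library's own validation: `ALLOWED_TONES` copies the values listed for `tone`, and `build_request` is a hypothetical helper that also drops unset fields so the documented defaults apply.

```python
# Values copied from the description of the `tone` parameter.
ALLOWED_TONES = frozenset(
    {
        "formal",
        "casual",
        "persuasive",
        "empathetic",
        "inspirational",
        "enthusiastic",
        "humorous",
        "neutral",
    }
)


def build_request(**fields):
    """Return keyword arguments for generate(), omitting fields left as None."""
    tone = fields.get("tone")
    if tone is not None and tone not in ALLOWED_TONES:
        allowed = ", ".join(sorted(ALLOWED_TONES))
        raise ValueError(f"Unknown tone {tone!r}; expected one of: {allowed}")
    # Fields that are None are dropped so generate() falls back to its defaults.
    return {key: value for key, value in fields.items() if value is not None}
```

Given an existing generator instance, a call might then look like `generator.generate(**build_request(document_type="invoice", tone="formal", n_docs=3))`, where "invoice" is only an illustrative document type.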

Raises:

Type | Description
ValueError | If input validation fails or document format is unsupported
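
One way a caller might handle the documented `ValueError` is sketched below. The `generate_or_none` wrapper is hypothetical, and because this page does not describe what `generate` returns, the wrapper passes the result through unchanged.

```python
import logging

logger = logging.getLogger(__name__)


def generate_or_none(generator, **request):
    """Call generator.generate(**request); return None if it raises ValueError."""
    try:
        return generator.generate(**request)
    except ValueError as exc:
        # Raised on failed input validation or an unsupported document format.
        logger.warning("Document generation rejected the request: %s", exc)
        return None
```

Catching only `ValueError` keeps other failures, such as network or authentication errors from the provider, visible to the caller.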