Document Generator
The Document Generator creates synthetic documents in PDF, DOCX, and HTML formats. The LLM provider is selected via the backend parameter: Workbench (default), OpenAI, Anthropic, or Gemini.
- Generate single or multiple documents
- Customize document type, audience, tone, language, and region
- Multiple output formats: PDF, DOCX, HTML
- Scanned document simulation: pass scanned=True to generate(), or use "scanned": {True: 0.3, False: 0.7} in DatasetConfig.variations for mixed batches
- Brand logos: pass logo_path to embed a logo (PNG, JPG, GIF, SVG, or WEBP) into each document's brand slot
- Template from an image: reproduce an existing layout with generate_from_template()
- Batch generation with controlled variation via DatasetConfig
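To make the mixed-batch weights concrete, the sketch below shows one plausible reading of a mapping such as {True: 0.3, False: 0.7}: each document independently draws a value with the given probability. It is plain Python for illustration only, not the SDK's implementation. The helper sample_variation is hypothetical, and how DatasetConfig actually applies the weights is an assumption.

```python
import random


def sample_variation(weights, n_docs, seed=None):
    """Draw one value per document according to the given weights.

    Hypothetical helper: it only illustrates weighted sampling and is
    not part of the SDK.
    """
    rng = random.Random(seed)
    values = list(weights.keys())
    probabilities = list(weights.values())
    # random.choices performs weighted sampling with replacement
    return rng.choices(values, weights=probabilities, k=n_docs)


# Roughly 30% of documents would be rendered as scanned, 70% as clean
scanned_weights = {True: 0.3, False: 0.7}
flags = sample_variation(scanned_weights, n_docs=10, seed=0)
print(flags)
```

Under this reading, small batches can deviate noticeably from the 30/70 split; the proportions only settle as the batch grows.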
Tone values
The tone parameter accepts: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral.
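Because only the values listed above are accepted, it can help to check a tone before calling the generator. The following is a minimal sketch: check_tone is a hypothetical helper, not an SDK function, and normalizing case and whitespace is a choice made here rather than documented SDK behavior.

```python
# Tone values as listed in this page
ALLOWED_TONES = (
    "formal",
    "casual",
    "persuasive",
    "empathetic",
    "inspirational",
    "enthusiastic",
    "humorous",
    "neutral",
)


def check_tone(tone):
    """Return the normalized tone, or raise ValueError if it is not listed."""
    normalized = tone.strip().lower()
    if normalized not in ALLOWED_TONES:
        allowed = ", ".join(ALLOWED_TONES)
        raise ValueError(f"Unsupported tone {tone!r}; expected one of: {allowed}")
    return normalized


print(check_tone("Formal"))
```

A value such as "professional" is not in the list and would be rejected by this check; "formal" is the closest listed alternative.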
Provider setup
Set your provider's API key as an environment variable, or pass it directly via the subscription_key argument.
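One way to handle both options is sketched below: prefer an explicitly supplied key and fall back to the environment. The helper resolve_api_key is hypothetical, and the variable names are assumptions borrowed from each provider's own client conventions; this page does not state which names the SDK reads.

```python
import os

# Assumed environment variable names (not confirmed for this SDK)
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def resolve_api_key(provider, explicit_key=None):
    """Return the explicit key if given, otherwise look it up in the environment."""
    if explicit_key:
        return explicit_key
    env_name = ENV_VARS[provider]
    key = os.environ.get(env_name)
    if not key:
        raise RuntimeError(f"No API key found: set {env_name} or pass a key explicitly")
    return key


# An explicitly supplied key takes precedence over the environment
key = resolve_api_key("openai", explicit_key="my-provider-key")
print(len(key))
```

The resolved value is what would be handed to subscription_key=; which call accepts that argument is not specified here, so check the SDK reference before relying on it.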
Example Code
"""
Document Generator Example
"""
import os
from ydata.synthesizers.text.model.document import DocumentGenerator, DocumentFormat
if __name__ == "__main__":
    # Step 1: Authenticate with ydata-sdk
    os.environ['YDATA_LICENSE_KEY'] = 'add-sdk-key'  # Replace with your license key

    # Step 2: Initialize the DocumentGenerator with the desired format
    print("Initializing Document Generator...")
    generator = DocumentGenerator(
        document_format=DocumentFormat.PDF  # Set the document output format (PDF, DOCX, or HTML)
    )

    # Step 3: Generate a single document
    # Note: The tone parameter accepts one of the following values: [formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral]
    print("\n=== Generating Single Invoice Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Type of document to generate
        audience="Corporate client",  # Target audience
        tone="formal",  # Writing tone (must be one of the predefined values)
        purpose="Issue a detailed invoice for services rendered. Please provide detailed examples and real line items",  # Document purpose
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Document length (invoices are usually not long)
        topics="Consulting services, Hourly rates, Tax breakdown, Payment terms",
        # Key topics as a single comma-separated string
        style_guide="Professional design for a financial institution",  # Style or branding requirements
        output_dir="output/documents",  # Output directory
    )

    print("\n=== Generating Single Invoice (Supermarket) Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Still an invoice
        audience="Retail customer",  # Target audience is a consumer
        tone="formal",  # Same formal tone, kept consumer-friendly
        purpose="Detailed supermarket invoice with grocery and household item purchases.",
        # Purpose tailored to retail
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Allows for many line items
        topics="Groceries, Household goods, Unit price, Quantity, Subtotals, Tax, Total due, Payment method",
        # Supermarket-specific topics
        style_guide="Clean and readable receipt-style format typical of supermarket invoices",
        # Style expectation for consumer retail
        output_dir="output/documents",  # Output directory
    )

    # Step 4: Generate multiple documents with the same parameters
    print("\n=== Generating Multiple Documents ===")
    generator.generate(
        n_docs=5,  # Generate 5 documents with the same parameters
        document_type="Report",
        audience="Technical",
        tone="neutral",  # Writing tone (must be one of the predefined values)
        purpose="Technical documentation",
        region="Global",
        language="English",
        length="Medium",
        topics="API documentation, code examples, best practices",
        style_guide="Clear and concise",
        output_dir="output/documents",
    )
