Document Generator

The Document Generator creates synthetic documents in PDF, DOCX, and HTML formats. It supports any LLM provider — Workbench (default), OpenAI, Anthropic, or Gemini — via the backend parameter.

  • Generate single or multiple documents
  • Customize document type, audience, tone, language, and region
  • Multiple output formats: PDF, DOCX, HTML
  • Scanned document simulation: pass scanned=True to generate() or use "scanned": {True: 0.3, False: 0.7} in DatasetConfig.variations for mixed batches
  • Brand logos: pass logo_path to embed a logo (PNG, JPG, GIF, SVG, or WEBP) into each document's brand slot
  • Template from an image: reproduce an existing layout with generate_from_template()
  • Batch generation with controlled variation via DatasetConfig
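
The weight mapping in the scanned-document bullet can be read as a per-document probability. The sketch below is not the SDK's implementation. It only illustrates, using the Python standard library, how a mapping such as {True: 0.3, False: 0.7} might be sampled across a batch. The helper name sample_variation is hypothetical.

```python
import random


def sample_variation(weights, n_docs, seed=None):
    """Draw one value per document from a {value: weight} mapping.

    Hypothetical helper for illustration; not part of the SDK.
    """
    rng = random.Random(seed)
    values = list(weights)
    probs = [weights[v] for v in values]
    # random.choices samples with replacement, proportionally to the weights.
    return rng.choices(values, weights=probs, k=n_docs)


scanned_flags = sample_variation({True: 0.3, False: 0.7}, n_docs=1000, seed=0)
scanned_share = sum(scanned_flags) / len(scanned_flags)
print(f"share of scanned documents: {scanned_share:.2f}")
```

With weights like these, roughly 30% of a large batch would be marked as scanned. Small batches can deviate noticeably from the target ratio.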

Tone values

The tone parameter accepts: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral.
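
Because only the values listed above are accepted, it can help to check a tone before starting a generation run. The following is a minimal sketch; check_tone is a hypothetical helper, not an SDK function, and lowercasing the input is a convenience assumed here (the source does not say whether the SDK is case-sensitive).

```python
# Accepted tone values, copied from the list above.
ALLOWED_TONES = frozenset({
    "formal", "casual", "persuasive", "empathetic",
    "inspirational", "enthusiastic", "humorous", "neutral",
})


def check_tone(tone):
    """Return the normalised tone, or raise ValueError if it is not accepted.

    Hypothetical helper for illustration; not part of the SDK.
    """
    normalised = tone.strip().lower()
    if normalised not in ALLOWED_TONES:
        raise ValueError(
            f"Unsupported tone {tone!r}; expected one of {sorted(ALLOWED_TONES)}"
        )
    return normalised


print(check_tone("Formal"))  # prints "formal"
```

A value such as "professional" is not in the list, so check_tone("professional") raises ValueError.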

Provider setup

Set your provider's API key as an environment variable, or pass it directly via subscription_key=:

import os

# Set only the key for the provider you use.
os.environ['WORKBENCH_SUBSCRIPTION_KEY'] = '<your-key>'  # Workbench (default)
os.environ['ANTHROPIC_API_KEY'] = '<your-key>'  # Anthropic
os.environ['OPENAI_API_KEY'] = '<your-key>'  # OpenAI
os.environ['GOOGLE_API_KEY'] = '<your-key>'  # Gemini
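
A missing key is easier to diagnose before generation starts than partway through a batch. The sketch below looks up the environment variable for a chosen provider and fails early if it is unset. The lowercase provider names used as dictionary keys and the require_key helper are assumptions for illustration; they are not confirmed values of the backend parameter.

```python
import os

# Environment variable per provider, as listed above. The dictionary keys
# are assumed names, not confirmed values of the SDK's backend parameter.
KEY_ENV_VARS = {
    "workbench": "WORKBENCH_SUBSCRIPTION_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def require_key(backend, environ=None):
    """Return the API key for a provider, or raise if it is not set.

    Hypothetical helper for illustration; not part of the SDK.
    """
    environ = os.environ if environ is None else environ
    var = KEY_ENV_VARS[backend]
    key = environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before using the {backend!r} backend")
    return key


# Example with an explicit mapping instead of the real environment.
print(require_key("openai", {"OPENAI_API_KEY": "<your-key>"}))
```

Passing a mapping as the second argument keeps the check testable without touching the real environment.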

Example Code

"""
Document Generator Example
"""
import os

from ydata.synthesizers.text.model.document import DocumentGenerator, DocumentFormat

if __name__ == "__main__":
    # Step 1: Authenticate with ydata-sdk
    os.environ['YDATA_LICENSE_KEY'] = 'add-sdk-key'  # Replace with your license key

    # Step 2: Initialize the DocumentGenerator with desired format
    print("Initializing Document Generator...")
    generator = DocumentGenerator(
        document_format=DocumentFormat.PDF  # Set the document output format (PDF, DOCX, or HTML)
    )

    # Step 3: Generate a single document

    # Note: The tone parameter accepts one of the following values: [formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral]
    print("\n=== Generating Single Invoice Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Type of document to generate
        audience="Corporate client",  # Target audience
        tone="formal",  # Writing tone (must be one of the accepted values)
        purpose="Issue a detailed invoice for services rendered. Please provide detailed examples and real line items.",  # Document purpose
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Document length (leaves room for detailed line items)
        topics="Consulting services, Hourly rates, Tax breakdown, Payment terms",
        # Key topics as a single comma-separated string
        style_guide="Professional design for a financial institution",  # Style or branding requirements
        output_dir="output/documents",  # Output directory
    )

    print("\n=== Generating Single Invoice (Supermarket) Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Still an invoice
        audience="Retail customer",  # Target audience is a consumer
        tone="formal",  # Writing tone (must be one of the accepted values)
        purpose="Detailed supermarket invoice with grocery and household items purchases.",
        # Purpose tailored to retail
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Allows for many line items
        topics="Groceries, Household goods, Unit price, Quantity, Subtotals, Tax, Total due, Payment method",
        # Supermarket-specific topics
        style_guide="Clean and readable receipt-style format typical of supermarket invoices",
        # Style expectation for consumer retail
        output_dir="output/documents",  # Output directory
    )

    # Step 4: Generate multiple documents with the same parameters
    print("\n=== Generating Multiple Documents ===")
    generator.generate(
        n_docs=5,  # Generate 5 documents with the same parameters
        document_type="Report",
        audience="Technical",
        tone="neutral",  # Writing tone (must be one of the predefined values)
        purpose="Technical documentation",
        region="Global",
        language="English",
        length="Medium",
        topics="API documentation, code examples, best practices",
        style_guide="Clear and concise",
        output_dir="output/documents",
    )