Skip to content

Document Generator

The Document Generator allows you to create synthetic documents in various formats (PDF, DOCX, HTML) with customizable parameters. This is particularly useful for generating test data, creating templates, or producing sample documents for training purposes.

  • Generate single or multiple documents
  • Customize document type, audience, and tone
  • Support for multiple output formats (PDF, DOCX, HTML)
  • Control over document length and style
  • Regional and language customization

Limited values for documents input parameter tone

The tone input parameter must receive a value that exists within the following list: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral

Don't forget to set up your license key

import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'

Example Code

"""
Document Generator Example
"""
import os

from ydata.synthesizers.text.model.document import DocumentGenerator, DocumentFormat

if __name__ == "__main__":
    # Step 1: Authenticate with ydata-sdk
    os.environ['YDATA_LICENSE_KEY'] = 'add-sdk-key'  # Replace with your license key

    # Step 2: Initialize the DocumentGenerator with desired format
    print("Initializing Document Generator...")
    generator = DocumentGenerator(
        document_format=DocumentFormat.PDF  # Set the document output format (PDF, DOCX, or HTML)
    )

    # Step 3: Generate a single document

    # Note: The tone parameter accepts one of the following values: [formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral]
    print("\n=== Generating Single Invoice Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Type of document to generate
        audience="Corporate client",  # Target audience
        tone="professional",  # Writing tone
        purpose="Issue a detailed invoice for services rendered. Please provided detailed examples and real line items",  # Document purpose
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Document length (invoices are usually not long)
        topics="Consulting services, Hourly rates, Tax breakdown, Payment terms",
        # Key topics as a single comma-separated string
        style_guide="Professional design for a financial institution",  # Style or branding requirements
        output_dir="output/documents",  # Output directory
    )

    print("\n=== Generating Single Invoice (Supermarket) Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Invoice",  # Still an invoice
        audience="Retail customer",  # Target audience is a consumer
        tone="professional",  # Still professional but consumer-friendly
        purpose="Detailed supermarket invoice with grocery and household items purchases.",
        # Purpose tailored to retail
        region="North America",  # Regional context
        language="English",  # Output language
        length="Long",  # Allows for many line items
        topics="Groceries, Household goods, Unit price, Quantity, Subtotals, Tax, Total due, Payment method",
        # Supermarket-specific topics
        style_guide="Clean and readable receipt-style format typical of supermarket invoices",
        # Style expectation for consumer retail
        output_dir="output/documents",  # Output directory
    )

    # Step 4: Generate multiple documents with the same parameters
    print("\n=== Generating Multiple Documents ===")
    generator.generate(
        n_docs=5,  # Generate 5 documents with the same parameters
        document_type="Report",
        audience="Technical",
        tone="neutral",  # Writing tone (must be one of the predefined values)
        purpose="Technical documentation",
        region="Global",
        language="English",
        length="Medium",
        topics="API documentation, code examples, best practices",
        style_guide="Clear and concise",
        output_dir="output/documents",
    )