
Document Generator

The Document Generator creates synthetic documents in PDF, DOCX, or HTML format from customizable parameters such as document type, audience, tone, and length. It is particularly useful for generating test data, creating templates, or producing sample documents for training purposes.

  • Generate single or multiple documents
  • Customize document type, audience, and tone
  • Choose among multiple output formats (PDF, DOCX, HTML)
  • Control document length and style
  • Set the region and language

Allowed values for the tone input parameter

The tone input parameter must be one of the following values: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral.
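Because tone is restricted to a fixed set of values, it can help to check the value before calling the generator. The following is a minimal sketch, not part of ydata-sdk: `ALLOWED_TONES` and `validate_tone` are hypothetical names introduced here, and the check assumes an exact, case-sensitive match against the values listed above.

```python
# Hypothetical helper (not part of ydata-sdk): checks a tone value against
# the list of allowed tones given in this page.
ALLOWED_TONES = frozenset({
    "formal", "casual", "persuasive", "empathetic",
    "inspirational", "enthusiastic", "humorous", "neutral",
})


def validate_tone(tone: str) -> str:
    """Return the tone unchanged if it is allowed, otherwise raise ValueError."""
    if tone not in ALLOWED_TONES:
        allowed = ", ".join(sorted(ALLOWED_TONES))
        raise ValueError(f"Unsupported tone {tone!r}; expected one of: {allowed}")
    return tone
```

A call such as `validate_tone("formal")` returns the value unchanged, so the result can be passed straight to the tone argument.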

Don't forget to set up your license key

import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'
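To avoid hardcoding the key in a script, one option is to set YDATA_LICENSE_KEY in the shell and only verify that it is present before running. The sketch below is an assumption about how you might structure that check: `require_license_key` is a hypothetical helper, not an ydata-sdk function, and it treats the `{add-your-key}` placeholder from this page as a missing key.

```python
import os


def require_license_key(env_var: str = "YDATA_LICENSE_KEY") -> str:
    """Return the license key from the environment or raise if it is unset.

    Hypothetical helper: it only inspects the environment variable and does
    not contact or validate anything against ydata-sdk.
    """
    key = os.environ.get(env_var, "").strip()
    # Treat the documentation placeholder as "not configured".
    if not key or key == "{add-your-key}":
        raise RuntimeError(
            f"{env_var} is not set; export it before running the generator."
        )
    return key
```

Calling `require_license_key()` at the top of a script fails early with a clear message instead of surfacing a licensing error later.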

Example Code

"""
Document Generator Example
"""
import os

from ydata.synthesizers.text.model.document import DocumentGenerator, DocumentFormat

if __name__ == "__main__":
    # Step 1: Authenticate with ydata-sdk
    os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'  # Replace with your license key

    # Step 2: Initialize the DocumentGenerator with desired format
    print("Initializing Document Generator...")
    generator = DocumentGenerator(
        document_format=DocumentFormat.PDF  # Set the document output format (PDF, DOCX, or HTML)
    )

    # Step 3: Generate a single document
    # Note: tone must be one of: formal, casual, persuasive, empathetic,
    # inspirational, enthusiastic, humorous, neutral
    print("\n=== Generating Single Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Curriculum",  # Type of document to generate
        audience="HR",  # Target audience
        tone="formal",  # Writing tone (must be one of the predefined values)
        purpose="Application for a Senior Machine Learning Engineer",  # Document purpose
        region="North America",  # Regional context
        language="German",  # Output language
        length="Long",  # Document length
        topics="Foundational models, LLMs, GenerativeAI, API, Python, software engineer",  # Key topics
        style_guide="Flawless design",  # Style requirements
        output_dir="output/documents",  # Output directory
    )

    # Step 4: Generate multiple documents with the same parameters
    print("\n=== Generating Multiple Documents ===")
    generator.generate(
        n_docs=5,  # Generate 5 documents with the same parameters
        document_type="Report",
        audience="Technical",
        tone="neutral",  # Writing tone (must be one of the predefined values)
        purpose="Technical documentation",
        region="Global",
        language="English",
        length="Medium",
        topics="API documentation, code examples, best practices",
        style_guide="Clear and concise",
        output_dir="output/documents",
    )
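The example above reuses one set of parameters for every document in a call. To produce variations, for instance the same report in several tones, one approach is to build a parameter set per tone and call generate once for each. This is a sketch under stated assumptions: `build_jobs` and `base_params` are hypothetical names, the keyword names are the ones used in the example above, and the actual generate call is left as a comment because it needs a configured generator and license key.

```python
from typing import Dict, Iterable, List


def build_jobs(base: Dict[str, object], tones: Iterable[str]) -> List[Dict[str, object]]:
    """Return one copy of the base parameters per tone, without mutating base."""
    return [{**base, "tone": tone} for tone in tones]


# Shared parameters, using the keyword names from the example above.
base_params = {
    "n_docs": 1,
    "document_type": "Report",
    "audience": "Technical",
    "purpose": "Technical documentation",
    "region": "Global",
    "language": "English",
    "length": "Medium",
    "topics": "API documentation, code examples, best practices",
    "style_guide": "Clear and concise",
    "output_dir": "output/documents",
}

jobs = build_jobs(base_params, ["formal", "neutral", "casual"])

for params in jobs:
    print(f"Would generate a {params['document_type']} with tone={params['tone']}")
    # Assumes `generator` is the DocumentGenerator created in the example:
    # generator.generate(**params)
```

Keeping the shared values in one dictionary means only the varying field changes between calls.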