Document Generator
The Document Generator allows you to create synthetic documents in various formats (PDF, DOCX, HTML) with customizable parameters. This is particularly useful for generating test data, creating templates, or producing sample documents for training purposes.
- Generate single or multiple documents
- Customize document type, audience, and tone
- Support for multiple output formats (PDF, DOCX, HTML)
- Control over document length and style
- Regional and language customization
Allowed values for the tone input parameter
The tone input parameter must be one of the following values: formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral
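As a minimal sketch, the tone restriction can also be checked on the caller's side before the generator is invoked, so that a typo fails early with a readable message. The names ALLOWED_TONES and validate_tone are hypothetical helpers and not part of ydata-sdk; trimming and lowercasing the input is an assumption about how user input should be treated, not documented library behavior.

```python
# Hypothetical client-side check; not part of ydata-sdk.
ALLOWED_TONES = frozenset({
    "formal", "casual", "persuasive", "empathetic",
    "inspirational", "enthusiastic", "humorous", "neutral",
})


def validate_tone(tone: str) -> str:
    """Return the normalized tone, or raise ValueError if it is not allowed."""
    # Assumption: surrounding whitespace and letter case are not significant.
    normalized = tone.strip().lower()
    if normalized not in ALLOWED_TONES:
        allowed = ", ".join(sorted(ALLOWED_TONES))
        raise ValueError(f"Unsupported tone {tone!r}; expected one of: {allowed}")
    return normalized
```

The returned value can then be passed as the tone argument of generate.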
Don't forget to set up your license key
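One way to catch a missing key before any generation starts is a small guard like the one below. This is a sketch under assumptions: require_license_key is a hypothetical helper, not an SDK function, and it only checks that the YDATA_LICENSE_KEY environment variable is set and is not the '{add-your-key}' placeholder used in the example. It does not verify that the key is valid.

```python
import os


def require_license_key(env_var: str = "YDATA_LICENSE_KEY") -> str:
    """Return the license key from the environment or raise RuntimeError."""
    key = os.environ.get(env_var, "").strip()
    # Treat the documentation placeholder as "not configured".
    if not key or key == "{add-your-key}":
        raise RuntimeError(
            f"{env_var} is not set; export your license key before generating documents."
        )
    return key
```

Calling require_license_key() at the top of a script makes a missing key fail immediately rather than partway through a run.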
Example Code
"""
Document Generator Example
"""
import os
from ydata.synthesizers.text.model.document import DocumentGenerator, DocumentFormat
if __name__ == "__main__":
    # Step 1: Authenticate with ydata-sdk
    os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'  # Replace with your license key

    # Step 2: Initialize the DocumentGenerator with desired format
    print("Initializing Document Generator...")
    generator = DocumentGenerator(
        document_format=DocumentFormat.PDF  # Set the document output format (PDF, DOCX, or HTML)
    )

    # Step 3: Generate a single document
    # Note: The tone parameter accepts one of the following values:
    # [formal, casual, persuasive, empathetic, inspirational, enthusiastic, humorous, neutral]
    print("\n=== Generating Single Document ===")
    generator.generate(
        n_docs=1,  # Generate one document
        document_type="Curriculum",  # Type of document to generate
        audience="HR",  # Target audience
        tone="formal",  # Writing tone (must be one of the predefined values)
        purpose="Application for a Senior Machine Learning Engineer",  # Document purpose
        region="North America",  # Regional context
        language="German",  # Output language
        length="Long",  # Document length
        topics="Foundational models, LLMs, GenerativeAI, API, Python, software engineer",  # Key topics
        style_guide="Flawless design",  # Style requirements
        output_dir="output/documents",  # Output directory
    )

    # Step 4: Generate multiple documents with the same parameters
    print("\n=== Generating Multiple Documents ===")
    generator.generate(
        n_docs=5,  # Generate 5 documents with the same parameters
        document_type="Report",
        audience="Technical",
        tone="neutral",  # Writing tone (must be one of the predefined values)
        purpose="Technical documentation",
        region="Global",
        language="English",
        length="Medium",
        topics="API documentation, code examples, best practices",
        style_guide="Clear and concise",
        output_dir="output/documents",
    )
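Building on the example above, the following sketch repeats a generation request once per tone and sends each batch to its own subfolder. generate_per_tone is a hypothetical helper, not part of ydata-sdk. It relies only on the generate keyword arguments shown in the example (tone, output_dir, and the remaining parameters passed through unchanged), and the per-tone folder layout is an assumption. Whether generate creates missing directories is not stated here, so create them beforehand if in doubt.

```python
from pathlib import Path
from typing import Any, Iterable


def generate_per_tone(
    generator: Any,
    tones: Iterable[str],
    output_root: str = "output/documents",
    **common: Any,
) -> list[str]:
    """Call generator.generate once per tone and return the output directories used."""
    output_dirs = []
    for tone in tones:
        # Assumed layout: one subfolder per tone under output_root.
        out_dir = (Path(output_root) / tone).as_posix()
        generator.generate(tone=tone, output_dir=out_dir, **common)
        output_dirs.append(out_dir)
    return output_dirs
```

With the generator from the example, a call such as generate_per_tone(generator, ["formal", "neutral"], n_docs=1, document_type="Report", audience="Technical", purpose="Technical documentation", region="Global", language="English", length="Medium", topics="API documentation", style_guide="Clear and concise") would issue one generate call per tone.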