Welcome to YData SDK Documentation

Overview

YData SDK is a Python package for Data & AI. It provides an ecosystem of methods that lets data professionals adopt a data-centric development approach focused on improving data quality. The library includes integrated components for:

  • Data Ingestion: Connect to various data sources seamlessly
  • Data Quality Evaluation: Standardized metrics and assessments
  • Data Improvement: Tools for enhancing dataset quality
  • Synthetic Data Generation: Create high-quality synthetic datasets

Get Started with YData SDK

Get your license key at ydata.ai/register

Key Benefits

YData SDK offers several advantages for AI, data science development and data management:

  • Next-Gen Features

    • State-of-the-art data quality profiling
    • Advanced metadata management
    • Leading synthetic data generation technology
  • Enhanced Collaboration

    • Seamless integration with multiple tools and services
    • Unified environment for all developers
    • Reduced development overhead
  • Improved Developer Experience

    • Well-integrated software solution
    • Seamless transitions between tools
    • Consistent compatibility
  • Enterprise Interoperability

    • Native integration with major platforms (Databricks, Snowflake)
    • Cohesive data architecture support
    • Enterprise-grade reliability

Core Functionality

1. Connectors

2. Metadata

3. Data Profiling

4. Synthetic Data

5. Data Anonymization

Supported Data Formats

Tabular data: The RegularSynthesizer is perfect for high-dimensional, time-independent data synthesis with exceptional quality results.

Time series: The TimeSeriesSynthesizer handles both regular and irregular time-series data, from smart sensors to stock market data, including support for transactional data with irregular intervals.

Relational databases: The MultiTableSynthesizer excels at replicating complex relational database schemas while maintaining data integrity and relationships.

The TextSynthesizer and QASynthesizer excel at generating privacy-preserving text corpora and question-and-answer pairs for LLM fine-tuning and evaluation.

The DocumentSynthesizer excels at replicating complex custom internal documents while maintaining data consistency and content relevance.
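The mapping from data format to synthesizer described above can be summarized as a small lookup. This is only an illustrative sketch: the class names come from the text, but the dictionary keys and the `pick_synthesizer` helper are hypothetical and are not part of the YData SDK API. The sketch returns class names as strings and does not import or call the SDK.

```python
# Hypothetical lookup table. The synthesizer class names are taken from the
# text above; the keys and this helper are illustrative assumptions only.
SYNTHESIZER_BY_DATA_KIND = {
    "tabular": "RegularSynthesizer",
    "timeseries": "TimeSeriesSynthesizer",
    "relational": "MultiTableSynthesizer",
    "text": "TextSynthesizer",
    "qa_pairs": "QASynthesizer",
    "documents": "DocumentSynthesizer",
}


def pick_synthesizer(data_kind: str) -> str:
    """Return the name of the synthesizer class suited to a kind of data."""
    key = data_kind.strip().lower()
    try:
        return SYNTHESIZER_BY_DATA_KIND[key]
    except KeyError:
        known = ", ".join(sorted(SYNTHESIZER_BY_DATA_KIND))
        raise ValueError(
            f"Unknown data kind {data_kind!r}; expected one of: {known}"
        ) from None


print(pick_synthesizer("timeseries"))  # prints TimeSeriesSynthesizer
```

An unknown kind raises a `ValueError` that lists the accepted keys, so a typo fails loudly instead of silently selecting the wrong synthesizer.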