Getting Started with YData SDK

The ydata-sdk is a Python package for data access, processing, and synthetic data generation within the YData ecosystem. It lets users manage datasets, run profiling, and generate synthetic data for analytics, machine learning, and data privacy applications.

Core Capabilities

The SDK is structured into six key areas, each designed to address specific data management needs:

1. Connectors

  • Data Source Integration
    • Connect to databases (SQL databases, data warehouses, lakehouses)
    • Access cloud storage (S3, Azure, GCP)
    • Handle local file systems
  • Streamlined Data Access
    • Unified interface for all data sources
    • Optimized data loading
    • Efficient memory management
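
To make the idea of a unified interface concrete, here is a minimal sketch in plain Python. It does not use the ydata-sdk API: the names `Connector`, `CSVConnector`, `InMemoryConnector`, and `load_rows` are hypothetical and exist only for this illustration. The point is that calling code depends on one `read()` contract, regardless of where the rows come from.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Protocol


class Connector(Protocol):
    """Hypothetical contract: every data source yields rows as dicts."""

    def read(self) -> Iterator[Dict[str, str]]:
        ...


class CSVConnector:
    """Hypothetical connector that reads rows from CSV text."""

    def __init__(self, text: str) -> None:
        self._text = text

    def read(self) -> Iterator[Dict[str, str]]:
        # DictReader yields one row at a time rather than loading all rows at once.
        yield from csv.DictReader(io.StringIO(self._text))


class InMemoryConnector:
    """Hypothetical connector that wraps rows already held in memory."""

    def __init__(self, rows: Iterable[Dict[str, str]]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterator[Dict[str, str]]:
        # Copy each row so callers cannot mutate the stored data.
        for row in self._rows:
            yield dict(row)


def load_rows(connector: Connector) -> List[Dict[str, str]]:
    """Consume any connector through the same interface."""
    return list(connector.read())


csv_rows = load_rows(CSVConnector("id,name\n1,Ana\n2,Rui\n"))
mem_rows = load_rows(InMemoryConnector([{"id": "3", "name": "Eva"}]))
```

Because `load_rows` only relies on `read()`, adding a new source in this sketch means writing one more class, not changing the code that consumes the data.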

2. Metadata

  • Data Understanding
    • Extract comprehensive dataset metadata
    • Analyze data quality metrics
    • Track data lineage
  • Enhanced Management
    • Automated metadata collection
    • Version control for datasets
    • Quality monitoring

3. Profiling

  • Comprehensive Analysis
    • Statistical profiling and analysis
    • Data quality assessment
    • Pattern and anomaly detection
  • Visualization
    • Interactive data visualizations
    • Distribution analysis
    • Correlation insights
  • Automated Reporting
    • Quality score generation
    • Data drift monitoring
    • Actionable recommendations
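
As a rough illustration of what statistical profiling computes, the sketch below summarizes a single numeric column using only the Python standard library. It is not the SDK's profiling API; `profile_column` and the sample `ages` data are hypothetical, and a real profiler would cover many more metrics and data types.

```python
import statistics
from typing import Dict, List, Optional


def profile_column(values: List[Optional[float]]) -> Dict[str, float]:
    """Summarize one numeric column, treating None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        # Sample standard deviation; needs at least two present values.
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
    }


# Hypothetical sample column with one missing entry.
ages = [34, 29, None, 41, 36]
summary = profile_column(ages)
```

Here `summary["missing"]` gives a simple data quality signal, while the mean, spread, and range describe the column's distribution.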

4. Anonymization

  • Privacy Protection
    • PII detection and masking
    • Sensitive data handling
    • Compliance validation
  • Advanced Methods
    • Multiple anonymization techniques
    • Privacy metrics calculation
    • Utility preservation
  • Custom Rules
    • Configurable privacy rules
    • Business-specific requirements
    • Regulatory compliance
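
The following sketch shows one simple form of PII detection and masking: finding email addresses with a regular expression and replacing them with a placeholder. It is an assumption-laden illustration, not the SDK's anonymization API. The names `EMAIL_RE`, `find_emails`, and `mask_emails` are hypothetical, and the pattern is deliberately simple, so it will not match every valid address.

```python
import re
from typing import List

# Deliberately simple email pattern for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> List[str]:
    """Detection step: return every substring that looks like an email."""
    return EMAIL_RE.findall(text)


def mask_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Masking step: replace each detected email with a placeholder."""
    return EMAIL_RE.sub(placeholder, text)


# Hypothetical record containing two email addresses.
record = "Contact ana.silva@example.com or rui@example.org for access."
detected = find_emails(record)
masked = mask_emails(record)
```

Keeping detection and masking as separate functions makes it possible to report what was found before any value is changed, which is useful when rules need to be reviewed against business or regulatory requirements.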

5. Synthetic Data

  • Data Generation
    • Create high-fidelity synthetic datasets
    • Preserve data distributions and relationships
    • Ensure privacy compliance
  • Use Cases
    • Analytics and reporting
    • Machine learning and AI model training
    • Privacy-preserving data sharing and applications

6. Report

  • Automated Reporting
    • Generate comprehensive data quality reports
    • Create profiling insights
    • Perform integrity checks
  • Output Formats
    • Interactive dashboards
    • PDF reports
    • JSON exports

Support

Need help getting started? Check out our: