
❄️ Snowflake & YData

YData seamlessly integrates with Snowflake, allowing you to connect to, query, and manage your data in Snowflake with ease. This section will guide you through the benefits, setup, and use of the Snowflake connector.

Benefits of Integration

Integrating YData SDK with Snowflake offers several key benefits:

  • Scalability: Snowflake's architecture scales effortlessly with your data needs, while YData SDK ensures efficient data integration and management.
  • Performance: Leveraging Snowflake's high performance for data querying and YData SDK's optimization techniques enhances overall data processing speed.
  • Security: Snowflake's robust security features, combined with YData SDK's data governance capabilities, ensure your data remains secure and compliant.
  • Interoperability: YData SDK simplifies the process of connecting to Snowflake, allowing you to quickly set up and start using your data without extensive configuration. Benefit from unique YData functionalities such as data preparation with Python, synthetic data generation, and data profiling.

Setting Up the Snowflake Connector

👨‍💻 Complete code example and recipe can be found here.

    # Importing YData's package
    from ydata.connectors import SnowflakeConnector

    # Build your connection string
    USERNAME = "insert-username"
    PASSWORD = "insert-password"
    ACCOUNT_IDENTIFIER = "insert-account-identifier"
    PORT = 443
    DATABASE_NAME = "insert-database-name"
    SCHEMA = "insert-schema-name"
    WAREHOUSE = "insert-warehouse-name"

    conn_str = {
        "hostname": ACCOUNT_IDENTIFIER,
        "username": USERNAME,
        "password": PASSWORD,
        "port": PORT,
        "database": DATABASE_NAME,
        "warehouse": WAREHOUSE
    }

    # Create the Snowflake Connector
    conn = SnowflakeConnector(conn_string=conn_str)

    print(conn)

With your connector created, you can now explore your database and the datasets available in it.

List available schemas and get the metadata of a given schema
    # returns a list of schemas
    schemas = conn.list_schemas()

    # get the metadata of a database schema, including columns and relations between tables (PK and FK)
    schema = conn.get_database_schema('PATIENTS')

Read from a Snowflake instance

Using the Snowflake connector it is possible to:

  • Get the data from a Snowflake table
  • Get a sample from a Snowflake table
  • Get the data from a query to a Snowflake instance
  • Get the full data from a selected database
Read the full data and a sample from a table
    # returns the whole data from a given table
    table = conn.get_table('cardio_test')
    print(table)

    # Get a sample with n rows from a given table
    table_sample = conn.get_table_sample(table='cardio_test', sample_size=50)
    print(table_sample)
Get the data from a query
    # returns the output data of the provided query
    query_output = conn.query('SELECT * FROM patients.cardio_test;')
    print(query_output)
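Read the full data from a database
As a minimal sketch of the last item in the list above, and assuming the connector exposes a read_database() method that returns every table in the configured database (the method name is an assumption, not confirmed by this guide):
    # Read every table in the configured database (method name assumed)
    database = conn.read_database()
    print(database)
The returned object can then be handed to write_database, as shown in the next section.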

Write to a Snowflake instance

If you need to write your data into a Snowflake instance you can also leverage your Snowflake connector for the following actions:

  • Write the data into a table
  • Write a new database schema

The if_exists parameter allows you to decide whether to append, replace, or fail in case a table with the same name already exists in the schema.

Writing a dataset to a table in a Snowflake schema
    conn.write_table(data=tables['cardio_test'],
                     name='cardio',
                     if_exists='fail')
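
For example, to add rows to a table that already exists instead of failing, the same call can be made with if_exists='append' (a sketch reusing the example data above; 'replace' works the same way):
    # Append to the existing table instead of failing when it already exists
    conn.write_table(data=tables['cardio_test'],
                     name='cardio',
                     if_exists='append')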

table_names allows you to define a new name for each table in the database. If not provided, the table names from your dataset are used.

Writing a full database to a Snowflake schema
    conn.write_database(data=database,
                        schema_name='new_cardio',
                        table_names={'cardio_test': 'cardio'})

I hope you enjoyed this quick tutorial on seamlessly integrating Snowflake with your data preparation workflows. ❄️🚀