Constraints

The constraint engine lets you define data quality rules that can be validated against any Dataset and used to filter out non-compliant rows before synthesis or downstream processing.

How it works

Every constraint implements a validate(dataset) method that returns a boolean mask — True where the rule is satisfied, False where it is violated. The ConstraintEngine collects those masks and aggregates them into a summary report.

Dataset ──► ConstraintEngine.validate() ──► summary()  (what broke?)
                     │
                     └──► filter()  (remove offending rows)
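
The mask-then-aggregate idea described above can be sketched in plain Python. This is a toy illustration, not ydata's implementation: `ToyNotNull`, `ToyGreaterThan`, and `toy_summary` are hypothetical names, and rows are plain dictionaries rather than a Dataset.

```python
# Toy illustration only: these classes are hypothetical and are not ydata's implementation.
from typing import Dict, List, Optional

Rows = List[Dict[str, Optional[float]]]


class ToyNotNull:
    """Row rule: the value in `column` must be present."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.name = f"NotNull({column})"

    def validate(self, rows: Rows) -> List[bool]:
        # One boolean per row: True = satisfied, False = violated.
        return [row[self.column] is not None for row in rows]


class ToyGreaterThan:
    """Row rule: the value in `column` must exceed `value`."""

    def __init__(self, column: str, value: float) -> None:
        self.column = column
        self.value = value
        self.name = f"GreaterThan({column}, {value})"

    def validate(self, rows: Rows) -> List[bool]:
        # Missing values are left to ToyNotNull, so they pass here.
        return [row[self.column] is None or row[self.column] > self.value for row in rows]


def toy_summary(constraints, rows: Rows) -> Dict[str, int]:
    """Collect every mask and count the violations per constraint."""
    return {c.name: c.validate(rows).count(False) for c in constraints}


rows = [
    {"age": 34, "income": 52_000},
    {"age": -1, "income": 48_000},
    {"age": None, "income": 61_000},
]
constraints = [ToyNotNull("age"), ToyGreaterThan("age", 0)]
report = toy_summary(constraints, rows)
print(report)
```

Each rule only needs to answer "which rows pass?". Counting and reporting are handled in one place, which is the split that the engine's `validate()` and `summary()` suggest.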

Two kinds of constraints

| Kind | What it checks | Output shape | Used by `filter()`? |
| --- | --- | --- | --- |
| Row constraint | Each individual row (value comparisons, nulls, regex…) | `n_rows × n_cols` boolean mask | ✅ Yes |
| Column constraint | An aggregate statistic of the whole column (mean, std, max…) | `1 × n_cols` boolean | ❌ No, reported only |
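
To see why only row constraints can drive filtering, here is a minimal sketch of the two output shapes. It uses plain Python with made-up data and hypothetical variable names, and is not the library's code. A row mask has one boolean per cell, so it can point at specific rows to drop. A column result has one boolean per column and carries no row information.

```python
# Toy illustration only: plain Python, hypothetical names, not ydata's code.
from statistics import mean

rows = [
    {"age": 34, "income": 52_000},
    {"age": -1, "income": 48_000},
    {"age": 51, "income": None},
]
columns = ["age", "income"]

# Row constraint (not null): one boolean per cell, so the mask is n_rows x n_cols.
row_mask = [{col: row[col] is not None for col in columns} for row in rows]

# Row constraint (age > 0) on the same grid; other columns pass trivially.
positive_age = [{col: (col != "age" or row["age"] > 0) for col in columns} for row in rows]

# Column constraint (mean of age between 20 and 65): one boolean per checked column.
column_result = {"age": 20 <= mean(row["age"] for row in rows) <= 65}

# Filtering keeps a row only if every cell of every row mask is True.
keep = [all(a.values()) and all(b.values()) for a, b in zip(row_mask, positive_age)]
clean_rows = [row for row, ok in zip(rows, keep) if ok]

print(column_result)  # a single verdict per column, with no row index to filter on
print(clean_rows)
```

In this toy data the mean age passes the column check even though one row has a negative age, so the two kinds of rule catch different problems.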

Quick start

from ydata.constraints import (
    ConstraintEngine,
    NotNull, Unique, GreaterThan, NotIncludedIn,  # row
    MeanBetween, NullRateLowerThan, MaxBetween,   # column
    CustomConstraint,                              # bring your own logic
)
from ydata.dataset import Dataset

engine = ConstraintEngine([
    # ── Row constraints ──
    NotNull(columns=["age", "income"]),
    Unique(columns=["customer_id"]),
    GreaterThan(columns=["age"], value=0),
    NotIncludedIn(column="status", values=["banned", "deleted"]),

    # ── Column constraints ──
    MeanBetween(lower_bound=20, upper_bound=65, columns=["age"]),
    NullRateLowerThan(value=0.05, columns=["income"]),
    MaxBetween(lower_bound=0, upper_bound=10_000, columns=["tx_amount"]),

    # ── Custom logic ──
    CustomConstraint(
        lambda df: df["end_date"] >= df["start_date"],
        columns=["end_date"], available_columns=["start_date"],
        axis="row",
    ),
])

# `dataset` is assumed to be an existing ydata Dataset loaded elsewhere.
engine.validate(dataset)
print(engine.summary())

clean_dataset = engine.filter(dataset)

Sections

Row Constraints — per-row rules: nulls, uniqueness, ranges, patterns, ordering
Column Constraints — aggregate rules: mean, min/max, null rate, std
Constraint Engine — validate, filter, summary, fault isolation, and combining constraints