Constraints
The constraint engine lets you define data quality rules that can be validated against any Dataset and used to filter out non-compliant rows before synthesis or downstream processing.
How it works
Every constraint implements a validate(dataset) method that returns a boolean mask: True where the rule is satisfied, False where it is violated. The ConstraintEngine collects these masks, aggregates them into a summary report, and uses the row-level masks to drop non-compliant rows.
```text
Dataset ──► ConstraintEngine.validate() ──► summary()  (what broke?)
                                        │
                                        └──► filter()  (remove offending rows)
```
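The flow in the diagram can be sketched without the library. This is a minimal sketch that assumes a dataset is just a list of row dicts. ToyNotNull, ToyGreaterThan and ToyEngine are hypothetical stand-ins written for illustration, not classes from ydata.constraints. The counting rule in summary() (a row counts as a violation if any of its cells fails) is also an assumption about what the real report contains.

```python
# Library-independent sketch of validate -> summary -> filter.
# ToyNotNull, ToyGreaterThan and ToyEngine are hypothetical, not ydata classes;
# a "dataset" here is a list of dict rows.
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]
Mask = List[List[bool]]  # n_rows x n_cols


class ToyNotNull:
    """Row constraint: a cell passes when its value is not None."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def validate(self, rows: List[Row]) -> Mask:
        return [[row[c] is not None for c in self.columns] for row in rows]


class ToyGreaterThan:
    """Row constraint: a cell passes when its value is present and > value."""

    def __init__(self, columns: List[str], value: float) -> None:
        self.columns = columns
        self.value = value

    def validate(self, rows: List[Row]) -> Mask:
        return [[row[c] is not None and row[c] > self.value for c in self.columns]
                for row in rows]


class ToyEngine:
    def __init__(self, constraints: list) -> None:
        self.constraints = constraints
        self.masks: Dict[str, Mask] = {}

    def validate(self, rows: List[Row]) -> None:
        # Store one mask per constraint, keyed by class name and columns.
        self.masks = {f"{type(c).__name__}{c.columns}": c.validate(rows)
                      for c in self.constraints}

    def summary(self) -> Dict[str, int]:
        # Count the rows in which at least one cell of the mask is False.
        return {name: sum(not all(cells) for cells in mask)
                for name, mask in self.masks.items()}

    def filter(self, rows: List[Row]) -> List[Row]:
        # Recompute masks for these rows; keep rows that pass every cell of every mask.
        masks = [c.validate(rows) for c in self.constraints]
        return [row for i, row in enumerate(rows)
                if all(all(mask[i]) for mask in masks)]


rows = [
    {"age": 34, "income": 52_000.0},
    {"age": None, "income": 41_000.0},
    {"age": -3, "income": None},
    {"age": 51, "income": 73_000.0},
]
engine = ToyEngine([ToyNotNull(["age", "income"]), ToyGreaterThan(["age"], 0)])
engine.validate(rows)
print(engine.summary())  # {"ToyNotNull['age', 'income']": 2, "ToyGreaterThan['age']": 2}
clean_rows = engine.filter(rows)
```

Only the first and last rows survive filter(), because each of the other two fails at least one cell in some mask.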
Two kinds of constraints
| Kind | What it checks | Output shape | Used by filter()? |
|---|---|---|---|
| Row constraint | Each individual row (value comparisons, nulls, regex…) | n_rows × n_cols boolean mask | ✅ Yes |
| Column constraint | An aggregate statistic of the whole column (mean, std, max…) | 1 × n_cols boolean mask | ❌ No (reported only) |
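The two output shapes in the table can be made concrete with a small sketch. The helpers toy_greater_than and toy_mean_between are hypothetical and are not part of ydata. Here a dataset is a dict that maps each column name to a list of values.

```python
# Hypothetical helpers contrasting the two output shapes; not ydata APIs.
# Data maps each column name to its list of values.
from statistics import mean
from typing import Dict, List

Data = Dict[str, List[float]]


def toy_greater_than(data: Data, columns: List[str], value: float) -> List[List[bool]]:
    """Row constraint: one boolean per row and column (n_rows x n_cols)."""
    n_rows = len(data[columns[0]])
    return [[data[c][i] > value for c in columns] for i in range(n_rows)]


def toy_mean_between(data: Data, columns: List[str],
                     lower: float, upper: float) -> List[List[bool]]:
    """Column constraint: one boolean per column from an aggregate (1 x n_cols)."""
    return [[lower <= mean(data[c]) <= upper for c in columns]]


data = {"q1_sales": [120, 90, 400], "q2_sales": [150, 800, 950]}
row_mask = toy_greater_than(data, ["q1_sales", "q2_sales"], 100)
col_mask = toy_mean_between(data, ["q1_sales", "q2_sales"], 100, 500)
print(row_mask)  # [[True, True], [False, True], [True, True]]
print(col_mask)  # [[True, False]]
```

The row mask pinpoints the second row as the one that breaks q1_sales > 100, so that row is a clear candidate for removal. The column result only says that the q2_sales mean (about 633) falls outside [100, 500]. Every individual q2_sales value passes the row check, so no single row is to blame. This is one way to see why column constraints are reported rather than used by filter().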
Quick start
```python
from ydata.constraints import (
    ConstraintEngine,
    NotNull, Unique, GreaterThan, NotIncludedIn,   # row
    MeanBetween, NullRateLowerThan, MaxBetween,    # column
    CustomConstraint,                              # bring your own logic
)
from ydata.dataset import Dataset

# Assumes `dataset` is an existing ydata Dataset (e.g. built from a pandas DataFrame).

engine = ConstraintEngine([
    # ── Row constraints ──
    NotNull(columns=["age", "income"]),
    Unique(columns=["customer_id"]),
    GreaterThan(columns=["age"], value=0),
    NotIncludedIn(column="status", values=["banned", "deleted"]),

    # ── Column constraints ──
    MeanBetween(lower_bound=20, upper_bound=65, columns=["age"]),
    NullRateLowerThan(value=0.05, columns=["income"]),
    MaxBetween(lower_bound=0, upper_bound=10_000, columns=["tx_amount"]),

    # ── Custom logic ──
    CustomConstraint(
        lambda df: df["end_date"] >= df["start_date"],
        columns=["end_date"], available_columns=["start_date"],
        axis="row",
    ),
])

engine.validate(dataset)                # compute a mask for every constraint
print(engine.summary())                 # report which rules were violated
clean_dataset = engine.filter(dataset)  # drop rows that fail any row constraint
```
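To see which rows the quick start's row rules would reject, you can approximate them record by record in plain Python. This sketch does not call ydata. It assumes that Unique fails every occurrence of a repeated value (the library may treat duplicates differently), and it stores dates as ISO-8601 strings, which compare correctly as text. The column constraints are left out because, per the table above, they do not drive filtering.

```python
# Plain-Python approximation of the quick start's row rules; it does not call ydata.
# Assumptions: Unique fails every occurrence of a repeated value, and ISO-8601 date
# strings are compared as text (which orders them chronologically).
from collections import Counter

FIELDS = ("customer_id", "age", "income", "status", "start_date", "end_date")
ROWS = [
    (1, 34, 52_000, "active", "2023-01-01", "2023-06-01"),  # passes every rule
    (2, 0, 41_000, "active", "2023-02-01", "2023-03-01"),   # age is not > 0
    (3, 45, None, "active", "2023-03-01", "2023-04-01"),    # income is null
    (4, 29, 38_000, "banned", "2023-01-01", "2023-02-01"),  # excluded status
    (5, 61, 90_000, "active", "2023-05-01", "2023-04-01"),  # ends before it starts
    (6, 40, 60_000, "active", "2023-01-01", "2023-01-31"),  # duplicated customer_id
    (6, 41, 61_000, "active", "2023-02-01", "2023-02-28"),  # duplicated customer_id
]
records = [dict(zip(FIELDS, row)) for row in ROWS]
id_counts = Counter(r["customer_id"] for r in records)

# One predicate per quick-start row rule; True means the record satisfies it.
rules = {
    "NotNull(age, income)": lambda r: r["age"] is not None and r["income"] is not None,
    "Unique(customer_id)": lambda r: id_counts[r["customer_id"]] == 1,
    "GreaterThan(age, 0)": lambda r: r["age"] is not None and r["age"] > 0,
    "NotIncludedIn(status)": lambda r: r["status"] not in {"banned", "deleted"},
    "Custom(end_date >= start_date)": lambda r: r["end_date"] >= r["start_date"],
}

# Ids failing each rule (a rough analogue of summary()) and records passing all rules.
failed = {name: [r["customer_id"] for r in records if not rule(r)]
          for name, rule in rules.items()}
kept = [r for r in records if all(rule(r) for rule in rules.values())]
print([r["customer_id"] for r in kept])  # [1]
```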
Sections
- Row Constraints — per-row rules: nulls, uniqueness, ranges, patterns, ordering
- Column Constraints — aggregate rules: mean, min/max, null rate, std
- Constraint Engine — validate, filter, summary, fault isolation, and combining constraints