Row Constraints
Row constraints validate individual rows in your dataset and return a per-row boolean mask. Rows that fail are flagged in summary() and removed by filter().
All row constraints can be imported directly from ydata.constraints:
from ydata.constraints import (
    NotNull, Unique, GreaterThan, LowerThan, Between, Positive,
    NotIncludedIn, IncludedIn, StringLength, Monotonic,
    BetweenDates, Regex, RelationConstraint, CombineConstraints,
    CustomConstraint,
)
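To make the per-row mask idea concrete, here is a plain-Python sketch of how a boolean mask could drive flagging and filtering. It is not the library's implementation: the `rows` data, the variable names, and the convention that `True` means "passes" are all assumptions made for illustration.

```python
# Hypothetical data: each dict is one row.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
]

# A row constraint yields one boolean per row.
# Assumed convention for this sketch: True = passes, False = fails.
mask = [row["age"] is not None for row in rows]

# Flagging: the positions of failing rows (what a summary could report).
failed_positions = [i for i, ok in enumerate(mask) if not ok]

# Filtering: keep only the rows that passed.
kept_rows = [row for row, ok in zip(rows, mask) if ok]
```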
NotNull
Flags any row where the column value is null / NaN.
| Parameter | Type | Description |
|---|---|---|
| columns | str \| list[str] | Column(s) that must not be null |
| name | str \| None | Optional label for the engine summary |
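The null check can be pictured with a small plain-Python sketch. The helper `is_null` is hypothetical, and treating only `None` and float `NaN` as null is an assumption; the library may recognise other missing-value markers.

```python
import math

def is_null(value):
    """Treat None and float NaN as null (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

emails = ["a@example.com", None, float("nan"), ""]

# True = row passes. In this sketch the empty string is not null, so it passes.
not_null_mask = [not is_null(v) for v in emails]
```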
Unique
Flags rows where the column value appears more than once. Use this for primary keys and ID columns.
| Parameter | Type | Description |
|---|---|---|
| columns | str \| list[str] | Column(s) whose values must be unique |
| name | str \| None | Optional label |
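A rough plain-Python picture of the uniqueness check follows. It assumes that every occurrence of a repeated value fails, including the first one; the description above does not say whether the first occurrence is spared, so verify this against the library before relying on it.

```python
from collections import Counter

customer_ids = ["C1", "C2", "C1", "C3"]

# Count how often each value appears in the column.
counts = Counter(customer_ids)

# Assumption: all occurrences of a duplicated value fail, not just the later ones.
unique_mask = [counts[v] == 1 for v in customer_ids]
```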
GreaterThan / LowerThan / Between / Positive
Numeric range checks against a fixed value or another column.
| Parameter | Type | Description |
|---|---|---|
| columns | str \| list[str] | Column(s) to check |
| value | float \| str | Threshold value, or column name to compare against |
| lower_bound / upper_bound | float | Bounds for Between (both inclusive) |
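The sketch below illustrates the two kinds of threshold (a fixed number or another column's name) and the inclusive bounds of Between, using plain Python. The helper `threshold` is hypothetical, and the strictness of the greater-than and lower-than comparisons is an assumption, since only Between's inclusivity is stated above.

```python
rows = [
    {"price": 10.0, "cost": 4.0, "discount": 0.0},
    {"price": 3.0, "cost": 5.0, "discount": 0.5},
    {"price": 8.0, "cost": 8.0, "discount": 1.2},
]

def threshold(row, value):
    """Resolve `value`: a string names another column, a number is used as-is."""
    return row[value] if isinstance(value, str) else value

# Greater-than check against another column (strict comparison assumed).
greater_than_cost = [row["price"] > threshold(row, "cost") for row in rows]

# Lower-than check against a fixed value (strict comparison assumed).
below_cap = [row["price"] < threshold(row, 9.0) for row in rows]

# Between check with both bounds inclusive, as stated in the table.
between_0_and_1 = [0.0 <= row["discount"] <= 1.0 for row in rows]
```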
IncludedIn / NotIncludedIn
Allowlist and blocklist checks.
| Parameter | Type | Description |
|---|---|---|
| column | str | Column to check |
| values | Any \| list[Any] | Allowed (or forbidden) values |
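As an illustration of the allowlist and blocklist semantics, here is a plain-Python sketch. The helper `as_list` is hypothetical and simply mirrors the fact that `values` may be a single value or a list.

```python
def as_list(values):
    """Wrap a single value in a list so both accepted forms are handled alike."""
    return values if isinstance(values, list) else [values]

statuses = ["active", "closed", "pending", "deleted"]

allowed = as_list(["active", "closed", "pending"])
forbidden = as_list("deleted")  # a single value is also accepted

# Allowlist: the row passes when its value is one of the allowed values.
included_mask = [s in allowed for s in statuses]

# Blocklist: the row passes when its value is none of the forbidden values.
not_included_mask = [s not in forbidden for s in statuses]
```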
StringLength
Checks that string values fall within a character-length range. Works on any column after coercing to str.
from ydata.constraints import StringLength
# Postcodes must be 4–6 characters
c = StringLength(columns=["postcode"], min_length=4, max_length=6)
# Product codes must be at least 3 characters, no upper limit
c = StringLength(columns=["product_code"], min_length=3)
| Parameter | Type | Description |
|---|---|---|
| columns | str \| list[str] | Column(s) to check |
| min_length | int | Minimum length, inclusive. Defaults to 0 |
| max_length | int \| None | Maximum length, inclusive. None = no upper limit |
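The length rule, including the coercion to `str` and the inclusive bounds, can be sketched in plain Python as follows. The function `length_ok` is a hypothetical helper, not part of the library.

```python
def length_ok(value, min_length=0, max_length=None):
    """Coerce to str, then apply inclusive length bounds (None = no upper limit)."""
    text = str(value)
    if len(text) < min_length:
        return False
    return max_length is None or len(text) <= max_length

postcodes = ["1000", "4700-123", 75001, "AB"]

# The integer 75001 is coerced to "75001" (5 characters) before checking.
postcode_mask = [length_ok(v, min_length=4, max_length=6) for v in postcodes]

# With no upper limit, only the minimum applies.
open_ended_mask = [length_ok(v, min_length=3) for v in postcodes]
```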
Monotonic
Checks that values in a column are monotonically ordered. Row i is flagged when it breaks the ordering relative to row i-1. Useful for timestamps, cumulative balances, and auto-increment IDs.
from ydata.constraints import Monotonic
# Non-decreasing (equal consecutive values allowed)
c = Monotonic(columns=["timestamp"], increasing=True)
# Strictly decreasing (equal consecutive values are violations)
c = Monotonic(columns=["countdown"], increasing=False, strict=True)
| Parameter | Type | Description |
|---|---|---|
| columns | str \| list[str] | Column(s) to check |
| increasing | bool | True = non-decreasing, False = non-increasing. Default True |
| strict | bool | If True, equal consecutive values are violations. Default False |
Note
The first row always passes — there is no predecessor to compare it to.
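The row-by-row comparison described above can be sketched in plain Python. The function `monotonic_mask` is hypothetical; it mirrors the `increasing` and `strict` parameters and the rule that the first row always passes, but it is not the library's code.

```python
def monotonic_mask(values, increasing=True, strict=False):
    """Compare each row with its predecessor; the first row always passes."""
    if not values:
        return []
    mask = [True]  # no predecessor to compare against
    for prev, curr in zip(values, values[1:]):
        if increasing:
            ok = curr > prev if strict else curr >= prev
        else:
            ok = curr < prev if strict else curr <= prev
        mask.append(ok)
    return mask

readings = [1, 2, 2, 1, 3]

non_decreasing = monotonic_mask(readings)                      # ties allowed
strictly_increasing = monotonic_mask(readings, strict=True)    # ties fail
```

Note that only the row that breaks the order is flagged: the final `3` passes because it is compared with the `1` before it, not with the earlier `2`.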
Regex
Checks that string values match a regular expression.
from ydata.constraints import Regex
# UK postcode pattern
c = Regex(column="postcode", regex=r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$")
| Parameter | Type | Description |
|---|---|---|
| column | str | Column to check |
| regex | str | Regular expression pattern (re.fullmatch) |
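Because the pattern is applied with `re.fullmatch`, the whole value has to match, not just part of it. The following standard-library sketch shows that behaviour with the postcode pattern from above; the sample values are made up.

```python
import re

pattern = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$")
postcodes = ["SW1A 1AA", "M1 1AE", "sw1a 1aa", "SW1A 1AA London"]

# fullmatch succeeds only when the entire string matches the pattern.
regex_mask = [pattern.fullmatch(p) is not None for p in postcodes]

# Difference from re.search on an unanchored pattern:
digits = re.compile(r"\d{4}")
found_inside = digits.search("12345") is not None      # a 4-digit run exists
matches_whole = digits.fullmatch("12345") is not None  # but the value has 5 digits
```

With full matching, explicit `^` and `$` anchors are harmless but not required.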
BetweenDates
Checks that a date column falls within an interval (in days) relative to another date column.
from ydata.constraints import BetweenDates
# end_date must be between 0 and 365 days after start_date
c = BetweenDates(
    constrained_column="end_date",
    reference_column="start_date",
    lower_bound=0,
    upper_bound=365,
)
| Parameter | Type | Description |
|---|---|---|
| constrained_column | str | Date column being checked |
| reference_column | str | Date column used as the reference point |
| lower_bound | int | Minimum offset in days (inclusive) |
| upper_bound | int | Maximum offset in days (inclusive) |
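The day-offset rule can be illustrated with the standard library. This sketch assumes the offset is computed as the constrained date minus the reference date, in whole days; the sample dates are made up.

```python
from datetime import date

# (start_date, end_date) pairs; start_date is the reference column.
rows = [
    (date(2024, 1, 1), date(2024, 1, 1)),    # same day
    (date(2024, 1, 1), date(2024, 12, 31)),  # last day of a leap year
    (date(2024, 1, 1), date(2025, 1, 1)),    # one day past the upper bound
    (date(2024, 1, 10), date(2024, 1, 5)),   # end before start
]

# Assumed definition: offset = constrained date - reference date, in days.
offsets = [(end - start).days for start, end in rows]

# Both bounds are inclusive, matching the table above.
between_dates_mask = [0 <= days <= 365 for days in offsets]
```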
RelationConstraint
Validates that each row's combination of key columns and value columns matches an allowed combination defined in a reference DataFrame. In the reference, a value column may hold either a single value or a list of allowed values; key columns must be scalar.
from ydata.constraints import RelationConstraint
import pandas as pd
reference = pd.DataFrame({
    "country": ["PT", "ES", "FR", "DE"] * 3,
    "account_type": ["savings", "checking", "premium"] * 4,
})
c = RelationConstraint(
    reference=reference,
    key_columns=["country"],
    value_columns=["account_type"],
)
| Parameter | Type | Description |
|---|---|---|
| reference | pd.DataFrame | DataFrame of allowed key → value combinations |
| key_columns | list[str] | Columns that form the lookup key (must be scalar in reference) |
| value_columns | list[str] | Columns whose values must be in the allowed set for that key |
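The key-to-allowed-values lookup can be pictured with plain Python dictionaries. This is a sketch of the idea only: the data is made up, and the choice to fail rows whose key is absent from the reference is an assumption.

```python
# Hypothetical reference: a value cell may be a scalar or a list of allowed values.
reference = [
    {"country": "PT", "account_type": ["savings", "checking"]},
    {"country": "ES", "account_type": "savings"},
]

# Build key -> set of allowed values, normalising scalars to one-item lists.
allowed = {}
for ref in reference:
    values = ref["account_type"]
    if not isinstance(values, list):
        values = [values]
    allowed.setdefault(ref["country"], set()).update(values)

rows = [
    {"country": "PT", "account_type": "checking"},
    {"country": "ES", "account_type": "checking"},
    {"country": "FR", "account_type": "savings"},  # key not in the reference
]

# Assumption: a key missing from the reference has no allowed values, so it fails.
relation_mask = [
    row["account_type"] in allowed.get(row["country"], set()) for row in rows
]
```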
CombineConstraints
Combines multiple row constraints with a logical operation.
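Conceptually, combining row constraints amounts to combining their per-row masks element by element. The sketch below uses plain Python lists and assumes AND and OR as the logical operations; this page does not list which operations are supported or how they are selected, so treat the names here as illustrative.

```python
# Hypothetical masks produced by two row constraints over the same four rows.
not_null_mask = [True, True, False, True]
positive_mask = [True, False, True, True]

# AND: a row passes only if it passes every constraint.
passes_all = [a and b for a, b in zip(not_null_mask, positive_mask)]

# OR: a row passes if it passes at least one constraint.
passes_any = [a or b for a, b in zip(not_null_mask, positive_mask)]
```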
CustomConstraint
For any logic not covered by the named constraints, use CustomConstraint with axis="row". The callable receives a pd.DataFrame and must return a boolean pd.Series aligned to the DataFrame's index.
from ydata.constraints import CustomConstraint
# Cross-column rule: income must exceed spending
c = CustomConstraint(
    lambda df: df["income"] > df["monthly_spending"],
    columns=["income"],
    available_columns=["monthly_spending"],
    axis="row",
    name="income_exceeds_spending",
)
With entity grouping (apply the check per group):
def cumulative_balance_check(df):
    return df["balance"] == df["amount"].cumsum()

c = CustomConstraint(
    check=cumulative_balance_check,
    available_columns=["balance", "amount"],
    entity="account_id",
    axis="row",
)
| Parameter | Type | Description |
|---|---|---|
| check | Callable | Receives a pd.DataFrame, returns a boolean pd.Series |
| columns | str \| list[str] \| None | Columns to include in the returned mask |
| available_columns | list[str] \| None | Extra columns passed to check but not in the mask |
| entity | str \| None | Group-by column; check is applied per group |
| name | str \| None | Optional label |
| axis | str | Must be "row" (or "rows") |
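To show why entity grouping matters for a check like the cumulative balance, here is a plain-Python sketch that runs the same rule once per group and once over the whole dataset. It only illustrates the per-group idea with made-up data; it does not reproduce how the library groups rows or reassembles the mask.

```python
from collections import defaultdict
from itertools import accumulate

rows = [
    {"account_id": "A", "amount": 100, "balance": 100},
    {"account_id": "B", "amount": 50, "balance": 50},
    {"account_id": "A", "amount": -30, "balance": 70},
    {"account_id": "B", "amount": 25, "balance": 80},  # should be 75
]

# Collect row positions per entity, preserving the original row order.
groups = defaultdict(list)
for position, row in enumerate(rows):
    groups[row["account_id"]].append(position)

# Per group: the balance must equal the running sum of that group's amounts.
grouped_mask = [False] * len(rows)
for positions in groups.values():
    running = accumulate(rows[p]["amount"] for p in positions)
    for p, total in zip(positions, running):
        grouped_mask[p] = rows[p]["balance"] == total

# Without grouping, the running sum mixes both accounts and most rows fail.
overall = accumulate(row["amount"] for row in rows)
ungrouped_mask = [row["balance"] == total for row, total in zip(rows, overall)]
```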