Row Constraints

Row constraints validate individual rows in your dataset and return a per-row boolean mask. Rows that fail are flagged in summary() and removed by filter().
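To make the mask idea concrete, here is a minimal plain-Python sketch. It is not the library's implementation: a constraint is modelled as a function that returns one boolean per row, and filtering keeps the rows whose mask entry is True. The helper names `positive_mask` and `apply_filter` are hypothetical.

```python
# Illustrative only: models a row constraint as a per-row boolean mask.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": -3.0},
    {"id": 3, "amount": 7.5},
]


def positive_mask(rows, column):
    """Return True for each row whose value in `column` is greater than 0."""
    return [row[column] > 0 for row in rows]


def apply_filter(rows, mask):
    """Keep only the rows whose mask entry is True."""
    return [row for row, ok in zip(rows, mask) if ok]


mask = positive_mask(rows, "amount")
kept = apply_filter(rows, mask)

print(mask)                     # [True, False, True]
print([r["id"] for r in kept])  # [1, 3]
```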

All row constraints can be imported directly from ydata.constraints:

from ydata.constraints import (
    NotNull, Unique, GreaterThan, LowerThan, Between, Positive,
    NotIncludedIn, IncludedIn, StringLength, Monotonic,
    BetweenDates, Regex, RelationConstraint, CombineConstraints,
    CustomConstraint,
)

NotNull

Flags any row where the column value is null / NaN.

from ydata.constraints import NotNull

c = NotNull(columns=["age", "income"])

Parameter	Type	Description
`columns`	`str | list[str]`	Column(s) that must not be null
`name`	`str | None`	Optional label for the engine summary
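As a rough illustration of what "null / NaN" means for a single column, the sketch below treats `None` and float NaN as null. The helper `is_null` is hypothetical and the library may recognise further null markers.

```python
import math


def is_null(value):
    """Treat None and float NaN as null (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


ages = [34, None, float("nan"), 51]

# True means the row passes the not-null check.
passes = [not is_null(v) for v in ages]
print(passes)  # [True, False, False, True]
```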

Unique

Flags rows where the column value appears more than once. Use this for primary keys and ID columns.

from ydata.constraints import Unique

c = Unique(columns=["customer_id"])

Parameter	Type	Description
`columns`	`str | list[str]`	Column(s) whose values must be unique
`name`	`str | None`	Optional label
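The sketch below shows one plausible reading of the uniqueness check in plain Python. It assumes every occurrence of a repeated value is flagged, not only the later repeats; the text above does not say which behaviour the library uses.

```python
from collections import Counter

customer_ids = ["a1", "b2", "a1", "c3"]

# Count how often each value appears in the column.
counts = Counter(customer_ids)

# Assumption: a row passes only if its value appears exactly once.
passes = [counts[v] == 1 for v in customer_ids]
print(passes)  # [False, True, False, True]
```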

GreaterThan / LowerThan / Between / Positive

Numeric range checks against a fixed value or another column.

from ydata.constraints import GreaterThan

# Fixed threshold
c = GreaterThan(columns=["age"], value=0)

# Compare to another column
c = GreaterThan(columns=["end_date"], value="start_date")

from ydata.constraints import LowerThan

c = LowerThan(columns=["error_rate"], value=0.05)

from ydata.constraints import Between

c = Between(columns=["score"], lower_bound=0, upper_bound=100)

from ydata.constraints import Positive

# Shorthand for GreaterThan(..., value=0)
c = Positive(columns=["amount", "balance"])

Parameter	Type	Description
`columns`	`str | list[str]`	Column(s) to check
`value`	`float | str`	Threshold value, or column name to compare against
`lower_bound` / `upper_bound`	`float`	Bounds for `Between` (both inclusive)
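The two comparison modes can be sketched in plain Python as follows. The helpers `between` and `greater_than_column` are hypothetical. The inclusive bounds follow the table above; the strict `>` comparison is an assumption based on `Positive` being described as shorthand for `GreaterThan` with `value=0`.

```python
def between(values, lower_bound, upper_bound):
    """Inclusive range check, one boolean per value."""
    return [lower_bound <= v <= upper_bound for v in values]


def greater_than_column(values, other):
    """Row-wise comparison against another column (strict > is assumed)."""
    return [a > b for a, b in zip(values, other)]


scores = [0, 55, 100, 101]
print(between(scores, 0, 100))  # [True, True, True, False]

end_day = [5, 3, 9]
start_day = [1, 3, 10]
print(greater_than_column(end_day, start_day))  # [True, False, False]
```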

IncludedIn / NotIncludedIn

Allowlist and blocklist checks.

from ydata.constraints import IncludedIn

c = IncludedIn(column="status", values=["active", "pending", "closed"])

from ydata.constraints import NotIncludedIn

# Blocklist — row fails if value is in this list
c = NotIncludedIn(column="country", values=["XX", "ZZ"])

Parameter	Type	Description
`column`	`str`	Column to check
`values`	`Any | list[Any]`	Allowed (or forbidden) values
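Both checks reduce to set membership, as this plain-Python sketch shows. It only illustrates the pass/fail logic and does not use the library.

```python
allowed_statuses = {"active", "pending", "closed"}
blocked_countries = {"XX", "ZZ"}

statuses = ["active", "archived", "closed"]
countries = ["PT", "XX", "FR"]

# Allowlist: a row passes when its value is in the set.
included = [s in allowed_statuses for s in statuses]

# Blocklist: a row passes when its value is NOT in the set.
not_included = [c not in blocked_countries for c in countries]

print(included)      # [True, False, True]
print(not_included)  # [True, False, True]
```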

StringLength

Checks that string values fall within a character-length range. Works on any column after coercing to str.

from ydata.constraints import StringLength

# Postcodes must be 4–6 characters
c = StringLength(columns=["postcode"], min_length=4, max_length=6)

# Product codes must be at least 3 characters, no upper limit
c = StringLength(columns=["product_code"], min_length=3)

Parameter	Type	Description
`columns`	`str | list[str]`	Column(s) to check
`min_length`	`int`	Minimum length, inclusive. Defaults to `0`
`max_length`	`int | None`	Maximum length, inclusive. `None` = no upper limit
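A minimal sketch of the length check, assuming the coercion is a plain `str(value)` call. The helper `string_length_ok` is hypothetical, and how the library treats null values is not covered here.

```python
def string_length_ok(value, min_length=0, max_length=None):
    """Check len(str(value)) against inclusive bounds; None means no upper limit."""
    n = len(str(value))
    if n < min_length:
        return False
    return max_length is None or n <= max_length


postcodes = ["1000", "123", 4750, "1234567"]

# The integer 4750 is coerced to "4750" (4 characters) and passes.
passes = [string_length_ok(v, min_length=4, max_length=6) for v in postcodes]
print(passes)  # [True, False, True, False]
```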

Monotonic

Checks that values in a column are monotonically ordered. Row i is flagged when it breaks the ordering relative to row i-1. Useful for timestamps, cumulative balances, and auto-increment IDs.

from ydata.constraints import Monotonic

# Non-decreasing (equal consecutive values allowed)
c = Monotonic(columns=["timestamp"], increasing=True)

# Strictly decreasing (equal consecutive values are violations)
c = Monotonic(columns=["countdown"], increasing=False, strict=True)

Parameter	Type	Description
`columns`	`str | list[str]`	Column(s) to check
`increasing`	`bool`	`True` = non-decreasing, `False` = non-increasing. Default `True`
`strict`	`bool`	If `True`, equal consecutive values are violations. Default `False`

Note

The first row always passes — there is no predecessor to compare it to.
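The row-by-row comparison described above can be sketched in plain Python. The helper `monotonic_mask` is hypothetical; it mirrors the `increasing` and `strict` parameters and always passes the first row.

```python
def monotonic_mask(values, increasing=True, strict=False):
    """Compare each value with its predecessor; the first row always passes."""
    if not values:
        return []
    mask = [True]  # no predecessor for the first row
    for prev, cur in zip(values, values[1:]):
        if increasing:
            ok = cur > prev if strict else cur >= prev
        else:
            ok = cur < prev if strict else cur <= prev
        mask.append(ok)
    return mask


print(monotonic_mask([1, 2, 2, 1, 3]))
# [True, True, True, False, True]
print(monotonic_mask([1, 2, 2, 1, 3], strict=True))
# [True, True, False, False, True]
print(monotonic_mask([5, 4, 4, 3], increasing=False, strict=True))
# [True, True, False, True]
```

Note that the row after a violation is compared with the violating row itself, so in the first example the final `3` passes because it is greater than the preceding `1`.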

Regex

Checks that string values match a regular expression.

from ydata.constraints import Regex

# UK postcode pattern
c = Regex(column="postcode", regex=r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$")

Parameter	Type	Description
`column`	`str`	Column to check
`regex`	`str`	Regular expression pattern, matched against the whole value (`re.fullmatch`)
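To show what full-match semantics mean in practice, the sketch below applies the UK postcode pattern from the example with Python's standard `re.fullmatch`. The sample values are made up for illustration.

```python
import re

UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$")

values = ["SW1A 1AA", "M1 1AE", "sw1a 1aa", "SW1A 1AA London"]

# fullmatch requires the entire string to match, so trailing text fails.
passes = [UK_POSTCODE.fullmatch(v) is not None for v in values]
print(passes)  # [True, True, False, False]
```

The pattern is case-sensitive, so lowercase postcodes fail unless the pattern is adjusted or the column is normalised first.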

BetweenDates

Checks that a date column falls within an interval (in days) relative to another date column.

from ydata.constraints import BetweenDates

# end_date must be between 0 and 365 days after start_date
c = BetweenDates(
    constrained_column="end_date",
    reference_column="start_date",
    lower_bound=0,
    upper_bound=365,
)

Parameter	Type	Description
`constrained_column`	`str`	Date column being checked
`reference_column`	`str`	Date column used as the reference point
`lower_bound`	`int`	Minimum offset in days (inclusive)
`upper_bound`	`int`	Maximum offset in days (inclusive)
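The day-offset check can be sketched with the standard `datetime` module. The helper `between_dates` is hypothetical; it assumes the offset is the constrained date minus the reference date, measured in whole days.

```python
from datetime import date


def between_dates(constrained, reference, lower_bound, upper_bound):
    """Check that (constrained - reference) in days lies in an inclusive range."""
    offset = (constrained - reference).days
    return lower_bound <= offset <= upper_bound


# (start_date, end_date) pairs
pairs = [
    (date(2024, 1, 1), date(2024, 1, 1)),    # offset 0
    (date(2024, 1, 1), date(2024, 12, 31)),  # offset 365 (2024 is a leap year)
    (date(2024, 1, 1), date(2025, 1, 1)),    # offset 366
    (date(2024, 1, 10), date(2024, 1, 1)),   # offset -9
]

passes = [between_dates(end, start, 0, 365) for start, end in pairs]
print(passes)  # [True, True, False, False]
```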

RelationConstraint

Validates that each row's combination of key columns and value columns matches an allowed combination defined in a reference DataFrame. A value column in the reference may hold either a single scalar or a list of allowed values for each key.

from ydata.constraints import RelationConstraint
import pandas as pd

reference = pd.DataFrame({
    "country":      ["PT", "ES", "FR", "DE"] * 3,
    "account_type": ["savings", "checking", "premium"] * 4,
})

c = RelationConstraint(
    reference=reference,
    key_columns=["country"],
    value_columns=["account_type"],
)

Parameter	Type	Description
`reference`	`pd.DataFrame`	DataFrame of allowed key → value combinations
`key_columns`	`list[str]`	Columns that form the lookup key (must be scalar in reference)
`value_columns`	`list[str]`	Columns whose values must be in the allowed set for that key
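Conceptually the check is a lookup from key to allowed values. The plain-Python sketch below uses a smaller, made-up reference than the example above so that some rows fail, and it assumes a key missing from the reference counts as a failure, which the text does not confirm.

```python
# Hypothetical reference: each key maps to its set of allowed values.
allowed = {
    "PT": {"savings", "checking"},
    "ES": {"savings"},
}

# (country, account_type) per row
rows = [("PT", "checking"), ("ES", "checking"), ("FR", "savings")]

# Assumption: an unknown key ("FR") has no allowed values, so the row fails.
passes = [value in allowed.get(key, set()) for key, value in rows]
print(passes)  # [True, False, False]
```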

CombineConstraints

Combines multiple row constraints with a logical operation.

from ydata.constraints import CombineConstraints, Positive, NotNull

# Row passes only if ALL constraints pass
c = CombineConstraints(
    [NotNull(columns=["x"]), Positive(columns=["x"])],
    operation=CombineConstraints.Operation.MERGE,
)

# Row passes if ALL constraints pass OR ALL constraints fail (exclusive-NOR)
c = CombineConstraints(
    [NotNull(columns=["x"]), Positive(columns=["x"])],
    operation=CombineConstraints.Operation.XNOR,
)
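The two operations can be illustrated on plain boolean masks. This sketch only shows the logic described in the comments above (all pass, or all pass / all fail); the helper names are hypothetical and it does not call the library.

```python
def merge_masks(masks):
    """MERGE: a row passes only if every constraint passes."""
    return [all(row) for row in zip(*masks)]


def xnor_masks(masks):
    """XNOR: a row passes if every constraint passes or every constraint fails."""
    return [all(row) or not any(row) for row in zip(*masks)]


not_null = [True, True, False, False]
positive = [True, False, True, False]

print(merge_masks([not_null, positive]))  # [True, False, False, False]
print(xnor_masks([not_null, positive]))   # [True, False, False, True]
```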

CustomConstraint

For any logic not covered by the named constraints, use CustomConstraint with axis="row". The callable receives a pd.DataFrame and must return a boolean pd.Series aligned to the DataFrame's index.

from ydata.constraints import CustomConstraint

# Cross-column rule: income must exceed spending
c = CustomConstraint(
    lambda df: df["income"] > df["monthly_spending"],
    columns=["income"],
    available_columns=["monthly_spending"],
    axis="row",
    name="income_exceeds_spending",
)

With entity grouping (apply the check per group):

def cumulative_balance_check(df):
    return df["balance"] == df["amount"].cumsum()

c = CustomConstraint(
    check=cumulative_balance_check,
    available_columns=["balance", "amount"],
    entity="account_id",
    axis="row",
)

Parameter	Type	Description
`check`	`Callable`	Receives a `pd.DataFrame`, returns a boolean `pd.Series`
`columns`	`str | list[str] | None`	Columns to include in the returned mask
`available_columns`	`list[str] | None`	Extra columns passed to `check` but not in the mask
`entity`	`str | None`	Group-by column; `check` is applied per group
`name`	`str | None`	Optional label
`axis`	`str`	Must be `"row"` (or `"rows"`)
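To illustrate what applying `check` per group means, the sketch below reproduces the cumulative balance rule in plain Python. The helper `per_entity_mask` is hypothetical: it splits rows by the entity column, runs the check on each group separately, and writes the results back in the original row order. The library's own grouping may differ in detail.

```python
from collections import defaultdict
from itertools import accumulate

rows = [
    {"account_id": "A", "amount": 10, "balance": 10},
    {"account_id": "B", "amount": 5, "balance": 5},
    {"account_id": "A", "amount": -4, "balance": 6},
    {"account_id": "B", "amount": 2, "balance": 9},  # running total is 7, not 9
]


def cumulative_balance_check(group):
    """Balance must equal the running sum of amounts within the group."""
    running = accumulate(r["amount"] for r in group)
    return [r["balance"] == total for r, total in zip(group, running)]


def per_entity_mask(rows, entity, check):
    """Apply `check` to each entity group and restore the original row order."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[entity]].append(i)
    mask = [None] * len(rows)
    for indices in groups.values():
        results = check([rows[i] for i in indices])
        for i, ok in zip(indices, results):
            mask[i] = ok
    return mask


mask = per_entity_mask(rows, "account_id", cumulative_balance_check)
print(mask)  # [True, True, True, False]
```

Without grouping, the running sum would mix amounts from both accounts and flag rows that are correct for their own account.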