Row Constraints

Row constraints validate individual rows in your dataset and return a per-row boolean mask. Rows that fail are flagged in summary() and removed by filter().

All row constraints can be imported directly from ydata.constraints:

from ydata.constraints import (
    NotNull, Unique, GreaterThan, LowerThan, Between, Positive,
    NotIncludedIn, IncludedIn, StringLength, Monotonic,
    BetweenDates, Regex, RelationConstraint, CombineConstraints,
    CustomConstraint,
)

NotNull

Flags any row where the column value is null / NaN.

from ydata.constraints import NotNull

c = NotNull(columns=["age", "income"])

| Parameter | Type | Description |
| --- | --- | --- |
| columns | str \| list[str] | Column(s) that must not be null |
| name | str \| None | Optional label for the engine summary |
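
To make the mask semantics concrete, here is a minimal pandas sketch of an equivalent check. It is not ydata's implementation: the sample data is made up, and collapsing the two columns into one per-row result with `all(axis=1)` is an assumption about how multiple columns are combined.

```python
import pandas as pd

# Hypothetical sample data; None and NaN both count as null in pandas.
df = pd.DataFrame({
    "age": [34, None, 51],
    "income": [42000.0, 38000.0, float("nan")],
})

# True where every listed column is non-null (assumed combination rule).
mask = df[["age", "income"]].notna().all(axis=1)

# Rows that would be flagged under this reading.
flagged = df[~mask]
```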

Unique

Flags rows where the column value appears more than once. Use this for primary keys and ID columns.

from ydata.constraints import Unique

c = Unique(columns=["customer_id"])

| Parameter | Type | Description |
| --- | --- | --- |
| columns | str \| list[str] | Column(s) whose values must be unique |
| name | str \| None | Optional label |
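
The description above reads as "every occurrence of a repeated value is flagged". A rough pandas equivalent of that reading, on made-up data, might look like the sketch below; whether the library flags all occurrences or only the later ones is an assumption here.

```python
import pandas as pd

# Hypothetical sample data with one repeated ID.
df = pd.DataFrame({"customer_id": ["a1", "b2", "a1", "c3"]})

# keep=False marks every occurrence of a repeated value,
# not only the second and later ones.
mask = ~df["customer_id"].duplicated(keep=False)
```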

GreaterThan / LowerThan / Between / Positive

Numeric range checks against a fixed value or another column.

from ydata.constraints import GreaterThan

# Fixed threshold
c = GreaterThan(columns=["age"], value=0)

# Compare to another column
c = GreaterThan(columns=["end_date"], value="start_date")

from ydata.constraints import LowerThan

c = LowerThan(columns=["error_rate"], value=0.05)

from ydata.constraints import Between

c = Between(columns=["score"], lower_bound=0, upper_bound=100)

from ydata.constraints import Positive

# Shorthand for GreaterThan value=0
c = Positive(columns=["amount", "balance"])

| Parameter | Type | Description |
| --- | --- | --- |
| columns | str \| list[str] | Column(s) to check |
| value | float \| str | Threshold value, or column name to compare against |
| lower_bound / upper_bound | float | Bounds for Between (both inclusive) |
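
As an illustration of the two comparison modes, the following pandas sketch builds an inclusive range mask and a column-to-column mask on made-up data. It is not the library's code, and treating GreaterThan as a strict `>` is an assumption.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "score": [0, 55, 100, 101],
    "start": [1, 5, 9, 4],
    "end": [3, 5, 2, 8],
})

# Fixed bounds: Series.between is inclusive on both ends by default.
in_range = df["score"].between(0, 100)

# Column-to-column comparison, assumed strict (equal values fail).
after_start = df["end"] > df["start"]
```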

IncludedIn / NotIncludedIn

Allowlist and blocklist checks.

from ydata.constraints import IncludedIn

c = IncludedIn(column="status", values=["active", "pending", "closed"])

from ydata.constraints import NotIncludedIn

# Blocklist — row fails if value is in this list
c = NotIncludedIn(column="country", values=["XX", "ZZ"])

| Parameter | Type | Description |
| --- | --- | --- |
| column | str | Column to check |
| values | Any \| list[Any] | Allowed (or forbidden) values |
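
Both checks map naturally onto `Series.isin`. The sketch below shows the equivalent pandas masks on made-up data; it illustrates the semantics only and is not taken from the library.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "status": ["active", "archived", "closed"],
    "country": ["PT", "XX", "FR"],
})

# Allowlist: the row passes when the value is in the list.
allowed = df["status"].isin(["active", "pending", "closed"])

# Blocklist: the row passes when the value is NOT in the list.
not_blocked = ~df["country"].isin(["XX", "ZZ"])
```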

StringLength

Checks that string values fall within a character-length range. Non-string columns are coerced to str before the length is measured, so the check works on any column.

from ydata.constraints import StringLength

# Postcodes must be 4–6 characters
c = StringLength(columns=["postcode"], min_length=4, max_length=6)

# Product codes must be at least 3 characters, no upper limit
c = StringLength(columns=["product_code"], min_length=3)

| Parameter | Type | Description |
| --- | --- | --- |
| columns | str \| list[str] | Column(s) to check |
| min_length | int | Minimum length, inclusive. Defaults to 0 |
| max_length | int \| None | Maximum length, inclusive. None = no upper limit |
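
The coerce-then-measure behaviour can be sketched in pandas as follows. The data is invented, and the exact coercion the library performs (for example, how nulls are stringified) is not documented here, so the sample deliberately contains no nulls.

```python
import pandas as pd

# Hypothetical sample data mixing strings and an integer.
df = pd.DataFrame({"postcode": ["1000", "123", "1234567", 4050]})

# Coerce everything to str, then measure the character length.
lengths = df["postcode"].astype(str).str.len()

# Inclusive bounds, matching min_length=4 and max_length=6.
mask = lengths.between(4, 6)
```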

Monotonic

Checks that values in a column are monotonically ordered. Row i is flagged when it breaks the ordering relative to row i-1. Useful for timestamps, cumulative balances, and auto-increment IDs.

from ydata.constraints import Monotonic

# Non-decreasing (equal consecutive values allowed)
c = Monotonic(columns=["timestamp"], increasing=True)

# Strictly decreasing (equal consecutive values are violations)
c = Monotonic(columns=["countdown"], increasing=False, strict=True)

| Parameter | Type | Description |
| --- | --- | --- |
| columns | str \| list[str] | Column(s) to check |
| increasing | bool | True = non-decreasing, False = non-increasing. Default True |
| strict | bool | If True, equal consecutive values are violations. Default False |

Note

The first row always passes — there is no predecessor to compare it to.
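
The row-by-row rule can be pictured with `Series.diff`, which compares each row to its predecessor and yields NaN for the first row. This is a sketch on made-up, null-free data, not the library's implementation.

```python
import pandas as pd

# Hypothetical sample data without nulls.
s = pd.Series([1, 2, 2, 5, 4])

# Difference to the previous row; NaN for the first row.
step = s.diff()

# The first row has no predecessor, so it passes by construction.
non_decreasing = step.isna() | (step >= 0)
strictly_increasing = step.isna() | (step > 0)
```

Only the row that breaks the ordering is marked, so a single dip flags one row rather than everything after it.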


Regex

Checks that string values match a regular expression.

from ydata.constraints import Regex

# UK postcode pattern
c = Regex(column="postcode", regex=r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$")

| Parameter | Type | Description |
| --- | --- | --- |
| column | str | Column to check |
| regex | str | Regular expression pattern, applied with re.fullmatch, so it must match the entire value |
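
Because full-match semantics differ from a substring search, a short pandas sketch may help. It uses `Series.str.fullmatch`, which follows the same rule as `re.fullmatch`; the sample values are made up and the code is not the library's own.

```python
import pandas as pd

pattern = r"^[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}$"

# Hypothetical sample values: two valid, one lowercase, one with trailing text.
s = pd.Series(["SW1A 1AA", "M1 1AE", "sw1a 1aa", "SW1A 1AA, London"])
mask = s.str.fullmatch(pattern)

# With an unanchored pattern the difference becomes visible:
digits = pd.Series(["12345"])
partial = digits.str.contains(r"\d{4}")   # substring search succeeds
full = digits.str.fullmatch(r"\d{4}")     # whole-value match fails
```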

BetweenDates

Checks that a date column falls within an interval (in days) relative to another date column.

from ydata.constraints import BetweenDates

# end_date must be between 0 and 365 days after start_date
c = BetweenDates(
    constrained_column="end_date",
    reference_column="start_date",
    lower_bound=0,
    upper_bound=365,
)

| Parameter | Type | Description |
| --- | --- | --- |
| constrained_column | str | Date column being checked |
| reference_column | str | Date column used as the reference point |
| lower_bound | int | Minimum offset in days (inclusive) |
| upper_bound | int | Maximum offset in days (inclusive) |
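
The day-offset rule can be sketched in pandas by subtracting the two date columns and testing the result against the inclusive bounds. The data below is invented, and the sketch assumes both columns already hold datetime values.

```python
import pandas as pd

# Hypothetical sample data, already parsed as datetimes.
df = pd.DataFrame({
    "start_date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-06-01"]),
    "end_date": pd.to_datetime(["2024-01-01", "2025-06-01", "2024-05-31"]),
})

# Signed offset in whole days from the reference column.
offset_days = (df["end_date"] - df["start_date"]).dt.days

# Inclusive bounds, matching lower_bound=0 and upper_bound=365.
mask = offset_days.between(0, 365)
```

A same-day end date passes (offset 0), while an end date before the start date yields a negative offset and fails.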

RelationConstraint

Validates that each row's combination of key-column and value-column values appears among the allowed combinations defined in a reference DataFrame. A value cell in the reference may hold either a single scalar or a list of allowed values.

from ydata.constraints import RelationConstraint
import pandas as pd

reference = pd.DataFrame({
    "country":      ["PT", "ES", "FR", "DE"] * 3,
    "account_type": ["savings", "checking", "premium"] * 4,
})

c = RelationConstraint(
    reference=reference,
    key_columns=["country"],
    value_columns=["account_type"],
)

| Parameter | Type | Description |
| --- | --- | --- |
| reference | pd.DataFrame | DataFrame of allowed key → value combinations |
| key_columns | list[str] | Columns that form the lookup key (must be scalar in reference) |
| value_columns | list[str] | Columns whose values must be in the allowed set for that key |
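
Conceptually this is a membership test on (key, value) tuples. The pandas sketch below shows that idea with a small invented reference that uses scalar cells only. It is not the library's implementation, and failing rows whose key is absent from the reference is an assumption.

```python
import pandas as pd

# Hypothetical reference: PT allows two account types, ES allows one.
reference = pd.DataFrame({
    "country": ["PT", "PT", "ES"],
    "account_type": ["savings", "checking", "savings"],
})

# Hypothetical data to validate.
df = pd.DataFrame({
    "country": ["PT", "ES", "ES", "FR"],
    "account_type": ["checking", "savings", "premium", "savings"],
})

cols = ["country", "account_type"]

# Set of allowed (key, value) tuples taken from the reference.
allowed = set(reference[cols].itertuples(index=False, name=None))

# A row passes when its own tuple is in the allowed set.
mask = pd.Series(
    [row in allowed for row in df[cols].itertuples(index=False, name=None)],
    index=df.index,
)
```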

CombineConstraints

Combines multiple row constraints with a logical operation.

from ydata.constraints import CombineConstraints, Positive, NotNull

# Row passes only if ALL constraints pass
c = CombineConstraints(
    [NotNull(columns=["x"]), Positive(columns=["x"])],
    operation=CombineConstraints.Operation.MERGE,
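)

# Row passes if ALL constraints pass OR ALL fail (exclusive-NOR).
# `constraints` can be any list of row constraints; the list from above is reused here.
constraints = [NotNull(columns=["x"]), Positive(columns=["x"])]
c = CombineConstraints(
    constraints,
    operation=CombineConstraints.Operation.XNOR,
)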
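
The two operations can be pictured as plain boolean algebra on the per-constraint masks. The sketch below uses pandas masks in place of real constraints, on made-up data; how the library's Positive treats nulls is not stated above, so the null row reflects pandas behaviour only.

```python
import pandas as pd

# Hypothetical column with one negative value and one null.
x = pd.Series([5.0, -1.0, None])

not_null = x.notna()   # passes for the first two rows
positive = x > 0       # NaN > 0 evaluates to False in pandas

# MERGE as described above: the row passes only if every mask passes.
merged = not_null & positive

# XNOR as described above: the row passes if all masks pass or all fail.
masks = pd.concat([not_null, positive], axis=1)
xnor = masks.all(axis=1) | ~masks.any(axis=1)
```

Under XNOR the null row passes because both checks fail together, which is worth keeping in mind when choosing the operation.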

CustomConstraint

For any logic not covered by the named constraints, use CustomConstraint with axis="row". The callable receives a pd.DataFrame and must return a boolean pd.Series aligned to the DataFrame's index.

from ydata.constraints import CustomConstraint

# Cross-column rule: income must exceed spending
c = CustomConstraint(
    lambda df: df["income"] > df["monthly_spending"],
    columns=["income"],
    available_columns=["monthly_spending"],
    axis="row",
    name="income_exceeds_spending",
)

With entity grouping (apply the check per group):

def cumulative_balance_check(df):
    return df["balance"] == df["amount"].cumsum()

c = CustomConstraint(
    check=cumulative_balance_check,
    available_columns=["balance", "amount"],
    entity="account_id",
    axis="row",
)

| Parameter | Type | Description |
| --- | --- | --- |
| check | Callable | Receives a pd.DataFrame, returns a boolean pd.Series |
| columns | str \| list[str] \| None | Columns to include in the returned mask |
| available_columns | list[str] \| None | Extra columns passed to check but not in the mask |
| entity | str \| None | Group-by column; check is applied per group |
| name | str \| None | Optional label |
| axis | str | Must be "row" (or "rows") |
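
Before wiring a callable into CustomConstraint, it can be useful to run it directly on a DataFrame and confirm that the result is a boolean Series aligned to the index. The sketch below does that with invented data and emulates entity grouping with a plain pandas groupby; the library's own grouping mechanics may differ.

```python
import pandas as pd

def cumulative_balance_check(df):
    # Balance must equal the running total of amounts.
    return df["balance"] == df["amount"].cumsum()

# Hypothetical sample data: two accounts, integer amounts to avoid float rounding.
df = pd.DataFrame({
    "account_id": ["A", "A", "B", "B"],
    "amount": [10, 5, 7, -2],
    "balance": [10, 15, 7, 6],
})

# Without grouping, the running total spans both accounts.
ungrouped = cumulative_balance_check(df)

# Emulated entity grouping: run the check per account, then restore row order.
grouped = pd.concat(
    [cumulative_balance_check(group) for _, group in df.groupby("account_id")]
).reindex(df.index)
```

The grouped result restarts the running total for each account, so only the last row (balance 6 against a running total of 5) fails.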