feat(GX-3008): add uniqueness coverage to GenerateDataQualityCheckExpectationsEvent.run #1155
…ectationsEvent.run When the Uniqueness DQI is selected, add ExpectColumnProportionOfUniqueValuesToBeBetween for each column not already covered. Mirrors completeness pattern; uses COLUMN_UNIQUE_PROPORTION (already a float proportion) rather than computing a ratio from raw counts.
Pull request overview
Adds downstream uniqueness anomaly-detection coverage generation to the agent action so that selecting UNIQUENESS results in per-column ExpectColumnProportionOfUniqueValuesToBeBetween expectations when coverage is missing (idempotent), mirroring the existing completeness pattern.
Changes:
- Refactors GenerateDataQualityCheckExpectationsAction.run() by extracting per-asset logic into _generate_expectations_for_asset.
- Implements uniqueness expectation generation and missing-coverage filtering (_add_uniqueness_change_expectations, _get_columns_missing_uniqueness_coverage).
- Adds unit tests covering uniqueness expectation creation, forecast mode, edge cases, and idempotency; bumps package dev version.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| great_expectations_cloud/agent/actions/generate_data_quality_check_expectations_action.py | Refactors per-asset generation flow and adds uniqueness expectation generation + coverage filtering. |
| tests/agent/actions/test_generate_data_quality_check_expectations_action.py | Adds fixtures and tests validating uniqueness expectation behavior (including forecast + edge cases). |
| pyproject.toml | Increments dev version. |
alena-hutchinson left a comment
Not blocking, but could we combine the code in _get_columns_missing_uniqueness|completeness_coverage and/or _add_uniqueness|completeness_change_expectations into a shared column-method? They are almost identical and there is some pretty repetitive code in this file. Maybe that's something we do in the later epic when we become more selective about adding column-level AD Expectations, but it feels like this action could use some refactoring.
c18fc75 into m/GX-3007/selective-metrics-in-get-metrics
GX-3007 already added COLUMN_UNIQUE_PROPORTION to the metric list when Uniqueness is selected. This PR adds the downstream step: generating ExpectColumnProportionOfUniqueValuesToBeBetween expectations for every column that doesn't already have uniqueness anomaly detection coverage.
The implementation mirrors the completeness pattern. The main difference is that COLUMN_UNIQUE_PROPORTION arrives as a float proportion (0-1) directly from the metric run, so no row-count division is needed. The two edge cases produce static expectations with no windows: proportion=0 yields max_value=0, and proportion=1 yields min_value=1. Mixed proportions (strictly between 0 and 1) get two MEAN windows; use_forecast=True switches both windows to FORECAST.
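For reviewers who want the branching at a glance, here is a minimal sketch of the decision described above. It is not the code in this PR: `build_uniqueness_expectation`, the dict layout, and the `bound`/`constraint` keys are hypothetical stand-ins, and leaving the unstated bound as `None` in the static cases is an assumption.

```python
from typing import Any, Dict

# Expectation name taken from the PR description; everything else is illustrative.
EXPECTATION_TYPE = "ExpectColumnProportionOfUniqueValuesToBeBetween"


def build_uniqueness_expectation(
    column: str, unique_proportion: float, use_forecast: bool = False
) -> Dict[str, Any]:
    """Hypothetical helper: pick static bounds or windows from one metric value."""
    expectation: Dict[str, Any] = {
        "type": EXPECTATION_TYPE,
        "column": column,
        "min_value": None,  # assumption: the unstated bound is left unset
        "max_value": None,
        "windows": [],
    }
    if unique_proportion == 0:
        # No unique values observed: static upper bound, no windows.
        expectation["max_value"] = 0
    elif unique_proportion == 1:
        # Every value unique: static lower bound, no windows.
        expectation["min_value"] = 1
    else:
        # Mixed proportion: one window per bound, MEAN unless forecasting.
        constraint = "FORECAST" if use_forecast else "MEAN"
        expectation["windows"] = [
            {"bound": "min_value", "constraint": constraint},
            {"bound": "max_value", "constraint": constraint},
        ]
    return expectation
```

Because the metric is already a proportion, the function takes it directly and never touches row counts, which is the one place this sketch would differ from a completeness equivalent.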
Coverage is idempotent: _get_columns_missing_uniqueness_coverage filters out any column that already has a uniqueness expectation before creating new ones.
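The idempotency property can be illustrated with a small sketch. The function names and the dict-based expectation records below are hypothetical, and treating any existing expectation of the uniqueness type as "coverage" is an assumption; the actual filter in this PR may apply stricter criteria.

```python
from typing import Any, Dict, Iterable, List

UNIQUENESS_TYPE = "ExpectColumnProportionOfUniqueValuesToBeBetween"


def columns_missing_uniqueness_coverage(
    columns: Iterable[str], existing: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return columns with no uniqueness expectation yet, preserving input order."""
    covered = {
        e["column"]
        for e in existing
        if e.get("type") == UNIQUENESS_TYPE and "column" in e
    }
    return [c for c in columns if c not in covered]


def add_uniqueness_expectations(
    columns: Iterable[str], existing: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Create expectations only for uncovered columns and record them in `existing`."""
    created = [
        {"type": UNIQUENESS_TYPE, "column": c}
        for c in columns_missing_uniqueness_coverage(columns, existing)
    ]
    existing.extend(created)
    return created
```

Calling `add_uniqueness_expectations` twice with the same inputs creates expectations on the first call and nothing on the second, since the first call's output is seen as coverage by the filter.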
Lint required extracting the per-asset body of run() into _generate_expectations_for_asset to stay within the max-complexity=8 limit.