Updated the validmind.data_validation.MissingValues test so Pass/Fail is evaluated against the percentage of missing values (consistent with the reported “% missing” output). Renamed the configuration parameter from min_threshold to min_percentage_threshold and updated templates/notebooks and related configs accordingly.
Why
Users interpreted min_threshold=1 as 1% missing allowed based on the test description/output, but the prior implementation effectively behaved differently, causing columns with <1% missing to be labeled Fail. This change removes the ambiguity and ensures the threshold logic matches what users see and expect.
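To make the mismatch concrete, here is a minimal sketch. It assumes, as one plausible reading of the description above, that the old check compared the threshold with a raw missing-value count while the new check compares it with a percentage. The function names and the strict `<` comparison are illustrative and are not taken from the ValidMind source.

```python
def passes_by_count(missing_count, threshold):
    # Hypothetical "old" behaviour: threshold compared with a raw count.
    return missing_count < threshold


def passes_by_percentage(missing_count, total_rows, threshold_pct):
    # Hypothetical "new" behaviour: threshold compared with % missing.
    missing_pct = 100.0 * missing_count / total_rows
    return missing_pct < threshold_pct


# A column with 5 missing values out of 1,000 rows is 0.5% missing.
print(passes_by_count(5, 1))             # False
print(passes_by_percentage(5, 1000, 1))  # True
```

Under these assumptions the same column flips from Fail to Pass once the threshold is read as a percentage, which matches the behaviour users expected from the "% missing" output.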
How to test
Open any updated notebook and run the MissingValues cells to confirm min_percentage_threshold works and min_threshold is gone.
Run unit tests: poetry run python -m unittest tests.unit_tests.data_validation.test_MissingValues
What needs special review?
Dependencies, breaking changes, and deployment notes
Release notes
Checklist
What and why
Screenshots or videos (Frontend)
How to test
What needs special review
Dependencies, breaking changes, and deployment notes
This PR updates the implementation and usage of the missing values validation test. The parameter formerly named min_threshold is renamed to min_percentage_threshold to make clear that it represents a percentage. The rename is applied across multiple notebooks, test cases, and configuration files. The logic in the MissingValues function now calculates the percentage of missing values from the total number of rows and compares that percentage against the threshold. The function's docstring has been updated to reflect the new parameter meaning. One notebook cell also has its execution count set to null.
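The per-column calculation described in this summary can be sketched in plain Python. This is an illustration only, not the ValidMind implementation: the helper names, the dictionary keys, the choice to report 0% for an empty dataset, and the strict `<` comparison are all assumptions.

```python
import math


def is_missing(value):
    # Treat None and float NaN as missing; everything else counts as present.
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_values_summary(columns, min_percentage_threshold=1):
    """columns maps a column name to a list of values (lists of equal length)."""
    summary = []
    for name, values in columns.items():
        total = len(values)
        n_missing = sum(1 for v in values if is_missing(v))
        # Guard against an empty dataset to avoid dividing by zero.
        pct = 100.0 * n_missing / total if total else 0.0
        summary.append({
            "column": name,
            "missing": n_missing,
            "missing_pct": pct,
            "passed": pct < min_percentage_threshold,
        })
    return summary


data = {"age": [31, None, 45, 52], "income": [1.0, 2.0, 3.0, 4.0]}
for row in missing_values_summary(data, min_percentage_threshold=1):
    print(row)
```

With this toy data, `age` is 25% missing and fails a 1% threshold, while `income` has no missing values and passes.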
Test Suggestions
Add tests with edge cases, such as when the missing value percentage is exactly on the threshold.
Verify behavior with a dataset having no rows to ensure the function handles division by zero properly.
Include tests with non-standard missing value representations (e.g., '-999' or 'None' as strings) to ensure they do not affect percentage calculations.
Test the updated function with both integer and floating-point values to confirm that the new parameter works as expected.
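The suggestions above could be expressed as unit tests along the following lines. This sketch runs against a small stand-in helper rather than the real MissingValues test, so `missing_percentage`, `passes`, the 0% result for an empty column, and the strict `<` comparison are assumptions to be replaced with the actual function and its documented behaviour.

```python
import math
import unittest


def missing_percentage(values):
    # Hypothetical helper: share of None/NaN entries, as a percentage.
    if not values:
        return 0.0  # assumption: an empty column reports 0% rather than raising
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 100.0 * n_missing / len(values)


def passes(values, min_percentage_threshold):
    # Assumption: strict "<", so a column exactly on the threshold fails.
    return missing_percentage(values) < min_percentage_threshold


class MissingPercentageEdgeCases(unittest.TestCase):
    def test_exactly_on_threshold(self):
        values = [None] + [1] * 99  # 1 missing of 100 rows
        self.assertEqual(missing_percentage(values), 1.0)
        self.assertFalse(passes(values, 1))

    def test_empty_dataset(self):
        self.assertEqual(missing_percentage([]), 0.0)

    def test_string_sentinels_are_not_missing(self):
        self.assertEqual(missing_percentage(["-999", "None", "ok", "ok"]), 0.0)

    def test_int_and_float_thresholds_agree(self):
        values = [None] + [1] * 199  # 1 missing of 200 rows
        self.assertEqual(passes(values, 1), passes(values, 1.0))
        self.assertTrue(passes(values, 1))
```

Saved as a module, the class can be run with `python -m unittest`, in the same way as the existing unit test command in this PR.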
Labels: bug (Something isn't working), internal (Not to be externalized in the release notes), support (Support-related PR)
3 participants