
[SC-14933] Fix-missing-values-threshold-and-rename-parameter#482

Merged
juanmleng merged 2 commits into main from
juan/sc-14933/fix-missing-values-threshold-and-rename-parameter
Feb 28, 2026


@juanmleng
Contributor

Pull Request Description

What and why?

What

Updated the validmind.data_validation.MissingValues test so Pass/Fail is evaluated against the percentage of missing values (consistent with the reported “% missing” output). Renamed the configuration parameter from min_threshold to min_percentage_threshold and updated templates/notebooks and related configs accordingly.

Why

Users interpreted min_threshold=1 as 1% missing allowed based on the test description/output, but the prior implementation effectively behaved differently, causing columns with <1% missing to be labeled Fail. This change removes the ambiguity and ensures the threshold logic matches what users see and expect.
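The intended behavior can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `missing_percentage` and `evaluate_column` are hypothetical names, `None` stands in for a missing value, and the strict `<` comparison is an assumption, since the description does not say how a column exactly at the threshold is treated.

```python
from typing import Optional, Sequence


def missing_percentage(values: Sequence[Optional[object]]) -> float:
    """Return the share of missing (None) entries as a percentage, 0 to 100."""
    if not values:
        # Assumed convention: a column with no rows has 0% missing.
        return 0.0
    n_missing = sum(1 for v in values if v is None)
    return 100.0 * n_missing / len(values)


def evaluate_column(values, min_percentage_threshold: float = 1.0) -> str:
    """Label a column Pass/Fail by comparing % missing to the threshold."""
    pct = missing_percentage(values)
    # Assumption: strict comparison; the real test may treat equality differently.
    return "Pass" if pct < min_percentage_threshold else "Fail"


column = [None] * 5 + [1.0] * 995  # 5 missing values out of 1000 rows
print(missing_percentage(column))                           # 0.5
print(evaluate_column(column, min_percentage_threshold=1))  # Pass
```

With 0.5% missing and a threshold of 1, the column passes, which matches the "1% missing allowed" reading. Comparing the same threshold against the raw count of 5 would fail the column instead; that is one way such a mismatch can arise, though the description does not state exactly how the prior code behaved.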

How to test

  • Open any updated notebook and run the MissingValues cells to confirm min_percentage_threshold works and min_threshold is gone.
  • Run unit tests: poetry run python -m unittest tests.unit_tests.data_validation.test_MissingValues
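For configs kept outside the repository, the rename could be applied with a small helper like the one below. This is a sketch only: `rename_missing_values_param` is a hypothetical function, and the config layout (test ID mapped to a `params` dict) is an assumption about how such configs are commonly structured, not something this PR confirms.

```python
import copy

TEST_ID = "validmind.data_validation.MissingValues"
OLD_PARAM = "min_threshold"
NEW_PARAM = "min_percentage_threshold"


def rename_missing_values_param(config: dict) -> dict:
    """Return a copy of config with the old parameter key renamed."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    params = updated.get(TEST_ID, {}).get("params", {})
    if OLD_PARAM in params:
        # If both keys are present, keep the explicitly set new value.
        params.setdefault(NEW_PARAM, params[OLD_PARAM])
        del params[OLD_PARAM]
    return updated


# Assumed config shape, for illustration only.
old_config = {TEST_ID: {"params": {"min_threshold": 1}}}
new_config = rename_missing_values_param(old_config)
print(new_config)
```

The helper carries the old value over unchanged, so any carried-over number is worth reviewing to confirm it was meant as a percentage.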

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@juanmleng juanmleng self-assigned this Feb 27, 2026
@juanmleng juanmleng added the bug (Something isn't working) and internal (Not to be externalized in the release notes) labels Feb 27, 2026
@github-actions
Contributor

PR Summary

This PR updates the implementation and usage of the missing values validation test. Specifically, the parameter formerly named min_threshold is now renamed to min_percentage_threshold to more clearly indicate that it represents a percentage value. The changes are applied across multiple notebooks, test cases, and configuration files. In addition, the underlying logic in the MissingValues function has been updated to calculate the percentage of missing values based on the total number of rows and to apply the threshold comparison accordingly. The documentation string in the function has also been updated to reflect the new parameter meaning. Minor adjustments, such as setting the execution count to null in one notebook cell, have been made to improve clarity and correctness in the examples.

Test Suggestions

  • Add tests with edge cases, such as when the missing value percentage is exactly on the threshold.
  • Verify behavior with a dataset having no rows to ensure the function handles division by zero properly.
  • Include tests with non-standard missing value representations (e.g., '-999' or 'None' as strings) to ensure they do not affect percentage calculations.
  • Test the updated function with both integer and floating-point values to confirm that the new parameter works as expected.
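The suggested edge cases might look like the following `unittest` sketch. It runs against a stand-in implementation rather than the real test: `missing_percentage` and `passes` are hypothetical helpers, `None` represents a missing value, returning 0.0 for an empty column is an assumed convention, and the strict `<` comparison is an assumption about boundary behavior.

```python
import unittest


def missing_percentage(values):
    """Percentage (0 to 100) of entries that are None; 0.0 when there are no rows."""
    if not values:
        return 0.0  # assumed convention that avoids division by zero
    return 100.0 * sum(v is None for v in values) / len(values)


def passes(values, min_percentage_threshold):
    # Assumption: strict comparison, so a column exactly at the threshold fails.
    return missing_percentage(values) < min_percentage_threshold


class TestMissingPercentageEdgeCases(unittest.TestCase):
    def test_exactly_on_threshold(self):
        values = [None] + [0] * 99  # exactly 1% missing
        self.assertEqual(missing_percentage(values), 1.0)
        self.assertFalse(passes(values, 1))

    def test_no_rows(self):
        self.assertEqual(missing_percentage([]), 0.0)
        self.assertTrue(passes([], 1))

    def test_sentinel_strings_not_counted(self):
        # "-999" and "None" are ordinary strings here, not missing values.
        values = ["-999", "None", None, "ok"]
        self.assertEqual(missing_percentage(values), 25.0)

    def test_int_and_float_thresholds(self):
        values = [None] + [0] * 199  # 0.5% missing
        self.assertTrue(passes(values, 1))
        self.assertTrue(passes(values, 1.0))
        self.assertFalse(passes(values, 0.5))
        self.assertFalse(passes(values, 0.25))
```

Saved as a module, the cases can be run with `python -m unittest`; the same structure could be pointed at the real test function once its boundary and empty-dataset behavior are confirmed.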

@juanmleng juanmleng merged commit d6e06b5 into main Feb 28, 2026
17 checks passed
@juanmleng juanmleng deleted the juan/sc-14933/fix-missing-values-threshold-and-rename-parameter branch February 28, 2026 09:40
@nrichers nrichers added the support (Support-related PR) label Mar 10, 2026

Labels

  • bug: Something isn't working
  • internal: Not to be externalized in the release notes
  • support: Support-related PR


3 participants