Add get_numeric_columns method in VMDataset to get list of all numeric columns #485

Open
AnilSorathiya wants to merge 1 commit into main from
anilsorathiya/sc-12844/add-get-numeric-columns-in-vm-dataset-for

Conversation

@AnilSorathiya
Contributor

Pull Request Description

What and why?

Add a way to get all numeric columns from a VMDataset, not only the feature columns.

  • New method: get_numeric_columns() on VMDataset:
    • Returns a list of the column names whose dtype is numeric, as determined by pd.api.types.is_numeric_dtype.
    • Operates on the full underlying dataframe (self._df), so it includes all numeric columns (target, extra columns, prediction/probability columns, etc.), not only feature columns.
    • Documented with a short docstring explaining this and the return type.
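For reviewers, the behavior described above can be sketched roughly as follows. This is an illustration, not the PR's actual diff: `SketchDataset` is a hypothetical stand-in for VMDataset, and the column names are invented. Only `_df`, `get_numeric_columns`, and `pd.api.types.is_numeric_dtype` come from the description.

```python
from typing import List

import pandas as pd


class SketchDataset:
    """Hypothetical stand-in for VMDataset; holds only the underlying dataframe."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def get_numeric_columns(self) -> List[str]:
        """Return the names of all columns in the underlying dataframe
        that have a numeric dtype, not only the feature columns."""
        return [
            col
            for col in self._df.columns
            if pd.api.types.is_numeric_dtype(self._df[col])
        ]


# Invented example data: two features, a target, and an extra probability column.
df = pd.DataFrame(
    {
        "age": [25, 40, 31],
        "city": ["Oslo", "Lima", "Pune"],
        "target": [0, 1, 0],
        "prediction_prob": [0.1, 0.8, 0.3],
    }
)

numeric_columns = SketchDataset(df).get_numeric_columns()
print(numeric_columns)  # ['age', 'target', 'prediction_prob']
```

Note that `pd.api.types.is_numeric_dtype` also treats boolean columns as numeric, so a bool column would be included under this approach.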

How to test

New test: test_get_numeric_columns in TestTabularDataset.

What needs special review?

Dependencies, breaking changes, and deployment notes

Release notes

Checklist

  • What and why
  • Screenshots or videos (Frontend)
  • How to test
  • What needs special review
  • Dependencies, breaking changes, and deployment notes
  • Labels applied
  • PR linked to Shortcut
  • Unit tests added (Backend)
  • Tested locally
  • Documentation updated (if required)
  • Environment variable additions/changes documented (if required)

@AnilSorathiya added the internal (Not to be externalized in the release notes) and chore (Chore tasks that aren't bugs or new features) labels Mar 10, 2026
@AnilSorathiya requested a review from juanmleng March 10, 2026 13:55
@github-actions
Contributor

PR Summary

This pull request introduces a new method, get_numeric_columns, in the DataFrameDataset class, which extracts all numeric columns from a given pandas DataFrame. Unlike the existing feature_columns_numeric attribute that only considers feature columns, the new method returns every column with a numeric datatype (including target and other extra columns).

Key changes include:

  • A new method get_numeric_columns in the dataset module that uses pandas dtype checking to identify numeric columns.
  • New unit tests in the tests module to validate the functionality of get_numeric_columns for a variety of scenarios:
    • A DataFrame mixing numeric, categorical, and text columns.
    • A DataFrame with all numeric columns.
    • A DataFrame with only non-numeric columns, ensuring the method returns an empty list in such cases.
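The three scenarios listed above could look roughly like the following unittest sketch. It is not the test from this PR: `numeric_columns` is a hypothetical free function standing in for the new method, and the class name and data are invented.

```python
import unittest

import pandas as pd


def numeric_columns(df: pd.DataFrame) -> list:
    # Hypothetical helper mirroring the dtype check the summary describes.
    return [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]


class TestNumericColumnsSketch(unittest.TestCase):
    def test_mixed_columns(self):
        df = pd.DataFrame(
            {
                "amount": [1.5, 2.0],
                "count": [1, 2],
                "grade": pd.Categorical(["a", "b"]),  # categorical
                "note": ["x", "y"],  # text
            }
        )
        self.assertEqual(numeric_columns(df), ["amount", "count"])

    def test_all_numeric(self):
        df = pd.DataFrame({"a": [1, 2], "b": [0.1, 0.2]})
        self.assertEqual(numeric_columns(df), ["a", "b"])

    def test_no_numeric(self):
        df = pd.DataFrame({"name": ["x", "y"], "city": ["p", "q"]})
        self.assertEqual(numeric_columns(df), [])


# Run the suite programmatically so the snippet does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNumericColumnsSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```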

These changes let callers identify numeric columns across the whole dataset rather than only among the feature columns, which gives downstream processing access to numeric target and extra columns as well.

Test Suggestions

  • Add tests for DataFrames with missing values in numeric columns to ensure the method correctly identifies numeric columns even with NaN values.
  • Include tests with edge-case data types (e.g., mixed types in one column) to verify the lambda function handles all situations.
  • Test the behavior when an empty DataFrame is passed to ensure the method returns an empty list without raising errors.
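As a rough sketch of what these suggested cases would check, assuming the method relies on `pd.api.types.is_numeric_dtype` as the PR description states. `numeric_columns` is a hypothetical stand-in for the method, and the data is invented.

```python
import numpy as np
import pandas as pd


def numeric_columns(df: pd.DataFrame) -> list:
    # Hypothetical helper mirroring the described dtype check.
    return [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]


# Missing values: a float column containing NaN keeps its float64 dtype.
with_nan = pd.DataFrame({"score": [0.5, np.nan, 0.9], "label": ["a", None, "c"]})

# Mixed types in one column: pandas stores it as object dtype, which is not numeric.
mixed = pd.DataFrame({"value": [1, "two", 3.0]})

# Empty DataFrame: there are no columns to inspect.
empty = pd.DataFrame()

nan_result = numeric_columns(with_nan)
mixed_result = numeric_columns(mixed)
empty_result = numeric_columns(empty)

print(nan_result)    # ['score']
print(mixed_result)  # []
print(empty_result)  # []
```

The check is made on the column's dtype, not on individual values, so a column holding mostly numbers plus one string is reported as non-numeric.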

Labels

  • chore: Chore tasks that aren't bugs or new features
  • internal: Not to be externalized in the release notes
