fix: enable native_datafusion Spark SQL tests previously ignored in #3315 #3696

Open
andygrove wants to merge 1 commit into apache:main from andygrove:fix-native-datafusion-spark-sql-tests

Conversation

@andygrove
Member

Which issue does this PR close?

Closes #3315.

Rationale for this change

Three Spark SQL tests were ignored for native_datafusion scan mode due to plan structure differences. The root cause for the streaming tests was that CometNativeScanExec did not expose a numOutputRows metric, which Spark's streaming ProgressReporter uses to count input rows.

What changes are included in this PR?

  • CometNativeScanExec: Add numOutputRows as an alias for the output_rows native metric. Both keys reference the same SQLMetric instance, so when native code updates output_rows, Spark's streaming framework sees the correct value via numOutputRows.
  • dev/diffs/3.5.8.diff: Remove IgnoreCometNativeDataFusion tags from three tests:
    • FileDataSourceV2FallBackSuite: "Fallback Parquet V2 to V1" (assertion already handles CometNativeScanExec)
    • StreamingQuerySuite: "SPARK-41198: input row calculation with CTE"
    • StreamingQuerySuite: "SPARK-41199: input row calculation with mixed-up of DSv1 and DSv2 streaming sources"
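
The aliasing idea in the first bullet can be sketched in isolation. This is a minimal illustration, not the code from this PR: `Metric` is a hypothetical stand-in for Spark's `SQLMetric`, and `MetricAliasSketch` and `buildMetrics` are made-up names. The only point it shows is that two map keys bound to one instance always report the same value.

```scala
// Hypothetical stand-in for Spark's SQLMetric: a simple mutable counter.
final class Metric(val description: String) {
  private var total: Long = 0L
  def add(n: Long): Unit = total += n
  def value: Long = total
}

object MetricAliasSketch {
  // Build a metrics map in which two keys point at one shared Metric instance.
  def buildMetrics(): Map[String, Metric] = {
    val outputRows = new Metric("number of output rows")
    Map(
      "output_rows" -> outputRows,   // key assumed to be updated by the native side
      "numOutputRows" -> outputRows  // key assumed to be read by streaming progress reporting
    )
  }

  def main(args: Array[String]): Unit = {
    val metrics = buildMetrics()
    // Simulate native code reporting rows under its own key.
    metrics("output_rows").add(3)
    metrics("output_rows").add(4)
    // The alias sees the same total because it is the same object.
    println(metrics("numOutputRows").value)
    println(metrics("output_rows") eq metrics("numOutputRows"))
  }
}
```

Sharing one instance, rather than copying the value into a second metric, means nothing has to keep the two keys in sync after each update.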

How are these changes tested?

All three tests were verified locally with COMET_PARQUET_SCAN_IMPL=native_datafusion against Spark 3.5.8 with the updated diff applied.

…pache#3315

Add numOutputRows metric alias to CometNativeScanExec so Spark's streaming
ProgressReporter can find input row counts. Remove IgnoreCometNativeDataFusion
tags from three Spark SQL tests that now pass with native_datafusion scan.
@andygrove andygrove marked this pull request as ready for review March 14, 2026 14:15
@andygrove andygrove requested a review from mbutrovich March 14, 2026 14:15

Development

Successfully merging this pull request may close these issues.

[native_datafusion] [Spark SQL Tests] Plan structure differences cause test failures