fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite / ExplainSuite) #3694

Open
andygrove wants to merge 7 commits into apache:main from andygrove:fix-dpp-native-datafusion-fallback

@andygrove andygrove commented Mar 13, 2026

Which issue does this PR close?

Closes #3313.

Rationale for this change

Two Spark SQL tests were skipped for native_datafusion scan mode via IgnoreCometNativeDataFusion:

  • DynamicPartitionPruningSuite - "static scan metrics"
  • ExplainSuite - "explain formatted - check presence of subquery in case of DPP"

These tests failed because CometNativeScanExec did not expose driver-side scan metrics (numFiles, filesSize, numPartitions, etc.) and did not produce EXPLAIN FORMATTED output with the abbreviated Location path.
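
For readers unfamiliar with the Location abbreviation, here is a minimal sketch of the idea, assuming it works the way I recall Spark's FileSourceScanExec doing it: keep the first root path and summarize the rest as a count. The object `LocationAbbreviation` and its methods are hypothetical and are not the code in this PR.

```scala
// Hypothetical helper, modeled on how Spark's file scan is understood to
// shorten the Location entry in EXPLAIN FORMATTED. Not the actual Comet code.
object LocationAbbreviation {

  /** Keeps the first root path and replaces the others with a count. */
  def abbreviate(rootPaths: Seq[String]): String =
    if (rootPaths.length <= 1) rootPaths.mkString("[", ", ", "]")
    else s"[${rootPaths.head}, ... ${rootPaths.length - 1} entries]"

  /** Renders metadata as sorted "key: value" lines, abbreviating only Location. */
  def render(metadata: Map[String, String], rootPaths: Seq[String]): Seq[String] =
    metadata.toSeq.sorted.map {
      case ("Location", _) => s"Location: ${abbreviate(rootPaths)}"
      case (key, value)    => s"$key: $value"
    }
}
```

The point of the abbreviation is that the explain output stays short and stable for tables with many root paths, which is what a test that compares explain text depends on.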

What changes are included in this PR?

CometNativeScanExec (Comet code):

  • Add verboseStringWithOperatorId() with Location path abbreviation, matching FileSourceScanExec and CometScanExec behavior so EXPLAIN FORMATTED output is correct
  • Share driver metrics (numFiles, filesSize, numPartitions, metadataTime, staticFilesNum, staticFilesSize, pruningTime) from the underlying CometScanExec so Spark scan metric assertions pass
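
The metric sharing can be pictured with the following self-contained sketch. It assumes the wrapping scan exposes the very same metric objects as the scan it wraps, so values recorded during file listing are visible through either node. `Metric`, `WrappedScan`, `NativeScan`, and the `output_rows` metric name are hypothetical stand-ins, not Spark or Comet classes.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a SQL metric: a named, mutable counter.
final class Metric(val name: String) {
  private val counter = new AtomicLong(0L)
  def add(n: Long): Unit = { counter.addAndGet(n); () }
  def value: Long = counter.get()
}

// Stand-in for the underlying scan, which records driver-side metrics
// when it lists files.
final class WrappedScan {
  val driverMetrics: Map[String, Metric] =
    Seq("numFiles", "filesSize", "numPartitions").map(n => n -> new Metric(n)).toMap

  def listFiles(fileSizes: Seq[Long], partitions: Int): Unit = {
    driverMetrics("numFiles").add(fileSizes.length.toLong)
    driverMetrics("filesSize").add(fileSizes.sum)
    driverMetrics("numPartitions").add(partitions.toLong)
  }
}

// Stand-in for the native scan: it reuses the same Metric instances
// instead of creating copies, so updates are visible on both nodes.
final class NativeScan(underlying: WrappedScan) {
  private val nativeMetrics = Map("output_rows" -> new Metric("output_rows"))
  val metrics: Map[String, Metric] = nativeMetrics ++ underlying.driverMetrics
}
```

Sharing instances rather than copying values avoids any ordering concern between when the files are listed and when a test reads the metrics from the native scan node.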

Spark diff (3.5.8.diff):

  • Add CometNativeScanExec to SparkPlanInfo metadata extraction for event logging
  • Add CometNativeScanExec to DPP test scan pattern matching
  • Remove IgnoreCometNativeDataFusion tags from both tests
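
To illustrate why the test-side pattern match needs the extra case, here is a simplified sketch of a helper that collects scan nodes from a plan. All types below (`PlanNode`, `FileScanNode`, `CometScanNode`, `NativeScanNode`, `JoinNode`) are hypothetical stand-ins for the real physical operators, and the helper is not the code in 3.5.8.diff.

```scala
// Hypothetical, simplified plan nodes standing in for physical operators.
sealed trait PlanNode { def children: Seq[PlanNode] }
final case class FileScanNode(table: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}
final case class CometScanNode(table: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}
final case class NativeScanNode(table: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}
final case class JoinNode(left: PlanNode, right: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(left, right)
}

object ScanCollector {
  /** Returns the table name of every scan node, visiting the plan in pre-order. */
  def collectScans(plan: PlanNode): Seq[String] = {
    val here = plan match {
      case FileScanNode(t)   => Seq(t)
      case CometScanNode(t)  => Seq(t)
      case NativeScanNode(t) => Seq(t) // without this case the native scan is silently skipped
      case _                 => Seq.empty
    }
    here ++ plan.children.flatMap(collectScans)
  }
}
```

If the native scan case is absent, a helper like this finds no scan for the fact table and any assertion on its metrics fails, even though the scan itself ran correctly.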

How are these changes tested?

Tested locally by running both Spark SQL tests with COMET_PARQUET_SCAN_IMPL=native_datafusion:

  • DynamicPartitionPruningV1SuiteAEOff - "static scan metrics" - PASSED
  • ExplainSuite - "explain formatted - check presence of subquery in case of DPP" - PASSED

Also verified both tests pass with default (auto) scan mode.

Update CometNativeScan.isDynamicPruningFilter to check for
DynamicPruningExpression in addition to PlanExpression. The previous
check only caught dynamic DPP (with subqueries) but missed static DPP
where Spark resolves the pruning expression to a literal wrapped in
DynamicPruningExpression.

Closes apache#3313

- Add verboseStringWithOperatorId() with Location abbreviation so
  EXPLAIN FORMATTED shows scan metadata correctly
- Share driver metrics (numFiles, filesSize, numPartitions, etc.)
  from the underlying CometScanExec so Spark scan metric tests pass
- Add CometNativeScanExec to SparkPlanInfo metadata extraction and
  DPP test scan pattern matching in the Spark diff

The isDynamicPruningFilter check in CometNativeScan only needs to
check for PlanExpression since DPP subqueries always contain one.
The DynamicPruningExpression wrapper check is not needed.

This test is now covered by the Spark SQL test suite with native_datafusion
scan mode enabled.

@andygrove andygrove changed the title from "fix: detect all DPP forms in native_datafusion scan fallback" to "fix: enable DPP-related Spark SQL tests for native_datafusion scan" Mar 14, 2026
No changes to CometNativeScan.scala are needed for this PR.
@andygrove andygrove changed the title from "fix: enable DPP-related Spark SQL tests for native_datafusion scan" to "fix: enable more Spark SQL tests for native_datafusion" Mar 14, 2026
@andygrove andygrove changed the title from "fix: enable more Spark SQL tests for native_datafusion" to "fix: enable more Spark SQL tests for native_datafusion (DynamicPartitionPruningSuite / ExplainSuite)" Mar 14, 2026
@andygrove andygrove marked this pull request as ready for review March 14, 2026 14:14
@andygrove andygrove requested a review from mbutrovich March 14, 2026 14:14

Development

Successfully merging this pull request may close these issues.

[native_datafusion] [Spark SQL Tests] Dynamic Partition Pruning (DPP) not working correctly
