[PySpark] - Add DataFrame iterator methods and corresponding tests#381

Open
mariotaddeucci wants to merge 4 commits into duckdb:main from mariotaddeucci:feature-pyspark-dataframe-iterators

mariotaddeucci commented Mar 14, 2026

- Implement `isEmpty`, `toLocalIterator`, `foreach`, and `foreachPartition` methods in the DataFrame class.
- Add tests for `isEmpty`, `foreach`, `foreachPartition`, and `toLocalIterator` methods in the test suite.

Copilot AI review requested due to automatic review settings March 14, 2026 02:05

Copilot AI left a comment

Pull request overview

Adds DataFrame iteration APIs to the DuckDB Spark-compat layer and expands the test suite to validate these behaviors.

Changes:

  • Implement DataFrame.toLocalIterator, DataFrame.foreach, DataFrame.foreachPartition, and DataFrame.isEmpty.
  • Refactor row construction into a shared helper used by both collect() and toLocalIterator().
  • Add tests covering isEmpty, foreach, foreachPartition, and toLocalIterator.
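
To make the relationship between these methods concrete, here is a minimal sketch of how they could all be layered on one lazy row iterator and one shared row-building helper. It is not the code from this PR: `ToyFrame` and `construct_row` are hypothetical names, a plain `dict` stands in for `Row`, and the whole data set is treated as a single partition.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

RowLike = Dict[str, Any]  # assumption: a dict stands in for Spark's Row


def construct_row(values: Sequence[Any], columns: Sequence[str]) -> RowLike:
    """Hypothetical shared helper: pair column names with one tuple of values."""
    return dict(zip(columns, values))


class ToyFrame:
    """Illustrative only; not the DuckDB Spark-compat DataFrame."""

    def __init__(self, columns: Sequence[str], data: Sequence[Sequence[Any]]) -> None:
        self.columns = list(columns)
        self._data = list(data)

    def toLocalIterator(self) -> Iterator[RowLike]:
        # Build rows lazily, one at a time, through the shared helper.
        for values in self._data:
            yield construct_row(values, self.columns)

    def collect(self) -> List[RowLike]:
        # Eager variant: materialize the same iterator into a list.
        return list(self.toLocalIterator())

    def isEmpty(self) -> bool:
        # Pull at most one row instead of collecting everything.
        return next(self.toLocalIterator(), None) is None

    def foreach(self, f: Callable[[RowLike], None]) -> None:
        for row in self.toLocalIterator():
            f(row)

    def foreachPartition(self, f: Callable[[Iterator[RowLike]], None]) -> None:
        # Assumption: a single partition, handed to f as an iterator of rows.
        f(self.toLocalIterator())


df = ToyFrame(["age", "name"], [(56, "Carol"), (20, "Alice")])
seen: List[RowLike] = []
df.foreach(seen.append)
```

With this layering, `collect()` and `toLocalIterator()` cannot drift apart in how they build rows, which appears to be the motivation for the shared helper.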

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/fast/spark/test_spark_dataframe.py | Adds unit tests for the new DataFrame iterator/foreach APIs and isEmpty. |
| duckdb/experimental/spark/sql/dataframe.py | Implements toLocalIterator, foreach, foreachPartition, isEmpty, and shared Row construction. |

Comment on lines +170 to +171
while rows := itertools.islice(rows_generator, 10_000):
    f(rows)
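
One point worth checking in the loop quoted above: an `itertools.islice` object has no length and is always truthy, even when it will yield nothing, so a walrus condition on the bare `islice` may never become false. A minimal sketch of one way to chunk an iterator so the loop does terminate is shown below; `iter_chunks` is a hypothetical helper name, and materializing each chunk into a list is an assumption about what the callback can accept.

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    iterator = iter(items)
    # list(...) is empty, and therefore falsy, once the source is exhausted.
    while chunk := list(itertools.islice(iterator, size)):
        yield chunk


chunks = list(iter_chunks(range(5), 2))

# An islice over an empty source is still a truthy object.
empty_slice_is_truthy = bool(itertools.islice(iter([]), 3))
```

The trade-off is that each chunk is held in memory as a list, which is bounded by `size`.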

mock_callable = mock.MagicMock()
df.foreachPartition(mock_callable)
mock_callable.assert_called_once_with(expected)
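
If the callback is handed an iterator rather than a list, comparing the recorded call argument against `expected` would compare an iterator object with a list, and that comparison fails regardless of the rows. A hedged sketch of an alternative test shape follows: the mock drains the iterator through a `side_effect` and the assertion runs on the collected rows. `foreach_partition` here is a hypothetical stand-in, not the method under review.

```python
from typing import Callable, Iterable, Iterator, List, Tuple
from unittest import mock

RowTuple = Tuple[int, str]


def foreach_partition(
    rows: Iterable[RowTuple], f: Callable[[Iterator[RowTuple]], None]
) -> None:
    """Hypothetical stand-in: hands all rows to f as a single iterator."""
    f(iter(rows))


expected: List[RowTuple] = [(56, "Carol"), (20, "Alice")]
received: List[RowTuple] = []

# The side effect consumes the iterator while the call is in progress,
# so the rows are captured before the iterator is exhausted elsewhere.
callback = mock.MagicMock(side_effect=lambda part: received.extend(part))
foreach_partition(expected, callback)

callback.assert_called_once()
```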
Comment on lines +82 to +83
def toLocalIterator(self, prefetchPartitions: bool = False) -> Iterator[Row]:
    """Returns an iterator that contains all of the rows in this :class:`DataFrame`.
Comment on lines +116 to +117
while rows := cur.fetchmany(10_000):
    yield from (_construct_row(x, columns) for x in rows)
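
The `fetchmany` loop above is the usual DB-API batching idiom: `fetchmany` returns an empty list once the result set is exhausted, which ends the loop. The sketch below shows the same pattern end to end, using the standard library's `sqlite3` cursor purely as a stand-in for the DuckDB cursor; `iter_rows`, the table, and its data are made up for illustration, and a `dict` replaces `Row`.

```python
import sqlite3
from typing import Any, Dict, Iterator


def iter_rows(cur: sqlite3.Cursor, batch_size: int = 10_000) -> Iterator[Dict[str, Any]]:
    """Stream rows from a cursor in batches without loading the full result."""
    columns = [d[0] for d in cur.description]
    # fetchmany returns [] when no rows remain, which terminates the loop.
    while batch := cur.fetchmany(batch_size):
        for values in batch:
            yield dict(zip(columns, values))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(56, "Carol"), (20, "Alice"), (3, "Dave")],
)
cur = conn.execute("SELECT age, name FROM people ORDER BY age")

# A batch size smaller than the row count exercises more than one fetch.
rows = list(iter_rows(cur, batch_size=2))
```

Peak memory stays proportional to `batch_size` rather than to the full result, which is the main reason to prefer this over fetching everything before iterating.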
mariotaddeucci and others added 3 commits March 13, 2026 23:14
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>