read_nsv now infers numeric types per-column (like read_csv) instead of leaving everything as strings. to_nsv converts non-string values to str and NaN to empty string before writing.
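A minimal sketch of the per-column numeric inference and the to_nsv cell conversion described above. It assumes every cell arrives as a string (or None) in an object-dtype DataFrame. infer_numeric_columns and to_nsv_cell are hypothetical names, not the package's API, and treating empty strings as missing is also an assumption here.

```python
import pandas as pd


def infer_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Convert each column to numbers when every present value parses as one."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Assumption: empty strings count as missing, like read_csv's default
        present = col.notna() & (col != "")
        # errors="coerce" turns unparseable values into NaN instead of raising
        coerced = pd.to_numeric(col, errors="coerce")
        # Keep the conversion only if no present value was lost to NaN
        if coerced.notna().sum() == present.sum():
            out[name] = coerced
    return out


def to_nsv_cell(value) -> str:
    """NaN/None become an empty string; everything else goes through str()."""
    return "" if pd.isna(value) else str(value)
```

A column with a single unparseable value (for example "x") is left as strings rather than partially converted.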
- reader.py: check() appended a bare int instead of a (pos, line, col) tuple when the string ends with a trailing backslash, which would crash the subsequent unpacking loop
- core.py: remove unused loop variable i in dumps()
- test_utils.py: load_then_dump() splatted the list into dumps() instead of passing it as a single iterable argument
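The reader.py fix is easiest to see in a small stand-in. find_bad_escapes and _line_col are hypothetical, and the escape rule used here (a backslash must be followed by another backslash or by n) is an assumption that does not model every token the NSV format may define. The point is that the trailing-backslash branch records the same (pos, line, col) shape as every other branch.

```python
def _line_col(s: str, pos: int) -> tuple[int, int]:
    """Convert a 0-based offset into a 1-based (line, col) pair."""
    line = s.count("\n", 0, pos) + 1
    col = pos - (s.rfind("\n", 0, pos) + 1) + 1
    return line, col


def find_bad_escapes(s: str) -> list[tuple[int, int, int]]:
    """Hypothetical checker: report each backslash not followed by '\\' or 'n'."""
    problems = []
    i = 0
    while i < len(s):
        if s[i] != "\\":
            i += 1
            continue
        if i + 1 == len(s):
            # Trailing backslash: record a full (pos, line, col) tuple,
            # not a bare int, so every entry unpacks the same way
            problems.append((i, *_line_col(s, i)))
            break
        if s[i + 1] not in ("\\", "n"):
            problems.append((i, *_line_col(s, i)))
        i += 2  # skip the escape pair
    return problems
```

Callers can then unpack every entry uniformly, e.g. for pos, line, col in find_bad_escapes(text).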
- Apply pandas' default NA value set (NA, NaN, nan, null, None, etc.) before type inference so NA strings become NaN in all column types, matching read_csv behaviour
- Detect all-true/false columns (case-insensitive) and cast to bool; bool+NA columns return object with Python bools and NaN, also matching read_csv behaviour
- Refactor inference into an _infer_column() helper
- Add TestReadNsvNullInference and TestReadNsvBoolInference test classes, all using read_csv as the oracle
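A sketch of the NA-then-bool-then-numeric order, written as a hypothetical infer_column() helper rather than the PR's actual _infer_column(). NA_STRINGS is a hand-picked subset of pandas' default NA strings, not the full list, and the case-insensitive true/false match follows the commit message; pandas' own bool detection may accept a narrower set of spellings.

```python
import numpy as np
import pandas as pd

# Assumption: a representative subset of pandas' default NA strings.
NA_STRINGS = {"", "NA", "N/A", "NaN", "nan", "NULL", "null", "None", "<NA>"}


def infer_column(raw: list[str]) -> pd.Series:
    """Hypothetical helper: map NA strings first, then try bool, then numeric."""
    values = pd.Series([np.nan if v in NA_STRINGS else v for v in raw], dtype=object)
    present = values.dropna()

    # Bool: every non-NA value is true/false, compared case-insensitively
    if len(present) and present.str.lower().isin(["true", "false"]).all():
        as_bool = values.map(lambda v: v if pd.isna(v) else v.lower() == "true")
        if len(present) == len(values):
            return as_bool.astype(bool)  # no NA: plain bool dtype
        return as_bool.astype(object)  # NA present: Python bools mixed with NaN

    # Numeric: accept only if every non-NA value parses
    numeric = pd.to_numeric(values, errors="coerce")
    if numeric.notna().sum() == len(present):
        return numeric
    return values  # otherwise keep strings, with NA strings already mapped to NaN
```

Running the NA mapping first means a string such as "null" never blocks a column from becoming bool or numeric.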
CI runs without pandas (it's an optional dependency). Guard all test classes with @skip_no_pandas so the suite passes without it.
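A guard like the one described can be built from unittest.skipUnless. HAS_PANDAS and this definition of skip_no_pandas are assumptions about how the decorator might look, and the test class is a placeholder.

```python
import importlib.util
import unittest

# Assumption: skip_no_pandas is a module-level decorator built like this.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None
skip_no_pandas = unittest.skipUnless(HAS_PANDAS, "pandas is not installed")


@skip_no_pandas
class TestReadNsvExample(unittest.TestCase):
    def test_pandas_importable(self):
        # Only runs when pandas is present, so this import cannot fail here
        import pandas  # noqa: F401
```

Applied at class level, the decorator marks every test in the class as skipped, so the suite still reports success without pandas.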
The pandas extra was not being installed in CI. Switch to pip install -e ".[pandas]" so the tests actually run. Revert the skipUnless guards added in the previous commit.
Instead of reimplementing pandas' bool/NA/numeric detection, convert NSV rows to CSV in memory and pass to read_csv directly.
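A sketch of the in-memory CSV round trip, assuming rows are already-parsed lists of strings with the header first. nsv_rows_to_frame is a hypothetical name; csv.writer handles quoting, so cells containing commas or newlines come back intact, and read_csv does all NA, bool, and numeric detection.

```python
import csv
import io

import pandas as pd


def nsv_rows_to_frame(rows: list[list[str]], **read_csv_kwargs) -> pd.DataFrame:
    """Reuse read_csv's inference by round-tripping rows through CSV text."""
    buf = io.StringIO()
    # csv.writer quotes cells containing commas, quotes or newlines
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    # First row becomes the header; extra kwargs (e.g. dtype) go to read_csv
    return pd.read_csv(buf, **read_csv_kwargs)
```

The trade-off is that every row is serialized to text and parsed again.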
Keep type inference local to patch_pandas rather than leaking constants into module scope. No CSV serialization overhead.
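One way to keep inference inside patch_pandas without producing CSV text is pandas.io.parsers.TextParser, which pandas' own read_html uses to turn lists of rows into a typed DataFrame. The commit does not say whether it is used here. The read_nsv(rows) signature and the list-of-lists return value of to_nsv are hypothetical simplifications that skip real NSV parsing and formatting.

```python
import pandas as pd


def patch_pandas() -> None:
    """Hypothetical sketch: register read_nsv and DataFrame.to_nsv on pandas."""
    # Imported inside the function so nothing parser-related leaks into
    # module scope.
    from pandas.io.parsers import TextParser

    def read_nsv(rows: list[list[str]], **kwargs) -> pd.DataFrame:
        # rows: already-parsed NSV rows, header first (parsing is out of scope).
        # TextParser applies read_csv-style NA/bool/numeric inference to
        # in-memory lists, so no CSV text is produced.
        return TextParser(rows, header=0, **kwargs).read()

    def to_nsv(self: pd.DataFrame) -> list[list[str]]:
        # Header row, then cells with NaN -> "" and everything else -> str
        body = [["" if pd.isna(v) else str(v) for v in row]
                for row in self.itertuples(index=False)]
        return [[str(c) for c in self.columns]] + body

    pd.read_nsv = read_nsv
    pd.DataFrame.to_nsv = to_nsv
```

After patch_pandas() runs, pd.read_nsv(...) and df.to_nsv() behave like built-in entry points.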
Summary
This PR adds comprehensive pandas integration for the NSV (Newline-Separated Values) format, including automatic type inference for read_nsv() and proper handling of non-string types in to_nsv().
Key Changes
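Enhanced read_nsv() function:
- Infers numeric types per column, matching read_csv() behavior, using pd.to_numeric()
- Supports a dtype parameter to override inference
Improved to_nsv() method:
- Converts non-string values to str and NaN to an empty string before writing
Comprehensive test suite (tests/test_pandas.py)
Implementation Details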
- Uses pd.to_numeric(..., errors='coerce') to safely attempt numeric conversion
- The patch_pandas() function now properly registers both read_nsv() and to_nsv() with pandas
https://claude.ai/code/session_01SNTCQdxd21HTHqZd61uHHY