GH-533: Add ALP (Adaptive Lossless floating-Point) encoding specification #557

Open
prtkgaur wants to merge 1 commit into apache:master from prtkgaur:alpEncoding
prtkgaur commented Mar 11, 2026

Add the encoding specification for ALP (encoding value 10) to Encodings.md. ALP compresses FLOAT and DOUBLE columns by converting values to integers via decimal scaling, then applying Frame of Reference encoding and bit-packing. Values that cannot be losslessly round-tripped are stored as exceptions.

See rendered preview here: https://github.com/prtkgaur/parquet-format/blob/alpEncoding/Encodings.md#adaptive-lossless-floating-point-alp--10

The spec covers:

  • Page layout: 7-byte header, offset array, compressed vectors
  • Vector format: AlpInfo, ForInfo, packed values, exception data
  • Encoding math: two-step multiplication for cross-language consistency
  • Parameter selection, exception detection, and decoding steps

Based on the paper "ALP: Adaptive Lossless floating-Point Compression" (Afroozeh and Boncz, SIGMOD 2024). Wire format matches the C++ Arrow and Java parquet-java implementations.
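For readers who have not seen the paper, the pipeline described above can be sketched roughly as follows. This is an illustrative sketch only, not the wire format or the reference implementation: the function names `alp_encode` and `alp_decode`, the dictionary layout, the use of Python's built-in `round`, the signed 64-bit range check, and the choice of a zero delta as the placeholder for exception slots are all assumptions made for the example. The exponent `e` and factor `f` are passed in directly, whereas the spec describes how they are selected.

```python
import math

INT64_LIMIT = 2 ** 63  # assumed integer range for DOUBLE values in this sketch


def alp_encode(values, e, f):
    """Encode doubles as scaled integers plus a list of exceptions (sketch)."""
    ints, exceptions = [], {}
    for pos, v in enumerate(values):
        d = None
        if math.isfinite(v):
            # Two separate multiplications, mirroring the "two-step" idea.
            scaled = v * 10.0 ** e * 10.0 ** -f
            if math.isfinite(scaled) and abs(scaled) < INT64_LIMIT:
                cand = round(scaled)
                decoded = cand * 10.0 ** f * 10.0 ** -e
                # Keep the integer only if decoding restores the value,
                # including the sign of zero.
                same_sign = math.copysign(1.0, decoded) == math.copysign(1.0, v)
                if decoded == v and same_sign:
                    d = cand
        if d is None:
            exceptions[pos] = v  # stored verbatim, patched back on decode
        ints.append(d)

    # Frame of Reference: subtract the minimum so the deltas are non-negative.
    good = [d for d in ints if d is not None]
    base = min(good) if good else 0
    deltas = [(d - base) if d is not None else 0 for d in ints]
    # Bits needed per value if the deltas were bit-packed.
    bit_width = max(deltas, default=0).bit_length()
    return {"e": e, "f": f, "base": base, "bit_width": bit_width,
            "deltas": deltas, "exceptions": exceptions}


def alp_decode(vec):
    """Invert alp_encode: undo FOR, rescale, then patch exceptions."""
    out = [(delta + vec["base"]) * 10.0 ** vec["f"] * 10.0 ** -vec["e"]
           for delta in vec["deltas"]]
    for pos, v in vec["exceptions"].items():
        out[pos] = v
    return out


if __name__ == "__main__":
    data = [1.23, 4.5, 10.07, math.pi]
    vec = alp_encode(data, e=2, f=0)
    print(vec["bit_width"], sorted(vec["exceptions"]))
    print(alp_decode(vec) == data)
```

With `e=2, f=0`, a value such as `math.pi` cannot be recovered from a two-decimal integer and therefore lands in the exception list, while the remaining values are represented by small deltas. Actual bit-packing of the deltas and the page and vector headers are left out here; the rendered spec linked above is the authoritative description.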

alamb changed the title from "Add ALP (Adaptive Lossless floating-Point) encoding specification" to "GH-533: Add ALP (Adaptive Lossless floating-Point) encoding specification" on Mar 11, 2026
Development

Successfully merging this pull request may close these issues.

[Proposal] Add ALP encoding support in parquet file format
