| codec: CompressionCodec; | ||
| num_values: long = null; // only present if not equal to rg.num_rows | ||
| total_uncompressed_size: long; | ||
| total_compressed_size: long; |
It would be nice to keep the total unencoded size here, which I think is generally useful. But I suppose it can be added later?
src/main/flatbuf/parquet3.fbs
Outdated
| dictionary_page_offset: long = null; | ||
| statistics: Statistics; | ||
| is_fully_dict_encoded: bool; | ||
| bloom_filter_offset: long = null; |
Should this be made a struct/value to make the bloom filter info more self-contained?
src/main/flatbuf/parquet3.fbs
Outdated
| row_groups: [RowGroup]; | ||
| kv: [KV]; | ||
| created_by: string; | ||
| // column_orders: [ColumnOrder]; // moved to SchemaElement |
Remove this row for now?
emkornfield left a comment
I think we also need to add an Apache license header here, and CI to make sure this compiles?
Hi @rok and @emkornfield, could you take another look at this PR?
| min_lo4: uint; | ||
| min_lo8: ulong; | ||
| min_hi8: ulong; | ||
| min_len: byte = null; |
| min_len: byte = null; | |
| min_len: int = null; |
The original suffix length could exceed the int8 range of the byte type.
Hi @rok , the previous comment is outdated. max_len and min_len store the truncated suffix length, which means the maximum value is 16. We use negative numbers to represent inexact values. I have updated the comment and the example, please take a look.
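The sign convention described in this reply can be sketched as follows. This is only an illustration in Python, not the PR's implementation: the helper names `encode_len` and `decode_len` are hypothetical, and the sketch assumes a zero-length inexact value never needs to be represented (there is no negative zero), a case the thread does not address.

```python
MAX_SUFFIX_LEN = 16  # maximum truncated suffix length stated in the thread
INT8_MIN, INT8_MAX = -128, 127  # range of the FlatBuffers `byte` type


def encode_len(length: int, exact: bool) -> int:
    """Pack a truncated suffix length and an exactness flag into one signed byte."""
    if not 0 <= length <= MAX_SUFFIX_LEN:
        raise ValueError(f"length must be in [0, {MAX_SUFFIX_LEN}], got {length}")
    if not exact and length == 0:
        # Assumption: -0 equals 0, so this case cannot be expressed by sign alone.
        raise ValueError("a zero-length inexact value cannot be encoded by sign alone")
    encoded = length if exact else -length
    assert INT8_MIN <= encoded <= INT8_MAX  # always fits in a byte
    return encoded


def decode_len(encoded: int) -> tuple[int, bool]:
    """Return (length, exact) recovered from the signed byte."""
    if not -MAX_SUFFIX_LEN <= encoded <= MAX_SUFFIX_LEN:
        raise ValueError(f"encoded length out of range: {encoded}")
    return abs(encoded), encoded >= 0
```

Because the encoded value always stays within [-16, 16], it fits comfortably in the int8 range of `byte`.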
| max_lo4: uint; | ||
| max_lo8: ulong; | ||
| max_hi8: ulong; | ||
| max_len: byte = null; |
As above:
| max_len: byte = null; | |
| max_len: int = null; |
src/main/flatbuf/parquet3.fbs
Outdated
| /** repetition of the field. The root of the schema does not have a repetition_type. | ||
| * All other nodes must have one */ | ||
| repetition_type: FieldRepetitionType; |
This should be nullable so that the root can have no repetition type. In Thrift the field is optional:
parquet-format/src/main/thrift/parquet.thrift
Line 518 in 38818fa
| repetition_type: FieldRepetitionType; | |
| repetition_type: FieldRepetitionType = null; |
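A minimal Python sketch of the rule in the schema comment (the root has no repetition type; every other node must have one), which a nullable field makes expressible. The `SchemaNode` class and the `validate_repetition` function are hypothetical names used for illustration and are not part of parquet-format; the repetition type is a plain string here rather than the real enum.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaNode:
    name: str
    repetition_type: Optional[str] = None  # None models `= null` in the schema
    children: List["SchemaNode"] = field(default_factory=list)


def validate_repetition(root: SchemaNode) -> List[str]:
    """Return the names of nodes that violate the rule; an empty list means valid."""
    errors = []
    # The root must not carry a repetition type.
    if root.repetition_type is not None:
        errors.append(root.name)
    # Every non-root node must carry one.
    stack = list(root.children)
    while stack:
        node = stack.pop()
        if node.repetition_type is None:
            errors.append(node.name)
        stack.extend(node.children)
    return errors
```

With a non-nullable field, a reader could not distinguish "absent" from the enum's default value, so the root case in this check could not be expressed.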
@Jiayi-Wang-db Feel free to resolve the comments you feel were addressed, to make this more readable.
Is |
Not at all. I pushed it accidentally. Removed.
Rationale for this change
Improve wide table support.
What changes are included in this PR?
Add parquet flatbuf schema.
Do these changes have PoC implementations?
apache/arrow#48431