feat: Add array data type support#433
Conversation
| let (val, next) = reader.read_long(cursor); | ||
| let decimal = Decimal::from_unscaled_long(val, precision, scale) | ||
| .expect("Failed to create decimal from unscaled long"); | ||
| let decimal = |
There was a problem hiding this comment.
i think we should convert all expect to typed errors so malformed array payload handling does not leave panic backdoor
| // silently widen key semantics. | ||
| if matches!( | ||
| field_type, | ||
| DataType::Array(_) | DataType::Map(_) | DataType::Row(_) |
There was a problem hiding this comment.
this applies to all array/map/row, so i understand this PR only adds Array, but i think better to add all of them.
if there is objection, i can remove
|
@charlesdong1991 Thanks for the great pr. I'll find some time to review |
There was a problem hiding this comment.
Pull request overview
Adds end-to-end ARRAY column support to the Fluss Rust client, including a Java-compatible binary array representation and integration across row encoders/decoders, Arrow interop, and public APIs.
Changes:
- Introduces
FlussArray/FlussArrayWriter(BinaryArray layout) and wires it intoDatum,InternalRow, and runtime field/value writers. - Adds compacted row read/write support for arrays (length-prefixed BinaryArray bytes) and rejects
ARRAYas a key column type. - Extends Arrow conversion logic and updates user/docs + tests for primitive, nullable, empty, and nested arrays.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| website/docs/user-guide/rust/data-types.md | Documents ARRAY<T> type mapping and how to construct/read arrays. |
| website/docs/user-guide/rust/api-reference.md | Adds InternalRow::get_array and documents FlussArray API. |
| crates/fluss/src/row/mod.rs | Exposes binary_array module and re-exports FlussArray; adds InternalRow::get_array. |
| crates/fluss/src/row/field_getter.rs | Adds Array field getter + tests. |
| crates/fluss/src/row/encode/compacted_key_encoder.rs | Adds test ensuring array types are rejected in key encoding. |
| crates/fluss/src/row/datum.rs | Adds Datum::Array and Arrow ListBuilder append/conversion logic. |
| crates/fluss/src/row/compacted/compacted_row_writer.rs | Adds write_array (delegates to length-prefixed bytes). |
| crates/fluss/src/row/compacted/compacted_row_reader.rs | Makes deserialization fallible and adds array decoding via FlussArray::from_bytes. |
| crates/fluss/src/row/compacted/compacted_row.rs | Propagates fallible deserialization and adds InternalRow::get_array + array tests. |
| crates/fluss/src/row/compacted/compacted_key_writer.rs | Explicitly rejects complex key types and adds write_array to the delegated writer surface. |
| crates/fluss/src/row/column.rs | Implements ColumnarRow::get_array by converting Arrow ListArray → FlussArray; adds tests. |
| crates/fluss/src/row/binary_array.rs | New Java-compatible binary array implementation + writer + tests. |
| crates/fluss/src/row/binary/binary_writer.rs | Adds BinaryWriter::write_array and InnerValueWriter::Array. |
| crates/fluss/src/record/arrow.rs | Adds Arrow List builder support and from_arrow_type mapping for list element conversion. |
| bindings/cpp/src/types.rs | Propagates Datum::Array handling in type resolution and row ownership conversion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
@charlesdong1991 LGTM overall, left minor comments
If you fixed smth in the meantime - good, sorry for the noise, as I started reviewing before the new code :)
|
Hi @fresh-borzoni thanks a lot! |
leekeiabstraction
left a comment
There was a problem hiding this comment.
Thank you for the PR! Left some comments. PTAL
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, looked through again, left comments
PTAL
| (offset, size) | ||
| } | ||
|
|
||
| pub fn get_string(&self, pos: usize) -> Result<&str> { |
There was a problem hiding this comment.
why do we return Result in half of the getters and half of the methods go unchecked?
I doubt that Java is the same half-way, pls, check
There was a problem hiding this comment.
it seems java uniformly uncheck.
and now in rust, it is because getter that involve operations that are inherently fallible (e.g. length bound checks etc), but primitive getters only do fixed-size slice conversion.
i agree we shouldn't do half-way, so either make all getters return Result with bounds checking or we also let primitive getters panic on corrupted payload so it matches java
i would prefer to have all getters return Result, wdyt? @fresh-borzoni
There was a problem hiding this comment.
@charlesdong1991
Let's use Result for now, if we need to add infallible variants later - it would be non-breaking change.
So it's the safe unblocking way, to make API consistent and leave some room for improvement if needed, just add TODO or comment about this.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 TY, Looked through, LGTM overall, one nit comment 👍
|
Thanks both! PTAL @fresh-borzoni @leekeiabstraction @luoyuxia |
|
Test case changes looks good to me but diff seems larger, seems to be squashed with is retriable changes from another PR? Not sure if it will conflict. LGTM otherwise. |
Purpose
Linked issue: close #386
Brief change log
Add end-to-end ARRAY column support in fluss-rust
Tests
Yes all tests are passed locally