Skip to content

feat: Add array data type support#433

Open
charlesdong1991 wants to merge 12 commits intoapache:mainfrom
charlesdong1991:issue-386
Open

feat: Add array data type support#433
charlesdong1991 wants to merge 12 commits intoapache:mainfrom
charlesdong1991:issue-386

Conversation

@charlesdong1991
Copy link
Contributor

Purpose

Linked issue: close #386

Brief change log

Add end-to-end ARRAY column support in fluss-rust

Tests

Yes all tests are passed locally

let (val, next) = reader.read_long(cursor);
let decimal = Decimal::from_unscaled_long(val, precision, scale)
.expect("Failed to create decimal from unscaled long");
let decimal =
Copy link
Contributor Author

@charlesdong1991 charlesdong1991 Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we should convert all expect to typed errors so malformed array payload handling does not leave panic backdoor

// silently widen key semantics.
if matches!(
field_type,
DataType::Array(_) | DataType::Map(_) | DataType::Row(_)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this applies to all array/map/row, so i understand this PR only adds Array, but i think better to add all of them.
if there is objection, i can remove

@luoyuxia
Copy link
Contributor

luoyuxia commented Mar 8, 2026

@charlesdong1991 Thanks for the great pr. I'll find some time to review

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end ARRAY column support to the Fluss Rust client, including a Java-compatible binary array representation and integration across row encoders/decoders, Arrow interop, and public APIs.

Changes:

  • Introduces FlussArray / FlussArrayWriter (BinaryArray layout) and wires it into Datum, InternalRow, and runtime field/value writers.
  • Adds compacted row read/write support for arrays (length-prefixed BinaryArray bytes) and rejects ARRAY as a key column type.
  • Extends Arrow conversion logic and updates user/docs + tests for primitive, nullable, empty, and nested arrays.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
website/docs/user-guide/rust/data-types.md Documents ARRAY<T> type mapping and how to construct/read arrays.
website/docs/user-guide/rust/api-reference.md Adds InternalRow::get_array and documents FlussArray API.
crates/fluss/src/row/mod.rs Exposes binary_array module and re-exports FlussArray; adds InternalRow::get_array.
crates/fluss/src/row/field_getter.rs Adds Array field getter + tests.
crates/fluss/src/row/encode/compacted_key_encoder.rs Adds test ensuring array types are rejected in key encoding.
crates/fluss/src/row/datum.rs Adds Datum::Array and Arrow ListBuilder append/conversion logic.
crates/fluss/src/row/compacted/compacted_row_writer.rs Adds write_array (delegates to length-prefixed bytes).
crates/fluss/src/row/compacted/compacted_row_reader.rs Makes deserialization fallible and adds array decoding via FlussArray::from_bytes.
crates/fluss/src/row/compacted/compacted_row.rs Propagates fallible deserialization and adds InternalRow::get_array + array tests.
crates/fluss/src/row/compacted/compacted_key_writer.rs Explicitly rejects complex key types and adds write_array to the delegated writer surface.
crates/fluss/src/row/column.rs Implements ColumnarRow::get_array by converting Arrow ListArrayFlussArray; adds tests.
crates/fluss/src/row/binary_array.rs New Java-compatible binary array implementation + writer + tests.
crates/fluss/src/row/binary/binary_writer.rs Adds BinaryWriter::write_array and InnerValueWriter::Array.
crates/fluss/src/record/arrow.rs Adds Arrow List builder support and from_arrow_type mapping for list element conversion.
bindings/cpp/src/types.rs Propagates Datum::Array handling in type resolution and row ownership conversion.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 LGTM overall, left minor comments
If you fixed smth in the meantime - good, sorry for the noise, as I started reviewing before the new code :)

@charlesdong1991
Copy link
Contributor Author

Hi @fresh-borzoni thanks a lot!
It would be great if you can give another round of reviews since i made some changes during your review ^^

Copy link
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Left some comments. PTAL

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 TY, looked through again, left comments
PTAL

(offset, size)
}

pub fn get_string(&self, pos: usize) -> Result<&str> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we return Result in half of the getters and half of the methods go unchecked?
I doubt that Java is the same half-way, pls, check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems java uniformly uncheck.
and now in rust, it is because getter that involve operations that are inherently fallible (e.g. length bound checks etc), but primitive getters only do fixed-size slice conversion.

i agree we shouldn't do half-way, so either make all getters return Result with bounds checking or we also let primitive getters panic on corrupted payload so it matches java

i would prefer to have all getters return Result, wdyt? @fresh-borzoni

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991
Let's use Result for now, if we need to add infallible variants later - it would be non-breaking change.
So it's the safe unblocking way, to make API consistent and leave some room for improvement if needed, just add TODO or comment about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @fresh-borzoni updated, PTAL

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 TY, Looked through, LGTM overall, one nit comment 👍

@charlesdong1991
Copy link
Contributor Author

Thanks both! PTAL @fresh-borzoni @leekeiabstraction @luoyuxia

@leekeiabstraction
Copy link
Contributor

Test case changes looks good to me but diff seems larger, seems to be squashed with is retriable changes from another PR? Not sure if it will conflict. LGTM otherwise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Array data type support in Rust

5 participants