GH-3404: fix CI TestByteBitPacking512VectorLE.unpackValuesUsingVector randomly oom#3405
GH-3404: fix CI TestByteBitPacking512VectorLE.unpackValuesUsingVector randomly oom#3405wgtmac merged 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate CI flakiness (random OutOfMemoryError) in TestByteBitPacking512VectorLE.unpackValuesUsingVector by reducing peak memory usage during the test.
Changes:
- Refactors test input generation from eagerly-built
List<int[]>to a lazily-producedStream<int[]>. - Reuses a single
int[] outputbuffer (instead of multiple output arrays) across the different unpacking paths. - Minor loop formatting cleanup in the vector unpack methods.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int pack8Count = intInput.length / 8; | ||
| int byteOutputSize = pack8Count * bitWidth; | ||
| byte[] byteOutput = new byte[byteOutputSize]; | ||
| int[] output = new int[intInput.length]; |
There was a problem hiding this comment.
This test still allocates extremely large arrays for higher bit widths (e.g., when intInput.length ~= 268,435,520, you allocate intInput + output (~2 GB total) plus byteOutput up to ~1 GB). Even with lazy generation and reusing a single output array, a single iteration can require ~3 GB of contiguous heap and can still OOM under typical Surefire heaps.
Consider refactoring the test to avoid materializing the full input/output/packed buffers at once (e.g., generate/pack/unpack/verify in smaller fixed-size chunks, or lower itemMax for this test while keeping representative coverage).
...src/test/java/org/apache/parquet/column/values/bitpacking/TestByteBitPacking512VectorLE.java
Outdated
Show resolved
Hide resolved
|
cc @dossett |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| unpack8Values(bitWidth, byteOutput, output); | ||
| assertArrayEquals(intInput, output); | ||
| Arrays.fill(output, 0); | ||
|
|
||
| ByteBuffer byteBuffer = ByteBuffer.wrap(byteOutput); | ||
| unpackValuesUsingVectorByteBuffer(bitWidth, byteBuffer, output3); | ||
| unpackValuesUsingVectorArray(bitWidth, byteOutput, output); | ||
| assertArrayEquals(intInput, output); | ||
| Arrays.fill(output, 0); | ||
|
|
There was a problem hiding this comment.
Arrays.fill(output, 0) after each assert is an O(n) operation over the entire output array and adds significant runtime (especially for large chunks). Since each unpack method writes all positions it is responsible for, clearing should be unnecessary if output is sized to intInput.length; if you still need clearing, limit it to the written range (e.g., 0..intInput.length).
Rationale for this change
Fix CI TestByteBitPacking512VectorLE.unpackValuesUsingVector randomly oom by lazily initializing data and reusing temporary arrays.
What changes are included in this PR?
Refactor
TestByteBitPacking512VectorLE.unpackValuesUsingVectorwithout modifying its logic.Are these changes tested?
Yes.
Are there any user-facing changes?
No.
Closes #3404