Skip to content

Add block-transposed multi-vector representation for SIMD-friendly layouts#805

Open
suri-kumkaran wants to merge 10 commits intomainfrom
users/suryangupta/transposed_multivector
Open

Add block-transposed multi-vector representation for SIMD-friendly layouts#805
suri-kumkaran wants to merge 10 commits intomainfrom
users/suryangupta/transposed_multivector

Conversation

@suri-kumkaran
Copy link
Contributor

Summary

Introduces BlockTransposed<T, GROUP, PACK>, a new Repr implementation that stores multi-vectors in a block-transposed layout optimized for SIMD processing. This enables faster multi-vector distance computations by ensuring data is arranged for efficient vectorized access patterns.

Key changes

  • BlockTransposed<T, GROUP, PACK>: Generic block-transposed matrix representation where groups of GROUP rows are stored transposed, with an optional PACK factor for column interleaving (e.g. PACK=2 for vpmaddwd-style instructions).
  • Row proxy types: BlockTransposedRow / BlockTransposedRowMut with indexed access, iteration, and Send/Sync support.
  • Factory methods: new_block_transposed, from_strided, from_matrix_view — generic over any T: Copy + Default.
  • Block-level views: block() / block_mut() / remainder_block() / remainder_block_mut() returning MatrixView / MutMatrixView with dimensions (padded_ncols/PACK, GROUP*PACK) for direct SIMD consumption.
  • Compile-time constraints: GROUP > 0, PACK > 0, GROUP % PACK == 0 enforced at monomorphization.
  • Added tests including parameterized correctness tests across multiple GROUP/PACK combinations, generic element types (f32, i32, u8), bounds-checking panic tests, and Miri-compatible paths.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new multi-vector matrix representation (BlockTransposed<T, GROUP, PACK>) intended to replace/extend the prior kmeans-specific BlockTranspose layout and enable SIMD-friendly access patterns (including optional packed column interleaving).

Changes:

  • Added multi_vector::BlockTransposed representation with row proxy types, block views, constructors, and extensive tests.
  • Migrated kmeans (lloyds, kmeans++) and product-quantization training/pivot codepaths to use Mat<BlockTransposed<...>> instead of BlockTranspose.
  • Removed the old BlockTranspose implementation from algorithms::kmeans::common and stopped re-exporting it from algorithms::kmeans.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
diskann-quantization/src/product/train.rs Switches PQ training to build/use Mat<BlockTransposed<...>> for SIMD-friendly kmeans operations.
diskann-quantization/src/product/tables/transposed/pivots.rs Updates pivot chunk storage to Mat<BlockTransposed<...>> and adjusts group-size/construction calls.
diskann-quantization/src/multi_vector/mod.rs Adds the new block_transposed module and re-exports its public types.
diskann-quantization/src/multi_vector/matrix.rs Makes some internals pub(crate) to support the new representation’s accessors/constructors.
diskann-quantization/src/multi_vector/block_transposed.rs New core implementation of block-transposed (+ packed) representation with row proxies, block views, constructors, and tests.
diskann-quantization/src/algorithms/kmeans/plusplus.rs Replaces BlockTranspose usage with Mat<BlockTransposed<...>> and updates microkernel wiring.
diskann-quantization/src/algorithms/kmeans/mod.rs Removes pub use common::BlockTranspose; export.
diskann-quantization/src/algorithms/kmeans/lloyds.rs Replaces BlockTranspose usage with Mat<BlockTransposed<...>>.
diskann-quantization/src/algorithms/kmeans/common.rs Removes the old BlockTranspose implementation and its tests, leaving square_norm.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov-commenter
Copy link

codecov-commenter commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 97.12644% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.03%. Comparing base (5e0e49d) to head (b305a62).

Files with missing lines Patch % Lines
...-quantization/src/multi_vector/block_transposed.rs 96.86% 30 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #805      +/-   ##
==========================================
+ Coverage   88.96%   89.03%   +0.07%     
==========================================
  Files         442      443       +1     
  Lines       81906    82690     +784     
==========================================
+ Hits        72868    73626     +758     
- Misses       9038     9064      +26     
Flag Coverage Δ
miri 89.03% <97.12%> (+0.07%) ⬆️
unittests 88.89% <97.12%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...skann-quantization/src/algorithms/kmeans/common.rs 96.38% <ø> (-1.11%) ⬇️
...skann-quantization/src/algorithms/kmeans/lloyds.rs 98.14% <100.00%> (+0.01%) ⬆️
...ann-quantization/src/algorithms/kmeans/plusplus.rs 97.31% <100.00%> (+0.02%) ⬆️
diskann-quantization/src/bits/ptr.rs 100.00% <100.00%> (ø)
diskann-quantization/src/minmax/multi/meta.rs 97.66% <100.00%> (ø)
diskann-quantization/src/multi_vector/matrix.rs 95.25% <100.00%> (+0.09%) ⬆️
...antization/src/product/tables/transposed/pivots.rs 97.57% <100.00%> (ø)
diskann-quantization/src/product/train.rs 96.66% <100.00%> (ø)
diskann-quantization/src/utils.rs 96.92% <100.00%> (+0.04%) ⬆️
...-quantization/src/multi_vector/block_transposed.rs 96.86% <96.86%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Suryansh! I've taken a look over and left a few stylistic comments, but there are few larger ones that I will group together here:

  1. We should be very careful with the unchecked arithmetic, particularly in compute_capacity. We honestly probably want two flavors of this function, one that verifies none of the intermediate steps overflow that gets used in the constructor, and one that assumes everything has already been checked (this would be the current compute_capacity function). I would also recommend leaning in to usize::next_multiple_of instead of doing the operation manually.

  2. I'm getting more convinced that we do not want to use Mat/MatRef/MatMut in public APIs. This forces us into an awkward situation where we need to start adding a bunch of inherent methods to these types, which makes method discoverability a little harder and is not a pattern that can be replicated outside of diskann-quantization. Instead, I think it would be cleaner to have thin wrappers:

    struct BlockTransposed<T, const GROUP: usize, const PACK: usize> {
        data: Mat<Repr<T, GROUP, PACK>>,
    }
    
    struct BlockTransposedRef<'a, T, const GROUP: usize, const PACK:    usize> {
        data: MatRef<'a, Repr<T, GROUP, PACK>>,
    }
    
    // etc.

    Inherent methods can be added to these at will and the generated docs for these inherent methods will be clear (since they will not be grouped in with all the other inherent methods). This will also let you safely has as_slice as an inherent method to avoid the manual unsafe construction in tests.

    Also, I'll admit a small preference for using method delegation instead of stamping out a giant macro, something like:

    impl<T, const GROUP: usize, const PACK: usize> BlockTransposed<T, GROUP, PACK> {
        fn block(&self, block: usize) -> MatrixView<'_, T> {
            self.reborrow().block(block)
        }
    }

    But I think that is secondary.

  3. In the tests - we should really have one that calls collect on a Mat::row_iter_mut() and writes to a bunch of rows in like a thread::scoped or something. Basically, something to make sure that working on multiple rows concurrently is fine (and when using Miri - that will add another level of safety).

@suri-kumkaran
Copy link
Contributor Author

suri-kumkaran commented Mar 3, 2026

Thanks Suryansh! I've taken a look over and left a few stylistic comments, but there are few larger ones that I will group together here:

  1. We should be very careful with the unchecked arithmetic, particularly in compute_capacity. We honestly probably want two flavors of this function, one that verifies none of the intermediate steps overflow that gets used in the constructor, and one that assumes everything has already been checked (this would be the current compute_capacity function). I would also recommend leaning in to usize::next_multiple_of instead of doing the operation manually.

  2. I'm getting more convinced that we do not want to use Mat/MatRef/MatMut in public APIs. This forces us into an awkward situation where we need to start adding a bunch of inherent methods to these types, which makes method discoverability a little harder and is not a pattern that can be replicated outside of diskann-quantization. Instead, I think it would be cleaner to have thin wrappers:

    struct BlockTransposed<T, const GROUP: usize, const PACK: usize> {
        data: Mat<Repr<T, GROUP, PACK>>,
    }
    
    struct BlockTransposedRef<'a, T, const GROUP: usize, const PACK:    usize> {
        data: MatRef<'a, Repr<T, GROUP, PACK>>,
    }
    
    // etc.

    Inherent methods can be added to these at will and the generated docs for these inherent methods will be clear (since they will not be grouped in with all the other inherent methods). This will also let you safely has as_slice as an inherent method to avoid the manual unsafe construction in tests.
    Also, I'll admit a small preference for using method delegation instead of stamping out a giant macro, something like:

    impl<T, const GROUP: usize, const PACK: usize> BlockTransposed<T, GROUP, PACK> {
        fn block(&self, block: usize) -> MatrixView<'_, T> {
            self.reborrow().block(block)
        }
    }

    But I think that is secondary.

  3. In the tests - we should really have one that calls collect on a Mat::row_iter_mut() and writes to a bunch of rows in like a thread::scoped or something. Basically, something to make sure that working on multiple rows concurrently is fine (and when using Miri - that will add another level of safety).

Thanks Mark, for the amazing suggestions. I have addressed (1) and (2) and will soon add tests to cover (3). Please help with a follow up review.

05-03-2026 Update: Added tests to cover 3 as well.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@suri-kumkaran suri-kumkaran requested a review from Copilot March 5, 2026 20:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/// pointer suitable for storing inside a [`Mat`](super::multi_vector::matrix::Mat).
///
/// To reclaim the memory later, reconstruct the `Box` via
/// `Box::from_raw(slice_from_raw_parts_mut(ptr, len))`.
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc uses slice_from_raw_parts_mut without a path, which can be confusing in rustdoc (it’s not in the prelude). Consider spelling it as std::ptr::slice_from_raw_parts_mut to make the recovery recipe copy/pasteable as-is.

Suggested change
/// `Box::from_raw(slice_from_raw_parts_mut(ptr, len))`.
/// `Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))`.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @suri-kumkaran for the hard work - we're almost there. I have a few more asks I came across during the review process:

Coverage

A lot of the repeated methods between BlockTransposeMut and BlockTranspose aren't covered by the test, including functions with non-trivial logic like BlockTransposedMut::remainder_block_mut. I've included techniques in this review for simplifying the delegation and deduplicating the mutable method implementations, which should address most of this naturally.

As a rule of thumb: if a method purely delegates to a tested method with no additional logic, it's fine to leave uncovered (though coverage is appreciated). If it contains arithmetic, unsafe, or any branching, it needs coverage.

You can run coverage analysis locally with

cargo llvm-cov nextest --html --package diskann-quantization --cargo-profile ci

Miri and Corner Case Handling

Please run Miri before submitting:

cargo +nightly miri nextest run --package diskann-quantization --cargo-profile ci multi_vector

You'll find that test_row_view_empty triggers undefined behavior because linear_index computes a non-zero offset when ncols == 0. Additionally, there are no empty-row tests for PACK > 1 or for the mutable row views.

Given the amount of unsafe in this change, thorough corner-case testing is essential for confidence. The parameterized tests (test_block_transposed_impl, test_packed_impl) are a good foundation; consider unifying them into a single function parameterized over <T, GROUP, PACK> that systematically exercises the full API surface (construction, row views, mutable row views, block views, element access, OOB returns, padding). That way, each new (T, GROUP, PACK, nrows, ncols) combination automatically gets complete coverage.

@suri-kumkaran suri-kumkaran requested a review from Copilot March 10, 2026 13:06
@suri-kumkaran
Copy link
Contributor Author

Thank you @suri-kumkaran for the hard work - we're almost there. I have a few more asks I came across during the review process:

Coverage

A lot of the repeated methods between BlockTransposeMut and BlockTranspose aren't covered by the test, including functions with non-trivial logic like BlockTransposedMut::remainder_block_mut. I've included techniques in this review for simplifying the delegation and deduplicating the mutable method implementations, which should address most of this naturally.

As a rule of thumb: if a method purely delegates to a tested method with no additional logic, it's fine to leave uncovered (though coverage is appreciated). If it contains arithmetic, unsafe, or any branching, it needs coverage.

You can run coverage analysis locally with

cargo llvm-cov nextest --html --package diskann-quantization --cargo-profile ci

Miri and Corner Case Handling

Please run Miri before submitting:

cargo +nightly miri nextest run --package diskann-quantization --cargo-profile ci multi_vector

You'll find that test_row_view_empty triggers undefined behavior because linear_index computes a non-zero offset when ncols == 0. Additionally, there are no empty-row tests for PACK > 1 or for the mutable row views.

Given the amount of unsafe in this change, thorough corner-case testing is essential for confidence. The parameterized tests (test_block_transposed_impl, test_packed_impl) are a good foundation; consider unifying them into a single function parameterized over <T, GROUP, PACK> that systematically exercises the full API surface (construction, row views, mutable row views, block views, element access, OOB returns, padding). That way, each new (T, GROUP, PACK, nrows, ncols) combination automatically gets complete coverage.

Thanks for the amazing suggestions. I have unified testing, increased coverage and completed local miri testing. Please help with review :)

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the back and forth - sorry this was such a long review process. Let's send it!


#[test]
fn test_api_pack1_group16() {
let row_range = if cfg!(miri) { 127..128 } else { 0..128 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Miri ranges bug me, the only hit one arbitrary "nice" point. Can we make them hit layout boundaries instead? Like 0, 1, GROUP - 1, GROUP, GROPU + 1 rows (and a few multiples there-of) and 0, 1, PACK - 1, PACK, PACK + 1 columns?

It's exactly these kinds of transition zones that can accidentally exercise UB and that Miri will protect agains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants