Skip to content

Saving and loading in-memory btrees to disk#820

Merged
backurs merged 8 commits intomainfrom
arturs/in_memory_to_disk
Mar 10, 2026
Merged

Saving and loading in-memory btrees to disk#820
backurs merged 8 commits intomainfrom
arturs/in_memory_to_disk

Conversation

@backurs
Copy link
Contributor

@backurs backurs commented Mar 9, 2026

This PR allows to save and load in-memory (:memory:) indices. Previously, it was possible to save/load only on-disk indices.

It also adds is_memory variable to SavedParams struct so that we know whether the original bf-tree index was in-memory or on-disk.

We also add two tests on whether in-memory saving/loading works both in pq and non-pq settings.

@backurs backurs requested review from a team and Copilot March 9, 2026 22:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends BfTreeProvider persistence to support saving and loading indices that were originally backed by in-memory (:memory:) BfTrees, including both non-quantized and PQ-quantized variants.

Changes:

  • Added snapshot_to_disk(...) helpers on vector/quant/neighbor providers to persist in-memory BfTrees to disk.
  • Extended persisted SavedParams with an is_memory flag and updated load/save flows to round-trip in-memory indices.
  • Added async tests covering save/load for in-memory indices with and without PQ quantization.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs Adds a convenience method to snapshot in-memory vector BfTree to disk.
diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs Adds a convenience method to snapshot in-memory quant-vector BfTree to disk.
diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs Adds a convenience method to snapshot in-memory neighbor-list BfTree to disk.
diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs Persists is_memory, adjusts save/load logic for in-memory backends, and adds new tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@metajack
Copy link
Contributor

metajack commented Mar 9, 2026

This looks fine to me. The repeated pattern of loading and saving all the various providers could maybe be factored into two helper functions? I'm not sure how hard that is, so feel free to do what is reasonable.

@codecov-commenter
Copy link

codecov-commenter commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.96%. Comparing base (4db2797) to head (56ec094).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #820      +/-   ##
==========================================
- Coverage   90.64%   88.96%   -1.69%     
==========================================
  Files         432      442      +10     
  Lines       79629    81906    +2277     
==========================================
+ Hits        72182    72870     +688     
- Misses       7447     9036    +1589     
Flag Coverage Δ
miri 88.96% <ø> (-1.69%) ⬇️
unittests 88.82% <ø> (-1.80%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 52 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@backurs
Copy link
Contributor Author

backurs commented Mar 10, 2026

This looks fine to me. The repeated pattern of loading and saving all the various providers could maybe be factored into two helper functions? I'm not sure how hard that is, so feel free to do what is reasonable.

Good suggestion. For loading I created the following function:

fn load_bftree(
    params: &BfTreeParams,
    snapshot_path: std::path::PathBuf,
    is_memory: bool,
) -> Result<BfTree, ANNError> {
    let config = params.to_config(&snapshot_path, is_memory);
    if is_memory {
        BfTree::new_from_snapshot_disk_to_memory(snapshot_path, config)
            .map_err(|e| ANNError::from(super::ConfigError(e)))
    } else {
        BfTree::new_from_snapshot(config, None)
            .map_err(|e| ANNError::from(super::ConfigError(e)))
    }
}

and I'm calling it in 5 places.

For saving I was able to refactor the code with help of traits but the resulting code was complicated and longer than the original code. So, for saving, perhaps we can leave the current solution.

@backurs backurs merged commit 3cd8ac2 into main Mar 10, 2026
24 checks passed
@backurs backurs deleted the arturs/in_memory_to_disk branch March 10, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants