Local store to persist on disk.

This is a simple, lightweight library for persisting data on disk. It is a key-value store in which the key is an auto-generated long and the only supported value type is a byte array. Data modification and transactions are not supported.

The library is intended for temporary persistence of data blobs so that a service can tolerate network partitions. For example, a hosting service receives data and needs to send it on for further processing. It cannot stop receiving data, and it must acknowledge reception. At the same time, it should tolerate losing the connection to downstream services and be able to retry sending the data once the connection is restored.

Data is written to the store as an append-only operation. Entries can be deleted, and the store reclaims the space automatically at some later point.
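The store-and-forward use case described above can be sketched as follows. This is a minimal illustration, not the library's API: `InMemoryBlobStore` is a hypothetical in-memory stand-in for the on-disk store, the `delete` method name is an assumption, and the downstream service is modeled as a `Predicate` that returns `false` while the connection is down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

/** Hypothetical stand-in for the on-disk store: auto-generated long keys, byte[] values. */
final class InMemoryBlobStore {
    private final AtomicLong nextKey = new AtomicLong(1);
    private final ConcurrentSkipListMap<Long, byte[]> entries = new ConcurrentSkipListMap<>();

    long put(byte[] data) {
        long key = nextKey.getAndIncrement();
        entries.put(key, data.clone());
        return key;
    }

    byte[] get(long key) { return entries.get(key); }

    void delete(long key) { entries.remove(key); } // method name is an assumption

    List<Long> keys() { return new ArrayList<>(entries.keySet()); }
}

final class StoreAndForward {
    private final InMemoryBlobStore store = new InMemoryBlobStore();
    private final Predicate<byte[]> downstream; // returns false while unreachable

    StoreAndForward(Predicate<byte[]> downstream) { this.downstream = downstream; }

    /** Persist first so reception can be acknowledged; forwarding happens later. */
    long receive(byte[] data) { return store.put(data); }

    /** Forwards pending entries in key order and deletes only those delivered. */
    int flush() {
        int delivered = 0;
        for (long key : store.keys()) {
            byte[] data = store.get(key);
            if (data == null) continue;          // already removed
            if (!downstream.test(data)) break;   // still partitioned, retry later
            store.delete(key);
            delivered++;
        }
        return delivered;
    }

    int pending() { return store.keys().size(); }
}
```

Calling `flush()` periodically (for example from a scheduled task) retries delivery; entries stay in the store until the downstream call succeeds.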
The library was tested with 10 threads, each writing 100,000 randomly generated entries of 200KB size on average, and reading and deleting 90,000 entries with no wait between operations. With checkpoints every 5 seconds, the test completes in ~55 seconds (MacBook M1 Pro, 16GB), which translates to ~18K RPS. This rate far exceeds the intended use case of the library: the author has no need to store more than 100 entries per second. Please do your own testing to make sure the library meets your requirements.
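As a rough check of the ~18K figure, assuming it counts write requests only (the text does not say which operations are included):

$$
\frac{10 \times 100{,}000\ \text{writes}}{55\ \text{s}} \approx 18{,}182\ \text{writes/s}
$$

If the 90,000 reads and 90,000 deletes per thread were counted as well, the same 55 seconds would cover about 2.8 million operations, or roughly 51K operations per second.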
The store creates and manages two main types of files for each segment of data:
- Key Files:
  - Naming Convention: `i<firstKey>.dat` (e.g., `i1.dat`, `i129.dat`).
  - Purpose: These files act as an index for a segment. They store metadata about each key, including its status (active or deleted) and an offset pointing to the actual data in the corresponding Value File.
  - Managed by: `com.marqeta.mqpay.store.KeyFile`
- Value Files:
  - Naming Convention: `v<firstKey>.dat` (e.g., `v1.dat`, `v129.dat`).
  - Purpose: These files store the actual `byte[]` data associated with the keys.
  - Managed by: `com.marqeta.mqpay.store.ValueFile`
Both KeyFile and ValueFile extend com.marqeta.mqpay.store.DataFile, which provides common low-level file I/O operations using RandomAccessFile.
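The naming convention above can be sketched with a small helper. `SegmentFileNames` and its methods are hypothetical names introduced for illustration; only the `i<firstKey>.dat` and `v<firstKey>.dat` patterns come from the description.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SegmentFileNames {
    // "i" or "v", then the decimal first key, then ".dat"
    private static final Pattern NAME = Pattern.compile("([iv])(\\d+)\\.dat");

    static String keyFileName(long firstKey) { return "i" + firstKey + ".dat"; }

    static String valueFileName(long firstKey) { return "v" + firstKey + ".dat"; }

    /** Extracts firstKey from a key or value file name, or empty if the name does not match. */
    static OptionalLong firstKeyOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) return OptionalLong.empty();
        try {
            return OptionalLong.of(Long.parseLong(m.group(2)));
        } catch (NumberFormatException e) {
            return OptionalLong.empty(); // digits overflow a long
        }
    }
}
```

A helper like this would let a directory scan pair each key file with its value file by `firstKey`.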
- Key File (`i<firstKey>.dat`) Format:
  - Header (Constant part, written on creation):
    - `0-7` bytes: File Header (long) - Stores version information or flags. Currently, it's `0L` (`KeyFile.HEAD`).
    - `8-15` bytes: Open Time (long) - Timestamp (milliseconds since epoch) when the key file (and thus the segment) was created (`KeyFile.openTime`).
    - `16-23` bytes: Close Time (long) - Timestamp when the segment was closed for new additions. Initially `0L`, updated when the segment becomes full or is explicitly checkpointed while being the current segment (`KeyFile.OFFSET_CLOSETIME`, `KeyFile.closeFileForAddition()`).
    - `24-31` bytes: First Key (long) - The ID of the first key that this segment can store (`KeyFile.firstKey`).
  - Key Map (Bitmap for tombstones, written on creation and updated on checkpoint):
    - `32-33` bytes: Map Size (short) - The length of the subsequent key map byte array (`DataFile.writeBytesWithLength()` used by `KeyFile`).
    - `34 - (34 + Map Size - 1)` bytes: Key Map (byte[]) - A bitmap where each bit corresponds to a key within the segment (relative to `firstKey`). A bit set to `0` means the key is active; `1` means it's deleted (tombstone) (`KeyFile.map`, `KeyFile.checkPoint()`).
  - Offsets (Appended as keys are added):
    - Starting after the Key Map: A sequence of Offsets (int[]). Each `int` (4 bytes) is an offset in the corresponding Value File where the data for a specific key is stored. These are written sequentially as keys are added to the segment (`KeyFile.add()`, `KeyFile.writeInt()`).
- Value File (`v<firstKey>.dat`) Format:
  - This file is a simple sequence of data entries. Each entry consists of:
    - Data Length (short, 2 bytes): The length of the `byte[]` data that follows (`ValueFile.write()` calls `DataFile.writeBytesWithLength()`).
    - Data (byte[]): The actual raw byte data being stored.
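The two layouts above can be sketched in code. This is an illustration under stated assumptions, not the library's implementation: `FileLayoutSketch` and its method names are hypothetical, the bitmap is assumed to use one bit per key rounded up to whole bytes, and the 2-byte length is assumed to be read as a signed `short` (which would cap an entry at 32,767 bytes; an unsigned read would allow 65,535). Byte order is big-endian, which is what `RandomAccessFile` and the default `ByteBuffer` both use.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

final class FileLayoutSketch {
    static final int OFFSET_CLOSETIME = 16; // name appears in the text; the others are illustrative
    static final int OFFSET_FIRSTKEY = 24;
    static final int OFFSET_MAPSIZE = 32;
    static final int OFFSET_MAP = 34;

    /** Bytes needed for a bitmap with one bit per key (assumption). */
    static int mapBytes(int segmentSize) { return (segmentSize + 7) / 8; }

    /** Builds the fixed header plus an all-zero (all keys active) key map. */
    static byte[] initialKeyFileImage(long openTime, long firstKey, int segmentSize) {
        int mapSize = mapBytes(segmentSize);
        ByteBuffer buf = ByteBuffer.allocate(OFFSET_MAP + mapSize);
        buf.putLong(0L);               // bytes 0-7: file header
        buf.putLong(openTime);         // bytes 8-15: open time
        buf.putLong(0L);               // bytes 16-23: close time, 0 while open
        buf.putLong(firstKey);         // bytes 24-31: first key
        buf.putShort((short) mapSize); // bytes 32-33: map size
        return buf.array();            // remaining map bytes are already zero
    }

    /** File position of the 4-byte offset slot that belongs to a key. */
    static long offsetSlotPosition(long key, long firstKey, int segmentSize) {
        return OFFSET_MAP + mapBytes(segmentSize) + 4L * (key - firstKey);
    }

    /** Appends one length-prefixed value entry and returns the offset where it starts. */
    static long appendValue(RandomAccessFile file, byte[] data) throws IOException {
        if (data.length > Short.MAX_VALUE) {
            throw new IllegalArgumentException("entry does not fit a signed 2-byte length");
        }
        long offset = file.length();
        file.seek(offset);
        file.writeShort(data.length);
        file.write(data);
        return offset;
    }

    /** Reads the entry that starts at the given offset. */
    static byte[] readValue(RandomAccessFile file, long offset) throws IOException {
        file.seek(offset);
        byte[] data = new byte[file.readShort()];
        file.readFully(data);
        return data;
    }
}
```

With a segment size of 128, the key map takes 16 bytes, so the first offset slot in the key file sits at byte 50.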
In the context of this codebase, a Segment (represented by the com.marqeta.mqpay.store.Segment class) is a fundamental unit of data storage and management. It encapsulates a pair of files: a KeyFile and a ValueFile.
- Purpose: Each segment is responsible for storing a contiguous range of keys and their associated data. When new data is written, it's typically appended to the "current" or "active" segment.
- Lifecycle:
  - A new segment is created when the `Store` is initialized or when the current segment becomes full (reaches its `segmentSize` limit for keys).
  - Once a segment is full, it's closed for new additions (its `KeyFile`'s `closeTime` is set), and a new segment is created to handle subsequent writes.
  - Older, closed segments are kept for read operations.
- Components:
  - `KeyFile`: Manages the index for the keys within that segment. It stores key metadata, including their status (active/deleted) and the offset of their data in the `ValueFile`.
  - `ValueFile`: Stores the actual `byte[]` data for the keys belonging to that segment.
- Key Management: A segment knows its `firstKey` and `lastKey` (or the maximum key it can hold based on `segmentSize`). This allows the `Store` to quickly identify which segment a particular key belongs to during read operations (`Store.findKeySegment()`).
- Concurrency: The `Segment` class uses locks (`ReentrantLock`) to ensure thread-safe access to its underlying `KeyFile` and `ValueFile` during read and write operations.
This segmented approach allows the system to:
- Manage large amounts of data by breaking it into smaller, more manageable chunks.
- Optimize write performance by typically appending to the end of the current segment's files.
- Facilitate efficient lookups by narrowing down the search for a key to a specific segment.
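One way to narrow a key down to its segment is a sorted map keyed by each segment's first key. The sketch below is an assumption about how such a lookup could work, not a description of `Store.findKeySegment()` itself; `SegmentIndex` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class SegmentIndex {
    // firstKey -> segmentSize; a segment covers keys [firstKey, firstKey + segmentSize)
    private final TreeMap<Long, Integer> sizeByFirstKey = new TreeMap<>();

    void add(long firstKey, int segmentSize) { sizeByFirstKey.put(firstKey, segmentSize); }

    /** Returns the firstKey of the segment whose range holds the key, or -1 if none does. */
    long findFirstKey(long key) {
        // Greatest firstKey that is <= key, found in O(log n)
        Map.Entry<Long, Integer> e = sizeByFirstKey.floorEntry(key);
        if (e == null) return -1;
        long firstKey = e.getKey();
        return key < firstKey + e.getValue() ? firstKey : -1;
    }
}
```

With segments registered at first keys 1 and 129 and a segment size of 128, key 128 resolves to the segment starting at 1 and key 129 to the segment starting at 129.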
This diagram shows the typical flow when Store.put(byte[] data) is called.
sequenceDiagram
participant Client
participant Store
participant CurrentSegment as Segment
participant KeyFile
participant ValueFile
participant OS_FileSystem
Client->>Store: put(data)
Store->>Store: putLock.lock()
alt If currentSegment is not open (e.g., full or new store)
Store->>Store: segmentsLock.writeLock().lock()
Store->>Store: Add old currentSegment to segments map
Store->>CurrentSegment: Segment.create(path, newFirstKey, segmentSize)
CurrentSegment->>ValueFile: ValueFile.create(path, newFirstKey)
ValueFile->>OS_FileSystem: Create v<newFirstKey>.dat file
ValueFile-->>CurrentSegment: ValueFile instance
CurrentSegment->>KeyFile: KeyFile.create(path, segmentSize, newFirstKey)
KeyFile->>OS_FileSystem: Create i<newFirstKey>.dat file
KeyFile->>OS_FileSystem: Write KeyFile header (HEAD, openTime, 0L for closeTime, firstKey)
KeyFile->>OS_FileSystem: Write KeyFile map (initially all 0s, with length prefix)
KeyFile-->>CurrentSegment: KeyFile instance
CurrentSegment-->>Store: New currentSegment instance
Store->>Store: segmentsLock.writeLock().unlock()
end
Store->>CurrentSegment: put(data)
CurrentSegment->>ValueFile: write(data)
ValueFile->>ValueFile: lock()
ValueFile->>OS_FileSystem: Seek to currentOffset
ValueFile->>OS_FileSystem: Write data.length (short)
ValueFile->>OS_FileSystem: Write data (byte[])
ValueFile->>ValueFile: Update currentOffset
ValueFile->>ValueFile: unlock()
ValueFile-->>CurrentSegment: dataOffset (long)
CurrentSegment->>KeyFile: add(dataOffset)
KeyFile->>KeyFile: lock()
KeyFile->>KeyFile: Increment lastKey, count, activeCount
KeyFile->>OS_FileSystem: Append (int)dataOffset to KeyFile
alt If KeyFile is now full
KeyFile->>KeyFile: closeFileForAddition()
KeyFile->>OS_FileSystem: Seek to OFFSET_CLOSETIME
KeyFile->>OS_FileSystem: Write current time (long) as closeTime
end
KeyFile->>KeyFile: unlock()
KeyFile-->>CurrentSegment: newGeneratedKey (long)
CurrentSegment-->>Store: newGeneratedKey
Store->>Store: count.incrementAndGet()
Store->>Store: putLock.unlock()
Store-->>Client: newGeneratedKey
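The put flow in the diagram can be approximated with an in-memory model. This is a sketch under assumptions: `PutFlowSketch` and `MiniSegment` are hypothetical classes, a byte buffer and an `int[]` stand in for the value file and the key file's offsets, and keys are assumed to start at 1. Per-file locks and the `segmentsLock` are left out; only the `putLock` step is modeled.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

final class PutFlowSketch {
    /** In-memory stand-in for one KeyFile/ValueFile pair. */
    static final class MiniSegment {
        final long firstKey;
        final int[] offsets;                                              // role of the key file's offsets
        final ByteArrayOutputStream values = new ByteArrayOutputStream(); // role of the value file
        int count;
        long closeTime;                                                   // 0 while open for additions

        MiniSegment(long firstKey, int segmentSize) {
            this.firstKey = firstKey;
            this.offsets = new int[segmentSize];
        }

        boolean isOpen() { return closeTime == 0L; }

        long put(byte[] data) {
            int dataOffset = values.size();      // append position in the value data
            values.write(data.length >>> 8);     // 2-byte big-endian length prefix
            values.write(data.length);
            values.write(data, 0, data.length);
            offsets[count] = dataOffset;         // the value is written before the key entry
            long key = firstKey + count;
            count++;
            if (count == offsets.length) {
                closeTime = System.currentTimeMillis(); // segment is full, close it
            }
            return key;
        }
    }

    final List<MiniSegment> segments = new ArrayList<>(); // closed segments, kept for reads
    private final ReentrantLock putLock = new ReentrantLock();
    private final int segmentSize;
    private MiniSegment current;
    private long nextFirstKey = 1; // assumption: keys start at 1

    PutFlowSketch(int segmentSize) { this.segmentSize = segmentSize; }

    long put(byte[] data) {
        putLock.lock();
        try {
            if (current == null || !current.isOpen()) {
                if (current != null) segments.add(current); // retire the full segment
                current = new MiniSegment(nextFirstKey, segmentSize);
                nextFirstKey += segmentSize;
            }
            return current.put(data);
        } finally {
            putLock.unlock();
        }
    }
}
```

Writing the value before recording its offset mirrors the order in the diagram: a key only becomes visible once its data is already in place.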
This diagram shows the typical flow when Store.get(long key) is called.
sequenceDiagram
participant Client
participant Store
participant TargetSegment as Segment
participant KeyFile
participant ValueFile
participant OS_FileSystem
Client->>Store: get(key)
Store->>Store: findKeySegment(key)
alt Key is in currentSegment
Store-->>TargetSegment: currentSegment
else Key is in an older segment
Store->>Store: segmentsLock.readLock().lock()
Store->>Store: Lookup SegmentInfo in segments map by key range
Store-->>TargetSegment: Found Segment instance (or null if not found/loaded)
Store->>Store: segmentsLock.readLock().unlock()
end
alt If TargetSegment is null
Store-->>Client: null
else
Store->>TargetSegment: get(key)
TargetSegment->>KeyFile: getOffset(key)
KeyFile->>KeyFile: Calculate index from key
KeyFile->>KeyFile: Check in-memory 'map' if key at index is active
alt If key is inactive or out of range for KeyFile
KeyFile-->>TargetSegment: -1 or -2 (invalid offset)
else Key is active
KeyFile-->>TargetSegment: dataOffset (from in-memory 'offsets' array)
end
alt If dataOffset < 0
TargetSegment-->>Store: null
else
TargetSegment->>ValueFile: read(dataOffset)
ValueFile->>ValueFile: lock()
ValueFile->>OS_FileSystem: Seek to dataOffset
ValueFile->>OS_FileSystem: Read dataLength (short)
ValueFile->>OS_FileSystem: Read data (byte[dataLength])
ValueFile->>ValueFile: unlock()
ValueFile-->>TargetSegment: readData (byte[])
TargetSegment-->>Store: readData
end
Store-->>Client: readData (or null)
end
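The read path inside a single segment can be sketched the same way. `GetFlowSketch` is a hypothetical in-memory model: which of `-1` and `-2` means "deleted" versus "out of range" is an assumption, as is the bit order within each bitmap byte (lowest bit first).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class GetFlowSketch {
    static final int DELETED = -1;      // assignment of -1 and -2 is an assumption
    static final int OUT_OF_RANGE = -2;

    private final long firstKey;
    private final int[] offsets; // in-memory copy of the key file's offsets
    private final int count;     // number of keys added so far
    private final byte[] map;    // tombstone bitmap: bit 1 means deleted
    private final byte[] values; // contents of the value file

    GetFlowSketch(long firstKey, int[] offsets, int count, byte[] map, byte[] values) {
        this.firstKey = firstKey;
        this.offsets = offsets;
        this.count = count;
        this.map = map;
        this.values = values;
    }

    /** Returns the value offset for a key, or a negative code if it cannot be read. */
    int getOffset(long key) {
        long index = key - firstKey;
        if (index < 0 || index >= count) return OUT_OF_RANGE;
        int i = (int) index;
        boolean deleted = (map[i >>> 3] & (1 << (i & 7))) != 0;
        return deleted ? DELETED : offsets[i];
    }

    /** Returns the stored bytes, or null when the key is deleted or unknown. */
    byte[] get(long key) {
        int dataOffset = getOffset(key);
        if (dataOffset < 0) return null;
        int length = ByteBuffer.wrap(values).getShort(dataOffset); // 2-byte length prefix
        return Arrays.copyOfRange(values, dataOffset + 2, dataOffset + 2 + length);
    }
}
```

Because the tombstone check and the offset lookup both use in-memory structures, only the final value read would touch the disk in a file-backed version.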