Refine and relocate core_ondisk documentation #17
Signed-off-By: Yifan Zhao <yifan.yfzhao@foxmail.com>
Pull request overview
This PR relocates and expands the EROFS core on-disk format documentation into a dedicated ondisk/ section, updating navigation accordingly.
Changes:
- Add a new `src/ondisk/index.md` landing page for on-disk format docs.
- Move/rewrite the core on-disk format documentation to `src/ondisk/core_ondisk.md`.
- Update the `design.md` toctree to point to the new on-disk documentation entry point and remove the old root-level `src/core_ondisk.md`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/ondisk/index.md | New on-disk documentation index page and toctree entry. |
| src/ondisk/core_ondisk.md | New/relocated core on-disk format spec with detailed structure descriptions. |
| src/design.md | Updates navigation to reference ondisk/index instead of the old core_ondisk.md. |
| src/core_ondisk.md | Removes the previous top-level core on-disk format document. |
| ```{toctree} | ||
| :hidden: | ||
| core_ondisk.md | ||
| ondisk/index | ||
| merging.md |
The design toctree now points to ondisk/index, but src/merging.md still links to #on_disk_superblock, which only worked when the superblock section was in the same page. Please update that link to a cross-document reference (e.g., a {ref} to the on_disk_superblock label, or a doc link to ondisk/core_ondisk with the anchor) so it doesn't break after this relocation.
False positive: the link still works in src/merging.md.
src/ondisk/index.md
Outdated
| # EROFS On-Disk Format | ||
|
|
||
| EROFS uses a compact, block-aligned on-disk layout that is deliberately kept as | ||
| minimal as possible to maximise runtime performance and simplify implementation. |
Spelling/style consistency: most of the docs use American English (e.g., "maximize" in src/design.md). Consider changing "maximise" here to "maximize" to match the rest of the documentation.
| minimal as possible to maximise runtime performance and simplify implementation. | |
| minimal as possible to maximize runtime performance and simplify implementation. |
I agree with this part; I tend to use AE spelling.
src/ondisk/core_ondisk.md
Outdated
| to utilise unless `namehash` is also introduced, but that complicates the | ||
| overall implementation and expands directory sizes. | ||
|
|
||
| - It is harder to keep filenames and directory entries in the same directory | ||
| block (especially _large directories_) to minimise I/O amplification. |
Spelling/style consistency: elsewhere in the docs (e.g., src/design.md) uses American English ("utilize", "minimize"). Consider changing "utilise"/"minimise" in this note to "utilize"/"minimize" for consistency.
| to utilise unless `namehash` is also introduced, but that complicates the | |
| overall implementation and expands directory sizes. | |
| - It is harder to keep filenames and directory entries in the same directory | |
| block (especially _large directories_) to minimise I/O amplification. | |
| to utilize unless `namehash` is also introduced, but that complicates the | |
| overall implementation and expands directory sizes. | |
| - It is harder to keep filenames and directory entries in the same directory | |
| block (especially _large directories_) to minimize I/O amplification. |
53eef02 to e77313d
src/ondisk/index.md
Outdated
| (erofs_ondisk_format)= | ||
| # EROFS On-Disk Format | ||
|
|
||
| EROFS uses a compact, block-aligned on-disk layout that is deliberately kept as |
EROFS uses a flexible, hierarchical, block-aligned on-disk layout that is built with the following goals:
- DMA- and mmap-friendly, block-aligned data to maximize runtime performance on all kinds of storage devices;
- A simple core on-disk format that is easy to parse and, unlike other generic filesystems, has zero unnecessary metadata redundancy for archive use, making it ideal for data auditing and accessing remote untrusted data;
- Advanced on-disk features like compression (compressed inodes and metadata compression) are completely optional and aren’t mixed with the core design: you can use them only when needed.
src/ondisk/index.md
Outdated
| The entire filesystem tree is built from just three core on-disk structures: | ||
|
|
||
| - **Superblock** — located at a fixed offset of 1024 bytes; the sole | ||
| structure at a fixed position in the image. |
- located at a fixed offset of 1024 bytes; the only structure at a fixed position in the filesystem.
src/ondisk/index.md
Outdated
|
|
||
| - **Superblock** — located at a fixed offset of 1024 bytes; the sole | ||
| structure at a fixed position in the image. | ||
| - **Compact/Extended inodes** — one record per file, device, |
one record per file -> per regular file
src/ondisk/index.md
Outdated
| - **Compact/Extended inodes** — one record per file, device, | ||
| symlink, or directory; addressed in O(1) time via a simple NID-to-offset formula. | ||
| - **Directory entries** — 12-byte records, sorted lexicographically | ||
| within each directory block. |
Directory entries — 12-byte records, sorted lexicographically by filename at the beginning of each directory block (each data block of a directory inode).
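The "simple NID-to-offset formula" mentioned in the hunk above is not spelled out in this thread. As a rough sketch only, assuming that NIDs count 32-byte slots (the compact inode size) from the start of the metadata area at block `meta_blkaddr`, the lookup might look like this; `inode_offset` and `INODE_SLOT_SIZE` are hypothetical names used for illustration:

```python
INODE_SLOT_SIZE = 32  # assumption: one NID step equals one 32-byte inode slot


def inode_offset(nid: int, meta_blkaddr: int, blkszbits: int) -> int:
    """Return the absolute byte offset of the inode with the given NID."""
    # Byte offset where the metadata area begins (block address * block size).
    meta_start = meta_blkaddr << blkszbits
    # Constant-time lookup: no table walk, just one multiply and one add.
    return meta_start + nid * INODE_SLOT_SIZE
```

Under these assumptions, NID 36 with `meta_blkaddr` 0 lands at byte 1152, which is directly after a 128-byte superblock stored at offset 1024.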
src/ondisk/core_ondisk.md
Outdated
| @@ -1,4 +1,4 @@ | |||
| # Core on-disk format | |||
| # Core On-Disk Format | |||
updated, also in index.md
src/ondisk/core_ondisk.md
Outdated
| An EROFS image conforms to the core on-disk format if and only if **all** of the | ||
| following conditions are met: | ||
|
|
||
| 1. The `compression_enable` field (offset 0x54, 2 bytes) in the superblock is **0**. |
src/ondisk/core_ondisk.md
Outdated
| > For example, when `blkszbits` is 12 (block size is 4KiB): | ||
| The EROFS superblock is located at a fixed absolute offset of **1024 bytes**. | ||
| Its base size is 128 bytes. When `sb_extslots` is non-zero, the total superblock | ||
| size is `128 + sb_extslots × 16` bytes. The first 1024 bytes are currently unused, |
Honestly, I don't like non-ASCII characters like ×; use * instead.
updated, also in other places
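For reference, the size rule quoted in the hunk above (`128 + sb_extslots * 16` bytes, starting at absolute offset 1024) can be sketched as a small helper. The function and constant names are hypothetical and chosen only for illustration:

```python
SUPERBLOCK_OFFSET = 1024     # fixed absolute offset of the superblock
SUPERBLOCK_BASE_SIZE = 128   # base size in bytes
EXTSLOT_SIZE = 16            # size of one superblock extension slot


def superblock_extent(sb_extslots: int) -> tuple[int, int]:
    """Return the (start, end) byte range occupied by the superblock."""
    size = SUPERBLOCK_BASE_SIZE + sb_extslots * EXTSLOT_SIZE
    return SUPERBLOCK_OFFSET, SUPERBLOCK_OFFSET + size
```

With `sb_extslots` equal to 0 the superblock covers bytes 1024 up to, but not including, 1152.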
src/ondisk/core_ondisk.md
Outdated
| | 0x0C | 1 | `u8` | `blkszbits` | Block size = `2^blkszbits`; minimum 9 | | ||
| | 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots | | ||
| | 0x0E | 2 | `u16` | `rootnid_2b` | Root directory NID (16-bit); see {ref}`root-nid-encoding` | | ||
| | 0x0E | 2 | `u16` | `blocks_hi` | High 16 bits of total block count; see {ref}`block-count-encoding` | |
That is not part of the core on-disk format; all 48-bit extensions need to be documented separately.
src/ondisk/core_ondisk.md
Outdated
| The superblock contains three timestamp-related fields: | ||
|
|
||
| - `epoch`: the absolute Unix timestamp used as the counting base point. Compact | ||
| inodes store `mtime` as a 32-bit offset relative to `epoch` rather than an |
This one is only valid if the 48-bit feature is on.
src/ondisk/core_ondisk.md
Outdated
| indicates the total number of directory entries in this directory block. | ||
| - For all entries except the last: `nameoff[i+1] − nameoff[i]`. | ||
| - For the last entry in the block: `block_end − nameoff[last]`, where `block_end` | ||
| is the first byte past the block. Any bytes between the end of the last filename |
There is no such restriction: the trailing filename is terminated by a '\0', by the end of the block, or by the end of the directory inode.
It's up to mkfs to decide how to deal with the remaining bytes, but kernels or any compatible parser won't read them.
Will remove the following statement:
Any bytes between the end of the last filename and `block_end` must be filled with `0x00`.
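The name-length rules discussed in this thread can be illustrated with a small parser sketch. It assumes a little-endian 12-byte record laid out as `u64 nid, u16 nameoff, u8 file_type, u8 reserved`, and that the entry count equals the first entry's `nameoff` divided by 12; both are assumptions for illustration, since neither is spelled out in the hunks quoted here. `list_names` is a hypothetical helper, and it performs no validation of malformed blocks:

```python
import struct

DIRENT_SIZE = 12  # 12-byte directory entry records


def list_names(block: bytes) -> list[tuple[int, bytes]]:
    """Return (nid, name) pairs from one directory block."""
    if len(block) < DIRENT_SIZE:
        return []
    # Assumed record layout: u64 nid, u16 nameoff, u8 file_type, u8 reserved.
    first_nameoff = struct.unpack_from("<H", block, 8)[0]
    # Assumption: names start right after the entry array, so the first
    # nameoff also tells us how many entries the block holds.
    count = first_nameoff // DIRENT_SIZE
    entries = [struct.unpack_from("<QHBB", block, i * DIRENT_SIZE)
               for i in range(count)]
    result = []
    for i, (nid, nameoff, _file_type, _reserved) in enumerate(entries):
        if i + 1 < count:
            # Every entry but the last ends where the next name begins.
            name = block[nameoff:entries[i + 1][1]]
        else:
            # The last name ends at a '\0' or at the end of the data,
            # whichever comes first; trailing bytes are never interpreted.
            name = block[nameoff:].split(b"\0", 1)[0]
        result.append((nid, name))
    return result
```

Passing only the valid bytes of the final block (rather than a full block) covers the "end of the directory inode" case, because the slice simply stops at the end of the buffer.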
Signed-off-by: Yifan Zhao <yifan.yfzhao@foxmail.com>
e77313d to 3e10646
| | 0x08 | 4 | `u32` | `feature_compat` | Compatible feature flags; see {ref}`feature-flags` | | ||
| | 0x0C | 1 | `u8` | `blkszbits` | Block size = `2^blkszbits`; minimum 9 | | ||
| | 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots | | ||
| | 0x0E | 2 | `u16` | `rootnid` | Root directory NID | |
Use the new name `rootnid_2b` instead.
| | 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots | | ||
| | 0x0E | 2 | `u16` | `rootnid` | Root directory NID | | ||
| | 0x10 | 8 | `u64` | `inos` | Total valid inode count | | ||
| | 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch | |
Use the new name `epoch` instead.
| | 0x0E | 2 | `u16` | `rootnid` | Root directory NID | | ||
| | 0x10 | 8 | `u64` | `inos` | Total valid inode count | | ||
| | 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch | | ||
| | 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` | |
I think it needs a section to introduce `build_time` usage and how to derive the timestamp of compact inodes for the core on-disk format.
| | 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch | | ||
| | 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` | | ||
| | 0x24 | 4 | `u32` | `blocks` | Total filesystem block count | | ||
| | 0x28 | 4 | `u32` | `meta_blkaddr` | Start block address of the metadata area | |
"Start block address to specify the inode-metadata zone"; you could revise the grammar a little bit.
| | 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` | | ||
| | 0x24 | 4 | `u32` | `blocks` | Total filesystem block count | | ||
| | 0x28 | 4 | `u32` | `meta_blkaddr` | Start block address of the metadata area | | ||
| | 0x2C | 4 | `u32` | `reserved` | Feature-specific; not described in core format | |
Maybe `xattr_blkaddr` can be shown here, since xattrs are not far removed from the core on-disk format.
| |-------|-------|-------------| | ||
| | 0 | 1 | Inode version: 0 = compact (32-byte), 1 = extended (64-byte) | | ||
| | 1–3 | 3 | Data layout: values 0–4 are defined; 5–7 are reserved. See {ref}`inode_data_layouts` | | ||
| | 4 | 1 | `EROFS_I_NLINK_1_BIT` (non-directory compact inodes) / `EROFS_I_DOT_OMITTED_BIT` (directory inodes) | |
They are not part of the core on-disk format (since they are supported together with the 48-bit feature); the compact inode always records nlink in the core format.
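Restricting the bit table to what the core format uses (bit 0 for the inode version, bits 1-3 for the data layout), a decoding sketch might look like the following. The parameter name `i_format` is an assumption for the 16-bit field this table describes, and `decode_i_format` is a hypothetical helper:

```python
def decode_i_format(i_format: int) -> tuple[bool, int]:
    """Return (is_extended, data_layout) from the low four bits."""
    is_extended = bool(i_format & 0x1)   # bit 0: 0 = compact, 1 = extended
    data_layout = (i_format >> 1) & 0x7  # bits 1-3: data layout
    if data_layout > 4:
        # Values 5-7 are reserved according to the table.
        raise ValueError(f"reserved data layout {data_layout}")
    return is_extended, data_layout
```

Bit 4 is deliberately ignored here, in line with the comment above that it does not belong to the core format.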
|
|
||
| | Name | Applicable when | Description | | ||
| |--------------------|-----------------|-------------| | ||
| | `i_nb.nlink` | `EROFS_I_NLINK_1_BIT` unset (non-directory compact inodes) | Hard link count | |
This is reserved in the core format and is always 0.
| ## Directories | ||
|
|
||
| All on-disk directories are organized in the form of **directory blocks** of size | ||
| `2^(blkszbits + dirblkbits)` (currently `dirblkbits` is always 0). |
dirblkbits is strictly 0 for now.
|
|
||
| Each directory block is divided into two contiguous regions: | ||
|
|
||
| 1. A fixed-size array of directory entry records at the start of the block. |
An array of fixed-size directory entry records at the start of the block.
| 2. Variable-length filename strings packed at the end of the block, growing towards | ||
| the entry array. | ||
|
|
||
| The `nameoff` field of the **first** entry in a block encodes the total number of |