Refine and relocate core_ondisk documentation #17

Open
SToPire wants to merge 2 commits into erofs:main from SToPire:to-be-merge-core

@SToPire SToPire commented Mar 3, 2026

No description provided.

Signed-off-By: Yifan Zhao <yifan.yfzhao@foxmail.com>
Copilot AI review requested due to automatic review settings March 3, 2026 12:22
Copilot AI left a comment

Pull request overview

This PR relocates and expands the EROFS core on-disk format documentation into a dedicated ondisk/ section, updating navigation accordingly.

Changes:

  • Add a new src/ondisk/index.md landing page for on-disk format docs.
  • Move/rewrite the core on-disk format documentation to src/ondisk/core_ondisk.md.
  • Update the design.md toctree to point to the new on-disk documentation entry point and remove the old root-level src/core_ondisk.md.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
src/ondisk/index.md New on-disk documentation index page and toctree entry.
src/ondisk/core_ondisk.md New/relocated core on-disk format spec with detailed structure descriptions.
src/design.md Updates navigation to reference ondisk/index instead of the old core_ondisk.md.
src/core_ondisk.md Removes the previous top-level core on-disk format document.

Comment on lines 120 to 123
```{toctree}
:hidden:
core_ondisk.md
ondisk/index
merging.md
```

Copilot AI Mar 3, 2026

The design toctree now points to ondisk/index, but src/merging.md still links to #on_disk_superblock, which only worked when the superblock section was in the same page. Please update that link to a cross-document reference (e.g., a {ref} to the on_disk_superblock label, or a doc link to ondisk/core_ondisk with the anchor) so it doesn't break after this relocation.

Member Author

False positive: the link still works in src/merging.md.

# EROFS On-Disk Format

EROFS uses a compact, block-aligned on-disk layout that is deliberately kept as
minimal as possible to maximise runtime performance and simplify implementation.
Copilot AI Mar 3, 2026

Spelling/style consistency: most of the docs use American English (e.g., "maximize" in src/design.md). Consider changing "maximise" here to "maximize" to match the rest of the documentation.

Suggested change
-minimal as possible to maximise runtime performance and simplify implementation.
+minimal as possible to maximize runtime performance and simplify implementation.

Member

I agree with this part; I tend to use American English spelling.

Member Author

updated

Comment on lines +355 to +359
to utilise unless `namehash` is also introduced, but that complicates the
overall implementation and expands directory sizes.

- It is harder to keep filenames and directory entries in the same directory
block (especially _large directories_) to minimise I/O amplification.
Copilot AI Mar 3, 2026

Spelling/style consistency: the rest of the docs (e.g., src/design.md) use American English ("utilize", "minimize"). Consider changing "utilise"/"minimise" in this note to "utilize"/"minimize" for consistency.

Suggested change
-to utilise unless `namehash` is also introduced, but that complicates the
-overall implementation and expands directory sizes.
-- It is harder to keep filenames and directory entries in the same directory
-block (especially _large directories_) to minimise I/O amplification.
+to utilize unless `namehash` is also introduced, but that complicates the
+overall implementation and expands directory sizes.
+- It is harder to keep filenames and directory entries in the same directory
+block (especially _large directories_) to minimize I/O amplification.

Member Author

updated

@SToPire SToPire force-pushed the to-be-merge-core branch from 53eef02 to e77313d on March 4, 2026 02:06
(erofs_ondisk_format)=
# EROFS On-Disk Format

EROFS uses a compact, block-aligned on-disk layout that is deliberately kept as
Member

@hsiangkao hsiangkao Mar 11, 2026

EROFS uses a flexible, hierarchical, block-aligned on-disk layout that is built with the following goals:

  • DMA- and mmap-friendly, block-aligned data to maximize runtime performance on all kinds of storage devices;
  • A simple core on-disk format that is easy to parse and, unlike other generic filesystems, has zero unnecessary metadata redundancy for archive use, making it ideal for data auditing and accessing remote untrusted data;
  • Advanced on-disk features like compression (compressed inodes and metadata compression) are completely optional and aren’t mixed with the core design: you can use them only when needed.

Member Author

updated.

The entire filesystem tree is built from just three core on-disk structures:

- **Superblock** — located at a fixed offset of 1024 bytes; the sole
structure at a fixed position in the image.
Member

  • located at a fixed offset of 1024 bytes; the only structure at a fixed position in the filesystem.

Member Author

updated.


- **Superblock** — located at a fixed offset of 1024 bytes; the sole
structure at a fixed position in the image.
- **Compact/Extended inodes** — one record per file, device,
Member

one record per file -> per regular file

Member Author

updated.

- **Compact/Extended inodes** — one record per file, device,
symlink, or directory; addressed in O(1) time via a simple NID-to-offset formula.
- **Directory entries** — 12-byte records, sorted lexicographically
within each directory block.
Member

Directory entries — 12-byte records, sorted lexicographically by filename at the beginning of each directory block (each data block of a directory inode).

Member Author

updated.
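
The inode bullet quoted above mentions an O(1) NID-to-offset formula without spelling it out. A minimal sketch, assuming each NID addresses a 32-byte slot counted from the start of the metadata area at block `meta_blkaddr` (the slot size and the helper name `nid_to_offset` are assumptions for illustration, not text from this PR):

```python
# Assumed slot size: one NID step equals the 32-byte compact inode size.
EROFS_INODE_SLOT_SIZE = 32


def nid_to_offset(nid: int, meta_blkaddr: int, blkszbits: int) -> int:
    """Return the absolute byte offset of the inode with the given NID."""
    meta_start = meta_blkaddr << blkszbits  # byte offset of the metadata area
    return meta_start + nid * EROFS_INODE_SLOT_SIZE


# With 4 KiB blocks (blkszbits = 12) and meta_blkaddr = 1, NID 36 lands at
# 4096 + 36 * 32 bytes from the start of the image.
print(nid_to_offset(36, meta_blkaddr=1, blkszbits=12))
```

No table lookup is involved, which is why the addressing is constant-time.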

@@ -1,4 +1,4 @@
-# Core on-disk format
+# Core On-Disk Format
Member

Core On-disk Format?

Member Author

@SToPire SToPire Mar 13, 2026

updated, also in index.md

An EROFS image conforms to the core on-disk format if and only if **all** of the
following conditions are met:

1. The `compression_enable` field (offset 0x54, 2 bytes) in the superblock is **0**.
Member

is_compressed

Member Author

updated.

> For example, when `blkszbits` is 12 (block size is 4KiB):
The EROFS superblock is located at a fixed absolute offset of **1024 bytes**.
Its base size is 128 bytes. When `sb_extslots` is non-zero, the total superblock
size is `128 + sb_extslots × 16` bytes. The first 1024 bytes are currently unused,
Member

Honestly, I don't like non-ASCII characters like ×; use * instead.

Member Author

updated, also in other places
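
The size rule in the quoted paragraph (`128 + sb_extslots * 16`, starting at byte 1024) can be sketched as follows. The constant and function names are hypothetical; the numbers come from the quoted text and from the `blkszbits` row of the superblock table.

```python
EROFS_SUPER_OFFSET = 1024   # fixed absolute offset of the superblock
EROFS_SB_BASE_SIZE = 128    # base superblock size in bytes
EROFS_SB_EXTSLOT_SIZE = 16  # size of one extension slot in bytes


def superblock_size(sb_extslots: int) -> int:
    """Total superblock size in bytes, including extension slots."""
    return EROFS_SB_BASE_SIZE + sb_extslots * EROFS_SB_EXTSLOT_SIZE


def superblock_span(sb_extslots: int) -> tuple:
    """Half-open byte range [start, end) that the superblock occupies."""
    start = EROFS_SUPER_OFFSET
    return start, start + superblock_size(sb_extslots)


def block_size(blkszbits: int) -> int:
    """Block size in bytes; the table gives 9 as the minimum blkszbits."""
    if blkszbits < 9:
        raise ValueError("blkszbits must be at least 9")
    return 1 << blkszbits


print(superblock_span(0), superblock_span(2), block_size(12))
```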

| 0x0C | 1 | `u8` | `blkszbits` | Block size = `2^blkszbits`; minimum 9 |
| 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots |
| 0x0E | 2 | `u16` | `rootnid_2b` | Root directory NID (16-bit); see {ref}`root-nid-encoding` |
| 0x0E | 2 | `u16` | `blocks_hi` | High 16 bits of total block count; see {ref}`block-count-encoding` |
Member

@hsiangkao hsiangkao Mar 11, 2026

That is not part of the core on-disk format; all 48-bit extensions need to be documented separately.

Member Author

updated

The superblock contains three timestamp-related fields:

- `epoch`: the absolute Unix timestamp used as the counting base point. Compact
inodes store `mtime` as a 32-bit offset relative to `epoch` rather than an
Member

This one is only valid if the 48-bit feature is on.

Member Author

section removed

indicates the total number of directory entries in this directory block.
- For all entries except the last: `nameoff[i+1] − nameoff[i]`.
- For the last entry in the block: `block_end − nameoff[last]`, where `block_end`
is the first byte past the block. Any bytes between the end of the last filename
Member

There is no such restriction: the trailing filename is terminated by a '\0', by the end of the block, or by the end of the directory inode.

It's up to mkfs to decide how to deal with the remaining bytes, but kernels and other compatible parsers won't read them.

Member Author

will remove the following statement:

Any bytes between the end of the last filename and `block_end` must be filled with `0x00`.
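
The name-length rules discussed in this thread can be sketched as a small directory-block parser. It takes the entry count from the first entry's `nameoff` divided by the 12-byte record size, and it ends the last name at the first '\0' or at the end of the block, following the review comment above. The record field order (`nid` as u64, `nameoff` as u16, `file_type` as u8, one reserved byte, little-endian) is an assumption drawn from the kernel's on-disk header, not from the quoted text; `parse_dir_block` is a hypothetical helper.

```python
import struct

# Assumed 12-byte record: nid (u64), nameoff (u16), file_type (u8), 1 pad byte.
DIRENT = struct.Struct("<QHBx")


def parse_dir_block(block: bytes) -> list:
    """Return (nid, file_type, name) tuples for one directory block."""
    if len(block) < DIRENT.size:
        raise ValueError("block too small for a single entry")
    # nameoff of the first entry marks where names begin, hence the count.
    first_nameoff = DIRENT.unpack_from(block, 0)[1]
    count = first_nameoff // DIRENT.size
    if count == 0 or first_nameoff > len(block):
        raise ValueError("invalid nameoff in first entry")
    entries = [DIRENT.unpack_from(block, i * DIRENT.size) for i in range(count)]
    result = []
    for i, (nid, nameoff, file_type) in enumerate(entries):
        is_last = i + 1 == count
        end = len(block) if is_last else entries[i + 1][1]
        if not nameoff <= end <= len(block):
            raise ValueError("name offsets out of order or out of range")
        name = block[nameoff:end]
        if is_last:
            # The trailing name stops at the first NUL, if any.
            name = name.split(b"\0", 1)[0]
        result.append((nid, file_type, name))
    return result


demo = DIRENT.pack(1, 24, 2) + DIRENT.pack(2, 25, 1) + b"abc"
print(parse_dir_block(demo.ljust(64, b"\0")))
```

Passing a final block truncated to the directory inode's size covers the third case, where the last name ends at the end of the inode.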

Signed-off-by: Yifan Zhao <yifan.yfzhao@foxmail.com>
Member

@hsiangkao hsiangkao left a comment

@SToPire please revise again, thanks!

| 0x08 | 4 | `u32` | `feature_compat` | Compatible feature flags; see {ref}`feature-flags` |
| 0x0C | 1 | `u8` | `blkszbits` | Block size = `2^blkszbits`; minimum 9 |
| 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots |
| 0x0E | 2 | `u16` | `rootnid` | Root directory NID |
Member

Use the new name `rootnid_2b` instead.

| 0x0D | 1 | `u8` | `sb_extslots` | Number of 16-byte superblock extension slots |
| 0x0E | 2 | `u16` | `rootnid` | Root directory NID |
| 0x10 | 8 | `u64` | `inos` | Total valid inode count |
| 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch |
Member

Use the new name epoch instead.

| 0x0E | 2 | `u16` | `rootnid` | Root directory NID |
| 0x10 | 8 | `u64` | `inos` | Total valid inode count |
| 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch |
| 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` |
Member

I think it needs a section to introduce build_time usage and how to derive the timestamp of compact inodes for core on-disk format.

| 0x18 | 8 | `u64` | `build_time` | Filesystem creation time, seconds since UNIX epoch |
| 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` |
| 0x24 | 4 | `u32` | `blocks` | Total filesystem block count |
| 0x28 | 4 | `u32` | `meta_blkaddr` | Start block address of the metadata area |
Member

"Start block address to specify the inode-metadata zone." You could revise the grammar a little bit.

| 0x20 | 4 | `u32` | `build_time_nsec` | Nanoseconds component of `build_time` |
| 0x24 | 4 | `u32` | `blocks` | Total filesystem block count |
| 0x28 | 4 | `u32` | `meta_blkaddr` | Start block address of the metadata area |
| 0x2C | 4 | `u32` | `reserved` | Feature-specific; not described in core format |
Member

Maybe xattr_blkaddr can be shown here, since xattrs are not far removed from the core on-disk format.

|-------|-------|-------------|
| 0 | 1 | Inode version: 0 = compact (32-byte), 1 = extended (64-byte) |
| 1–3 | 3 | Data layout: values 0–4 are defined; 5–7 are reserved. See {ref}`inode_data_layouts` |
| 4 | 1 | `EROFS_I_NLINK_1_BIT` (non-directory compact inodes) / `EROFS_I_DOT_OMITTED_BIT` (directory inodes) |
Member

They are not part of the core on-disk format (they are supported only together with the 48-bit feature); in the core format, the compact inode always records nlink.
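
The bit layout in the quoted table can be decoded as sketched below, restricted to the two fields that belong to the core format: bit 0 (inode version) and bits 1-3 (data layout). The name `i_format` for the 16-bit field and the helper names are assumptions; bit 4 is deliberately ignored, in line with the comment above.

```python
def decode_i_format(i_format: int) -> tuple:
    """Split the format field into (version, data_layout)."""
    version = i_format & 0x1             # bit 0: 0 = compact, 1 = extended
    data_layout = (i_format >> 1) & 0x7  # bits 1-3
    if data_layout > 4:
        # Values 5-7 are reserved according to the quoted table.
        raise ValueError("reserved data layout value")
    return version, data_layout


def inode_size(version: int) -> int:
    """On-disk inode size in bytes for the given version bit."""
    return 64 if version else 32


version, layout = decode_i_format(0b0101)
print(version, layout, inode_size(version))
```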


| Name | Applicable when | Description |
|--------------------|-----------------|-------------|
| `i_nb.nlink` | `EROFS_I_NLINK_1_BIT` unset (non-directory compact inodes) | Hard link count |
Member

This is reserved in the core format and is always 0.

## Directories

All on-disk directories are organized in the form of **directory blocks** of size
`2^(blkszbits + dirblkbits)` (currently `dirblkbits` is always 0).
Member

dirblkbits is strictly 0 for now.


Each directory block is divided into two contiguous regions:

1. A fixed-size array of directory entry records at the start of the block.
Member

An array of fixed-size directory entry records at the start of the block.

2. Variable-length filename strings packed at the end of the block, growing towards
the entry array.

The `nameoff` field of the **first** entry in a block encodes the total number of
Member

encodes -> indicates

3 participants