From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 00/12] btrfs: add raid56 support for bs > ps cases
Date: Mon, 17 Nov 2025 18:00:40 +1030
Message-ID: <cover.1763361991.git.wqu@suse.com>
[OVERVIEW]
This series adds the missing raid56 support for the experimental bs > ps
(fs block size larger than page size) support.
The main challenge here is the conflict between RAID56 RMW/recovery and
data checksum.
For RAID56 RMW/recovery, the vertical stripe can only be mapped one page
at a time, as the upper layer can pass bios that are not backed by large
folios (direct IO, encoded read/write/send).
On the other hand, data checksum requires multiple pages at the same
time, e.g. btrfs_calculate_block_csum_pages().
To meet both requirements, introduce a new unit, the step, which is
min(PAGE_SIZE, sectorsize), and make each entry of the paddrs[] arrays in
RAID56 cover one step.
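To make the step definition concrete, here is a minimal userspace sketch
(not code from the series). The helper names step_size() and
steps_per_sector() are made up for illustration; only the
min(PAGE_SIZE, sectorsize) rule comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one step is min(page_size, sectorsize) bytes. */
static uint32_t step_size(uint32_t page_size, uint32_t sectorsize)
{
	return page_size < sectorsize ? page_size : sectorsize;
}

/* Hypothetical helper: how many steps are needed to cover one fs block. */
static uint32_t steps_per_sector(uint32_t page_size, uint32_t sectorsize)
{
	uint32_t step = step_size(page_size, sectorsize);

	/* This sketch assumes the fs block size is a multiple of the step. */
	assert(sectorsize % step == 0);
	return sectorsize / step;
}
```

With a 4K page and an 8K fs block this gives a 4K step and two steps per
fs block; with a 16K page and a 4K fs block it gives a 4K step and one
step per fs block.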
So for vertical stripe related work, reduce the map size from
one sector to one step. For data checksum verification, grab the pointer
from the involved paddrs[] array and pass the sub-array into
btrfs_calculate_block_csum_pages().
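The sub-array idea can be sketched in plain userspace C as below. This is
only an illustration under assumptions: the entries are modelled as byte
pointers instead of physical addresses, toy_block_csum() is a trivial
byte sum standing in for a real checksum, and sector_steps() is a
made-up name, not a function from the series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Toy stand-in for a multi-page checksum routine: it consumes all steps
 * of one fs block in a single call. A plain byte sum, NOT a btrfs csum.
 */
static uint32_t toy_block_csum(const uint8_t *const steps[], size_t nr_steps,
			       size_t step_size)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < nr_steps; i++)
		for (size_t j = 0; j < step_size; j++)
			sum += steps[i][j];
	return sum;
}

/*
 * Hypothetical helper: return the sub-array of per-step entries backing
 * fs block @sector_nr, so that it can be handed over as a whole.
 */
static const uint8_t *const *sector_steps(const uint8_t *const paddrs[],
					  size_t sector_nr,
					  size_t steps_per_sector)
{
	return &paddrs[sector_nr * steps_per_sector];
}
```

The point is that no copy is needed: the checksum side sees a contiguous
slice of the same per-step array that the vertical stripe code walks one
entry at a time.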
So before the patchset, the btrfs_raid_bio paddr pointers look like
this:
  16K page size, 4K fs block size (aka, subpage case)

                       0                   16K  ...
  stripe_pages[]:      |                   |    ...
  stripe_paddrs[]:     0    1    2    3    4    ...
  fs blocks            |<-->|<-->|<-->|<-->|    ...

There is at least one fs block (sector) inside a page, and each
paddrs[] entry represents an fs block 1:1.
The new structure for bs > ps support looks like this:
  4K page size, 8K fs block size

                       0    4K   8K   12K  16K  ...
  stripe_pages[]:      |    |    |    |    |    ...
  stripe_paddrs[]:     0    1    2    3    4    ...
  fs blocks            |<------->|<------->|    ...

Now a paddrs[] entry is no longer mapped 1:1 to an fs block; instead,
multiple paddrs[] entries map to one fs block.
The glue unit between paddrs[] and fs blocks is a step.
One fs block can contain one or more steps, and one step maps to a
paddrs[] entry 1:1.
For bs <= ps cases, one step is the same as an fs block.
For bs > ps case, one step is just a page.
For RAID56, now we need one extra step iteration loop when handling an
fs block.
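As a rough illustration of that extra loop, the sketch below generates
XOR (P) parity for one fs block by walking its steps one by one. It is a
userspace model under assumptions, not the raid56.c code: the flat array
layout (all data stripes first, then P), the byte-pointer entries and the
names paddr_index() and gen_p_for_sector() are all made up for this
example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index helper for a flat, stripe-major array of step entries. */
static size_t paddr_index(size_t stripe_nr, size_t sector_nr, size_t step_nr,
			  size_t sectors_per_stripe, size_t steps_per_sector)
{
	return (stripe_nr * sectors_per_stripe + sector_nr) * steps_per_sector +
	       step_nr;
}

/*
 * Generate XOR parity for fs block @sector_nr of the vertical stripe.
 * Stripes 0..nr_data-1 hold data, stripe nr_data holds P (an assumption
 * of this sketch). Only one step is touched at a time.
 */
static void gen_p_for_sector(uint8_t *paddrs[], size_t nr_data,
			     size_t sector_nr, size_t sectors_per_stripe,
			     size_t steps_per_sector, size_t step_size)
{
	/* The extra loop: iterate over every step inside the fs block. */
	for (size_t step_nr = 0; step_nr < steps_per_sector; step_nr++) {
		uint8_t *p = paddrs[paddr_index(nr_data, sector_nr, step_nr,
						sectors_per_stripe,
						steps_per_sector)];

		/* Start from zeros, then XOR in the same step of each data stripe. */
		memset(p, 0, step_size);
		for (size_t d = 0; d < nr_data; d++) {
			const uint8_t *src = paddrs[paddr_index(d, sector_nr,
						step_nr, sectors_per_stripe,
						steps_per_sector)];

			for (size_t i = 0; i < step_size; i++)
				p[i] ^= src[i];
		}
	}
}
```

For bs <= ps the inner step loop runs exactly once, so the behavior
matches the old one-entry-per-fs-block model; for bs > ps it runs once
per page inside the fs block.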
[TESTING]
I have tested the following combinations:
- bs=4k ps=4k x86_64
- bs=4k ps=64k arm64
  The baseline runs, to ensure this patchset causes no regression for
  the bs == ps and bs < ps cases.
- bs=8k ps=4k x86_64
  The new run for this series.
  The only new failure is related to direct IO read verification, which
  is a known one caused by the lack of direct IO support for bs > ps cases.
I'm afraid that in the long run the combination matrix will grow larger
and larger, and I'm not sure if my environment can handle all the extra
bs/ps combinations.
The long term plan is to test the bs=4k ps=4k, bs=4k ps=64k and
bs=8k ps=4k cases only.
[PATCHSET LAYOUT]
Patch 1 introduces an overview of how the btrfs_raid_bio structure
works.
Patches 2~10 start converting the existing infrastructure to use the
new step based paddr pointers.
Patch 11 enables RAID56 for bs > ps cases, which is still an
experimental feature.
The last patch removes the "_step" infix, which is used as temporary
naming during the work.
[ROADMAP FOR BS > PS SUPPORT]
The remaining feature not yet implemented for bs > ps cases is direct
IO. The needed iomap patch has been submitted through the VFS/iomap tree,
and the btrfs part is a very tiny patch that will be submitted during
the v6.19 cycle.
Qu Wenruo (12):
btrfs: add an overview for the btrfs_raid_bio structure
btrfs: introduce a new parameter to locate a sector
btrfs: prepare generate_pq_vertical() for bs > ps cases
btrfs: prepare recover_vertical() to support bs > ps cases
btrfs: prepare verify_one_sector() to support bs > ps cases
btrfs: prepare verify_bio_data_sectors() to support bs > ps cases
btrfs: prepare set_bio_pages_uptodate() to support bs > ps cases
btrfs: prepare steal_rbio() to support bs > ps cases
btrfs: prepare rbio_bio_add_io_paddr() to support bs > ps cases
btrfs: prepare finish_parity_scrub() to support bs > ps cases
btrfs: enable bs > ps support for raid56
btrfs: remove the "_step" infix
fs/btrfs/disk-io.c | 6 -
fs/btrfs/raid56.c | 711 ++++++++++++++++++++++++++++-----------------
fs/btrfs/raid56.h | 87 ++++++
3 files changed, 535 insertions(+), 269 deletions(-)
--
2.51.2