public inbox for linux-btrfs@vger.kernel.org
* [PATCH 00/12] btrfs: add raid56 support for bs > ps cases
@ 2025-11-17  7:30 Qu Wenruo
From: Qu Wenruo @ 2025-11-17  7:30 UTC
  To: linux-btrfs

[OVERVIEW]
This series adds the missing RAID56 support to the experimental bs > ps
(block size larger than page size) support.

The main challenge here is the conflict between RAID56 RMW/recovery and
data checksum.

For RAID56 RMW/recovery, a vertical stripe can only be mapped one page
at a time, as the upper layer can pass bios that are not backed by large
folios (direct IO, encoded read/write/send).

On the other hand, data checksum calculation needs access to multiple
pages at the same time, e.g. btrfs_calculate_block_csum_pages().

To meet both requirements, introduce a new unit, the step, which is
min(PAGE_SIZE, sectorsize), and make each entry of the paddrs[] arrays
in RAID56 cover one step.
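A minimal sketch of the step arithmetic, assuming only the definition
above (step = min(PAGE_SIZE, sectorsize)). The helper names step_size()
and steps_per_sector() are hypothetical and not taken from the patches;
page size and block size are passed as parameters so both layouts can be
checked:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: step = min(PAGE_SIZE, sectorsize). */
static uint32_t step_size(uint32_t page_size, uint32_t sectorsize)
{
	return page_size < sectorsize ? page_size : sectorsize;
}

/* Hypothetical helper: number of steps (paddrs[] entries) per fs block. */
static uint32_t steps_per_sector(uint32_t page_size, uint32_t sectorsize)
{
	uint32_t step = step_size(page_size, sectorsize);

	/* Both sizes are powers of two, so a step always divides a block. */
	assert(sectorsize % step == 0);
	return sectorsize / step;
}
```

For 16K pages with 4K blocks this gives a 4K step and one step per
block; for 4K pages with 8K blocks it gives a 4K step and two steps per
block.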

So for vertical stripe related work, the mapping unit is reduced from
one sector to one step. For data checksum verification, grab the
pointers from the involved paddrs[] entries and pass that sub-array into
btrfs_calculate_block_csum_pages().
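To illustrate why checksum verification needs every step of a block,
here is a rough sketch that checksums one fs block whose contents are
spread over several non-contiguous step-sized buffers. It does not model
btrfs_calculate_block_csum_pages() (its real signature is not shown
here); csum_block_steps() and fnv1a_update() are hypothetical, and
FNV-1a stands in for the real checksum algorithm purely to keep the
example short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1A_INIT 2166136261u

/*
 * Stand-in checksum: 32-bit FNV-1a. btrfs uses crc32c or another
 * configured algorithm; FNV-1a is only used here because it is short
 * and, like crc32c, can be fed incrementally.
 */
static uint32_t fnv1a_update(uint32_t hash, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		hash ^= buf[i];
		hash *= 16777619u;
	}
	return hash;
}

/*
 * Hypothetical helper: checksum one fs block stored in nr_steps
 * step-sized buffers (the sub-array of paddrs[] for that block).
 * The buffers need not be contiguous in memory.
 */
static uint32_t csum_block_steps(const uint8_t *const *steps,
				 size_t nr_steps, size_t step_size)
{
	uint32_t hash = FNV1A_INIT;

	assert(step_size > 0);
	for (size_t i = 0; i < nr_steps; i++)
		hash = fnv1a_update(hash, steps[i], step_size);
	return hash;
}
```

Feeding the steps in order gives the same result as checksumming the
block as one contiguous buffer, which is why the sub-array must contain
all steps of the block, in order.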

Before this patchset, the btrfs_raid_bio paddr pointers look like
this:

  16K page size, 4K fs block size (aka, subpage case)

                       0                   16K  ...
  stripe_pages[]:      |                   |    ...
  stripe_paddrs[]:     0    1    2    3    4    ...
  fs blocks            |<-->|<-->|<-->|<-->|    ...

  There is at least one fs block (sector) inside a page, and each
  paddrs[] entry represents one fs block 1:1.
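A sketch of how a paddrs[] index could translate into a page plus an
in-page offset, assuming each entry covers one step and the steps are
laid out back to back inside stripe_pages[]. struct page_loc and
paddr_to_page() are hypothetical illustrations, not the btrfs code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical location of one paddrs[] entry. */
struct page_loc {
	uint32_t page_nr;	/* index into stripe_pages[] */
	uint32_t offset;	/* byte offset inside that page */
};

/* Assumes a step never exceeds the page size, true for every bs/ps pair. */
static struct page_loc paddr_to_page(uint32_t paddr_nr, uint32_t step,
				     uint32_t page_size)
{
	uint64_t byte = (uint64_t)paddr_nr * step;
	struct page_loc loc;

	assert(step <= page_size);
	loc.page_nr = (uint32_t)(byte / page_size);
	loc.offset = (uint32_t)(byte % page_size);
	return loc;
}
```

With 16K pages and a 4K step, entries 0 to 3 land at offsets 0, 4K, 8K
and 12K of page 0, and entry 4 starts page 1, matching the diagram.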

With this patchset, the structure for bs > ps support looks like this:

  4K page size, 8K fs block size

                       0    4K   8K   12K  16K  ...
  stripe_pages[]:      |    |    |    |    |     ...
  stripe_paddrs[]:     0    1    2    3    4     ...
  fs blocks            |<------->|<------->|     ...

  Now a paddrs[] entry is no longer mapped 1:1 to an fs block; instead,
  multiple paddrs[] entries map to one fs block.
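Patch 2 introduces a new parameter to locate a sector; its exact form is
not shown in this cover letter. Assuming paddrs[] is laid out stripe by
stripe, then sector by sector, then step by step, the index of one step
could be computed as below. paddr_index() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical index helper: locate the paddrs[] entry for one step of
 * one fs block (sector) in one stripe, under the assumed layout
 * stripe -> sector -> step.
 */
static uint32_t paddr_index(uint32_t stripe_nr, uint32_t sector_nr,
			    uint32_t step_nr, uint32_t sectors_per_stripe,
			    uint32_t steps_per_sector)
{
	assert(sector_nr < sectors_per_stripe);
	assert(step_nr < steps_per_sector);
	return (stripe_nr * sectors_per_stripe + sector_nr) *
	       steps_per_sector + step_nr;
}
```

With a 64K stripe and 8K blocks on 4K pages (8 sectors per stripe,
2 steps per sector), fs block 1 of stripe 0 maps to paddrs[2] and
paddrs[3], as in the diagram above.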

The glue unit between paddrs[] and fs blocks is the step.

One fs block consists of one or more steps, and each step maps 1:1 to a
paddrs[] entry.

For bs <= ps cases, one step is the same as one fs block.
For bs > ps cases, one step is exactly one page.

For RAID56, handling an fs block now needs one extra loop that iterates
over its steps.
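A rough sketch of that extra loop for P parity (plain XOR) generation of
one vertical stripe. gen_p_vertical() and the flat paddrs[] model are
hypothetical: paddrs[] is assumed to be laid out stripe by stripe, then
sector by sector, then step by step, with the P stripe stored right
after the data stripes. RAID6 Q parity and any page mapping are omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: compute P for fs block sector_nr across nr_data stripes. */
static void gen_p_vertical(uint8_t *const *paddrs, uint32_t nr_data,
			   uint32_t sector_nr, uint32_t sectors_per_stripe,
			   uint32_t steps_per_sector, size_t step_size)
{
	assert(sector_nr < sectors_per_stripe);

	/* The new inner loop: one fs block may span several steps. */
	for (uint32_t step = 0; step < steps_per_sector; step++) {
		/* P is the stripe right after the last data stripe. */
		uint32_t p_idx = (nr_data * sectors_per_stripe + sector_nr) *
				 steps_per_sector + step;
		uint8_t *p = paddrs[p_idx];

		memset(p, 0, step_size);
		/* XOR the same step of every data stripe into P. */
		for (uint32_t stripe = 0; stripe < nr_data; stripe++) {
			uint32_t idx = (stripe * sectors_per_stripe + sector_nr) *
				       steps_per_sector + step;

			for (size_t i = 0; i < step_size; i++)
				p[i] ^= paddrs[idx][i];
		}
	}
}
```

Processing one step at a time means each iteration only touches one
page-sized piece per stripe, which fits the one-page-at-a-time mapping
constraint described earlier.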

[TESTING]
I have tested the following combinations:

- bs=4k ps=4k x86_64
- bs=4k ps=64k arm64
  The baseline, to ensure this patchset causes no regression for the
  bs == ps and bs < ps cases.

- bs=8k ps=4k x86_64
  The new combination enabled by this series.

  The only new failure is related to direct IO read verification, which
  is a known failure caused by the lack of direct IO support for bs > ps
  cases.

I'm afraid that in the long run the combination matrix will grow larger
and larger, and I'm not sure my environment can handle all the extra
bs/ps combinations.

The long-term plan is to test only the bs=4k ps=4k, bs=4k ps=64k and
bs=8k ps=4k cases.

[PATCHSET LAYOUT]
Patch 1 adds an overview of how the btrfs_raid_bio structure
works.
Patches 2~10 convert the existing infrastructure to use the new
step-based paddr pointers.
Patch 11 enables RAID56 for bs > ps cases, which is still an
experimental feature.
The last patch removes the "_step" infix, which was used as temporary
naming during the conversion.

[ROADMAP FOR BS > PS SUPPORT]
The remaining feature not yet implemented for bs > ps cases is direct
IO. The needed iomap patch has been submitted through the VFS/iomap
tree, and the btrfs part is a very small patch that will be submitted
during the v6.19 cycle.


Qu Wenruo (12):
  btrfs: add an overview for the btrfs_raid_bio structure
  btrfs: introduce a new parameter to locate a sector
  btrfs: prepare generate_pq_vertical() for bs > ps cases
  btrfs: prepare recover_vertical() to support bs > ps cases
  btrfs: prepare verify_one_sector() to support bs > ps cases
  btrfs: prepare verify_bio_data_sectors() to support bs > ps cases
  btrfs: prepare set_bio_pages_uptodate() to support bs > ps cases
  btrfs: prepare steal_rbio() to support bs > ps cases
  btrfs: prepare rbio_bio_add_io_paddr() to support bs > ps cases
  btrfs: prepare finish_parity_scrub() to support bs > ps cases
  btrfs: enable bs > ps support for raid56
  btrfs: remove the "_step" infix

 fs/btrfs/disk-io.c |   6 -
 fs/btrfs/raid56.c  | 711 ++++++++++++++++++++++++++++-----------------
 fs/btrfs/raid56.h  |  87 ++++++
 3 files changed, 535 insertions(+), 269 deletions(-)

-- 
2.51.2




Thread overview: 17+ messages
2025-11-17  7:30 [PATCH 00/12] btrfs: add raid56 support for bs > ps cases Qu Wenruo
2025-11-17  7:30 ` [PATCH 01/12] btrfs: add an overview for the btrfs_raid_bio structure Qu Wenruo
2025-11-17  7:30 ` [PATCH 02/12] btrfs: introduce a new parameter to locate a sector Qu Wenruo
2025-11-17  7:30 ` [PATCH 03/12] btrfs: prepare generate_pq_vertical() for bs > ps cases Qu Wenruo
2025-11-17  7:30 ` [PATCH 04/12] btrfs: prepare recover_vertical() to support " Qu Wenruo
2025-11-17  7:30 ` [PATCH 05/12] btrfs: prepare verify_one_sector() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 06/12] btrfs: prepare verify_bio_data_sectors() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 07/12] btrfs: prepare set_bio_pages_uptodate() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 08/12] btrfs: prepare steal_rbio() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 09/12] btrfs: prepare rbio_bio_add_io_paddr() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 10/12] btrfs: prepare finish_parity_scrub() " Qu Wenruo
2025-11-17  7:30 ` [PATCH 11/12] btrfs: enable bs > ps support for raid56 Qu Wenruo
2025-11-17  7:30 ` [PATCH 12/12] btrfs: remove the "_step" infix Qu Wenruo
2025-11-18 15:15 ` [PATCH 00/12] btrfs: add raid56 support for bs > ps cases David Sterba
2025-11-18 21:10   ` Qu Wenruo
2025-11-19  8:13     ` David Sterba
2025-11-20 13:23       ` Neal Gompa
