Linux RAID subsystem development
* [PATCH RFC] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
@ 2026-05-20 10:57 Qu Wenruo
  2026-05-22  8:52 ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2026-05-20 10:57 UTC (permalink / raw)
  To: linux-btrfs; +Cc: hch, linux-raid

Recently the kernel RAID56 library has been trying to remove support for
the unexpected single-data-RAID56 layouts (2-disk RAID5 or 3-disk RAID6).
Meanwhile btrfs still supports such setups, which means that in the long
run btrfs has to handle this corner case by itself.

Thankfully single-data-RAID56 is effectively RAID1/RAID1C3: the data and
P/Q stripes are all identical, so rotation makes no difference either.
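
As a rough illustration of why the stripes match, the sketch below
computes P and Q byte by byte with the usual RAID6 recurrence (XOR for P,
multiply-by-2 in GF(2^8) with polynomial 0x11d for Q). It is a standalone
userspace toy, not the kernel raid6 code, and every demo_* name is
hypothetical. With a single data stripe the loop body never runs, so both
P and Q are plain copies of the data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STRIPE_BYTES 16

/* Multiply by the generator {02} in GF(2^8), polynomial 0x11d. */
static uint8_t demo_gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Byte-wise P/Q generation using Horner's rule:
 *   P = D[0] ^ D[1] ^ ...
 *   Q = D[0] ^ g*D[1] ^ g^2*D[2] ^ ...
 */
static void demo_gen_pq(int nr_data, uint8_t data[][DEMO_STRIPE_BYTES],
			uint8_t *p, uint8_t *q)
{
	assert(nr_data >= 1);

	for (int b = 0; b < DEMO_STRIPE_BYTES; b++) {
		uint8_t wp = data[nr_data - 1][b];
		uint8_t wq = wp;

		/* Not executed at all when nr_data == 1. */
		for (int d = nr_data - 2; d >= 0; d--) {
			wp ^= data[d][b];
			wq = demo_gf_mul2(wq) ^ data[d][b];
		}
		p[b] = wp;
		q[b] = wq;
	}
}

/* Returns 1 if P and Q both equal the only data stripe, 0 otherwise. */
static int demo_single_data_pq_is_mirror(void)
{
	uint8_t data[1][DEMO_STRIPE_BYTES];
	uint8_t p[DEMO_STRIPE_BYTES], q[DEMO_STRIPE_BYTES];

	for (int b = 0; b < DEMO_STRIPE_BYTES; b++)
		data[0][b] = (uint8_t)(0xa5 + 17 * b);

	demo_gen_pq(1, data, p, q);
	return memcmp(p, data[0], DEMO_STRIPE_BYTES) == 0 &&
	       memcmp(q, data[0], DEMO_STRIPE_BYTES) == 0;
}
```

Calling demo_single_data_pq_is_mirror() from a small main() is enough to
see that a 2-disk RAID5 or 3-disk RAID6 stripe holds identical copies on
every device.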

This patch will disguise those single-data-RAID56 chunks as
RAID1/RAID1C3 chunks.

This is done at two points:

- Chunk read
- Chunk allocation

This disguise only affects the in-memory chunk map, not the on-disk chunk
item or the corresponding block groups, so extra bits like the RAID1C3 or
RAID56 compatible flags are not affected.

This method has minimal impact on the fs: other operations like scrub and
read-repair are all based on the chunk map type, so the disguise requires
no extra modification at those call sites.

However, some locations still check block_group->flags, e.g. scrub. Those
call sites will still apply the extra limits that assume the bg is RAID56,
but this should not cause any extra problems.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Reason for RFC:

Although the current patch works fine, it is a little fragile, e.g. at
chunk allocation time, we must use the bg->flags, instead of the
in-memory chunk map type.

I'm wondering if we should introduce some dedicated member, e.g.
btrfs_chunk_map::on_disk_type to handle it.
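
To make the idea concrete, here is a minimal userspace sketch of what such
a member could look like. It is only an assumption about one possible
shape: struct demo_chunk_map, the DEMO_BG_* stand-in bits and all demo_*
helpers are hypothetical and are not part of the patch. The point is that
the disguise step records the original type once, and the chunk item
writer then reads it from the map instead of reaching for bg->flags.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in flag bits for this sketch, named after BTRFS_BLOCK_GROUP_*. */
#define DEMO_BG_DATA		(1ULL << 0)
#define DEMO_BG_RAID1		(1ULL << 4)
#define DEMO_BG_RAID5		(1ULL << 7)
#define DEMO_BG_RAID6		(1ULL << 8)
#define DEMO_BG_RAID1C3		(1ULL << 9)
#define DEMO_BG_RAID56_MASK	(DEMO_BG_RAID5 | DEMO_BG_RAID6)

struct demo_chunk_map {
	uint64_t type;		/* in-memory type, may be disguised */
	uint64_t on_disk_type;	/* hypothetical: type stored in the chunk item */
	int num_stripes;
};

static int demo_nr_data_stripes(const struct demo_chunk_map *map)
{
	int nr_parity = 0;

	if (map->type & DEMO_BG_RAID5)
		nr_parity = 1;
	else if (map->type & DEMO_BG_RAID6)
		nr_parity = 2;
	return map->num_stripes - nr_parity;
}

/* Remember the on-disk type first, then disguise the in-memory one. */
static void demo_disguise_map(struct demo_chunk_map *map)
{
	assert(map->num_stripes > 0);

	map->on_disk_type = map->type;
	if ((map->type & DEMO_BG_RAID56_MASK) == 0)
		return;
	if (demo_nr_data_stripes(map) > 1)
		return;
	if (map->type & DEMO_BG_RAID5)
		map->type |= DEMO_BG_RAID1;
	else
		map->type |= DEMO_BG_RAID1C3;
	map->type &= ~DEMO_BG_RAID56_MASK;
}

/* What a chunk item writer would store, without consulting the bg. */
static uint64_t demo_chunk_item_type(const struct demo_chunk_map *map)
{
	return map->on_disk_type;
}
```

With this shape, a 2-stripe RAID5 map ends up with type RAID1 and
on_disk_type RAID5, while maps with more than one data stripe keep both
fields equal.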
---
 fs/btrfs/volumes.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93a923e4ecaf..848eafa5fbf7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6000,6 +6000,26 @@ struct btrfs_chunk_map *btrfs_alloc_chunk_map(int num_stripes, gfp_t gfp)
 	return map;
 }
 
+/*
+ * For single data stripe RAID56, it's completely RAID1/RAID1C3, as data,
+ * P/Q stripes all match each other, and rotation makes no difference anymore.
+ *
+ * So we still keep the on-disk chunk/bg type, but in memory we change the
+ * chunk map type directly to RAID1/RAID1C3
+ */
+static void change_single_data_raid56_map(struct btrfs_chunk_map *map)
+{
+	if ((map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) == 0)
+		return;
+	if (nr_data_stripes(map) > 1)
+		return;
+	if (map->type & BTRFS_BLOCK_GROUP_RAID5)
+		map->type |= BTRFS_BLOCK_GROUP_RAID1;
+	else
+		map->type |= BTRFS_BLOCK_GROUP_RAID1C3;
+	map->type &= ~BTRFS_BLOCK_GROUP_RAID56_MASK;
+}
+
 static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 			struct alloc_chunk_ctl *ctl,
 			struct btrfs_device_info *devices_info)
@@ -6023,6 +6043,7 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 	map->io_width = BTRFS_STRIPE_LEN;
 	map->sub_stripes = ctl->sub_stripes;
 	map->num_stripes = ctl->num_stripes;
+	change_single_data_raid56_map(map);
 
 	for (int i = 0; i < ctl->ndevs; i++) {
 		for (int j = 0; j < ctl->dev_stripes; j++) {
@@ -6201,7 +6222,7 @@ int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_chunk_length(chunk, bg->length);
 	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
 	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_type(chunk, map->type);
+	btrfs_set_stack_chunk_type(chunk, bg->flags);
 	btrfs_set_stack_chunk_num_stripes(chunk, map->num_stripes);
 	btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN);
@@ -7596,6 +7617,7 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 	 */
 	map->sub_stripes = btrfs_raid_array[index].sub_stripes;
 	map->verified_stripes = 0;
+	change_single_data_raid56_map(map);
 
 	if (num_stripes > 0)
 		map->stripe_size = btrfs_calc_stripe_length(map);
-- 
2.54.0



* Re: [PATCH RFC] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-20 10:57 [PATCH RFC] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
@ 2026-05-22  8:52 ` Christoph Hellwig
  2026-05-22  9:30   ` Qu Wenruo
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-22  8:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, hch, linux-raid

Hi Qu,

this does look sensible to me.  Maybe add an xfstests case that
exercises these using loop devices?



* Re: [PATCH RFC] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-22  8:52 ` Christoph Hellwig
@ 2026-05-22  9:30   ` Qu Wenruo
  2026-05-22 12:08     ` Christoph Hellwig
  0 siblings, 1 reply; 4+ messages in thread
From: Qu Wenruo @ 2026-05-22  9:30 UTC (permalink / raw)
  To: Christoph Hellwig, Qu Wenruo; +Cc: linux-btrfs, linux-raid



On 2026/5/22 18:22, Christoph Hellwig wrote:
> Hi Qu,
> 
> this does look sensible to me.  Maybe add an xfstests case that
> exercises these using loop devices?
Sure, I'll add a test case to verify the single-data-RAID56 behavior for 
btrfs.

Not sure what specific workload you may want to verify, but my planned 
workload is:

- Create single-data-RAID56 btrfs
- Fsstress it
- Unmount

- Mount with one device missing, readonly
- Do a readonly scrub, should be no error
- Retry with the remaining device(s)

If you have anything else to suggest, I'm happy to add it.

Thanks,
Qu


* Re: [PATCH RFC] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-22  9:30   ` Qu Wenruo
@ 2026-05-22 12:08     ` Christoph Hellwig
  0 siblings, 0 replies; 4+ messages in thread
From: Christoph Hellwig @ 2026-05-22 12:08 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-raid

On Fri, May 22, 2026 at 07:00:25PM +0930, Qu Wenruo wrote:
> Not sure what specific workload you may want to verify, but my planned 
> workload is:
>
> - Create single-data-RAID56 btrfs
> - Fsstress it
> - Unmount
>
> - Mount with one device missing, readonly
> - Do a readonly scrub, should be no error
> - Retry with the remaining device(s)

fsstress doesn't actually check data integrity.  So fsx is usually
a better choice for these kinds of tests.  Otherwise this sounds good.
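
To illustrate what checking data integrity means here, the following is a
very small sketch of the general idea: mirror every write into an
in-memory shadow copy and compare the file against it afterwards. It is
not fsx and leaves out most of what fsx does (mmap, truncate, punch hole
and so on). demo_write_and_verify() and the DEMO_* sizes are hypothetical
names chosen for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEMO_FILE_SIZE	65536
#define DEMO_MAX_IO	4096

/*
 * Do nr_ops random writes on fd while mirroring them into a shadow
 * buffer, then read the whole file back and compare.
 * Returns 0 if the file matches the shadow copy, -1 otherwise.
 */
static int demo_write_and_verify(int fd, unsigned int seed, int nr_ops)
{
	static uint8_t shadow[DEMO_FILE_SIZE];
	static uint8_t readback[DEMO_FILE_SIZE];

	assert(fd >= 0);

	/* Start from an all-zero file of known size. */
	memset(shadow, 0, sizeof(shadow));
	if (ftruncate(fd, 0) < 0 || ftruncate(fd, DEMO_FILE_SIZE) < 0)
		return -1;

	srand(seed);
	for (int i = 0; i < nr_ops; i++) {
		size_t len = 1 + (size_t)rand() % DEMO_MAX_IO;
		size_t off = (size_t)rand() % (DEMO_FILE_SIZE - len + 1);

		for (size_t b = 0; b < len; b++)
			shadow[off + b] = (uint8_t)rand();
		if (pwrite(fd, shadow + off, len, (off_t)off) != (ssize_t)len)
			return -1;
	}

	if (pread(fd, readback, DEMO_FILE_SIZE, 0) != DEMO_FILE_SIZE)
		return -1;
	return memcmp(shadow, readback, DEMO_FILE_SIZE) == 0 ? 0 : -1;
}
```

A workload that only issues operations, without keeping something like
the shadow buffer to compare against, would not notice if the degraded
mount returned wrong data.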

