Linux RAID subsystem development
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
@ 2026-05-22  9:23 Qu Wenruo
  2026-05-22  9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Qu Wenruo @ 2026-05-22  9:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: hch, linux-raid

[CHANGELOG]
RFC->v1:
- Remove unused btrfs_chunk_map::io_(align|width) members
  To make space for the new member.

- Fix a duplicated bg->flags assignment in fill_dummy_bgs()
  Just exposed by the last patch which also touches that code.

- Use a dedicated member to record the on-disk chunk type
  So it's less fragile than the RFC patch.

The single-data-RAID56 (2 disks raid5 or 3 disks raid6) is not a feature
that should be supported in the first place, thus raid56 is going to
remove such support.

Meanwhile rejecting single-data-RAID56 will cause existing btrfs users
unable to mount their fs, even if there is only a single
empty chunk with that flag, mostly caused by some degraded mount.

So to avoid impacting existing users, implement an internal
single-data-RAID56 to RAID1/RAID1C3 mapping, to co-operate with the
upstream raid56 lib changes.

Qu Wenruo (3):
  btrfs: remove btrfs_chunk_map::io_(align|width) members
  btrfs: remove duplicated block group type assignment
  btrfs: disguise single-data-RAID56 as RAID1/RAID1C3

 fs/btrfs/block-group.c |  3 +--
 fs/btrfs/volumes.c     | 25 ++++++++++++++++++-------
 fs/btrfs/volumes.h     | 10 ++++++++--
 3 files changed, 27 insertions(+), 11 deletions(-)

-- 
2.54.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members
  2026-05-22  9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
@ 2026-05-22  9:23 ` Qu Wenruo
  2026-05-22  9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo
  2026-05-22  9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
  2 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2026-05-22  9:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: hch, linux-raid

Those two members are read from on-disk metadata, but never utilized.
And for new chunks we always set those members to BTRFS_STRIPE_LEN
anyway.

Thus there is no need to keep them inside btrfs_chunk_map.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 4 ----
 fs/btrfs/volumes.h | 2 --
 2 files changed, 6 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 93a923e4ecaf..daa8f1ddf713 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6019,8 +6019,6 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 	map->chunk_len = ctl->chunk_size;
 	map->stripe_size = ctl->stripe_size;
 	map->type = type;
-	map->io_align = BTRFS_STRIPE_LEN;
-	map->io_width = BTRFS_STRIPE_LEN;
 	map->sub_stripes = ctl->sub_stripes;
 	map->num_stripes = ctl->num_stripes;
 
@@ -7583,8 +7581,6 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 	map->start = logical;
 	map->chunk_len = length;
 	map->num_stripes = num_stripes;
-	map->io_width = btrfs_chunk_io_width(leaf, chunk);
-	map->io_align = btrfs_chunk_io_align(leaf, chunk);
 	map->type = type;
 	/*
 	 * We can't use the sub_stripes value, as for profiles other than
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 96904d18f686..f24385b599dc 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -633,8 +633,6 @@ struct btrfs_chunk_map {
 	u64 chunk_len;
 	u64 stripe_size;
 	u64 type;
-	int io_align;
-	int io_width;
 	int num_stripes;
 	int sub_stripes;
 	struct btrfs_io_stripe stripes[];
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] btrfs: remove duplicated block group type assignment
  2026-05-22  9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
  2026-05-22  9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo
@ 2026-05-22  9:23 ` Qu Wenruo
  2026-05-22  9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
  2 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2026-05-22  9:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: hch, linux-raid

In the function fill_dummy_bgs(), bg->flags is assigned twice.

Just remove the second assignment.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index a0cb0db68c9a..41f65a3165f8 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2627,7 +2627,6 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
 		bg->flags = map->type;
 		bg->cached = BTRFS_CACHE_FINISHED;
 		bg->used = map->chunk_len;
-		bg->flags = map->type;
 		bg->space_info = btrfs_find_space_info(fs_info, bg->flags);
 		ret = btrfs_add_block_group_cache(bg);
 		/*
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-22  9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
  2026-05-22  9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo
  2026-05-22  9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo
@ 2026-05-22  9:23 ` Qu Wenruo
  2026-05-23 14:23   ` David Sterba
  2 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2026-05-22  9:23 UTC (permalink / raw)
  To: linux-btrfs; +Cc: hch, linux-raid

Recently kernel RAID56 lib is trying to remove the unexpected
single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
btrfs still supports such setup, which means in the long run btrfs has
to handle such corner case by ourselves.

Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
P/Q stripes all match each other, rotation also makes no difference.

This patch will disguise those single-data-RAID56 chunks as
RAID1/RAID1C3 chunks.

This is done at two timings:

- Chunk read
- Chunk allocation

This is done by introducing btrfs_chunk_map::on_disk_type member, which
stores the type read from the on-disk metadata.
Meanwhile btrfs_chunk_map::type is calculated using on_disk_type.

For most profiles @type matches @on_disk_type, but for
single-data-RAID56, the @type will be RAID1/RAID1C3.

This method has a minimal impact on the fs, all other operations like
scrub and read-repair, are all based on the chunk map type, so the
disguise method will require no extra modification to those call sites.

Although there are still some locations that are checking against
block_group->flags, e.g. scrub. Those call sites will still get extra
limits assuming the bg is RAID56. But it should not cause any extra
problem.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/block-group.c |  2 +-
 fs/btrfs/volumes.c     | 21 ++++++++++++++++++---
 fs/btrfs/volumes.h     |  8 ++++++++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 41f65a3165f8..5c9fc7ed4c3e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2624,7 +2624,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info)
 
 		/* Fill dummy cache as FULL */
 		bg->length = map->chunk_len;
-		bg->flags = map->type;
+		bg->flags = map->on_disk_type;
 		bg->cached = BTRFS_CACHE_FINISHED;
 		bg->used = map->chunk_len;
 		bg->space_info = btrfs_find_space_info(fs_info, bg->flags);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index daa8f1ddf713..a754ab7a538f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6000,6 +6000,19 @@ struct btrfs_chunk_map *btrfs_alloc_chunk_map(int num_stripes, gfp_t gfp)
 	return map;
 }
 
+static void set_real_chunk_type(struct btrfs_chunk_map *map)
+{
+	map->type = map->on_disk_type;
+	if (likely((map->on_disk_type & BTRFS_BLOCK_GROUP_RAID56_MASK) == 0 ||
+		   nr_data_stripes(map) > 1))
+		return;
+	if (map->on_disk_type & BTRFS_BLOCK_GROUP_RAID5)
+		map->type |= BTRFS_BLOCK_GROUP_RAID1;
+	else
+		map->type |= BTRFS_BLOCK_GROUP_RAID1C3;
+	map->type &= ~BTRFS_BLOCK_GROUP_RAID56_MASK;
+}
+
 static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 			struct alloc_chunk_ctl *ctl,
 			struct btrfs_device_info *devices_info)
@@ -6018,9 +6031,10 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans,
 	map->start = start;
 	map->chunk_len = ctl->chunk_size;
 	map->stripe_size = ctl->stripe_size;
-	map->type = type;
+	map->on_disk_type = type;
 	map->sub_stripes = ctl->sub_stripes;
 	map->num_stripes = ctl->num_stripes;
+	set_real_chunk_type(map);
 
 	for (int i = 0; i < ctl->ndevs; i++) {
 		for (int j = 0; j < ctl->dev_stripes; j++) {
@@ -6199,7 +6213,7 @@ int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans,
 	btrfs_set_stack_chunk_length(chunk, bg->length);
 	btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
 	btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-	btrfs_set_stack_chunk_type(chunk, map->type);
+	btrfs_set_stack_chunk_type(chunk, map->on_disk_type);
 	btrfs_set_stack_chunk_num_stripes(chunk, map->num_stripes);
 	btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN);
 	btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN);
@@ -7581,7 +7595,7 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 	map->start = logical;
 	map->chunk_len = length;
 	map->num_stripes = num_stripes;
-	map->type = type;
+	map->on_disk_type = type;
 	/*
 	 * We can't use the sub_stripes value, as for profiles other than
 	 * RAID10, they may have 0 as sub_stripes for filesystems created by
@@ -7592,6 +7606,7 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf,
 	 */
 	map->sub_stripes = btrfs_raid_array[index].sub_stripes;
 	map->verified_stripes = 0;
+	set_real_chunk_type(map);
 
 	if (num_stripes > 0)
 		map->stripe_size = btrfs_calc_stripe_length(map);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f24385b599dc..f75f516afe0d 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -632,7 +632,15 @@ struct btrfs_chunk_map {
 	u64 start;
 	u64 chunk_len;
 	u64 stripe_size;
+	/*
+	 * The real type that is utilized during logical address mapping.
+	 *
+	 * For most profiles it matches @on_disk_type, but for single-data-RAID56,
+	 * the real type will be set to RAID1/RAID1C3, to avoid unsupported
+	 * operations from raid56 lib.
+	 */
 	u64 type;
+	u64 on_disk_type;
 	int num_stripes;
 	int sub_stripes;
 	struct btrfs_io_stripe stripes[];
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-22  9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
@ 2026-05-23 14:23   ` David Sterba
  2026-05-23 22:46     ` Qu Wenruo
  2026-05-25  6:00     ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: David Sterba @ 2026-05-23 14:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, hch, linux-raid

On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
> Recently kernel RAID56 lib is trying to remove the unexpected
> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
> btrfs still supports such setup, which means in the long run btrfs has
> to handle such corner case by ourselves.
> 
> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
> P/Q stripes all match each other, rotation also makes no difference.
> 
> This patch will disguise those single-data-RAID56 chunks as
> RAID1/RAID1C3 chunks.

I don't think this is the right way to fix it. Calculations are an
implementation detail and should be done at the time of the xor_gen or
raid6_call, not touching the upper level structures related to format.

Instead, please add helpers that take the same parameters and do the
switch if there's the edge case of raid5 or raid6, using memcpy. Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-23 14:23   ` David Sterba
@ 2026-05-23 22:46     ` Qu Wenruo
  2026-05-24  4:13       ` Qu Wenruo
  2026-05-25  6:00     ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2026-05-23 22:46 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, hch, linux-raid



在 2026/5/23 23:53, David Sterba 写道:
> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
>> Recently kernel RAID56 lib is trying to remove the unexpected
>> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
>> btrfs still supports such setup, which means in the long run btrfs has
>> to handle such corner case by ourselves.
>>
>> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
>> P/Q stripes all match each other, rotation also makes no difference.
>>
>> This patch will disguise those single-data-RAID56 chunks as
>> RAID1/RAID1C3 chunks.
> 
> I don't think this is the right way to fix it. Calculations are an
> implementation detail and should be done at the time of the xor_gen or
> raid6_call, not touching the upper level structures related to format.
> 
> Instead, please add helpers that take the same parameters and do the
> switch if there's the edge case of raid5 or raid6, using memcpy. Thanks.

I'd say going dedicated memcpy() is an adhoc fix, and not any easier to 
maintain.

Now you have two paths, one from the raid56 lib, one from the btrfs 
specific handling for the single-data specific corner case, and that no 
one else is going to review/utilize that corner case.

 From this point of view, reusing the existing RAID1/RAID1C3 is way more 
reliable and sane.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-23 22:46     ` Qu Wenruo
@ 2026-05-24  4:13       ` Qu Wenruo
  2026-05-25 10:12         ` David Sterba
  0 siblings, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2026-05-24  4:13 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, hch, linux-raid



在 2026/5/24 08:16, Qu Wenruo 写道:
> 
> 
> 在 2026/5/23 23:53, David Sterba 写道:
>> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
>>> Recently kernel RAID56 lib is trying to remove the unexpected
>>> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
>>> btrfs still supports such setup, which means in the long run btrfs has
>>> to handle such corner case by ourselves.
>>>
>>> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
>>> P/Q stripes all match each other, rotation also makes no difference.
>>>
>>> This patch will disguise those single-data-RAID56 chunks as
>>> RAID1/RAID1C3 chunks.
>>
>> I don't think this is the right way to fix it. Calculations are an
>> implementation detail and should be done at the time of the xor_gen or
>> raid6_call, not touching the upper level structures related to format.
>>
>> Instead, please add helpers that take the same parameters and do the
>> switch if there's the edge case of raid5 or raid6, using memcpy. Thanks.
> 
> I'd say going dedicated memcpy() is an adhoc fix, and not any easier to 
> maintain.
> 
> Now you have two paths, one from the raid56 lib, one from the btrfs 
> specific handling for the single-data specific corner case, and that no 
> one else is going to review/utilize that corner case.
> 
>  From this point of view, reusing the existing RAID1/RAID1C3 is way more 
> reliable and sane.

I tried the idea, and it doesn't look elegant at all, we need a lot of 
if () checks around every xor_gen()/raid6_call.

Sure the end result is not that huge (and I have only compile tested 
it), but if we had some parameters passed wrongly, or some corner case 
missed, it can be very hard to notice.

Just paste the adhoc fix for reference:

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 08ee8f316d96..50fefe33bbe3 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct 
btrfs_raid_bio *rbio, unsigned int
  				rbio_qstripe_paddr(rbio, sector_nr, step_nr));

  		assert_rbio(rbio);
-		raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
+		if (rbio->nr_data > 1) {
+			raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
+		} else {
+			memcpy(pointers[1], pointers[0], step);
+			memcpy(pointers[2], pointers[0], step);
+		}
  	} else {
  		/* raid5 */
  		memcpy(pointers[rbio->nr_data], pointers[0], step);
-		xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
+		if (rbio->nr_data > 1)
+			xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
  				step);
  	}
  	for (stripe = stripe - 1; stripe >= 0; stripe--)
@@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct 
btrfs_raid_bio *rbio,
  		}

  		if (failb == rbio->real_stripes - 2) {
-			raid6_datap_recov(rbio->real_stripes, step,
-					  faila, pointers);
+			if (unlikely(rbio->nr_data == 1))
+				memcpy(pointers[0], pointers[rbio->real_stripes - 1], step);
+			else
+				raid6_datap_recov(rbio->real_stripes, step,
+						  faila, pointers);
  		} else {
-			raid6_2data_recov(rbio->real_stripes, step,
-					  faila, failb, pointers);
+			if (unlikely(rbio->nr_data == 1))
+				memcpy(pointers[0], pointers[rbio->real_stripes - 2], step);
+			else
+				raid6_2data_recov(rbio->real_stripes, step,
+						  faila, failb, pointers);
  		}
  	} else {
  		void *p;
@@ -2002,6 +2014,9 @@ static void recover_vertical_step(struct 
btrfs_raid_bio *rbio,
  		/* Copy parity block into failed block to start with */
  		memcpy(pointers[faila], pointers[rbio->nr_data], step);

+		if (rbio->nr_data == 1)
+			goto cleanup;
+
  		/* Rearrange the pointer array */
  		p = pointers[faila];
  		for (stripe_nr = faila; stripe_nr < rbio->nr_data - 1;
@@ -2644,11 +2659,17 @@ static bool verify_one_parity_step(struct 
btrfs_raid_bio *rbio,
  	if (has_qstripe) {
  		assert_rbio(rbio);
  		/* RAID6, call the library function to fill in our P/Q. */
-		raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
+		if (rbio->nr_data > 1) {
+			raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
+		} else {
+			memcpy(pointers[1], pointers[0], step);
+			memcpy(pointers[2], pointers[0], step);
+		}
  	} else {
  		/* RAID5. */
  		memcpy(pointers[nr_data], pointers[0], step);
-		xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step);
+		if (rbio->nr_data > 1)
+			xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step);
  	}

  	/* Check scrubbing parity and repair it. */
--
2.54.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-23 14:23   ` David Sterba
  2026-05-23 22:46     ` Qu Wenruo
@ 2026-05-25  6:00     ` Christoph Hellwig
  2026-05-25 10:17       ` David Sterba
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2026-05-25  6:00 UTC (permalink / raw)
  To: David Sterba; +Cc: Qu Wenruo, linux-btrfs, hch, linux-raid

On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote:
> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
> > Recently kernel RAID56 lib is trying to remove the unexpected
> > single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
> > btrfs still supports such setup, which means in the long run btrfs has
> > to handle such corner case by ourselves.
> > 
> > Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
> > P/Q stripes all match each other, rotation also makes no difference.
> > 
> > This patch will disguise those single-data-RAID56 chunks as
> > RAID1/RAID1C3 chunks.
> 
> I don't think this is the right way to fix it. Calculations are an
> implementation detail and should be done at the time of the xor_gen or
> raid6_call, not touching the upper level structures related to format.

It absolutely is.  Adding fast path workarounds for this is completely
stupid when it can be trivially handled on the mount side.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-24  4:13       ` Qu Wenruo
@ 2026-05-25 10:12         ` David Sterba
  2026-05-25 10:25           ` Qu Wenruo
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2026-05-25 10:12 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs, hch, linux-raid

On Sun, May 24, 2026 at 01:43:54PM +0930, Qu Wenruo wrote:
> I tried the idea, and it doesn't look elegant at all, we need a lot of 
> if () checks around every xor_gen()/raid6_call.
> 
> Sure the end result is not that huge (and I have only compile tested 
> it), but if we had some parameters passed wrongly, or some corner case 
> missed, it can be very hard to notice.
> 
> Just paste the adhoc fix for reference:

I ended up with something similar, but replaced the open coded ifs with
a helper so there's not that much churn at the call sites.

> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 08ee8f316d96..50fefe33bbe3 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct 
> btrfs_raid_bio *rbio, unsigned int
>   				rbio_qstripe_paddr(rbio, sector_nr, step_nr));
> 
>   		assert_rbio(rbio);
> -		raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
> +		if (rbio->nr_data > 1) {
> +			raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
> +		} else {
> +			memcpy(pointers[1], pointers[0], step);
> +			memcpy(pointers[2], pointers[0], step);
> +		}
>   	} else {
>   		/* raid5 */
>   		memcpy(pointers[rbio->nr_data], pointers[0], step);
> -		xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
> +		if (rbio->nr_data > 1)
> +			xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
>   				step);
>   	}
>   	for (stripe = stripe - 1; stripe >= 0; stripe--)
> @@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct 
> btrfs_raid_bio *rbio,
>   		}
> 
>   		if (failb == rbio->real_stripes - 2) {
> -			raid6_datap_recov(rbio->real_stripes, step,
> -					  faila, pointers);
> +			if (unlikely(rbio->nr_data == 1))
> +				memcpy(pointers[0], pointers[rbio->real_stripes - 1], step);
> +			else
> +				raid6_datap_recov(rbio->real_stripes, step,
> +						  faila, pointers);
>   		} else {
> -			raid6_2data_recov(rbio->real_stripes, step,
> -					  faila, failb, pointers);
> +			if (unlikely(rbio->nr_data == 1))
> +				memcpy(pointers[0], pointers[rbio->real_stripes - 2], step);

The 2data recovery needs to do 2 memcpy

> +			else
> +				raid6_2data_recov(rbio->real_stripes, step,
> +						  faila, failb, pointers);
>   		}
>   	} else {
>   		void *p;
> @@ -2002,6 +2014,9 @@ static void recover_vertical_step(struct 
> btrfs_raid_bio *rbio,
>   		/* Copy parity block into failed block to start with */
>   		memcpy(pointers[faila], pointers[rbio->nr_data], step);
> 
> +		if (rbio->nr_data == 1)
> +			goto cleanup;
> +
>   		/* Rearrange the pointer array */
>   		p = pointers[faila];
>   		for (stripe_nr = faila; stripe_nr < rbio->nr_data - 1;
> @@ -2644,11 +2659,17 @@ static bool verify_one_parity_step(struct 
> btrfs_raid_bio *rbio,
>   	if (has_qstripe) {
>   		assert_rbio(rbio);
>   		/* RAID6, call the library function to fill in our P/Q. */
> -		raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
> +		if (rbio->nr_data > 1) {
> +			raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
> +		} else {
> +			memcpy(pointers[1], pointers[0], step);
> +			memcpy(pointers[2], pointers[0], step);
> +		}
>   	} else {
>   		/* RAID5. */
>   		memcpy(pointers[nr_data], pointers[0], step);
> -		xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step);
> +		if (rbio->nr_data > 1)
> +			xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step);
>   	}
> 
>   	/* Check scrubbing parity and repair it. */
> --
> 2.54.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-25  6:00     ` Christoph Hellwig
@ 2026-05-25 10:17       ` David Sterba
  2026-05-25 10:39         ` Qu Wenruo
  2026-05-26  6:45         ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: David Sterba @ 2026-05-25 10:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, linux-raid

On Mon, May 25, 2026 at 08:00:30AM +0200, Christoph Hellwig wrote:
> On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote:
> > On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
> > > Recently kernel RAID56 lib is trying to remove the unexpected
> > > single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
> > > btrfs still supports such setup, which means in the long run btrfs has
> > > to handle such corner case by ourselves.
> > > 
> > > Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
> > > P/Q stripes all match each other, rotation also makes no difference.
> > > 
> > > This patch will disguise those single-data-RAID56 chunks as
> > > RAID1/RAID1C3 chunks.
> > 
> > I don't think this is the right way to fix it. Calculations are an
> > implementation detail and should be done at the time of the xor_gen or
> > raid6_call, not touching the upper level structures related to format.
> 
> It absolutely is.  Adding fast path workarounds for this is completely
> stupid when it can be trivially handled on the mount side.

It's a lazy hack at best and fix on absolutely wrong layer. The library
should provide the support for the edge case. But we disagree on that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-25 10:12         ` David Sterba
@ 2026-05-25 10:25           ` Qu Wenruo
  0 siblings, 0 replies; 14+ messages in thread
From: Qu Wenruo @ 2026-05-25 10:25 UTC (permalink / raw)
  To: dsterba; +Cc: linux-btrfs, hch, linux-raid



在 2026/5/25 19:42, David Sterba 写道:
> On Sun, May 24, 2026 at 01:43:54PM +0930, Qu Wenruo wrote:
>> I tried the idea, and it doesn't look elegant at all, we need a lot of
>> if () checks around every xor_gen()/raid6_call.
>>
>> Sure the end result is not that huge (and I have only compile tested
>> it), but if we had some parameters passed wrongly, or some corner case
>> missed, it can be very hard to notice.
>>
>> Just paste the adhoc fix for reference:
> 
> I ended up with something similar, but replaced the open coded ifs with
> a helper so there's not that much churn at the call sites.
> 
>> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
>> index 08ee8f316d96..50fefe33bbe3 100644
>> --- a/fs/btrfs/raid56.c
>> +++ b/fs/btrfs/raid56.c
>> @@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct
>> btrfs_raid_bio *rbio, unsigned int
>>    				rbio_qstripe_paddr(rbio, sector_nr, step_nr));
>>
>>    		assert_rbio(rbio);
>> -		raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
>> +		if (rbio->nr_data > 1) {
>> +			raid6_call.gen_syndrome(rbio->real_stripes, step, pointers);
>> +		} else {
>> +			memcpy(pointers[1], pointers[0], step);
>> +			memcpy(pointers[2], pointers[0], step);
>> +		}
>>    	} else {
>>    		/* raid5 */
>>    		memcpy(pointers[rbio->nr_data], pointers[0], step);
>> -		xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
>> +		if (rbio->nr_data > 1)
>> +			xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
>>    				step);
>>    	}
>>    	for (stripe = stripe - 1; stripe >= 0; stripe--)
>> @@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct
>> btrfs_raid_bio *rbio,
>>    		}
>>
>>    		if (failb == rbio->real_stripes - 2) {
>> -			raid6_datap_recov(rbio->real_stripes, step,
>> -					  faila, pointers);
>> +			if (unlikely(rbio->nr_data == 1))
>> +				memcpy(pointers[0], pointers[rbio->real_stripes - 1], step);
>> +			else
>> +				raid6_datap_recov(rbio->real_stripes, step,
>> +						  faila, pointers);
>>    		} else {
>> -			raid6_2data_recov(rbio->real_stripes, step,
>> -					  faila, failb, pointers);
>> +			if (unlikely(rbio->nr_data == 1))
>> +				memcpy(pointers[0], pointers[rbio->real_stripes - 2], step);
> 
> The 2data recovery needs to do 2 memcpy

Nope. The recovery is only to repair the data stripe.
We do not need to bother the failed P or Q stripe here.

And I think this explains exactly why doing adhoc fix is a stupid idea 
in the first place.

We're wasting time on useless corner cases, and corner cases inside 
corner cases.
It makes no sense to waste all the time reviewing those useless details, 
when we have a much better solution to avoid all those hassles.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-25 10:17       ` David Sterba
@ 2026-05-25 10:39         ` Qu Wenruo
  2026-05-26  6:46           ` Christoph Hellwig
  2026-05-26  6:45         ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Qu Wenruo @ 2026-05-25 10:39 UTC (permalink / raw)
  To: dsterba, Christoph Hellwig; +Cc: linux-btrfs, linux-raid



在 2026/5/25 19:47, David Sterba 写道:
> On Mon, May 25, 2026 at 08:00:30AM +0200, Christoph Hellwig wrote:
>> On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote:
>>> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote:
>>>> Recently kernel RAID56 lib is trying to remove the unexpected
>>>> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile
>>>> btrfs still supports such setup, which means in the long run btrfs has
>>>> to handle such corner case by ourselves.
>>>>
>>>> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and
>>>> P/Q stripes all match each other, rotation also makes no difference.
>>>>
>>>> This patch will disguise those single-data-RAID56 chunks as
>>>> RAID1/RAID1C3 chunks.
>>>
>>> I don't think this is the right way to fix it. Calculations are an
>>> implementation detail and should be done at the time of the xor_gen or
>>> raid6_call, not touching the upper level structures related to format.
>>
>> It absolutely is.  Adding fast path workarounds for this is completely
>> stupid when it can be trivially handled on the mount side.
> 
> It's a lazy hack at best and fix on absolutely wrong layer. The library
> should provide the support for the edge case. But we disagree on that.

I strongly disagree.

Firstly on the layer to fix, let me be clear, if something writes like 
RAID1, reads like RAID1, repairs like RAID1, then it's RAID1.

With that mindset, changing chunk type at read/add time is exactly the 
correct layer to fix.

Or let me change the title of the patch, ENHANCE single-data-RAID56 
handling to get rid all the corner cases.
Do you still think that's the wrong correct layer to fix?

So the wrong layer argument doesn't stand. If you're so focused on 
fixing the problem in the library layer, then all you can see is to put 
some adhoc fixes to emulate the library fix.


Furthermore, we are already doing all the warning about 2-disk raid5, 
now I'm providing a solution that is as mature as RAID1, now you're 
complaining about the simplicity but wants all the complex striping and 
reduced read performance?

I don't see any point in this.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-25 10:17       ` David Sterba
  2026-05-25 10:39         ` Qu Wenruo
@ 2026-05-26  6:45         ` Christoph Hellwig
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2026-05-26  6:45 UTC (permalink / raw)
  To: David Sterba; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-raid

On Mon, May 25, 2026 at 12:17:07PM +0200, David Sterba wrote:
> > It absolutely is.  Adding fast path workarounds for this is completely
> > stupid when it can be trivially handled on the mount side.
> 
> It's a lazy hack at best and fix on absolutely wrong layer. The library
> should provide the support for the edge case. But we disagree on that.

No, the library should not provide hacks for something it never intended
to support, and which is a bad idea.  It's also something the
implementation that the library was factored out from for btrfs use
never supported.

So adding the workaround to btrfs, which caused the problem is a
requirement.  Doing it to only create mount time overhead instead of
burdening the I/O path is a very smart way to solve the problem,
while adding hacks to the I/O fastpath is everything but smart.

But in the end this is for the btrfs maintainers to decide.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
  2026-05-25 10:39         ` Qu Wenruo
@ 2026-05-26  6:46           ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2026-05-26  6:46 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Christoph Hellwig, linux-btrfs, linux-raid

On Mon, May 25, 2026 at 08:09:54PM +0930, Qu Wenruo wrote:
>> It's a lazy hack at best and fix on absolutely wrong layer. The library
>> should provide the support for the edge case. But we disagree on that.
>
> I strongly disagree.
>
> Firstly on the layer to fix, let me be clear, if something writes like 
> RAID1, reads like RAID1, repairs like RAID1, then it's RAID1.

*nod*

> With that mindset, changing chunk type at read/add time is exactly the 
> correct layer to fix.

It also is amazingly simple!


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-05-26  6:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-22  9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
2026-05-22  9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo
2026-05-22  9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo
2026-05-22  9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo
2026-05-23 14:23   ` David Sterba
2026-05-23 22:46     ` Qu Wenruo
2026-05-24  4:13       ` Qu Wenruo
2026-05-25 10:12         ` David Sterba
2026-05-25 10:25           ` Qu Wenruo
2026-05-25  6:00     ` Christoph Hellwig
2026-05-25 10:17       ` David Sterba
2026-05-25 10:39         ` Qu Wenruo
2026-05-26  6:46           ` Christoph Hellwig
2026-05-26  6:45         ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox