* [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
@ 2026-05-22 9:23 Qu Wenruo
2026-05-22 9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Qu Wenruo @ 2026-05-22 9:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: hch, linux-raid
[CHANGELOG]
RFC->v1:
- Remove unused btrfs_chunk_map::io_(align|width) members
To make space for the new member.
- Fix a duplicated bg->flags assignment in fill_dummy_bgs()
Just exposed by the last patch which also touches that code.
- Use a dedicated member to record the on-disk chunk type
So it's less fragile than the RFC patch.
The single-data-RAID56 (2 disks raid5 or 3 disks raid6) is not a feature
that should be supported in the first place, thus raid56 is going to
remove such support.
Meanwhile rejecting single-data-RAID56 will cause existing btrfs users
unable to mount their fs, even if there is only a single
empty chunk with that flag, mostly caused by some degraded mount.
So to avoid impacting existing users, implement an internal
single-data-RAID56 to RAID1/RAID1C3 mapping, to co-operate with the
upstream raid56 lib changes.
Qu Wenruo (3):
btrfs: remove btrfs_chunk_map::io_(align|width) members
btrfs: remove duplicated block group type assignment
btrfs: disguise single-data-RAID56 as RAID1/RAID1C3
fs/btrfs/block-group.c | 3 +--
fs/btrfs/volumes.c | 25 ++++++++++++++++++-------
fs/btrfs/volumes.h | 10 ++++++++--
3 files changed, 27 insertions(+), 11 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members 2026-05-22 9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo @ 2026-05-22 9:23 ` Qu Wenruo 2026-05-22 9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo 2026-05-22 9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2 siblings, 0 replies; 14+ messages in thread From: Qu Wenruo @ 2026-05-22 9:23 UTC (permalink / raw) To: linux-btrfs; +Cc: hch, linux-raid Those two members are read from on-disk metadata, but never utilized. And for new chunks we always set those members to BTRFS_STRIPE_LEN anyway. Thus there is no need to keep them inside btrfs_chunk_map. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/volumes.c | 4 ---- fs/btrfs/volumes.h | 2 -- 2 files changed, 6 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 93a923e4ecaf..daa8f1ddf713 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6019,8 +6019,6 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans, map->chunk_len = ctl->chunk_size; map->stripe_size = ctl->stripe_size; map->type = type; - map->io_align = BTRFS_STRIPE_LEN; - map->io_width = BTRFS_STRIPE_LEN; map->sub_stripes = ctl->sub_stripes; map->num_stripes = ctl->num_stripes; @@ -7583,8 +7581,6 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, map->start = logical; map->chunk_len = length; map->num_stripes = num_stripes; - map->io_width = btrfs_chunk_io_width(leaf, chunk); - map->io_align = btrfs_chunk_io_align(leaf, chunk); map->type = type; /* * We can't use the sub_stripes value, as for profiles other than diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 96904d18f686..f24385b599dc 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -633,8 +633,6 @@ struct btrfs_chunk_map { u64 chunk_len; u64 stripe_size; u64 type; - int io_align; - int io_width; int num_stripes; int sub_stripes; struct btrfs_io_stripe stripes[]; -- 2.54.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/3] btrfs: remove duplicated block group type assignment 2026-05-22 9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2026-05-22 9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo @ 2026-05-22 9:23 ` Qu Wenruo 2026-05-22 9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2 siblings, 0 replies; 14+ messages in thread From: Qu Wenruo @ 2026-05-22 9:23 UTC (permalink / raw) To: linux-btrfs; +Cc: hch, linux-raid In the function fill_dummy_bgs(), bg->flags is assigned twice. Just remove the second assignment. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/block-group.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index a0cb0db68c9a..41f65a3165f8 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2627,7 +2627,6 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) bg->flags = map->type; bg->cached = BTRFS_CACHE_FINISHED; bg->used = map->chunk_len; - bg->flags = map->type; bg->space_info = btrfs_find_space_info(fs_info, bg->flags); ret = btrfs_add_block_group_cache(bg); /* -- 2.54.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-22 9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2026-05-22 9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo 2026-05-22 9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo @ 2026-05-22 9:23 ` Qu Wenruo 2026-05-23 14:23 ` David Sterba 2 siblings, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2026-05-22 9:23 UTC (permalink / raw) To: linux-btrfs; +Cc: hch, linux-raid Recently kernel RAID56 lib is trying to remove the unexpected single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile btrfs still supports such setup, which means in the long run btrfs has to handle such corner case by ourselves. Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and P/Q stripes all match each other, rotation also makes no difference. This patch will disguise those single-data-RAID56 chunks as RAID1/RAID1C3 chunks. This is done at two timings: - Chunk read - Chunk allocation This is done by introducing btrfs_chunk_map::on_disk_type member, which stores the type read from the on-disk metadata. Meanwhile btrfs_chunk_map::type is calculated using on_disk_type. For most profiles @type matches @on_disk_type, but for single-data-RAID56, the @type will be RAID1/RAID1C3. This method has a minimal impact on the fs, all other operations like scrub and read-repair, are all based on the chunk map type, so the disguise method will require no extra modification to those call sites. Although there are still some locations that are checking against block_group->flags, e.g. scrub. Those call sites will still get extra limits assuming the bg is RAID56. But it should not cause any extra problem. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/block-group.c | 2 +- fs/btrfs/volumes.c | 21 ++++++++++++++++++--- fs/btrfs/volumes.h | 8 ++++++++ 3 files changed, 27 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 41f65a3165f8..5c9fc7ed4c3e 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2624,7 +2624,7 @@ static int fill_dummy_bgs(struct btrfs_fs_info *fs_info) /* Fill dummy cache as FULL */ bg->length = map->chunk_len; - bg->flags = map->type; + bg->flags = map->on_disk_type; bg->cached = BTRFS_CACHE_FINISHED; bg->used = map->chunk_len; bg->space_info = btrfs_find_space_info(fs_info, bg->flags); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index daa8f1ddf713..a754ab7a538f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6000,6 +6000,19 @@ struct btrfs_chunk_map *btrfs_alloc_chunk_map(int num_stripes, gfp_t gfp) return map; } +static void set_real_chunk_type(struct btrfs_chunk_map *map) +{ + map->type = map->on_disk_type; + if (likely((map->on_disk_type & BTRFS_BLOCK_GROUP_RAID56_MASK) == 0 || + nr_data_stripes(map) > 1)) + return; + if (map->on_disk_type & BTRFS_BLOCK_GROUP_RAID5) + map->type |= BTRFS_BLOCK_GROUP_RAID1; + else + map->type |= BTRFS_BLOCK_GROUP_RAID1C3; + map->type &= ~BTRFS_BLOCK_GROUP_RAID56_MASK; +} + static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans, struct alloc_chunk_ctl *ctl, struct btrfs_device_info *devices_info) @@ -6018,9 +6031,10 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans, map->start = start; map->chunk_len = ctl->chunk_size; map->stripe_size = ctl->stripe_size; - map->type = type; + map->on_disk_type = type; map->sub_stripes = ctl->sub_stripes; map->num_stripes = ctl->num_stripes; + set_real_chunk_type(map); for (int i = 0; i < ctl->ndevs; i++) { for (int j = 0; j < ctl->dev_stripes; j++) { @@ -6199,7 +6213,7 @@ int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans, btrfs_set_stack_chunk_length(chunk, bg->length); btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID); btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN); - btrfs_set_stack_chunk_type(chunk, map->type); + btrfs_set_stack_chunk_type(chunk, map->on_disk_type); btrfs_set_stack_chunk_num_stripes(chunk, map->num_stripes); btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN); btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN); @@ -7581,7 +7595,7 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, map->start = logical; map->chunk_len = length; map->num_stripes = num_stripes; - map->type = type; + map->on_disk_type = type; /* * We can't use the sub_stripes value, as for profiles other than * RAID10, they may have 0 as sub_stripes for filesystems created by @@ -7592,6 +7606,7 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, */ map->sub_stripes = btrfs_raid_array[index].sub_stripes; map->verified_stripes = 0; + set_real_chunk_type(map); if (num_stripes > 0) map->stripe_size = btrfs_calc_stripe_length(map); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index f24385b599dc..f75f516afe0d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -632,7 +632,15 @@ struct btrfs_chunk_map { u64 start; u64 chunk_len; u64 stripe_size; + /* + * The real type that is utilized during logical address mapping. + * + * For most profiles it matches @on_disk_type, but for single-data-RAID56, + * the real type will be set to RAID1/RAID1C3, to avoid unsupported + * operations from raid56 lib. + */ u64 type; + u64 on_disk_type; int num_stripes; int sub_stripes; struct btrfs_io_stripe stripes[]; -- 2.54.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-22 9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo @ 2026-05-23 14:23 ` David Sterba 2026-05-23 22:46 ` Qu Wenruo 2026-05-25 6:00 ` Christoph Hellwig 0 siblings, 2 replies; 14+ messages in thread From: David Sterba @ 2026-05-23 14:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, hch, linux-raid On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: > Recently kernel RAID56 lib is trying to remove the unexpected > single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile > btrfs still supports such setup, which means in the long run btrfs has > to handle such corner case by ourselves. > > Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and > P/Q stripes all match each other, rotation also makes no difference. > > This patch will disguise those single-data-RAID56 chunks as > RAID1/RAID1C3 chunks. I don't think this is the right way to fix it. Calculations are an implementation detail and should be done at the time of the xor_gen or raid6_call, not touching the upper level structures related to format. Instead, please add helpers that take the same parameters and do the switch if there's the edge case of raid5 or raid6, using memcpy. Thanks. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-23 14:23 ` David Sterba @ 2026-05-23 22:46 ` Qu Wenruo 2026-05-24 4:13 ` Qu Wenruo 2026-05-25 6:00 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2026-05-23 22:46 UTC (permalink / raw) To: dsterba; +Cc: linux-btrfs, hch, linux-raid 在 2026/5/23 23:53, David Sterba 写道: > On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: >> Recently kernel RAID56 lib is trying to remove the unexpected >> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile >> btrfs still supports such setup, which means in the long run btrfs has >> to handle such corner case by ourselves. >> >> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and >> P/Q stripes all match each other, rotation also makes no difference. >> >> This patch will disguise those single-data-RAID56 chunks as >> RAID1/RAID1C3 chunks. > > I don't think this is the right way to fix it. Calculations are an > implementation detail and should be done at the time of the xor_gen or > raid6_call, not touching the upper level structures related to format. > > Instead, please add helpers that take the same parameters and do the > switch if there's the edge case of raid5 or raid6, using memcpy. Thanks. I'd say going dedicated memcpy() is an adhoc fix, and not any easier to maintain. Now you have two paths, one from the raid56 lib, one from the btrfs specific handling for the single-data specific corner case, and that no one else is going to review/utilize that corner case. From this point of view, reusing the existing RAID1/RAID1C3 is way more reliable and sane. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-23 22:46 ` Qu Wenruo @ 2026-05-24 4:13 ` Qu Wenruo 2026-05-25 10:12 ` David Sterba 0 siblings, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2026-05-24 4:13 UTC (permalink / raw) To: dsterba; +Cc: linux-btrfs, hch, linux-raid 在 2026/5/24 08:16, Qu Wenruo 写道: > > > 在 2026/5/23 23:53, David Sterba 写道: >> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: >>> Recently kernel RAID56 lib is trying to remove the unexpected >>> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile >>> btrfs still supports such setup, which means in the long run btrfs has >>> to handle such corner case by ourselves. >>> >>> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and >>> P/Q stripes all match each other, rotation also makes no difference. >>> >>> This patch will disguise those single-data-RAID56 chunks as >>> RAID1/RAID1C3 chunks. >> >> I don't think this is the right way to fix it. Calculations are an >> implementation detail and should be done at the time of the xor_gen or >> raid6_call, not touching the upper level structures related to format. >> >> Instead, please add helpers that take the same parameters and do the >> switch if there's the edge case of raid5 or raid6, using memcpy. Thanks. > > I'd say going dedicated memcpy() is an adhoc fix, and not any easier to > maintain. > > Now you have two paths, one from the raid56 lib, one from the btrfs > specific handling for the single-data specific corner case, and that no > one else is going to review/utilize that corner case. > > From this point of view, reusing the existing RAID1/RAID1C3 is way more > reliable and sane. I tried the idea, and it doesn't look elegant at all, we need a lot of if () checks around every xor_gen()/raid6_call. Sure the end result is not that huge (and I have only compile tested it), but if we had some parameters passed wrongly, or some corner case missed, it can be very hard to notice. Just paste the adhoc fix for reference: diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 08ee8f316d96..50fefe33bbe3 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct btrfs_raid_bio *rbio, unsigned int rbio_qstripe_paddr(rbio, sector_nr, step_nr)); assert_rbio(rbio); - raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); + if (rbio->nr_data > 1) { + raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); + } else { + memcpy(pointers[1], pointers[0], step); + memcpy(pointers[2], pointers[0], step); + } } else { /* raid5 */ memcpy(pointers[rbio->nr_data], pointers[0], step); - xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, + if (rbio->nr_data > 1) + xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, step); } for (stripe = stripe - 1; stripe >= 0; stripe--) @@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct btrfs_raid_bio *rbio, } if (failb == rbio->real_stripes - 2) { - raid6_datap_recov(rbio->real_stripes, step, - faila, pointers); + if (unlikely(rbio->nr_data == 1)) + memcpy(pointers[0], pointers[rbio->real_stripes - 1], step); + else + raid6_datap_recov(rbio->real_stripes, step, + faila, pointers); } else { - raid6_2data_recov(rbio->real_stripes, step, - faila, failb, pointers); + if (unlikely(rbio->nr_data == 1)) + memcpy(pointers[0], pointers[rbio->real_stripes - 2], step); + else + raid6_2data_recov(rbio->real_stripes, step, + faila, failb, pointers); } } else { void *p; @@ -2002,6 +2014,9 @@ static void recover_vertical_step(struct btrfs_raid_bio *rbio, /* Copy parity block into failed block to start with */ memcpy(pointers[faila], pointers[rbio->nr_data], step); + if (rbio->nr_data == 1) + goto cleanup; + /* Rearrange the pointer array */ p = pointers[faila]; for (stripe_nr = faila; stripe_nr < rbio->nr_data - 1; @@ -2644,11 +2659,17 @@ static bool verify_one_parity_step(struct btrfs_raid_bio *rbio, if (has_qstripe) { assert_rbio(rbio); /* RAID6, call the library function to fill in our P/Q. */ - raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); + if (rbio->nr_data > 1) { + raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); + } else { + memcpy(pointers[1], pointers[0], step); + memcpy(pointers[2], pointers[0], step); + } } else { /* RAID5. */ memcpy(pointers[nr_data], pointers[0], step); - xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step); + if (rbio->nr_data > 1) + xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step); } /* Check scrubbing parity and repair it. */ -- 2.54.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-24 4:13 ` Qu Wenruo @ 2026-05-25 10:12 ` David Sterba 2026-05-25 10:25 ` Qu Wenruo 0 siblings, 1 reply; 14+ messages in thread From: David Sterba @ 2026-05-25 10:12 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, hch, linux-raid On Sun, May 24, 2026 at 01:43:54PM +0930, Qu Wenruo wrote: > I tried the idea, and it doesn't look elegant at all, we need a lot of > if () checks around every xor_gen()/raid6_call. > > Sure the end result is not that huge (and I have only compile tested > it), but if we had some parameters passed wrongly, or some corner case > missed, it can be very hard to notice. > > Just paste the adhoc fix for reference: I ended up with something similar, but replaced the open coded ifs with a helper so there's not that much churn at the call sites. > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c > index 08ee8f316d96..50fefe33bbe3 100644 > --- a/fs/btrfs/raid56.c > +++ b/fs/btrfs/raid56.c > @@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct > btrfs_raid_bio *rbio, unsigned int > rbio_qstripe_paddr(rbio, sector_nr, step_nr)); > > assert_rbio(rbio); > - raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); > + if (rbio->nr_data > 1) { > + raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); > + } else { > + memcpy(pointers[1], pointers[0], step); > + memcpy(pointers[2], pointers[0], step); > + } > } else { > /* raid5 */ > memcpy(pointers[rbio->nr_data], pointers[0], step); > - xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, > + if (rbio->nr_data > 1) > + xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, > step); > } > for (stripe = stripe - 1; stripe >= 0; stripe--) > @@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct > btrfs_raid_bio *rbio, > } > > if (failb == rbio->real_stripes - 2) { > - raid6_datap_recov(rbio->real_stripes, step, > - faila, pointers); > + if (unlikely(rbio->nr_data == 1)) > + memcpy(pointers[0], pointers[rbio->real_stripes - 1], step); > + else > + raid6_datap_recov(rbio->real_stripes, step, > + faila, pointers); > } else { > - raid6_2data_recov(rbio->real_stripes, step, > - faila, failb, pointers); > + if (unlikely(rbio->nr_data == 1)) > + memcpy(pointers[0], pointers[rbio->real_stripes - 2], step); The 2data recovery needs to do 2 memcpy > + else > + raid6_2data_recov(rbio->real_stripes, step, > + faila, failb, pointers); > } > } else { > void *p; > @@ -2002,6 +2014,9 @@ static void recover_vertical_step(struct > btrfs_raid_bio *rbio, > /* Copy parity block into failed block to start with */ > memcpy(pointers[faila], pointers[rbio->nr_data], step); > > + if (rbio->nr_data == 1) > + goto cleanup; > + > /* Rearrange the pointer array */ > p = pointers[faila]; > for (stripe_nr = faila; stripe_nr < rbio->nr_data - 1; > @@ -2644,11 +2659,17 @@ static bool verify_one_parity_step(struct > btrfs_raid_bio *rbio, > if (has_qstripe) { > assert_rbio(rbio); > /* RAID6, call the library function to fill in our P/Q. */ > - raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); > + if (rbio->nr_data > 1) { > + raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); > + } else { > + memcpy(pointers[1], pointers[0], step); > + memcpy(pointers[2], pointers[0], step); > + } > } else { > /* RAID5. */ > memcpy(pointers[nr_data], pointers[0], step); > - xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step); > + if (rbio->nr_data > 1) > + xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step); > } > > /* Check scrubbing parity and repair it. */ > -- > 2.54.0 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-25 10:12 ` David Sterba @ 2026-05-25 10:25 ` Qu Wenruo 0 siblings, 0 replies; 14+ messages in thread From: Qu Wenruo @ 2026-05-25 10:25 UTC (permalink / raw) To: dsterba; +Cc: linux-btrfs, hch, linux-raid 在 2026/5/25 19:42, David Sterba 写道: > On Sun, May 24, 2026 at 01:43:54PM +0930, Qu Wenruo wrote: >> I tried the idea, and it doesn't look elegant at all, we need a lot of >> if () checks around every xor_gen()/raid6_call. >> >> Sure the end result is not that huge (and I have only compile tested >> it), but if we had some parameters passed wrongly, or some corner case >> missed, it can be very hard to notice. >> >> Just paste the adhoc fix for reference: > > I ended up with something similar, but replaced the open coded ifs with > a helper so there's not that much churn at the call sites. > >> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c >> index 08ee8f316d96..50fefe33bbe3 100644 >> --- a/fs/btrfs/raid56.c >> +++ b/fs/btrfs/raid56.c >> @@ -1410,11 +1410,17 @@ static void generate_pq_vertical_step(struct >> btrfs_raid_bio *rbio, unsigned int >> rbio_qstripe_paddr(rbio, sector_nr, step_nr)); >> >> assert_rbio(rbio); >> - raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); >> + if (rbio->nr_data > 1) { >> + raid6_call.gen_syndrome(rbio->real_stripes, step, pointers); >> + } else { >> + memcpy(pointers[1], pointers[0], step); >> + memcpy(pointers[2], pointers[0], step); >> + } >> } else { >> /* raid5 */ >> memcpy(pointers[rbio->nr_data], pointers[0], step); >> - xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, >> + if (rbio->nr_data > 1) >> + xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1, >> step); >> } >> for (stripe = stripe - 1; stripe >= 0; stripe--) >> @@ -1987,11 +1993,17 @@ static void recover_vertical_step(struct >> btrfs_raid_bio *rbio, >> } >> >> if (failb == rbio->real_stripes - 2) { >> - raid6_datap_recov(rbio->real_stripes, step, >> - faila, pointers); >> + if (unlikely(rbio->nr_data == 1)) >> + memcpy(pointers[0], pointers[rbio->real_stripes - 1], step); >> + else >> + raid6_datap_recov(rbio->real_stripes, step, >> + faila, pointers); >> } else { >> - raid6_2data_recov(rbio->real_stripes, step, >> - faila, failb, pointers); >> + if (unlikely(rbio->nr_data == 1)) >> + memcpy(pointers[0], pointers[rbio->real_stripes - 2], step); > > The 2data recovery needs to do 2 memcpy Nope. The recovery is only to repair the data stripe. We do not need to bother the failed P or Q stripe here. And I think this explains exactly why doing adhoc fix is a stupid idea in the first place. We're wasting time on useless corner cases, and corner cases inside corner cases. It makes no sense to waste all the time reviewing those useless details, when we have a much better solution to avoid all those hassles. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-23 14:23 ` David Sterba 2026-05-23 22:46 ` Qu Wenruo @ 2026-05-25 6:00 ` Christoph Hellwig 2026-05-25 10:17 ` David Sterba 1 sibling, 1 reply; 14+ messages in thread From: Christoph Hellwig @ 2026-05-25 6:00 UTC (permalink / raw) To: David Sterba; +Cc: Qu Wenruo, linux-btrfs, hch, linux-raid On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote: > On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: > > Recently kernel RAID56 lib is trying to remove the unexpected > > single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile > > btrfs still supports such setup, which means in the long run btrfs has > > to handle such corner case by ourselves. > > > > Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and > > P/Q stripes all match each other, rotation also makes no difference. > > > > This patch will disguise those single-data-RAID56 chunks as > > RAID1/RAID1C3 chunks. > > I don't think this is the right way to fix it. Calculations are an > implementation detail and should be done at the time of the xor_gen or > raid6_call, not touching the upper level structures related to format. It absolutely is. Adding fast path workarounds for this is completely stupid when it can be trivially handled on the mount side. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-25 6:00 ` Christoph Hellwig @ 2026-05-25 10:17 ` David Sterba 2026-05-25 10:39 ` Qu Wenruo 2026-05-26 6:45 ` Christoph Hellwig 0 siblings, 2 replies; 14+ messages in thread From: David Sterba @ 2026-05-25 10:17 UTC (permalink / raw) To: Christoph Hellwig; +Cc: Qu Wenruo, linux-btrfs, linux-raid On Mon, May 25, 2026 at 08:00:30AM +0200, Christoph Hellwig wrote: > On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote: > > On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: > > > Recently kernel RAID56 lib is trying to remove the unexpected > > > single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile > > > btrfs still supports such setup, which means in the long run btrfs has > > > to handle such corner case by ourselves. > > > > > > Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and > > > P/Q stripes all match each other, rotation also makes no difference. > > > > > > This patch will disguise those single-data-RAID56 chunks as > > > RAID1/RAID1C3 chunks. > > > > I don't think this is the right way to fix it. Calculations are an > > implementation detail and should be done at the time of the xor_gen or > > raid6_call, not touching the upper level structures related to format. > > It absolutely is. Adding fast path workarounds for this is completely > stupid when it can be trivially handled on the mount side. It's a lazy hack at best and fix on absolutely wrong layer. The library should provide the support for the edge case. But we disagree on that. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-25 10:17 ` David Sterba @ 2026-05-25 10:39 ` Qu Wenruo 2026-05-26 6:46 ` Christoph Hellwig 2026-05-26 6:45 ` Christoph Hellwig 1 sibling, 1 reply; 14+ messages in thread From: Qu Wenruo @ 2026-05-25 10:39 UTC (permalink / raw) To: dsterba, Christoph Hellwig; +Cc: linux-btrfs, linux-raid 在 2026/5/25 19:47, David Sterba 写道: > On Mon, May 25, 2026 at 08:00:30AM +0200, Christoph Hellwig wrote: >> On Sat, May 23, 2026 at 04:23:44PM +0200, David Sterba wrote: >>> On Fri, May 22, 2026 at 06:53:53PM +0930, Qu Wenruo wrote: >>>> Recently kernel RAID56 lib is trying to remove the unexpected >>>> single-data-RAID56 (2 disks RAID5 or 3 disk RAID5) support, meanwhile >>>> btrfs still supports such setup, which means in the long run btrfs has >>>> to handle such corner case by ourselves. >>>> >>>> Thankfully single-data-RAID56 is really RAID1/RAID1C3, since data and >>>> P/Q stripes all match each other, rotation also makes no difference. >>>> >>>> This patch will disguise those single-data-RAID56 chunks as >>>> RAID1/RAID1C3 chunks. >>> >>> I don't think this is the right way to fix it. Calculations are an >>> implementation detail and should be done at the time of the xor_gen or >>> raid6_call, not touching the upper level structures related to format. >> >> It absolutely is. Adding fast path workarounds for this is completely >> stupid when it can be trivially handled on the mount side. > > It's a lazy hack at best and fix on absolutely wrong layer. The library > should provide the support for the edge case. But we disagree on that. I strongly disagree. Firstly on the layer to fix, let me be clear, if something writes like RAID1, reads like RAID1, repairs like RAID1, then it's RAID1. With that mindset, changing chunk type at read/add time is exactly the correct layer to fix. Or let me change the title of the patch, ENHANCE single-data-RAID56 handling to get rid all the corner cases. Do you still think that's the wrong correct layer to fix? So the wrong layer argument doesn't stand. If you're so focused on fixing the problem in the library layer, then all you can see is to put some adhoc fixes to emulate the library fix. Furthermore, we are already doing all the warning about 2-disk raid5, now I'm providing a solution that is as mature as RAID1, now you're complaining about the simplicity but wants all the complex striping and reduced read performance? I don't see any point in this. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-25 10:39 ` Qu Wenruo @ 2026-05-26 6:46 ` Christoph Hellwig 0 siblings, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2026-05-26 6:46 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Christoph Hellwig, linux-btrfs, linux-raid On Mon, May 25, 2026 at 08:09:54PM +0930, Qu Wenruo wrote: >> It's a lazy hack at best and fix on absolutely wrong layer. The library >> should provide the support for the edge case. But we disagree on that. > > I strongly disagree. > > Firstly on the layer to fix, let me be clear, if something writes like > RAID1, reads like RAID1, repairs like RAID1, then it's RAID1. *nod* > With that mindset, changing chunk type at read/add time is exactly the > correct layer to fix. It also is amazingly simple! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 2026-05-25 10:17 ` David Sterba 2026-05-25 10:39 ` Qu Wenruo @ 2026-05-26 6:45 ` Christoph Hellwig 1 sibling, 0 replies; 14+ messages in thread From: Christoph Hellwig @ 2026-05-26 6:45 UTC (permalink / raw) To: David Sterba; +Cc: Christoph Hellwig, Qu Wenruo, linux-btrfs, linux-raid On Mon, May 25, 2026 at 12:17:07PM +0200, David Sterba wrote: > > It absolutely is. Adding fast path workarounds for this is completely > > stupid when it can be trivially handled on the mount side. > > It's a lazy hack at best and fix on absolutely wrong layer. The library > should provide the support for the edge case. But we disagree on that. No, the library should not provide hacks for something it never intended to support, and which is a bad idea. It's also something the implementation that the library was factored out from for btrfs use never supported. So adding the workaround to btrfs, which caused the problem is a requirement. Doing it to only create mount time overhead instead of burdening the I/O path is a very smart way to solve the problem, while adding hacks to the I/O fastpath is everything but smart. But in the end this is for the btrfs maintainers to decide. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-05-26 6:46 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-22 9:23 [PATCH 0/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2026-05-22 9:23 ` [PATCH 1/3] btrfs: remove btrfs_chunk_map::io_(align|width) members Qu Wenruo 2026-05-22 9:23 ` [PATCH 2/3] btrfs: remove duplicated block group type assignment Qu Wenruo 2026-05-22 9:23 ` [PATCH 3/3] btrfs: disguise single-data-RAID56 as RAID1/RAID1C3 Qu Wenruo 2026-05-23 14:23 ` David Sterba 2026-05-23 22:46 ` Qu Wenruo 2026-05-24 4:13 ` Qu Wenruo 2026-05-25 10:12 ` David Sterba 2026-05-25 10:25 ` Qu Wenruo 2026-05-25 6:00 ` Christoph Hellwig 2026-05-25 10:17 ` David Sterba 2026-05-25 10:39 ` Qu Wenruo 2026-05-26 6:46 ` Christoph Hellwig 2026-05-26 6:45 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox