Linux RAID subsystem development
* [PATCH v2 0/2] md/raid10: fix r10bio width mismatches across reshape
@ 2026-05-15  9:27 Chen Cheng
  2026-05-15  9:27 ` [PATCH v2 1/2] md/raid10: make r10bio_pool use fixed-size objects Chen Cheng
  2026-05-15  9:27 ` [PATCH v2 2/2] md/raid10: bound reused r10bio devs[] walks by used_nr_devs Chen Cheng
From: Chen Cheng @ 2026-05-15  9:27 UTC
  To: Yu Kuai; +Cc: Chen Cheng, linux-raid, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

Hi,

This series fixes slab out-of-bounds accesses in raid10 when reshape changes
the number of raid disks while regular I/O is still reusing r10bio objects
allocated under the previous geometry.

The bug is reproducible with a simple 4-disk to 5-disk reshape under write
load, for example:

  mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md777
  mount /dev/md777 /mnt/test
  fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
  mdadm /dev/md777 --add /dev/sde
  mdadm --grow /dev/md777 --raid-devices=5 \
    --backup-file=/tmp/md-reshape-backup

Without these changes, an r10bio allocated under the old geometry can later be
reused, initialized, or freed after conf->geo.raid_disks has switched to the
new geometry. The number of devs[] slots in the object then no longer matches
the width used to walk or initialize devs[], which can trigger KASAN reports
such as slab-out-of-bounds in __make_request(), put_all_bios(), or
find_bio_disk().
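
As a rough userspace illustration of the mismatch (not kernel code), the
sketch below assumes the object ends in a flexible devs[] array whose length
is taken from the disk count at allocation time. The struct layout and the
helper names r10bio_model_size() and r10bio_model_overrun() are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one devs[] slot. */
struct r10dev_model {
	void *bio;
	void *repl_bio;
	long long addr;
	int devnum;
};

/* Hypothetical stand-in for an r10bio with a flexible devs[] tail. */
struct r10bio_model {
	int sectors;
	struct r10dev_model devs[];	/* one slot per raid disk */
};

/* Bytes needed for an object that carries nr_devs slots. */
static size_t r10bio_model_size(int nr_devs)
{
	return offsetof(struct r10bio_model, devs) +
	       (size_t)nr_devs * sizeof(struct r10dev_model);
}

/*
 * Bytes that a walk over walk_width slots would touch beyond the end of
 * an object that was allocated with alloc_width slots.
 */
static size_t r10bio_model_overrun(int alloc_width, int walk_width)
{
	if (walk_width <= alloc_width)
		return 0;
	return r10bio_model_size(walk_width) - r10bio_model_size(alloc_width);
}
```

With these definitions, r10bio_model_overrun(4, 5) is exactly one slot: an
object sized for 4 disks that is walked with a width of 5 is read or written
one devs[] entry past its end.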

This series addresses the problem in two steps:

  1. make the regular r10bio pool fixed-size across reshape transitions, and
     move the pool rebuild into the freeze window before the live geometry
     switch;

  2. track the number of valid devs[] entries in each reused r10bio and use
     that recorded width when walking devs[] after reshape.
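
The second step can be pictured with the following userspace sketch. It is
not the code from patch 2; only the name used_nr_devs is taken from the
series. The fixed capacity MODEL_MAX_DEVS, the struct layout, and both helper
functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8	/* hypothetical fixed capacity of pooled objects */

struct dev_slot {
	void *bio;
	int devnum;
};

struct r10bio_model {
	int used_nr_devs;	/* width recorded when the object was set up */
	struct dev_slot devs[MODEL_MAX_DEVS];
};

/* Record the geometry width when the object is (re)initialized. */
static int r10bio_model_init(struct r10bio_model *r, int raid_disks)
{
	if (raid_disks < 0 || raid_disks > MODEL_MAX_DEVS)
		return -1;
	r->used_nr_devs = raid_disks;
	for (int i = 0; i < raid_disks; i++) {
		r->devs[i].bio = NULL;
		r->devs[i].devnum = -1;
	}
	return 0;
}

/*
 * Walk devs[] using the recorded width, not the current geometry.
 * Returns the number of slots that held a bio and were cleared.
 */
static int r10bio_model_put_all(struct r10bio_model *r)
{
	int released = 0;

	for (int i = 0; i < r->used_nr_devs; i++) {
		if (r->devs[i].bio) {
			r->devs[i].bio = NULL;
			released++;
		}
	}
	return released;
}
```

Because the walk is bounded by the width stored in the object, a later change
of the live disk count does not alter how many slots are visited for an
object that was set up earlier.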

Changes in v2:
  - add this cover letter
  - convert r10bio_pool to a fixed-size kmalloc mempool
  - rebuild r10bio_pool inside the freeze window before switching live reshape
    geometry
  - switch raid10_quiesce() to freeze_array()/unfreeze_array()

Open issues:

One point where this v2 series still differs from raid1 is the pool-switch
semantics during reshape.

raid1 handles this by:
  - converting r1bio_pool to a fixed-size pool,
  - freezing the array,
  - swapping in the new pool while the array is frozen,
  - switching the live geometry/state,
  - unfreezing the array, and
  - destroying the old pool afterwards.

In other words, raid1 keeps the old and new regular I/O pools logically
separated across the reshape transition.

This raid10 v2 series follows the same high-level direction by converting
r10bio_pool to a fixed-size pool and moving the pool rebuild into the freeze
window before the live geometry switch. However, it does not yet mirror
raid1 completely: queued regular r10bios may still exist on retry_list or
bio_end_io_list at the time of the pool replacement, and raid10's current
freeze semantics only guarantee that in-flight I/O has either completed or
been queued.

My current understanding is that there are two possible directions to make
this fully robust:

  1. strengthen raid10 freeze semantics so that the reshape-time pool switch
     guarantees that no old regular r10bio can survive across the transition;
     or

  2. explicitly associate in-flight regular r10bios with the pool they were
     allocated from, so they can always be returned to the correct pool even
     if old and new pools overlap in time.
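
To make direction 2 more concrete, here is a minimal userspace sketch of an
object that remembers the pool it came from. Nothing here is existing raid10
code: the owner field, struct pool_model, and both helpers are hypothetical,
and malloc()/free() stand in for the real mempool calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool that only counts objects currently handed out. */
struct pool_model {
	int outstanding;
};

/* Hypothetical r10bio stand-in carrying a back-pointer to its pool. */
struct r10bio_model {
	struct pool_model *owner;
};

/* Allocate from a pool and remember which pool it was. */
static struct r10bio_model *pool_model_alloc(struct pool_model *pool)
{
	struct r10bio_model *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->owner = pool;
	pool->outstanding++;
	return r;
}

/* Return the object to the pool it was allocated from. */
static void r10bio_model_free(struct r10bio_model *r)
{
	r->owner->outstanding--;
	free(r);
}
```

In this model an object allocated before the pool switch is always returned
to the old pool, regardless of which pool is current at free time, so the old
pool can be destroyed once its outstanding count drops to zero.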

There is also a pre-existing boundary issue in find_bio_disk(): if the bio
is not found in devs[], the code can still walk past the recorded width.
That issue is not addressed in this series.
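
For reference, one possible shape of a bounded lookup is sketched below in
userspace C. It is not part of this series and not a proposed patch; the
struct layout, MODEL_MAX_DEVS, and find_bio_disk_model() are hypothetical.
The point is only that the not-found case returns an error instead of
indexing devs[] with a slot equal to the width.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8	/* hypothetical fixed capacity */

struct dev_slot {
	const void *bio;
	const void *repl_bio;
	int devnum;
};

struct r10bio_model {
	int used_nr_devs;	/* number of valid devs[] entries */
	struct dev_slot devs[MODEL_MAX_DEVS];
};

/*
 * Look up bio among the valid slots. On success, store the slot index
 * and whether the replacement bio matched, and return the device number.
 * Return -1 if bio is not present in the valid part of devs[].
 */
static int find_bio_disk_model(const struct r10bio_model *r, const void *bio,
			       int *slotp, int *replp)
{
	for (int slot = 0; slot < r->used_nr_devs; slot++) {
		int repl;

		if (r->devs[slot].bio == bio)
			repl = 0;
		else if (r->devs[slot].repl_bio == bio)
			repl = 1;
		else
			continue;

		if (slotp)
			*slotp = slot;
		if (replp)
			*replp = repl;
		return r->devs[slot].devnum;
	}
	return -1;	/* not found: devs[] is never indexed past the width */
}
```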

Testing:
  - reproduced the original KASAN slab-out-of-bounds report on a 4-disk ->
    5-disk raid10 reshape with fsstress
  - verified that the reproducer no longer triggers the report with this
    series applied
  - exercised the 5-disk -> 4-disk reshape direction as well

Thanks,
Chen Cheng


Chen Cheng (2):
  md/raid10: make r10bio_pool use fixed-size objects
  md/raid10: bound reused r10bio devs[] walks by used_nr_devs

 drivers/md/raid10.c | 63 +++++++++++++++++++++++++++++++++------------
 drivers/md/raid10.h |  4 ++-
 2 files changed, 49 insertions(+), 18 deletions(-)

-- 
2.54.0
