Linux RAID subsystem development
* [PATCH v6 0/3] md/raid10: fix r10bio width mismatches across reshape
@ 2026-06-23 12:38 Chen Cheng
From: Chen Cheng @ 2026-06-23 12:38 UTC
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

Hi,

This series fixes slab-out-of-bounds accesses in raid10 when a reshape changes
the number of raid disks while regular I/O is still reusing r10bio objects whose
devs[] array was sized for the previous geometry.

The bug is reproducible with a simple 4-disk to 5-disk reshape under write
load, for example:

  mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md777
  mkdir -p /mnt/test
  mount /dev/md777 /mnt/test
  fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
  mdadm /dev/md777 --add /dev/sde
  mdadm --grow /dev/md777 --raid-devices=5 \
    --backup-file=/tmp/md-reshape-backup


KASAN report:

  BUG: KASAN: slab-out-of-bounds in free_r10bio+0x1c4/0x260 [raid10]
  Read of size 8 at addr ffff00008c2dfac8 by task ksoftirqd/0/15
  free_r10bio
  raid_end_bio_io
  one_write_done
  raid10_end_write_request
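To make the width mismatch concrete: in mainline raid10.c, r10bio_pool_alloc() sizes each r10bio from conf->geo.raid_disks at allocation time, while free_r10bio() walks devs[] (via put_all_bios()) up to the current conf->geo.raid_disks. The userspace C sketch below is a hypothetical model of that arithmetic, not kernel code: model_r10bio, model_dev, model_alloc() and slots_past_end() are invented names, and the model computes how far a walk would overrun instead of performing the out-of-bounds access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one devs[] entry; the real entry in
 * struct r10bio carries more fields (such as addr and devnum). */
struct model_dev {
	void *bio;
	void *repl_bio;
};

/* Hypothetical stand-in for struct r10bio: a header plus a flexible
 * array with one entry per raid disk. */
struct model_r10bio {
	int alloc_disks;          /* width recorded at allocation (model only) */
	struct model_dev devs[];
};

/* Sizes the object from the raid_disks value current at allocation,
 * the same shape as r10bio_pool_alloc(). */
static struct model_r10bio *model_alloc(int raid_disks)
{
	struct model_r10bio *r;

	assert(raid_disks > 0);
	r = calloc(1, offsetof(struct model_r10bio, devs) +
		      (size_t)raid_disks * sizeof(struct model_dev));
	if (r)
		r->alloc_disks = raid_disks;
	return r;
}

/* Number of devs[] entries a free-path walk over cur_disks entries would
 * touch past the end of the allocation; 0 means the walk stays in bounds. */
static int slots_past_end(const struct model_r10bio *r, int cur_disks)
{
	return cur_disks > r->alloc_disks ? cur_disks - r->alloc_disks : 0;
}
```

A request allocated while the array had 4 disks and freed after the geometry switched to 5 overruns by one entry, which is consistent with a pointer-sized (8-byte) read past the object as in the KASAN report.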


This series addresses the problem in three steps:

  1. make the sync_action=reshape path suspend and lock the array before it
     calls start_reshape()

  2. resize r10bio_pool when reshape grows raid_disks

  3. free the r10bio before calling bio_endio() on the master bio in the
     regular and discard completion paths
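Why the free/endio order in step 3 matters can be shown with a single-threaded model. This is a hypothetical sketch, not the raid10 code: model_array, model_req and all helper functions are invented, and the model assumes that ending the master bio is the event that lets a waiting suspend finish and the new geometry take effect. In the real driver the quiesce also depends on other accounting, which this model folds into a single in-flight counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical array state: a pending reshape takes effect as soon as
 * no master bio is in flight (standing in for a suspend completing). */
struct model_array {
	int raid_disks;      /* current geometry */
	int inflight;        /* master bios not yet ended */
	int reshape_target;  /* > 0 while a reshape waits for quiesce */
};

/* Hypothetical request: remembers the width its devs[] was sized for. */
struct model_req {
	int alloc_disks;
	bool freed;
};

static void maybe_switch_geometry(struct model_array *a)
{
	if (a->inflight == 0 && a->reshape_target > 0) {
		a->raid_disks = a->reshape_target;
		a->reshape_target = 0;
	}
}

/* Ends the master bio; the upper layer now sees the I/O as complete. */
static void model_bio_endio(struct model_array *a)
{
	assert(a->inflight > 0);
	a->inflight--;
	maybe_switch_geometry(a);
}

/* Frees the request. Returns false when the walk width (current
 * raid_disks) exceeds the allocation, i.e. the out-of-bounds case. */
static bool model_free(const struct model_array *a, struct model_req *r)
{
	r->freed = true;
	return a->raid_disks <= r->alloc_disks;
}

/* Previous order: end the master bio, then free the request. */
static bool complete_endio_then_free(struct model_array *a, struct model_req *r)
{
	model_bio_endio(a);
	return model_free(a, r);
}

/* Order described in step 3: free the request, then end the master bio. */
static bool complete_free_then_endio(struct model_array *a, struct model_req *r)
{
	bool ok = model_free(a, r);

	model_bio_endio(a);
	return ok;
}
```

With the old order the geometry switches between bio_endio() and the free, so the free walks 5 entries of a 4-entry object; freeing first closes that window while the reshape still proceeds afterwards.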


Changes in v6:
   - suspend the array in action_store() after flush_work()
   - free r10bio before ending the discard master bio

Changes in v5 (suggested by Yu Kuai):
   - simplify patch 2
   - switch patch 3 from bounding reused r10bio devs[] walks by used_nr_devs
     to reordering the free/endio flow

Changes in v4:
   - make the sync_action=reshape path invoke mddev_suspend_and_lock() before
     calling start_reshape()
   - leave the md-cluster and dm-raid paths unchanged; they still reach
     start_reshape() with the mddev locked but without suspend
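The ordering that patch 1 establishes for the sync_action=reshape path can be sketched as a precondition on the reshape entry point. This is a hypothetical userspace model: model_mddev and the model_* functions are invented stand-ins for flush_work(), mddev_suspend_and_lock(), start_reshape() and mddev_unlock_and_resume(), and draining in-flight I/O is reduced to zeroing a counter. The md-cluster and dm-raid paths, which the v4 note leaves unchanged, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the md state that matters for patch 1. */
struct model_mddev {
	bool work_pending;   /* queued work not yet flushed */
	bool suspended;      /* upper-layer I/O quiesced */
	bool locked;         /* reconfiguration lock held */
	int inflight;        /* regular I/O still holding r10bio objects */
	int raid_disks;
};

static void model_flush_pending_work(struct model_mddev *m)
{
	m->work_pending = false;
}

static void model_suspend_and_lock(struct model_mddev *m)
{
	m->inflight = 0;     /* stand-in for waiting until I/O drains */
	m->suspended = true;
	m->locked = true;
}

static void model_unlock_and_resume(struct model_mddev *m)
{
	assert(m->locked);   /* must pair with model_suspend_and_lock() */
	m->locked = false;
	m->suspended = false;
}

/* Refuses to switch geometry unless the array is quiesced and locked. */
static int model_start_reshape(struct model_mddev *m, int new_disks)
{
	if (!m->suspended || !m->locked || m->inflight != 0)
		return -1;
	m->raid_disks = new_disks;
	return 0;
}

/* sync_action=reshape path with the v6 ordering:
 * flush pending work, then suspend and lock, then start the reshape. */
static int model_action_store_reshape(struct model_mddev *m, int new_disks)
{
	int ret;

	model_flush_pending_work(m);
	model_suspend_and_lock(m);
	ret = model_start_reshape(m, new_disks);
	model_unlock_and_resume(m);
	return ret;
}
```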

Changes in v3:
   - replace freeze_array()/unfreeze_array() in raid10_start_reshape() with
     mddev_suspend_and_lock_nointr()/mddev_unlock_and_resume(); freeze_array()
     can return while retry-list items still hold pool objects, whereas
     mddev_suspend() provides the correct upper-layer quiesce interface
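The difference in the v3 note comes from what each primitive waits for. In mainline raid10.c, freeze_array() waits, roughly, until nr_pending equals nr_queued plus an extra count, so it can return while r10bios parked on the retry lists are still allocated. The hypothetical model below contrasts that condition with a full drain (nr_pending reaching zero); the struct and function names are invented and the counters are simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters loosely modeled on struct r10conf. */
struct model_conf {
	int nr_pending;   /* r10bios issued and not yet finished */
	int nr_queued;    /* of those, parked on retry lists */
};

/* Simplified condition that freeze_array(conf, extra) waits for. */
static bool freeze_done(const struct model_conf *c, int extra)
{
	return c->nr_pending == c->nr_queued + extra;
}

/* Condition a full drain waits for: nothing outstanding at all. */
static bool fully_drained(const struct model_conf *c)
{
	return c->nr_pending == 0;
}

/* r10bios (and so pool objects) still live once a wait returns. */
static int objects_still_held(const struct model_conf *c)
{
	assert(c->nr_pending >= c->nr_queued);
	return c->nr_pending;
}
```

With two r10bios queued for retry, the freeze condition already holds while two pool objects are still live, so rebuilding r10bio_pool at that point would leave them outstanding across the rebuild.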

Changes in v2:
  - add this cover letter
  - convert r10bio_pool to a fixed-size kmalloc mempool
  - rebuild r10bio_pool inside the freeze window before switching live reshape
    geometry
  - switch raid10_quiesce() to freeze_array()/unfreeze_array()
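Rebuilding a fixed-size pool for a wider geometry, as described in the v2 notes and in step 2, can be sketched as follows. This is a hypothetical userspace model, not the kernel mempool API: model_pool, pool_init(), pool_exit() and pool_resize() are invented, and elem_bytes() uses a made-up per-slot size. The rebuild asserts that nothing is outstanding, standing in for the requirement that it run while the array is quiesced. Separately, the kernel's mempool_resize() changes the number of reserved elements rather than their size, which is presumably why a wider r10bio needs a new pool rather than a resized one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mempool-like reserve whose elements share one width. */
struct model_pool {
	int width;        /* devs[] slots each element is sized for */
	int min_nr;       /* number of pre-allocated reserve elements */
	int outstanding;  /* elements handed out and not yet returned */
	void **reserve;   /* the pre-allocated elements */
};

/* Made-up element size: two pointers per slot. */
static size_t elem_bytes(int width)
{
	return (size_t)width * 2 * sizeof(void *);
}

static void pool_exit(struct model_pool *p)
{
	if (!p->reserve)
		return;
	for (int i = 0; i < p->min_nr; i++)
		free(p->reserve[i]);     /* free(NULL) is a no-op */
	free(p->reserve);
	p->reserve = NULL;
}

static int pool_init(struct model_pool *p, int min_nr, int width)
{
	p->width = width;
	p->min_nr = min_nr;
	p->outstanding = 0;
	p->reserve = calloc((size_t)min_nr, sizeof(void *));
	if (!p->reserve)
		return -1;
	for (int i = 0; i < min_nr; i++) {
		p->reserve[i] = calloc(1, elem_bytes(width));
		if (!p->reserve[i]) {
			pool_exit(p);        /* unfilled slots are still NULL */
			return -1;
		}
	}
	return 0;
}

/* Rebuilds the reserve when the width grows; elements that are already
 * wide enough are kept. Must run with nothing outstanding. */
static int pool_resize(struct model_pool *p, int new_width)
{
	struct model_pool np;

	assert(p->outstanding == 0);   /* caller must have quiesced I/O */
	if (new_width <= p->width)
		return 0;
	if (pool_init(&np, p->min_nr, new_width))
		return -1;                 /* old pool is left intact on failure */
	pool_exit(p);
	*p = np;
	return 0;
}
```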


Testing:
  - reproduced the original KASAN slab-out-of-bounds report on a 4-disk ->
    5-disk raid10 reshape with fsstress
  - verified that this series fixes that reproducer
  - exercised the 5-disk -> 4-disk reshape direction as well

Thanks,
Chen Cheng



Chen Cheng (3):
  md: suspend array when sync_action=reshape
  md/raid10: resize r10bio_pool for reshape
  md/raid10: free r10bio before ending master_bio in raid_end_bio_io()
    and raid_end_discard_bio()

 drivers/md/md.c     | 17 +++++++++----
 drivers/md/raid10.c | 61 ++++++++++++++++++++++++++++++++-------------
 drivers/md/raid10.h |  2 +-
 3 files changed, 56 insertions(+), 24 deletions(-)

-- 
2.54.0


Thread overview: 6+ messages
2026-06-23 12:38 [PATCH v6 0/3] md/raid10: fix r10bio width mismatches across reshape Chen Cheng
2026-06-23 12:38 ` [PATCH v6 1/3] md: suspend array when sync_action=reshape Chen Cheng
2026-06-23 12:55   ` sashiko-bot
2026-06-23 12:38 ` [PATCH v6 2/3] md/raid10: resize r10bio_pool for reshape Chen Cheng
2026-06-23 13:00   ` sashiko-bot
2026-06-23 12:38 ` [PATCH v6 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io() and raid_end_discard_bio() Chen Cheng
