All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/3] md/raid10: fix r10bio width mismatches across reshape
@ 2026-06-22 12:13 Chen Cheng
  2026-06-22 12:13 ` [PATCH v5 1/3] md: suspend array when sync_action=reshape Chen Cheng
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Chen Cheng @ 2026-06-22 12:13 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

Hi,

This series fixes slab out-of-bounds accesses in raid10 when reshape changes
the number of raid disks while regular I/O is still reusing r10bio objects
allocated under the previous geometry.

The bug is reproducible with a simple 4-disk to 5-disk reshape under write
load, for example:

  mdadm -C /dev/md777 -l10 -n4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md777
  mount /dev/md777 /mnt/test
  fsstress -d /mnt/test -n 24000 -p 8 -l 24 &
  mdadm /dev/md777 --add /dev/sde
  mdadm --grow /dev/md777 --raid-devices=5 \
    --backup-file=/tmp/md-reshape-backup


kcsan report:

  BUG: KASAN: slab-out-of-bounds in free_r10bio+0x1c4/0x260 [raid10]
  Read of size 8 at addr ffff00008c2dfac8 by task ksoftirqd/0/15
  free_r10bio
  raid_end_bio_io
  one_write_done
  raid10_end_write_request


This series addresses the problem in three steps:

  1. ensure the sync_action=reshape caller suspends and locks before start_reshape

  2. covert the r10bio pool fixed-size from old geometry to new.

  3. reorder r10bio free flow to avoid race when free r10bio.

Changes in v5(suggesst by yukuai):
   - patch 2 simpify
   - patch 3 use new way{reorder free r10bio flow} instead of 
     old way {bound reused r10bio devs[] walks by used_nr_devs}

Changes in v4:
   - The sync_action=reshape path, caller now invokes
     mddev_suspend_and_lock() before calling start_reshape()
   - The md-cluster and dm-raid paths are unchanged, that is reach
     start_reshape() with the mddev locked but without suspended.


Changes in v3:
   - Replace freeze_array()/unfreeze_array() in raid10_start_reshape() with
     mddev_suspend_and_lock_nointr()/mddev_unlock_and_resume(). freeze_array()
     returns when nr_pending == nr_queued, which still allows retry-list items
     to hold pool objects; mddev_suspend() provides the correct upper-layer
     quiesce interface. (Suggested by Yu Kuai)


Changes in v2:
  - add this cover letter
  - convert r10bio_pool to a fixed-size kmalloc mempool
  - rebuild r10bio_pool inside the freeze window before switching live reshape
    geometry
  - switch raid10_quiesce() to freeze_array()/unfreeze_array()


Testing:
  - reproduced the original KASAN slab-out-of-bounds on 4-disk -> 5-disk
    raid10 reshape with fsstress
  - verified that this series fixes that reproducer
  - exercised the 5-disk -> 4-disk reshape direction as well

Thanks,
Chen Cheng



Chen Cheng (3):
  md: suspend array before raid10 reshape via sync_action
  md/raid10: make r10bio_pool use fixed-size objects
  md/raid10: bound reused r10bio devs[] walks by used_nr_devs

 drivers/md/md.c     | 22 ++++++++++++++----
 drivers/md/raid10.c | 56 +++++++++++++++++++++++++++++++++------------
 drivers/md/raid10.h |  4 +++-
 3 files changed, 61 insertions(+), 21 deletions(-)

-- 
2.54.0

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v5 1/3] md: suspend array when sync_action=reshape
  2026-06-22 12:13 [PATCH v5 0/3] md/raid10: fix r10bio width mismatches across reshape Chen Cheng
@ 2026-06-22 12:13 ` Chen Cheng
  2026-06-22 12:25   ` sashiko-bot
  2026-06-22 12:13 ` [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape Chen Cheng
  2026-06-22 12:13 ` [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io() Chen Cheng
  2 siblings, 1 reply; 7+ messages in thread
From: Chen Cheng @ 2026-06-22 12:13 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

raid10 needs to resize/swap r10bio_pool when reshape changes
raid_disks, and, don't let new requests keep allocating r10bio
objects from the old pool while that transition is in progress.

suspend and lock array before mddev_start_reshape(), and resume
it on exit.

Other sync_action ops are unchanged.

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/md.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 096bb64e87bd..e139f36e30b9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5256,30 +5256,41 @@ static int mddev_start_reshape(struct mddev *mddev)
 
 static ssize_t
 action_store(struct mddev *mddev, const char *page, size_t len)
 {
 	int ret;
+	bool suspended = false;
 	enum sync_action action;
 
 	if (!mddev->pers || !mddev->pers->sync_request)
 		return -EINVAL;
 
+	action = md_sync_action_by_name(page);
+	if (action == ACTION_RESHAPE) {
+		ret = mddev_suspend(mddev, true);
+		if (ret)
+			return ret;
+		suspended = true;
+	}
 retry:
 	if (work_busy(&mddev->sync_work))
 		flush_work(&mddev->sync_work);
 
 	ret = mddev_lock(mddev);
-	if (ret)
+	if (ret) {
+		mddev_resume(mddev);
 		return ret;
+	}
 
 	if (work_busy(&mddev->sync_work)) {
-		mddev_unlock(mddev);
+		if (suspended)
+			mddev_unlock_and_resume(mddev);
+		else
+			mddev_unlock(mddev);
 		goto retry;
 	}
 
-	action = md_sync_action_by_name(page);
-
 	/* TODO: mdadm rely on "idle" to start sync_thread. */
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
 		switch (action) {
 		case ACTION_FROZEN:
 			md_frozen_sync_thread(mddev);
@@ -5344,11 +5355,14 @@ action_store(struct mddev *mddev, const char *page, size_t len)
 	md_wakeup_thread(mddev->thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
 	ret = len;
 
 out:
-	mddev_unlock(mddev);
+	if (suspended)
+		mddev_unlock_and_resume(mddev);
+	else
+		mddev_unlock(mddev);
 	return ret;
 }
 
 static struct md_sysfs_entry md_scan_mode =
 __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape
  2026-06-22 12:13 [PATCH v5 0/3] md/raid10: fix r10bio width mismatches across reshape Chen Cheng
  2026-06-22 12:13 ` [PATCH v5 1/3] md: suspend array when sync_action=reshape Chen Cheng
@ 2026-06-22 12:13 ` Chen Cheng
  2026-06-22 12:42   ` sashiko-bot
  2026-06-22 12:13 ` [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io() Chen Cheng
  2 siblings, 1 reply; 7+ messages in thread
From: Chen Cheng @ 2026-06-22 12:13 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

When reshape changes raid_disks, the pool must also switch to new geometry
object size.

Allocate a new geometry size pool and replace the old.

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/raid10.c | 44 +++++++++++++++++++++++++++++++++-----------
 drivers/md/raid10.h |  2 +-
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index cee5a253a281..d740744a9746 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -101,14 +101,25 @@ static void end_reshape(struct r10conf *conf);
 static inline struct r10bio *get_resync_r10bio(struct bio *bio)
 {
 	return get_resync_pages(bio)->raid_bio;
 }
 
-static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
+static inline int calc_r10bio_size(unsigned int raid_disks)
 {
-	struct r10conf *conf = data;
-	int size = offsetof(struct r10bio, devs[conf->geo.raid_disks]);
+	return offsetof(struct r10bio, devs[raid_disks]);
+}
+
+static mempool_t *create_r10bio_pool(unsigned int raid_disks)
+{
+	int size = calc_r10bio_size(raid_disks);
+
+	return mempool_create_kmalloc_pool(NR_RAID_BIOS, size);
+}
+
+static struct r10bio *alloc_r10bio(unsigned int raid_disks, gfp_t gfp_flags)
+{
+	int size = calc_r10bio_size(raid_disks);
 
 	/* allocate a r10bio with room for raid_disks entries in the
 	 * bios array */
 	return kzalloc(size, gfp_flags);
 }
@@ -135,11 +146,11 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
 	struct bio *bio;
 	int j;
 	int nalloc, nalloc_rp;
 	struct resync_pages *rps;
 
-	r10_bio = r10bio_pool_alloc(gfp_flags, conf);
+	r10_bio = alloc_r10bio(conf->geo.raid_disks, gfp_flags);
 	if (!r10_bio)
 		return NULL;
 
 	if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
 	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
@@ -275,11 +286,11 @@ static void put_all_bios(struct r10conf *conf, struct r10bio *r10_bio)
 static void free_r10bio(struct r10bio *r10_bio)
 {
 	struct r10conf *conf = r10_bio->mddev->private;
 
 	put_all_bios(conf, r10_bio);
-	mempool_free(r10_bio, &conf->r10bio_pool);
+	mempool_free(r10_bio, conf->r10bio_pool);
 }
 
 static void put_buf(struct r10bio *r10_bio)
 {
 	struct r10conf *conf = r10_bio->mddev->private;
@@ -1537,11 +1548,11 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 {
 	struct r10conf *conf = mddev->private;
 	struct r10bio *r10_bio;
 
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 
 	r10_bio->master_bio = bio;
 	r10_bio->sectors = sectors;
 
 	r10_bio->mddev = mddev;
@@ -1729,11 +1740,11 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 		last_stripe_index *= geo->far_copies;
 	end_disk_offset = (bio_end & geo->chunk_mask) +
 				(last_stripe_index << geo->chunk_shift);
 
 retry_discard:
-	r10_bio = mempool_alloc(&conf->r10bio_pool, GFP_NOIO);
+	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
 	r10_bio->mddev = mddev;
 	r10_bio->state = 0;
 	r10_bio->sectors = 0;
 	r10_bio->read_slot = -1;
 	memset(r10_bio->devs, 0, sizeof(r10_bio->devs[0]) * geo->raid_disks);
@@ -3830,11 +3841,11 @@ static int setup_geo(struct geom *geo, struct mddev *mddev, enum geo_type new)
 static void raid10_free_conf(struct r10conf *conf)
 {
 	if (!conf)
 		return;
 
-	mempool_exit(&conf->r10bio_pool);
+	mempool_destroy(conf->r10bio_pool);
 	kfree(conf->mirrors);
 	kfree(conf->mirrors_old);
 	kfree(conf->mirrors_new);
 	safe_put_page(conf->tmppage);
 	bioset_exit(&conf->bio_split);
@@ -3877,13 +3888,12 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	if (!conf->tmppage)
 		goto out;
 
 	conf->geo = geo;
 	conf->copies = copies;
-	err = mempool_init(&conf->r10bio_pool, NR_RAID_BIOS, r10bio_pool_alloc,
-			   rbio_pool_free, conf);
-	if (err)
+	conf->r10bio_pool = create_r10bio_pool(conf->geo.raid_disks);
+	if (!conf->r10bio_pool)
 		goto out;
 
 	err = bioset_init(&conf->bio_split, BIO_POOL_SIZE, 0, 0);
 	if (err)
 		goto out;
@@ -4373,10 +4383,11 @@ static int raid10_start_reshape(struct mddev *mddev)
 	struct geom new;
 	struct r10conf *conf = mddev->private;
 	struct md_rdev *rdev;
 	int spares = 0;
 	int ret;
+	mempool_t *new_pool = NULL;
 
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
 		return -EBUSY;
 
 	if (setup_geo(&new, mddev, geo_start) != conf->copies)
@@ -4409,10 +4420,15 @@ static int raid10_start_reshape(struct mddev *mddev)
 
 	if (spares < mddev->delta_disks)
 		return -EINVAL;
 
 	conf->offset_diff = min_offset_diff;
+	if (mddev->delta_disks > 0) {
+		new_pool = create_r10bio_pool(new.raid_disks);
+		if (!new_pool)
+			return -ENOMEM;
+	}
 	spin_lock_irq(&conf->device_lock);
 	if (conf->mirrors_new) {
 		memcpy(conf->mirrors_new, conf->mirrors,
 		       sizeof(struct raid10_info)*conf->prev.raid_disks);
 		smp_mb();
@@ -4509,10 +4525,14 @@ static int raid10_start_reshape(struct mddev *mddev)
 	mddev->degraded = calc_degraded(conf);
 	spin_unlock_irq(&conf->device_lock);
 	mddev->raid_disks = conf->geo.raid_disks;
 	mddev->reshape_position = conf->reshape_progress;
 	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
+	if (new_pool) {
+		mempool_destroy(conf->r10bio_pool);
+		conf->r10bio_pool = new_pool;
+	}
 
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
@@ -4531,10 +4551,12 @@ static int raid10_start_reshape(struct mddev *mddev)
 	smp_wmb();
 	conf->reshape_progress = MaxSector;
 	conf->reshape_safe = MaxSector;
 	mddev->reshape_position = MaxSector;
 	spin_unlock_irq(&conf->device_lock);
+	if (new_pool)
+		mempool_destroy(new_pool);
 	return ret;
 }
 
 /* Calculate the last device-address that could contain
  * any block from the chunk that includes the array-address 's'
diff --git a/drivers/md/raid10.h b/drivers/md/raid10.h
index ec79d87fb92f..b711626a5db7 100644
--- a/drivers/md/raid10.h
+++ b/drivers/md/raid10.h
@@ -85,11 +85,11 @@ struct r10conf {
 	int			have_replacement; /* There is at least one
 						   * replacement device.
 						   */
 	wait_queue_head_t	wait_barrier;
 
-	mempool_t		r10bio_pool;
+	mempool_t		*r10bio_pool;
 	mempool_t		r10buf_pool;
 	struct page		*tmppage;
 	struct bio_set		bio_split;
 
 	/* When taking over an array from a different personality, we store
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io()
  2026-06-22 12:13 [PATCH v5 0/3] md/raid10: fix r10bio width mismatches across reshape Chen Cheng
  2026-06-22 12:13 ` [PATCH v5 1/3] md: suspend array when sync_action=reshape Chen Cheng
  2026-06-22 12:13 ` [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape Chen Cheng
@ 2026-06-22 12:13 ` Chen Cheng
  2026-06-22 12:29   ` sashiko-bot
  2 siblings, 1 reply; 7+ messages in thread
From: Chen Cheng @ 2026-06-22 12:13 UTC (permalink / raw)
  To: linux-raid, yukuai, yukuai; +Cc: chencheng, linux-kernel

From: Chen Cheng <chencheng@fnnas.com>

origin flow:

      bio_endio(master_bio);   /* may drop active_io to zero */
      allow_barrier(conf);
      free_r10bio(r10_bio);    /* reads conf->geo, returns to pool */

one scenario is:

  CPU A (softirq, raid_end_bio_io)         CPU B (action_store) --> reshape
  ================================         ===============================
  bio_endio(master_bio)
    md_end_clone_io
      percpu_ref_put -> 0
                                           wait_event wakeup, and,
                                           	mddev_suspend return
                                           raid10_start_reshape:
                                             setup_geo(&conf->geo, new)
                                             ...
                                             mempool_destroy(old_pool)
                                             conf->r10bio_pool = new_pool
  allow_barrier(conf)
  free_r10bio(r10_bio)
    put_all_bios:
      for (i=0; i<conf->geo.raid_disks; i++)
          ==> old obj, new geo, OOB
    mempool_free(r10_bio, conf->r10bio_pool)
          ==> old-geometry obj freed into new pool

so .. fix by reorder the flow:

	free_r10bio(r10_bio)
	allow_barrier(conf)
	bio_endio(master_io)

Signed-off-by: Chen Cheng <chencheng@fnnas.com>
---
 drivers/md/raid10.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d740744a9746..a4642c903b20 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -330,24 +330,27 @@ static void reschedule_retry(struct r10bio *r10_bio)
  */
 static void raid_end_bio_io(struct r10bio *r10_bio)
 {
 	struct bio *bio = r10_bio->master_bio;
 	struct r10conf *conf = r10_bio->mddev->private;
+	bool returned = test_and_set_bit(R10BIO_Returned, &r10_bio->state);
+	blk_status_t status = test_bit(R10BIO_Uptodate, &r10_bio->state)
+				? BLK_STS_OK : BLK_STS_IOERR;
 
-	if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
-		if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
-			bio->bi_status = BLK_STS_IOERR;
-		bio_endio(bio);
-	}
+	put_all_bios(conf, r10_bio);
+	mempool_free(r10_bio, conf->r10bio_pool);
 
 	/*
 	 * Wake up any possible resync thread that waits for the device
 	 * to go idle.
 	 */
 	allow_barrier(conf);
 
-	free_r10bio(r10_bio);
+	if (!returned) {
+		bio->bi_status = status;
+		bio_endio(bio);
+	}
 }
 
 /*
  * Update disk head position estimator based on IRQ completion info.
  */
-- 
2.54.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 1/3] md: suspend array when sync_action=reshape
  2026-06-22 12:13 ` [PATCH v5 1/3] md: suspend array when sync_action=reshape Chen Cheng
@ 2026-06-22 12:25   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-22 12:25 UTC (permalink / raw)
  To: Chen Cheng; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [High] Unmatched mddev_resume() on mddev_lock() failure causes mddev->suspended underflow.
- [High] The retry loop bypasses array re-suspension but leaves the suspended flag set, causing unprotected reshape and double-resume.
- [High] Suspending the array before calling flush_work() creates a circular memory reclaim deadlock.

Pre-existing issues:
- [High] Cross-task corruption of PF_MEMALLOC_NOIO flags when mddev_suspend() and mddev_resume() are interleaved.
--

commit ec573079e939aa92bfb3fd89247846ae57883ef2
Author: Chen Cheng <chencheng@fnnas.com>

md: suspend array when sync_action=reshape

This patch suspends the md array before starting a reshape operation to
prevent new requests from allocating objects from the old pool during
the transition. The array is then resumed on exit.

> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 096bb64e87bd5..e139f36e30b92 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5258,26 +5258,37 @@ static ssize_t
>  action_store(struct mddev *mddev, const char *page, size_t len)
>  {
>  	int ret;
> +	bool suspended = false;
>  	enum sync_action action;
>  
>  	if (!mddev->pers || !mddev->pers->sync_request)
>  		return -EINVAL;
>  
> +	action = md_sync_action_by_name(page);
> +	if (action == ACTION_RESHAPE) {
> +		ret = mddev_suspend(mddev, true);

[Severity: High]
This is a pre-existing issue, but does calling mddev_suspend() and
mddev_resume() from concurrent contexts risk corrupting the PF_MEMALLOC_NOIO
task flags?

When mddev_suspend() is called, it stores the task-specific NOIO flag into
the global mddev structure:

drivers/md/md.c:mddev_suspend() {
    ...
    mddev->noio_flag = memalloc_noio_save();
    ...
}

If two tasks concurrently enter mddev_suspend(), the second task increments
the suspend counter without overwriting mddev->noio_flag. If the first task
then resumes (e.g., on a lock failure) it decrements the counter but returns 
early because the array is still suspended:

drivers/md/md.c:__mddev_resume() {
    if (mddev->suspended) {
        mutex_unlock(&mddev->suspend_mutex);
        return;
    }
    ...
    memalloc_noio_restore(mddev->noio_flag);
}

This prevents the first task from restoring its NOIO flag. When the second
task later resumes and drops the counter to 0, it incorrectly restores the
first task's flags onto its own context. Could this leave the first task
stuck in NOIO mode indefinitely?

> +		if (ret)
> +			return ret;
> +		suspended = true;
> +	}
>  retry:
>  	if (work_busy(&mddev->sync_work))
>  		flush_work(&mddev->sync_work);

[Severity: High]
Can holding the array suspended while synchronously waiting for the worker
thread expose the worker's memory allocations to a circular deadlock?

If md_start_sync() allocates memory with GFP_KERNEL in the kworker thread
(for example, when calling md_register_thread()), it can trigger direct
memory reclaim. Because the worker thread did not transition the array to
suspended itself, it does not inherit the PF_MEMALLOC_NOIO flag.

Consequently, memory reclaim can issue I/O to the md device. Since the array
is now suspended, md_handle_request() will block this I/O indefinitely. The
worker thread hangs waiting for I/O, and action_store() hangs here in
flush_work() waiting for the worker thread, permanently deadlocking the
array. Is it safe to suspend the array before flushing the work?

>  
>  	ret = mddev_lock(mddev);
> -	if (ret)
> +	if (ret) {
> +		mddev_resume(mddev);
>  		return ret;
> +	}

[Severity: High]
Does this unconditional mddev_resume() risk underflowing the mddev->suspended
counter if the array was not actually suspended?

If action_store() is called with an action other than ACTION_RESHAPE, the
array is not suspended. If the subsequent mddev_lock() fails (e.g., if it is
interrupted by a signal), the error path unconditionally calls mddev_resume().

Since the array was not suspended, __mddev_resume() decrements
mddev->suspended from 0 to -1. Because mddev->suspended is a signed integer
and -1 evaluates to true, subsequent calls to mddev_suspend() will see
mddev->suspended != 0, increment it back to 0, and return immediately without
actually pausing I/O. Could this permanently break the suspension mechanism
for the array?

>  
>  	if (work_busy(&mddev->sync_work)) {
> -		mddev_unlock(mddev);
> +		if (suspended)
> +			mddev_unlock_and_resume(mddev);
> +		else
> +			mddev_unlock(mddev);
>  		goto retry;
>  	}

[Severity: High]
Will the retry loop bypass array re-suspension while leaving the local
suspended flag set to true?

If work_busy(&mddev->sync_work) is true, the code unlocks and resumes the
array, then jumps back to the retry label. However, the retry label is placed
after the mddev_suspend() block. 

The code will not re-suspend the array, but the local 'suspended' boolean
remains true. As a result, the reshape operation proceeds on an active,
unsuspended array. Furthermore, when the function exits, it will call
mddev_unlock_and_resume() again because 'suspended' is true, leading to a
double-resume and another mddev->suspended underflow.

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622121312.1775322-1-chencheng@fnnas.com?part=1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io()
  2026-06-22 12:13 ` [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io() Chen Cheng
@ 2026-06-22 12:29   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-22 12:29 UTC (permalink / raw)
  To: Chen Cheng; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

Pre-existing issues:
- [High] Incomplete fix: identical Use-After-Free and mempool corruption in `raid_end_discard_bio`.
- [High] Memory leak of `r10_bio` on `REQ_NOWAIT` write requests during cluster resync or reshape.
--

commit 088539a2820315440079f31df4aa792c8a6bc547
Author: Chen Cheng <chencheng@fnnas.com>

md/raid10: free r10bio before ending master_bio in raid_end_bio_io()

This commit reorders the teardown sequence in raid_end_bio_io() to free
r10_bio before calling bio_endio(). This prevents a race condition where
a concurrent array reshape can alter the geometry and destroy the mempool
after the master bio completes but before the r10_bio is freed.

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index d740744a97461..a4642c903b202 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -332,12 +332,12 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
>  {
>  	struct bio *bio = r10_bio->master_bio;
>  	struct r10conf *conf = r10_bio->mddev->private;
> +	bool returned = test_and_set_bit(R10BIO_Returned, &r10_bio->state);
> +	blk_status_t status = test_bit(R10BIO_Uptodate, &r10_bio->state)
> +				? BLK_STS_OK : BLK_STS_IOERR;
>  
> -	if (!test_and_set_bit(R10BIO_Returned, &r10_bio->state)) {
> -		if (!test_bit(R10BIO_Uptodate, &r10_bio->state))
> -			bio->bi_status = BLK_STS_IOERR;
> -		bio_endio(bio);
> -	}
> +	put_all_bios(conf, r10_bio);
> +	mempool_free(r10_bio, conf->r10bio_pool);
>  
>  	/*
>  	 * Wake up any possible resync thread that waits for the device
> @@ -345,7 +345,10 @@ static void raid_end_bio_io(struct r10bio *r10_bio)
>  	 */
>  	allow_barrier(conf);
>  
> -	free_r10bio(r10_bio);
> +	if (!returned) {
> +		bio->bi_status = status;
> +		bio_endio(bio);
> +	}
>  }
>  

[Severity: High]
This is a pre-existing issue, but the exact same race condition appears to
exist in the discard completion path.

In drivers/md/raid10.c:raid_end_discard_bio():

	} else {
		md_write_end(r10bio->mddev);
		bio_endio(r10bio->master_bio);
		free_r10bio(r10bio);
		break;
	}

Since bio_endio(r10bio->master_bio) drops active_io to zero, couldn't this
also wake up mddev_suspend() and allow a concurrent reshape to modify
conf->geo and destroy conf->r10bio_pool before free_r10bio() is called?


[Severity: High]
This is a pre-existing issue, but does raid10_write_request() leak the r10_bio
on REQ_NOWAIT requests?

In drivers/md/raid10.c:__make_request():

	r10_bio = mempool_alloc(conf->r10bio_pool, GFP_NOIO);
...
	if (bio_data_dir(bio) == READ)
		raid10_read_request(mddev, bio, r10_bio);
	else
		raid10_write_request(mddev, bio, r10_bio);

Then in raid10_write_request() during clustered resync:

	if ((mddev_is_clustered(mddev) &&
	     mddev->cluster_ops->area_resyncing(mddev, WRITE,
						bio->bi_iter.bi_sector,
						bio_end_sector(bio)))) {
		DEFINE_WAIT(w);
		/* Bail out if REQ_NOWAIT is set for the bio */
		if (bio->bi_opf & REQ_NOWAIT) {
			bio_wouldblock_error(bio);
			return;
		}

And later during reshape:

		if (bio->bi_opf & REQ_NOWAIT) {
			allow_barrier(conf);
			bio_wouldblock_error(bio);
			return;
		}

Since these paths return immediately, they leave the r10_bio allocated. Could
repeated REQ_NOWAIT rejections eventually exhaust conf->r10bio_pool, causing
subsequent allocations in __make_request() to hang indefinitely?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622121312.1775322-1-chencheng@fnnas.com?part=3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape
  2026-06-22 12:13 ` [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape Chen Cheng
@ 2026-06-22 12:42   ` sashiko-bot
  0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-22 12:42 UTC (permalink / raw)
  To: Chen Cheng; +Cc: yukuai, linux-raid

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Critical] Replacing `conf->r10bio_pool` during `raid10_start_reshape()` races with discard I/O completion, leading to mempool Use-After-Free and heap out-of-bounds writes.
--

commit 1d797159800c9f37dc1886dd00e8c5968da7d731
Author: Chen Cheng <chencheng@fnnas.com>
Subject: md/raid10: resize r10bio_pool for reshape

This commit updates raid10_start_reshape() to allocate a new, larger
r10bio_pool when the number of raid disks changes. The old pool is then
destroyed and replaced with the new one.

> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4511,6 +4527,10 @@ static int raid10_start_reshape(struct mddev *mddev)
>  	mddev->raid_disks = conf->geo.raid_disks;
>  	mddev->reshape_position = conf->reshape_progress;
>  	set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
> +	if (new_pool) {
> +		mempool_destroy(conf->r10bio_pool);
> +		conf->r10bio_pool = new_pool;
> +	}

[Severity: Critical]
Does this introduce a race condition with discard I/O completion?

In the discard completion path, raid_end_discard_bio() calls bio_endio()
on the master bio before freeing the r10bio:

raid_end_discard_bio() {
        ...
        bio_endio(r10bio->master_bio);
        ...
        free_r10bio(r10bio);
}

Calling bio_endio() invokes md_end_clone_io(), which decrements the
mddev->active_io refcount. If this drops to zero, mddev_suspend() can return
while the thread is still executing raid_end_discard_bio().

If that happens, raid10_start_reshape() can proceed to destroy
conf->r10bio_pool and assign new_pool here, while free_r10bio() is still
about to use it.

Could this lead to free_r10bio() releasing the object into a destroyed
mempool, or freeing a smaller r10_bio object into the newly allocated pool
designed for larger objects (which could cause a heap out-of-bounds write
later when __make_request() clears the devs array)?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260622121312.1775322-1-chencheng@fnnas.com?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-22 12:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-22 12:13 [PATCH v5 0/3] md/raid10: fix r10bio width mismatches across reshape Chen Cheng
2026-06-22 12:13 ` [PATCH v5 1/3] md: suspend array when sync_action=reshape Chen Cheng
2026-06-22 12:25   ` sashiko-bot
2026-06-22 12:13 ` [PATCH v5 2/3] md/raid10: resize r10bio_pool for reshape Chen Cheng
2026-06-22 12:42   ` sashiko-bot
2026-06-22 12:13 ` [PATCH v5 3/3] md/raid10: free r10bio before ending master_bio in raid_end_bio_io() Chen Cheng
2026-06-22 12:29   ` sashiko-bot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.