From: Hiroshi Nishida <nishidafmly@gmail.com>
To: Song Liu <song@kernel.org>, Yu Kuai <yukuai@fygo.io>
Cc: Li Nan <magiclinan@didiglobal.com>, Xiao Ni <xiao@kernel.org>,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	Hiroshi Nishida <nishidafmly@gmail.com>
Subject: [PATCH 8/8] md/raid5: reserve stripe cache for user I/O during rebuild
Date: Wed, 24 Jun 2026 08:54:52 -0700
Message-ID: <20260624155452.211646-9-nishidafmly@gmail.com>
In-Reply-To: <20260624155452.211646-1-nishidafmly@gmail.com>

The resync read-ahead window (RAID5_SYNC_WINDOW) can fill the stripe
cache with rebuild stripes and starve concurrent user I/O, producing a
flip-flop in which rebuild and application throughput take turns
starving each other in bursts.

Add two yield points to the window-submission loop:
 - stop the window immediately if any thread is waiting for a stripe
   (waitqueue_active(&conf->wait_for_stripe)); the check is intentionally
   racy -- a waiter appearing just after is serviced by the next
   sync_request call, so no barrier is needed.
 - stop expanding once active_stripes reaches half the cache
   (max_nr_stripes / RAID5_SYNC_HWMARK), but only when
   preread_active_stripes > 0, i.e. user write I/O is actually competing.
   Sync stripes never set STRIPE_PREREAD_ACTIVE, so during a pure rebuild
   the counter stays zero and the window fills freely; rebuild-only
   throughput is unchanged.

This bounds the share of the stripe cache a rebuild may hold while user
I/O is present, so application latency no longer collapses during the
read-ahead bursts, without throttling a rebuild that has the array to
itself.
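
For illustration only, the two yield conditions can be modelled in
userspace as below. This is a sketch, not the kernel code: struct
cache_model and both function names are hypothetical, waiter_present
stands in for waitqueue_active(&conf->wait_for_stripe), and the model
assumes every stripe submitted in a call stays active until the call
returns.

```c
#include <stdbool.h>

#define RAID5_SYNC_WINDOW	32	/* stripes to pre-submit per call */
#define RAID5_SYNC_HWMARK	2	/* threshold is 1/N of the stripe cache */

/* Hypothetical stand-in for the struct r5conf fields the loop reads. */
struct cache_model {
	bool waiter_present;		/* models waitqueue_active() on wait_for_stripe */
	int preread_active_stripes;	/* user write stripes queued for preread */
	int active_stripes;		/* stripes currently in use, user and sync */
	int max_nr_stripes;		/* stripe cache size */
};

/* True when the window loop should stop before taking another stripe. */
static bool sync_window_should_yield(const struct cache_model *c)
{
	/* Yield point 1: someone is already waiting for a stripe. */
	if (c->waiter_present)
		return true;
	/* Yield point 2: user writes compete and the cache is 1/N full. */
	return c->preread_active_stripes > 0 &&
	       c->active_stripes >= c->max_nr_stripes / RAID5_SYNC_HWMARK;
}

/*
 * Number of stripes one modelled sync_request call would submit.
 * The struct is passed by value so the caller's copy is not modified.
 */
static int sync_window_submitted(struct cache_model c)
{
	int submitted;

	for (submitted = 0; submitted < RAID5_SYNC_WINDOW; submitted++) {
		if (sync_window_should_yield(&c))
			break;
		c.active_stripes++;	/* the new sync stripe is now active */
	}
	return submitted;
}
```

Under these assumptions, with a 256-stripe cache and 120 stripes already
active, a pure rebuild (preread_active_stripes == 0) submits the full
32-stripe window, while the same state with pending user writes stops
after 8 stripes, when active_stripes reaches 128. A pending waiter stops
the window before the first stripe.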

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Hiroshi Nishida <nishidafmly@gmail.com>
---
 drivers/md/raid5.c | 21 +++++++++++++++++++++
 drivers/md/raid5.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ad6230415af3..480f3aa069ef 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6656,6 +6656,27 @@ static inline sector_t raid5_sync_request(struct mddev *mddev, sector_t sector_n
 	     submitted < RAID5_SYNC_WINDOW && win_sector < max_sector &&
 	     win_sector < mddev->resync_max;
 	     submitted++, win_sector += RAID5_STRIPE_SECTORS(conf)) {
+		/*
+		 * Yield to user I/O: stop the read-ahead if anyone is waiting
+		 * for a stripe.  The check is intentionally racy -- a waiter
+		 * appearing just after is serviced by the next sync_request
+		 * call, so no barrier is needed.
+		 */
+		if (waitqueue_active(&conf->wait_for_stripe))
+			break;
+		/*
+		 * Reserve cache for user I/O only when it is actually competing.
+		 * preread_active_stripes counts stripes queued for write I/O
+		 * (including the read phase of RMW); sync stripes never set
+		 * STRIPE_PREREAD_ACTIVE, so during a pure rebuild it stays zero
+		 * and the window fills freely.  Competing user reads do not bump
+		 * the counter but are caught by the waitqueue_active() check
+		 * above.
+		 */
+		if (atomic_read(&conf->preread_active_stripes) > 0 &&
+		    atomic_read(&conf->active_stripes) >=
+		    conf->max_nr_stripes / RAID5_SYNC_HWMARK)
+			break;
 		sh = raid5_get_active_stripe(conf, NULL, win_sector,
 					     R5_GAS_NOBLOCK);
 		if (!sh)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 1f37dabd727b..7833cc07597f 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -499,6 +499,7 @@ struct disk_info {
 #define MAX_STRIPE_BATCH	32	/* stripes per handle_active_stripes pass */
 #define STRIPE_BATCH_WORKERS	8	/* stripes-per-worker threshold for spawning */
 #define RAID5_SYNC_WINDOW	32	/* stripes to pre-submit per sync_request call */
+#define RAID5_SYNC_HWMARK	2	/* sync window yields at 1/N of stripe cache under user writes */
 
 /* NR_STRIPE_HASH_LOCKS must be a power of two, since
  * STRIPE_HASH_LOCKS_MASK masks with (NR_STRIPE_HASH_LOCKS - 1).
-- 
2.43.0

