Linux RAID subsystem development
 help / color / mirror / Atom feed
From: Hiroshi Nishida <nishidafmly@gmail.com>
To: Song Liu <song@kernel.org>, Yu Kuai <yukuai@fygo.io>
Cc: Li Nan <magiclinan@didiglobal.com>, Xiao Ni <xiao@kernel.org>,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	Hiroshi Nishida <nishidafmly@gmail.com>
Subject: [PATCH 4/8] md/raid5: raise NR_STRIPE_HASH_LOCKS from 8 to 32
Date: Wed, 24 Jun 2026 08:54:48 -0700	[thread overview]
Message-ID: <20260624155452.211646-5-nishidafmly@gmail.com> (raw)
In-Reply-To: <20260624155452.211646-1-nishidafmly@gmail.com>

The stripe cache hash is striped across NR_STRIPE_HASH_LOCKS spinlocks
(see stripe_hash_locks_hash()).  The value has been 8 since the per-hash
locking was introduced, which is small for modern many-core servers:
with only 8 buckets, stripe cache lookup and allocation contend on the
same few locks once the CPU count greatly exceeds the bucket count.

Raise it to 32.  Two constraints bound the choice:

  - STRIPE_HASH_LOCKS_MASK is (NR_STRIPE_HASH_LOCKS - 1) and is used as
    a bitmask in stripe_hash_locks_hash(), so the value must be a power
    of two.

  - raid5_quiesce() acquires every hash lock plus device_lock at once
    via lock_all_device_hash_locks_irq().  That holds
    NR_STRIPE_HASH_LOCKS + 1 locks simultaneously, which must stay below
    MAX_LOCK_DEPTH (48) so the held-lock array does not overflow when
    lockdep is enabled.

32 is the largest power of two that satisfies both: 32 + 1 leaves
headroom under 48, whereas 64 would exceed it.  (The pre-existing
"must remain below 64" comment understates this; MAX_LOCK_DEPTH is the
real ceiling.)  STRIPE_HASH_LOCKS_MASK and all NR_STRIPE_HASH_LOCKS-
sized arrays scale automatically.

This is purely a lock-striping change; it does not affect stripe cache
correctness.  The benefit appears on many-core systems, while on small
systems the extra buckets are harmless.

Tested: loop-device RAID-5 create, write/verify, fail a disk, rebuild
onto a spare and scrub all complete cleanly.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Hiroshi Nishida <nishidafmly@gmail.com>
---
 drivers/md/raid5.h | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 57349737d393..3efab71ebef7 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -498,12 +498,14 @@ struct disk_info {
 #define HASH_MASK		(NR_HASH - 1)
 #define MAX_STRIPE_BATCH	8
 
-/* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
- * This is because we sometimes take all the spinlocks
- * and creating that much locking depth can cause
- * problems.
+/* NR_STRIPE_HASH_LOCKS must be a power of two, since
+ * STRIPE_HASH_LOCKS_MASK masks with (NR_STRIPE_HASH_LOCKS - 1).
+ * It must also be small enough that taking all of them at once in
+ * lock_all_device_hash_locks_irq(), plus device_lock, keeps the held
+ * lock count below MAX_LOCK_DEPTH (48) with lockdep enabled.  32 is the
+ * largest power of two that satisfies both constraints.
  */
-#define NR_STRIPE_HASH_LOCKS 8
+#define NR_STRIPE_HASH_LOCKS 32
 #define STRIPE_HASH_LOCKS_MASK (NR_STRIPE_HASH_LOCKS - 1)
 
 struct r5worker {
-- 
2.43.0


  parent reply	other threads:[~2026-06-24 15:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 15:54 [PATCH 0/8] md/raid5: scalability and rebuild-path improvements Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 1/8] md: change chunk_sectors and stripe cache counts to unsigned int Hiroshi Nishida
2026-06-24 16:16   ` sashiko-bot
2026-06-24 17:25     ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 2/8] md/raid5: raise stripe cache limit from 32768 to 262144 Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 3/8] md: widen badblock sectors param from int to sector_t Hiroshi Nishida
2026-06-24 15:54 ` Hiroshi Nishida [this message]
2026-06-24 15:54 ` [PATCH 5/8] md/raid5: submit a window of stripes during resync/recovery Hiroshi Nishida
2026-06-24 16:12   ` sashiko-bot
2026-06-24 17:13     ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 6/8] md/raid5: allocate worker groups per NUMA node Hiroshi Nishida
2026-06-24 16:07   ` sashiko-bot
2026-06-24 16:53     ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 7/8] md/raid5: raise MAX_STRIPE_BATCH from 8 to 32 Hiroshi Nishida
2026-06-24 16:09   ` sashiko-bot
2026-06-24 17:01     ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 8/8] md/raid5: reserve stripe cache for user I/O during rebuild Hiroshi Nishida
2026-06-24 16:12   ` sashiko-bot
2026-06-24 17:25     ` Hiroshi Nishida

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624155452.211646-5-nishidafmly@gmail.com \
    --to=nishidafmly@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=magiclinan@didiglobal.com \
    --cc=song@kernel.org \
    --cc=xiao@kernel.org \
    --cc=yukuai@fygo.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox