From: Hiroshi Nishida <nishidafmly@gmail.com>
To: Song Liu <song@kernel.org>, Yu Kuai <yukuai@fygo.io>
Cc: Li Nan <magiclinan@didiglobal.com>, Xiao Ni <xiao@kernel.org>,
linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
Hiroshi Nishida <nishidafmly@gmail.com>
Subject: [PATCH 4/8] md/raid5: raise NR_STRIPE_HASH_LOCKS from 8 to 32
Date: Wed, 24 Jun 2026 08:54:48 -0700 [thread overview]
Message-ID: <20260624155452.211646-5-nishidafmly@gmail.com> (raw)
In-Reply-To: <20260624155452.211646-1-nishidafmly@gmail.com>
The stripe cache hash is striped across NR_STRIPE_HASH_LOCKS spinlocks
(see stripe_hash_locks_hash()). The value has been 8 since the per-hash
locking was introduced, which is small for modern many-core servers:
with only 8 buckets, stripe cache lookup and allocation contend on the
same few locks once the CPU count greatly exceeds the bucket count.
Raise it to 32. Two constraints bound the choice:
- STRIPE_HASH_LOCKS_MASK is (NR_STRIPE_HASH_LOCKS - 1) and is used as
a bitmask in stripe_hash_locks_hash(), so the value must be a power
of two.
- raid5_quiesce() acquires every hash lock plus device_lock at once
via lock_all_device_hash_locks_irq(). That holds
NR_STRIPE_HASH_LOCKS + 1 locks simultaneously, which must stay below
MAX_LOCK_DEPTH (48) so the held-lock array does not overflow when
lockdep is enabled.
32 is the largest power of two that satisfies both: 32 + 1 leaves
headroom under 48, whereas 64 would exceed it. (The pre-existing
"must remain below 64" comment understates this; MAX_LOCK_DEPTH is the
real ceiling.) STRIPE_HASH_LOCKS_MASK and all NR_STRIPE_HASH_LOCKS-
sized arrays scale automatically.
This is purely a lock-striping change; it does not affect stripe cache
correctness. The benefit appears on many-core systems, while on small
systems the extra buckets are harmless.
Tested: loop-device RAID-5 create, write/verify, fail a disk, rebuild
onto a spare and scrub all complete cleanly.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Hiroshi Nishida <nishidafmly@gmail.com>
---
drivers/md/raid5.h | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 57349737d393..3efab71ebef7 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -498,12 +498,14 @@ struct disk_info {
#define HASH_MASK (NR_HASH - 1)
#define MAX_STRIPE_BATCH 8
-/* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
- * This is because we sometimes take all the spinlocks
- * and creating that much locking depth can cause
- * problems.
+/* NR_STRIPE_HASH_LOCKS must be a power of two, since
+ * STRIPE_HASH_LOCKS_MASK masks with (NR_STRIPE_HASH_LOCKS - 1).
+ * It must also be small enough that taking all of them at once in
+ * lock_all_device_hash_locks_irq(), plus device_lock, keeps the held
+ * lock count below MAX_LOCK_DEPTH (48) with lockdep enabled. 32 is the
+ * largest power of two that satisfies both constraints.
*/
-#define NR_STRIPE_HASH_LOCKS 8
+#define NR_STRIPE_HASH_LOCKS 32
#define STRIPE_HASH_LOCKS_MASK (NR_STRIPE_HASH_LOCKS - 1)
struct r5worker {
--
2.43.0
next prev parent reply other threads:[~2026-06-24 15:55 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 15:54 [PATCH 0/8] md/raid5: scalability and rebuild-path improvements Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 1/8] md: change chunk_sectors and stripe cache counts to unsigned int Hiroshi Nishida
2026-06-24 16:16 ` sashiko-bot
2026-06-24 17:25 ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 2/8] md/raid5: raise stripe cache limit from 32768 to 262144 Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 3/8] md: widen badblock sectors param from int to sector_t Hiroshi Nishida
2026-06-24 15:54 ` Hiroshi Nishida [this message]
2026-06-24 15:54 ` [PATCH 5/8] md/raid5: submit a window of stripes during resync/recovery Hiroshi Nishida
2026-06-24 16:12 ` sashiko-bot
2026-06-24 17:13 ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 6/8] md/raid5: allocate worker groups per NUMA node Hiroshi Nishida
2026-06-24 16:07 ` sashiko-bot
2026-06-24 16:53 ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 7/8] md/raid5: raise MAX_STRIPE_BATCH from 8 to 32 Hiroshi Nishida
2026-06-24 16:09 ` sashiko-bot
2026-06-24 17:01 ` Hiroshi Nishida
2026-06-24 15:54 ` [PATCH 8/8] md/raid5: reserve stripe cache for user I/O during rebuild Hiroshi Nishida
2026-06-24 16:12 ` sashiko-bot
2026-06-24 17:25 ` Hiroshi Nishida
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260624155452.211646-5-nishidafmly@gmail.com \
--to=nishidafmly@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=magiclinan@didiglobal.com \
--cc=song@kernel.org \
--cc=xiao@kernel.org \
--cc=yukuai@fygo.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.