[PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock

* [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
@ 2026-06-05 16:53 Breno Leitao
  2026-06-09 10:46 ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: Breno Leitao @ 2026-06-05 16:53 UTC
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Darren Hart,
	Davidlohr Bueso, André Almeida
  Cc: linux-kernel, puranjay, rmikey, stuclar, namhyung, kernel-team,
	Breno Leitao

struct futex_hash_bucket packs (atomic_t waiters, spinlock_t lock,
struct plist_head chain, struct futex_private_hash *priv) into a
single ____cacheline_aligned_in_smp 64-byte block. Three distinct
access patterns hit that line:

  1. Lockless atomic_read(&hb->waiters) via futex_hb_waiters_pending()
     on the fast path before taking the lock.
  2. spin_lock(&hb->lock) contenders writing the lock word.
  3. The lock holder modifying chain.{next,prev} on every futex_wake,
     futex_q_unlock, plist_add, __futex_unqueue.

This was first noticed on a Meta cache (ucache) production workload:
perf c2c on a busy 176-core AMD EPYC 9D64 ranked this exact cacheline as
the #1 HITM source: 129 Local + 31 Remote HITM, hit by 156 distinct
CPUs in a second.

The contention is not specific to that workload, though. Our very own
"perf bench futex" hash exercises the same buckets and shows the same
false sharing, so the rest of this changelog quantifies the fix with
perf bench futex.

Move chain to its own cacheline so:
  - Lockless waiters_pending() readers no longer invalidate the line
    that lock contenders are spinning to acquire.
  - Cross-CCD lock handoffs ship only the (waiters, lock) line; the
    next holder reads chain from its own L2/L3 instead of fetching
    chain entries together with the lock byte.
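To make the offsets concrete, here is a minimal userspace sketch that mirrors both layouts. It assumes x86-64, 64-byte cache lines, a 4-byte spinlock_t (qspinlock with no lockdep/debug options) and a 16-byte plist_head. The stand-in types are hypothetical mock-ups, not the kernel definitions; the static asserts simply check that the mock reproduces the offsets reported by perf c2c in this thread (lock at 0x4, chain.{next,prev} at 0x8/0x10, priv at 0x18).

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64 /* assumption: 64-byte lines, as on x86-64 */

/* Hypothetical userspace stand-ins for the kernel types (x86-64, no lockdep). */
struct list_head { struct list_head *next, *prev; };
struct plist_head { struct list_head node_list; };
typedef struct { int counter; } atomic_t;
typedef struct { unsigned int val; } spinlock_t; /* qspinlock: one 32-bit word */
struct futex_private_hash;                        /* opaque; only a pointer is stored */

/* Baseline: all four fields share one cacheline. */
struct hb_baseline {
	atomic_t waiters;
	spinlock_t lock;
	struct plist_head chain;
	struct futex_private_hash *priv;
} __attribute__((aligned(CACHELINE)));

/* RFC patch: chain starts a second cacheline, and priv follows it there. */
struct hb_patched {
	atomic_t waiters;
	spinlock_t lock;
	struct plist_head chain __attribute__((aligned(CACHELINE)));
	struct futex_private_hash *priv;
} __attribute__((aligned(CACHELINE)));

/* Which cacheline (0, 1, ...) a byte offset falls into. */
static inline size_t line_of(size_t offset)
{
	return offset / CACHELINE;
}

/* Compile-time checks against the offsets quoted from perf c2c. */
_Static_assert(offsetof(struct hb_baseline, lock) == 0x4, "lock at 0x4");
_Static_assert(offsetof(struct hb_baseline, chain) == 0x8, "chain.next at 0x8");
_Static_assert(offsetof(struct hb_baseline, priv) == 0x18, "priv at 0x18");
_Static_assert(sizeof(struct hb_baseline) == 64, "one line per bucket");
_Static_assert(sizeof(struct hb_patched) == 128, "two lines per bucket");
```

Note that in the patched mock, priv (offset 0x50) moves to the second line together with chain, and the bucket grows from 64 to 128 bytes.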

This improves "perf bench futex hash" on a 176-core AMD EPYC 9D64 by
15%:

                   baseline    +fix       delta
  average      1,394,938   1,616,781    +15.9 %
  median       1,430,012   1,617,072    +13.1 %
  min          1,214,488   1,501,741    +23.7 %
  max          1,488,167   1,730,734    +16.3 %

The distributions do not overlap: the slowest +fix run (1.50 M) is
faster than even the fastest baseline run (1.49 M).

This improves wakeup latency as well:

perf bench futex wake -s (broadcast wakeup latency, lower is better):
  baseline:   0.300 / 0.329 / 0.266 ms   (avg 0.298)
  +fix:       0.292 / 0.253 / 0.270 ms   (avg 0.272, -9 %)

Cost: one extra cacheline (56 B padding) per bucket. Would it be
acceptable?

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 kernel/futex/futex.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 79ef2c709c81..4981dcf465a9 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -142,7 +142,16 @@ static inline bool should_fail_futex(bool fshared)
 struct futex_hash_bucket {
 	atomic_t waiters;
 	spinlock_t lock;
-	struct plist_head chain;
+	/*
+	 * Keep the plist_head chain on its own cacheline. Lockless
+	 * futex_hb_waiters_pending() readers and lock contenders touch
+	 * the (waiters, lock) line; the lock holder modifies chain on
+	 * every wake/queue. perf c2c on a busy 176-core AMD host showed
+	 * this bucket cacheline as the #1 HITM source (129 Lcl + 31 Rmt
+	 * in 5s), hit by 156 distinct CPUs at offset 0x4 (lock) and
+	 * 0x8/0x10 (chain.{next,prev}).
+	 */
+	struct plist_head chain ____cacheline_aligned_in_smp;
 	struct futex_private_hash *priv;
 } ____cacheline_aligned_in_smp;
 

---
base-commit: b99ae45861eccff1e1d8c7b05a13650be805d437
change-id: 20260605-futex-c5478d627985

Best regards,
-- 
Breno Leitao <leitao@debian.org>



* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-05 16:53 [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock Breno Leitao
@ 2026-06-09 10:46 ` Peter Zijlstra
  2026-06-09 15:28   ` Breno Leitao
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-09 10:46 UTC
  To: Breno Leitao
  Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Fri, Jun 05, 2026 at 09:53:12AM -0700, Breno Leitao wrote:
> struct futex_hash_bucket packs (atomic_t waiters, spinlock_t lock,
> struct plist_head chain, struct futex_private_hash *priv) into a
> single ____cacheline_aligned_in_smp 64-byte block. Three distinct
> access patterns hit that line:
> 
>   1. Lockless atomic_read(&hb->waiters) via futex_hb_waiters_pending()
>      on the fast path before taking the lock.
>   2. spin_lock(&hb->lock) contenders writing the lock word.
>   3. The lock holder modifying chain.{next,prev} on every futex_wake,
>      futex_q_unlock, plist_add, __futex_unqueue.
> 
> This was first noticed on a Meta cache (ucache) production workload:
> perf c2c on a busy 176-core AMD EPYC 9D64 ranked this exact cacheline as
> the #1 HITM source: 129 Local + 31 Remote HITM, hit by 156 distinct
> CPUs in a second.
> 
> The contention is not specific to that workload, though. Our very own
> "perf bench futex" hash exercises the same buckets and shows the same
> false sharing, so the rest of this changelog quantifies the fix with
> perf bench futex.

So I can't see this. After 'fixing' the benchmark to run with a fixed
number of buckets (see below), a perf c2c record shows the
futex_hash_bucket::priv load to be the 'expensive' one (when doing perf
report on that, rather than perf c2c report, because the latter is
total garbage).

> Move chain to its own cacheline so:
>   - Lockless waiters_pending() readers no longer invalidate the line
>     that lock contenders are spinning to acquire.
>   - Cross-CCD lock handoffs ship only the (waiters, lock) line; the
>     next holder reads chain from its own L2/L3 instead of fetching
>     chain entries together with the lock byte.
> 
> This improves "perf bench futex hash" on a 176-core AMD EPYC 9D64 by
> 15%:
> 
>                    baseline    +fix       delta
>   average      1,394,938   1,616,781    +15.9 %
>   median       1,430,012   1,617,072    +13.1 %
>   min          1,214,488   1,501,741    +23.7 %
>   max          1,488,167   1,730,734    +16.3 %
> 
> The distributions do not overlap: the slowest +fix run (1.50 M) is
> faster than even the fastest baseline run (1.49 M).

When I run: "perf bench futex hash", I do see massive contention, but
not on the line you mention. Instead we hammer mm->futex.phash.atomic in
futex_ref_{get,put}().

These are the atomic_long_inc_not_zero() / atomic_long_dec_and_test().

The reason this happens is unfortunate: you would want this thing to hit
the PERCPU fast-path, but due to the per-thread auto scaling, the
benchmark startup phase allocates a small (2 thread) hash, then a bigger
one, and a bigger one again, for each new thread that comes in.

Because there is a pending new hash, we drop to ATOMIC mode, such that we
can actually observe the 0 references.

However, because the benchmark is in fact hammering the buckets (per
design), it will never actually hit 0 references and swap in the larger
hash.
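The starvation described above can be illustrated with a toy, single-threaded model. Everything here (struct toy_ref, toy_get()/toy_put(), the interleaved schedule) is hypothetical and far simpler than the kernel's futex_ref_get()/futex_ref_put(); it only shows why a replacement that waits for the count to reach zero never fires while at least one reference is always held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcount in "ATOMIC mode": a pending replacement
 * hash may only be installed once the count drops to zero. */
struct toy_ref {
	long count;
	bool replace_pending;
	int swaps; /* how many times the replacement was actually installed */
};

static void toy_get(struct toy_ref *r)
{
	r->count++;
}

static void toy_put(struct toy_ref *r)
{
	if (--r->count == 0 && r->replace_pending) {
		r->replace_pending = false; /* new, larger hash goes live */
		r->swaps++;
	}
}

/* Two "threads" whose critical sections always overlap: each re-acquires
 * before the other releases, so the count bottoms out at 1, never 0.
 * Thread A's last reference is left held; the caller drops it when the
 * "benchmark" stops. */
static void run_overlapping(struct toy_ref *r, int rounds)
{
	toy_get(r); /* thread A enters first */
	for (int i = 0; i < rounds; i++) {
		toy_get(r); /* thread B enters while A still holds */
		toy_put(r); /* thread A leaves */
		toy_get(r); /* thread A re-enters while B still holds */
		toy_put(r); /* thread B leaves */
	}
}
```

In this model the swap happens only after the load stops entirely; pinning the bucket count up front avoids the resize, so the pending-replacement state never arises.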

If one were to specify an explicit number of buckets, the benchmark
will function correctly:

  				       v7.1-rc7	+patch

  perf bench futex hash			 192479  195523  +1.5%
  perf bench futex hash -b 256		3453734 3987880 +15.5%

And then I do see the improvement from your patch, but I really cannot
make sense of your reasoning for it.

> Cost: one extra cacheline (56 B padding) per bucket. Would it be
> acceptable?

I'm really not sure, it *doubles* the futex memory cost.
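For scale, a rough worked estimate of the global-hash cost. It assumes the classic sizing rule of roundup_pow_of_two(256 * num_possible_cpus()) buckets on a single NUMA node, num_possible_cpus() == 176 for the 88C/176T EPYC box, and 64 bytes per bucket today versus 128 bytes with one extra cacheline. Per-process private hashes are sized separately and are not counted here; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two >= x (x > 0), i.e. what roundup_pow_of_two() yields. */
static size_t roundup_pow2(size_t x)
{
	size_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/* Assumed sizing rule: 256 buckets per possible CPU, rounded up to a power
 * of two, single node. Returns the table size in bytes. */
static size_t global_hash_bytes(size_t possible_cpus, size_t bucket_bytes)
{
	return roundup_pow2(256 * possible_cpus) * bucket_bytes;
}
```

Under these assumptions, 256 * 176 = 45,056 rounds up to 65,536 buckets, so the global hash grows from 4 MiB to 8 MiB on that machine.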


* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 10:46 ` Peter Zijlstra
@ 2026-06-09 15:28   ` Breno Leitao
  2026-06-09 20:11     ` Peter Zijlstra
  2026-06-09 20:16     ` Thomas Gleixner
  0 siblings, 2 replies; 9+ messages in thread
From: Breno Leitao @ 2026-06-09 15:28 UTC
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

Hello Peter,

On Tue, Jun 09, 2026 at 12:46:03PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 05, 2026 at 09:53:12AM -0700, Breno Leitao wrote:
> > struct futex_hash_bucket packs (atomic_t waiters, spinlock_t lock,
> > struct plist_head chain, struct futex_private_hash *priv) into a
> > single ____cacheline_aligned_in_smp 64-byte block. Three distinct
> > access patterns hit that line:
> > 
> >   1. Lockless atomic_read(&hb->waiters) via futex_hb_waiters_pending()
> >      on the fast path before taking the lock.
> >   2. spin_lock(&hb->lock) contenders writing the lock word.
> >   3. The lock holder modifying chain.{next,prev} on every futex_wake,
> >      futex_q_unlock, plist_add, __futex_unqueue.
> > 
> > This was first noticed on a Meta cache (ucache) production workload:
> > perf c2c on a busy 176-core AMD EPYC 9D64 ranked this exact cacheline as
> > the #1 HITM source: 129 Local + 31 Remote HITM, hit by 156 distinct
> > CPUs in a second.
> > 
> > The contention is not specific to that workload, though. Our very own
> > "perf bench futex" hash exercises the same buckets and shows the same
> > false sharing, so the rest of this changelog quantifies the fix with
> > perf bench futex.
> 
> So I can't see this. After 'fixing' the benchmark to run with a fixed
> number of buckets (see below), a perf c2c record shows the
> futex_hash_bucket::priv load to be the 'expensive' one (when doing perf
> report on that, rather than perf c2c report, because the latter is
> total garbage).

I can confirm this with both tools. Keep in mind that I am using a
multi-CCD CPU (AMD EPYC 9D64).

I ran perf c2c record on an EPYC 9D64 (88C/176T, 1 socket, multi-CCD)
under `perf bench futex hash -b 256`, on baseline and patched. Top hot
kernel HITM lines:

  Baseline (b99ae45861ec):
    offset 0x00  futex_q_lock                  core.c:865   (waiters)
    offset 0x04  queued_spin_lock_slowpath     qspinlock.c  (lock)
    offset 0x04  _raw_spin_lock                atomic.h     (lock)
    offset 0x18  futex_hash                    core.c:312   (priv)

  + With this patch:
    offset 0x00  futex_q_lock                  core.c:865   (waiters)
    offset 0x04  queued_spin_lock_slowpath     qspinlock.c  (lock)
    offset 0x04  _raw_spin_lock                atomic.h     (lock)
    [no priv entry on this cacheline]

The `futex_hash` entry is literally the lockless `fph = hb->priv`
read. On baseline it sits on the lock cacheline at offset 0x18 and is
a top HITM source, exactly what you saw. With this patch that entry is
gone from the lock cacheline.

What remains at offsets 0x00 and 0x04 is intrinsic lock contention
(waiters_pending fast path + queued spinlock hand-off); the patch can't reduce
that without changing the lock itself.

Throughput on the same run:

  baseline   : 1,267,863 ops/sec
  +This patch: 1,460,971 ops/sec

> > Move chain to its own cacheline so:
> >   - Lockless waiters_pending() readers no longer invalidate the line
> >     that lock contenders are spinning to acquire.
> >   - Cross-CCD lock handoffs ship only the (waiters, lock) line; the
> >     next holder reads chain from its own L2/L3 instead of fetching
> >     chain entries together with the lock byte.
> > 
> > This improves "perf bench futex hash" on a 176-core AMD EPYC 9D64 by
> > 15%:
> > 
> >                    baseline    +fix       delta
> >   average      1,394,938   1,616,781    +15.9 %
> >   median       1,430,012   1,617,072    +13.1 %
> >   min          1,214,488   1,501,741    +23.7 %
> >   max          1,488,167   1,730,734    +16.3 %
> > 
> > The distributions do not overlap: the slowest +fix run (1.50 M) is
> > faster than even the fastest baseline run (1.49 M).
> 
> When I run: "perf bench futex hash", I do see massive contention, but
> not on the line you mention. Instead we hammer mm->futex.phash.atomic in
> futex_ref_{get,put}().
> 
> These are the atomic_long_inc_not_zero() / atomic_long_dec_and_test().
> 
> The reason this happens is unfortunate: you would want this thing to hit
> the PERCPU fast-path, but due to the per-thread auto scaling, the
> benchmark startup phase allocates a small (2 thread) hash, then a bigger
> one, and a bigger one again, for each new thread that comes in.
> 
> Because there is a pending new hash, we drop to ATOMIC mode, such that we
> can actually observe the 0 references.
> 
> However, because the benchmark is in fact hammering the buckets (per
> design), it will never actually hit 0 references and swap in the larger
> hash.

Ack. The auto-scaling pathology you described reproduces here too.

> If one were to specify an explicit number of buckets, the benchmark
> will function correctly:
> 
>   				       v7.1-rc7	+patch
> 
>   perf bench futex hash			 192479  195523  +1.5%
>   perf bench futex hash -b 256		3453734 3987880 +15.5%
> 
> And then I do see the improvement from your patch, but I really cannot
> make sense of your reasoning for it.

So, let me rephrase it. The bucket cacheline takes hits from four access
patterns - the three I listed (waiters_pending readers, lock spinners,
lock-holder chain writes) plus the lockless `fph = hb->priv` load on the
futex_hash() fast path, which is what c2c surfaced. That priv load is the
dominant HITM source on baseline, not the chain writes I emphasized. 

> > Cost: one extra cacheline (56 B padding) per bucket. Would it be
> > acceptable?
> 
> I'm really not sure, it *doubles* the futex memory cost.

I think it's worth the trade. The global hash scales linearly with
num_possible_cpus(), so the extra bytes track the same curve as the machines
that actually need the fix.

In simpler words, a box big enough to feel this contention has plenty of RAM
headroom to absorb it.

Thanks for the review,
--breno


* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 15:28   ` Breno Leitao
@ 2026-06-09 20:11     ` Peter Zijlstra
  2026-06-09 20:18       ` Peter Zijlstra
  2026-06-09 20:16     ` Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-09 20:11 UTC
  To: Breno Leitao
  Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Tue, Jun 09, 2026 at 08:28:16AM -0700, Breno Leitao wrote:

> > I'm really not sure, it *doubles* the futex memory cost.
> 
> I think it's worth the trade. The global hash scales linearly with
> num_possible_cpus(), so the extra bytes track the same curve as the machines
> that actually need the fix.
> 
> In simpler words, a box big enough to feel this contention has plenty of RAM
> headroom to absorb it.

You might not have heard, but RAM has gotten ludicrously expensive.

Anyway, how does something like the below work for you? It's a total
hack job, but it (sorta) builds and runs.


---
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ff2a4fb2993f..8555c76077af 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -124,7 +124,7 @@ late_initcall(fail_futex_debugfs);
 #endif /* CONFIG_FAIL_FUTEX */
 
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph);
+__futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_private_hash **fph_p);
 
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 static bool futex_ref_get(struct futex_private_hash *fph);
@@ -183,14 +183,6 @@ __futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
 {
 	u32 hash;
 
-	if (!futex_key_is_private(key))
-		return NULL;
-
-	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
-	if (!fph || !fph->hash_mask)
-		return NULL;
-
 	hash = jhash2((void *)&key->private.address,
 		      sizeof(key->private.address) / 4,
 		      key->both.offset);
@@ -211,13 +203,12 @@ static void futex_rehash_private(struct futex_private_hash *old,
 
 		spin_lock(&hb_old->lock);
 		plist_for_each_entry_safe(this, tmp, &hb_old->chain, list) {
-
 			plist_del(&this->list, &hb_old->chain);
 			futex_hb_waiters_dec(hb_old);
 
 			WARN_ON_ONCE(this->lock_ptr != &hb_old->lock);
 
-			hb_new = __futex_hash(&this->key, new);
+			hb_new = __futex_hash(&this->key, new, NULL);
 			futex_hb_waiters_inc(hb_new);
 			/*
 			 * The new pointer isn't published yet but an already
@@ -299,18 +290,17 @@ struct futex_private_hash *futex_private_hash(void)
 	goto again;
 }
 
-struct futex_hash_bucket *futex_hash(union futex_key *key)
+struct futex_bucket_ref futex_hash(union futex_key *key)
 {
-	struct futex_private_hash *fph;
+	struct futex_private_hash *fph = NULL;
 	struct futex_hash_bucket *hb;
 
 again:
 	scoped_guard(rcu) {
-		hb = __futex_hash(key, NULL);
-		fph = hb->priv;
+		hb = __futex_hash(key, NULL, &fph);
 
 		if (!fph || futex_private_hash_get(fph))
-			return hb;
+			return (struct futex_bucket_ref){ .hb = hb, .fph = fph };
 	}
 	futex_pivot_hash(key->private.mm);
 	goto again;
@@ -412,17 +402,19 @@ static int futex_mpol(struct mm_struct *mm, unsigned long addr)
  * global hash is returned.
  */
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+__futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_private_hash **fph_p)
 {
 	int node = key->both.node;
 	u32 hash;
 
-	if (node == FUTEX_NO_NODE) {
-		struct futex_hash_bucket *hb;
-
-		hb = __futex_hash_private(key, fph);
-		if (hb)
-			return hb;
+	if (node == FUTEX_NO_NODE && futex_key_is_private(key)) {
+		if (!fph)
+			fph = rcu_dereference(key->private.mm->futex_phash);
+		if (fph && fph->hash_mask) {
+			if (fph_p)
+				*fph_p = fph;
+			return __futex_hash_private(key, fph);
+		}
 	}
 
 	hash = jhash2((u32 *)key,
@@ -1348,7 +1340,8 @@ static void exit_pi_state_list(struct task_struct *curr)
 		pi_state = list_entry(next, struct futex_pi_state, list);
 		key = pi_state->key;
 		if (1) {
-			CLASS(hb, hb)(&key);
+			CLASS(hb, hbr)(&key);
+			struct futex_hash_bucket *hb = hbr.hb;
 
 			/*
 			 * We can race against put_pi_state() removing itself from the
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 9f6bf6f585fc..37fc944edeb9 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -2,6 +2,7 @@
 #ifndef _FUTEX_H
 #define _FUTEX_H
 
+#include "linux/mm_types.h"
 #include <linux/futex.h>
 #include <linux/rtmutex.h>
 #include <linux/sched/wake_q.h>
@@ -222,7 +223,6 @@ extern struct hrtimer_sleeper *
 futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
 		  int flags, u64 range_ns);
 
-extern struct futex_hash_bucket *futex_hash(union futex_key *key);
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 extern void futex_hash_get(struct futex_hash_bucket *hb);
 extern void futex_hash_put(struct futex_hash_bucket *hb);
@@ -237,8 +237,15 @@ static inline struct futex_private_hash *futex_private_hash(void) { return NULL;
 static inline void futex_private_hash_put(struct futex_private_hash *fph) { }
 #endif
 
-DEFINE_CLASS(hb, struct futex_hash_bucket *,
-	     if (_T) futex_hash_put(_T),
+struct futex_bucket_ref {
+	struct futex_hash_bucket *hb;
+	struct futex_private_hash *fph;
+};
+
+extern struct futex_bucket_ref futex_hash(union futex_key *key);
+
+DEFINE_CLASS(hb, struct futex_bucket_ref,
+	     if (_T.fph) futex_private_hash_put(_T.fph),
 	     futex_hash(key), union futex_key *key);
 
 DEFINE_CLASS(private_hash, struct futex_private_hash *,
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index 643199fdbe62..5c227a4d963d 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -945,7 +945,8 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb)(&q.key);
+		CLASS(hb, hbr)(&q.key);
+		struct futex_hash_bucket *hb = hbr.hb;
 
 		futex_q_lock(&q, hb);
 
@@ -1101,9 +1102,9 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 		futex_unqueue_pi(&q);
 		spin_unlock(q.lock_ptr);
 		if (q.drop_hb_ref) {
-			CLASS(hb, hb)(&q.key);
+			CLASS(hb, hbr)(&q.key);
 			/* Additional reference from futex_unlock_pi() */
-			futex_hash_put(hb);
+			futex_hash_put(hbr.hb);
 		}
 		goto out;
 
@@ -1162,7 +1163,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	if (ret)
 		return ret;
 
-	CLASS(hb, hb)(&key);
+	CLASS(hb, hbr)(&key);
+	struct futex_hash_bucket *hb = hbr.hb;
 	spin_lock(&hb->lock);
 retry_hb:
 
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 1d99a84dc9ad..8ae99b7cb873 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -459,8 +459,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb1)(&key1);
-		CLASS(hb, hb2)(&key2);
+		CLASS(hb, hbr1)(&key1);
+		CLASS(hb, hbr2)(&key2);
+		struct futex_hash_bucket *hb1 = hbr1.hb;
+		struct futex_hash_bucket *hb2 = hbr2.hb;
 
 		futex_hb_waiters_inc(hb2);
 		double_lock_hb(hb1, hb2);
@@ -838,7 +840,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	switch (futex_requeue_pi_wakeup_sync(&q)) {
 	case Q_REQUEUE_PI_IGNORE:
 		{
-			CLASS(hb, hb)(&q.key);
+			CLASS(hb, hbr)(&q.key);
+			struct futex_hash_bucket *hb = hbr.hb;
 			/* The waiter is still on uaddr1 */
 			spin_lock(&hb->lock);
 			ret = handle_early_requeue_pi_wakeup(hb, &q, to);
@@ -909,9 +912,9 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		BUG();
 	}
 	if (q.drop_hb_ref) {
-		CLASS(hb, hb)(&q.key);
+		CLASS(hb, hbr)(&q.key);
 		/* Additional reference from requeue_pi_wake_futex() */
-		futex_hash_put(hb);
+		futex_hash_put(hbr.hb);
 	}
 
 out:
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ceed9d879059..8c8e3ae899cb 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -169,7 +169,8 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	if ((flags & FLAGS_STRICT) && !nr_wake)
 		return 0;
 
-	CLASS(hb, hb)(&key);
+	CLASS(hb, hbr)(&key);
+	struct futex_hash_bucket *hb = hbr.hb;
 
 	/* Make sure we really have tasks to wakeup */
 	if (!futex_hb_waiters_pending(hb))
@@ -266,8 +267,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb1)(&key1);
-		CLASS(hb, hb2)(&key2);
+		CLASS(hb, hbr1)(&key1);
+		CLASS(hb, hbr2)(&key2);
+		struct futex_hash_bucket *hb1 = hbr1.hb;
+		struct futex_hash_bucket *hb2 = hbr2.hb;
 
 		double_lock_hb(hb1, hb2);
 		op_ret = futex_atomic_op_inuser(op, uaddr2);
@@ -446,7 +449,8 @@ int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *woken)
 		u32 val = vs[i].w.val;
 
 		if (1) {
-			CLASS(hb, hb)(&q->key);
+			CLASS(hb, hbr)(&q->key);
+			struct futex_hash_bucket *hb = hbr.hb;
 
 			futex_q_lock(q, hb);
 			ret = futex_get_value_locked(&uval, uaddr);
@@ -621,7 +625,8 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb)(&q->key);
+		CLASS(hb, hbr)(&q->key);
+		struct futex_hash_bucket *hb = hbr.hb;
 
 		futex_q_lock(q, hb);
 

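The core of the patch above is that futex_hash() returns the bucket together with the private-hash pointer that __futex_hash() already dereferenced, so the fast path no longer loads hb->priv from the contended bucket line, and the CLASS(hb) destructor drops the reference through that pointer. A rough userspace model of the pattern follows. The types, model_hash() and the SCOPED_BUCKET() macro are hypothetical, and the GCC/Clang cleanup attribute (which the kernel's CLASS() machinery is built on) stands in for DEFINE_CLASS()/CLASS().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct toy_private_hash { int refs; };
struct toy_bucket { int id; };

/* Same shape as struct futex_bucket_ref in the patch. */
struct toy_bucket_ref {
	struct toy_bucket *hb;
	struct toy_private_hash *fph; /* NULL when the global hash was used */
};

/* Lookup: take a reference on the private hash (if any) and hand it back
 * alongside the bucket, so callers never read a priv field in the bucket. */
static struct toy_bucket_ref model_hash(struct toy_bucket *hb,
					struct toy_private_hash *fph)
{
	if (fph)
		fph->refs++; /* stands in for futex_private_hash_get() */
	return (struct toy_bucket_ref){ .hb = hb, .fph = fph };
}

/* Scope-exit destructor, like the DEFINE_CLASS(hb, ...) cleanup expression. */
static void toy_bucket_ref_release(struct toy_bucket_ref *r)
{
	if (r->fph)
		r->fph->refs--; /* stands in for futex_private_hash_put() */
}

#define SCOPED_BUCKET(name, hb, fph)                              \
	struct toy_bucket_ref name                                 \
		__attribute__((cleanup(toy_bucket_ref_release))) = \
			model_hash((hb), (fph))

/* Example caller: records the refcount observed inside the scope. */
static int use_bucket(struct toy_bucket *hb, struct toy_private_hash *fph,
		      int *refs_inside)
{
	SCOPED_BUCKET(hbr, hb, fph);

	*refs_inside = fph ? fph->refs : -1;
	return hbr.hb->id; /* evaluated before the cleanup runs */
}
```

The reference is held for exactly the lifetime of the scope, and the global-hash case (fph == NULL) needs no put at all, matching the `if (_T.fph)` guard in the new DEFINE_CLASS().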

* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 15:28   ` Breno Leitao
  2026-06-09 20:11     ` Peter Zijlstra
@ 2026-06-09 20:16     ` Thomas Gleixner
  2026-06-09 20:23       ` Peter Zijlstra
  2026-06-09 20:25       ` Peter Zijlstra
  1 sibling, 2 replies; 9+ messages in thread
From: Thomas Gleixner @ 2026-06-09 20:16 UTC
  To: Breno Leitao, Peter Zijlstra
  Cc: Ingo Molnar, Darren Hart, Davidlohr Bueso, André Almeida,
	linux-kernel, puranjay, rmikey, stuclar, namhyung, kernel-team

Breno!

On Tue, Jun 09 2026 at 08:28, Breno Leitao wrote:
> On Tue, Jun 09, 2026 at 12:46:03PM +0200, Peter Zijlstra wrote:
>> On Fri, Jun 05, 2026 at 09:53:12AM -0700, Breno Leitao wrote:
>>   perf bench futex hash			 192479  195523  +1.5%
>>   perf bench futex hash -b 256		3453734 3987880 +15.5%
>> 
>> And then I do see the improvement from your patch, but I really cannot
>> make sense of your reasoning for it.
>
> So, let me rephrase it. The bucket cacheline takes hits from four access
> patterns - the three I listed (waiters_pending readers, lock spinners,
> lock-holder chain writes) plus the lockless `fph = hb->priv` load on the
> futex_hash() fast path, which is what c2c surfaced. That priv load is the
> dominant HITM source on baseline, not the chain writes I emphasized. 

Ok. That makes a lot more sense now.

>> > Cost: one extra cacheline (56 B padding) per bucket. Would it be
>> > acceptable?
>> 
>> I'm really not sure, it *doubles* the futex memory cost.
>
> I think it's worth the trade. The global hash scales linearly with
> num_possible_cpus(), so the extra bytes track the same curve as the machines
> that actually need the fix.
>
> In simpler words, a box big enough to feel this contention has plenty of RAM
> headroom to absorb it.

Well, it's not only about the global hash. The per process private hash
is affected too.

Can you try the completely untested below?

Thanks,

        tglx
---
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -124,7 +124,7 @@ late_initcall(fail_futex_debugfs);
 #endif /* CONFIG_FAIL_FUTEX */
 
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph);
+__futex_hash(union futex_key *key, struct futex_private_hash **fph);
 
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 static bool futex_ref_get(struct futex_private_hash *fph);
@@ -179,22 +179,25 @@ void futex_hash_put(struct futex_hash_bu
 }
 
 static struct futex_hash_bucket *
-__futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
+__futex_hash_private(union futex_key *key, struct futex_private_hash **fph)
 {
+	struct futex_private_hash *lfph = *fph;
 	u32 hash;
 
 	if (!futex_key_is_private(key))
 		return NULL;
 
-	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
-	if (!fph || !fph->hash_mask)
+	if (!lfph)
+		lfph = rcu_dereference(key->private.mm->futex_phash);
+	if (!lfph || !lfph->hash_mask)
 		return NULL;
 
+	*fph = lfph;
+
 	hash = jhash2((void *)&key->private.address,
 		      sizeof(key->private.address) / 4,
 		      key->both.offset);
-	return &fph->queues[hash & fph->hash_mask];
+	return &lfph->queues[hash & lfph->hash_mask];
 }
 
 static void futex_rehash_private(struct futex_private_hash *old,
@@ -217,7 +220,7 @@ static void futex_rehash_private(struct
 
 			WARN_ON_ONCE(this->lock_ptr != &hb_old->lock);
 
-			hb_new = __futex_hash(&this->key, new);
+			hb_new = __futex_hash(&this->key, &new);
 			futex_hb_waiters_inc(hb_new);
 			/*
 			 * The new pointer isn't published yet but an already
@@ -301,13 +304,12 @@ struct futex_private_hash *futex_private
 
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	struct futex_private_hash *fph;
+	struct futex_private_hash *fph = NULL;
 	struct futex_hash_bucket *hb;
 
 again:
 	scoped_guard(rcu) {
-		hb = __futex_hash(key, NULL);
-		fph = hb->priv;
+		hb = __futex_hash(key, &fph);
 
 		if (!fph || futex_private_hash_get(fph))
 			return hb;
@@ -319,7 +321,7 @@ struct futex_hash_bucket *futex_hash(uni
 #else /* !CONFIG_FUTEX_PRIVATE_HASH */
 
 static struct futex_hash_bucket *
-__futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
+__futex_hash_private(union futex_key *key, struct futex_private_hash **fph)
 {
 	return NULL;
 }
@@ -412,7 +414,7 @@ static int futex_mpol(struct mm_struct *
  * global hash is returned.
  */
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+__futex_hash(union futex_key *key, struct futex_private_hash **fph)
 {
 	int node = key->both.node;
 	u32 hash;


* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 20:11     ` Peter Zijlstra
@ 2026-06-09 20:18       ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-09 20:18 UTC
  To: Breno Leitao
  Cc: Thomas Gleixner, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Tue, Jun 09, 2026 at 10:11:17PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 09, 2026 at 08:28:16AM -0700, Breno Leitao wrote:
> 
> > > I'm really not sure, it *doubles* the futex memory cost.
> > 
> > I think it's worth the trade. The global hash scales linearly with
> > num_possible_cpus(), so the extra bytes track the same curve as the machines
> > that actually need the fix.
> > 
> > In simpler words, a box big enough to feel this contention has plenty of RAM
> > headroom to absorb it.
> 
> You might not have heard, but RAM has gotten ludicrously expensive.
> 
> Anyway, how does something like the below work for you? It's a total
> hack job, but it (sorta) builds and runs.
> 

Please use this one; I spotted a silly bug.

---
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ff2a4fb2993f..fa0674e5d058 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -124,7 +124,7 @@ late_initcall(fail_futex_debugfs);
 #endif /* CONFIG_FAIL_FUTEX */
 
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph);
+__futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_private_hash **fph_p);
 
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 static bool futex_ref_get(struct futex_private_hash *fph);
@@ -183,14 +183,6 @@ __futex_hash_private(union futex_key *key, struct futex_private_hash *fph)
 {
 	u32 hash;
 
-	if (!futex_key_is_private(key))
-		return NULL;
-
-	if (!fph)
-		fph = rcu_dereference(key->private.mm->futex_phash);
-	if (!fph || !fph->hash_mask)
-		return NULL;
-
 	hash = jhash2((void *)&key->private.address,
 		      sizeof(key->private.address) / 4,
 		      key->both.offset);
@@ -211,13 +203,12 @@ static void futex_rehash_private(struct futex_private_hash *old,
 
 		spin_lock(&hb_old->lock);
 		plist_for_each_entry_safe(this, tmp, &hb_old->chain, list) {
-
 			plist_del(&this->list, &hb_old->chain);
 			futex_hb_waiters_dec(hb_old);
 
 			WARN_ON_ONCE(this->lock_ptr != &hb_old->lock);
 
-			hb_new = __futex_hash(&this->key, new);
+			hb_new = __futex_hash(&this->key, new, NULL);
 			futex_hb_waiters_inc(hb_new);
 			/*
 			 * The new pointer isn't published yet but an already
@@ -299,18 +290,17 @@ struct futex_private_hash *futex_private_hash(void)
 	goto again;
 }
 
-struct futex_hash_bucket *futex_hash(union futex_key *key)
+struct futex_bucket_ref futex_hash(union futex_key *key)
 {
-	struct futex_private_hash *fph;
-	struct futex_hash_bucket *hb;
-
 again:
 	scoped_guard(rcu) {
-		hb = __futex_hash(key, NULL);
-		fph = hb->priv;
+		struct futex_private_hash *fph = NULL;
+		struct futex_hash_bucket *hb;
+
+		hb = __futex_hash(key, NULL, &fph);
 
 		if (!fph || futex_private_hash_get(fph))
-			return hb;
+			return (struct futex_bucket_ref){ .hb = hb, .fph = fph };
 	}
 	futex_pivot_hash(key->private.mm);
 	goto again;
@@ -412,17 +402,19 @@ static int futex_mpol(struct mm_struct *mm, unsigned long addr)
  * global hash is returned.
  */
 static struct futex_hash_bucket *
-__futex_hash(union futex_key *key, struct futex_private_hash *fph)
+__futex_hash(union futex_key *key, struct futex_private_hash *fph, struct futex_private_hash **fph_p)
 {
 	int node = key->both.node;
 	u32 hash;
 
-	if (node == FUTEX_NO_NODE) {
-		struct futex_hash_bucket *hb;
-
-		hb = __futex_hash_private(key, fph);
-		if (hb)
-			return hb;
+	if (node == FUTEX_NO_NODE && futex_key_is_private(key)) {
+		if (!fph)
+			fph = rcu_dereference(key->private.mm->futex_phash);
+		if (fph && fph->hash_mask) {
+			if (fph_p)
+				*fph_p = fph;
+			return __futex_hash_private(key, fph);
+		}
 	}
 
 	hash = jhash2((u32 *)key,
@@ -1348,7 +1340,8 @@ static void exit_pi_state_list(struct task_struct *curr)
 		pi_state = list_entry(next, struct futex_pi_state, list);
 		key = pi_state->key;
 		if (1) {
-			CLASS(hb, hb)(&key);
+			CLASS(hb, hbr)(&key);
+			struct futex_hash_bucket *hb = hbr.hb;
 
 			/*
 			 * We can race against put_pi_state() removing itself from the
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 9f6bf6f585fc..4cab346067fe 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -222,7 +222,6 @@ extern struct hrtimer_sleeper *
 futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
 		  int flags, u64 range_ns);
 
-extern struct futex_hash_bucket *futex_hash(union futex_key *key);
 #ifdef CONFIG_FUTEX_PRIVATE_HASH
 extern void futex_hash_get(struct futex_hash_bucket *hb);
 extern void futex_hash_put(struct futex_hash_bucket *hb);
@@ -237,8 +236,15 @@ static inline struct futex_private_hash *futex_private_hash(void) { return NULL;
 static inline void futex_private_hash_put(struct futex_private_hash *fph) { }
 #endif
 
-DEFINE_CLASS(hb, struct futex_hash_bucket *,
-	     if (_T) futex_hash_put(_T),
+struct futex_bucket_ref {
+	struct futex_hash_bucket *hb;
+	struct futex_private_hash *fph;
+};
+
+extern struct futex_bucket_ref futex_hash(union futex_key *key);
+
+DEFINE_CLASS(hb, struct futex_bucket_ref,
+	     if (_T.fph) futex_private_hash_put(_T.fph),
 	     futex_hash(key), union futex_key *key);
 
 DEFINE_CLASS(private_hash, struct futex_private_hash *,
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index 643199fdbe62..5c227a4d963d 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -945,7 +945,8 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb)(&q.key);
+		CLASS(hb, hbr)(&q.key);
+		struct futex_hash_bucket *hb = hbr.hb;
 
 		futex_q_lock(&q, hb);
 
@@ -1101,9 +1102,9 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 		futex_unqueue_pi(&q);
 		spin_unlock(q.lock_ptr);
 		if (q.drop_hb_ref) {
-			CLASS(hb, hb)(&q.key);
+			CLASS(hb, hbr)(&q.key);
 			/* Additional reference from futex_unlock_pi() */
-			futex_hash_put(hb);
+			futex_hash_put(hbr.hb);
 		}
 		goto out;
 
@@ -1162,7 +1163,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	if (ret)
 		return ret;
 
-	CLASS(hb, hb)(&key);
+	CLASS(hb, hbr)(&key);
+	struct futex_hash_bucket *hb = hbr.hb;
 	spin_lock(&hb->lock);
 retry_hb:
 
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 1d99a84dc9ad..8ae99b7cb873 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -459,8 +459,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb1)(&key1);
-		CLASS(hb, hb2)(&key2);
+		CLASS(hb, hbr1)(&key1);
+		CLASS(hb, hbr2)(&key2);
+		struct futex_hash_bucket *hb1 = hbr1.hb;
+		struct futex_hash_bucket *hb2 = hbr2.hb;
 
 		futex_hb_waiters_inc(hb2);
 		double_lock_hb(hb1, hb2);
@@ -838,7 +840,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	switch (futex_requeue_pi_wakeup_sync(&q)) {
 	case Q_REQUEUE_PI_IGNORE:
 		{
-			CLASS(hb, hb)(&q.key);
+			CLASS(hb, hbr)(&q.key);
+			struct futex_hash_bucket *hb = hbr.hb;
 			/* The waiter is still on uaddr1 */
 			spin_lock(&hb->lock);
 			ret = handle_early_requeue_pi_wakeup(hb, &q, to);
@@ -909,9 +912,9 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 		BUG();
 	}
 	if (q.drop_hb_ref) {
-		CLASS(hb, hb)(&q.key);
+		CLASS(hb, hbr)(&q.key);
 		/* Additional reference from requeue_pi_wake_futex() */
-		futex_hash_put(hb);
+		futex_hash_put(hbr.hb);
 	}
 
 out:
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index ceed9d879059..8c8e3ae899cb 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -169,7 +169,8 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	if ((flags & FLAGS_STRICT) && !nr_wake)
 		return 0;
 
-	CLASS(hb, hb)(&key);
+	CLASS(hb, hbr)(&key);
+	struct futex_hash_bucket *hb = hbr.hb;
 
 	/* Make sure we really have tasks to wakeup */
 	if (!futex_hb_waiters_pending(hb))
@@ -266,8 +267,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb1)(&key1);
-		CLASS(hb, hb2)(&key2);
+		CLASS(hb, hbr1)(&key1);
+		CLASS(hb, hbr2)(&key2);
+		struct futex_hash_bucket *hb1 = hbr1.hb;
+		struct futex_hash_bucket *hb2 = hbr2.hb;
 
 		double_lock_hb(hb1, hb2);
 		op_ret = futex_atomic_op_inuser(op, uaddr2);
@@ -446,7 +449,8 @@ int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *woken)
 		u32 val = vs[i].w.val;
 
 		if (1) {
-			CLASS(hb, hb)(&q->key);
+			CLASS(hb, hbr)(&q->key);
+			struct futex_hash_bucket *hb = hbr.hb;
 
 			futex_q_lock(q, hb);
 			ret = futex_get_value_locked(&uval, uaddr);
@@ -621,7 +625,8 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 
 retry_private:
 	if (1) {
-		CLASS(hb, hb)(&q->key);
+		CLASS(hb, hbr)(&q->key);
+		struct futex_hash_bucket *hb = hbr.hb;
 
 		futex_q_lock(q, hb);
 

* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 20:16     ` Thomas Gleixner
@ 2026-06-09 20:23       ` Peter Zijlstra
  2026-06-09 20:25       ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-09 20:23 UTC
  To: Thomas Gleixner
  Cc: Breno Leitao, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Tue, Jun 09, 2026 at 10:16:31PM +0200, Thomas Gleixner wrote:
> Breno!
> 
> On Tue, Jun 09 2026 at 08:28, Breno Leitao wrote:
> > On Tue, Jun 09, 2026 at 12:46:03PM +0200, Peter Zijlstra wrote:
> >> On Fri, Jun 05, 2026 at 09:53:12AM -0700, Breno Leitao wrote:
> >>   perf bench futex hash			 192479  195523  +1.5%
> >>   perf bench futex hash -b 256		3453734 3987880 +15.5%
> >> 
> >> And then I do see the improvement from your patch, but I really cannot
> >> make sense of your reasoning for it.
> >
> > So, let me rephrase it. The bucket cacheline takes hits from four access
> > patterns - the three I listed (waiters_pending readers, lock spinners,
> > lock-holder chain writes) plus the lockless `fph = hb->priv` load on the
> > futex_hash() fast path, which is what c2c surfaced. That priv load is the
> > dominant HITM source on baseline, not the chain writes I emphasized. 
> 
> Ok. That makes a lot more sense now.
> 
> >> > Cost: one extra cacheline (56 B padding) per bucket. Would it be
> >> > acceptable?
> >> 
> >> I'm really not sure, it *doubles* the futex memory cost.
> >
> > I think it's worth the trade. The global hash scales linearly with
> > num_possible_cpus(), so the extra bytes track the same curve as the machines
> > that actually need the fix.
> >
> > In simpler words, a box big enough to feel this contention has plenty of RAM
> > headroom to absorb it.
> 
> Well, it's not only about the global hash. The per process private hash
> is affected too.
> 
> Can you try the completely untested below?

This moves the access to futex_hash_put() :-)
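The memory-cost question quoted above can be made concrete with a mock of the two bucket layouts. This is a hypothetical userspace sketch, not the kernel's struct futex_hash_bucket. It assumes a 64-byte cacheline, a 4-byte spinlock_t (no lock debugging), a struct plist_head that is just a two-pointer list head, and that priv keeps its place after chain. Under those assumptions the packed bucket fits in one line, while starting chain on a new line leaves 56 bytes of padding after (waiters, lock) and doubles the per-bucket footprint. The same move also takes priv off the lock's line, which fits Breno's explanation above that the lockless hb->priv load, not the chain writes, dominated the HITM samples.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64	/* assumed cacheline size */

/* Hypothetical stand-in for struct plist_head: one two-pointer list head. */
struct mock_plist_head {
	void *next, *prev;
};

/* Baseline: waiters, lock, chain and priv share one cacheline. */
struct bucket_packed {
	int waiters;			/* atomic_t */
	int lock;			/* spinlock_t, assumed 4 bytes */
	struct mock_plist_head chain;
	void *priv;			/* struct futex_private_hash * */
} __attribute__((aligned(CACHELINE)));

/* RFC layout: chain starts a new line and priv follows it there. */
struct bucket_split {
	int waiters;
	int lock;
	_Alignas(CACHELINE) struct mock_plist_head chain;
	void *priv;
} __attribute__((aligned(CACHELINE)));

/* One line per bucket before, two after: the "doubles" in the thread. */
static_assert(sizeof(struct bucket_packed) == CACHELINE, "packed bucket is one line");
static_assert(sizeof(struct bucket_split) == 2 * CACHELINE, "split bucket is two lines");

/* waiters and lock stay together; chain and priv move to line 2. */
static_assert(offsetof(struct bucket_split, chain) == CACHELINE, "chain on line 2");
static_assert(offsetof(struct bucket_split, priv) >= CACHELINE, "priv on line 2");
```

For a hash of N buckets this mock grows from 64 * N to 128 * N bytes. As Thomas points out, that cost applies to each process's private hash as well as to the global one.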

* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 20:16     ` Thomas Gleixner
  2026-06-09 20:23       ` Peter Zijlstra
@ 2026-06-09 20:25       ` Peter Zijlstra
  2026-06-09 20:32         ` Thomas Gleixner
  1 sibling, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2026-06-09 20:25 UTC
  To: Thomas Gleixner
  Cc: Breno Leitao, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Tue, Jun 09, 2026 at 10:16:31PM +0200, Thomas Gleixner wrote:
> @@ -301,13 +304,12 @@ struct futex_private_hash *futex_private
>  
>  struct futex_hash_bucket *futex_hash(union futex_key *key)
>  {
> -	struct futex_private_hash *fph;
> +	struct futex_private_hash *fph = NULL;
>  	struct futex_hash_bucket *hb;
>  
>  again:
>  	scoped_guard(rcu) {
> -		hb = __futex_hash(key, NULL);
> -		fph = hb->priv;
> +		hb = __futex_hash(key, &fph);
>  
>  		if (!fph || futex_private_hash_get(fph))
>  			return hb;

Also, same bug I had in my first patch, you need to re-set fph to NULL
on the goto again :-)
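Peter's point concerns an out-parameter that __futex_hash() writes only on the private-hash path. If fph is initialised once, above the again: label, then a pass that fails futex_private_hash_get() and pivots can come back through a path that leaves fph untouched, so the old pointer is retried instead of being treated as "global hash, nothing to get". The sketch below models this with hypothetical helpers: lookup(), phash_get() and pivot() stand in for __futex_hash(), futex_private_hash_get() and futex_pivot_hash(). For illustration it assumes the pivot leaves the process on the global hash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
struct phash { bool dying; };
struct bucket { int id; };

static struct bucket global_bucket = { .id = 0 };
static struct bucket private_bucket = { .id = 1 };
static struct phash *mm_phash;		/* plays mm->futex_phash */
static struct phash dying_hash = { .dying = true };

/* Like __futex_hash(key, &fph): *fph_p is written only when the private
 * hash is used and is left untouched for the global hash. */
static struct bucket *lookup(struct phash **fph_p)
{
	if (mm_phash) {
		*fph_p = mm_phash;
		return &private_bucket;
	}
	return &global_bucket;
}

/* Like futex_private_hash_get(): fails once the hash is being replaced. */
static bool phash_get(struct phash *p)
{
	assert(p != NULL);
	return !p->dying;
}

/* Like futex_pivot_hash(); here the process ends up on the global hash. */
static void pivot(void)
{
	mm_phash = NULL;
}

/* Buggy: fph is initialised once, so the dying pointer survives
 * "goto again". Retries are capped so the demo terminates. */
static struct bucket *hash_stale(int *tries)
{
	struct phash *fph = NULL;
	struct bucket *hb;

	*tries = 0;
again:
	hb = lookup(&fph);
	if (!fph || phash_get(fph))
		return hb;
	if (++*tries == 3)
		return NULL;	/* without the cap this model loops forever */
	pivot();
	goto again;
}

/* Fixed: fph is reset to NULL on every pass. */
static struct bucket *hash_fixed(int *tries)
{
	struct phash *fph;
	struct bucket *hb;

	*tries = 0;
again:
	fph = NULL;
	hb = lookup(&fph);
	if (!fph || phash_get(fph))
		return hb;
	++*tries;
	pivot();
	goto again;
}
```

With mm_phash pointing at dying_hash, hash_stale() gives up after three attempts, while hash_fixed() returns &global_bucket after a single pivot. Declaring fph inside the retried scope, as the futex_bucket_ref diff earlier on this page does inside scoped_guard(rcu), gets the reset for free.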

* Re: [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock
  2026-06-09 20:25       ` Peter Zijlstra
@ 2026-06-09 20:32         ` Thomas Gleixner
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2026-06-09 20:32 UTC
  To: Peter Zijlstra
  Cc: Breno Leitao, Ingo Molnar, Darren Hart, Davidlohr Bueso,
	André Almeida, linux-kernel, puranjay, rmikey, stuclar,
	namhyung, kernel-team

On Tue, Jun 09 2026 at 22:25, Peter Zijlstra wrote:
> On Tue, Jun 09, 2026 at 10:16:31PM +0200, Thomas Gleixner wrote:
>> @@ -301,13 +304,12 @@ struct futex_private_hash *futex_private
>>  
>>  struct futex_hash_bucket *futex_hash(union futex_key *key)
>>  {
>> -	struct futex_private_hash *fph;
>> +	struct futex_private_hash *fph = NULL;
>>  	struct futex_hash_bucket *hb;
>>  
>>  again:
>>  	scoped_guard(rcu) {
>> -		hb = __futex_hash(key, NULL);
>> -		fph = hb->priv;
>> +		hb = __futex_hash(key, &fph);
>>  
>>  		if (!fph || futex_private_hash_get(fph))
>>  			return hb;
>
> Also, same bug I had in my first patch, you need to re-set fph to NULL
> on the goto again :-)

Figured that out by now :)
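Both of Peter's remarks are handled by the futex_bucket_ref diff earlier on this page. His first remark is that returning only hb moves the hb->priv load into futex_hash_put(); his second is that fph must be reset on each retry. In that diff, futex_hash() returns the bucket and the private-hash pointer by value, fph is declared inside the retried scoped_guard(rcu) block, and the CLASS(hb, ...) destructor drops the reference through fph without reading the bucket. The rarer q.drop_hb_ref paths still call futex_hash_put(hbr.hb). The kernel's CLASS() helpers are built on the compiler's cleanup attribute, so the shape can be mimicked in userspace. The following is a hypothetical sketch assuming GCC or Clang. hash_lookup(), bucket_ref_put() and BUCKET_REF() are made-up names, and a plain counter stands in for the private hash reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the futex types. */
struct phash { int refs; };
struct bucket { int id; };

/* Mirrors struct futex_bucket_ref: the bucket plus the hash it came from. */
struct bucket_ref {
	struct bucket *hb;
	struct phash *fph;	/* NULL when the global hash was used */
};

static struct bucket global_bucket = { .id = 0 };
static struct bucket private_bucket = { .id = 1 };
static struct phash private_hash = { .refs = 1 };	/* owner's reference */

/* Like futex_hash(): take a reference only for the private hash. */
static struct bucket_ref hash_lookup(int use_private)
{
	if (use_private) {
		private_hash.refs++;
		return (struct bucket_ref){ .hb = &private_bucket, .fph = &private_hash };
	}
	return (struct bucket_ref){ .hb = &global_bucket, .fph = NULL };
}

/* Destructor: drops the reference through r->fph and never reads r->hb. */
static void bucket_ref_put(struct bucket_ref *r)
{
	if (r->fph) {
		assert(r->fph->refs > 1);
		r->fph->refs--;
	}
}

/* Rough analogue of CLASS(hb, name)(key). */
#define BUCKET_REF(name, use_private) \
	struct bucket_ref name __attribute__((cleanup(bucket_ref_put))) = \
		hash_lookup(use_private)

/* Returns the refcount seen while the guard is live; it drops on return. */
static int refs_while_held(int use_private)
{
	BUCKET_REF(hbr, use_private);
	struct bucket *hb = hbr.hb;	/* callers keep using hb as before */

	assert(hb->id == use_private);
	return private_hash.refs;
}
```

refs_while_held(1) sees the count at 2 and leaves it back at 1 once the guard goes out of scope; refs_while_held(0) never touches it because the global path carries no fph.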

end of thread

Thread overview: 9+ messages
2026-06-05 16:53 [PATCH RFC] futex: avoid false sharing between hb->chain and the bucket lock Breno Leitao
2026-06-09 10:46 ` Peter Zijlstra
2026-06-09 15:28   ` Breno Leitao
2026-06-09 20:11     ` Peter Zijlstra
2026-06-09 20:18       ` Peter Zijlstra
2026-06-09 20:16     ` Thomas Gleixner
2026-06-09 20:23       ` Peter Zijlstra
2026-06-09 20:25       ` Peter Zijlstra
2026-06-09 20:32         ` Thomas Gleixner
