From: Peter Zijlstra <peterz@infradead.org>
To: Shrikanth Hegde <sshegde@linux.ibm.com>
Cc: "Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"André Almeida" <andrealmeid@igalia.com>,
"Darren Hart" <dvhart@infradead.org>,
"Davidlohr Bueso" <dave@stgolabs.net>,
"Ingo Molnar" <mingo@redhat.com>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Thomas Gleixner" <tglx@linutronix.de>,
linux-kernel@vger.kernel.org,
"Valentin Schneider" <vschneid@redhat.com>,
"Waiman Long" <longman@redhat.com>
Subject: Re: [PATCH v2 0/6] futex: Use RCU-based per-CPU reference counting
Date: Thu, 6 Nov 2025 10:29:29 +0100
Message-ID: <20251106092929.GR4067720@noisy.programming.kicks-ass.net>
In-Reply-To: <ae8c6fd5-cc9c-44f3-a489-0346873f4be5@linux.ibm.com>
On Wed, Jul 16, 2025 at 11:51:46PM +0530, Shrikanth Hegde wrote:
> > Anyway, I think we can improve both. Does the below help?
> >
> >
> > ---
> > diff --git a/kernel/futex/core.c b/kernel/futex/core.c
> > index d9bb5567af0c..8c41d050bd1f 100644
> > --- a/kernel/futex/core.c
> > +++ b/kernel/futex/core.c
> > @@ -1680,10 +1680,10 @@ static bool futex_ref_get(struct futex_private_hash *fph)
> >  {
> >  	struct mm_struct *mm = fph->mm;
> >
> > -	guard(rcu)();
> > +	guard(preempt)();
> >
> > -	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
> > -		this_cpu_inc(*mm->futex_ref);
> > +	if (READ_ONCE(fph->state) == FR_PERCPU) {
> > +		__this_cpu_inc(*mm->futex_ref);
> >  		return true;
> >  	}
> >
> > @@ -1694,10 +1694,10 @@ static bool futex_ref_put(struct futex_private_hash *fph)
> >  {
> >  	struct mm_struct *mm = fph->mm;
> >
> > -	guard(rcu)();
> > +	guard(preempt)();
> >
> > -	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
> > -		this_cpu_dec(*mm->futex_ref);
> > +	if (READ_ONCE(fph->state) == FR_PERCPU) {
> > +		__this_cpu_dec(*mm->futex_ref);
> >  		return false;
> >  	}
> >
>
> Yes. It helps. It improves "-b 512" numbers by at least 5%.
While talking with Sebastian about this work, I realized this patch was
never committed. So I've written it up like so, and will commit to
tip/locking/urgent soonish.
---
Subject: futex: Optimize per-cpu reference counting
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed, 16 Jul 2025 16:29:46 +0200
Shrikanth noted that the per-cpu reference counter was still some 10%
slower than the old immutable option (which removes the reference
counting entirely).
Further optimize the per-cpu reference counter by:
- switching from RCU to preempt;
- using __this_cpu_*() since we now have preempt disabled;
- switching from smp_load_acquire() to READ_ONCE().
This is all safe because disabling preemption inhibits the RCU grace
period exactly like rcu_read_lock().
Having preemption disabled allows using __this_cpu_*() provided the
only access to the variable is in task context -- which is the case
here.
Furthermore, since we know changing fph->state to FR_ATOMIC demands a
full RCU grace period, we can rely on the implied smp_mb() from that to
replace the acquire barrier of smp_load_acquire().
This is very similar to the percpu_down_read_internal() fast-path.
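
As a rough illustration of the fast path described above, here is a
single-threaded userspace model of a two-mode reference counter. It is a
sketch, not the kernel implementation: struct model_ref, NR_SLOTS, the
model_*() helpers and the slow-path bodies are all hypothetical names and
assumptions made for this example, and the relaxed C11 load only stands in
for READ_ONCE(). The model has no real CPUs, preemption or RCU; comments
mark where the kernel relies on them.

```c
/* Single-threaded userspace model; NOT kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_SLOTS 4	/* hypothetical stand-in for per-CPU storage */

enum { FR_PERCPU, FR_ATOMIC };

struct model_ref {
	atomic_int state;	/* models fph->state */
	long percpu[NR_SLOTS];	/* models *mm->futex_ref */
	atomic_long atomic;	/* models the shared atomic counter */
};

static bool model_ref_get(struct model_ref *r, int slot)
{
	/* The kernel holds guard(preempt)() across this whole function. */
	if (atomic_load_explicit(&r->state, memory_order_relaxed) == FR_PERCPU) {
		r->percpu[slot]++;	/* __this_cpu_inc() analogue */
		return true;
	}

	/* Assumed slow path: increment unless the count already hit zero. */
	long old = atomic_load(&r->atomic);
	while (old != 0) {
		if (atomic_compare_exchange_weak(&r->atomic, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true only when the last reference was dropped. */
static bool model_ref_put(struct model_ref *r, int slot)
{
	if (atomic_load_explicit(&r->state, memory_order_relaxed) == FR_PERCPU) {
		r->percpu[slot]--;	/* __this_cpu_dec() analogue */
		return false;
	}
	return atomic_fetch_sub(&r->atomic, 1) == 1;
}

/* Simplified mode switch: flip state, then fold the slots into the atomic. */
static void model_switch_to_atomic(struct model_ref *r)
{
	assert(atomic_load(&r->state) == FR_PERCPU);
	atomic_store(&r->state, FR_ATOMIC);
	/* The kernel waits for an RCU grace period at this point. */
	for (int i = 0; i < NR_SLOTS; i++) {
		atomic_fetch_add(&r->atomic, r->percpu[i]);
		r->percpu[i] = 0;
	}
}
```

In the model the grace period is only a comment. In the kernel it is what
guarantees that every preempt-disabled section which observed FR_PERCPU has
finished before the per-CPU values are summed, which is why the reader side
can use a plain load.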
The reason this is significant for PowerPC is that it uses the generic
this_cpu_*() implementation, which relies on local_irq_disable() (the
x86 implementation relies on it being a single memop instruction to be
IRQ-safe). Switching to preempt_disable() and __this_cpu_*() avoids
this IRQ state swizzling. Also, PowerPC needs LWSYNC for the ACQUIRE
barrier; not having to use explicit barriers saves a bunch.
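
To make the IRQ state swizzling concrete, the following userspace sketch
models the two shapes of per-CPU increment. The model_*() functions and the
irq_toggles counter are hypothetical stubs invented for this example. They
only count how often the modeled interrupt state would change and do not
touch real interrupt state.

```c
/* Userspace model only: the "IRQ" functions below are hypothetical stubs. */
#include <assert.h>

static unsigned long irq_toggles;	/* counts modeled IRQ state changes */
static long counter;			/* stands in for one CPU's slot */

static unsigned long model_irq_save(void)
{
	irq_toggles++;			/* models disabling interrupts */
	return 0;			/* dummy saved flags */
}

static void model_irq_restore(unsigned long flags)
{
	(void)flags;
	assert(irq_toggles % 2 == 1);	/* a restore must pair with a save */
	irq_toggles++;			/* models restoring interrupts */
}

/* Shape of the generic this_cpu_inc(): RMW bracketed by IRQ save/restore. */
static void model_this_cpu_inc(void)
{
	unsigned long flags = model_irq_save();

	counter++;
	model_irq_restore(flags);
}

/* Shape of __this_cpu_inc(): bare RMW, caller already disabled preemption. */
static void model___this_cpu_inc(void)
{
	counter++;
}
```

Each call to the first variant costs two modeled IRQ state changes. The
second costs none, which is the saving the paragraph above describes for
architectures that use the generic implementation.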
Combined, this reduces the performance gap by half, down to some 5%.
Reported-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/futex/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1680,10 +1680,10 @@ static bool futex_ref_get(struct futex_p
 {
 	struct mm_struct *mm = fph->mm;
 
-	guard(rcu)();
+	guard(preempt)();
 
-	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
-		this_cpu_inc(*mm->futex_ref);
+	if (READ_ONCE(fph->state) == FR_PERCPU) {
+		__this_cpu_inc(*mm->futex_ref);
 		return true;
 	}
 
@@ -1694,10 +1694,10 @@ static bool futex_ref_put(struct futex_p
 {
 	struct mm_struct *mm = fph->mm;
 
-	guard(rcu)();
+	guard(preempt)();
 
-	if (smp_load_acquire(&fph->state) == FR_PERCPU) {
-		this_cpu_dec(*mm->futex_ref);
+	if (READ_ONCE(fph->state) == FR_PERCPU) {
+		__this_cpu_dec(*mm->futex_ref);
 		return false;
 	}
 