All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: linux-kernel@vger.kernel.org
Cc: linux-tip-commits@vger.kernel.org,
	 "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	x86@kernel.org
Subject: Re: [tip: locking/futex] futex: Use RCU-based per-CPU reference counting instead of rcuref_t
Date: Mon, 25 Aug 2025 17:38:08 -0700	[thread overview]
Message-ID: <aK0B8IYKIH1IHyDj@google.com> (raw)
In-Reply-To: <aJ_vEP2EHj6l0xRT@google.com>

On Fri, Aug 15, 2025, Sean Christopherson wrote:
> On Fri, Jul 11, 2025, tip-bot2 for Peter Zijlstra wrote:
> > The following commit has been merged into the locking/futex branch of tip:
> > 
> > Commit-ID:     56180dd20c19e5b0fa34822997a9ac66b517e7b3
> > Gitweb:        https://git.kernel.org/tip/56180dd20c19e5b0fa34822997a9ac66b517e7b3
> > Author:        Peter Zijlstra <peterz@infradead.org>
> > AuthorDate:    Thu, 10 Jul 2025 13:00:07 +02:00
> > Committer:     Peter Zijlstra <peterz@infradead.org>
> > CommitterDate: Fri, 11 Jul 2025 16:02:00 +02:00
> > 
> > futex: Use RCU-based per-CPU reference counting instead of rcuref_t
> > 
> > The use of rcuref_t for reference counting introduces a performance bottleneck
> > when accessed concurrently by multiple threads during futex operations.
> > 
> > Replace rcuref_t with special crafted per-CPU reference counters. The
> > lifetime logic remains the same.
> > 
> > The newly allocate private hash starts in FR_PERCPU state. In this state, each
> > futex operation that requires the private hash uses a per-CPU counter (an
> > unsigned int) for incrementing or decrementing the reference count.
> > 
> > When the private hash is about to be replaced, the per-CPU counters are
> > migrated to a atomic_t counter mm_struct::futex_atomic.
> > The migration process:
> > - Waiting for one RCU grace period to ensure all users observe the
> >   current private hash. This can be skipped if a grace period elapsed
> >   since the private hash was assigned.
> > 
> > - futex_private_hash::state is set to FR_ATOMIC, forcing all users to
> >   use mm_struct::futex_atomic for reference counting.
> > 
> > - After a RCU grace period, all users are guaranteed to be using the
> >   atomic counter. The per-CPU counters can now be summed up and added to
> >   the atomic_t counter. If the resulting count is zero, the hash can be
> >   safely replaced. Otherwise, active users still hold a valid reference.
> > 
> > - Once the atomic reference count drops to zero, the next futex
> >   operation will switch to the new private hash.
> > 
> > call_rcu_hurry() is used to speed up transition which otherwise might be
> > delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
> > side effects would be that on auto scaling the new hash is used later
> > and the SET_SLOTS prctl() will block longer.
> > 
> > [bigeasy: commit description + mm get/ put_async]
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
> > ---
> 
> This is causing explosions on my test systems, in code that doesn't obviously
> have anything to do with futex.

Closing the loop, this turned out to be a KVM bug[*].  Why the futex changes
exposed the bug and caused explosions, I have no idea, but nothing suggests that
this patch is buggy.

[*] https://lore.kernel.org/all/20250825160406.ZVcVPStz@linutronix.de


> The most common symptom is a #GP on this code in try_to_wake_up():
> 
> 		/* Link @node into the waitqueue. */
> 		WRITE_ONCE(prev->next, node);
> 
> although on systems with 5-level paging I _think_ it just manifests as hard
> hanges (I assume because prev->next is corrupted, but is still canonical with
> LA57?  But that's a wild guess).
> 
> The failure always occurs when userspace writes /sys/module/kvm/parameters/nx_huge_pages,
> but I don't think there's anything KVM specific about the issue.  Simply writing
> the param doesn't explode, the problem only arises when I'm running tests in
> parallel (but then failure is almost immediate), so presumably there's a task
> migration angle or something?
> 
> Manually disabling CONFIG_FUTEX_PRIVATE_HASH makes the problem go away, and
> running with CONFIG_FUTEX_PRIVATE_HASH=y prior to this rework is also fine.  So
> it appears that the problem is specifically in the new code.
> 
> I can provide more info as needed next week.
> 
> Oops: general protection fault, probably for non-canonical address 0xff0e899fa1566052: 0000 [#1] SMP

...

> Call Trace:
>  <TASK>
>  _raw_spin_lock_irqsave+0x50/0x60
>  try_to_wake_up+0x4f/0x5d0
>  set_nx_huge_pages+0xe4/0x1c0 [kvm]
>  param_attr_store+0x89/0xf0
>  module_attr_store+0x1e/0x30
>  kernfs_fop_write_iter+0xe4/0x160
>  vfs_write+0x2cb/0x420
>  ksys_write+0x7f/0xf0
>  do_syscall_64+0x6f/0x1f0
>  ? arch_exit_to_user_mode_prepare+0x9/0x50
>  entry_SYSCALL_64_after_hwframe+0x4b/0x53

  reply	other threads:[~2025-08-26  0:38 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-10 11:00 [PATCH v2 0/6] futex: Use RCU-based per-CPU reference counting Sebastian Andrzej Siewior
2025-07-10 11:00 ` [PATCH v2 1/6] selftests/futex: Adapt the private hash test to RCU related changes Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Sebastian Andrzej Siewior
2025-07-10 11:00 ` [PATCH v2 2/6] futex: Use RCU-based per-CPU reference counting instead of rcuref_t Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Peter Zijlstra
2025-08-16  2:38     ` Sean Christopherson
2025-08-26  0:38       ` Sean Christopherson [this message]
2025-07-30 12:20   ` [PATCH v2 2/6] " André Draszik
2025-07-30 19:44     ` Thomas Gleixner
2025-08-01 14:59       ` André Draszik
2025-08-02 13:22       ` [tip: locking/urgent] futex: Move futex cleanup to __mmdrop() tip-bot2 for Thomas Gleixner
2025-08-21 17:39         ` Breno Leitao
2025-07-10 11:00 ` [PATCH v2 3/6] futex: Make futex_private_hash_get() static Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Sebastian Andrzej Siewior
2025-07-10 11:00 ` [PATCH v2 4/6] futex: Remove support for IMMUTABLE Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Sebastian Andrzej Siewior
2025-07-10 11:00 ` [PATCH v2 5/6] selftests/futex: " Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Sebastian Andrzej Siewior
2025-07-10 11:00 ` [PATCH v2 6/6] perf bench futex: " Sebastian Andrzej Siewior
2025-07-11 18:33   ` [tip: locking/futex] " tip-bot2 for Sebastian Andrzej Siewior
2025-07-15 15:59 ` [PATCH v2 0/6] futex: Use RCU-based per-CPU reference counting Shrikanth Hegde
2025-07-15 16:31   ` Sebastian Andrzej Siewior
2025-07-15 17:04     ` Shrikanth Hegde
2025-07-16 14:29       ` Peter Zijlstra
2025-07-16 18:21         ` Shrikanth Hegde
2025-11-06  9:29           ` Peter Zijlstra
2025-11-06 11:09             ` Sebastian Andrzej Siewior
2025-11-06 11:23               ` Peter Zijlstra
2025-11-06 20:17                 ` Paul E. McKenney
2025-11-06 11:40             ` [tip: locking/urgent] futex: Optimize per-cpu " tip-bot2 for Peter Zijlstra
2025-11-06 15:26               ` Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aK0B8IYKIH1IHyDj@google.com \
    --to=seanjc@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.