From: Peter Zijlstra <peterz@infradead.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Will Deacon <will@kernel.org>, Waiman Long <llong@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Tejun Heo <tj@kernel.org>, Barret Rhoden <brho@google.com>,
Josh Don <joshdon@google.com>, Dohyun Kim <dohyunkim@google.com>,
linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock
Date: Mon, 10 Feb 2025 11:49:31 +0100 [thread overview]
Message-ID: <20250210104931.GE31462@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20250210093840.GE10324@noisy.programming.kicks-ass.net>
On Mon, Feb 10, 2025 at 10:38:41AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2025 at 02:54:08AM -0800, Kumar Kartikeya Dwivedi wrote:
>
>
> > Deadlock Detection
> > ~~~~~~~~~~~~~~~~~~
> > We handle two cases of deadlocks: AA deadlocks (attempts to acquire the
> > same lock again), and ABBA deadlocks (attempts to acquire two locks in
> > the opposite order from two distinct threads). Variants of ABBA
> > deadlocks may be encountered with more than two locks being held in the
> > incorrect order. These are not diagnosed explicitly, as they reduce to
> > ABBA deadlocks.
> >
> > Deadlock detection is triggered immediately when beginning the waiting
> > loop of a lock slow path.
> >
> > While timeouts ensure that any waiting loops in the locking slow path
> > terminate and return to the caller, it can be excessively long in some
> > situations. While the default timeout is short (0.5s), a stall for this
> > duration inside the kernel can set off alerts for latency-critical
> > services with strict SLOs. Ideally, the kernel should recover from an
> > undesired state of the lock as soon as possible.
> >
> > A multi-step strategy is used to recover the kernel from waiting loops
> > in the locking algorithm which may fail to terminate in a bounded amount
> > of time.
> >
> > * Each CPU maintains a table of held locks. Entries are inserted and
> > removed upon entry into lock, and exit from unlock, respectively.
> > * Deadlock detection for AA locks is thus simple: we have an AA
> > deadlock if we find a held lock entry for the lock we’re attempting
> > to acquire on the same CPU.
> > * During deadlock detection for ABBA, we search through the tables of
> > all other CPUs to find situations where we are holding a lock the
> > remote CPU is attempting to acquire, and they are holding a lock we
> > are attempting to acquire. Upon encountering such a condition, we
> > report an ABBA deadlock.
> > * We divide the duration between entry time point into the waiting loop
> > and the timeout time point into intervals of 1 ms, and perform
> > deadlock detection until timeout happens. Upon entry into the slow
> > path, and then completion of each 1 ms interval, we perform detection
> > of both AA and ABBA deadlocks. In the event that deadlock detection
> > yields a positive result, the recovery happens sooner than the
> > timeout. Otherwise, it happens as a last resort upon completion of
> > the timeout.
> >
> > Timeouts
> > ~~~~~~~~
> > Timeouts act as final line of defense against stalls for waiting loops.
> > The ‘ktime_get_mono_fast_ns’ function is used to poll for the current
> > time, and it is compared to the timestamp indicating the end time in the
> > waiter loop. Each waiting loop is instrumented to check an extra
> > condition using a macro. Internally, the macro implementation amortizes
> > the checking of the timeout to avoid sampling the clock in every
> > iteration. Precisely, the timeout checks are invoked every 64k
> > iterations.
> >
> > Recovery
> > ~~~~~~~~
>
> I'm probably bad at reading, but I failed to find anything that
> explained how you recover from a deadlock.
>
> Do you force unload the BPF program?
Even the simple AB-BA case,
CPU0 CPU1
lock-A lock-B
lock-B lock-A <-
just having a random lock op return -ETIMO doesn't actually solve
anything. Suppose CPU1's lock-A will time out; it will have to unwind
and release lock-B before CPU0 can make progress.
Worse, if CPU1 isn't quick enough to unwind and release B, then CPU0's
lock-B will also time out.
At which point they'll both try again and you're stuck in the same
place, no?
Given you *have* to unwind to make progress; why not move the entire
thing to a wound-wait style lock? Then you also get rid of the whole
timeout mess.
next prev parent reply other threads:[~2025-02-10 11:00 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-06 10:54 [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 01/26] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header Kumar Kartikeya Dwivedi
2025-02-07 23:21 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 04/26] locking: Copy out qspinlock.c to rqspinlock.c Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 06/26] rqspinlock: Drop PV and virtualization support Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 07/26] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
2025-02-10 9:56 ` Peter Zijlstra
2025-02-11 4:55 ` Alexei Starovoitov
2025-02-11 10:11 ` Peter Zijlstra
2025-02-11 18:00 ` Alexei Starovoitov
2025-02-06 10:54 ` [PATCH bpf-next v2 08/26] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
2025-02-10 10:17 ` Peter Zijlstra
2025-02-13 6:20 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 10/26] rqspinlock: Protect waiters in trylock fallback " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 11/26] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
2025-02-08 1:53 ` Alexei Starovoitov
2025-02-08 3:03 ` Kumar Kartikeya Dwivedi
2025-02-10 10:21 ` Peter Zijlstra
2025-02-13 6:11 ` Kumar Kartikeya Dwivedi
2025-02-10 10:36 ` Peter Zijlstra
2025-02-06 10:54 ` [PATCH bpf-next v2 12/26] rqspinlock: Add a test-and-set fallback Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 13/26] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 14/26] rqspinlock: Add helper to print a splat on timeout or deadlock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 15/26] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 16/26] rqspinlock: Add locktorture support Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 17/26] rqspinlock: Hardcode cond_acquire loops to asm-generic implementation Kumar Kartikeya Dwivedi
2025-02-08 1:58 ` Alexei Starovoitov
2025-02-08 3:04 ` Kumar Kartikeya Dwivedi
2025-02-10 9:53 ` Peter Zijlstra
2025-02-10 10:03 ` Peter Zijlstra
2025-02-13 6:15 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 18/26] rqspinlock: Add entry to Makefile, MAINTAINERS Kumar Kartikeya Dwivedi
2025-02-07 14:14 ` kernel test robot
2025-02-07 14:45 ` kernel test robot
2025-02-08 0:43 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 19/26] bpf: Convert hashtab.c to rqspinlock Kumar Kartikeya Dwivedi
2025-02-08 2:01 ` Alexei Starovoitov
2025-02-08 3:06 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 20/26] bpf: Convert percpu_freelist.c " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 21/26] bpf: Convert lpm_trie.c " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 22/26] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
2025-02-07 13:43 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 23/26] bpf: Handle allocation failure in acquire_lock_state Kumar Kartikeya Dwivedi
2025-02-08 2:04 ` Alexei Starovoitov
2025-02-06 10:54 ` [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock Kumar Kartikeya Dwivedi
2025-02-12 0:08 ` Eduard Zingerman
2025-02-13 6:41 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 25/26] bpf: Maintain FIFO property for rqspinlock unlock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 26/26] selftests/bpf: Add tests for rqspinlock Kumar Kartikeya Dwivedi
2025-02-12 0:14 ` Eduard Zingerman
2025-02-13 6:25 ` Kumar Kartikeya Dwivedi
2025-02-10 9:31 ` [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock Peter Zijlstra
2025-02-10 9:38 ` Peter Zijlstra
2025-02-10 10:49 ` Peter Zijlstra [this message]
2025-02-11 4:37 ` Alexei Starovoitov
2025-02-11 10:43 ` Peter Zijlstra
2025-02-11 18:33 ` Alexei Starovoitov
2025-02-13 9:59 ` Peter Zijlstra
2025-02-14 2:37 ` Alexei Starovoitov
2025-03-04 10:46 ` Peter Zijlstra
2025-03-05 3:26 ` Alexei Starovoitov
2025-02-10 9:49 ` Peter Zijlstra
2025-02-10 19:16 ` Ankur Arora
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250210104931.GE31462@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brho@google.com \
--cc=daniel@iogearbox.net \
--cc=dohyunkim@google.com \
--cc=eddyz87@gmail.com \
--cc=joshdon@google.com \
--cc=kernel-team@meta.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=llong@redhat.com \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=paulmck@kernel.org \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).