From: Peter Zijlstra <peterz@infradead.org>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Will Deacon <will@kernel.org>, Waiman Long <llong@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Tejun Heo <tj@kernel.org>, Barret Rhoden <brho@google.com>,
Josh Don <joshdon@google.com>, Dohyun Kim <dohyunkim@google.com>,
linux-arm-kernel@lists.infradead.org, kernel-team@meta.com
Subject: Re: [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock
Date: Mon, 10 Feb 2025 11:49:31 +0100 [thread overview]
Message-ID: <20250210104931.GE31462@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20250210093840.GE10324@noisy.programming.kicks-ass.net>
On Mon, Feb 10, 2025 at 10:38:41AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2025 at 02:54:08AM -0800, Kumar Kartikeya Dwivedi wrote:
>
>
> > Deadlock Detection
> > ~~~~~~~~~~~~~~~~~~
> > We handle two cases of deadlocks: AA deadlocks (attempts to acquire the
> > same lock again), and ABBA deadlocks (attempts to acquire two locks in
> > the opposite order from two distinct threads). Variants of ABBA
> > deadlocks may be encountered with more than two locks being held in the
> > incorrect order. These are not diagnosed explicitly, as they reduce to
> > ABBA deadlocks.
> >
> > Deadlock detection is triggered immediately when beginning the waiting
> > loop of a lock slow path.
> >
> > While timeouts ensure that any waiting loops in the locking slow path
> > terminate and return to the caller, it can be excessively long in some
> > situations. While the default timeout is short (0.5s), a stall for this
> > duration inside the kernel can set off alerts for latency-critical
> > services with strict SLOs. Ideally, the kernel should recover from an
> > undesired state of the lock as soon as possible.
> >
> > A multi-step strategy is used to recover the kernel from waiting loops
> > in the locking algorithm which may fail to terminate in a bounded amount
> > of time.
> >
> > * Each CPU maintains a table of held locks. Entries are inserted and
> > removed upon entry into lock, and exit from unlock, respectively.
> > * Deadlock detection for AA locks is thus simple: we have an AA
> > deadlock if we find a held lock entry for the lock we’re attempting
> > to acquire on the same CPU.
> > * During deadlock detection for ABBA, we search through the tables of
> > all other CPUs to find situations where we are holding a lock the
> > remote CPU is attempting to acquire, and they are holding a lock we
> > are attempting to acquire. Upon encountering such a condition, we
> > report an ABBA deadlock.
> > * We divide the duration between entry time point into the waiting loop
> > and the timeout time point into intervals of 1 ms, and perform
> > deadlock detection until timeout happens. Upon entry into the slow
> > path, and then completion of each 1 ms interval, we perform detection
> > of both AA and ABBA deadlocks. In the event that deadlock detection
> > yields a positive result, the recovery happens sooner than the
> > timeout. Otherwise, it happens as a last resort upon completion of
> > the timeout.
> >
> > Timeouts
> > ~~~~~~~~
> > Timeouts act as final line of defense against stalls for waiting loops.
> > The ‘ktime_get_mono_fast_ns’ function is used to poll for the current
> > time, and it is compared to the timestamp indicating the end time in the
> > waiter loop. Each waiting loop is instrumented to check an extra
> > condition using a macro. Internally, the macro implementation amortizes
> > the checking of the timeout to avoid sampling the clock in every
> > iteration. Precisely, the timeout checks are invoked every 64k
> > iterations.
> >
> > Recovery
> > ~~~~~~~~
>
> I'm probably bad at reading, but I failed to find anything that
> explained how you recover from a deadlock.
>
> Do you force unload the BPF program?
Even the simple AB-BA case,
CPU0 CPU1
lock-A lock-B
lock-B lock-A <-
just having a random lock op return -ETIMO doesn't actually solve
anything. Suppose CPU1's lock-A will time out; it will have to unwind
and release lock-B before CPU0 can make progress.
Worse, if CPU1 isn't quick enough to unwind and release B, then CPU0's
lock-B will also time out.
At which point they'll both try again and you're stuck in the same
place, no?
Given you *have* to unwind to make progress; why not move the entire
thing to a wound-wait style lock? Then you also get rid of the whole
timeout mess.
next prev parent reply other threads:[~2025-02-10 10:49 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-06 10:54 [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 01/26] locking: Move MCS struct definition to public header Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 02/26] locking: Move common qspinlock helpers to a private header Kumar Kartikeya Dwivedi
2025-02-07 23:21 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 03/26] locking: Allow obtaining result of arch_mcs_spin_lock_contended Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 04/26] locking: Copy out qspinlock.c to rqspinlock.c Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 05/26] rqspinlock: Add rqspinlock.h header Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 06/26] rqspinlock: Drop PV and virtualization support Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 07/26] rqspinlock: Add support for timeouts Kumar Kartikeya Dwivedi
2025-02-10 9:56 ` Peter Zijlstra
2025-02-11 4:55 ` Alexei Starovoitov
2025-02-11 10:11 ` Peter Zijlstra
2025-02-11 18:00 ` Alexei Starovoitov
2025-02-06 10:54 ` [PATCH bpf-next v2 08/26] rqspinlock: Protect pending bit owners from stalls Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue " Kumar Kartikeya Dwivedi
2025-02-10 10:17 ` Peter Zijlstra
2025-02-13 6:20 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 10/26] rqspinlock: Protect waiters in trylock fallback " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 11/26] rqspinlock: Add deadlock detection and recovery Kumar Kartikeya Dwivedi
2025-02-08 1:53 ` Alexei Starovoitov
2025-02-08 3:03 ` Kumar Kartikeya Dwivedi
2025-02-10 10:21 ` Peter Zijlstra
2025-02-13 6:11 ` Kumar Kartikeya Dwivedi
2025-02-10 10:36 ` Peter Zijlstra
2025-02-06 10:54 ` [PATCH bpf-next v2 12/26] rqspinlock: Add a test-and-set fallback Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 13/26] rqspinlock: Add basic support for CONFIG_PARAVIRT Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 14/26] rqspinlock: Add helper to print a splat on timeout or deadlock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 15/26] rqspinlock: Add macros for rqspinlock usage Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 16/26] rqspinlock: Add locktorture support Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 17/26] rqspinlock: Hardcode cond_acquire loops to asm-generic implementation Kumar Kartikeya Dwivedi
2025-02-08 1:58 ` Alexei Starovoitov
2025-02-08 3:04 ` Kumar Kartikeya Dwivedi
2025-02-10 9:53 ` Peter Zijlstra
2025-02-10 10:03 ` Peter Zijlstra
2025-02-13 6:15 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 18/26] rqspinlock: Add entry to Makefile, MAINTAINERS Kumar Kartikeya Dwivedi
2025-02-07 14:14 ` kernel test robot
2025-02-07 14:45 ` kernel test robot
2025-02-08 0:43 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 19/26] bpf: Convert hashtab.c to rqspinlock Kumar Kartikeya Dwivedi
2025-02-08 2:01 ` Alexei Starovoitov
2025-02-08 3:06 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 20/26] bpf: Convert percpu_freelist.c " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 21/26] bpf: Convert lpm_trie.c " Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 22/26] bpf: Introduce rqspinlock kfuncs Kumar Kartikeya Dwivedi
2025-02-07 13:43 ` kernel test robot
2025-02-06 10:54 ` [PATCH bpf-next v2 23/26] bpf: Handle allocation failure in acquire_lock_state Kumar Kartikeya Dwivedi
2025-02-08 2:04 ` Alexei Starovoitov
2025-02-06 10:54 ` [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock Kumar Kartikeya Dwivedi
2025-02-12 0:08 ` Eduard Zingerman
2025-02-13 6:41 ` Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 25/26] bpf: Maintain FIFO property for rqspinlock unlock Kumar Kartikeya Dwivedi
2025-02-06 10:54 ` [PATCH bpf-next v2 26/26] selftests/bpf: Add tests for rqspinlock Kumar Kartikeya Dwivedi
2025-02-12 0:14 ` Eduard Zingerman
2025-02-13 6:25 ` Kumar Kartikeya Dwivedi
2025-02-10 9:31 ` [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock Peter Zijlstra
2025-02-10 9:38 ` Peter Zijlstra
2025-02-10 10:49 ` Peter Zijlstra [this message]
2025-02-11 4:37 ` Alexei Starovoitov
2025-02-11 10:43 ` Peter Zijlstra
2025-02-11 18:33 ` Alexei Starovoitov
2025-02-13 9:59 ` Peter Zijlstra
2025-02-14 2:37 ` Alexei Starovoitov
2025-03-04 10:46 ` Peter Zijlstra
2025-03-05 3:26 ` Alexei Starovoitov
2025-02-10 9:49 ` Peter Zijlstra
2025-02-10 19:16 ` Ankur Arora
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250210104931.GE31462@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brho@google.com \
--cc=daniel@iogearbox.net \
--cc=dohyunkim@google.com \
--cc=eddyz87@gmail.com \
--cc=joshdon@google.com \
--cc=kernel-team@meta.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=llong@redhat.com \
--cc=martin.lau@kernel.org \
--cc=memxor@gmail.com \
--cc=paulmck@kernel.org \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.