From: Peter Zijlstra <peterz@infradead.org>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Will Deacon <will@kernel.org>, Waiman Long <llong@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Tejun Heo <tj@kernel.org>, Barret Rhoden <brho@google.com>,
Josh Don <joshdon@google.com>, Dohyun Kim <dohyunkim@google.com>,
linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
Kernel Team <kernel-team@meta.com>
Subject: Re: [PATCH bpf-next v2 00/26] Resilient Queued Spin Lock
Date: Tue, 11 Feb 2025 11:43:52 +0100
Message-ID: <20250211104352.GC29593@noisy.programming.kicks-ass.net>
In-Reply-To: <CAADnVQ+3wu0WB2pXs4cccxfkbTb3TK8Z+act5egytiON+qN9tA@mail.gmail.com>

On Mon, Feb 10, 2025 at 08:37:06PM -0800, Alexei Starovoitov wrote:
> On Mon, Feb 10, 2025 at 2:49 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Do you force unload the BPF program?
>
> Not yet. As you can imagine, cancelling bpf program is much
> harder than sending sigkill to the user space process.

So you are killing the user program? Because it wasn't at all clear what,
if anything, is done when this failure case is tripped.

> The prog needs to safely free all the resources it holds.
> This work was ongoing for a couple years now with numerous discussions.

Well, for you maybe, I'm new here. This is only the second submission,
and really only the first one I got to mostly read.

> > Even the simple AB-BA case,
> >
> >       CPU0            CPU1
> >       lock-A          lock-B
> >       lock-B          lock-A <-
> >
> > just having a random lock op return -ETIMO doesn't actually solve
> > anything. Suppose CPU1's lock-A will time out; it will have to unwind
> > and release lock-B before CPU0 can make progress.
> >
> > Worse, if CPU1 isn't quick enough to unwind and release B, then CPU0's
> > lock-B will also time out.
> >
> > At which point they'll both try again and you're stuck in the same
> > place, no?
>
> Not really. You're missing that deadlock is not a normal case.

Well, if this is for unpriv user programs, you should most definitely
consider them the normal case. You must assume user space is malicious.

> As soon as we have cancellation logic working we will be "sigkilling"
> prog where deadlock was detected.

Ah, so that's the plan, but not yet included here? This means that every
BPF program invocation must be 'cancellable'? What if a kernel thread is
hitting a tracepoint or some such?

So many details are not clear to me, and not explained either :/

> In this patch the verifier guarantees that the prog must check
> the return value from bpf_res_spin_lock().

Yeah, but so what? It can check and still not do the right thing. Only
checking that the return value is consumed somehow doesn't really help much.

> The prog cannot keep re-trying.
> The only thing it can do is to exit.

Right, but it might have already modified things, how are you going to
recover from that?

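
To make the concern concrete, here is a minimal single-threaded sketch in
plain userspace C. It is not BPF and not the rqspinlock API: demo_res_lock(),
demo_res_unlock(), struct account and the transfer function are all
hypothetical names invented for illustration, and the "lock" is just a flag
that models a lock op which can report failure. The point is only that a
program can check every return value and unlock correctly, yet still exit
with half-updated state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a lock whose acquire can fail; not a real lock. */
struct demo_lock { bool held; };

static int demo_res_lock(struct demo_lock *l)
{
	if (l->held)
		return -ETIMEDOUT;	/* models a timeout/deadlock report */
	l->held = true;
	return 0;
}

static void demo_res_unlock(struct demo_lock *l)
{
	l->held = false;
}

struct account {
	struct demo_lock lock;
	long balance;
};

/*
 * Every return value is checked and every held lock is released,
 * but the first store happens before the second lock is held.
 */
static int transfer_checked_but_unsafe(struct account *from,
				       struct account *to, long amt)
{
	int err = demo_res_lock(&from->lock);

	if (err)
		return err;
	from->balance -= amt;		/* store before all locks are held */

	err = demo_res_lock(&to->lock);
	if (err) {
		demo_res_unlock(&from->lock);
		return err;		/* "checked", but amt is now lost */
	}
	to->balance += amt;

	demo_res_unlock(&to->lock);
	demo_res_unlock(&from->lock);
	return 0;
}
```

If the second acquire fails, the function returns the error as required, but
from->balance has already been decremented and nothing restores it.
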
> Failing to grab res_spin_lock() is not a normal condition.

If you're going to be exposing this to unpriv, I really do think you
should assume it to be the normal case.

> The prog has to implement a fallback path for it,

But the verifier must verify it is a sane fallback; how can it do that?

> > Given you *have* to unwind to make progress; why not move the entire
> > thing to a wound-wait style lock? Then you also get rid of the whole
> > timeout mess.
>
> We looked at things like ww_mutex_lock, but they don't fit.
> wound-wait is for databases where deadlock is normal and expected.
> The transaction has to be aborted and retried.

Right, which to me sounds exactly like what you want for unpriv.

Have the program structured such that it must acquire all locks before
it does a modification / store -- and have the verifier enforce this.
Then any lock failure can be handled by the bpf core, not the program
itself. Core can unlock all previously acquired locks, and core can
either re-attempt the program or 'skip' it after N failures.

It does mean the bpf core needs to track the acquired locks -- which you
already do, except it becomes mandatory; a prog cannot acquire more than
~32 locks.

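
A rough userspace sketch of that structure, under stated assumptions: all
names here (demo_lock, lock_set, acquire_all, run_prog, MAX_ATTEMPTS) are
hypothetical, the lock is a C11 atomic_flag trylock rather than anything from
the series, and 32 is just the "~32" figure above. It only illustrates the
two phases: take every lock first, and let the core unwind and then retry or
skip on any failure, so the program body never runs with partial locks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HELD_LOCKS	32	/* the "~32 locks" cap, as an assumption */
#define MAX_ATTEMPTS	3	/* hypothetical "N failures" before skipping */

struct demo_lock { atomic_flag held; };

/* Hypothetical lock op that fails instead of spinning forever. */
static bool demo_trylock(struct demo_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void demo_unlock(struct demo_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Per-invocation record of held locks, so the core can unwind. */
struct lock_set {
	struct demo_lock *held[MAX_HELD_LOCKS];
	size_t nr;
};

static void unwind(struct lock_set *s)
{
	while (s->nr)
		demo_unlock(s->held[--s->nr]);
}

/* Phase 1: take every lock; on any failure release all of them. */
static bool acquire_all(struct lock_set *s, struct demo_lock **want, size_t n)
{
	if (n > MAX_HELD_LOCKS)
		return false;
	s->nr = 0;
	for (size_t i = 0; i < n; i++) {
		if (!demo_trylock(want[i])) {
			unwind(s);
			return false;
		}
		s->held[s->nr++] = want[i];
	}
	return true;
}

/* Example body: only ever runs with all requested locks held. */
static void demo_body(void *ctx)
{
	++*(int *)ctx;
}

/* Returns true if the body ran, false if it was skipped. */
static bool run_prog(struct demo_lock **want, size_t n,
		     void (*body)(void *), void *ctx)
{
	struct lock_set s;

	for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		if (!acquire_all(&s, want, n))
			continue;	/* nothing was modified; retry */
		body(ctx);		/* Phase 2: stores happen here only */
		unwind(&s);
		return true;
	}
	return false;			/* skipped after MAX_ATTEMPTS failures */
}
```

Because no store can happen before acquire_all() succeeds, a failed attempt
leaves nothing to roll back except the locks themselves.
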
> res_spin_lock is different. It's kinda safe spin_lock that doesn't
> brick the kernel.

Well, 1/2 second is pretty much bricked imo.

> That was a conscious trade-off. Deadlocks are not normal.

I really do think you should assume they are normal, unpriv and all
that.