From: Jiri Olsa <olsajiri@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
linux-trace-kernel <linux-trace-kernel@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>,
Eduard Zingerman <eddyz87@gmail.com>,
Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
Menglong Dong <menglong8.dong@gmail.com>,
Steven Rostedt <rostedt@kernel.org>
Subject: Re: [RFC bpf-next 00/12] bpf: tracing_multi link
Date: Fri, 6 Feb 2026 09:18:12 +0100
Message-ID: <aYWjxMeXABGcYqEI@krava>
In-Reply-To: <CAADnVQLJJn=bSDvQrjNyyeNdLOZABKDjjR_3j6DG3pgbsJ+JXA@mail.gmail.com>
On Thu, Feb 05, 2026 at 07:55:19AM -0800, Alexei Starovoitov wrote:
> On Thu, Feb 5, 2026 at 12:55 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Feb 04, 2026 at 08:06:50AM -0800, Alexei Starovoitov wrote:
> > > On Wed, Feb 4, 2026 at 4:36 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 03, 2026 at 03:17:05PM -0800, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 3, 2026 at 1:38 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > > >
> > > > > > hi,
> > > > > > as an option to Menglong's change [1] I'm sending a proposal for tracing_multi
> > > > > > link that does not add static trampoline but attaches program to all needed
> > > > > > trampolines.
> > > > > >
> > > > > > This approach keeps the same performance but has some drawbacks:
> > > > > >
> > > > > > - when attaching 20k functions we allocate and attach 20k trampolines
> > > > > > - during attachment we hold each trampoline mutex, so for above
> > > > > > 20k functions we will hold 20k mutexes during the attachment,
> > > > > > should be very prone to deadlock, but haven't hit it yet
> > > > >
> > > > > If you check that it's sorted and always take them in the same order
> > > > > then there will be no deadlock.
> > > > > Or just grab one global mutex first and then grab trampolines mutexes
> > > > > next in any order. The global one will serialize this attach operation.
> > > > >
> > > > > > It looks like the trampoline allocations/generation might not be a big problem
> > > > > > and I'll try to find a solution for holding that many mutexes. If there's
> > > > > > no better solution I think having one read/write mutex for tracing multi
> > > > > > link attach/detach should work.
> > > > >
> > > > > If you mean to have one global mutex as I proposed above then I don't see
> > > > > a downside. It only serializes multiple libbpf calls.
> > > >
> > > > we also need to serialize it with standard single trampoline attach,
> > > > because the direct ftrace update is now done under trampoline->mutex:
> > > >
> > > > bpf_trampoline_link_prog(tr)
> > > > {
> > > > mutex_lock(&tr->mutex);
> > > > ...
> > > > update_ftrace_direct_*
> > > > ...
> > > > mutex_unlock(&tr->mutex);
> > > > }
> > > >
> > > > for tracing_multi we would link the program first (with tr->mutex)
> > > > and do the bulk ftrace update later (without tr->mutex)
> > > >
> > > > {
> > > > for each involved trampoline:
> > > > bpf_trampoline_link_prog
> > > >
> > > > --> and here we could race with some other thread doing single
> > > > trampoline attach
> > > >
> > > > update_ftrace_direct_*
> > > > }
> > > >
> > > > note the current version locks all tr->mutex instances all the way
> > > > through the update_ftrace_direct_* update
> > > >
> > > > I think we could use global rwsem and take read lock on single
> > > > trampoline attach path and write lock on tracing_multi attach,
> > > >
> > > > I thought we could take direct_mutex early, but that would mean
> > > > different order with trampoline mutex than we already have in
> > > > single attach path
> > >
> > > I feel we're talking past each other.
> > > I meant:
> > >
> > > For multi:
> > > 1. take some global mutex
> > > 2. take N tramp mutexes in any order
> > >
> > > For single:
> > > 1. take that 1 specific tramp mutex.
> >
> > ah ok, I understand, it's to prevent the lockup but keep holding all
> > the trampolines locks.. the rwsem I mentioned was for the 'fix', where
> > we do not take all the trampolines locks
>
> I don't understand how rwsem would help.
> All the operations on trampoline are protected by mutex.
> Switching to rw makes sense only if we can designate certain
> operations as "read" and others as "write" and number of "reads"
> dominate. This won't be the case with multi-fentry.
> And we still need to take all of them as "write" to update trampoline.
this applies to the scenario where we do not hold all the trampoline locks;
in such a case we could have a race between single and multi attachment,
while a single/single attachment race stays safe

as a fix, the single attach would take the read lock and the multi attach
would take the write lock, so a single/single race is allowed and
single/multi is not ... shown in the patch below

but it might be too much.. in the sense that there are already many locks
involved in trampoline attach/detach, and a simple global lock in multi
attach or just sorting the ids would be enough
jirka
---
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index b76bb545077b..edbc8f133dda 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -30,6 +30,8 @@ static struct hlist_head trampoline_ip_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline tables */
static DEFINE_MUTEX(trampoline_mutex);
+static DECLARE_RWSEM(multi_sem);
+
struct bpf_trampoline_ops {
int (*register_fentry)(struct bpf_trampoline *tr, void *new_addr, void *data);
int (*unregister_fentry)(struct bpf_trampoline *tr, u32 orig_flags, void *old_addr, void *data);
@@ -367,11 +369,7 @@ static struct bpf_trampoline *bpf_trampoline_lookup(u64 key, unsigned long ip)
head = &trampoline_ip_table[hash_64(tr->ip, TRAMPOLINE_HASH_BITS)];
hlist_add_head(&tr->hlist_ip, head);
refcount_set(&tr->refcnt, 1);
-#ifdef CONFIG_LOCKDEP
- mutex_init_with_key(&tr->mutex, &__lockdep_no_track__);
-#else
mutex_init(&tr->mutex);
-#endif
for (i = 0; i < BPF_TRAMP_MAX; i++)
INIT_HLIST_HEAD(&tr->progs_hlist[i]);
out:
@@ -871,6 +869,8 @@ int bpf_trampoline_link_prog(struct bpf_tramp_node *node,
{
int err;
+ guard(rwsem_read)(&multi_sem);
+
mutex_lock(&tr->mutex);
err = __bpf_trampoline_link_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
mutex_unlock(&tr->mutex);
@@ -916,6 +916,8 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_node *node,
{
int err;
+ guard(rwsem_read)(&multi_sem);
+
mutex_lock(&tr->mutex);
err = __bpf_trampoline_unlink_prog(node, tr, tgt_prog, &trampoline_ops, NULL);
mutex_unlock(&tr->mutex);
@@ -1463,6 +1465,8 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
struct bpf_trampoline *tr;
u64 key;
+ guard(rwsem_write)(&multi_sem);
+
data.reg = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
if (!data.reg)
return -ENOMEM;
@@ -1494,12 +1498,10 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
tr = mnode->trampoline;
mutex_lock(&tr->mutex);
-
err = __bpf_trampoline_link_prog(&mnode->node, tr, NULL, &trampoline_multi_ops, &data);
- if (err) {
- mutex_unlock(&tr->mutex);
+ mutex_unlock(&tr->mutex);
+ if (err)
goto rollback_unlink;
- }
}
if (ftrace_hash_count(data.reg)) {
@@ -1516,11 +1518,6 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
}
}
- for (i = 0; i < cnt; i++) {
- tr = link->nodes[i].trampoline;
- mutex_unlock(&tr->mutex);
- }
-
free_fentry_multi_data(&data);
return 0;
@@ -1528,6 +1525,7 @@ int bpf_trampoline_multi_attach(struct bpf_prog *prog, u32 *ids,
for (j = 0; j < i; j++) {
mnode = &link->nodes[j];
tr = mnode->trampoline;
+ mutex_lock(&tr->mutex);
WARN_ON_ONCE(__bpf_trampoline_unlink_prog(&mnode->node, tr, NULL,
&trampoline_multi_ops, &data));
mutex_unlock(&tr->mutex);
@@ -1550,6 +1548,8 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
int i, cnt = link->nodes_cnt;
struct bpf_trampoline *tr;
+ guard(rwsem_write)(&multi_sem);
+
data.unreg = alloc_ftrace_hash(FTRACE_HASH_DEFAULT_BITS);
if (!data.unreg)
return -ENOMEM;
@@ -1567,6 +1567,7 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
mutex_lock(&tr->mutex);
WARN_ON_ONCE(__bpf_trampoline_unlink_prog(&mnode->node, tr, NULL,
&trampoline_multi_ops, &data));
+ mutex_unlock(&tr->mutex);
}
if (ftrace_hash_count(data.unreg))
@@ -1576,7 +1577,6 @@ int bpf_trampoline_multi_detach(struct bpf_prog *prog, struct bpf_tracing_multi_
for (i = 0; i < cnt; i++) {
tr = link->nodes[i].trampoline;
- mutex_unlock(&tr->mutex);
bpf_trampoline_put(tr);
}
Thread overview: 54+ messages
2026-02-03 9:38 [RFC bpf-next 00/12] bpf: tracing_multi link Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 01/12] ftrace: Add ftrace_hash_count function Jiri Olsa
2026-02-03 15:40 ` Steven Rostedt
2026-02-04 12:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 02/12] bpf: Add struct bpf_trampoline_ops object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 03/12] bpf: Add struct bpf_struct_ops_tramp_link object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 04/12] bpf: Add struct bpf_tramp_node object Jiri Olsa
2026-02-04 19:00 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:27 ` Andrii Nakryiko
2026-02-06 8:27 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 05/12] bpf: Add multi tracing attach types Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 2:20 ` Leon Hwang
2026-02-04 12:41 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 06/12] bpf: Add bpf_trampoline_multi_attach/detach functions Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-05 9:16 ` Menglong Dong
2026-02-05 13:45 ` Jiri Olsa
2026-02-11 8:04 ` Menglong Dong
2026-02-03 9:38 ` [RFC bpf-next 07/12] bpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-05 8:55 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 08/12] libbpf: Add btf__find_by_glob_kind function Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-04 19:04 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:45 ` Andrii Nakryiko
2026-02-06 8:43 ` Jiri Olsa
2026-02-06 16:58 ` Andrii Nakryiko
2026-02-03 9:38 ` [RFC bpf-next 09/12] libbpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 10/12] selftests/bpf: Add fentry tracing multi func test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 11/12] selftests/bpf: Add fentry intersected " Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 12/12] selftests/bpf: Add tracing multi benchmark test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 23:17 ` [RFC bpf-next 00/12] bpf: tracing_multi link Alexei Starovoitov
2026-02-04 12:36 ` Jiri Olsa
2026-02-04 16:06 ` Alexei Starovoitov
2026-02-05 8:55 ` Jiri Olsa
2026-02-05 15:55 ` Alexei Starovoitov
2026-02-06 8:18 ` Jiri Olsa [this message]
2026-02-06 17:03 ` Andrii Nakryiko
2026-02-08 20:54 ` Jiri Olsa