From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
linux-trace-kernel <linux-trace-kernel@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>,
Eduard Zingerman <eddyz87@gmail.com>,
Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
Menglong Dong <menglong8.dong@gmail.com>,
Steven Rostedt <rostedt@kernel.org>
Subject: Re: [RFC bpf-next 00/12] bpf: tracing_multi link
Date: Sun, 8 Feb 2026 21:54:38 +0100 [thread overview]
Message-ID: <aYj4DnCDlUHUSPgU@krava> (raw)
In-Reply-To: <CAEf4BzbUX8xEp1x66ugtj1C6sxWPJ+_9EpyVmy3TVOMiKrFcWA@mail.gmail.com>
On Fri, Feb 06, 2026 at 09:03:29AM -0800, Andrii Nakryiko wrote:
> On Fri, Feb 6, 2026 at 12:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Feb 05, 2026 at 07:55:19AM -0800, Alexei Starovoitov wrote:
> > > On Thu, Feb 5, 2026 at 12:55 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 04, 2026 at 08:06:50AM -0800, Alexei Starovoitov wrote:
> > > > > On Wed, Feb 4, 2026 at 4:36 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 03, 2026 at 03:17:05PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Tue, Feb 3, 2026 at 1:38 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > > > > >
> > > > > > > > hi,
> > > > > > > > as an option to Meglong's change [1] I'm sending proposal for tracing_multi
> > > > > > > > link that does not add static trampoline but attaches program to all needed
> > > > > > > > trampolines.
> > > > > > > >
> > > > > > > > This approach keeps the same performance but has some drawbacks:
> > > > > > > >
> > > > > > > > - when attaching 20k functions we allocate and attach 20k trampolines
> > > > > > > > - during attachment we hold each trampoline mutex, so for above
> > > > > > > > 20k functions we will hold 20k mutexes during the attachment,
> > > > > > > > should be very prone to deadlock, but haven't hit it yet
> > > > > > >
> > > > > > > If you check that it's sorted and always take them in the same order
> > > > > > > then there will be no deadlock.
> > > > > > > Or just grab one global mutex first and then grab trampolines mutexes
> > > > > > > next in any order. The global one will serialize this attach operation.
> > > > > > >
> > > > > > > > It looks the trampoline allocations/generation might not be big a problem
> > > > > > > > and I'll try to find a solution for holding that many mutexes. If there's
> > > > > > > > no better solution I think having one read/write mutex for tracing multi
> > > > > > > > link attach/detach should work.
> > > > > > >
> > > > > > > If you mean to have one global mutex as I proposed above then I don't see
> > > > > > > a downside. It only serializes multiple libbpf calls.
> > > > > >
> > > > > > we also need to serialize it with standard single trampoline attach,
> > > > > > because the direct ftrace update is now done under trampoline->mutex:
> > > > > >
> > > > > > bpf_trampoline_link_prog(tr)
> > > > > > {
> > > > > > mutex_lock(&tr->mutex);
> > > > > > ...
> > > > > > update_ftrace_direct_*
> > > > > > ...
> > > > > > mutex_unlock(&tr->mutex);
> > > > > > }
> > > > > >
> > > > > > for tracing_multi we would link the program first (with tr->mutex)
> > > > > > and do the bulk ftrace update later (without tr->mutex)
> > > > > >
> > > > > > {
> > > > > > for each involved trampoline:
> > > > > > bpf_trampoline_link_prog
> > > > > >
> > > > > > --> and here we could race with some other thread doing single
> > > > > > trampoline attach
> > > > > >
> > > > > > update_ftrace_direct_*
> > > > > > }
> > > > > >
> > > > > > note the current version locks all tr->mutex instances all the way
> > > > > > through the update_ftrace_direct_* update
> > > > > >
> > > > > > I think we could use global rwsem and take read lock on single
> > > > > > trampoline attach path and write lock on tracing_multi attach,
> > > > > >
> > > > > > I thought we could take direct_mutex early, but that would mean
> > > > > > different order with trampoline mutex than we already have in
> > > > > > single attach path
> > > > >
> > > > > I feel we're talking past each other.
> > > > > I meant:
> > > > >
> > > > > For multi:
> > > > > 1. take some global mutex
> > > > > 2. take N tramp mutexes in any order
> > > > >
> > > > > For single:
> > > > > 1. take that 1 specific tramp mutex.
> > > >
> > > > ah ok, I understand, it's to prevent the lockup but keep holding all
> > > > the trampolines locks.. the rwsem I mentioned was for the 'fix', where
> > > > we do not take all the trampolines locks
> > >
> > > I don't understand how rwsem would help.
> > > All the operations on trampoline are protected by mutex.
> > > Switching to rw makes sense only if we can designate certain
> > > operations as "read" and others as "write" and number of "reads"
> > > dominate. This won't be the case with multi-fentry.
> > > And we still need to take all of them as "write" to update trampoline.
> >
> > this applies to scenario where we do not hold all the trampoline locks,
> > in such case we could have race between single and multi attachment,
> > while single/single attachment race stays safe
> >
> > as a fix the single attach would take read lock and multi attach would
> > take write lock, so single/single race is allowed and single/multi is
> > not ... showed in the patch below
> >
> > but it might be too much.. in a sense that there's already many locks
> > involved in trampoline attach/detach, and simple global lock in multi
> > or just sorting the ids would be enough
> >
>
> I'll just throw this idea here, but we don't have to do it right away.
> What if instead of having a per-trampoline lock, we just have a common
> relatively small pool of locks that all trampolines share based on
> some hash (i.e., we deterministically map trampoline to one of the
> locks). Then multi-attach can just go and grab all of them in
> predefined order, while singular trampoline attaches will just get
> their own one. We won't need to sort anything, we reduce the amount of
> different locks. I don't think lock contention (due to lock sharing
> for some trampolines) is a real issue to be worried about either.
nice idea, I'll check on that
thanks,
jirka
prev parent reply other threads:[~2026-02-08 20:54 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 9:38 [RFC bpf-next 00/12] bpf: tracing_multi link Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 01/12] ftrace: Add ftrace_hash_count function Jiri Olsa
2026-02-03 15:40 ` Steven Rostedt
2026-02-04 12:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 02/12] bpf: Add struct bpf_trampoline_ops object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 03/12] bpf: Add struct bpf_struct_ops_tramp_link object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 04/12] bpf: Add struct bpf_tramp_node object Jiri Olsa
2026-02-04 19:00 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:27 ` Andrii Nakryiko
2026-02-06 8:27 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 05/12] bpf: Add multi tracing attach types Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 2:20 ` Leon Hwang
2026-02-04 12:41 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 06/12] bpf: Add bpf_trampoline_multi_attach/detach functions Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-05 9:16 ` Menglong Dong
2026-02-05 13:45 ` Jiri Olsa
2026-02-11 8:04 ` Menglong Dong
2026-02-03 9:38 ` [RFC bpf-next 07/12] bpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-05 8:55 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 08/12] libbpf: Add btf__find_by_glob_kind function Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-04 19:04 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:45 ` Andrii Nakryiko
2026-02-06 8:43 ` Jiri Olsa
2026-02-06 16:58 ` Andrii Nakryiko
2026-02-03 9:38 ` [RFC bpf-next 09/12] libbpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 10/12] selftests/bpf: Add fentry tracing multi func test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 11/12] selftests/bpf: Add fentry intersected " Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 12/12] selftests/bpf: Add tracing multi benchmark test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 23:17 ` [RFC bpf-next 00/12] bpf: tracing_multi link Alexei Starovoitov
2026-02-04 12:36 ` Jiri Olsa
2026-02-04 16:06 ` Alexei Starovoitov
2026-02-05 8:55 ` Jiri Olsa
2026-02-05 15:55 ` Alexei Starovoitov
2026-02-06 8:18 ` Jiri Olsa
2026-02-06 17:03 ` Andrii Nakryiko
2026-02-08 20:54 ` Jiri Olsa [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYj4DnCDlUHUSPgU@krava \
--to=olsajiri@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=kafai@fb.com \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=menglong8.dong@gmail.com \
--cc=rostedt@kernel.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox