From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
linux-trace-kernel <linux-trace-kernel@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>,
Eduard Zingerman <eddyz87@gmail.com>,
Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
Menglong Dong <menglong8.dong@gmail.com>,
Steven Rostedt <rostedt@kernel.org>
Subject: Re: [RFC bpf-next 00/12] bpf: tracing_multi link
Date: Sun, 8 Feb 2026 21:54:38 +0100 [thread overview]
Message-ID: <aYj4DnCDlUHUSPgU@krava> (raw)
In-Reply-To: <CAEf4BzbUX8xEp1x66ugtj1C6sxWPJ+_9EpyVmy3TVOMiKrFcWA@mail.gmail.com>
On Fri, Feb 06, 2026 at 09:03:29AM -0800, Andrii Nakryiko wrote:
> On Fri, Feb 6, 2026 at 12:18 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Feb 05, 2026 at 07:55:19AM -0800, Alexei Starovoitov wrote:
> > > On Thu, Feb 5, 2026 at 12:55 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 04, 2026 at 08:06:50AM -0800, Alexei Starovoitov wrote:
> > > > > On Wed, Feb 4, 2026 at 4:36 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 03, 2026 at 03:17:05PM -0800, Alexei Starovoitov wrote:
> > > > > > > On Tue, Feb 3, 2026 at 1:38 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > > > > >
> > > > > > > > hi,
> > > > > > > > as an option to Meglong's change [1] I'm sending proposal for tracing_multi
> > > > > > > > link that does not add static trampoline but attaches program to all needed
> > > > > > > > trampolines.
> > > > > > > >
> > > > > > > > This approach keeps the same performance but has some drawbacks:
> > > > > > > >
> > > > > > > > - when attaching 20k functions we allocate and attach 20k trampolines
> > > > > > > > - during attachment we hold each trampoline mutex, so for above
> > > > > > > > 20k functions we will hold 20k mutexes during the attachment,
> > > > > > > > should be very prone to deadlock, but haven't hit it yet
> > > > > > >
> > > > > > > If you check that it's sorted and always take them in the same order
> > > > > > > then there will be no deadlock.
> > > > > > > Or just grab one global mutex first and then grab trampolines mutexes
> > > > > > > next in any order. The global one will serialize this attach operation.
> > > > > > >
> > > > > > > > It looks the trampoline allocations/generation might not be big a problem
> > > > > > > > and I'll try to find a solution for holding that many mutexes. If there's
> > > > > > > > no better solution I think having one read/write mutex for tracing multi
> > > > > > > > link attach/detach should work.
> > > > > > >
> > > > > > > If you mean to have one global mutex as I proposed above then I don't see
> > > > > > > a downside. It only serializes multiple libbpf calls.
> > > > > >
> > > > > > we also need to serialize it with standard single trampoline attach,
> > > > > > because the direct ftrace update is now done under trampoline->mutex:
> > > > > >
> > > > > > bpf_trampoline_link_prog(tr)
> > > > > > {
> > > > > > mutex_lock(&tr->mutex);
> > > > > > ...
> > > > > > update_ftrace_direct_*
> > > > > > ...
> > > > > > mutex_unlock(&tr->mutex);
> > > > > > }
> > > > > >
> > > > > > for tracing_multi we would link the program first (with tr->mutex)
> > > > > > and do the bulk ftrace update later (without tr->mutex)
> > > > > >
> > > > > > {
> > > > > > for each involved trampoline:
> > > > > > bpf_trampoline_link_prog
> > > > > >
> > > > > > --> and here we could race with some other thread doing single
> > > > > > trampoline attach
> > > > > >
> > > > > > update_ftrace_direct_*
> > > > > > }
> > > > > >
> > > > > > note the current version locks all tr->mutex instances all the way
> > > > > > through the update_ftrace_direct_* update
> > > > > >
> > > > > > I think we could use global rwsem and take read lock on single
> > > > > > trampoline attach path and write lock on tracing_multi attach,
> > > > > >
> > > > > > I thought we could take direct_mutex early, but that would mean
> > > > > > different order with trampoline mutex than we already have in
> > > > > > single attach path
> > > > >
> > > > > I feel we're talking past each other.
> > > > > I meant:
> > > > >
> > > > > For multi:
> > > > > 1. take some global mutex
> > > > > 2. take N tramp mutexes in any order
> > > > >
> > > > > For single:
> > > > > 1. take that 1 specific tramp mutex.
> > > >
> > > > ah ok, I understand, it's to prevent the lockup but keep holding all
> > > > the trampolines locks.. the rwsem I mentioned was for the 'fix', where
> > > > we do not take all the trampolines locks
> > >
> > > I don't understand how rwsem would help.
> > > All the operations on trampoline are protected by mutex.
> > > Switching to rw makes sense only if we can designate certain
> > > operations as "read" and others as "write" and number of "reads"
> > > dominate. This won't be the case with multi-fentry.
> > > And we still need to take all of them as "write" to update trampoline.
> >
> > this applies to scenario where we do not hold all the trampoline locks,
> > in such case we could have race between single and multi attachment,
> > while single/single attachment race stays safe
> >
> > as a fix the single attach would take read lock and multi attach would
> > take write lock, so single/single race is allowed and single/multi is
> > not ... showed in the patch below
> >
> > but it might be too much.. in a sense that there's already many locks
> > involved in trampoline attach/detach, and simple global lock in multi
> > or just sorting the ids would be enough
> >
>
> I'll just throw this idea here, but we don't have to do it right away.
> What if instead of having a per-trampoline lock, we just have a common
> relatively small pool of locks that all trampolines share based on
> some hash (i.e., we deterministically map trampoline to one of the
> locks). Then multi-attach can just go and grab all of them in
> predefined order, while singular trampoline attaches will just get
> their own one. We won't need to sort anything, we reduce the amount of
> different locks. I don't think lock contention (due to lock sharing
> for some trampolines) is a real issue to be worried about either.
nice idea, I'll check on that
thanks,
jirka
prev parent reply other threads:[~2026-02-08 20:54 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 9:38 [RFC bpf-next 00/12] bpf: tracing_multi link Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 01/12] ftrace: Add ftrace_hash_count function Jiri Olsa
2026-02-03 15:40 ` Steven Rostedt
2026-02-04 12:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 02/12] bpf: Add struct bpf_trampoline_ops object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 03/12] bpf: Add struct bpf_struct_ops_tramp_link object Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 04/12] bpf: Add struct bpf_tramp_node object Jiri Olsa
2026-02-04 19:00 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:27 ` Andrii Nakryiko
2026-02-06 8:27 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 05/12] bpf: Add multi tracing attach types Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 2:20 ` Leon Hwang
2026-02-04 12:41 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 06/12] bpf: Add bpf_trampoline_multi_attach/detach functions Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-05 9:16 ` Menglong Dong
2026-02-05 13:45 ` Jiri Olsa
2026-02-11 8:04 ` Menglong Dong
2026-02-03 9:38 ` [RFC bpf-next 07/12] bpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-05 8:55 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 08/12] libbpf: Add btf__find_by_glob_kind function Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-04 19:04 ` Andrii Nakryiko
2026-02-05 8:57 ` Jiri Olsa
2026-02-05 22:45 ` Andrii Nakryiko
2026-02-06 8:43 ` Jiri Olsa
2026-02-06 16:58 ` Andrii Nakryiko
2026-02-03 9:38 ` [RFC bpf-next 09/12] libbpf: Add support to create tracing multi link Jiri Olsa
2026-02-03 10:14 ` bot+bpf-ci
2026-02-17 22:05 ` Jiri Olsa
2026-02-04 19:05 ` Andrii Nakryiko
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 10/12] selftests/bpf: Add fentry tracing multi func test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 11/12] selftests/bpf: Add fentry intersected " Jiri Olsa
2026-02-03 9:38 ` [RFC bpf-next 12/12] selftests/bpf: Add tracing multi benchmark test Jiri Olsa
2026-02-03 10:13 ` bot+bpf-ci
2026-02-17 22:06 ` Jiri Olsa
2026-02-03 23:17 ` [RFC bpf-next 00/12] bpf: tracing_multi link Alexei Starovoitov
2026-02-04 12:36 ` Jiri Olsa
2026-02-04 16:06 ` Alexei Starovoitov
2026-02-05 8:55 ` Jiri Olsa
2026-02-05 15:55 ` Alexei Starovoitov
2026-02-06 8:18 ` Jiri Olsa
2026-02-06 17:03 ` Andrii Nakryiko
2026-02-08 20:54 ` Jiri Olsa [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYj4DnCDlUHUSPgU@krava \
--to=olsajiri@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=kafai@fb.com \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=menglong8.dong@gmail.com \
--cc=rostedt@kernel.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.