From: Song Liu <songliubraving@fb.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Song Liu <song@kernel.org>, Networking <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, lkml <linux-kernel@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"jolsa@kernel.org" <jolsa@kernel.org>,
"mhiramat@kernel.org" <mhiramat@kernel.org>
Subject: Re: [PATCH v2 bpf-next 3/5] ftrace: introduce FTRACE_OPS_FL_SHARE_IPMODIFY
Date: Fri, 15 Jul 2022 20:21:49 +0000 [thread overview]
Message-ID: <6271DEDF-F585-4A3B-90BF-BA2EB76DDC01@fb.com> (raw)
In-Reply-To: <20220715155953.4fb692e2@gandalf.local.home>
> On Jul 15, 2022, at 12:59 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 15 Jul 2022 19:49:00 +0000
> Song Liu <songliubraving@fb.com> wrote:
>
>>>
>>> What about if we release the lock when doing the callback?
>>
>> We can probably unlock ftrace_lock here. But we may break locking order
>> with direct mutex (see below).
>
> You're talking about the multi registering case, right?
We are using the *_ftrace_direct_multi() API here, to be able to specify
ops_func. The direct single API just uses the shared direct_ops.
>
>>
>>>
>>> Then we just need to make sure things are the same after reacquiring the
>>> lock, and if they are different, we release the lock again and do the
>>> callback with the new update. Wash, rinse, repeat, until the state is the
>>> same before and after the callback with locks acquired?
>>
>> Personally, I would like to avoid wash-rinse-repeat here.
>
> But it's common to do. Keeps your hair cleaner that way ;-)
>
>>
>>>
>>> This is a common way to handle callbacks that need to do something that
>>> takes the lock held before doing a callback.
>>>
>>> The reason I say this, is because the more we can keep the accounting
>>> inside of ftrace the better.
>>>
>>> Wouldn't this need to be done anyway if BPF was first and live kernel
>>> patching needed the update? An -EAGAIN would not suffice.
>>
>> prepare_direct_functions_for_ipmodify handles BPF-first-livepatch-later
>> case. The benefit of prepare_direct_functions_for_ipmodify() is that it
>> holds direct_mutex before ftrace_lock, and keeps holding it if necessary.
>> This is enough to make sure we don't need the wash-rinse-repeat.
>>
>> OTOH, if we wait until __ftrace_hash_update_ipmodify(), we already hold
>> ftrace_lock, but not direct_mutex. To make changes to bpf trampoline, we
>> have to unlock ftrace_lock and lock direct_mutex to avoid deadlock.
>> However, this means we will need the wash-rinse-repeat.
What do you think about the prepare_direct_functions_for_ipmodify()
approach? If this is not ideal, maybe we can simplify it so that it only
holds direct_mutex (when necessary). The benefit is that we are sure
direct_mutex is already held in __ftrace_hash_update_ipmodify(). However,
I think it is not safe to unlock ftrace_lock in __ftrace_hash_update_ipmodify().
We can get parallel do_for_each_ftrace_rec(), which is dangerous, no?
>>
>>
>> For livepatch-first-BPF-later case, we can probably handle this in
>> __ftrace_hash_update_ipmodify(), since we hold both direct_mutex and
>> ftrace_lock. We can unlock ftrace_lock and update the BPF trampoline.
>> It is safe against changes to direct ops, because we are still holding
>> direct_mutex. But, is this safe against another IPMODIFY ops? I am not
>> sure yet... Also, this is pretty weird because, we are updating a
>> direct trampoline before we finish registering it for the first time.
>> IOW, we are calling modify_ftrace_direct_multi_nolock for the same
>> trampoline before register_ftrace_direct_multi() returns.
>>
>> The approach in v2 propagates the -EAGAIN to BPF side, so these are two
>> independent calls of register_ftrace_direct_multi(). This does require
>> some protocol between ftrace core and its user, but I still think this
>> is a cleaner approach.
>
> The issue I have with this approach is it couples BPF and ftrace a bit too
> much.
>
> But there is a way with my approach you can still do your approach. That
> is, have ops_func() return zero if everything is fine, and otherwise returns
> a negative value. Then have the register function fail and return whatever
> value that gets returned by the ops_func()
>
> Then have the bpf ops_func() check (does this direct caller handle
> IPMODIFY? if yes, return 0, else return -EAGAIN). Then the registering of
> ftrace fails with your -EAGAIN, and then you can change the direct
> trampoline to handle IPMODIFY and try again. This time when ops_func() is
> called, it sees that the direct trampoline can handle the IPMODIFY and
> returns 0.
>
> Basically, it's a way to still implement my suggestion, but let BPF decide
> to use -EAGAIN to try again. And then BPF and ftrace don't need to have
> these special flags to change the behavior of each other.
I like this one. So there is no protocol about the return value here.
Thanks,
Song
next prev parent reply other threads:[~2022-07-15 20:21 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-02 19:37 [PATCH v2 bpf-next 0/5] ftrace: host klp and bpf trampoline together Song Liu
2022-06-02 19:37 ` [PATCH v2 bpf-next 1/5] ftrace: allow customized flags for ftrace_direct_multi ftrace_ops Song Liu
2022-07-13 23:18 ` Steven Rostedt
2022-07-14 0:11 ` Song Liu
2022-07-14 0:38 ` Steven Rostedt
2022-07-14 1:42 ` Song Liu
2022-07-14 2:55 ` Steven Rostedt
2022-07-14 4:37 ` Song Liu
2022-07-14 13:22 ` Steven Rostedt
2022-06-02 19:37 ` [PATCH v2 bpf-next 2/5] ftrace: add modify_ftrace_direct_multi_nolock Song Liu
2022-06-02 19:37 ` [PATCH v2 bpf-next 3/5] ftrace: introduce FTRACE_OPS_FL_SHARE_IPMODIFY Song Liu
2022-06-06 8:20 ` Jiri Olsa
2022-06-06 15:35 ` Song Liu
2022-07-14 0:33 ` Steven Rostedt
2022-07-15 0:13 ` Song Liu
2022-07-15 0:48 ` Steven Rostedt
2022-07-15 2:04 ` Song Liu
2022-07-15 2:46 ` Steven Rostedt
2022-07-15 2:50 ` Song Liu
2022-07-15 17:42 ` Song Liu
2022-07-15 19:12 ` Steven Rostedt
2022-07-15 19:49 ` Song Liu
2022-07-15 19:59 ` Steven Rostedt
2022-07-15 20:21 ` Song Liu [this message]
2022-07-15 21:29 ` Steven Rostedt
2022-07-15 21:48 ` Song Liu
2022-07-15 21:50 ` Steven Rostedt
2022-06-02 19:37 ` [PATCH v2 bpf-next 4/5] bpf, x64: Allow to use caller address from stack Song Liu
2022-06-02 19:37 ` [PATCH v2 bpf-next 5/5] bpf: trampoline: support FTRACE_OPS_FL_SHARE_IPMODIFY Song Liu
2022-07-06 19:38 ` Steven Rostedt
2022-07-06 21:37 ` Song Liu
2022-07-06 21:40 ` Steven Rostedt
2022-07-06 21:50 ` Song Liu
2022-07-06 22:15 ` Song Liu
2022-07-06 22:29 ` Steven Rostedt
2022-07-07 0:19 ` Song Liu
2022-07-07 1:18 ` Steven Rostedt
2022-07-07 2:11 ` Song Liu
2022-06-06 22:57 ` [PATCH v2 bpf-next 0/5] ftrace: host klp and bpf trampoline together Song Liu
2022-07-11 23:55 ` Steven Rostedt
2022-07-12 5:15 ` Song Liu
2022-07-12 13:36 ` Steven Rostedt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6271DEDF-F585-4A3B-90BF-BA2EB76DDC01@fb.com \
--to=songliubraving@fb.com \
--cc=Kernel-team@fb.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=song@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox