From: Yonghong Song <yhs@fb.com>
To: Jiri Olsa <olsajiri@gmail.com>, Lee Jones <lee@kernel.org>
Cc: linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
bpf@vger.kernel.org
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()
Date: Thu, 21 Jul 2022 08:53:08 -0700
Message-ID: <fbc98bb0-a2d6-a450-e6fc-878701e5906d@fb.com>
In-Reply-To: <YtlDPYQWDcORbP0o@krava>

On 7/21/22 5:14 AM, Jiri Olsa wrote:
> On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
>> On Thu, 21 Jul 2022, Jiri Olsa wrote:
>>
>>> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
>>>> The documentation for find_pid() clearly states:
>
> typo find_vpid
>
>>>>
>>>> "Must be called with the tasklist_lock or rcu_read_lock() held."
>>>>
>>>> Presently we do neither.
>
> just curious, did you see crash related to this or you just spot that
>
>>>>
>>>> In an ideal world we would wrap the in-lined call to find_vpid() along
>>>> with get_pid_task() in the suggested rcu_read_lock() and have done.
>>>> However, looking at get_pid_task()'s internals, it already does that
>>>> independently, so this would lead to deadlock.
>>>
>>> hm, we can have nested rcu_read_lock calls, right?
>>
>> I assumed not, but that might be an oversight on my part.

According to the kernel documentation, nested rcu_read_lock() calls
are allowed:

https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

  RCU's grace-period guarantee allows updaters to wait for the completion
  of all pre-existing RCU read-side critical sections. An RCU read-side
  critical section begins with the marker rcu_read_lock() and ends with
  the marker rcu_read_unlock(). These markers may be nested, and RCU
  treats a nested set as one big RCU read-side critical section.
  Production-quality implementations of rcu_read_lock() and
  rcu_read_unlock() are extremely lightweight, and in fact have exactly
  zero overhead in Linux kernels built for production use with
  CONFIG_PREEMPT=n.

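To make the nesting rule concrete, here is a minimal userspace sketch.
It is not kernel code and not how RCU is implemented: the toy_* names
and the single nesting counter are assumptions made purely for
illustration. It only models the point made in the documentation, that
an inner lock/unlock pair inside an outer one neither blocks nor ends
the outer read-side section.

```c
#include <assert.h>

/*
 * Toy userspace model of read-side nesting. NOT the kernel's RCU:
 * every toy_* name here is hypothetical and exists only for this sketch.
 */
static int toy_read_nesting;	/* depth of open read-side sections */

static void toy_read_lock(void)
{
	toy_read_nesting++;	/* never blocks, so nesting cannot deadlock */
}

static void toy_read_unlock(void)
{
	assert(toy_read_nesting > 0);	/* every unlock must pair with a lock */
	toy_read_nesting--;
}

/* Nonzero while any (possibly nested) read-side section is open. */
static int toy_in_read_section(void)
{
	return toy_read_nesting > 0;
}

/* A callee that protects itself, as get_pid_task() is said to do. */
static int toy_inner_lookup(void)
{
	int inside;

	toy_read_lock();
	inside = toy_in_read_section();
	toy_read_unlock();
	return inside;
}

/* A caller that already holds the marker and then calls the callee. */
static int toy_outer_lookup(void)
{
	int still_inside;

	toy_read_lock();
	(void)toy_inner_lookup();		/* nested pair returns normally */
	still_inside = toy_in_read_section();	/* outer section still open */
	toy_read_unlock();
	return still_inside;
}
```

Calling toy_outer_lookup() returns nonzero, and the nesting depth is
back to zero afterwards: the nested set behaves as one big read-side
section.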
>>
>> Would that be your preference?
>
> seems simpler than calling get/put for ppid

The current implementation seems okay, since rcu_read_lock() is
hidden inside find_get_pid(). It also avoids nested
rcu_read_lock() calls, which are allowed but not pretty.

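For readers following along, the get/use/put shape of the patched hunk
can be sketched in plain userspace C. This is only an illustration
under stated assumptions: struct toy_pid, the toy_* helpers, the static
table, and the -1 return value are all made up here, and no locking is
modelled. The real find_get_pid() does its lookup under
rcu_read_lock(), which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct pid; only the reference count is modelled. */
struct toy_pid {
	int nr;
	int refcount;
};

/* Hypothetical table of live entries, each holding one base reference. */
static struct toy_pid toy_table[] = {
	{ .nr = 1,  .refcount = 1 },
	{ .nr = 42, .refcount = 1 },
};

/* Plain lookup: returns a borrowed pointer and takes no reference. */
static struct toy_pid *toy_find(int nr)
{
	size_t i;

	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
		if (toy_table[i].nr == nr)
			return &toy_table[i];
	return NULL;
}

/* Lookup plus reference, in the spirit of find_get_pid(). */
static struct toy_pid *toy_find_get(int nr)
{
	struct toy_pid *p = toy_find(nr);

	if (p)
		p->refcount++;
	return p;
}

/* Drop a reference; NULL is accepted so callers need not check first. */
static void toy_put(struct toy_pid *p)
{
	if (!p)
		return;
	assert(p->refcount > 0);
	p->refcount--;
}

/* Same order as the patch: get, use, put, and only then test the result. */
static int toy_query(int nr)
{
	struct toy_pid *p = toy_find_get(nr);
	int found = (p != NULL);	/* stands in for get_pid_task() */

	toy_put(p);
	return found ? 0 : -1;		/* the patch returns -ENOENT here */
}
```

After toy_query() returns, the entry's reference count is back to its
base value whether or not the lookup succeeded, which is the property
the put_pid() call in the patch is there to preserve.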
>
> jirka
>
>>
>>>> Instead, we'll use find_get_pid() which searches for the vpid, then
>>>> takes a reference to it preventing early free, all within the safety
>>>> of rcu_read_lock(). Once we have our reference we can safely make use
>>>> of it up until the point it is put.
>>>>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: Andrii Nakryiko <andrii@kernel.org>
>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>> Cc: Song Liu <song@kernel.org>
>>>> Cc: Yonghong Song <yhs@fb.com>
>>>> Cc: KP Singh <kpsingh@kernel.org>
>>>> Cc: Stanislav Fomichev <sdf@google.com>
>>>> Cc: Hao Luo <haoluo@google.com>
>>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>>> Cc: bpf@vger.kernel.org
>>>> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
>>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>>> ---
>>>> kernel/bpf/syscall.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 83c7136c5788d..c20cff30581c4 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>  	const struct perf_event *event;
>>>>  	struct task_struct *task;
>>>>  	struct file *file;
>>>> +	struct pid *ppid;
>>>>  	int err;
>>>>
>>>>  	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>>>> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>  	if (attr->task_fd_query.flags != 0)
>>>>  		return -EINVAL;
>>>>
>>>> -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>>>> +	ppid = find_get_pid(pid);
>>>> +	task = get_pid_task(ppid, PIDTYPE_PID);
>>>> +	put_pid(ppid);
>>>>  	if (!task)
>>>>  		return -ENOENT;
>>>>
>>
>> --
>> Lee Jones [李琼斯]

Thread overview: 7+ messages
2022-07-21 11:14 [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid() Lee Jones
2022-07-21 11:56 ` Jiri Olsa
2022-07-21 11:59 ` Lee Jones
2022-07-21 12:14 ` Jiri Olsa
2022-07-21 15:53 ` Yonghong Song [this message]
2022-07-21 20:58 ` Lee Jones
2022-07-22 20:15 ` Jiri Olsa