From: Yonghong Song <yhs@fb.com>
To: Jiri Olsa <olsajiri@gmail.com>, Lee Jones <lee@kernel.org>
Cc: linux-kernel@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	bpf@vger.kernel.org
Subject: Re: [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid()
Date: Thu, 21 Jul 2022 08:53:08 -0700
Message-ID: <fbc98bb0-a2d6-a450-e6fc-878701e5906d@fb.com>
In-Reply-To: <YtlDPYQWDcORbP0o@krava>



On 7/21/22 5:14 AM, Jiri Olsa wrote:
> On Thu, Jul 21, 2022 at 12:59:09PM +0100, Lee Jones wrote:
>> On Thu, 21 Jul 2022, Jiri Olsa wrote:
>>
>>> On Thu, Jul 21, 2022 at 12:14:30PM +0100, Lee Jones wrote:
>>>> The documentation for find_pid() clearly states:
> 
> typo find_vpid
> 
>>>>
>>>>    "Must be called with the tasklist_lock or rcu_read_lock() held."
>>>>
>>>> Presently we do neither.
> 
> just curious, did you see crash related to this or you just spot that
> 
>>>>
>>>> In an ideal world we would wrap the in-lined call to find_vpid() along
>>>> with get_pid_task() in the suggested rcu_read_lock() and have done.
>>>> However, looking at get_pid_task()'s internals, it already does that
>>>> independently, so this would lead to deadlock.
>>>
>>> hm, we can have nested rcu_read_lock calls, right?
>>
>> I assumed not, but that might be an oversight on my part.

From the kernel documentation, nested rcu_read_lock() is allowed:
https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

RCU's grace-period guarantee allows updaters to wait for the completion 
of all pre-existing RCU read-side critical sections. An RCU read-side 
critical section begins with the marker rcu_read_lock() and ends with 
the marker rcu_read_unlock(). These markers may be nested, and RCU 
treats a nested set as one big RCU read-side critical section. 
Production-quality implementations of rcu_read_lock() and 
rcu_read_unlock() are extremely lightweight, and in fact have exactly 
zero overhead in Linux kernels built for production use with 
CONFIG_PREEMPT=n.
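
To make the nesting rule above concrete, here is a toy userspace
sketch. It is not the kernel implementation: every toy_* name is
hypothetical, and the "lock" is modelled as a plain depth counter
purely to show that an inner lock/unlock pair taken by a helper does
not end the caller's read-side critical section and cannot deadlock.

```c
#include <assert.h>

/* Toy userspace model only; all toy_* names are hypothetical. */
static int toy_rcu_depth;     /* current read-side nesting depth */
static int toy_rcu_sections;  /* completed outermost critical sections */

static void toy_rcu_read_lock(void)
{
	toy_rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_depth > 0);
	/* Only the outermost unlock ends the critical section. */
	if (--toy_rcu_depth == 0)
		toy_rcu_sections++;
}

/* A helper that takes the read lock itself, as get_pid_task() does. */
static int toy_inner(void)
{
	int depth;

	toy_rcu_read_lock();
	depth = toy_rcu_depth;
	toy_rcu_read_unlock();
	return depth;
}

/* A caller that already holds the read lock when it calls the helper. */
static int toy_outer(void)
{
	int depth;

	toy_rcu_read_lock();
	depth = toy_inner();	/* nested: depth is 2 inside the helper */
	toy_rcu_read_unlock();
	return depth;
}
```

In this model, calling toy_outer() returns the depth observed inside
the helper, and the whole nested set is counted as a single section
once the outermost unlock runs.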

>>
>> Would that be your preference?
> 
> seems simpler than calling get/put for ppid

The implementation in this patch seems okay, since rcu_read_lock()
is hidden inside find_get_pid(). It also avoids nested
rcu_read_lock() calls, which, although allowed, are not pretty.
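
As a rough illustration of that pattern, the userspace sketch below
models a lookup helper that takes the read lock internally and hands
back a counted reference, so the caller never touches the lock. It is
a toy under stated assumptions, not kernel code: the toy_* names, the
static table, and the integer refcount are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model only; all toy_* names are hypothetical. */
struct toy_pid {
	int nr;
	int refcount;
};

static struct toy_pid toy_table[] = {
	{ .nr = 1,  .refcount = 1 },
	{ .nr = 42, .refcount = 1 },
};

static int toy_read_depth;	/* stands in for the RCU read lock */

static void toy_read_lock(void)
{
	toy_read_depth++;
}

static void toy_read_unlock(void)
{
	assert(toy_read_depth > 0);
	toy_read_depth--;
}

/* Bare lookup: only valid while the read lock is held. */
static struct toy_pid *toy_find(int nr)
{
	size_t i;

	assert(toy_read_depth > 0);
	for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++)
		if (toy_table[i].nr == nr)
			return &toy_table[i];
	return NULL;
}

/* Lookup plus reference, with the locking hidden inside the helper. */
static struct toy_pid *toy_find_get(int nr)
{
	struct toy_pid *p;

	toy_read_lock();
	p = toy_find(nr);
	if (p)
		p->refcount++;	/* reference keeps p alive after unlock */
	toy_read_unlock();
	return p;
}

/* Drop a reference taken by toy_find_get(); NULL is tolerated. */
static void toy_put(struct toy_pid *p)
{
	if (p) {
		assert(p->refcount > 0);
		p->refcount--;
	}
}
```

A caller in this model would do toy_find_get(), use the object, then
toy_put(), mirroring the find_get_pid() / put_pid() pairing in the
patch; a failed lookup yields NULL, which toy_put() accepts.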

> 
> jirka
> 
>>
>>>> Instead, we'll use find_get_pid() which searches for the vpid, then
>>>> takes a reference to it preventing early free, all within the safety
>>>> of rcu_read_lock().  Once we have our reference we can safely make use
>>>> of it up until the point it is put.
>>>>
>>>> Cc: Alexei Starovoitov <ast@kernel.org>
>>>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>>>> Cc: John Fastabend <john.fastabend@gmail.com>
>>>> Cc: Andrii Nakryiko <andrii@kernel.org>
>>>> Cc: Martin KaFai Lau <martin.lau@linux.dev>
>>>> Cc: Song Liu <song@kernel.org>
>>>> Cc: Yonghong Song <yhs@fb.com>
>>>> Cc: KP Singh <kpsingh@kernel.org>
>>>> Cc: Stanislav Fomichev <sdf@google.com>
>>>> Cc: Hao Luo <haoluo@google.com>
>>>> Cc: Jiri Olsa <jolsa@kernel.org>
>>>> Cc: bpf@vger.kernel.org
>>>> Fixes: 41bdc4b40ed6f ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
>>>> Signed-off-by: Lee Jones <lee@kernel.org>
>>>> ---
>>>>   kernel/bpf/syscall.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>>>> index 83c7136c5788d..c20cff30581c4 100644
>>>> --- a/kernel/bpf/syscall.c
>>>> +++ b/kernel/bpf/syscall.c
>>>> @@ -4385,6 +4385,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>   	const struct perf_event *event;
>>>>   	struct task_struct *task;
>>>>   	struct file *file;
>>>> +	struct pid *ppid;
>>>>   	int err;
>>>>   
>>>>   	if (CHECK_ATTR(BPF_TASK_FD_QUERY))
>>>> @@ -4396,7 +4397,9 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
>>>>   	if (attr->task_fd_query.flags != 0)
>>>>   		return -EINVAL;
>>>>   
>>>> -	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
>>>> +	ppid = find_get_pid(pid);
>>>> +	task = get_pid_task(ppid, PIDTYPE_PID);
>>>> +	put_pid(ppid);
>>>>   	if (!task)
>>>>   		return -ENOENT;
>>>>   
>>
>> -- 
>> Lee Jones [李琼斯]

Thread overview: 7+ messages
2022-07-21 11:14 [PATCH 1/1] bpf: Drop unprotected find_vpid() in favour of find_get_pid() Lee Jones
2022-07-21 11:56 ` Jiri Olsa
2022-07-21 11:59   ` Lee Jones
2022-07-21 12:14     ` Jiri Olsa
2022-07-21 15:53       ` Yonghong Song [this message]
2022-07-21 20:58         ` Lee Jones
2022-07-22 20:15           ` Jiri Olsa
