public inbox for bpf@vger.kernel.org
From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v2 11/20] bpf: add task and task/file iterator targets
Date: Wed, 6 May 2020 14:20:10 -0700
Message-ID: <2f2bb9c4-6fd2-3fdb-959d-0ce408168c85@fb.com>
In-Reply-To: <CAEf4BzaTTdChHsEy=WX8-j-1c66baZnppK6WaSjexewjph0O=g@mail.gmail.com>



On 5/6/20 1:51 PM, Andrii Nakryiko wrote:
> On Wed, May 6, 2020 at 11:24 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 5/6/20 12:30 AM, Andrii Nakryiko wrote:
>>> On Sun, May 3, 2020 at 11:28 PM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>> Only the tasks belonging to "current" pid namespace
>>>> are enumerated.
>>>>
>>>> For task/file target, the bpf program will have access to
>>>>     struct task_struct *task
>>>>     u32 fd
>>>>     struct file *file
>>>> where fd/file is an open file for the task.
>>>>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>
>>> I might be missing some subtleties with task refcounting for task_file
>>> iterator, asked few questions below...
>>>
>>>>    kernel/bpf/Makefile    |   2 +-
>>>>    kernel/bpf/task_iter.c | 336 +++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 337 insertions(+), 1 deletion(-)
>>>>    create mode 100644 kernel/bpf/task_iter.c
>>>>
>>>> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
>>>> index b2b5eefc5254..37b2d8620153 100644
>>>> --- a/kernel/bpf/Makefile
>>>> +++ b/kernel/bpf/Makefile
>>>> @@ -2,7 +2,7 @@
>>>>    obj-y := core.o
>>>>    CFLAGS_core.o += $(call cc-disable-warning, override-init)
>>>>
>>>> -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o
>>>> +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o
>>>>    obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
>>>>    obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
>>>>    obj-$(CONFIG_BPF_SYSCALL) += disasm.o
>>>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>>>> new file mode 100644
>>>> index 000000000000..1ca258f6e9f4
>>>> --- /dev/null
>>>> +++ b/kernel/bpf/task_iter.c
>>>> @@ -0,0 +1,336 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/* Copyright (c) 2020 Facebook */
>>>> +
>>>> +#include <linux/init.h>
>>>> +#include <linux/namei.h>
>>>> +#include <linux/pid_namespace.h>
>>>> +#include <linux/fs.h>
>>>> +#include <linux/fdtable.h>
>>>> +#include <linux/filter.h>
>>>> +
>>>> +struct bpf_iter_seq_task_common {
>>>> +       struct pid_namespace *ns;
>>>> +};
>>>> +
>>>> +struct bpf_iter_seq_task_info {
>>>> +       struct bpf_iter_seq_task_common common;
>>>
>>> you have comment below in init_seq_pidns() that common is supposed to
>>> be the very first field, but I think it's more important and
>>> appropriate here, so that whoever adds anything here knows that order
>>> of field is important.
>>
>> I can move the comments here.
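
For illustration, here is a minimal userspace sketch of why the field order matters. It assumes the reason is that shared code treats the private data pointer as a pointer to the common part. All names ending in _model, and init_common(), are hypothetical stand-ins, not the structs or functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structs quoted above. */
struct pid_ns_model {
	int level;
};

struct task_common_model {
	struct pid_ns_model *ns;
};

struct task_info_model {
	/* Must stay first: shared code casts the private data to the common type. */
	struct task_common_model common;
	void *task;
	unsigned int id;
};

/* Fails the build if someone adds a field ahead of 'common'. */
_Static_assert(offsetof(struct task_info_model, common) == 0,
	       "common must be the first field");

/* Shared init (hypothetical) that only knows about the common part. */
static void init_common(void *priv_data, struct pid_ns_model *ns)
{
	struct task_common_model *common = priv_data;

	common->ns = ns;
}
```

Placing the comment (or a build-time check like the one above) next to the struct definition puts the constraint where a new field would be added.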
>>
>>>
>>>> +       struct task_struct *task;
>>>> +       u32 id;
>>>> +};
>>>> +
>>>
>>> [...]
>>>
>>>> +static int __task_seq_show(struct seq_file *seq, void *v, bool in_stop)
>>>> +{
>>>> +       struct bpf_iter_meta meta;
>>>> +       struct bpf_iter__task ctx;
>>>> +       struct bpf_prog *prog;
>>>> +       int ret = 0;
>>>> +
>>>> +       meta.seq = seq;
>>>> +       prog = bpf_iter_get_info(&meta, in_stop);
>>>> +       if (prog) {
>>>
>>>
>>> nit: `if (!prog) return 0;` here would reduce nesting level below
>>>
>>>> +               meta.seq = seq;
>>>> +               ctx.meta = &meta;
>>>> +               ctx.task = v;
>>>> +               ret = bpf_iter_run_prog(prog, &ctx);
>>>> +       }
>>>> +
>>>> +       return 0;
>>>
>>> return **ret**; ?
>>
>> It should return "ret". In task_file show() code is similar but correct.
>> I can do early return with !prog too although we do not have
>> deep nesting level yet.
>>
>>>
>>>> +}
>>>> +
>>>
>>> [...]
>>>
>>>> +
>>>> +static struct file *task_file_seq_get_next(struct pid_namespace *ns, u32 *id,
>>>> +                                          int *fd, struct task_struct **task,
>>>> +                                          struct files_struct **fstruct)
>>>> +{
>>>> +       struct files_struct *files;
>>>> +       struct task_struct *tk;
>>>> +       u32 sid = *id;
>>>> +       int sfd;
>>>> +
>>>> +       /* If this function returns a non-NULL file object,
>>>> +        * it held a reference to the files_struct and file.
>>>> +        * Otherwise, it does not hold any reference.
>>>> +        */
>>>> +again:
>>>> +       if (*fstruct) {
>>>> +               files = *fstruct;
>>>> +               sfd = *fd;
>>>> +       } else {
>>>> +               tk = task_seq_get_next(ns, &sid);
>>>> +               if (!tk)
>>>> +                       return NULL;
>>>> +
>>>> +               files = get_files_struct(tk);
>>>> +               put_task_struct(tk);
>>>
>>> task is put here, but is still used below.. is there some additional
>>> hidden refcounting involved?
>>
>> Good question. I had an impression that we take a reference count
>> for task->files so task should not go away. But reading linux
>> code again, I do not have sufficient evidence to back my claim.
>> So I will reference count task as well, e.g., do not put_task_struct()
>> until all files are done here.
> 
> All threads within the process share files table. So some threads
> might exit, but files will stay, which is why task_struct and
> files_struct have separate refcounting, and having refcount on files
> doesn't guarantee any particular task will stay alive for long enough.
> So I think we need to refcount both files and task in this case.
> Reading source code of copy_files() in kernel/fork.c (CLONE_FILES
> flags just bumps refcnt on old process' files_struct), seems to
> confirm this as well.

Just checked the code. It does look like the files table is shared
among threads (tasks). So yes, in this case, we need reference counting
on both the task and the files table.
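
To make the point concrete, here is a small userspace model of the two independent refcounts. It is only a sketch under simplifying assumptions: plain integer counters instead of the kernel's refcounting, and every name in it (files_model, task_model, thread_init(), hold_files_only(), and so on) is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's task_struct/files_struct. */
struct files_model {
	int refcnt;
};

struct task_model {
	int refcnt;
	struct files_model *files;	/* shared by all threads of a process */
};

static void files_get(struct files_model *f) { f->refcnt++; }
static void files_put(struct files_model *f) { f->refcnt--; }
static void task_get(struct task_model *t) { t->refcnt++; }
static void task_put(struct task_model *t) { t->refcnt--; }
static bool task_alive(const struct task_model *t) { return t->refcnt > 0; }

/* Thread creation in the CLONE_FILES style: share the table, bump its count. */
static void thread_init(struct task_model *t, struct files_model *shared)
{
	t->refcnt = 1;		/* the thread's own reference */
	t->files = shared;
	files_get(shared);
}

/* Thread exit drops the thread's own task reference and its files reference. */
static void thread_exit(struct task_model *t)
{
	files_put(t->files);
	task_put(t);
}

/* Iterator step as in the quoted v2 code: keep the files, drop the task. */
static struct files_model *hold_files_only(struct task_model *t)
{
	task_get(t);		/* reference returned by the task lookup */
	files_get(t->files);
	task_put(t);		/* dropped early; the task is no longer pinned */
	return t->files;
}

/* Proposed fix: keep both references until all files are visited. */
static struct files_model *hold_files_and_task(struct task_model *t)
{
	task_get(t);
	files_get(t->files);
	return t->files;	/* caller drops both references when done */
}
```

In this model, if a thread exits after hold_files_only(), the files table still has a positive count but the task's count reaches zero, so any later use of the task pointer would be unsafe. With hold_files_and_task(), both counts stay positive until the iterator drops them.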

> 
>>
>>>
>>>> +               if (!files) {
>>>> +                       sid = ++(*id);
>>>> +                       *fd = 0;
>>>> +                       goto again;
>>>> +               }
>>>> +               *fstruct = files;
>>>> +               *task = tk;
>>>> +               if (sid == *id) {
>>>> +                       sfd = *fd;
>>>> +               } else {
>>>> +                       *id = sid;
>>>> +                       sfd = 0;
>>>> +               }
>>>> +       }
>>>> +
>>>> +       rcu_read_lock();
[...]
