From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
Martin KaFai Lau <kafai@fb.com>,
Networking <netdev@vger.kernel.org>,
Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers
Date: Tue, 14 Apr 2020 16:59:12 -0700
Message-ID: <4bf72b3c-5fee-269f-1d71-7f808f436db9@fb.com>
In-Reply-To: <CAEf4Bzawu2dFXL7nvYhq1tKv9P7Bb9=6ksDpui5nBjxRrx=3_w@mail.gmail.com>
On 4/13/20 10:56 PM, Andrii Nakryiko wrote:
> On Wed, Apr 8, 2020 at 4:26 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Given a loaded dumper bpf program, which already
>> knows which target it should bind to, there are
>> two ways to create a dumper:
>> - a file based dumper under hierarchy of
>> /sys/kernel/bpfdump/ which users can
>> "cat" to print out the output.
>> - an anonymous dumper which user application
>> can "read" the dumping output.
>>
>> For file based dumper, BPF_OBJ_PIN syscall interface
>> is used. For anonymous dumper, BPF_PROG_ATTACH
>> syscall interface is used.
>
> We discussed this offline with Yonghong a bit, but I thought I'd put
> my thoughts about this in writing for completeness. To me, it seems
> like the most consistent way to do both anonymous and named dumpers is
> through the following steps:
My main motivation for using bpf_link is to enumerate anonymous bpf
dumpers with the idr-based link_query mechanism from one of Andrii's
previous RFC patches, so that I do not need to reinvent the wheel.
But it looks like there are some difficulties:
>
> 1. BPF_PROG_LOAD to load/verify program, that created program FD.
> 2. LINK_CREATE using that program FD and direntry FD. This creates
> dumper bpf_link (bpf_dumper_link), returns anonymous link FD. If link
The bpf dump program already has the target information, since it is
needed for verification purposes, so it does not need a directory FD.
LINK_CREATE is probably not a good fit here.

A bpf dump program is kind of similar to an fentry/fexit program:
after successful program loading, the program already knows where to
attach the trampoline.

Looking at the kernel code, for an fentry/fexit program the trampoline
is installed at the raw_tracepoint_open syscall, and from then on the
bpf program will actually be called.

So, ideally, if we want to use the kernel bpf_link, we want to return
a cat-able bpf_link, because ultimately we want to query the file
descriptors which actually 'read' the bpf program output.
The current bpf_link is not cat-able.

I tried to hack this by manipulating fops and other stuff. It may
work, but it looks ugly. Or should we create a bpf_catable_link and
build an infrastructure around that? I am not sure whether that is
worthwhile for this one-off thing (bpfdump).

Or, to query anonymous bpf dumpers, I can just write a bpf dump
program that goes through all fd's to find them.

BTW, my current approach (in my private branch) is:
  anonymous dumper:
    bpf_raw_tracepoint_open(NULL, prog) -> cat-able fd
  file dumper:
    bpf_obj_pin(prog, path) -> a cat-able file

If you consider the program itself to be a link, this is like what is
described below in 3 and 4.
> FD is closed, dumper program is detached and dumper is destroyed
> (unless pinned in bpffs, just like with any other bpf_link.
> 3. At this point bpf_dumper_link can be treated like a factory of
> seq_files. We can add a new BPF_DUMPER_OPEN_FILE (all names are for
> illustration purposes) command, that accepts dumper link FD and
> returns a new seq_file FD, which can be read() normally (or, e.g.,
> cat'ed from shell).
In this case, link_query may not be accurate if a bpf_dumper_link
is created but there is no corresponding bpf_dumper_open_file. What we
really need is to iterate through all dumper seq_file FDs.
> 4. Additionally, this anonymous bpf_link can be pinned/mounted in
> bpfdumpfs. We can do it as BPF_OBJ_PIN or as a separate command. Once
> pinned at, e.g., /sys/fs/bpfdump/task/my_dumper, just opening that
> file is equivalent to BPF_DUMPER_OPEN_FILE and will create a new
> seq_file that can be read() independently from other seq_files opened
> against the same dumper. Pinning bpfdumpfs entry also bumps refcnt of
> bpf_link itself, so even if process that created link dies, bpf dumper
> stays attached until its bpfdumpfs entry is deleted.
>
> Apart from BPF_DUMPER_OPEN_FILE and open()'ing bpfdumpfs file duality,
> it seems pretty consistent and follows safe-by-default auto-cleanup of
> anonymous link, unless pinned in bpfdumpfs (or one can still pin
> bpf_link in bpffs, but it can't be open()'ed the same way, it just
> preserves BPF program from being cleaned up).
>
> Out of all schemes I could come up with, this one seems most unified
> and nicely fits into bpf_link infra. Thoughts?
>
>>
>> To facilitate target seq_ops->show() to get the
>> bpf program easily, dumper creation increased
>> the target-provided seq_file private data size
>> so bpf program pointer is also stored in seq_file
>> private data.
>>
>> Further, a seq_num which represents how many
>> bpf_dump_get_prog() has been called is also
>> available to the target seq_ops->show().
>> Such information can be used to e.g., print
>> banner before printing out actual data.
>>
>> Note the seq_num does not represent the num
>> of unique kernel objects the bpf program has
>> seen. But it should be a good approximation.
>>
>> A target feature BPF_DUMP_SEQ_NET_PRIVATE
>> is implemented specifically useful for
>> net based dumpers. It sets net namespace
>> as the current process net namespace.
>> This avoids changing existing net seq_ops
>> in order to retrieve net namespace from
>> the seq_file pointer.
>>
>> For open dumper files, anonymous or not, the
>> fdinfo will show the target and prog_id associated
>> with that file descriptor. For dumper file itself,
>> a kernel interface will be provided to retrieve the
>> prog_id in one of the later patches.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>> include/linux/bpf.h | 5 +
>> include/uapi/linux/bpf.h | 6 +-
>> kernel/bpf/dump.c | 338 ++++++++++++++++++++++++++++++++-
>> kernel/bpf/syscall.c | 11 +-
>> tools/include/uapi/linux/bpf.h | 6 +-
>> 5 files changed, 362 insertions(+), 4 deletions(-)
>>
>
> [...]
>