From: Greg KH <gregkh@linuxfoundation.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andriin@fb.com>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Alexei Starovoitov <ast@fb.com>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>,
Masahiro Yamada <yamada.masahiro@socionext.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>,
Jiri Olsa <jolsa@kernel.org>, Sam Ravnborg <sam@ravnborg.org>
Subject: Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
Date: Thu, 8 Aug 2019 20:11:04 +0200 [thread overview]
Message-ID: <20190808181104.GA31357@kroah.com> (raw)
In-Reply-To: <CAEf4BzaWtumTrc7h1t3w8hA1L8mVo2Cm0B+eLSe4eSghFAu3iw@mail.gmail.com>
On Thu, Aug 08, 2019 at 10:53:44AM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 11:08 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Aug 08, 2019 at 04:24:25AM +0000, Yonghong Song wrote:
> > >
> > >
> > > On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> > > > Make .BTF section allocated and expose its contents through sysfs.
> >
> > Was this original patch not on bpf@vger? I can't find it in my
> > archive. Anyway...
> >
> > > > /sys/kernel/btf directory is created to contain all the BTFs present
> > > > inside kernel. Currently there is only kernel's main BTF, represented as
> > > > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > > > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> >
> > Why are you using sysfs for this? Who uses "BTF"s? Are these debugging
> > images that only people working on developing bpf programs are going to
> > need, or are these things that you are going to need on a production
> > system?
>
> We need it in production system. One immediate and direct use case is
> BPF CO-RE (Compile Once - Run Everywhere), which aims to allow to
> pre-compile BPF applications (even those that read internal kernel
> structures) using any local kernel headers, and then distribute and
> run them in binary form on all target production machines without
> dependencies on kernel headers and having Clang on target machine to
> compile C to BPF IR. Libbpf is doing all those adjustments/relocations
> based on kernel's actual BTF. See [0] for a summary and slides, if you
> curious to learn more.
>
> [0] http://vger.kernel.org/bpfconf2019.html#session-2
Ok, then a binary sysfs file is fine, no objection from me.
> > I ask as maybe debugfs is the best place for this if they are not needed
> > on production systems.
> >
> >
> > > >
> > > > Current approach relies on a few pieces coming together:
> > > > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> > > > kallsyms) and generate .BTF section by converting DWARF info into
> > > > BTF. This section is not allocated and not mapped to any segment,
> > > > though, so is not yet accessible from inside kernel at runtime.
> > > > 2. objcopy dumps .BTF contents into binary file and subsequently
> > > > convert binary file into linkable object file with automatically
> > > > generated symbols _binary__btf_kernel_bin_start and
> > > > _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> > > > of BTF raw data.
> > > > 3. final vmlinux image is generated by linking this object file (and
> > > > kallsyms, if necessary). sysfs_btf.c then creates
> > > > /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> > > > it. This allows, e.g., libbpf and bpftool access BTF info at
> > > > well-known location, without resorting to searching for vmlinux image
> > > > on disk (location of which is not standardized and vmlinux image
> > > > might not be even available in some scenarios, e.g., inside qemu
> > > > during testing).
> > > >
> > > > Alternative approach using .incbin assembler directive to embed BTF
> > > > contents directly was attempted but didn't work, because sysfs_proc.o is
> > > > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > > > to update embedded BTF data (initially empty data is embedded, then
> > > > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > > > updated contents, but it's too late at that point).
> > > >
> > > > If BTF couldn't be generated due to missing or too old pahole,
> > > > sysfs_btf.c handles that gracefully by detecting that
> > > > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > > > /sys/kernel/btf at all.
> > > >
> > > > v1->v2:
> > > > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> > > >
> > > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > > Cc: Sam Ravnborg <sam@ravnborg.org>
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > ---
> > > > kernel/bpf/Makefile | 3 +++
> > > > kernel/bpf/sysfs_btf.c | 52 ++++++++++++++++++++++++++++++++++++++
> > > > scripts/link-vmlinux.sh | 55 +++++++++++++++++++++++++++--------------
> > > > 3 files changed, 91 insertions(+), 19 deletions(-)
> > > > create mode 100644 kernel/bpf/sysfs_btf.c
> >
> > First rule, you can't create new sysfs files without a matching
> > Documentation/ABI/ set of entries. Please do that for the next version
> > of this patch so we can properly check to see if what you are
> > documenting lines up with the code. Otherwise we just have to guess as
> > to what the entries you are creating actually do.
>
> Yep, sure, I wasn't aware, will add in v3.
thanks.
> > > > +static int __init btf_kernel_init(void)
> > > > +{
> > > > + if (!_binary__btf_kernel_bin_start)
> > > > + return 0;
> > > > +
> > > > + btf_kernel_attr.size = _binary__btf_kernel_bin_end -
> > > > + _binary__btf_kernel_bin_start;
> > > > +
> > > > + return sysfs_create_group(kernel_kobj, &btf_group_attr);
> >
> > You are nesting directories here without a "real" kobject in the middle.
> > Are you _sure_ you want to do that? It's going to get really tricky
> > later on based on your comments above about creating multiple files in
> > that directory over time once "modules" are allowed.
>
> My thinking was that when we have BTF for modules, I'll need to do
> some code adjustments anyway, at which point it will be more clear how
> we want to structure that. But I can add explicit kobject as static
> variable right now, no problems. Later on we probably will just switch
> it to be exported, so that modules can self-register/unregister their
> BTFs autonomously.
A "real" kobject to start with here would probably be best. Keeps
things simpler later as well.
thanks,
greg k-h
next prev parent reply other threads:[~2019-08-08 18:11 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20190808003215.1462821-1-andriin@fb.com>
2019-08-08 4:24 ` [PATCH v2 bpf-next] btf: expose BTF info through sysfs Yonghong Song
2019-08-08 6:08 ` Greg KH
2019-08-08 17:53 ` Andrii Nakryiko
2019-08-08 18:11 ` Greg KH [this message]
2019-08-08 17:47 ` Andrii Nakryiko
2019-08-08 20:21 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190808181104.GA31357@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=Kernel-team@fb.com \
--cc=acme@redhat.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andriin@fb.com \
--cc=ast@fb.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jolsa@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sam@ravnborg.org \
--cc=yamada.masahiro@socionext.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.