From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
dwarves@vger.kernel.org, bpf <bpf@vger.kernel.org>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Alan Maguire <alan.maguire@oracle.com>
Subject: Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
Date: Wed, 07 Sep 2022 14:54:08 -0700 [thread overview]
Message-ID: <87v8pylukf.fsf@oracle.com> (raw)
In-Reply-To: <CAADnVQKbf5nEBnuSLmfZ_kGLmUzeD5htc1ezbJsVg72adF4bLw@mail.gmail.com>
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
>> > <stephen.s.brennan@oracle.com> wrote:
>> [...]
>> >> Future Work
>> >> -----------
>> >>
>> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
>> >> add a configuration option (default=n) for generating BTF with all
>> >> variables, which distributions could choose to enable or not.
>> >>
>> >> There was previous discussion[3] about leveraging split BTF or building
>> >> additional kernel modules to contain the extra variables. I believe with
>> >> this patch series, it is possible to do that. However, I'd argue that
>> >> simpler is better here: the advantage for using BTF is having it all
>> >> available in the kernel/module image. Storing extra BTF on the
>> >> filesystem would break that advantage, and at that point, you'd be
>> >> better off using a debuginfo format like CTF, which is lightweight and
>> >> expected to be found on the filesystem.
>> >
>> > With all or nothing approach the distros would have a hard choice
>> > to make whether to enable that kconfig, increase BTF and consume
>> > extra memory without any obvious reason or just don't do it.
>> > Majority probably is not going to enable it.
>> > So the feature will become a single vendor only and with
>> > inevitable bit-rot.
>>
>> I'd intend to support it even if just a single distribution enabled it.
>> But I do see your concern.
>
> This thread was dormant for 8 days.
> That's a poor example of "intend to support".
You're right, I definitely could have replied sooner. I'm sorry for that.
>> > Whereas with split BTF and extra kernel module approach
>> > we can enable BTF with all global vars by default.
>> > The extra module will be shipped by all distros and tools
>> > like bpftrace might start using it.
>>
>> Split BTF is currently limited to a single base BTF file. We'd need more
>> patches for pahole to support multiple --btf_base files: e.g.
>> vmlinux.btf and vmlinux-variables.btf. There's also the question of
>> modules: presumably we wouldn't try to have "$MODULE" and
>> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
>> space savings would be worth it.
>>
>> I can look into these concerns, but if possible I would like to proceed
>> with this series, as it is a separate concern from the exact mechanism
>> by which we include extra BTF into the kernel.
>
> Not an option. Sorry.
Ok, so let me describe what I understand to be the proposed design as of
the previous thread, and see if it satisfies your concerns. We can work
from there to make sure we've got a concensus design before going
further.
Option #1
---------
* A new module, "vmlinux-btf-extra" (or something roughly like that) is
added, which contains BTF only. It is generated with
--encode_all_btf_vars and uses --btf_base=path/to/vmlinux_btf so that
it contains only BTF variables. The vmlinux BTF would be generated
same as always (without the --encode_all_btf_vars).
* In the previous thread, it was proposed [1] that modules could
include variables in their BTF in order to reduce the complexity of
the change. Modules would have their BTF generated using
--encode_all_btf_vars and --btf_base=path/to/vmlinux_btf. The
resulting hierarchy would look like this:
vmlinux_btf [ functions and percpu vars only ]
|- vmlinux-btf-extra [ all other vars for vmlinux ]
|- $MODULE [ functions and all vars ]
...
This option is desirable because it means that we only need 2-level
split BTF, and so we don't actually need to make changes to pahole for
multiple --btf_base files. There are two downsides I see:
(a) While we save space on vmlinux BTF, each module will have a bit of
extra data for variable types. On my laptop (5.15 based) I have 9.8
MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
If we assume the same overhead of 23.7%, that would be 1.1 MB of
extra module BTF for my particular use case.
$ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
9876871
$ ls -l /sys/kernel/btf/vmlinux
-r--r--r-- 1 root root 5174406 Sep 7 14:20 /sys/kernel/btf/vmlinux
(b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
duplicate type definitions, wasting additional space. However, as
far as I understand it, this was already a possibility, e.g.
$MODULE1 and $MODULE2 could already contain duplicate types. So I
think this downside is no more.
Option #2
---------
* The vmlinux-btf-extra module is still added as in Option #1.
* Further, each module would have its own "$MODULE-btf-extra" module to
add in extra BTF. These would be built with a --btf_base=$MODULE.ko
and of course that BTF is based on vmlinux, so we would have:
vmlinux_btf [ functions and percpu vars only ]
|- vmlinux-btf-extras [ all other vars for vmlinux ]
|- $MODULE [ functions and percpu vars only ]
|- $MODULE-btf-extra [ all other vars for $MODULE ]
This is much more complex, pahole must be extended to support a
hierarchy of --btf_base files. The kernel itself may not need to
understand multi-level BTF since there's no requirement that it actually
understand $MODULE-btf-extra, so long as it exposes it via
/sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
mechanism to allow an administrator to say "please always load
$MODULE-btf-extras alongside $MODULE", but I think that would be a
userspace problem.
This resolves issue (a) from option #1, of course at implementation
cost.
Regardless of Option #1 or #2, I'd propose that we implement this as a
tristate, similar to what Alan proposed [2]. When set to "m" we use the
solutions described above, and when set to "y", we don't bother with it,
instead using --encode_all_btf_vars for all generation.
If we go with Option #1, no changes to this series should be necessary.
If we go with Option #2, I'll need to extend pahole to support at least
two BTF base files. Please let me know your thoughts.
Stephen
[1]: https://lore.kernel.org/bpf/CAEf4BzZmJKqXaJMBxhKqFNXzjO=eN5gk2xQVnmQVdK2xd3HQ=g@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/alpine.LRH.2.23.451.2205032254390.10133@MyRouter/
next prev parent reply other threads:[~2022-09-07 21:56 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 3/7] btf_encoder: cache all ELF section info Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 6/7] btf_encoder: collect all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 7/7] btf_encoder: allow encoding " Stephen Brennan
2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
2022-09-07 19:06 ` Stephen Brennan
2022-09-07 19:27 ` Alexei Starovoitov
2022-09-07 21:54 ` Stephen Brennan [this message]
2022-09-08 20:35 ` Alexei Starovoitov
2022-09-09 19:31 ` Stephen Brennan
2022-09-23 23:38 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87v8pylukf.fsf@oracle.com \
--to=stephen.s.brennan@oracle.com \
--cc=acme@kernel.org \
--cc=alan.maguire@oracle.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=dwarves@vger.kernel.org \
--cc=jolsa@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox