All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>, Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	dwarves@vger.kernel.org, bpf <bpf@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alan Maguire <alan.maguire@oracle.com>
Subject: Re: [PATCH dwarves 0/7] Add support for generating BTF for all variables
Date: Wed, 07 Sep 2022 14:54:08 -0700	[thread overview]
Message-ID: <87v8pylukf.fsf@oracle.com> (raw)
In-Reply-To: <CAADnVQKbf5nEBnuSLmfZ_kGLmUzeD5htc1ezbJsVg72adF4bLw@mail.gmail.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Wed, Sep 7, 2022 at 12:07 PM Stephen Brennan
> <stephen.s.brennan@oracle.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> > On Fri, Aug 26, 2022 at 11:54 AM Stephen Brennan
>> > <stephen.s.brennan@oracle.com> wrote:
>> [...]
>> >> Future Work
>> >> -----------
>> >>
>> >> If this proves acceptable, I'd like to follow-up with a kernel patch to
>> >> add a configuration option (default=n) for generating BTF with all
>> >> variables, which distributions could choose to enable or not.
>> >>
>> >> There was previous discussion[3] about leveraging split BTF or building
>> >> additional kernel modules to contain the extra variables. I believe with
>> >> this patch series, it is possible to do that. However, I'd argue that
>> >> simpler is better here: the advantage for using BTF is having it all
>> >> available in the kernel/module image. Storing extra BTF on the
>> >> filesystem would break that advantage, and at that point, you'd be
>> >> better off using a debuginfo format like CTF, which is lightweight and
>> >> expected to be found on the filesystem.
>> >
>> > With all or nothing approach the distros would have a hard choice
>> > to make whether to enable that kconfig, increase BTF and consume
>> > extra memory without any obvious reason or just don't do it.
>> > Majority probably is not going to enable it.
>> > So the feature will become a single vendor only and with
>> > inevitable bit-rot.
>>
>> I'd intend to support it even if just a single distribution enabled it.
>> But I do see your concern.
>
> This thread was dormant for 8 days.
> That's a poor example of "intend to support".

You're right, I definitely could have replied sooner. I'm sorry for that.

>> > Whereas with split BTF and extra kernel module approach
>> > we can enable BTF with all global vars by default.
>> > The extra module will be shipped by all distros and tools
>> > like bpftrace might start using it.
>>
>> Split BTF is currently limited to a single base BTF file. We'd need more
>> patches for pahole to support multiple --btf_base files: e.g.
>> vmlinux.btf and vmlinux-variables.btf. There's also the question of
>> modules: presumably we wouldn't try to have "$MODULE" and
>> "$MODULE-btf-extra" modules due to the added complexity. I doubt the
>> space savings would be worth it.
>>
>> I can look into these concerns, but if possible I would like to proceed
>> with this series, as it is a separate concern from the exact mechanism
>> by which we include extra BTF into the kernel.
>
> Not an option. Sorry.

Ok, so let me describe what I understand to be the proposed design as of
the previous thread, and see if it satisfies your concerns. We can work
from there to make sure we've got a concensus design before going
further.

Option #1
---------

* A new module, "vmlinux-btf-extra" (or something roughly like that) is
  added, which contains BTF only. It is generated with
  --encode_all_btf_vars and uses --btf_base=path/to/vmlinux_btf so that
  it contains only BTF variables. The vmlinux BTF would be generated
  same as always (without the --encode_all_btf_vars).

* In the previous thread, it was proposed [1] that modules could
  include variables in their BTF in order to reduce the complexity of
  the change. Modules would have their BTF generated using
  --encode_all_btf_vars and --btf_base=path/to/vmlinux_btf. The
  resulting hierarchy would look like this:

  vmlinux_btf  [ functions and percpu vars only ]
  |- vmlinux-btf-extra [ all other vars for vmlinux ]
  |- $MODULE   [ functions and all vars ]
  ...

This option is desirable because it means that we only need 2-level
split BTF, and so we don't actually need to make changes to pahole for
multiple --btf_base files. There are two downsides I see:

(a) While we save space on vmlinux BTF, each module will have a bit of
    extra data for variable types. On my laptop (5.15 based) I have 9.8
    MB of BTF, and if you deduct vmlinux, you're still left with 4.7 MB.
    If we assume the same overhead of 23.7%, that would be 1.1 MB of
    extra module BTF for my particular use case.

    $ ls -l /sys/kernel/btf | awk '{sum += $5} END {print(sum)}'
    9876871
    $ ls -l /sys/kernel/btf/vmlinux
    -r--r--r-- 1 root root 5174406 Sep  7 14:20 /sys/kernel/btf/vmlinux

(b) It's possible for "vmlinux-btf-extras" and "$MODULE" to contain
    duplicate type definitions, wasting additional space. However, as
    far as I understand it, this was already a possibility, e.g.
    $MODULE1 and $MODULE2 could already contain duplicate types. So I
    think this downside is no more.


Option #2
---------

* The vmlinux-btf-extra module is still added as in Option #1.

* Further, each module would have its own "$MODULE-btf-extra" module to
  add in extra BTF. These would be built with a --btf_base=$MODULE.ko
  and of course that BTF is based on vmlinux, so we would have:

  vmlinux_btf              [ functions and percpu vars only ]
  |- vmlinux-btf-extras    [ all other vars for vmlinux ]
  |- $MODULE               [ functions and percpu vars only ]
     |- $MODULE-btf-extra  [ all  other vars for $MODULE ]

This is much more complex, pahole must be extended to support a
hierarchy of --btf_base files. The kernel itself may not need to
understand multi-level BTF since there's no requirement that it actually
understand $MODULE-btf-extra, so long as it exposes it via
/sys/kernel/btf/$MODULE-btf-extra. I'd also like to see some sort of
mechanism to allow an administrator to say "please always load
$MODULE-btf-extras alongside $MODULE", but I think that would be a
userspace problem.

This resolves issue (a) from option #1, of course at implementation
cost.

Regardless of Option #1 or #2, I'd propose that we implement this as a
tristate, similar to what Alan proposed [2]. When set to "m" we use the
solutions described above, and when set to "y", we don't bother with it,
instead using --encode_all_btf_vars for all generation.

If we go with Option #1, no changes to this series should be necessary.
If we go with Option #2, I'll need to extend pahole to support at least
two BTF base files. Please let me know your thoughts.

Stephen

[1]: https://lore.kernel.org/bpf/CAEf4BzZmJKqXaJMBxhKqFNXzjO=eN5gk2xQVnmQVdK2xd3HQ=g@mail.gmail.com/
[2]: https://lore.kernel.org/bpf/alpine.LRH.2.23.451.2205032254390.10133@MyRouter/

  reply	other threads:[~2022-09-07 21:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-26 18:49 [PATCH dwarves 0/7] Add support for generating BTF for all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 1/7] dutil: return ELF section name when looked up by index Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 2/7] btf_encoder: Rename percpu structures to variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 3/7] btf_encoder: cache all ELF section info Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 4/7] btf_encoder: make the variable array dynamic Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 5/7] btf_encoder: record ELF section for collected variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 6/7] btf_encoder: collect all variables Stephen Brennan
2022-08-26 18:49 ` [PATCH dwarves 7/7] btf_encoder: allow encoding " Stephen Brennan
2022-08-30 15:14 ` [PATCH dwarves 0/7] Add support for generating BTF for " Alexei Starovoitov
2022-09-07 19:06   ` Stephen Brennan
2022-09-07 19:27     ` Alexei Starovoitov
2022-09-07 21:54       ` Stephen Brennan [this message]
2022-09-08 20:35         ` Alexei Starovoitov
2022-09-09 19:31           ` Stephen Brennan
2022-09-23 23:38             ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v8pylukf.fsf@oracle.com \
    --to=stephen.s.brennan@oracle.com \
    --cc=acme@kernel.org \
    --cc=alan.maguire@oracle.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dwarves@vger.kernel.org \
    --cc=jolsa@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.