Re: Question: missing vmlinux BTF variable declarations

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Yonghong Song <yhs@fb.com>, Shung-Hsi Yu <shung-hsi.yu@suse.com>,
	bpf <bpf@vger.kernel.org>, Omar Sandoval <osandov@osandov.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: Question: missing vmlinux BTF variable declarations
Date: Wed, 27 Apr 2022 11:24:42 -0700	[thread overview]
Message-ID: <87r15iv0yd.fsf@stepbren-lnx.us.oracle.com> (raw)
In-Reply-To: <CAEf4BzbiFNnsu9pji5ifzj4nVEyAYYdqP=QVZ3XFwzL48prP3A@mail.gmail.com>

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@brennan.io> wrote:
>>
>> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>> [...]
>> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> >> lite debugging experience. Some things will be missing:
>> >
>> >> - mapping backtrace addresses to source code lines
>> >
>> > So, BTF has provisions for that, and its present in the eBPF programs,
>> > perf annotate uses it, see tools/perf/util/annotate.c,
>> > symbol__disassemble_bpf(), it goes like:
>> >
>> >         struct bpf_prog_linfo *prog_linfo = NULL;
>> >
>> >         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>> >                                                  dso->bpf_prog.id);
>> >         if (!info_node) {
>> >                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>> >                 goto out;
>> >         }
>> >         info_linear = info_node->info_linear;
>> >         sub_id = dso->bpf_prog.sub_id;
>> >
>> >         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>> >         info.buffer_length = info_linear->info.jited_prog_len;
>> >
>> >         if (info_linear->info.nr_line_info)
>> >                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>> >
>> >                 addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>> >                 count = disassemble(pc, &info);
>> >
>> >                 if (prog_linfo)
>> >                         linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>> >                                                                 addr, sub_id,
>> >                                                                 nr_skip);
>> >                               if (linfo && btf) {
>> >                         srcline = btf__name_by_offset(btf, linfo->line_off);
>> >                         nr_skip++;
>> >                 } else
>> >                         srcline = NULL;
>> >
>> > etc.
>> >
>> > Having this for the kernel proper is thus doable, but then we go on
>> > making BTF info grow.
>> >
>> > Perhaps having this as optional, distros or appliances wanting to have a
>> > kernel with this extra info would add it and then tools would use it if
>> > available?
>>
>> I didn't know about the source code mapping support! And I certainly see
>> the utility of it for BPF programs. However, I'm not sure that a "lite"
>> kernel debugging experience *needs* source line mapping. I suppose I
>> should have made it more clear, but I don't think of that list of
>> "missing" features as a checklist of things we'd want feature parity
>> for.
>>
>> The advantage of BTF for debugging would be that it is small, and that
>> it is part of the kernel image without referencing any other file,
>> build-id, or kernel version. Ideally, a debugger could load a crash dump
>> with no additional information, and support a reasonable level of
>> debugging. I think looking up typed data structure values via global
>> symbols is part of that level, as well as simple backtraces and other
>> memory access.
>>
>> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
>> the DWARF debuginfo, then your experience should be much better.
>>
>> >> - intelligent stack frame information from DWARF CFI (e.g.
>> >>   register/variable values)
>> >> - probably other things, I'm not a DWARF expert.
>> [...]
>> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> >> > idea. But we might be able to just add global variables without this
>> >> > new config if we have strong use case.
>> >
>> >> And unfortunately 1MiB is really just a shot in the dark, guessing
>> >> around 70k variables with no string data.
>> >
>> > Maybe we can have a separate BTF file with all this extra info that
>> > could be fetched from somewhere, keyed by build-id, like is now possible
>> > with debuginfod and DWARF?
>>
>> For me, this ranges into the territory of duplicating DWARF. If you lose
>> the one key advantage of "debuginfoless debugging", then you might as
>> well use the build-id to lookup DWARF debuginfo as we can today.
>>
>> This is why I'm trying to propose the means of combining the kallsyms
>> string data with BTF. Anything that can make the overall size increase
>> manageable so that all the necessary data can stay in the kernel image.
>
> I think this quirk of using kallsyms strings is a no-go. But we should
> experiment and see how much bigger BTF becomes when including all the
> variables. Can you try to prototype pahole's support for this?

Hi Andrii,

Sorry for such a delay here. I tried to prototype this last month but
encountered some issues I couldn't resolve. But recently I picked it up
and I've created a prototype [1] which outputs all variables. (It's a
quite bad prototype, it strips out some useful logic regarding the
BTF_VAR_DATASEC for percpu variables. But I think it's good enough).

On my 5.4-based kernel I saw an increase in BTF section size from 3.8
MiB all the way to 6.1 MiB, or more precisely:

BTF section before: 3905938 bytes
BTF section after:  6391989 bytes (+2486051, +63.6%)

So almost a 2.5 MiB increase. My prototype doesn't output the
btf_var_secinfo structs for percpu variables anymore, which probably
breaks some BPF and reduces BTF slightly. But it also is outputting
a few thousand "dwarf variables" which were correctly filtered before,
so I think it's a wash and it's a pretty good comparison.

Clearly it can't be added without a configuration option, as 2.5 MiB is
pretty huge for a kernel memory addition. But I don't think it's so huge
that nobody would enable it. I know I would :)

[1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1

> As you
> said, we can guard this extra information with KConfig and pahole
> flags, so distros can always opt-out of bigger BTF if that's too
> prohibitive. As it is right now, without firm understanding how big
> the final BTF is it's hard to make a good decision about go or no-go
> for this.

Hopefully this comparison sheds some light on that now!

>
> As for including source code itself, it going to be prohibitively
> huge, so it's probably out of the question for now as well.

Yeah, I wouldn't advocate for that.

Now, to share some of the cool possibilities that this enables. I have:
- prototype pahole [1] used for the kernel build,
- a prototype drgn with BTF+kallsyms support [2],
- some small kernel patches which add symbols to vmcoreinfo, so that
  drgn can find the kallsyms section. I'm happy to share these, I just
  haven't sent them anywhere yet.

[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

Combining these three things, I've got a debugger which can open up a
vmcore _without DWARF debuginfo_ and allow you to print out typed
variable values. It just relies on BTF + kallsyms.

So the proof of concept is proven, and I'm quite excited about it!

Stephen

next prev parent reply	other threads:[~2022-04-27 18:41 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-09 23:20 Question: missing vmlinux BTF variable declarations Stephen Brennan
2022-03-14  7:09 ` Shung-Hsi Yu
2022-03-15  5:53   ` Yonghong Song
2022-03-15 16:37     ` Stephen Brennan
2022-03-15 17:58       ` Arnaldo Carvalho de Melo
2022-03-16 16:06         ` Stephen Brennan
2022-03-25 17:07           ` Andrii Nakryiko
2022-04-27 18:24             ` Stephen Brennan [this message]
2022-04-29 17:10               ` Alexei Starovoitov
2022-05-03 14:39                 ` Arnaldo Carvalho de Melo
2022-05-03 17:29                   ` Stephen Brennan
2022-05-03 22:31                     ` Alan Maguire
2022-05-10  0:10                       ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r15iv0yd.fsf@stepbren-lnx.us.oracle.com \
    --to=stephen.s.brennan@oracle.com \
    --cc=acme@kernel.org \
    --cc=acme@redhat.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=osandov@osandov.com \
    --cc=shung-hsi.yu@suse.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox