From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
Yonghong Song <yhs@fb.com>, Shung-Hsi Yu <shung-hsi.yu@suse.com>,
bpf <bpf@vger.kernel.org>, Omar Sandoval <osandov@osandov.com>,
Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: Question: missing vmlinux BTF variable declarations
Date: Wed, 27 Apr 2022 11:24:42 -0700
Message-ID: <87r15iv0yd.fsf@stepbren-lnx.us.oracle.com>
In-Reply-To: <CAEf4BzbiFNnsu9pji5ifzj4nVEyAYYdqP=QVZ3XFwzL48prP3A@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@brennan.io> wrote:
>>
>> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>> [...]
>> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> >> lite debugging experience. Some things will be missing:
>> >
>> >> - mapping backtrace addresses to source code lines
>> >
>> > So, BTF has provisions for that, and it's present in eBPF programs;
>> > perf annotate uses it, see tools/perf/util/annotate.c,
>> > symbol__disassemble_bpf(), which goes like:
>> >
>> > 	struct bpf_prog_linfo *prog_linfo = NULL;
>> >
>> > 	info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>> > 						 dso->bpf_prog.id);
>> > 	if (!info_node) {
>> > 		ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>> > 		goto out;
>> > 	}
>> > 	info_linear = info_node->info_linear;
>> > 	sub_id = dso->bpf_prog.sub_id;
>> >
>> > 	info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>> > 	info.buffer_length = info_linear->info.jited_prog_len;
>> >
>> > 	if (info_linear->info.nr_line_info)
>> > 		prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>> >
>> > 	addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>> > 	count = disassemble(pc, &info);
>> >
>> > 	if (prog_linfo)
>> > 		linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>> > 							addr, sub_id,
>> > 							nr_skip);
>> > 	if (linfo && btf) {
>> > 		srcline = btf__name_by_offset(btf, linfo->line_off);
>> > 		nr_skip++;
>> > 	} else
>> > 		srcline = NULL;
>> >
>> > etc.
>> >
>> > Having this for the kernel proper is thus doable, but then we go on
>> > making BTF info grow.
>> >
>> > Perhaps having this as optional, distros or appliances wanting to have a
>> > kernel with this extra info would add it and then tools would use it if
>> > available?
>>
>> I didn't know about the source code mapping support! And I certainly see
>> the utility of it for BPF programs. However, I'm not sure that a "lite"
>> kernel debugging experience *needs* source line mapping. I suppose I
>> should have made it more clear, but I don't think of that list of
>> "missing" features as a checklist of things we'd want feature parity
>> for.
>>
>> The advantage of BTF for debugging would be that it is small, and that
>> it is part of the kernel image without referencing any other file,
>> build-id, or kernel version. Ideally, a debugger could load a crash dump
>> with no additional information, and support a reasonable level of
>> debugging. I think looking up typed data structure values via global
>> symbols is part of that level, as well as simple backtraces and other
>> memory access.
>>
>> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
>> the DWARF debuginfo, then your experience should be much better.
>>
>> >> - intelligent stack frame information from DWARF CFI (e.g.
>> >> register/variable values)
>> >> - probably other things, I'm not a DWARF expert.
>> [...]
>> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> >> > idea. But we might be able to just add global variables without this
>> >> > new config if we have strong use case.
>> >
>> >> And unfortunately 1MiB is really just a shot in the dark, guessing
>> >> around 70k variables with no string data.
>> >
>> > Maybe we can have a separate BTF file with all this extra info that
>> > could be fetched from somewhere, keyed by build-id, like is now possible
>> > with debuginfod and DWARF?
>>
>> For me, this ranges into the territory of duplicating DWARF. If you lose
>> the one key advantage of "debuginfoless debugging", then you might as
>> well use the build-id to lookup DWARF debuginfo as we can today.
>>
>> This is why I'm trying to propose the means of combining the kallsyms
>> string data with BTF. Anything that can make the overall size increase
>> manageable so that all the necessary data can stay in the kernel image.
>
> I think this quirk of using kallsyms strings is a no-go. But we should
> experiment and see how much bigger BTF becomes when including all the
> variables. Can you try to prototype pahole's support for this?

Hi Andrii,

Sorry for such a delay here. I tried to prototype this last month but
encountered some issues I couldn't resolve. But recently I picked it up
again and created a prototype [1] which outputs all variables. (It's
quite a bad prototype: it strips out some useful logic regarding the
BTF_VAR_DATASEC for percpu variables. But I think it's good enough.)

On my 5.4-based kernel I saw an increase in BTF section size from about
3.7 MiB all the way to 6.1 MiB, or more precisely:

  BTF section before: 3905938 bytes
  BTF section after:  6391989 bytes (+2486051, +63.6%)

So almost a 2.5 MiB increase. My prototype doesn't output the
btf_var_secinfo structs for percpu variables anymore, which probably
breaks some BPF programs and reduces the BTF size slightly. But it also
outputs a few thousand "dwarf variables" which were correctly filtered
out before, so I think it's a wash and the comparison is still pretty
good.
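
For anyone who wants to redo this arithmetic on their own build, here is
a minimal C sketch of the comparison. The function names are made up for
illustration and are not part of pahole or the kernel; the only inputs
are the two section sizes in bytes.

```c
#include <stdint.h>

/*
 * Absolute growth of the BTF section in bytes.
 * Assumes after >= before (hypothetical helper, not a pahole API).
 */
static uint64_t btf_growth_bytes(uint64_t before, uint64_t after)
{
	return after - before;
}

/*
 * Relative growth in tenths of a percent, rounded to nearest,
 * so a return value of 636 means +63.6%.
 * Returns 0 when before is 0 to avoid dividing by zero.
 */
static uint64_t btf_growth_tenths_of_percent(uint64_t before, uint64_t after)
{
	if (before == 0)
		return 0;
	return ((after - before) * 1000 + before / 2) / before;
}
```

With the sizes quoted above, btf_growth_bytes(3905938, 6391989) gives
2486051 and btf_growth_tenths_of_percent(3905938, 6391989) gives 636,
i.e. +63.6%.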

Clearly it can't be added without a configuration option, as 2.5 MiB is
pretty huge for a kernel memory addition. But I don't think it's so huge
that nobody would enable it. I know I would :)

[1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1

> As you
> said, we can guard this extra information with KConfig and pahole
> flags, so distros can always opt-out of bigger BTF if that's too
> prohibitive. As it is right now, without a firm understanding of how
> big the final BTF is, it's hard to make a good decision about go or
> no-go for this.

Hopefully this comparison sheds some light on that now!

>
> As for including source code itself, it's going to be prohibitively
> huge, so it's probably out of the question for now as well.

Yeah, I wouldn't advocate for that.

Now, to share some of the cool possibilities that this enables. I have:

- prototype pahole [1] used for the kernel build,
- a prototype drgn with BTF+kallsyms support [2],
- some small kernel patches which add symbols to vmcoreinfo, so that
  drgn can find the kallsyms section. I'm happy to share these, I just
  haven't sent them anywhere yet.

[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

Combining these three things, I've got a debugger which can open up a
vmcore _without DWARF debuginfo_ and allows you to print out typed
variable values. It just relies on BTF + kallsyms.
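
To make the idea concrete, here is a toy C sketch of that lookup path:
a kallsyms-like table maps a name to an address, a BTF-like table maps
the same name to a type, and the two together are enough to decode the
bytes found in a memory image. This is not drgn's code. Every struct,
enum, and function name below is hypothetical, and real BTF and kallsyms
are far richer than these flat arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one kallsyms entry: symbol name to address. */
struct sym_entry {
	const char *name;
	uint64_t addr;
};

/* Hypothetical stand-in for a BTF variable record: name to a simple type. */
enum toy_kind { TOY_INT32, TOY_INT64 };

struct var_type {
	const char *name;
	enum toy_kind kind;
};

/* Hypothetical memory image: a flat buffer whose first byte sits at `base`. */
struct mem_image {
	uint64_t base;
	const uint8_t *data;
	size_t len;
};

/*
 * Resolve `name` through both tables, then decode the value stored at the
 * symbol's address using host byte order. Returns 0 on success, or -1 if
 * the symbol or its type is unknown or the read falls outside the image.
 */
static int read_typed_var(const struct sym_entry *syms, size_t nsyms,
			  const struct var_type *types, size_t ntypes,
			  const struct mem_image *mem,
			  const char *name, int64_t *out)
{
	const struct sym_entry *sym = NULL;
	const struct var_type *type = NULL;
	size_t i, size;
	uint64_t off;

	/* Step 1: the symbol table supplies the address. */
	for (i = 0; i < nsyms; i++)
		if (strcmp(syms[i].name, name) == 0)
			sym = &syms[i];
	/* Step 2: the type table supplies the type. */
	for (i = 0; i < ntypes; i++)
		if (strcmp(types[i].name, name) == 0)
			type = &types[i];
	if (!sym || !type || sym->addr < mem->base)
		return -1;

	/* Step 3: bounds-check the read against the memory image. */
	size = (type->kind == TOY_INT32) ? sizeof(int32_t) : sizeof(int64_t);
	off = sym->addr - mem->base;
	if (off > mem->len || size > mem->len - off)
		return -1;

	/* Step 4: interpret the raw bytes according to the type. */
	if (type->kind == TOY_INT32) {
		int32_t v;
		memcpy(&v, mem->data + off, sizeof(v));
		*out = v;
	} else {
		int64_t v;
		memcpy(&v, mem->data + off, sizeof(v));
		*out = v;
	}
	return 0;
}
```

A symbol that has an address but no type record fails at step 2, which
is exactly the gap that emitting all variables into BTF is meant to
close.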

So the proof of concept works, and I'm quite excited about it!

Stephen