From: Alan Maguire <alan.maguire@oracle.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: martin.lau@linux.dev, acme@kernel.org, ttreyer@meta.com,
yonghong.song@linux.dev, song@kernel.org,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
haoluo@google.com, jolsa@kernel.org, qmo@kernel.org,
ihor.solodrai@linux.dev, david.faust@oracle.com,
jose.marchesi@oracle.com, bpf@vger.kernel.org,
Alan Maguire <alan.maguire@oracle.com>
Subject: [RFC bpf-next 00/15] support inline tracing with BTF
Date: Wed, 8 Oct 2025 18:34:56 +0100
Message-ID: <20251008173512.731801-1-alan.maguire@oracle.com>
The Linux kernel is heavily inlined. As a result, with function-focused
observability it can be difficult to map from code to system
behaviour when tracing. A large number of functions effectively
"disappear" at compile-time; for example, approximately 100,000 are
inlined to 443,000 sites in the gcc-14-built x86_64 kernel I have been
testing with. This greatly outnumbers the number of available
functions that were _not_ inlined. This disappearing act has
traditionally been carried out on static functions but with
Link-Time Optimization (LTO) non-static functions also become eligible
for such optimization.
The good news is that kprobe tracing can be done on most instructions,
so if we know where the inline site is and where the inlined function
parameters are to be found at those points, we can support tracing at most
of these sites. However, the ability to trace inlined functions today
depends on analysis of DWARF debuginfo that is hundreds of megabytes in
size for vmlinux alone (255 Mb of .debug_info on my kernel for example).
This series is an attempt to work through the realization of a
representation of inline sites in the BPF Type Format (BTF) that is small
enough to be feasible to carry with the kernel/modules, but expressive
enough to allow useful tracing at inline sites. Small enough is always
going to be a somewhat subjective measure, but the aim was to ensure
it is a similar order of magnitude to existing kernel/module BTF.
For my kernel, vmlinux BTF is ~6Mb so the informal aim is to be in this
ballpark with inline information representation. Specific numbers
are broken out below, but the approach taken here stores info about
the ~443000 vmlinux inline sites and their parameter availability
in 9.2Mb, compressed to 2.8Mb when that data is delivered via a
compressed module.
The series makes location information about inlines available to tracing
tools via addition of .BTF.extra sections to vmlinux and modules which -
like .BTF sections exposed via /sys/kernel/btf - are made available via
/sys/kernel/btf_extra files, one for the kernel (vmlinux) and one for
each module. These are stored as split BTF, so for example the vmlinux
.BTF.extra can be viewed via
$ bpftool btf dump -B /sys/kernel/btf/vmlinux file /sys/kernel/btf_extra/vmlinux
i.e. it is split BTF relative to vmlinux BTF.
For modules, .BTF.extra is split BTF relative to the module BTF, so it
is multi-split BTF; to view it we specify the base vmlinux, the child
module BTF and finally the grandchild module .BTF.extra.
So for example for the xfs module:
$ bpftool btf dump -B /sys/kernel/btf/vmlinux -B /sys/kernel/btf/xfs file /sys/kernel/btf_extra/xfs
(this requires an enhancement to bpftool in this series to support multi-split BTF)
To generate .BTF.extra data, pahole changes are needed. These will be sent
in a separate RFC series which I will follow up with; it in turn will
require the libbpf changes in this series to actually produce .BTF.extra
data. pahole will have a new "inline" BTF feature, and this can be
optionally directed to a .BTF.extra section if it is specified as
"inline.extra". A single invocation of pahole is required to generate .BTF
and .BTF.extra sections. In order to generate inline info the libbpf
changes in this series will have to be applied to pahole; to verify this
is working you should see "inline" in the list of supported BTF
features:
$ pahole --supported_btf_features
encode_force,var,float,decl_tag,type_tag,enum64,optimized_func,consistent_func,decl_tag_kfuncs,reproducible_build,distilled_base,global_var,attributes,inline
To make things simpler an updated pahole is available at [1] which still
requires the changes in patches 2-4 below applied to its lib/bpf/src
submodule directory.
Because the size of the vmlinux binary would grow somewhat with
inclusion of the .BTF.extra section, it can also be delivered via
module (CONFIG_DEBUG_INFO_BTF_EXTRA=m).
So how do we represent inline information? This series proposes a fairly
simple approach, but whatever the final form used, the hope is this
series will help push things forward by tackling some of the problems
with _any_ inline representation (libbpf representation and deduplication,
kernel handling and exposure as new .BTF sections, modular delivery for
kernel inline info, providing libbpf support for tracing sites etc).
This series builds on previous work by Thierry Treyer [2] and
analysis done by Eduard Zingerman, as well as work done with Yonghong Song
at Linux Plumbers [3]. The proposed BTF changes are somewhat different from
Thierry's proposal, however. They are intended to provide a simple
representation that, while not hugely compact in its original form, is
designed to be easily deduplicated by representing information about
parameters at locations in such a way that it can be easily shared across
multiple inline sites.
The info about each inline site is stored in an entry in a BTF kind
BTF_KIND_LOCSEC. Each location provides
- a name for the site (inline function name);
- its function prototype (BTF_KIND_FUNC_PROTO) which represents the types
  of the parameters;
- its location prototype (BTF_KIND_LOC_PROTO) which represents a list of
  the locations of those parameters (in register, constant values etc);
- a relative offset for the address of the site.
The BTF_KIND_LOC_PROTO is simply a list of BTF type ids which are either
0 (no location info for this parameter) or of kind BTF_KIND_LOC_PARAM.
BTF_KIND_LOC_PARAM specifies whether the parameter is stored in a
register, is a constant etc. In general the nth type in the _LOC_PROTO
will correspond to the nth parameter in the FUNC_PROTO, though some
parameters require multiple _LOC_PARAMs to express their location (such as
a 16-byte struct passed by value in two registers).
See patch 1 for more details on how this is handled.
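
As a rough illustration of how these pieces relate, here is a minimal C
sketch. The struct and field names (loc_site, name_off, loc_proto and so on)
are made up for this example and are not the UAPI definitions from patch 1;
the only details taken from this cover letter are that each site entry
carries a name, a FUNC_PROTO, a LOC_PROTO and an offset, that it occupies
16 bytes, and that a LOC_PROTO is a list of ids which are either 0 or refer
to a LOC_PARAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one LOCSEC entry; not the layout from patch 1. */
struct loc_site {
	uint32_t name_off;	/* string offset of the inlined function name */
	uint32_t func_proto_id;	/* BTF_KIND_FUNC_PROTO: parameter types */
	uint32_t loc_proto_id;	/* BTF_KIND_LOC_PROTO: parameter locations */
	uint32_t offset;	/* relative offset of the site address */
};

_Static_assert(sizeof(struct loc_site) == 16, "four 32-bit fields");

/* Hypothetical model of a LOC_PROTO: ids are 0 or the id of a LOC_PARAM. */
struct loc_proto {
	uint32_t nr_params;
	const uint32_t *param_ids;
};

/* Return the id stored at position idx, or 0 if idx is out of range. */
static uint32_t loc_proto_param(const struct loc_proto *p, uint32_t idx)
{
	return idx < p->nr_params ? p->param_ids[idx] : 0;
}
```

The sketch assumes one id per parameter; a parameter that needs multiple
_LOC_PARAMs would break that one-to-one indexing.
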
Note however that the representations are designed to be highly shareable
among location sites; as Eduard discovered, many/most will be simply
register values in line with the calling conventions, so they will share
LOC_PARAMs and LOC_PROTOs in many cases. This increases the space
efficiency of the representation, since deduplication of LOC_PARAM and
LOC_PROTO reduces overall size. LOCSEC data cannot be deduplicated since
they are site-specific, so will always be the long pole in any
representation.
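
To show why sharing helps, the following is a small C sketch of interning
location prototypes: identical lists of LOC_PARAM ids map to a single table
entry, so many sites can point at the same LOC_PROTO. The types, limits and
the proto_intern() helper are hypothetical and use a linear scan for brevity;
this is not a description of how btf__dedup() is implemented.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PROTOS 64
#define MAX_PARAMS 8

/* Hypothetical in-memory LOC_PROTO: a short list of LOC_PARAM ids. */
struct proto {
	uint32_t nr;
	uint32_t ids[MAX_PARAMS];
};

struct proto_table {
	uint32_t count;
	struct proto protos[MAX_PROTOS];
};

/*
 * Return the index of an existing identical proto, or append a new one.
 * Returns -1 if the proto is too large or the table is full.
 */
static int proto_intern(struct proto_table *t, const uint32_t *ids, uint32_t nr)
{
	uint32_t i;

	if (nr > MAX_PARAMS)
		return -1;
	for (i = 0; i < t->count; i++) {
		if (t->protos[i].nr != nr)
			continue;
		if (nr == 0 || memcmp(t->protos[i].ids, ids, nr * sizeof(*ids)) == 0)
			return (int)i;
	}
	if (t->count == MAX_PROTOS)
		return -1;
	t->protos[t->count].nr = nr;
	if (nr)
		memcpy(t->protos[t->count].ids, ids, nr * sizeof(*ids));
	return (int)t->count++;
}
```

Two sites whose parameters all sit in the same registers would get the same
index back from proto_intern(), while each site still needs its own entry
recording its name and offset.
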
As mentioned above, x86_64 vmlinux built with gcc 14 has 443354 inline sites.
Of these 443,354 locations
- 318161 (~71%) have location information for all function parameters
  (where there are 0 or more parameters)
- 76520 (~17%) have incomplete location information; some parameters are
  available. For these the vast majority (67070) have only one missing
  parameter location.
- 48673 (~11%) have no location info for any of their parameters (where
  there are 1 or more parameters)
Some of these gaps result from unhandled location data, specifically
DW_OP_GNU_parameter_ref (of which there are 1296 instances) and some from
complex location expressions, so we could potentially improve location
processing to add more locations if we handled these.
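
The three categories above can be expressed as a small classification
function. This is a sketch only: the enum and classify_site() are
hypothetical names, and it assumes one location id per parameter, with 0
meaning "no location info", which ignores parameters needing multiple
_LOC_PARAMs.

```c
#include <stddef.h>
#include <stdint.h>

enum loc_coverage { LOC_ALL, LOC_PARTIAL, LOC_NONE };

/* Classify a site given one location id per parameter (0 = unknown). */
static enum loc_coverage classify_site(const uint32_t *loc_ids, size_t nr_params)
{
	size_t i, known = 0;

	for (i = 0; i < nr_params; i++) {
		if (loc_ids[i] != 0)
			known++;
	}

	if (known == nr_params)
		return LOC_ALL;	/* also covers sites with 0 parameters */
	return known ? LOC_PARTIAL : LOC_NONE;
}
```

Note that a site with no parameters counts as fully described, matching the
"0 or more parameters" wording in the first category.
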
In terms of BTF encoding, we wind up with 12010 LOC_PARAM which are
referenced in various combinations from 37061 LOC_PROTO. We see that
given that there are over 400,000 inline sites, deduplication has
considerably cut down on the overhead of representing this information.
LOCSEC will be 443354*16 bytes, i.e. 6.76 Mb. Between extra FUNC_PROTO,
LOC_PROTO, LOC_PARAM and LOCSECs we wind up adding 9.2Mb to accommodate
443354 inline sites and all their metadata. This works out as
approximately 22 bytes to fully represent each inline site, so we can
see the benefits of deduplication of LOC_PARAM and LOC_PROTOs in this scheme.
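
The arithmetic behind these figures can be checked with a few lines of C.
The macro and function names are made up for this example, and "Mb" is read
as 2^20 bytes, which is the reading under which 443354*16 bytes comes to
6.76 Mb.

```c
#include <stdint.h>

#define NR_SITES	443354ULL	/* inline sites in vmlinux */
#define LOCSEC_ENTRY	16ULL		/* bytes per LOCSEC entry */
#define MB		(1024.0 * 1024.0)

/* Total bytes used by LOCSEC entries alone. */
static uint64_t locsec_bytes(void)
{
	return NR_SITES * LOCSEC_ENTRY;
}

/* Average bytes per inline site for a given total size in bytes. */
static double bytes_per_site(double total_bytes)
{
	return total_bytes / (double)NR_SITES;
}
```

locsec_bytes() gives 7093664 bytes, and bytes_per_site(9.2 * MB) comes out a
little under 22, so roughly 6 bytes per site go to everything other than the
16-byte site entry itself.
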
When vmlinux BTF inline-related info (FUNC_PROTO, LOC_PARAM, LOC_PROTO
and LOCSECs) is delivered via a module (btf_extra.ko.gz), the on-disk
size of that module with compression drops from 9.2Mb to 2.8Mb.
Modules also provide .BTF.extra info in their .BTF.extra sections; we
can see the stats for these as follows:
$ find . -name '*.ko'|xargs objdump -h |grep ".BTF.extra"|awk '{ sum += strtonum("0x"$3); count++ } END { print "total (kbytes): " sum/1024 " num modules: " count " average(kbytes): " sum/1024/count}'
total (kbytes): 46653.5 num modules: 3044 average(kbytes): 15.3264
So we add 46Mb of .BTF.extra data in total across 3044 modules, averaging
15kbytes per module.
Future work/questions
- the same scheme could be used to represent functions with optimized-out
  parameters (which we leave out of BTF encoding), hence the more general
  "location" term (as opposed to calling them inlines)
- perhaps we should have a separate CONFIG_DEBUG_INFO_BTF_EXTRA_MODULES=y|n
  as we do with CONFIG_DEBUG_INFO_BTF_MODULES?
- .BTF.extra is probably a bad name, given that we have .BTF.ext already...
- not yet implemented is location encoding for out-of-tree modules that
  use distilled base BTF. The reason is we need to have distill and
  BTF relocation working for multi-split BTF. That is doable but not
  implemented in this series.
Patch 1 adds UAPI/kernel support for BTF location info. Note the
kernel does not do anything with location data; it will later be
made available however.
Patch 2 is libbpf support including deduplication, distill,
relocation and field iteration. Note that distill/relocation for
multi-split BTF (used for out-of-tree modules) is not yet implemented.
Patch 3 is needed because deduplication results in changes in
BTF ids and we stash some in pahole when saving BTF location data.
Having access to the mappings makes dealing with this easier.
Patch 4 fixes a bug in parsing of multi-split BTF (missed when adding
support to create multi-split BTF).
Patch 5 adds bpftool dump support for location data.
Patch 6 adds support to bpftool dump to deal with multiple split BTF
so we can dump location data from modules.
Patches 7-10 are selftests covering various aspects of location support.
Patches 11, 12 add kbuild support for adding BTF extra information; to
actually generate it an updated pahole with the associated libbpf
changes in patches 2-4 above is needed.
Patch 13 adds a libbpf function to load BTF extra data.
Patch 14 adds libbpf support to allow users to trace inlines
via SEC("kloc/module:name") sections. Support is very similar to
USDT with info retrieved from BTF extra sections instead of ELF notes.
kloc tracing will trace all instances of the named inline site,
filling in parameters via the BPF_KPROBE()-like BPF_KLOC() macro.
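
To illustrate the "kloc/module:name" section naming, here is a small C
sketch that splits such a string into its module and name parts. The
kloc_parse() helper is hypothetical and is not the parser added to libbpf in
patch 14; in particular, rejecting an empty module or name is an assumption
of this sketch.

```c
#include <string.h>

/*
 * Split "kloc/<module>:<name>" into its module and name parts.
 * Returns 0 on success, -1 if the string does not have that shape
 * or a part does not fit in the supplied buffer.
 */
static int kloc_parse(const char *sec, char *mod, size_t mod_sz,
		      char *name, size_t name_sz)
{
	const char *prefix = "kloc/";
	const char *p, *colon;
	size_t mod_len, name_len;

	if (strncmp(sec, prefix, strlen(prefix)) != 0)
		return -1;
	p = sec + strlen(prefix);
	colon = strchr(p, ':');
	if (!colon)
		return -1;
	mod_len = (size_t)(colon - p);
	name_len = strlen(colon + 1);
	if (mod_len == 0 || name_len == 0 ||
	    mod_len >= mod_sz || name_len >= name_sz)
		return -1;
	memcpy(mod, p, mod_len);
	mod[mod_len] = '\0';
	memcpy(name, colon + 1, name_len + 1);	/* includes the trailing NUL */
	return 0;
}
```

For a made-up section name such as "kloc/xfs:xfs_foo" this yields the module
"xfs" and the inline site name "xfs_foo".
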
Patch 15 is a simple test exercising this functionality.
[1] https://github.com/alan-maguire/dwarves/tree/pahole-location-encoding
[2] https://lore.kernel.org/dwarves/20250416-btf_inline-v1-0-e4bd2f8adae5@meta.com/
[3] https://lpc.events/event/18/contributions/1945/
Alan Maguire (15):
bpf: Extend UAPI to support location information
libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC
libbpf: Add option to retrieve map from old->new ids from btf__dedup()
libbpf: Fix parsing of multi-split BTF
bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC
bpftool: Handle multi-split BTF by supporting multiple base BTFs
selftests/bpf: Test helper support for BTF_KIND_LOC[_PARAM|_PROTO|SEC]
selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to field iter tests
selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to dedup split tests
selftests/bpf: BTF distill tests to ensure LOC[_PARAM|_PROTO] add to
split BTF
kbuild: Add support for extra BTF
kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
libbpf: add API to load extra BTF
libbpf: add support for BTF location attachment
selftests/bpf: Add test tracing inline site using SEC("kloc")
include/asm-generic/vmlinux.lds.h | 4 +
include/linux/bpf.h | 1 +
include/linux/btf.h | 31 +-
include/linux/module.h | 4 +
include/uapi/linux/btf.h | 85 ++-
kernel/bpf/Makefile | 1 +
kernel/bpf/btf.c | 282 +++++++-
kernel/bpf/btf_extra.c | 25 +
kernel/bpf/sysfs_btf.c | 21 +-
kernel/module/main.c | 4 +
lib/Kconfig.debug | 18 +
scripts/Makefile.btf | 9 +
scripts/Makefile.modfinal | 5 +
scripts/link-vmlinux.sh | 19 +-
tools/bpf/bpftool/btf.c | 95 +++
tools/bpf/bpftool/main.c | 3 +-
tools/include/uapi/linux/btf.h | 85 ++-
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/Makefile | 2 +-
tools/lib/bpf/btf.c | 384 +++++++++-
tools/lib/bpf/btf.h | 96 ++-
tools/lib/bpf/btf_dump.c | 10 +-
tools/lib/bpf/btf_iter.c | 23 +
tools/lib/bpf/libbpf.c | 76 +-
tools/lib/bpf/libbpf.h | 27 +
tools/lib/bpf/libbpf.map | 7 +
tools/lib/bpf/libbpf_internal.h | 11 +-
tools/lib/bpf/loc.bpf.h | 297 ++++++++
tools/lib/bpf/loc.c | 653 ++++++++++++++++++
tools/testing/selftests/bpf/btf_helpers.c | 43 +-
.../bpf/prog_tests/btf_dedup_split.c | 93 +++
.../selftests/bpf/prog_tests/btf_distill.c | 68 ++
.../selftests/bpf/prog_tests/btf_field_iter.c | 26 +-
tools/testing/selftests/bpf/prog_tests/kloc.c | 51 ++
tools/testing/selftests/bpf/progs/kloc.c | 36 +
tools/testing/selftests/bpf/test_btf.h | 15 +
36 files changed, 2551 insertions(+), 61 deletions(-)
create mode 100644 kernel/bpf/btf_extra.c
create mode 100644 tools/lib/bpf/loc.bpf.h
create mode 100644 tools/lib/bpf/loc.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/kloc.c
create mode 100644 tools/testing/selftests/bpf/progs/kloc.c
--
2.39.3