* [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF
@ 2024-05-28 12:23 Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
` (8 more replies)
0 siblings, 9 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:23 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Split BPF Type Format (BTF) provides huge advantages in that kernel
modules only have to provide type information for types that they do not
share with the core kernel; for core kernel types, split BTF refers to
core kernel BTF type ids. So for a STRUCT sk_buff, a module that
uses that structure (or a pointer to it) simply needs to refer to the
core kernel type id, saving the need to define the structure and its many
dependents. This cuts down on duplication and makes BTF as compact
as possible.
However, there is a downside. This scheme requires the references from
split BTF to base BTF to be valid not just at encoding time, but at use
time (when the module is loaded). Even a small change in kernel types
can perturb the type ids in core kernel BTF, and - if the new reproducible
BTF option is not used - pahole's parallel processing of compilation units
can lead to different type ids for the same kernel if the BTF is
regenerated.
So we have a robustness problem for split BTF for cases where a module is
not always compiled at the same time as the kernel. This problem is
particularly acute for distros which generally want module builders to be
able to compile a module for the lifetime of a Linux stable-based release,
and have it continue to be valid over the lifetime of that release, even
as changes in data structures (and hence BTF types) accrue. Today it is
not possible to generate module BTF that works beyond the initial
kernel it is compiled against - kernel bugfixes etc. invalidate the split
BTF references to vmlinux BTF, and BTF is no longer usable for the
module.
The goal of this series is to provide additional context for cases
like this. That context comes in the form of
distilled base BTF; it stands in for the base BTF, and contains
information about the types referenced from split BTF, but not their
full descriptions. The modified split BTF will refer to type ids in
this .BTF.base section, and when the kernel loads such modules it
will use that .BTF.base to map references from split BTF to the
equivalent current vmlinux base BTF types. Once this relocation
process has succeeded, the module BTF available in /sys/kernel/btf
will look exactly as if it was built with the current vmlinux;
references to base types will be fixed up etc.
A module builder - using this series along with the pahole changes -
can then build a module with distilled base BTF via an out-of-tree
module build, i.e.
make -C . M=path/2/module
The module will have a .BTF section (the split BTF) and a
.BTF.base section. The latter is small in size - distilled base
BTF does not need full struct/union/enum information for named
types for example. For 2667 modules built with distilled base BTF,
the average size observed was 1556 bytes (stddev 1563). The overall
size added to these 2667 modules was 5.3MB.
Note that for the in-tree modules, this approach is not needed as
split and base BTF in the case of in-tree modules are always built
and re-built together.
The series first focuses on generating split BTF with distilled base
BTF; then relocation support is added to allow split BTF with
an associated distilled base to be relocated with a new base BTF.
Next Eduard's patch allows BTF ELF parsing to work with both
.BTF and .BTF.base sections; this ensures that bpftool will be
able to dump BTF for a module with a .BTF.base section for example,
or indeed dump relocated BTF where a module and a "-B vmlinux"
is supplied.
Then we add support to resolve_btfids to ignore base BTF - i.e.
to avoid relocation - if a .BTF.base section is found. This ensures
the .BTF_ids section is populated with ids relative to the distilled
base (these will be relocated as part of module load).
Finally the series supports storage of .BTF.base data/size in modules
and supports sharing of relocation code with the kernel to allow
relocation of module BTF. For the kernel, this relocation
process happens at module load time, and we relocate split BTF
references to point at types in the current vmlinux BTF. As part of
this, .BTF_ids references need to be mapped also.
So concretely, what happens is
- we generate split BTF in the .BTF section of a module that refers to
types in the .BTF.base section as base types; the latter are not full
type descriptions but provide information about the base type. So
a STRUCT sk_buff would be represented as a FWD struct sk_buff in
distilled base BTF for example.
- when the module is loaded, the split BTF is relocated with vmlinux
BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
in vmlinux BTF and map all split BTF references to the distilled base
FWD sk_buff, replacing them with references to the vmlinux BTF
STRUCT sk_buff.
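The two-step flow above can be sketched as a toy model in C. This is not
the kernel's actual relocation code: `toy_type` and `build_id_map` are
invented for illustration, and the real implementation also checks kind
and (for embedded/duplicate-named structs) size, not just name.

```c
#include <assert.h>
#include <string.h>

/* Toy model of one base BTF entry: just a type name. */
struct toy_type {
	const char *name;
};

/*
 * Build id_map[] so that each distilled-base id maps to the id of the
 * same-named type in the current vmlinux base, mirroring the relocation
 * step described above. Split BTF references to distilled id i are then
 * rewritten as id_map[i]. Returns 0 on success, -1 if any distilled
 * type has no match in the new base (relocation fails).
 */
static int build_id_map(const struct toy_type *dist, int n_dist,
			const struct toy_type *vmlinux, int n_vml,
			int *id_map)
{
	int i, j;

	for (i = 1; i < n_dist; i++) {	/* id 0 is reserved for "void" */
		id_map[i] = 0;
		for (j = 1; j < n_vml; j++) {
			if (strcmp(dist[i].name, vmlinux[j].name) == 0) {
				id_map[i] = j;
				break;
			}
		}
		if (!id_map[i])
			return -1;
	}
	return 0;
}
```

So if the distilled base has "sk_buff" at id 1 while the current vmlinux
base has it at id 2, every split BTF reference to id 1 becomes a
reference to id 2 after relocation.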
A previous approach to this problem [1] utilized standalone BTF for such
cases - where the BTF is not defined relative to base BTF so there is no
relocation required. The problem with that approach is that from
the verifier perspective, some types are special, and having a custom
representation of a core kernel type that did not necessarily match the
current representation is not tenable. So the approach taken here was
to preserve the split BTF model while minimizing the representation of
the context needed to relocate split and current vmlinux BTF.
To generate distilled .BTF.base sections the associated dwarves
patch (to be applied on the "next" branch there) is needed [3]
Without it, things will still work but modules will not be built
with a .BTF.base section.
Changes since v4[4]:
- Moved embeddedness, duplicate name checks to relocation time
and record struct/union size for all distilled struct/unions
instead of using forwards. This allows us to carry out
type compatibility checks based on the base BTF we want to
relocate with (Eduard, patches 1, 3)
- Moved to using qsort() instead of qsort_r() as support for
qsort_r() appears to be missing in Android libc (Andrii, patch 3)
- Sorting/searching now incorporates size matching depending
on BTF kind and embeddedness of struct/union (Eduard, Andrii,
patch 3)
- Improved naming of various types during relocation to avoid
confusion (Andrii, patch 3)
- Incorporated Eduard's patch (patch 5) which handles .BTF.base
sections internally in btf_parse_elf(). This makes ELF parsing
work with split BTF, split BTF with a distilled base, split
BTF with a distilled base _and_ base BTF (by relocating) etc.
Having this avoids the need for bpftool changes; it will work
as-is with .BTF.base sections (Eduard, patch 4)
- Updated resolve_btfids to _not_ relocate BTF for modules
where a .BTF.base section is present; in that one case we
do not want to relocate BTF as the .BTF_ids section should
reflect ids in .BTF.base which will later be relocated on
module load (Eduard, Andrii, patch 5)
Changes since v3[5]:
- distill now checks for duplicate-named struct/unions and records
them as a sized struct/union to help identify which of the
multiple base BTF structs/unions it refers to (Eduard, patch 1)
- added test support for multiple name handling (Eduard, patch 2)
- simplified the string mapping when updating split BTF to use
base BTF instead of distilled base. Since the only string
references split BTF can make to base BTF are the names of
the base types, create a string map from distilled string
offset -> base BTF string offset and update string offsets
by visiting all strings in split BTF; this saves having to
do costly searches of base BTF (Eduard, patch 7,10)
- fixed bpftool manpage and indentation issues (Quentin, patch 11)
Also explored Eduard's suggestion of doing an implicit fallback
to checking for .BTF.base section in btf__parse() when it is
called to get base BTF. However while it is doable, it turned
out to be difficult operationally. Since fallback is implicit
we do not know the source of the BTF - was it from .BTF or
.BTF.base? In bpftool, we want to try first standalone BTF,
then split, then split with distilled base. Having a way
to explicitly request .BTF.base via btf__parse_opts() fits
that model better.
Changes since v2[6]:
- submitted patch to use --btf_features in Makefile.btf for pahole
v1.26 and later separately (Andrii). That has landed in bpf-next
now.
- distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
- moved to distilling only named types, augmenting split BTF with
associated reference types; this simplifies greatly the distilled
base BTF and the mapping operation between distilled and base
BTF when relocating (most of the series changes, Andrii)
- relocation now iterates over base BTF, looking for matches based
on name in distilled BTF. Distilled BTF is pre-sorted by name
(Andrii, patch 8)
- removed most redundant compatibility checks aside from struct
size for base types/embedded structs and kind compatibility
(since we only match on name) (Andrii, patch 8)
- btf__parse_opts() now replaces btf_parse() internally in libbpf
(Eduard, patch 3)
Changes since RFC [7]:
- updated terminology; we replace clunky "base reference" BTF with
distilling base BTF into a .BTF.base section. Similarly BTF
reconciliation becomes BTF relocation (Andrii, most patches)
- add distilled base BTF by default for out-of-tree modules
(Alexei, patch 8)
- distill algorithm updated to record size of embedded struct/union
by recording it as a 0-vlen STRUCT/UNION with size preserved
(Andrii, patch 2)
- verify size match on relocation for such STRUCT/UNIONs (Andrii,
patch 9)
- with embedded STRUCT/UNION recording size, we can have bpftool
dump a header representation using .BTF.base + .BTF sections
rather than special-casing and refusing to use "format c" for
that case (patch 5)
- match enum with enum64 and vice versa (Andrii, patch 9)
- ensure that resolve_btfids works with BTF without .BTF.base
section (patch 7)
- update tests to cover embedded types, arrays and function
prototypes (patches 3, 12)
[1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
[2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/20240517102714.4072080-1-alan.maguire@oracle.com/
[4] https://lore.kernel.org/bpf/20240517102246.4070184-1-alan.maguire@oracle.com/
[5] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/
[6] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/
[7] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/
Alan Maguire (8):
libbpf: add btf__distill_base() creating split BTF with distilled base
BTF
selftests/bpf: test distilled base, split BTF generation
libbpf: split BTF relocation
selftests/bpf: extend distilled BTF tests to cover BTF relocation
resolve_btfids: handle presence of .BTF.base section
module, bpf: store BTF base pointer in struct module
libbpf,bpf: share BTF relocate-related code with kernel
kbuild,bpf: add module-specific pahole flags for distilled base BTF
Eduard Zingerman (1):
libbpf: make btf_parse_elf process .BTF.base transparently
include/linux/btf.h | 45 ++
include/linux/module.h | 2 +
kernel/bpf/Makefile | 10 +-
kernel/bpf/btf.c | 168 +++--
kernel/module/main.c | 5 +-
scripts/Makefile.btf | 5 +
scripts/Makefile.modfinal | 2 +-
tools/bpf/resolve_btfids/main.c | 8 +
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/btf.c | 600 ++++++++++++------
tools/lib/bpf/btf.h | 36 ++
tools/lib/bpf/btf_iter.c | 143 +++++
tools/lib/bpf/btf_relocate.c | 453 +++++++++++++
tools/lib/bpf/libbpf.map | 2 +
tools/lib/bpf/libbpf_internal.h | 3 +
.../selftests/bpf/prog_tests/btf_distill.c | 341 ++++++++++
16 files changed, 1590 insertions(+), 235 deletions(-)
create mode 100644 tools/lib/bpf/btf_iter.c
create mode 100644 tools/lib/bpf/btf_relocate.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
--
2.31.1
* [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-30 19:58 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation Alan Maguire
` (7 subsequent siblings)
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
To support more robust split BTF, supplemental context for the
base BTF type ids that split BTF refers to is required. Without such
context, a simple shuffling of base BTF type ids (without any other
significant change) invalidates the split BTF. Here additional context
is stored to make split BTF more robust.
This context comes in the form of distilled base BTF providing minimal
information (name and - in some cases - size) for base INTs, FLOATs,
STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
points at that base and contains any additional types needed (such as
TYPEDEF, PTR and anonymous STRUCT/UNION declarations). This
information constitutes the minimal BTF representation needed to
disambiguate or remove split BTF references to base BTF. The rules
are as follows:
- INT, FLOAT, FWD are recorded in full.
- if a named base BTF STRUCT or UNION is referred to from split BTF, it
will be encoded as a zero-member sized STRUCT/UNION (preserving
size for later relocation checks). Only base BTF STRUCT/UNIONs
that are either embedded in split BTF STRUCT/UNIONs or that have
multiple STRUCT/UNION instances of the same name will _need_ size
checks at relocation time, but as it is possible a different set of
types will be duplicates in the later to-be-resolved base BTF,
we preserve size information for all named STRUCT/UNIONs.
- if an ENUM[64] is named, an ENUM forward representation (an ENUM
with no values) of the same size is used.
- in all other cases, the type is added to the new split BTF.
Avoiding struct/union/enum/enum64 expansion is important to keep the
distilled base BTF representation to a minimum size.
When successful, new representations of the distilled base BTF and new
split BTF that refers to it are returned. Both need to be freed by the
caller.
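A minimal sketch of how these rules route each referenced base type
(invented enum and function names; libbpf's actual logic lives in
btf_add_distilled_types() and covers the full BTF kind list):

```c
#include <assert.h>

/* Simplified subset of BTF kinds relevant to the rules above. */
enum toy_kind {
	TOY_INT, TOY_FLOAT, TOY_FWD,
	TOY_STRUCT, TOY_UNION, TOY_ENUM, TOY_ENUM64,
	TOY_PTR, TOY_TYPEDEF, TOY_ARRAY, TOY_FUNC_PROTO,
};

enum toy_dest { TO_DISTILLED_BASE, TO_SPLIT };

/*
 * Route a referenced base type when distilling:
 * - INT/FLOAT/FWD go to the distilled base in full;
 * - named struct/union/enum[64] go to the distilled base as sized,
 *   memberless (or valueless) stubs;
 * - anonymous aggregates and all reference types (pointers,
 *   typedefs, arrays, func protos, ...) are instead copied into
 *   the new split BTF.
 */
static enum toy_dest distill_dest(enum toy_kind kind, int named)
{
	switch (kind) {
	case TOY_INT:
	case TOY_FLOAT:
	case TOY_FWD:
		return TO_DISTILLED_BASE;
	case TOY_STRUCT:
	case TOY_UNION:
	case TOY_ENUM:
	case TOY_ENUM64:
		return named ? TO_DISTILLED_BASE : TO_SPLIT;
	default:
		return TO_SPLIT;
	}
}
```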
So to take a simple example, with split BTF with a type referring
to "struct sk_buff", we will generate distilled base BTF with a
0-member STRUCT sk_buff of the appropriate size, and the split BTF
will refer to it instead.
Tools like pahole can utilize such split BTF to populate the .BTF
section (split BTF) and an additional .BTF.base section. Then
when the split BTF is loaded, the distilled base BTF can be used
to relocate split BTF to reference the current (and possibly changed)
base BTF.
So for example if "struct sk_buff" was id 502 when the split BTF was
originally generated, we can use the distilled base BTF to see that
id 502 refers to a "struct sk_buff" and replace instances of id 502
with the current (relocated) base BTF sk_buff type id.
Distilled base BTF is small; when building a kernel with all modules
using distilled base BTF as a test, overall module size grew by only
5.3MB total across ~2700 modules.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/btf.c | 302 ++++++++++++++++++++++++++++++++++++++-
tools/lib/bpf/btf.h | 21 +++
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 318 insertions(+), 6 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 2d0840ef599a..9f68268e659a 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
return 0;
}
-int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
{
- struct btf_pipe p = { .src = src_btf, .dst = btf };
struct btf_type *t;
int sz, err;
@@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
return libbpf_err(sz);
/* deconstruct BTF, if necessary, and invalidate raw_data */
- if (btf_ensure_modifiable(btf))
+ if (btf_ensure_modifiable(p->dst))
return libbpf_err(-ENOMEM);
- t = btf_add_type_mem(btf, sz);
+ t = btf_add_type_mem(p->dst, sz);
if (!t)
return libbpf_err(-ENOMEM);
memcpy(t, src_type, sz);
- err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
+ err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
if (err)
return libbpf_err(err);
- return btf_commit_type(btf, sz);
+ return btf_commit_type(p->dst, sz);
+}
+
+int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+{
+ struct btf_pipe p = { .src = src_btf, .dst = btf };
+
+ return btf_add_type(&p, src_type);
}
static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
@@ -5212,3 +5218,287 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
return 0;
}
+
+struct btf_distill {
+ struct btf_pipe pipe;
+ int *id_map;
+ unsigned int split_start_id;
+ unsigned int split_start_str;
+ int diff_id;
+};
+
+static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
+{
+ struct btf_distill *dist = ctx;
+ struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
+
+ if (!*id)
+ return 0;
+ /* split BTF id, not needed */
+ if (*id >= dist->split_start_id)
+ return 0;
+ /* already added ? */
+ if (dist->id_map[*id] > 0)
+ return 0;
+
+ /* only a subset of base BTF types should be referenced from split
+ * BTF; ensure nothing unexpected is referenced.
+ */
+ switch (btf_kind(t)) {
+ case BTF_KIND_INT:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_FWD:
+ case BTF_KIND_ARRAY:
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_ENUM:
+ case BTF_KIND_ENUM64:
+ case BTF_KIND_PTR:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_FUNC_PROTO:
+ case BTF_KIND_TYPE_TAG:
+ dist->id_map[*id] = *id;
+ break;
+ default:
+ pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
+ *id, btf_kind(t));
+ return -EINVAL;
+ }
+
+ /* struct/union members not needed, except for anonymous structs
+ * and unions, which we need since name won't help us determine
+ * matches; so if a named struct/union, no need to recurse
+ * into members. For anonymous struct/unions, we may need their
+ * member types.
+ */
+ if (btf_is_composite(t) && t->name_off)
+ return 0;
+
+ /* ensure references in type are added also. */
+ return btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
+}
+
+static int btf_add_distilled_types(struct btf_distill *dist)
+{
+ bool adding_to_base = dist->pipe.dst->start_id == 1;
+ int id = btf__type_cnt(dist->pipe.dst);
+ struct btf_type *t;
+ int i, err = 0;
+
+ /* Add types for each of the required references to either distilled
+ * base or split BTF, depending on type characteristics.
+ */
+ for (i = 1; i < dist->split_start_id; i++) {
+ const char *name;
+ int kind;
+
+ if (!dist->id_map[i])
+ continue;
+ t = btf_type_by_id(dist->pipe.src, i);
+ kind = btf_kind(t);
+ name = btf__name_by_offset(dist->pipe.src, t->name_off);
+
+ switch (kind) {
+ case BTF_KIND_INT:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_FWD:
+ /* Named int, float, fwd are added to base. */
+ if (!adding_to_base)
+ continue;
+ err = btf_add_type(&dist->pipe, t);
+ break;
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ /* Named struct/union are added to base as 0-vlen
+ * struct/union of same size. Anonymous struct/unions
+ * are added to split BTF as-is.
+ */
+ if (adding_to_base) {
+ if (!t->name_off)
+ continue;
+ err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
+ } else {
+ if (t->name_off)
+ continue;
+ err = btf_add_type(&dist->pipe, t);
+ }
+ break;
+ case BTF_KIND_ENUM:
+ case BTF_KIND_ENUM64:
+ /* Named enum[64]s are added to base as a sized
+ * enum; relocation will match with appropriately-named
+ * and sized enum or enum64.
+ *
+ * Anonymous enums are added to split BTF as-is.
+ */
+ if (adding_to_base) {
+ if (!t->name_off)
+ continue;
+ err = btf__add_enum(dist->pipe.dst, name, t->size);
+ } else {
+ if (t->name_off)
+ continue;
+ err = btf_add_type(&dist->pipe, t);
+ }
+ break;
+ case BTF_KIND_ARRAY:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_PTR:
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_FUNC_PROTO:
+ case BTF_KIND_TYPE_TAG:
+ /* All other types are added to split BTF. */
+ if (adding_to_base)
+ continue;
+ err = btf_add_type(&dist->pipe, t);
+ break;
+ default:
+ pr_warn("unexpected kind when adding base type '%s'[%u] of kind [%u] to distilled base BTF.\n",
+ name, i, kind);
+ return -EINVAL;
+
+ }
+ if (err < 0)
+ break;
+ dist->id_map[i] = id++;
+ }
+ return err;
+}
+
+/* Split BTF ids without a mapping will be shifted downwards since distilled
+ * base BTF is smaller than the original base BTF. For those that have a
+ * mapping (either to base or updated split BTF), update the id based on
+ * that mapping.
+ */
+static int btf_update_distilled_type_ids(__u32 *id, void *ctx)
+{
+ struct btf_distill *dist = ctx;
+
+ if (dist->id_map[*id])
+ *id = dist->id_map[*id];
+ else if (*id >= dist->split_start_id)
+ *id -= dist->diff_id;
+ return 0;
+}
+
+/* Create updated split BTF with distilled base BTF; distilled base BTF
+ * consists of BTF information required to clarify the types that split
+ * BTF refers to, omitting unneeded details. Specifically it will contain
+ * base types and memberless definitions of named structs, unions and enumerated
+ * types. Associated reference types like pointers, arrays and anonymous
+ * structs, unions and enumerated types will be added to split BTF.
+ * Size is recorded for named struct/unions to help guide matching to the
+ * target base BTF during later relocation.
+ *
+ * The only case where structs, unions or enumerated types are fully represented
+ * is when they are anonymous; in such cases, the anonymous type is added to
+ * split BTF in full.
+ *
+ * We return newly-created split BTF where the split BTF refers to a newly-created
+ * distilled base BTF. Both must be freed separately by the caller.
+ */
+int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
+ struct btf **new_split_btf)
+{
+ struct btf *new_base = NULL, *new_split = NULL;
+ const struct btf *old_base;
+ unsigned int n = btf__type_cnt(src_btf);
+ struct btf_distill dist = {};
+ struct btf_type *t;
+ int i, err = 0;
+
+ /* src BTF must be split BTF. */
+ old_base = btf__base_btf(src_btf);
+ if (!new_base_btf || !new_split_btf || !old_base)
+ return libbpf_err(-EINVAL);
+
+ new_base = btf__new_empty();
+ if (!new_base)
+ return libbpf_err(-ENOMEM);
+ dist.id_map = calloc(n, sizeof(*dist.id_map));
+ if (!dist.id_map) {
+ err = -ENOMEM;
+ goto done;
+ }
+ dist.pipe.src = src_btf;
+ dist.pipe.dst = new_base;
+ dist.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
+ if (IS_ERR(dist.pipe.str_off_map)) {
+ err = -ENOMEM;
+ goto done;
+ }
+ dist.split_start_id = btf__type_cnt(old_base);
+ dist.split_start_str = old_base->hdr->str_len;
+
+ /* Pass over src split BTF; generate the list of base BTF type ids it
+ * references; these will constitute our distilled BTF set to be
+ * distributed over base and split BTF as appropriate.
+ */
+ for (i = src_btf->start_id; i < n; i++) {
+ t = btf_type_by_id(src_btf, i);
+ err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, &dist);
+ if (err < 0)
+ goto done;
+ }
+ /* Next add types for each of the required references to base BTF and split BTF
+ * in turn.
+ */
+ err = btf_add_distilled_types(&dist);
+ if (err < 0)
+ goto done;
+
+ /* Create new split BTF with distilled base BTF as its base; the final
+ * state is split BTF with distilled base BTF that represents enough
+ * about its base references to allow it to be relocated with the base
+ * BTF available.
+ */
+ new_split = btf__new_empty_split(new_base);
+ if (!new_split) {
+ err = -errno;
+ goto done;
+ }
+ dist.pipe.dst = new_split;
+ /* First add all split types */
+ for (i = src_btf->start_id; i < n; i++) {
+ t = btf_type_by_id(src_btf, i);
+ err = btf_add_type(&dist.pipe, t);
+ if (err < 0)
+ goto done;
+ }
+ /* Now add distilled types to split BTF that are not added to base. */
+ err = btf_add_distilled_types(&dist);
+ if (err < 0)
+ goto done;
+
+ /* All split BTF ids will be shifted downwards since there are less base
+ * BTF ids in distilled base BTF.
+ */
+ dist.diff_id = dist.split_start_id - btf__type_cnt(new_base);
+
+ n = btf__type_cnt(new_split);
+ /* Now update base/split BTF ids. */
+ for (i = 1; i < n; i++) {
+ t = btf_type_by_id(new_split, i);
+
+ err = btf_type_visit_type_ids(t, btf_update_distilled_type_ids, &dist);
+ if (err < 0)
+ goto done;
+ }
+done:
+ free(dist.id_map);
+ hashmap__free(dist.pipe.str_off_map);
+ if (err) {
+ btf__free(new_split);
+ btf__free(new_base);
+ return libbpf_err(err);
+ }
+ *new_base_btf = new_base;
+ *new_split_btf = new_split;
+
+ return 0;
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e6880d91c84..cb08ee9a5a10 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -107,6 +107,27 @@ LIBBPF_API struct btf *btf__new_empty(void);
*/
LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf);
+/**
+ * @brief **btf__distill_base()** creates new versions of the split BTF
+ * *src_btf* and its base BTF. The new base BTF will only contain the types
+ * needed to improve robustness of the split BTF to small changes in base BTF.
+ * When that split BTF is loaded against a (possibly changed) base, this
+ * distilled base BTF will help update references to that (possibly changed)
+ * base BTF.
+ *
+ * Both the new split and its associated new base BTF must be freed by
+ * the caller.
+ *
+ * If successful, 0 is returned and **new_base_btf** and **new_split_btf**
+ * will point at new base/split BTF. Both the new split and its associated
+ * new base BTF must be freed by the caller.
+ *
+ * A negative value is returned on error and the thread-local `errno` variable
+ * is set to the error code as well.
+ */
+LIBBPF_API int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
+ struct btf **new_split_btf);
+
LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf);
LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c1ce8aa3520b..9e69d6e2a512 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -419,6 +419,7 @@ LIBBPF_1.4.0 {
LIBBPF_1.5.0 {
global:
+ btf__distill_base;
bpf_program__attach_sockmap;
ring__consume_n;
ring_buffer__consume_n;
--
2.31.1
* [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-30 20:23 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation Alan Maguire
` (6 subsequent siblings)
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Test generation of split+distilled base BTF, ensuring that
- named base BTF STRUCTs and UNIONs are represented as 0-vlen sized
STRUCT/UNIONs
- named ENUM[64]s are represented as 0-vlen named ENUM[64]s
- anonymous struct/unions are represented in full in split BTF
- anonymous enums are represented in full in split BTF
- types unreferenced from split BTF are not present in distilled
base BTF
Also test that with vmlinux BTF and split BTF based upon it,
we only represent needed base types referenced from split BTF
in distilled base.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
.../selftests/bpf/prog_tests/btf_distill.c | 274 ++++++++++++++++++
1 file changed, 274 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
new file mode 100644
index 000000000000..5c3a38747962
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
@@ -0,0 +1,274 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "btf_helpers.h"
+
+/* Fabricate base, split BTF with references to base types needed; then create
+ * split BTF with distilled base BTF and ensure expectations are met:
+ * - only referenced base types from split BTF are present
+ * - struct/union/enum are represented as empty unless anonymous, when they
+ * are represented in full in split BTF
+ */
+static void test_distilled_base(void)
+{
+ struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL;
+
+ btf1 = btf__new_empty();
+ if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
+ return;
+
+ btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */
+ btf__add_ptr(btf1, 1); /* [2] ptr to int */
+ btf__add_struct(btf1, "s1", 8); /* [3] struct s1 { */
+ btf__add_field(btf1, "f1", 2, 0, 0); /* int *f1; */
+ /* } */
+ btf__add_struct(btf1, "", 12); /* [4] struct { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ btf__add_field(btf1, "f2", 3, 32, 0); /* struct s1 f2; */
+ /* } */
+ btf__add_int(btf1, "unsigned int", 4, 0); /* [5] unsigned int */
+ btf__add_union(btf1, "u1", 12); /* [6] union u1 { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ btf__add_field(btf1, "f2", 2, 0, 0); /* int *f2; */
+ /* } */
+ btf__add_union(btf1, "", 4); /* [7] union { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ /* } */
+ btf__add_enum(btf1, "e1", 4); /* [8] enum e1 { */
+ btf__add_enum_value(btf1, "v1", 1); /* v1 = 1; */
+ /* } */
+ btf__add_enum(btf1, "", 4); /* [9] enum { */
+ btf__add_enum_value(btf1, "av1", 2); /* av1 = 2; */
+ /* } */
+ btf__add_enum64(btf1, "e641", 8, true); /* [10] enum64 { */
+ btf__add_enum64_value(btf1, "v1", 1024); /* v1 = 1024; */
+ /* } */
+ btf__add_enum64(btf1, "", 8, true); /* [11] enum64 { */
+ btf__add_enum64_value(btf1, "v1", 1025); /* v1 = 1025; */
+ /* } */
+ btf__add_struct(btf1, "unneeded", 4); /* [12] struct unneeded { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ /* } */
+ btf__add_struct(btf1, "embedded", 4); /* [13] struct embedded { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ /* } */
+ btf__add_func_proto(btf1, 1); /* [14] int (*)(int *p1); */
+ btf__add_func_param(btf1, "p1", 1);
+
+ btf__add_array(btf1, 1, 1, 3); /* [15] int [3]; */
+
+ btf__add_struct(btf1, "from_proto", 4); /* [16] struct from_proto { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ /* } */
+ btf__add_union(btf1, "u1", 4); /* [17] union u1 { */
+ btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */
+ /* } */
+ VALIDATE_RAW_BTF(
+ btf1,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] STRUCT 's1' size=8 vlen=1\n"
+ "\t'f1' type_id=2 bits_offset=0",
+ "[4] STRUCT '(anon)' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=3 bits_offset=32",
+ "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+ "[6] UNION 'u1' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=2 bits_offset=0",
+ "[7] UNION '(anon)' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'v1' val=1",
+ "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'av1' val=2",
+ "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1024",
+ "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1025",
+ "[12] STRUCT 'unneeded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[13] STRUCT 'embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+ "[16] STRUCT 'from_proto' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[17] UNION 'u1' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0");
+
+ btf2 = btf__new_empty_split(btf1);
+ if (!ASSERT_OK_PTR(btf2, "empty_split_btf"))
+ goto cleanup;
+
+ btf__add_ptr(btf2, 3); /* [18] ptr to struct s1 */
+ /* add ptr to struct anon */
+ btf__add_ptr(btf2, 4); /* [19] ptr to struct (anon) */
+ btf__add_const(btf2, 6); /* [20] const union u1 */
+ btf__add_restrict(btf2, 7); /* [21] restrict union (anon) */
+ btf__add_volatile(btf2, 8); /* [22] volatile enum e1 */
+ btf__add_typedef(btf2, "et", 9); /* [23] typedef enum (anon) */
+ btf__add_const(btf2, 10); /* [24] const enum64 e641 */
+ btf__add_ptr(btf2, 11); /* [25] ptr to enum64 (anon) */
+ btf__add_struct(btf2, "with_embedded", 4); /* [26] struct with_embedded { */
+ btf__add_field(btf2, "f1", 13, 0, 0); /* struct embedded f1; */
+ /* } */
+ btf__add_func(btf2, "fn", BTF_FUNC_STATIC, 14); /* [27] int fn(int p1); */
+ btf__add_typedef(btf2, "arraytype", 15); /* [28] typedef int arraytype[3]; */
+ btf__add_func_proto(btf2, 1); /* [29] int (*)(struct from_proto p1); */
+ btf__add_func_param(btf2, "p1", 16);
+
+ VALIDATE_RAW_BTF(
+ btf2,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] STRUCT 's1' size=8 vlen=1\n"
+ "\t'f1' type_id=2 bits_offset=0",
+ "[4] STRUCT '(anon)' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=3 bits_offset=32",
+ "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+ "[6] UNION 'u1' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=2 bits_offset=0",
+ "[7] UNION '(anon)' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'v1' val=1",
+ "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'av1' val=2",
+ "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1024",
+ "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1025",
+ "[12] STRUCT 'unneeded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[13] STRUCT 'embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+ "[16] STRUCT 'from_proto' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[17] UNION 'u1' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[18] PTR '(anon)' type_id=3",
+ "[19] PTR '(anon)' type_id=4",
+ "[20] CONST '(anon)' type_id=6",
+ "[21] RESTRICT '(anon)' type_id=7",
+ "[22] VOLATILE '(anon)' type_id=8",
+ "[23] TYPEDEF 'et' type_id=9",
+ "[24] CONST '(anon)' type_id=10",
+ "[25] PTR '(anon)' type_id=11",
+ "[26] STRUCT 'with_embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=13 bits_offset=0",
+ "[27] FUNC 'fn' type_id=14 linkage=static",
+ "[28] TYPEDEF 'arraytype' type_id=15",
+ "[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=16");
+
+ if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4),
+ "distilled_base") ||
+ !ASSERT_OK_PTR(btf3, "distilled_base") ||
+ !ASSERT_OK_PTR(btf4, "distilled_split") ||
+ !ASSERT_EQ(8, btf__type_cnt(btf3), "distilled_base_type_cnt"))
+ goto cleanup;
+
+ VALIDATE_RAW_BTF(
+ btf4,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] STRUCT 's1' size=8 vlen=0",
+ "[3] UNION 'u1' size=12 vlen=0",
+ "[4] ENUM 'e1' encoding=UNSIGNED size=4 vlen=0",
+ "[5] ENUM 'e641' encoding=UNSIGNED size=8 vlen=0",
+ "[6] STRUCT 'embedded' size=4 vlen=0",
+ "[7] STRUCT 'from_proto' size=4 vlen=0",
+ /* split BTF; these types should match split BTF above from 18-29, with
+ * updated type id references
+ */
+ "[8] PTR '(anon)' type_id=2",
+ "[9] PTR '(anon)' type_id=20",
+ "[10] CONST '(anon)' type_id=3",
+ "[11] RESTRICT '(anon)' type_id=21",
+ "[12] VOLATILE '(anon)' type_id=4",
+ "[13] TYPEDEF 'et' type_id=22",
+ "[14] CONST '(anon)' type_id=5",
+ "[15] PTR '(anon)' type_id=23",
+ "[16] STRUCT 'with_embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=6 bits_offset=0",
+ "[17] FUNC 'fn' type_id=24 linkage=static",
+ "[18] TYPEDEF 'arraytype' type_id=25",
+ "[19] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=7",
+ /* split BTF types added from original base BTF below */
+ "[20] STRUCT '(anon)' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=2 bits_offset=32",
+ "[21] UNION '(anon)' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[22] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'av1' val=2",
+ "[23] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1025",
+ "[24] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
+
+cleanup:
+ btf__free(btf4);
+ btf__free(btf3);
+ btf__free(btf2);
+ btf__free(btf1);
+}
+
+/* create split reference BTF from vmlinux + split BTF with a few type references;
+ * ensure the resultant split reference BTF is as expected, containing only types
+ * needed to disambiguate references from split BTF.
+ */
+static void test_distilled_base_vmlinux(void)
+{
+ struct btf *split_btf = NULL, *vmlinux_btf = btf__load_vmlinux_btf();
+ struct btf *split_dist = NULL, *base_dist = NULL;
+ __s32 int_id, myint_id;
+
+ if (!ASSERT_OK_PTR(vmlinux_btf, "load_vmlinux"))
+ return;
+ int_id = btf__find_by_name_kind(vmlinux_btf, "int", BTF_KIND_INT);
+ if (!ASSERT_GT(int_id, 0, "find_int"))
+ goto cleanup;
+ split_btf = btf__new_empty_split(vmlinux_btf);
+ if (!ASSERT_OK_PTR(split_btf, "new_split"))
+ goto cleanup;
+ myint_id = btf__add_typedef(split_btf, "myint", int_id);
+ btf__add_ptr(split_btf, myint_id);
+
+ if (!ASSERT_EQ(btf__distill_base(split_btf, &base_dist, &split_dist), 0,
+ "distill_vmlinux_base"))
+ goto cleanup;
+
+ if (!ASSERT_OK_PTR(split_dist, "split_distilled") ||
+ !ASSERT_OK_PTR(base_dist, "base_dist"))
+ goto cleanup;
+ VALIDATE_RAW_BTF(
+ split_dist,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] TYPEDEF 'myint' type_id=1",
+ "[3] PTR '(anon)' type_id=2");
+
+cleanup:
+ btf__free(split_dist);
+ btf__free(base_dist);
+ btf__free(split_btf);
+ btf__free(vmlinux_btf);
+}
+
+void test_btf_distill(void)
+{
+ if (test__start_subtest("distilled_base"))
+ test_distilled_base();
+ if (test__start_subtest("distilled_base_vmlinux"))
+ test_distilled_base_vmlinux();
+}
--
2.31.1
* [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-31 2:22 ` Eduard Zingerman
2024-05-31 18:21 ` Andrii Nakryiko
2024-05-28 12:24 ` [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
` (5 subsequent siblings)
8 siblings, 2 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Map distilled base BTF type ids referenced in split BTF to the base
BTF passed in; if the mapping succeeds, reparent the split BTF to
that base BTF.
Relocation is done by first verifying that distilled base BTF
only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
UNION kinds; then we sort these to speed lookups. Once sorted,
the base BTF is iterated, and for each relevant kind we check
for an equivalent in distilled base BTF. When found, the
mapping from distilled -> base BTF id and string offset is recorded.
In establishing mappings, we need to ensure we check STRUCT/UNION
size when the STRUCT/UNION is embedded in a split BTF STRUCT/UNION,
and when duplicate names exist for the same STRUCT/UNION. Otherwise
size is ignored in matching STRUCT/UNIONs.
Once all mappings are established, we can update type ids
and string offsets in split BTF and reparent it to the new base.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/btf.c | 17 ++
tools/lib/bpf/btf.h | 14 ++
tools/lib/bpf/btf_relocate.c | 430 ++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.map | 1 +
tools/lib/bpf/libbpf_internal.h | 3 +
6 files changed, 466 insertions(+), 1 deletion(-)
create mode 100644 tools/lib/bpf/btf_relocate.c
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index b6619199a706..336da6844d42 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
- usdt.o zip.o elf.o features.o
+ usdt.o zip.o elf.o features.o btf_relocate.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 9f68268e659a..cb762d7a5dd7 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5502,3 +5502,20 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
return 0;
}
+
+const struct btf_header *btf_header(const struct btf *btf)
+{
+ return btf->hdr;
+}
+
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf)
+{
+ btf->base_btf = (struct btf *)base_btf;
+ btf->start_id = btf__type_cnt(base_btf);
+ btf->start_str_off = base_btf->hdr->str_len;
+}
+
+int btf__relocate(struct btf *btf, const struct btf *base_btf)
+{
+ return libbpf_err(btf_relocate(btf, base_btf, NULL));
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index cb08ee9a5a10..8a93120b7385 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -252,6 +252,20 @@ struct btf_dedup_opts {
LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
+/**
+ * @brief **btf__relocate()** will check the split BTF *btf* for references
+ * to base BTF kinds, and verify those references are compatible with
+ * *base_btf*; if they are, *btf* is adjusted such that it is re-parented to
+ * *base_btf* and type ids and strings are adjusted to accommodate this.
+ *
+ * If successful, 0 is returned and **btf** now has **base_btf** as its
+ * base.
+ *
+ * A negative value is returned on error and the thread-local `errno` variable
+ * is set to the error code as well.
+ */
+LIBBPF_API int btf__relocate(struct btf *btf, const struct btf *base_btf);
+
struct btf_dump;
struct btf_dump_opts {
diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
new file mode 100644
index 000000000000..f2e91cdfb5cc
--- /dev/null
+++ b/tools/lib/bpf/btf_relocate.c
@@ -0,0 +1,430 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include "btf.h"
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_internal.h"
+
+struct btf;
+
+struct btf_relocate {
+ struct btf *btf;
+ const struct btf *base_btf;
+ const struct btf *dist_base_btf;
+ unsigned int nr_base_types;
+ unsigned int nr_split_types;
+ unsigned int nr_dist_base_types;
+ int dist_str_len;
+ int base_str_len;
+ __u32 *id_map;
+ __u32 *str_map;
+};
+
+/* Set temporarily in relocation id_map if distilled base struct/union is
+ * embedded in a split BTF struct/union; in such a case, size information must
+ * match between distilled base BTF and base BTF representation of type.
+ */
+#define BTF_IS_EMBEDDED ((__u32)-1)
+
+/* <name, size, id> triple used in sorting/searching distilled base BTF. */
+struct btf_name_info {
+ const char *name;
+ int size:31;
+ /* set when search requires a size match */
+ bool needs_size;
+ __u32 id;
+};
+
+static int btf_relocate_rewrite_type_id(__u32 *id, void *ctx)
+{
+ struct btf_relocate *r = ctx;
+
+ *id = r->id_map[*id];
+ return 0;
+}
+
+/* Simple string comparison used for sorting within BTF, since all distilled
+ * types are named. If names match and both elements require a size match,
+ * fall back to using size for ordering.
+ */
+static int cmp_btf_name_size(const void *n1, const void *n2)
+{
+ const struct btf_name_info *ni1 = n1;
+ const struct btf_name_info *ni2 = n2;
+ int name_diff = strcmp(ni1->name, ni2->name);
+
+ if (!name_diff && ni1->needs_size && ni2->needs_size)
+ return ni2->size - ni1->size;
+ return name_diff;
+}
+
+/* If a member of a split BTF struct/union refers to a base BTF
+ * struct/union, mark that struct/union id temporarily in the id_map
+ * with BTF_IS_EMBEDDED. Members can be const/restrict/volatile/typedef
+ * reference types, but if a pointer is encountered, the type is no longer
+ * considered embedded.
+ */
+static int btf_mark_embedded_composite_type_ids(__u32 *id, void *ctx)
+{
+ struct btf_relocate *r = ctx;
+ struct btf_type *t;
+ __u32 next_id = *id;
+
+ while (true) {
+ if (next_id == 0)
+ return 0;
+ t = btf_type_by_id(r->btf, next_id);
+ switch (btf_kind(t)) {
+ case BTF_KIND_CONST:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_TYPE_TAG:
+ next_id = t->type;
+ break;
+ case BTF_KIND_ARRAY: {
+ struct btf_array *a = btf_array(t);
+
+ next_id = a->type;
+ break;
+ }
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ if (next_id < r->nr_dist_base_types)
+ r->id_map[next_id] = BTF_IS_EMBEDDED;
+ return 0;
+ default:
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
+/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
+ * through base BTF looking up distilled type (using binary search) equivalents.
+ */
+static int btf_relocate_map_distilled_base(struct btf_relocate *r)
+{
+ struct btf_name_info *dist_base_info_sorted;
+ struct btf_type *base_t, *dist_t, *split_t;
+ __u8 *base_name_cnt = NULL;
+ int err = 0;
+ __u32 id;
+
+ /* generate a sort index array of name/type ids sorted by name for
+ * distilled base BTF to speed name-based lookups.
+ */
+ dist_base_info_sorted = calloc(r->nr_dist_base_types, sizeof(*dist_base_info_sorted));
+ if (!dist_base_info_sorted) {
+ err = -ENOMEM;
+ goto done;
+ }
+ for (id = 0; id < r->nr_dist_base_types; id++) {
+ dist_t = btf_type_by_id(r->dist_base_btf, id);
+ dist_base_info_sorted[id].name = btf__name_by_offset(r->dist_base_btf,
+ dist_t->name_off);
+ dist_base_info_sorted[id].id = id;
+ dist_base_info_sorted[id].size = dist_t->size;
+ dist_base_info_sorted[id].needs_size = true;
+ }
+ qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
+ cmp_btf_name_size);
+
+ /* Mark distilled base struct/union members of split BTF structs/unions
+ * in id_map with BTF_IS_EMBEDDED; this signals that these types
+ * need to match both name and size, otherwise embedding the base
+ * struct/union in the split type is invalid.
+ */
+ for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) {
+ split_t = btf_type_by_id(r->btf, id);
+ if (btf_is_composite(split_t)) {
+ err = btf_type_visit_type_ids(split_t, btf_mark_embedded_composite_type_ids,
+ r);
+ if (err < 0)
+ goto done;
+ }
+ }
+
+ /* Collect name counts for composite types in base BTF. If multiple
+ * instances of a struct/union of the same name exist, we need to use
+ * size to determine which to map to since name alone is ambiguous.
+ */
+ base_name_cnt = calloc(r->base_str_len, sizeof(*base_name_cnt));
+ if (!base_name_cnt) {
+ err = -ENOMEM;
+ goto done;
+ }
+ for (id = 1; id < r->nr_base_types; id++) {
+ base_t = btf_type_by_id(r->base_btf, id);
+ if (!btf_is_composite(base_t) || !base_t->name_off)
+ continue;
+ if (base_name_cnt[base_t->name_off] < 255)
+ base_name_cnt[base_t->name_off]++;
+ }
+
+ /* Now search base BTF for matching distilled base BTF types. */
+ for (id = 1; id < r->nr_base_types; id++) {
+ struct btf_name_info *dist_name_info, base_name_info = {};
+ int dist_kind, base_kind;
+
+ base_t = btf_type_by_id(r->base_btf, id);
+ /* distilled base consists of named types only. */
+ if (!base_t->name_off)
+ continue;
+ base_kind = btf_kind(base_t);
+ base_name_info.id = id;
+ base_name_info.name = btf__name_by_offset(r->base_btf, base_t->name_off);
+ switch (base_kind) {
+ case BTF_KIND_INT:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_ENUM:
+ case BTF_KIND_ENUM64:
+ /* These types should match both name and size */
+ base_name_info.needs_size = true;
+ base_name_info.size = base_t->size;
+ break;
+ case BTF_KIND_FWD:
+ /* No size considerations for fwds. */
+ break;
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ /* Size only needs to be used for struct/union if there
+ * are multiple types in base BTF with the same name.
+ * If there are multiple _distilled_ types with the same
+ * name (a very unlikely scenario), that doesn't matter
+ * unless corresponding _base_ types to match them are
+ * missing.
+ */
+ base_name_info.needs_size = base_name_cnt[base_t->name_off] > 1;
+ base_name_info.size = base_t->size;
+ break;
+ default:
+ continue;
+ }
+ dist_name_info = bsearch(&base_name_info, dist_base_info_sorted,
+ r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
+ cmp_btf_name_size);
+ if (!dist_name_info)
+ continue;
+ if (!dist_name_info->id || dist_name_info->id >= r->nr_dist_base_types) {
+ pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
+ id, dist_name_info->id);
+ err = -EINVAL;
+ goto done;
+ }
+ dist_t = btf_type_by_id(r->dist_base_btf, dist_name_info->id);
+ dist_kind = btf_kind(dist_t);
+
+ /* Validate that the found distilled type is compatible.
+ * Do not error out on mismatch as another match may occur
+ * for an identically-named type.
+ */
+ switch (dist_kind) {
+ case BTF_KIND_FWD:
+ switch (base_kind) {
+ case BTF_KIND_FWD:
+ if (btf_kflag(dist_t) != btf_kflag(base_t))
+ continue;
+ break;
+ case BTF_KIND_STRUCT:
+ if (btf_kflag(base_t))
+ continue;
+ break;
+ case BTF_KIND_UNION:
+ if (!btf_kflag(base_t))
+ continue;
+ break;
+ default:
+ continue;
+ }
+ break;
+ case BTF_KIND_INT:
+ if (dist_kind != base_kind ||
+ btf_int_encoding(base_t) != btf_int_encoding(dist_t))
+ continue;
+ break;
+ case BTF_KIND_FLOAT:
+ if (dist_kind != base_kind)
+ continue;
+ break;
+ case BTF_KIND_ENUM:
+ /* ENUM and ENUM64 are encoded as sized ENUM in
+ * distilled base BTF.
+ */
+ if (dist_kind != base_kind && base_kind != BTF_KIND_ENUM64)
+ continue;
+ break;
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ /* size verification is required for embedded
+ * struct/unions.
+ */
+ if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED &&
+ base_t->size != dist_t->size)
+ continue;
+ break;
+ default:
+ continue;
+ }
+ /* map id and name */
+ r->id_map[dist_name_info->id] = id;
+ r->str_map[dist_t->name_off] = base_t->name_off;
+ }
+ /* ensure all distilled BTF ids now have a mapping... */
+ for (id = 1; id < r->nr_dist_base_types; id++) {
+ const char *name;
+
+ if (r->id_map[id] && r->id_map[id] != BTF_IS_EMBEDDED)
+ continue;
+ dist_t = btf_type_by_id(r->dist_base_btf, id);
+ name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
+ pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
+ name, id);
+ err = -EINVAL;
+ break;
+ }
+done:
+ free(base_name_cnt);
+ free(dist_base_info_sorted);
+ return err;
+}
+
+/* distilled base should only have named int/float/enum/fwd/struct/union types. */
+static int btf_relocate_validate_distilled_base(struct btf_relocate *r)
+{
+ unsigned int i;
+
+ for (i = 1; i < r->nr_dist_base_types; i++) {
+ struct btf_type *t = btf_type_by_id(r->dist_base_btf, i);
+ int kind = btf_kind(t);
+
+ switch (kind) {
+ case BTF_KIND_INT:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_ENUM:
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION:
+ case BTF_KIND_FWD:
+ if (t->name_off)
+ break;
+ pr_warn("type [%d], kind [%d] is invalid for distilled base BTF; it is anonymous\n",
+ i, kind);
+ return -EINVAL;
+ default:
+ pr_warn("type [%d] in distilled base BTF has unexpected kind [%d]\n",
+ i, kind);
+ return -EINVAL;
+ }
+ }
+ return 0;
+}
+
+static int btf_relocate_rewrite_strs(__u32 *str_off, void *ctx)
+{
+ struct btf_relocate *r = ctx;
+ int off;
+
+ if (!*str_off)
+ return 0;
+ if (*str_off >= r->dist_str_len) {
+ *str_off += r->base_str_len - r->dist_str_len;
+ } else {
+ off = r->str_map[*str_off];
+ if (!off) {
+ pr_warn("string '%s' [offset %u] is not mapped to base BTF\n",
+ btf__str_by_offset(r->btf, *str_off), *str_off);
+ return -ENOENT;
+ }
+ *str_off = off;
+ }
+ return 0;
+}
+
+/* If successful, output of relocation is updated BTF with base BTF pointing
+ * at base_btf, and type ids, strings adjusted accordingly.
+ */
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **id_map)
+{
+ unsigned int nr_types = btf__type_cnt(btf);
+ const struct btf_header *dist_base_hdr;
+ const struct btf_header *base_hdr;
+ struct btf_relocate r = {};
+ struct btf_type *t;
+ int err = 0;
+ __u32 id, i;
+
+ r.dist_base_btf = btf__base_btf(btf);
+ if (!base_btf || !r.dist_base_btf || r.dist_base_btf == base_btf)
+ return -EINVAL;
+
+ r.nr_dist_base_types = btf__type_cnt(r.dist_base_btf);
+ r.nr_base_types = btf__type_cnt(base_btf);
+ r.nr_split_types = nr_types - r.nr_dist_base_types;
+ r.btf = btf;
+ r.base_btf = base_btf;
+
+ r.id_map = calloc(nr_types, sizeof(*r.id_map));
+ r.str_map = calloc(btf_header(r.dist_base_btf)->str_len, sizeof(*r.str_map));
+ dist_base_hdr = btf_header(r.dist_base_btf);
+ base_hdr = btf_header(r.base_btf);
+ r.dist_str_len = dist_base_hdr->str_len;
+ r.base_str_len = base_hdr->str_len;
+ if (!r.id_map || !r.str_map) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ err = btf_relocate_validate_distilled_base(&r);
+ if (err)
+ goto err_out;
+
+ /* Split BTF ids need to be adjusted as base and distilled base
+ * have different numbers of types, changing the start id of split
+ * BTF.
+ */
+ for (id = r.nr_dist_base_types; id < nr_types; id++)
+ r.id_map[id] = id + r.nr_base_types - r.nr_dist_base_types;
+
+ /* Build a map from distilled base ids to actual base BTF ids; it is used
+ * to update split BTF id references. Also build a str_map mapping from
+ * distilled base BTF names to base BTF names.
+ */
+ err = btf_relocate_map_distilled_base(&r);
+ if (err)
+ goto err_out;
+
+ /* Next, rewrite type ids in split BTF, replacing split ids with updated
+ * ids based on number of types in base BTF, and base ids with
+ * relocated ids from base_btf.
+ */
+ for (i = 0, id = r.nr_dist_base_types; i < r.nr_split_types; i++, id++) {
+ t = btf_type_by_id(btf, id);
+ err = btf_type_visit_type_ids(t, btf_relocate_rewrite_type_id, &r);
+ if (err)
+ goto err_out;
+ }
+ /* String offsets now need to be updated using the str_map. */
+ for (i = 0; i < r.nr_split_types; i++) {
+ t = btf_type_by_id(btf, i + r.nr_dist_base_types);
+ err = btf_type_visit_str_offs(t, btf_relocate_rewrite_strs, &r);
+ if (err)
+ goto err_out;
+ }
+ /* Finally reset base BTF to be base_btf */
+ btf_set_base_btf(btf, base_btf);
+
+ if (id_map) {
+ *id_map = r.id_map;
+ r.id_map = NULL;
+ }
+err_out:
+ free(r.id_map);
+ free(r.str_map);
+ return err;
+}
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 9e69d6e2a512..b333e78bcd96 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -420,6 +420,7 @@ LIBBPF_1.4.0 {
LIBBPF_1.5.0 {
global:
btf__distill_base;
+ btf__relocate;
bpf_program__attach_sockmap;
ring__consume_n;
ring_buffer__consume_n;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index a0dcfb82e455..a8ce5fb5a8c4 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -234,6 +234,9 @@ struct btf_type;
struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
const char *btf_kind_str(const struct btf_type *t);
const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
+const struct btf_header *btf_header(const struct btf *btf);
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **id_map);
static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
{
--
2.31.1
* [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover BTF relocation
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (2 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-31 2:23 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently Alan Maguire
` (4 subsequent siblings)
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Ensure relocated BTF looks as expected; in this case identical to
original split BTF, with a few duplicate anonymous types added to
split BTF by the relocation process.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
---
.../selftests/bpf/prog_tests/btf_distill.c | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
index 5c3a38747962..80544dd562b7 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
@@ -217,6 +217,73 @@ static void test_distilled_base(void)
"\t'p1' type_id=1",
"[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
+ if (!ASSERT_EQ(btf__relocate(btf4, btf1), 0, "relocate_split"))
+ goto cleanup;
+
+ VALIDATE_RAW_BTF(
+ btf4,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] STRUCT 's1' size=8 vlen=1\n"
+ "\t'f1' type_id=2 bits_offset=0",
+ "[4] STRUCT '(anon)' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=3 bits_offset=32",
+ "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+ "[6] UNION 'u1' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=2 bits_offset=0",
+ "[7] UNION '(anon)' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'v1' val=1",
+ "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'av1' val=2",
+ "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1024",
+ "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1025",
+ "[12] STRUCT 'unneeded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[13] STRUCT 'embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+ "[16] STRUCT 'from_proto' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[17] UNION 'u1' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[18] PTR '(anon)' type_id=3",
+ "[19] PTR '(anon)' type_id=30",
+ "[20] CONST '(anon)' type_id=6",
+ "[21] RESTRICT '(anon)' type_id=31",
+ "[22] VOLATILE '(anon)' type_id=8",
+ "[23] TYPEDEF 'et' type_id=32",
+ "[24] CONST '(anon)' type_id=10",
+ "[25] PTR '(anon)' type_id=33",
+ "[26] STRUCT 'with_embedded' size=4 vlen=1\n"
+ "\t'f1' type_id=13 bits_offset=0",
+ "[27] FUNC 'fn' type_id=34 linkage=static",
+ "[28] TYPEDEF 'arraytype' type_id=35",
+ "[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=16",
+ /* below here are (duplicate) anon base types added by distill
+ * process to split BTF.
+ */
+ "[30] STRUCT '(anon)' size=12 vlen=2\n"
+ "\t'f1' type_id=1 bits_offset=0\n"
+ "\t'f2' type_id=3 bits_offset=32",
+ "[31] UNION '(anon)' size=4 vlen=1\n"
+ "\t'f1' type_id=1 bits_offset=0",
+ "[32] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+ "\t'av1' val=2",
+ "[33] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+ "\t'v1' val=1025",
+ "[34] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[35] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
+
cleanup:
btf__free(btf4);
btf__free(btf3);
--
2.31.1
* [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (3 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-31 18:57 ` Andrii Nakryiko
2024-05-28 12:24 ` [PATCH v5 bpf-next 6/9] resolve_btfids: handle presence of .BTF.base section Alan Maguire
` (3 subsequent siblings)
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
From: Eduard Zingerman <eddyz87@gmail.com>
Update btf_parse_elf() to check if .BTF.base section is present.
The logic is as follows:
if .BTF.base section exists:
distilled_base := btf_new(.BTF.base)
if distilled_base:
btf := btf_new(.BTF, .base_btf=distilled_base)
if base_btf:
btf_relocate(btf, base_btf)
else:
btf := btf_new(.BTF)
return btf
In other words:
- if .BTF.base section exists, load BTF from it and use it as a base
for .BTF load;
- if base_btf is specified and .BTF.base section exists, relocate newly
loaded .BTF against base_btf.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
---
tools/lib/bpf/btf.c | 151 +++++++++++++++++++++++++++++---------------
tools/lib/bpf/btf.h | 1 +
2 files changed, 102 insertions(+), 50 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index cb762d7a5dd7..b57f74eedda0 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -114,7 +114,10 @@ struct btf {
/* a set of unique strings */
struct strset *strs_set;
/* whether strings are already deduplicated */
- bool strs_deduped;
+ unsigned strs_deduped:1;
+
+ /* whether base_btf should be freed in btf_free for this instance */
+ unsigned owns_base:1;
/* BTF object FD, if loaded into kernel */
int fd;
@@ -969,6 +972,8 @@ void btf__free(struct btf *btf)
free(btf->raw_data);
free(btf->raw_data_swapped);
free(btf->type_offs);
+ if (btf->owns_base)
+ btf__free(btf->base_btf);
free(btf);
}
@@ -1084,53 +1089,38 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
return libbpf_ptr(btf_new(data, size, base_btf));
}
-static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
- struct btf_ext **btf_ext)
+struct elf_sections_info {
+ Elf_Data *btf_data;
+ Elf_Data *btf_ext_data;
+ Elf_Data *btf_base_data;
+};
+
+static int btf_find_elf_sections(Elf *elf, const char *path, struct elf_sections_info *info)
{
- Elf_Data *btf_data = NULL, *btf_ext_data = NULL;
- int err = 0, fd = -1, idx = 0;
- struct btf *btf = NULL;
Elf_Scn *scn = NULL;
- Elf *elf = NULL;
+ Elf_Data *data;
GElf_Ehdr ehdr;
size_t shstrndx;
+ int idx = 0;
- if (elf_version(EV_CURRENT) == EV_NONE) {
- pr_warn("failed to init libelf for %s\n", path);
- return ERR_PTR(-LIBBPF_ERRNO__LIBELF);
- }
-
- fd = open(path, O_RDONLY | O_CLOEXEC);
- if (fd < 0) {
- err = -errno;
- pr_warn("failed to open %s: %s\n", path, strerror(errno));
- return ERR_PTR(err);
- }
-
- err = -LIBBPF_ERRNO__FORMAT;
-
- elf = elf_begin(fd, ELF_C_READ, NULL);
- if (!elf) {
- pr_warn("failed to open %s as ELF file\n", path);
- goto done;
- }
if (!gelf_getehdr(elf, &ehdr)) {
pr_warn("failed to get EHDR from %s\n", path);
- goto done;
+ goto err;
}
if (elf_getshdrstrndx(elf, &shstrndx)) {
pr_warn("failed to get section names section index for %s\n",
path);
- goto done;
+ goto err;
}
if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL)) {
pr_warn("failed to get e_shstrndx from %s\n", path);
- goto done;
+ goto err;
}
while ((scn = elf_nextscn(elf, scn)) != NULL) {
+ Elf_Data **field;
GElf_Shdr sh;
char *name;
@@ -1138,43 +1128,103 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
if (gelf_getshdr(scn, &sh) != &sh) {
pr_warn("failed to get section(%d) header from %s\n",
idx, path);
- goto done;
+ goto err;
}
name = elf_strptr(elf, shstrndx, sh.sh_name);
if (!name) {
pr_warn("failed to get section(%d) name from %s\n",
idx, path);
- goto done;
+ goto err;
}
- if (strcmp(name, BTF_ELF_SEC) == 0) {
- btf_data = elf_getdata(scn, 0);
- if (!btf_data) {
- pr_warn("failed to get section(%d, %s) data from %s\n",
- idx, name, path);
- goto done;
- }
- continue;
- } else if (btf_ext && strcmp(name, BTF_EXT_ELF_SEC) == 0) {
- btf_ext_data = elf_getdata(scn, 0);
- if (!btf_ext_data) {
- pr_warn("failed to get section(%d, %s) data from %s\n",
- idx, name, path);
- goto done;
- }
+
+ if (strcmp(name, BTF_ELF_SEC) == 0)
+ field = &info->btf_data;
+ else if (strcmp(name, BTF_EXT_ELF_SEC) == 0)
+ field = &info->btf_ext_data;
+ else if (strcmp(name, BTF_BASE_ELF_SEC) == 0)
+ field = &info->btf_base_data;
+ else
continue;
+
+ data = elf_getdata(scn, 0);
+ if (!data) {
+ pr_warn("failed to get section(%d, %s) data from %s\n",
+ idx, name, path);
+ goto err;
}
+ *field = data;
}
- if (!btf_data) {
+ return 0;
+
+err:
+ return -LIBBPF_ERRNO__FORMAT;
+}
+
+static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
+ struct btf_ext **btf_ext)
+{
+ struct elf_sections_info info = {};
+ struct btf *distilled_base_btf = NULL;
+ struct btf *btf = NULL;
+ int err = 0, fd = -1;
+ Elf *elf = NULL;
+
+ if (elf_version(EV_CURRENT) == EV_NONE) {
+ pr_warn("failed to init libelf for %s\n", path);
+ return ERR_PTR(-LIBBPF_ERRNO__LIBELF);
+ }
+
+ fd = open(path, O_RDONLY | O_CLOEXEC);
+ if (fd < 0) {
+ err = -errno;
+ pr_warn("failed to open %s: %s\n", path, strerror(errno));
+ return ERR_PTR(err);
+ }
+
+ elf = elf_begin(fd, ELF_C_READ, NULL);
+ if (!elf) {
+ pr_warn("failed to open %s as ELF file\n", path);
+ goto done;
+ }
+
+ err = btf_find_elf_sections(elf, path, &info);
+ if (err)
+ goto done;
+
+ if (!info.btf_data) {
pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
err = -ENODATA;
goto done;
}
- btf = btf_new(btf_data->d_buf, btf_data->d_size, base_btf);
+
+ if (info.btf_base_data) {
+ distilled_base_btf = btf_new(info.btf_base_data->d_buf, info.btf_base_data->d_size,
+ NULL);
+ err = libbpf_get_error(distilled_base_btf);
+ if (err) {
+ distilled_base_btf = NULL;
+ goto done;
+ }
+ }
+
+ btf = btf_new(info.btf_data->d_buf, info.btf_data->d_size,
+ distilled_base_btf ? distilled_base_btf : base_btf);
err = libbpf_get_error(btf);
if (err)
goto done;
+ if (distilled_base_btf && base_btf) {
+ err = btf__relocate(btf, base_btf);
+ if (err)
+ goto done;
+ btf__free(distilled_base_btf);
+ distilled_base_btf = NULL;
+ }
+
+ if (distilled_base_btf)
+ btf->owns_base = true;
+
switch (gelf_getclass(elf)) {
case ELFCLASS32:
btf__set_pointer_size(btf, 4);
@@ -1187,8 +1237,8 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
break;
}
- if (btf_ext && btf_ext_data) {
- *btf_ext = btf_ext__new(btf_ext_data->d_buf, btf_ext_data->d_size);
+ if (btf_ext && info.btf_ext_data) {
+ *btf_ext = btf_ext__new(info.btf_ext_data->d_buf, info.btf_ext_data->d_size);
err = libbpf_get_error(*btf_ext);
if (err)
goto done;
@@ -1205,6 +1255,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
if (btf_ext)
btf_ext__free(*btf_ext);
+ btf__free(distilled_base_btf);
btf__free(btf);
return ERR_PTR(err);
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8a93120b7385..b68d216837a9 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -18,6 +18,7 @@ extern "C" {
#define BTF_ELF_SEC ".BTF"
#define BTF_EXT_ELF_SEC ".BTF.ext"
+#define BTF_BASE_ELF_SEC ".BTF.base"
#define MAPS_ELF_SEC ".maps"
struct btf;
--
2.31.1
* [PATCH v5 bpf-next 6/9] resolve_btfids: handle presence of .BTF.base section
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (4 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module Alan Maguire
` (2 subsequent siblings)
8 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Now that btf_parse_elf() handles .BTF.base section presence,
we need to ensure that resolve_btfids uses .BTF.base when present
rather than the vmlinux base BTF passed in via the -B option.
Detect .BTF.base section presence and unset the base BTF path
to ensure that BTF ELF parsing will do the right thing.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/bpf/resolve_btfids/main.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
index d9520cb826b3..de2012f25f71 100644
--- a/tools/bpf/resolve_btfids/main.c
+++ b/tools/bpf/resolve_btfids/main.c
@@ -409,6 +409,14 @@ static int elf_collect(struct object *obj)
obj->efile.idlist = data;
obj->efile.idlist_shndx = idx;
obj->efile.idlist_addr = sh.sh_addr;
+ } else if (!strcmp(name, BTF_BASE_ELF_SEC)) {
+ /* If a .BTF.base section is found, do not resolve
+ * BTF ids relative to vmlinux; resolve relative
+ * to the .BTF.base section instead. btf__parse_split()
+ * will take care of this once the base BTF it is
+ * passed is NULL.
+ */
+ obj->base_btf_path = NULL;
}
if (compressed_section_fix(elf, scn, &sh))
--
2.31.1
* [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (5 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 6/9] resolve_btfids: handle presence of .BTF.base section Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-28 18:27 ` Luis Chamberlain
2024-05-28 12:24 ` [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF Alan Maguire
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
...as this will allow split BTF modules with a base BTF
representation (rather than the full vmlinux BTF at time of
BTF encoding) to resolve their references to kernel types in a
way that is more resilient to small changes in kernel types.
This will allow modules that are not built every time the kernel is
built to provide more resilient BTF, rather than having it invalidated
every time BTF ids for core kernel types change.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/linux/module.h | 2 ++
kernel/module/main.c | 5 ++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/module.h b/include/linux/module.h
index ffa1c603163c..ccc5cc5e0850 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -510,6 +510,8 @@ struct module {
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
unsigned int btf_data_size;
void *btf_data;
+ unsigned int btf_base_data_size;
+ void *btf_base_data;
#endif
#ifdef CONFIG_JUMP_LABEL
struct jump_entry *jump_entries;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 91e185607d4b..bd2870d9b175 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2154,6 +2154,8 @@ static int find_module_sections(struct module *mod, struct load_info *info)
#endif
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
+ mod->btf_base_data = any_section_objs(info, ".BTF.base", 1,
+ &mod->btf_base_data_size);
#endif
#ifdef CONFIG_JUMP_LABEL
mod->jump_entries = section_objs(info, "__jump_table",
@@ -2578,8 +2580,9 @@ static noinline int do_init_module(struct module *mod)
}
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
- /* .BTF is not SHF_ALLOC and will get removed, so sanitize pointer */
+ /* .BTF is not SHF_ALLOC and will get removed, so sanitize pointers */
mod->btf_data = NULL;
+ mod->btf_base_data = NULL;
#endif
/*
* We want to free module_init, but be aware that kallsyms may be
--
2.31.1
* [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (6 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-31 9:34 ` Eduard Zingerman
2024-05-31 19:04 ` Andrii Nakryiko
2024-05-28 12:24 ` [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF Alan Maguire
8 siblings, 2 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Share relocation implementation with the kernel. As part of this,
we also need the type/string visitation functions so add them to a
btf_common.c file that also gets shared with the kernel. Relocation
code in kernel and userspace is identical save for the implementation
of the reparenting of split BTF to the relocated base BTF and
retrieval of BTF header from "struct btf"; these small functions
need separate user-space and kernel implementations.
One other wrinkle on the kernel side is that we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_relocate() optionally returns an array mapping from old BTF
ids to relocated ids, so we use that to fix up these references where
needed for kfuncs.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/linux/btf.h | 45 ++++++++++
kernel/bpf/Makefile | 10 ++-
kernel/bpf/btf.c | 168 +++++++++++++++++++++++++----------
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/btf.c | 130 ---------------------------
tools/lib/bpf/btf_iter.c | 143 +++++++++++++++++++++++++++++
tools/lib/bpf/btf_relocate.c | 23 +++++
7 files changed, 344 insertions(+), 177 deletions(-)
create mode 100644 tools/lib/bpf/btf_iter.c
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9e56fd12a9f..21cb33ba62b3 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -140,6 +140,7 @@ extern const struct file_operations btf_fops;
const char *btf_get_name(const struct btf *btf);
void btf_get(struct btf *btf);
void btf_put(struct btf *btf);
+const struct btf_header *btf_header(const struct btf *btf);
int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
struct btf *btf_get_by_fd(int fd);
int btf_get_info_by_fd(const struct btf *btf,
@@ -212,8 +213,10 @@ int btf_get_fd_by_id(u32 id);
u32 btf_obj_id(const struct btf *btf);
bool btf_is_kernel(const struct btf *btf);
bool btf_is_module(const struct btf *btf);
+bool btf_is_vmlinux(const struct btf *btf);
struct module *btf_try_get_module(const struct btf *btf);
u32 btf_nr_types(const struct btf *btf);
+struct btf *btf_base_btf(const struct btf *btf);
bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
const struct btf_member *m,
u32 expected_offset, u32 expected_size);
@@ -339,6 +342,11 @@ static inline u8 btf_int_offset(const struct btf_type *t)
return BTF_INT_OFFSET(*(u32 *)(t + 1));
}
+static inline __u8 btf_int_bits(const struct btf_type *t)
+{
+ return BTF_INT_BITS(*(__u32 *)(t + 1));
+}
+
static inline bool btf_type_is_scalar(const struct btf_type *t)
{
return btf_type_is_int(t) || btf_type_is_enum(t);
@@ -478,6 +486,11 @@ static inline struct btf_param *btf_params(const struct btf_type *t)
return (struct btf_param *)(t + 1);
}
+static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
+{
+ return (struct btf_decl_tag *)(t + 1);
+}
+
static inline int btf_id_cmp_func(const void *a, const void *b)
{
const int *pa = a, *pb = b;
@@ -515,9 +528,17 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
}
#endif
+typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
+typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
+
#ifdef CONFIG_BPF_SYSCALL
const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
const char *btf_name_by_offset(const struct btf *btf, u32 offset);
+const char *btf_str_by_offset(const struct btf *btf, u32 offset);
struct btf *btf_parse_vmlinux(void);
struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
@@ -543,6 +564,30 @@ static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
{
return NULL;
}
+
+static inline void btf_set_base_btf(struct btf *btf, const struct btf *base_btf)
+{
+ return;
+}
+
+static inline int btf_relocate(void *log, struct btf *btf, const struct btf *base_btf,
+ __u32 **map_ids)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit,
+ void *ctx)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit,
+ void *ctx)
+{
+ return -EOPNOTSUPP;
+}
+
static inline const char *btf_name_by_offset(const struct btf *btf,
u32 offset)
{
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7eb9ad3a3ae6..d9d148992fbf 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -50,5 +50,13 @@ endif
obj-$(CONFIG_BPF_PRELOAD) += preload/
obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
-$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
+
+# Some source files are common to libbpf.
+vpath %.c $(srctree)/kernel/bpf:$(srctree)/tools/lib/bpf
+
+$(obj)/%.o: %.c FORCE
$(call if_changed_rule,cc_o_c)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 821063660d9f..6715de908581 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -274,6 +274,7 @@ struct btf {
u32 start_str_off; /* first string offset (0 for base BTF) */
char name[MODULE_NAME_LEN];
bool kernel_btf;
+ __u32 *base_id_map; /* map from distilled base BTF -> vmlinux BTF ids */
};
enum verifier_phase {
@@ -530,6 +531,11 @@ static bool btf_type_is_decl_tag_target(const struct btf_type *t)
btf_type_is_var(t) || btf_type_is_typedef(t);
}
+bool btf_is_vmlinux(const struct btf *btf)
+{
+ return btf->kernel_btf && !btf->base_btf;
+}
+
u32 btf_nr_types(const struct btf *btf)
{
u32 total = 0;
@@ -772,7 +778,7 @@ static bool __btf_name_char_ok(char c, bool first)
return true;
}
-static const char *btf_str_by_offset(const struct btf *btf, u32 offset)
+const char *btf_str_by_offset(const struct btf *btf, u32 offset)
{
while (offset < btf->start_str_off)
btf = btf->base_btf;
@@ -1735,7 +1741,12 @@ static void btf_free(struct btf *btf)
kvfree(btf->types);
kvfree(btf->resolved_sizes);
kvfree(btf->resolved_ids);
- kvfree(btf->data);
+ /* vmlinux does not allocate btf->data, it simply points it at
+ * __start_BTF.
+ */
+ if (!btf_is_vmlinux(btf))
+ kvfree(btf->data);
+ kvfree(btf->base_id_map);
kfree(btf);
}
@@ -1764,6 +1775,23 @@ void btf_put(struct btf *btf)
}
}
+struct btf *btf_base_btf(const struct btf *btf)
+{
+ return btf->base_btf;
+}
+
+const struct btf_header *btf_header(const struct btf *btf)
+{
+ return &btf->hdr;
+}
+
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf)
+{
+ btf->base_btf = (struct btf *)base_btf;
+ btf->start_id = btf_nr_types(base_btf);
+ btf->start_str_off = base_btf->hdr.str_len;
+}
+
static int env_resolve_init(struct btf_verifier_env *env)
{
struct btf *btf = env->btf;
@@ -5982,23 +6010,15 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
BTF_ID_LIST(bpf_ctx_convert_btf_id)
BTF_ID(struct, bpf_ctx_convert)
-struct btf *btf_parse_vmlinux(void)
+static struct btf *btf_parse_base(struct btf_verifier_env *env, const char *name,
+ void *data, unsigned int data_size)
{
- struct btf_verifier_env *env = NULL;
- struct bpf_verifier_log *log;
struct btf *btf = NULL;
int err;
if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF))
return ERR_PTR(-ENOENT);
- env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
- if (!env)
- return ERR_PTR(-ENOMEM);
-
- log = &env->log;
- log->level = BPF_LOG_KERNEL;
-
btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
if (!btf) {
err = -ENOMEM;
@@ -6006,10 +6026,10 @@ struct btf *btf_parse_vmlinux(void)
}
env->btf = btf;
- btf->data = __start_BTF;
- btf->data_size = __stop_BTF - __start_BTF;
+ btf->data = data;
+ btf->data_size = data_size;
btf->kernel_btf = true;
- snprintf(btf->name, sizeof(btf->name), "vmlinux");
+ snprintf(btf->name, sizeof(btf->name), "%s", name);
err = btf_parse_hdr(env);
if (err)
@@ -6029,20 +6049,11 @@ struct btf *btf_parse_vmlinux(void)
if (err)
goto errout;
- /* btf_parse_vmlinux() runs under bpf_verifier_lock */
- bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
-
refcount_set(&btf->refcnt, 1);
- err = btf_alloc_id(btf);
- if (err)
- goto errout;
-
- btf_verifier_env_free(env);
return btf;
errout:
- btf_verifier_env_free(env);
if (btf) {
kvfree(btf->types);
kfree(btf);
@@ -6050,19 +6061,61 @@ struct btf *btf_parse_vmlinux(void)
return ERR_PTR(err);
}
+struct btf *btf_parse_vmlinux(void)
+{
+ struct btf_verifier_env *env = NULL;
+ struct bpf_verifier_log *log;
+ struct btf *btf;
+ int err;
+
+ env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+ if (!env)
+ return ERR_PTR(-ENOMEM);
+
+ log = &env->log;
+ log->level = BPF_LOG_KERNEL;
+ btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF);
+ if (IS_ERR(btf))
+ goto err_out;
+
+ /* btf_parse_vmlinux() runs under bpf_verifier_lock */
+ bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
+ err = btf_alloc_id(btf);
+ if (err) {
+ btf_free(btf);
+ btf = ERR_PTR(err);
+ }
+err_out:
+ btf_verifier_env_free(env);
+ return btf;
+}
+
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
-static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
+/* If .BTF_ids section was created with distilled base BTF, both base and
+ * split BTF ids will need to be mapped to actual base/split ids for
+ * BTF now that it has been relocated.
+ */
+static __u32 btf_relocate_id(const struct btf *btf, __u32 id)
{
+ if (!btf->base_btf || !btf->base_id_map)
+ return id;
+ return btf->base_id_map[id];
+}
+
+static struct btf *btf_parse_module(const char *module_name, const void *data,
+ unsigned int data_size, void *base_data,
+ unsigned int base_data_size)
+{
+ struct btf *btf = NULL, *vmlinux_btf, *base_btf = NULL;
struct btf_verifier_env *env = NULL;
struct bpf_verifier_log *log;
- struct btf *btf = NULL, *base_btf;
- int err;
+ int err = 0;
- base_btf = bpf_get_btf_vmlinux();
- if (IS_ERR(base_btf))
- return base_btf;
- if (!base_btf)
+ vmlinux_btf = bpf_get_btf_vmlinux();
+ if (IS_ERR(vmlinux_btf))
+ return vmlinux_btf;
+ if (!vmlinux_btf)
return ERR_PTR(-EINVAL);
env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
@@ -6072,6 +6125,16 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
log = &env->log;
log->level = BPF_LOG_KERNEL;
+ if (base_data) {
+ base_btf = btf_parse_base(env, ".BTF.base", base_data, base_data_size);
+ if (IS_ERR(base_btf)) {
+ err = PTR_ERR(base_btf);
+ goto errout;
+ }
+ } else {
+ base_btf = vmlinux_btf;
+ }
+
btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
if (!btf) {
err = -ENOMEM;
@@ -6111,12 +6174,22 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
if (err)
goto errout;
+ if (base_btf != vmlinux_btf) {
+ err = btf_relocate(btf, vmlinux_btf, &btf->base_id_map);
+ if (err)
+ goto errout;
+ btf_free(base_btf);
+ base_btf = vmlinux_btf;
+ }
+
btf_verifier_env_free(env);
refcount_set(&btf->refcnt, 1);
return btf;
errout:
btf_verifier_env_free(env);
+ if (base_btf != vmlinux_btf)
+ btf_free(base_btf);
if (btf) {
kvfree(btf->data);
kvfree(btf->types);
@@ -7669,7 +7742,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
err = -ENOMEM;
goto out;
}
- btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
+ btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
+ mod->btf_base_data, mod->btf_base_data_size);
if (IS_ERR(btf)) {
kfree(btf_mod);
if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
@@ -7993,7 +8067,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
bool add_filter = !!kset->filter;
struct btf_kfunc_set_tab *tab;
struct btf_id_set8 *set;
- u32 set_cnt;
+ u32 set_cnt, i;
int ret;
if (hook >= BTF_KFUNC_HOOK_MAX) {
@@ -8039,21 +8113,15 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
goto end;
}
- /* We don't need to allocate, concatenate, and sort module sets, because
- * only one is allowed per hook. Hence, we can directly assign the
- * pointer and return.
- */
- if (!vmlinux_set) {
- tab->sets[hook] = add_set;
- goto do_add_filter;
- }
-
/* In case of vmlinux sets, there may be more than one set being
* registered per hook. To create a unified set, we allocate a new set
* and concatenate all individual sets being registered. While each set
* is individually sorted, they may become unsorted when concatenated,
* hence re-sorting the final set again is required to make binary
* searching the set using btf_id_set8_contains function work.
+ *
+ * For module sets, we need to allocate as we may need to relocate
+ * BTF ids.
*/
set_cnt = set ? set->cnt : 0;
@@ -8083,11 +8151,14 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
/* Concatenate the two sets */
memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0]));
+ /* Now that the set is copied, update with relocated BTF ids */
+ for (i = set->cnt; i < set->cnt + add_set->cnt; i++)
+ set->pairs[i].id = btf_relocate_id(btf, set->pairs[i].id);
+
set->cnt += add_set->cnt;
sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
-do_add_filter:
if (add_filter) {
hook_filter = &tab->hook_filters[hook];
hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
@@ -8207,7 +8278,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
return PTR_ERR(btf);
for (i = 0; i < kset->set->cnt; i++) {
- ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id,
+ ret = btf_check_kfunc_protos(btf, btf_relocate_id(btf, kset->set->pairs[i].id),
kset->set->pairs[i].flags);
if (ret)
goto err_out;
@@ -8306,7 +8377,7 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
{
struct btf_id_dtor_kfunc_tab *tab;
struct btf *btf;
- u32 tab_cnt;
+ u32 tab_cnt, i;
int ret;
btf = btf_get_module_btf(owner);
@@ -8357,6 +8428,13 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
btf->dtor_kfunc_tab = tab;
memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+
+ /* remap BTF ids based on BTF relocation (if any) */
+ for (i = tab_cnt; i < tab_cnt + add_cnt; i++) {
+ tab->dtors[i].btf_id = btf_relocate_id(btf, tab->dtors[i].btf_id);
+ tab->dtors[i].kfunc_btf_id = btf_relocate_id(btf, tab->dtors[i].kfunc_btf_id);
+ }
+
tab->cnt += add_cnt;
sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 336da6844d42..e2cd558ca0b4 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
- usdt.o zip.o elf.o features.o btf_relocate.o
+ usdt.o zip.o elf.o features.o btf_iter.o btf_relocate.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index b57f74eedda0..6c3c17bb6ca7 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5060,136 +5060,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
return btf__parse_split(path, vmlinux_btf);
}
-int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
-{
- int i, n, err;
-
- switch (btf_kind(t)) {
- case BTF_KIND_INT:
- case BTF_KIND_FLOAT:
- case BTF_KIND_ENUM:
- case BTF_KIND_ENUM64:
- return 0;
-
- case BTF_KIND_FWD:
- case BTF_KIND_CONST:
- case BTF_KIND_VOLATILE:
- case BTF_KIND_RESTRICT:
- case BTF_KIND_PTR:
- case BTF_KIND_TYPEDEF:
- case BTF_KIND_FUNC:
- case BTF_KIND_VAR:
- case BTF_KIND_DECL_TAG:
- case BTF_KIND_TYPE_TAG:
- return visit(&t->type, ctx);
-
- case BTF_KIND_ARRAY: {
- struct btf_array *a = btf_array(t);
-
- err = visit(&a->type, ctx);
- err = err ?: visit(&a->index_type, ctx);
- return err;
- }
-
- case BTF_KIND_STRUCT:
- case BTF_KIND_UNION: {
- struct btf_member *m = btf_members(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->type, ctx);
- if (err)
- return err;
- }
- return 0;
- }
-
- case BTF_KIND_FUNC_PROTO: {
- struct btf_param *m = btf_params(t);
-
- err = visit(&t->type, ctx);
- if (err)
- return err;
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->type, ctx);
- if (err)
- return err;
- }
- return 0;
- }
-
- case BTF_KIND_DATASEC: {
- struct btf_var_secinfo *m = btf_var_secinfos(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->type, ctx);
- if (err)
- return err;
- }
- return 0;
- }
-
- default:
- return -EINVAL;
- }
-}
-
-int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
-{
- int i, n, err;
-
- err = visit(&t->name_off, ctx);
- if (err)
- return err;
-
- switch (btf_kind(t)) {
- case BTF_KIND_STRUCT:
- case BTF_KIND_UNION: {
- struct btf_member *m = btf_members(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->name_off, ctx);
- if (err)
- return err;
- }
- break;
- }
- case BTF_KIND_ENUM: {
- struct btf_enum *m = btf_enum(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->name_off, ctx);
- if (err)
- return err;
- }
- break;
- }
- case BTF_KIND_ENUM64: {
- struct btf_enum64 *m = btf_enum64(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->name_off, ctx);
- if (err)
- return err;
- }
- break;
- }
- case BTF_KIND_FUNC_PROTO: {
- struct btf_param *m = btf_params(t);
-
- for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
- err = visit(&m->name_off, ctx);
- if (err)
- return err;
- }
- break;
- }
- default:
- break;
- }
-
- return 0;
-}
-
int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
{
const struct btf_ext_info *seg;
diff --git a/tools/lib/bpf/btf_iter.c b/tools/lib/bpf/btf_iter.c
new file mode 100644
index 000000000000..47d615dee8c5
--- /dev/null
+++ b/tools/lib/bpf/btf_iter.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2021 Facebook */
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+
+#define btf_var_secinfos(t) (struct btf_var_secinfo *)btf_type_var_secinfo(t)
+
+#else
+#include "btf.h"
+#include "libbpf_internal.h"
+#endif
+
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
+{
+ int i, n, err;
+
+ switch (btf_kind(t)) {
+ case BTF_KIND_INT:
+ case BTF_KIND_FLOAT:
+ case BTF_KIND_ENUM:
+ case BTF_KIND_ENUM64:
+ return 0;
+
+ case BTF_KIND_FWD:
+ case BTF_KIND_CONST:
+ case BTF_KIND_VOLATILE:
+ case BTF_KIND_RESTRICT:
+ case BTF_KIND_PTR:
+ case BTF_KIND_TYPEDEF:
+ case BTF_KIND_FUNC:
+ case BTF_KIND_VAR:
+ case BTF_KIND_DECL_TAG:
+ case BTF_KIND_TYPE_TAG:
+ return visit(&t->type, ctx);
+
+ case BTF_KIND_ARRAY: {
+ struct btf_array *a = btf_array(t);
+
+ err = visit(&a->type, ctx);
+ err = err ?: visit(&a->index_type, ctx);
+ return err;
+ }
+
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION: {
+ struct btf_member *m = btf_members(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->type, ctx);
+ if (err)
+ return err;
+ }
+ return 0;
+ }
+ case BTF_KIND_FUNC_PROTO: {
+ struct btf_param *m = btf_params(t);
+
+ err = visit(&t->type, ctx);
+ if (err)
+ return err;
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->type, ctx);
+ if (err)
+ return err;
+ }
+ return 0;
+ }
+
+ case BTF_KIND_DATASEC: {
+ struct btf_var_secinfo *m = btf_var_secinfos(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->type, ctx);
+ if (err)
+ return err;
+ }
+ return 0;
+ }
+
+ default:
+ return -EINVAL;
+ }
+}
+
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
+{
+ int i, n, err;
+
+ err = visit(&t->name_off, ctx);
+ if (err)
+ return err;
+
+ switch (btf_kind(t)) {
+ case BTF_KIND_STRUCT:
+ case BTF_KIND_UNION: {
+ struct btf_member *m = btf_members(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->name_off, ctx);
+ if (err)
+ return err;
+ }
+ break;
+ }
+ case BTF_KIND_ENUM: {
+ struct btf_enum *m = btf_enum(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->name_off, ctx);
+ if (err)
+ return err;
+ }
+ break;
+ }
+ case BTF_KIND_ENUM64: {
+ struct btf_enum64 *m = btf_enum64(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->name_off, ctx);
+ if (err)
+ return err;
+ }
+ break;
+ }
+ case BTF_KIND_FUNC_PROTO: {
+ struct btf_param *m = btf_params(t);
+
+ for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+ err = visit(&m->name_off, ctx);
+ if (err)
+ return err;
+ }
+ break;
+ }
+ default:
+ break;
+ }
+
+ return 0;
+}
diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
index f2e91cdfb5cc..3245c5fd0c75 100644
--- a/tools/lib/bpf/btf_relocate.c
+++ b/tools/lib/bpf/btf_relocate.c
@@ -5,11 +5,34 @@
#define _GNU_SOURCE
#endif
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/bsearch.h>
+#include <linux/btf.h>
+#include <linux/sort.h>
+#include <linux/string.h>
+#include <linux/bpf_verifier.h>
+
+#define btf_type_by_id (struct btf_type *)btf_type_by_id
+#define btf__type_cnt btf_nr_types
+#define btf__base_btf btf_base_btf
+#define btf__name_by_offset btf_name_by_offset
+#define btf__str_by_offset btf_str_by_offset
+#define btf_kflag btf_type_kflag
+
+#define calloc(nmemb, sz) kvcalloc(nmemb, sz, GFP_KERNEL | __GFP_NOWARN)
+#define free(ptr) kvfree(ptr)
+#define qsort(base, num, sz, cmp) sort(base, num, sz, cmp, NULL)
+
+#else
+
#include "btf.h"
#include "bpf.h"
#include "libbpf.h"
#include "libbpf_internal.h"
+#endif /* __KERNEL__ */
+
struct btf;
struct btf_relocate {
--
2.31.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
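The btf_type_visit_str_offs() helper added in the hunk above applies a callback to every string offset a type references; the relocation code uses exactly this to rewrite name_off values via a str_map table. A self-contained toy model of that callback pattern (toy_type/remap_str_off are hypothetical names, not libbpf code):

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the str_off_visit_fn pattern: visit every string offset a
 * "type" references, here remapping old offsets to new ones. */
typedef int (*str_off_visit_fn)(unsigned int *str_off, void *ctx);

struct toy_member { unsigned int name_off; };
struct toy_type {
	unsigned int name_off;
	unsigned int vlen;
	struct toy_member members[4];
};

/* Analogue of btf_type_visit_str_offs(): apply visit() to the type's own
 * name and to each member name, stopping on the first error. */
static int toy_visit_str_offs(struct toy_type *t, str_off_visit_fn visit, void *ctx)
{
	unsigned int i;
	int err;

	err = visit(&t->name_off, ctx);
	if (err)
		return err;
	for (i = 0; i < t->vlen; i++) {
		err = visit(&t->members[i].name_off, ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example visitor: translate offsets through a str_map table, as the
 * relocation code does when rebasing split BTF strings. */
struct remap_ctx {
	const unsigned int *str_map;
	unsigned int n;
};

static int remap_str_off(unsigned int *str_off, void *ctx)
{
	struct remap_ctx *r = ctx;

	if (*str_off >= r->n)
		return -22; /* -EINVAL */
	*str_off = r->str_map[*str_off];
	return 0;
}
```

The real implementation differs only in walking the actual BTF kinds (struct members, enum values, func params); the control flow is the same.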
* [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
` (7 preceding siblings ...)
2024-05-28 12:24 ` [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
@ 2024-05-28 12:24 ` Alan Maguire
2024-05-31 19:06 ` Andrii Nakryiko
8 siblings, 1 reply; 29+ messages in thread
From: Alan Maguire @ 2024-05-28 12:24 UTC (permalink / raw)
To: andrii, jolsa, acme, quentin
Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan, Alan Maguire
Support creation of module BTF along with distilled base BTF;
the latter is stored in a .BTF.base ELF section and supplements
split BTF references to base BTF with information about base types,
allowing for later relocation of split BTF with a (possibly
changed) base. resolve_btfids detects the presence of a .BTF.base
section and will use it instead of the base BTF it is passed in
BTF id resolution.
Modules will be built with a distilled .BTF.base section for an external
module build, i.e.

make -C <kernel_build_dir> M=<path/to/module>

...while an in-tree module build as part of a normal kernel build will
not generate distilled base BTF; this is because in-tree modules
change with the kernel and do not require BTF relocation against the
running vmlinux.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
scripts/Makefile.btf | 5 +++++
scripts/Makefile.modfinal | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index bca8a8f26ea4..191b4903e864 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -21,8 +21,13 @@ else
# Switch to using --btf_features for v1.26 and later.
pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func
+ifneq ($(KBUILD_EXTMOD),)
+module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base
+endif
+
endif
pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
export PAHOLE_FLAGS := $(pahole-flags-y)
+export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 79fcf2731686..6d2b8da98ee5 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -39,7 +39,7 @@ quiet_cmd_btf_ko = BTF [M] $@
if [ ! -f vmlinux ]; then \
printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
else \
- LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
+ LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \
$(RESOLVE_BTFIDS) -b vmlinux $@; \
fi;
--
2.31.1
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module
2024-05-28 12:24 ` [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module Alan Maguire
@ 2024-05-28 18:27 ` Luis Chamberlain
0 siblings, 0 replies; 29+ messages in thread
From: Luis Chamberlain @ 2024-05-28 18:27 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, nathan
On Tue, May 28, 2024 at 01:24:06PM +0100, Alan Maguire wrote:
> ...as this will allow split BTF modules with a base BTF
> representation (rather than the full vmlinux BTF at time of
> BTF encoding) to resolve their references to kernel types in a
> way that is more resilient to small changes in kernel types.
>
> This will allow modules that are not built every time the kernel
> is to provide more resilient BTF, rather than have it invalidated
> every time BTF ids for core kernel types change.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Luis
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
2024-05-28 12:24 ` [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
@ 2024-05-30 19:58 ` Eduard Zingerman
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Zingerman @ 2024-05-30 19:58 UTC (permalink / raw)
To: Alan Maguire, andrii, jolsa, acme, quentin
Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required. Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF. Here the attempt is made
> to store additional context to make split BTF more robust.
>
> This context comes in the form of distilled base BTF providing minimal
> information (name and - in some cases - size) for base INTs, FLOATs,
> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> points at that base and contains any additional types needed (such as
> TYPEDEF, PTR and anonymous STRUCT/UNION declarations). This
> information constitutes the minimal BTF representation needed to
> disambiguate or remove split BTF references to base BTF. The rules
> are as follows:
>
> - INT, FLOAT, FWD are recorded in full.
> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
> will be encoded as a zero-member sized STRUCT/UNION (preserving
> size for later relocation checks). Only base BTF STRUCT/UNIONs
> that are either embedded in split BTF STRUCT/UNIONs or that have
> multiple STRUCT/UNION instances of the same name will _need_ size
> checks at relocation time, but as it is possible a different set of
> types will be duplicates in the later to-be-resolved base BTF,
> we preserve size information for all named STRUCT/UNIONs.
> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
> with no values) of the same size is used.
> - in all other cases, the type is added to the new split BTF.
>
> Avoiding struct/union/enum/enum64 expansion is important to keep the
> distilled base BTF representation to a minimum size.
>
> When successful, new representations of the distilled base BTF and new
> split BTF that refers to it are returned. Both need to be freed by the
> caller.
>
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate distilled base BTF with a
> 0-member STRUCT sk_buff of the appropriate size, and the split BTF
> will refer to it instead.
>
> Tools like pahole can utilize such split BTF to populate the .BTF
> section (split BTF) and an additional .BTF.base section. Then
> when the split BTF is loaded, the distilled base BTF can be used
> to relocate split BTF to reference the current (and possibly changed)
> base BTF.
>
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated, we can use the distilled base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (relocated) base BTF sk_buff type id.
>
> Distilled base BTF is small; when building a kernel with all modules
> using distilled base BTF as a test, overall module size grew by only
> 5.3Mb total across ~2700 modules.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
I think this looks good, don't see any logical inconsistencies.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
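The distillation rules in the commit message above can be modeled with a short sketch (illustrative only: toy_type/distill are hypothetical names, not libbpf code, and the sizes used in the test are made up):

```c
#include <assert.h>
#include <stddef.h>

/* Kinds relevant to distilled base BTF. */
enum toy_kind { K_INT, K_FLOAT, K_FWD, K_STRUCT, K_UNION, K_ENUM, K_ENUM64 };

struct toy_type {
	enum toy_kind kind;
	const char *name;	/* NULL if anonymous */
	unsigned int size;
	unsigned int vlen;	/* member/value count */
};

/* Apply the quoted rules: INT/FLOAT/FWD are recorded in full; a named
 * STRUCT/UNION becomes a zero-member struct/union preserving name and size;
 * a named ENUM[64] becomes a value-less sized ENUM. Anonymous composites are
 * not distilled (they stay in split BTF), modeled here by returning the type
 * unchanged. */
static struct toy_type distill(struct toy_type t)
{
	switch (t.kind) {
	case K_STRUCT:
	case K_UNION:
		if (t.name)
			t.vlen = 0;	/* keep name + size only */
		break;
	case K_ENUM:
	case K_ENUM64:
		if (t.name) {
			t.vlen = 0;	 /* ENUM fwd: no values */
			t.kind = K_ENUM; /* ENUM64 also encoded as sized ENUM */
		}
		break;
	default:			/* INT, FLOAT, FWD: recorded in full */
		break;
	}
	return t;
}
```

This is what makes the later name/size-based matching in btf_relocate() possible: the distilled form drops members but keeps exactly the identifying information (kind, name, size) needed to find the type again in a changed base BTF.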
* Re: [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation
2024-05-28 12:24 ` [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation Alan Maguire
@ 2024-05-30 20:23 ` Eduard Zingerman
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Zingerman @ 2024-05-30 20:23 UTC (permalink / raw)
To: Alan Maguire, andrii, jolsa, acme, quentin
Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
> Test generation of split+distilled base BTF, ensuring that
>
> - named base BTF STRUCTs and UNIONs are represented as 0-vlen sized
> STRUCT/UNIONs
> - named ENUM[64]s are represented as 0-vlen named ENUM[64]s
> - anonymous struct/unions are represented in full in split BTF
> - anonymous enums are represented in full in split BTF
> - types unreferenced from split BTF are not present in distilled
> base BTF
>
> Also test that with vmlinux BTF and split BTF based upon it,
> we only represent needed base types referenced from split BTF
> in distilled base.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-28 12:24 ` [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation Alan Maguire
@ 2024-05-31 2:22 ` Eduard Zingerman
2024-05-31 15:38 ` Alan Maguire
2024-05-31 18:21 ` Andrii Nakryiko
1 sibling, 1 reply; 29+ messages in thread
From: Eduard Zingerman @ 2024-05-31 2:22 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
houtao1, bpf, masahiroy, mcgrof, nathan
On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
[...]
> +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
> + * through base BTF looking up distilled type (using binary search) equivalents.
> + */
> +static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> +{
I have several observations about this algorithm.
The algorithm does Base.Cnt * log2(Dist.Cnt) binary searches.
However, it might be better to switch searches around
and look for distilled {name,size} pairs in base btf,
doing Dist.Cnt * log2(Base.cnt) searches instead.
Suppose Base.Cnt = 2**20 and Dist.Cnt = 2**10, in such case:
- Base.Cnt * log2(Dist.Cnt) = 2**20 * 10
- Dist.Cnt * log2(Base.cnt) = 2**10 * 20, which is smaller
The algorithm might not handle name duplicates in the distilled BTF well,
e.g. in theory, the following is a valid C code
struct foo { int f; }; // sizeof(struct foo) == 4
typedef int foo; // sizeof(foo) == 4
Suppose that these types are a part of the distilled BTF.
Depending on which one would end up first in 'dist_base_info_sorted'
bsearch might fail to find one or the other.
Also, algorithm does not report an error if there are several
types with the same name and size in the base BTF.
I suggest to modify the algorithm as follows:
- let 'base_info_sorted' be a set of tuples {kind,name,size,id}
corresponding to the base BTF, sorted by kind, name and size;
- add a custom utility bsearch_unique, that behaves like bsearch,
but returns NULL if entry is non-unique with regards to current
predicate (e.g. use bsearch but also check neighbors);
- for each type D in the distilled base:
- use bsearch_unique to find entry E in 'base_info_sorted'
that matches D.{kind,name,size} sub-tuple;
- if E exists, set id_map[D] := E.id;
- if E does not exist:
- if id_map[D] == BTF_IS_EMBEDDED, report an error;
- if id_map[D] != BTF_IS_EMBEDDED:
- use bsearch_unique to find entry E in 'base_info_sorted'
that matches D.{kind,name} sub-tuple;
- if E exists, set id_map[D] := E.id;
- otherwise, report an error.
This allows to:
- flip the search order, potentially gaining some speed;
- drop the 'base_name_cnt' array and logic;
- handle the above hypothetical name conflict example.
Wdyt?
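A self-contained sketch of the bsearch_unique() helper proposed above (hypothetical, not code from the series). Checking only the immediate neighbors of a bsearch() hit is sufficient, because entries that compare equal are contiguous in a sorted array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Like bsearch(), but return NULL when the matching entry is not unique
 * under cmp(), detected by inspecting the hit's immediate neighbors. */
static void *bsearch_unique(const void *key, const void *base, size_t nmemb,
			    size_t size, int (*cmp)(const void *, const void *))
{
	const char *hit = bsearch(key, base, nmemb, size, cmp);
	const char *start = base;

	if (!hit)
		return NULL;
	/* non-unique if either neighbor also compares equal */
	if (hit > start && cmp(key, hit - size) == 0)
		return NULL;
	if (hit < start + (nmemb - 1) * size && cmp(key, hit + size) == 0)
		return NULL;
	return (void *)hit;
}

/* Example comparator over ints for the usage below. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}
```

In the proposed algorithm, the comparator would instead order {kind,name,size} (or {kind,name}) tuples over the sorted base BTF info array.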
> + struct btf_name_info *dist_base_info_sorted;
> + struct btf_type *base_t, *dist_t, *split_t;
> + __u8 *base_name_cnt = NULL;
> + int err = 0;
> + __u32 id;
> +
> + /* generate a sort index array of name/type ids sorted by name for
> + * distilled base BTF to speed name-based lookups.
> + */
> + dist_base_info_sorted = calloc(r->nr_dist_base_types, sizeof(*dist_base_info_sorted));
> + if (!dist_base_info_sorted) {
> + err = -ENOMEM;
> + goto done;
> + }
> + for (id = 0; id < r->nr_dist_base_types; id++) {
> + dist_t = btf_type_by_id(r->dist_base_btf, id);
> + dist_base_info_sorted[id].name = btf__name_by_offset(r->dist_base_btf,
> + dist_t->name_off);
> + dist_base_info_sorted[id].id = id;
> + dist_base_info_sorted[id].size = dist_t->size;
> + dist_base_info_sorted[id].needs_size = true;
> + }
> + qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
> + cmp_btf_name_size);
> +
> + /* Mark distilled base struct/union members of split BTF structs/unions
> + * in id_map with BTF_IS_EMBEDDED; this signals that these types
> + * need to match both name and size, otherwise embedding the base
> + * struct/union in the split type is invalid.
> + */
> + for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) {
> + split_t = btf_type_by_id(r->btf, id);
> + if (btf_is_composite(split_t)) {
> + err = btf_type_visit_type_ids(split_t, btf_mark_embedded_composite_type_ids,
> + r);
> + if (err < 0)
> + goto done;
> + }
> + }
> +
> + /* Collect name counts for composite types in base BTF. If multiple
> + * instances of a struct/union of the same name exist, we need to use
> + * size to determine which to map to since name alone is ambiguous.
> + */
> + base_name_cnt = calloc(r->base_str_len, sizeof(*base_name_cnt));
> + if (!base_name_cnt) {
> + err = -ENOMEM;
> + goto done;
> + }
> + for (id = 1; id < r->nr_base_types; id++) {
> + base_t = btf_type_by_id(r->base_btf, id);
> + if (!btf_is_composite(base_t) || !base_t->name_off)
> + continue;
> + if (base_name_cnt[base_t->name_off] < 255)
> + base_name_cnt[base_t->name_off]++;
> + }
> +
> + /* Now search base BTF for matching distilled base BTF types. */
> + for (id = 1; id < r->nr_base_types; id++) {
> + struct btf_name_info *dist_name_info, base_name_info = {};
> + int dist_kind, base_kind;
> +
> + base_t = btf_type_by_id(r->base_btf, id);
> + /* distilled base consists of named types only. */
> + if (!base_t->name_off)
> + continue;
> + base_kind = btf_kind(base_t);
> + base_name_info.id = id;
> + base_name_info.name = btf__name_by_offset(r->base_btf, base_t->name_off);
> + switch (base_kind) {
> + case BTF_KIND_INT:
> + case BTF_KIND_FLOAT:
> + case BTF_KIND_ENUM:
> + case BTF_KIND_ENUM64:
> + /* These types should match both name and size */
> + base_name_info.needs_size = true;
> + base_name_info.size = base_t->size;
> + break;
> + case BTF_KIND_FWD:
> + /* No size considerations for fwds. */
> + break;
> + case BTF_KIND_STRUCT:
> + case BTF_KIND_UNION:
> + /* Size only needs to be used for struct/union if there
> + * are multiple types in base BTF with the same name.
> + * If there are multiple _distilled_ types with the same
> + * name (a very unlikely scenario), that doesn't matter
> + * unless corresponding _base_ types to match them are
> + * missing.
> + */
> + base_name_info.needs_size = base_name_cnt[base_t->name_off] > 1;
> + base_name_info.size = base_t->size;
> + break;
> + default:
> + continue;
> + }
> + dist_name_info = bsearch(&base_name_info, dist_base_info_sorted,
> + r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
> + cmp_btf_name_size);
> + if (!dist_name_info)
> + continue;
> + if (!dist_name_info->id || dist_name_info->id > r->nr_dist_base_types) {
> + pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
> + id, dist_name_info->id);
> + err = -EINVAL;
> + goto done;
> + }
> + dist_t = btf_type_by_id(r->dist_base_btf, dist_name_info->id);
> + dist_kind = btf_kind(dist_t);
> +
> + /* Validate that the found distilled type is compatible.
> + * Do not error out on mismatch as another match may occur
> + * for an identically-named type.
> + */
> + switch (dist_kind) {
> + case BTF_KIND_FWD:
> + switch (base_kind) {
> + case BTF_KIND_FWD:
> + if (btf_kflag(dist_t) != btf_kflag(base_t))
> + continue;
> + break;
> + case BTF_KIND_STRUCT:
> + if (btf_kflag(base_t))
> + continue;
> + break;
> + case BTF_KIND_UNION:
> + if (!btf_kflag(base_t))
> + continue;
> + break;
> + default:
> + continue;
> + }
> + break;
> + case BTF_KIND_INT:
> + if (dist_kind != base_kind ||
> + btf_int_encoding(base_t) != btf_int_encoding(dist_t))
> + continue;
> + break;
> + case BTF_KIND_FLOAT:
> + if (dist_kind != base_kind)
> + continue;
> + break;
> + case BTF_KIND_ENUM:
> + /* ENUM and ENUM64 are encoded as sized ENUM in
> + * distilled base BTF.
> + */
> + if (dist_kind != base_kind && base_kind != BTF_KIND_ENUM64)
> + continue;
> + break;
> + case BTF_KIND_STRUCT:
> + case BTF_KIND_UNION:
> + /* size verification is required for embedded
> + * struct/unions.
> + */
> + if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED &&
> + base_t->size != dist_t->size)
> + continue;
> + break;
> + default:
> + continue;
> + }
> + /* map id and name */
> + r->id_map[dist_name_info->id] = id;
> + r->str_map[dist_t->name_off] = base_t->name_off;
> + }
> + /* ensure all distilled BTF ids now have a mapping... */
> + for (id = 1; id < r->nr_dist_base_types; id++) {
> + const char *name;
> +
> + if (r->id_map[id] && r->id_map[id] != BTF_IS_EMBEDDED)
> + continue;
> + dist_t = btf_type_by_id(r->dist_base_btf, id);
> + name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
> + pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
> + name, id);
> + err = -EINVAL;
> + break;
> + }
> +done:
> + free(base_name_cnt);
> + free(dist_base_info_sorted);
> + return err;
> +}
[...]
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover BTF relocation
2024-05-28 12:24 ` [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
@ 2024-05-31 2:23 ` Eduard Zingerman
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Zingerman @ 2024-05-31 2:23 UTC (permalink / raw)
To: Alan Maguire, andrii, jolsa, acme, quentin
Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
> Ensure relocated BTF looks as expected; in this case identical to
> original split BTF, with a few duplicate anonymous types added to
> split BTF by the relocation process.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
I think we now need a few test cases with types that have duplicate names.
Wdyt?
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-05-28 12:24 ` [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
@ 2024-05-31 9:34 ` Eduard Zingerman
2024-05-31 16:13 ` Alan Maguire
2024-05-31 19:04 ` Andrii Nakryiko
1 sibling, 1 reply; 29+ messages in thread
From: Eduard Zingerman @ 2024-05-31 9:34 UTC (permalink / raw)
To: Alan Maguire, andrii, jolsa, acme, quentin
Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
> Share relocation implementation with the kernel. As part of this,
> we also need the type/string visitation functions so add them to a
> btf_common.c file that also gets shared with the kernel. Relocation
> code in kernel and userspace is identical save for the implementation
> of the reparenting of split BTF to the relocated base BTF and
> retrieval of BTF header from "struct btf"; these small functions
> need separate user-space and kernel implementations.
>
> One other wrinkle on the kernel side is we have to map .BTF.ids in
> modules as they were generated with the type ids used at BTF encoding
> time. btf_relocate() optionally returns an array mapping from old BTF
> ids to relocated ids, so we use that to fix up these references where
> needed for kfuncs.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
I think we also need a test or a few tests to verify that ID sets are
converted correctly.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-31 2:22 ` Eduard Zingerman
@ 2024-05-31 15:38 ` Alan Maguire
2024-05-31 18:44 ` Andrii Nakryiko
2024-06-01 7:56 ` Eduard Zingerman
0 siblings, 2 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-31 15:38 UTC (permalink / raw)
To: Eduard Zingerman
Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
houtao1, bpf, masahiroy, mcgrof, nathan
On 31/05/2024 03:22, Eduard Zingerman wrote:
> On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
>
> [...]
>
>> +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
>> + * through base BTF looking up distilled type (using binary search) equivalents.
>> + */
>> +static int btf_relocate_map_distilled_base(struct btf_relocate *r)
>> +{
>
> I have several observations about this algorithm.
>
> The algorithm does Base.Cnt * log2(Dist.Cnt) binary searches.
> However, it might be better to switch searches around
> and look for distilled {name,size} pairs in base btf,
> doing Dist.Cnt * log2(Base.cnt) searches instead.
> Suppose Base.Cnt = 2**20 and Dist.Cnt = 2**10, in such case:
> - Base.Cnt * log2(Dist.Cnt) = 2**20 * 10
> - Dist.Cnt * log2(Base.cnt) = 2**10 * 20, which is smaller
>
Hi Eduard,
I crunched some numbers on base, distilled base BTF to try and flesh
this out a bit more.
The number of struct, union, fwd, int, float, enum and enum64 types in
my vmlinux BTF is 14307.
This is 11% of the overall number of types, and it is this 11% we will
be searching distilled BTF for matches (we avoid searching for any other
types as we've previously verified that our distilled base BTF only
consists of these).
In terms of distilled BTF sizes, I built a kernel forcing distilled base
BTF for all 2708 modules to help get a sense for distilled .BTF.base
sizes. We can sort these by .BTF.base size by doing the following:
modules=$(find . -name '*.ko' -depth -print)
for module in $modules; do
    size=$(objdump -h $module | awk '/.BTF.base/ { print $3 }')
    printf '%s %d\n' $module "0x0$size"
done | sort -n -k2
With this process, we see the largest .BTF.base section is
./drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko 15732
...but most others are much less than 1k bytes in size. The in-tree
kernel modules probably utilize more kernel types, so I suspect they give
us a good sense of the worst-case scenario for distilled BTF size.
Examining the .BTF.base section of cxgb4, we see 609 types.
So in this worst-case scenario I could generate, Base.Cnt = 14307,
Dist.Cnt = 600
Base.Cnt * log2(Dist.Cnt) = 14307 * 9 = 128763 comparisons
Dist.Cnt * log2(Base.cnt) = 600 * 14 = 8400 comparisons
So that looks pretty cut-and-dried, but we also have to factor in the
initial sort comparisons.
The in-kernel sort reports n*log2(n) + 0.37*n + o(n) comparisons on
average; for base BTF that means sorting requires at least
Base: 14307*14+0.37*14307 = 205592 comparisons
Dist: 600*9+0.37*600 = 5622 comparisons
So we get an inversion of the above results: unless I'm miscalculating
something, sorting distilled base BTF requires fewer comparisons overall
across both sort and search.

Sort comparisons        Search comparisons        Total
=======================================================
  5622 (distilled)      128763 (to base)          134385
205592 (base)             8400 (to distilled)     213992
To validate the distilled numbers are roughly right, I DTraced[1]
comparison functions when loading cxgb4; as above we expect around
134385 across sort and search:
$ sudo dtrace -n 'fbt::cmp_btf_name_size:entry {@c=count(); }'
dtrace: description 'fbt::cmp_btf_name_size:entry ' matched 1 probe
^C
107971
So the number (107971 calls to cmp_btf_name_size) seems to be in the
ballpark overall; next I tried aggregating by stack to see if the
numbers look right in sort versus search:
$ sudo dtrace -n 'fbt::cmp_btf_name_size:entry {@c[stack()]=count(); }'
dtrace: description 'fbt::cmp_btf_name_size:entry ' matched 1 probe
^C
vmlinux`cmp_btf_name_size+0x5
vmlinux`sort+0x34
vmlinux`btf_relocate_map_distilled_base+0xce
vmlinux`btf_relocate+0x1e3
vmlinux`btf_parse_module+0x24b
vmlinux`btf_module_notify+0xee
vmlinux`notifier_call_chain+0x65
vmlinux`blocking_notifier_call_chain_robust+0x67
vmlinux`load_module+0x7fa
vmlinux`init_module_from_file+0x97
vmlinux`idempotent_init_module+0x109
vmlinux`__x64_sys_finit_module+0x64
vmlinux`x64_sys_call+0x1480
vmlinux`do_syscall_64+0x68
vmlinux`entry_SYSCALL_64_after_hwframe+0x76
5882
vmlinux`cmp_btf_name_size+0x5
vmlinux`btf_relocate_map_distilled_base+0x29f
vmlinux`btf_relocate+0x1e3
vmlinux`btf_parse_module+0x24b
vmlinux`btf_module_notify+0xee
vmlinux`notifier_call_chain+0x65
vmlinux`blocking_notifier_call_chain_robust+0x67
vmlinux`load_module+0x7fa
vmlinux`init_module_from_file+0x97
vmlinux`idempotent_init_module+0x109
vmlinux`__x64_sys_finit_module+0x64
vmlinux`x64_sys_call+0x1480
vmlinux`do_syscall_64+0x68
vmlinux`entry_SYSCALL_64_after_hwframe+0x76
102089
Yep, looks right - 5882 sort comparisons versus 102089 search comparisons.
I also traced the relocation time for btf_relocate() to complete
in-kernel, collecting the module name, distilled number of types, base
number of types and time for btf_relocate() to complete. I loaded 6
modules with varying numbers of distilled base BTF types.
$ sudo dtrace -n 'fbt::btf_relocate:entry
{ self->start = timestamp;
self->btf = (struct btf *)arg0;
self->nr_dist_types = self->btf->base_btf->nr_types
}
fbt::btf_relocate:return /self->start/
{
@reloc[stringof(self->btf->name), self->nr_dist_types,
self->btf->base_btf->nr_types] = avg(timestamp-self->start);
}'
dtrace: description 'fbt::btf_relocate:entry ' matched 2 probes
^C
module     nr_dist  nr_base  avg ns
hid_ite         11   124271  4432193
lib80211        12   124271  4445910
sctp           109   124271  5547703
ib_core        153   124271  5803176
bnxt_en        147   124271  5846436
cxgb4          610   124271  8081113
So the overall relocation time - from 11 distilled types in hid_ite to
610 for cxgb4 - is within a range from 4.5msec (4432193ns above) to
8msec. The times for relocation represent less than 50% of overall
module load times - the latter vary from 11-18msec across these modules.
It would be great to find some performance wins here, but I don't
_think_ swapping the sort/search targets will buy us much unfortunately.
> The algorithm might not handle name duplicates in the distilled BTF well,
> e.g. in theory, the following is a valid C code
>
> struct foo { int f; }; // sizeof(struct foo) == 4
> typedef int foo; // sizeof(foo) == 4
>
> Suppose that these types are a part of the distilled BTF.
> Depending on which one would end up first in 'dist_base_info_sorted'
> bsearch might fail to find one or the other.
>
In the case of distilled base BTF, only struct, union, enum, enum64,
int, float and fwd can be present. Size matches would have to be between
one of these kinds I think, but are still possible nevertheless.
> Also, algorithm does not report an error if there are several
> types with the same name and size in the base BTF.
>
Yep, while we have to handle this, it only becomes an ambiguity problem
if distilled base BTF refers to one such type. On my vmlinux I see
the following duplicate name/size STRUCTs
'd_partition' size=16
'elf_note_info' size=248
'getdents_callback' size=40
'instance_attribute' size=32
'intel_pinctrl_context' size=16
'intel_pinctrl' size=744
'perf_aux_event' size=16
'quirk_entry' size=8
Of these, 5 seem legit: d_partition, getdents_callback,
instance_attribute, perf_aux_event, quirk_entry.
A few seem to be identical, possibly dedup failures:
struct intel_pinctrl {
struct device *dev;
raw_spinlock_t lock;
struct pinctrl_desc pctldesc;
struct pinctrl_dev *pctldev;
struct gpio_chip chip;
const struct intel_pinctrl_soc_data *soc;
struct intel_community *communities;
size_t ncommunities;
struct intel_pinctrl_context context;
int irq;
};
struct intel_pinctrl___2 {
struct device *dev;
raw_spinlock_t lock;
struct pinctrl_desc pctldesc;
struct pinctrl_dev *pctldev;
struct gpio_chip chip;
const struct intel_pinctrl_soc_data *soc;
struct intel_community *communities;
size_t ncommunities;
struct intel_pinctrl_context___2 context;
int irq;
};
struct elf_thread_core_info;
struct elf_note_info {
struct elf_thread_core_info *thread;
struct memelfnote psinfo;
struct memelfnote signote;
struct memelfnote auxv;
struct memelfnote files;
siginfo_t csigdata;
size_t size;
int thread_notes;
};
struct elf_thread_core_info___2;
struct elf_note_info___2 {
struct elf_thread_core_info___2 *thread;
struct memelfnote psinfo;
struct memelfnote signote;
struct memelfnote auxv;
struct memelfnote files;
compat_siginfo_t csigdata;
size_t size;
int thread_notes;
};
Both of these are self-referential, either directly or indirectly, so
maybe it's a corner case of dedup we're missing. I'll dig into these later.
> I suggest to modify the algorithm as follows:
> - let 'base_info_sorted' be a set of tuples {kind,name,size,id}
> corresponding to the base BTF, sorted by kind, name and size;
That was my first thought, but we can't always search by kind; for
example it's possible the distilled base has a fwd and vmlinux only has
a struct kind for the same type name; in such a case we'd want to
support a match provided the fwd's kflag indicated a struct fwd.
In fact looking at the code we're missing logic for the opposite
condition (fwd only in base, struct in distilled base). I'll fix that.
The other case is an enum in distilled base matching an enum64
or an enum.
> - add a custom utility bsearch_unique, that behaves like bsearch,
> but returns NULL if entry is non-unique with regards to current
> predicate (e.g. use bsearch but also check neighbors);
> - for each type D in the distilled base:
> - use bsearch_unique to find entry E in 'base_info_sorted'
> that matches D.{kind,name,size} sub-tuple;
> - if E exists, set id_map[D] := E.id;
> - if E does not exist:
> - if id_map[D] == BTF_IS_EMBEDDED, report an error;
> - if id_map[D] != BTF_IS_EMBEDDED:
> - use bsearch_unique to find entry E in 'base_info_sorted'
> that matches D.{kind,name} sub-tuple;
> - if E exists, set id_map[D] := E.id;
> - otherwise, report an error.
>
> This allows to:
> - flip the search order, potentially gaining some speed;
> - drop the 'base_name_cnt' array and logic;
> - handle the above hypothetical name conflict example.
>
I think flipping the search order could gain search speed, but only at
the expense of slowing things down overall due to the extra cost of
having to sort so many more elements. I suspect it will mostly be a
wash, though numbers above seem to suggest sorting distilled base may
have an edge when we consider both search and sort. The question is
probably which sort/search order is most amenable to handling the data
and helping us deal with the edge cases like duplicates.
With the existing scheme, I think catching cases of name duplicates in
distilled base BTF and name/size duplicates in base BTF for types we
want to relocate from distilled base BTF and erroring out would suffice;
basically the following applied to this patch (patch 3 in the series)
diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
index f2e91cdfb5cc..4e282ee8f183 100644
--- a/tools/lib/bpf/btf_relocate.c
+++ b/tools/lib/bpf/btf_relocate.c
@@ -113,6 +113,7 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
 {
 	struct btf_name_info *dist_base_info_sorted;
 	struct btf_type *base_t, *dist_t, *split_t;
+	const char *last_name = NULL;
 	__u8 *base_name_cnt = NULL;
 	int err = 0;
 	__u32 id;
@@ -136,6 +137,19 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
 	qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
 	      cmp_btf_name_size);

+	/* It is possible - though highly unlikely - that duplicate-named types
+	 * end up in distilled base BTF; error out if this is the case.
+	 */
+	for (id = 1; id < r->nr_dist_base_types; id++) {
+		if (last_name == dist_base_info_sorted[id].name) {
+			pr_warn("Multiple distilled base types [%u], [%u] share name '%s'; cannot relocate with base BTF.\n",
+				id - 1, id, last_name);
+			err = -EINVAL;
+			goto done;
+		}
+		last_name = dist_base_info_sorted[id].name;
+	}
+
 	/* Mark distilled base struct/union members of split BTF structs/unions
 	 * in id_map with BTF_IS_EMBEDDED; this signals that these types
 	 * need to match both name and size, otherwise embedding the base
@@ -272,6 +286,21 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
 		default:
 			continue;
 		}
+		if (r->id_map[dist_name_info->id] &&
+		    r->id_map[dist_name_info->id] != BTF_IS_EMBEDDED) {
+			/* we already have a match; this tells us that
+			 * multiple base types of the same name
+			 * have the same size, since for cases where
+			 * multiple types have the same name we match
+			 * on name and size. In this case, we have
+			 * no way of determining which to relocate
+			 * to in base BTF, so error out.
+			 */
+			pr_warn("distilled base BTF type '%s' [%u], size %u has multiple candidates of the same size (ids [%u, %u]) in base BTF\n",
+				base_name_info.name, dist_name_info->id, base_t->size,
+				id, r->id_map[dist_name_info->id]);
+			err = -EINVAL;
+			goto done;
+		}
 		/* map id and name */
 		r->id_map[dist_name_info->id] = id;
 		r->str_map[dist_t->name_off] = base_t->name_off;
With this change, I then tried using "bpftool btf dump -B vmlinux file
$module" for each of the 2700-odd modules I force-generated .BTF.base
sections for, to see if these conditions ever get triggered in practice
(your BTF parsing changes allow us to test relocation this way).
It seems they don't - all modules relocated successfully against
vmlinux - which suggests that, at least initially, it might not be worth
adding additional complexity to the algorithm to handle them, aside from
error checks like the above.
> Wdyt?
>
My personal take is that it would suffice to error out in some of the
edge cases, but I'm open to other approaches too. Hopefully some of the
data above helps us understand the costs of this approach at least. Thanks!
Alan
[1] https://github.com/oracle/dtrace-utils
>> + struct btf_name_info *dist_base_info_sorted;
>> + struct btf_type *base_t, *dist_t, *split_t;
>> + __u8 *base_name_cnt = NULL;
>> + int err = 0;
>> + __u32 id;
>> +
>> + /* generate a sort index array of name/type ids sorted by name for
>> + * distilled base BTF to speed name-based lookups.
>> + */
>> + dist_base_info_sorted = calloc(r->nr_dist_base_types, sizeof(*dist_base_info_sorted));
>> + if (!dist_base_info_sorted) {
>> + err = -ENOMEM;
>> + goto done;
>> + }
>> + for (id = 0; id < r->nr_dist_base_types; id++) {
>> + dist_t = btf_type_by_id(r->dist_base_btf, id);
>> + dist_base_info_sorted[id].name = btf__name_by_offset(r->dist_base_btf,
>> + dist_t->name_off);
>> + dist_base_info_sorted[id].id = id;
>> + dist_base_info_sorted[id].size = dist_t->size;
>> + dist_base_info_sorted[id].needs_size = true;
>> + }
>> + qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
>> + cmp_btf_name_size);
>> +
>> + /* Mark distilled base struct/union members of split BTF structs/unions
>> + * in id_map with BTF_IS_EMBEDDED; this signals that these types
>> + * need to match both name and size, otherwise embedding the base
>> + * struct/union in the split type is invalid.
>> + */
>> + for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) {
>> + split_t = btf_type_by_id(r->btf, id);
>> + if (btf_is_composite(split_t)) {
>> + err = btf_type_visit_type_ids(split_t, btf_mark_embedded_composite_type_ids,
>> + r);
>> + if (err < 0)
>> + goto done;
>> + }
>> + }
>> +
>> + /* Collect name counts for composite types in base BTF. If multiple
>> + * instances of a struct/union of the same name exist, we need to use
>> + * size to determine which to map to since name alone is ambiguous.
>> + */
>> + base_name_cnt = calloc(r->base_str_len, sizeof(*base_name_cnt));
>> + if (!base_name_cnt) {
>> + err = -ENOMEM;
>> + goto done;
>> + }
>> + for (id = 1; id < r->nr_base_types; id++) {
>> + base_t = btf_type_by_id(r->base_btf, id);
>> + if (!btf_is_composite(base_t) || !base_t->name_off)
>> + continue;
>> + if (base_name_cnt[base_t->name_off] < 255)
>> + base_name_cnt[base_t->name_off]++;
>> + }
>> +
>> + /* Now search base BTF for matching distilled base BTF types. */
>> + for (id = 1; id < r->nr_base_types; id++) {
>> + struct btf_name_info *dist_name_info, base_name_info = {};
>> + int dist_kind, base_kind;
>> +
>> + base_t = btf_type_by_id(r->base_btf, id);
>> + /* distilled base consists of named types only. */
>> + if (!base_t->name_off)
>> + continue;
>> + base_kind = btf_kind(base_t);
>> + base_name_info.id = id;
>> + base_name_info.name = btf__name_by_offset(r->base_btf, base_t->name_off);
>> + switch (base_kind) {
>> + case BTF_KIND_INT:
>> + case BTF_KIND_FLOAT:
>> + case BTF_KIND_ENUM:
>> + case BTF_KIND_ENUM64:
>> + /* These types should match both name and size */
>> + base_name_info.needs_size = true;
>> + base_name_info.size = base_t->size;
>> + break;
>> + case BTF_KIND_FWD:
>> + /* No size considerations for fwds. */
>> + break;
>> + case BTF_KIND_STRUCT:
>> + case BTF_KIND_UNION:
>> + /* Size only needs to be used for struct/union if there
>> + * are multiple types in base BTF with the same name.
>> + * If there are multiple _distilled_ types with the same
>> + * name (a very unlikely scenario), that doesn't matter
>> + * unless corresponding _base_ types to match them are
>> + * missing.
>> + */
>> + base_name_info.needs_size = base_name_cnt[base_t->name_off] > 1;
>> + base_name_info.size = base_t->size;
>> + break;
>> + default:
>> + continue;
>> + }
>> + dist_name_info = bsearch(&base_name_info, dist_base_info_sorted,
>> + r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
>> + cmp_btf_name_size);
>> + if (!dist_name_info)
>> + continue;
>> + if (!dist_name_info->id || dist_name_info->id > r->nr_dist_base_types) {
>> + pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
>> + id, dist_name_info->id);
>> + err = -EINVAL;
>> + goto done;
>> + }
>> + dist_t = btf_type_by_id(r->dist_base_btf, dist_name_info->id);
>> + dist_kind = btf_kind(dist_t);
>> +
>> + /* Validate that the found distilled type is compatible.
>> + * Do not error out on mismatch as another match may occur
>> + * for an identically-named type.
>> + */
>> + switch (dist_kind) {
>> + case BTF_KIND_FWD:
>> + switch (base_kind) {
>> + case BTF_KIND_FWD:
>> + if (btf_kflag(dist_t) != btf_kflag(base_t))
>> + continue;
>> + break;
>> + case BTF_KIND_STRUCT:
>> + if (btf_kflag(base_t))
>> + continue;
>> + break;
>> + case BTF_KIND_UNION:
>> + if (!btf_kflag(base_t))
>> + continue;
>> + break;
>> + default:
>> + continue;
>> + }
>> + break;
>> + case BTF_KIND_INT:
>> + if (dist_kind != base_kind ||
>> + btf_int_encoding(base_t) != btf_int_encoding(dist_t))
>> + continue;
>> + break;
>> + case BTF_KIND_FLOAT:
>> + if (dist_kind != base_kind)
>> + continue;
>> + break;
>> + case BTF_KIND_ENUM:
>> + /* ENUM and ENUM64 are encoded as sized ENUM in
>> + * distilled base BTF.
>> + */
>> + if (dist_kind != base_kind && base_kind != BTF_KIND_ENUM64)
>> + continue;
>> + break;
>> + case BTF_KIND_STRUCT:
>> + case BTF_KIND_UNION:
>> + /* size verification is required for embedded
>> + * struct/unions.
>> + */
>> + if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED &&
>> + base_t->size != dist_t->size)
>> + continue;
>> + break;
>> + default:
>> + continue;
>> + }
>> + /* map id and name */
>> + r->id_map[dist_name_info->id] = id;
>> + r->str_map[dist_t->name_off] = base_t->name_off;
>> + }
>> + /* ensure all distilled BTF ids now have a mapping... */
>> + for (id = 1; id < r->nr_dist_base_types; id++) {
>> + const char *name;
>> +
>> + if (r->id_map[id] && r->id_map[id] != BTF_IS_EMBEDDED)
>> + continue;
>> + dist_t = btf_type_by_id(r->dist_base_btf, id);
>> + name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
>> + pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
>> + name, id);
>> + err = -EINVAL;
>> + break;
>> + }
>> +done:
>> + free(base_name_cnt);
>> + free(dist_base_info_sorted);
>> + return err;
>> +}
>
> [...]
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-05-31 9:34 ` Eduard Zingerman
@ 2024-05-31 16:13 ` Alan Maguire
0 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-05-31 16:13 UTC (permalink / raw)
To: Eduard Zingerman, andrii, jolsa, acme, quentin
Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
mcgrof, nathan
On 31/05/2024 10:34, Eduard Zingerman wrote:
> On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
>> Share relocation implementation with the kernel. As part of this,
>> we also need the type/string visitation functions so add them to a
>> btf_common.c file that also gets shared with the kernel. Relocation
code in kernel and userspace is identical save for the implementation
>> of the reparenting of split BTF to the relocated base BTF and
>> retrieval of BTF header from "struct btf"; these small functions
>> need separate user-space and kernel implementations.
>>
>> One other wrinkle on the kernel side is we have to map .BTF.ids in
>> modules as they were generated with the type ids used at BTF encoding
>> time. btf_relocate() optionally returns an array mapping from old BTF
>> ids to relocated ids, so we use that to fix up these references where
>> needed for kfuncs.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>
> I think we also need a test or a few tests to verify that ID sets are
> converted correctly.
>
We sort of get that implicitly since bpf_testmod.ko will have a
.BTF.base section, so its ID sets need to be relocated for the existing
tests to work correctly. I'm thinking something that makes use of a
bpf_iter_testmod_seq*() function while validating its BTF id > num base
types might work, or did you have something specific in mind? Thanks!
Alan
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-28 12:24 ` [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation Alan Maguire
2024-05-31 2:22 ` Eduard Zingerman
@ 2024-05-31 18:21 ` Andrii Nakryiko
1 sibling, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-05-31 18:21 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Tue, May 28, 2024 at 5:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Map distilled base BTF type ids referenced in split BTF and their
> references to the base BTF passed in, and if the mapping succeeds,
> reparent the split BTF to the base BTF.
>
> Relocation is done by first verifying that distilled base BTF
> only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
> UNION kinds; then we sort these to speed lookups. Once sorted,
> the base BTF is iterated, and for each relevant kind we check
> for an equivalent in distilled base BTF. When found, the
> mapping from distilled -> base BTF id and string offset is recorded.
> In establishing mappings, we need to ensure we check STRUCT/UNION
> size when the STRUCT/UNION is embedded in a split BTF STRUCT/UNION,
> and when duplicate names exist for the same STRUCT/UNION. Otherwise
> size is ignored in matching STRUCT/UNIONs.
>
> Once all mappings are established, we can update type ids
> and string offsets in split BTF and reparent it to the new base.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/lib/bpf/Build | 2 +-
> tools/lib/bpf/btf.c | 17 ++
> tools/lib/bpf/btf.h | 14 ++
> tools/lib/bpf/btf_relocate.c | 430 ++++++++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.map | 1 +
> tools/lib/bpf/libbpf_internal.h | 3 +
> 6 files changed, 466 insertions(+), 1 deletion(-)
> create mode 100644 tools/lib/bpf/btf_relocate.c
>
[...]
> +/* Set temporarily in relocation id_map if distilled base struct/union is
> + * embedded in a split BTF struct/union; in such a case, size information must
> + * match between distilled base BTF and base BTF representation of type.
> + */
> +#define BTF_IS_EMBEDDED ((__u32)-1)
> +
> +/* <name, size, id> triple used in sorting/searching distilled base BTF. */
> +struct btf_name_info {
> + const char *name;
> + int size:31;
> + /* set when search requires a size match */
> + bool needs_size;
this was meant to be a 1-bit field, right? `: 1` is missing?
> + __u32 id;
> +};
> +
[...]
> + case BTF_KIND_INT:
> + if (dist_kind != base_kind ||
> + btf_int_encoding(base_t) != btf_int_encoding(dist_t))
> + continue;
> + break;
> + case BTF_KIND_FLOAT:
> + if (dist_kind != base_kind)
> + continue;
> + break;
> + case BTF_KIND_ENUM:
> + /* ENUM and ENUM64 are encoded as sized ENUM in
> + * distilled base BTF.
> + */
> + if (dist_kind != base_kind && base_kind != BTF_KIND_ENUM64)
> + continue;
probably unnecessarily strict, I'd check something like
if (!btf_is_enum_any(dist_t) || !btf_is_enum_any(base_t) ||
dist_t->size != base_t->size)
continue;
it's minor, unlikely to matter in practice
> + break;
> + case BTF_KIND_STRUCT:
> + case BTF_KIND_UNION:
> + /* size verification is required for embedded
> + * struct/unions.
> + */
> + if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED &&
> + base_t->size != dist_t->size)
> + continue;
> + break;
> + default:
> + continue;
> + }
> + /* map id and name */
> + r->id_map[dist_name_info->id] = id;
> + r->str_map[dist_t->name_off] = base_t->name_off;
> + }
[...]
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-31 15:38 ` Alan Maguire
@ 2024-05-31 18:44 ` Andrii Nakryiko
2024-06-01 7:56 ` Eduard Zingerman
1 sibling, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2024-05-31 18:44 UTC (permalink / raw)
To: Alan Maguire
Cc: Eduard Zingerman, andrii, jolsa, acme, quentin, mykolal, ast,
daniel, martin.lau, song, yonghong.song, john.fastabend, kpsingh,
sdf, haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Fri, May 31, 2024 at 8:39 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 31/05/2024 03:22, Eduard Zingerman wrote:
> > On Tue, 2024-05-28 at 13:24 +0100, Alan Maguire wrote:
> >
> > [...]
> >
> >> +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
> >> + * through base BTF looking up distilled type (using binary search) equivalents.
> >> + */
> >> +static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> >> +{
> >
> > I have several observations about this algorithm.
> >
> > The algorithm does Base.Cnt * log2(Dist.Cnt) binary searches.
> > However, it might be better to switch searches around
> > and look for distilled {name,size} pairs in base btf,
> > doing Dist.Cnt * log2(Base.cnt) searches instead.
> > Suppose Base.Cnt = 2**20 and Dist.Cnt = 2**10, in such case:
> > - Base.Cnt * log2(Dist.Cnt) = 2**20 * 10
> > - Dist.Cnt * log2(Base.cnt) = 2**10 * 20, which is smaller
> >
>
> Hi Eduard,
>
> I crunched some numbers on base, distilled base BTF to try and flesh
> this out a bit more.
>
Wow, you guys really went hardcore here with counting...
Eduard, as Alan mentioned, you ignored both CPU and *memory*
requirements for sorting a large base number of types. If N is base
type count, M is distilled type count (note, M << N), then
- with Alan's current approach we have
- O(MlogM + N*logM) = O((M+N)logM) for sorting/search
- O(M) memory for index
- with your proposal
- O(NlogN + M*logN) = O((M+N)logN) for sorting/search
- O(N) memory for index.
Just on memory usage it's clear that the current approach wins
significantly and is the reason enough to prefer it. But even on
overall CPU usage we have (N+M)logM < (N+M)logN, which
(asymptotically) is still better.
But I think the memory argument in this case wins, we should avoid
allocating 1MB of extra memory to shave off 1ms (if at all) of kernel
load time, IMO.
[...]
>
> So the overall relocation time - from 11 distilled types in hid_ite to
> 610 for cxgb4 - is within a range from 4.5msec (4432193ns above) to
> 8msec. The times for relocation represent less than 50% of overall
> module load times - the later vary from 11-18msec across these modules.
> It would be great to find some performance wins here, but I don't
> _think_ swapping the sort/search targets will buy us much unfortunately.
>
As long as it's NlogN-ish (mix N/M here), I think it's fine. It's
O(N*M) that would be horrible.
> > The algorithm might not handle name duplicates in the distilled BTF well,
> > e.g. in theory, the following is a valid C code
> >
> > struct foo { int f; }; // sizeof(struct foo) == 4
> > typedef int foo; // sizeof(foo) == 4
> >
I think only typedef has its own "namespace", right? And typedefs are
not in distilled base. All other types share the same name (so at
least we don't have to support two separate namespaces).
But you are right, it's C, and across multiple compile units one can
technically have two different kinds with the same name co-existing
overall in vmlinux BTF.
But instead of doing bsearch_unique(), I propose we implement
lower-bound-like binary search, which will find the first instance of
a given name, and then we iterate linearly across all duplicates,
ignoring incompatible kinds. If we do find two possible matches (when
taking size/kind into account), then we should error out due to
ambiguity.
Alan, please see find_linfo() in kernel/bpf/log.c. You basically need
exactly that implementation, except you'd be using strcmp() <= 0
condition to find the index. After that just iterate starting from
that index and do strcmp() == 0, marking all matches. Check number of
matches (0 -- bad, 1 -- great, >1 -- bad).
Eduard, Alan, makes sense or did I miss anything?
> > Suppose that these types are a part of the distilled BTF.
> > Depending on which one would end up first in 'dist_base_info_sorted'
> > bsearch might fail to find one or the other.
> >
>
> In the case of distilled base BTF, only struct, union, enum, enum64,
> int, float and fwd can be present. Size matches would have to be between
> one of these kinds I think, but are still possible nevertheless.
>
+1
> > Also, algorithm does not report an error if there are several
> > types with the same name and size in the base BTF.
> >
>
[...]
>
>
> struct elf_thread_core_info;
>
> struct elf_note_info {
> 	struct elf_thread_core_info *thread;
> 	struct memelfnote psinfo;
> 	struct memelfnote signote;
> 	struct memelfnote auxv;
> 	struct memelfnote files;
> 	siginfo_t csigdata;
> 	size_t size;
> 	int thread_notes;
> };
>
> struct elf_thread_core_info___2;
>
> struct elf_note_info___2 {
> 	struct elf_thread_core_info___2 *thread;
> 	struct memelfnote psinfo;
> 	struct memelfnote signote;
> 	struct memelfnote auxv;
> 	struct memelfnote files;
> 	compat_siginfo_t csigdata;
> 	size_t size;
> 	int thread_notes;
> };
>
> Both of these share a self-reference, either directly or indirectly, so
> maybe it's a corner case of dedup we're missing. I'll dig into these later.
I think it's some differing type deeper in the
elf_thread_core_info/elf_thread_core_info___2 (and note that this is
not a self-reference, it's a different type).
I don't think it's a bug in dedup algo, this is a known case due to
some type that is supposed to be equivalent actually not being such.
>
> > I suggest to modify the algorithm as follows:
> > - let 'base_info_sorted' be a set of tuples {kind,name,size,id}
> > corresponding to the base BTF, sorted by kind, name and size;
>
> That was my first thought, but we can't always search by kind; for
> example it's possible the distilled base has a fwd and vmlinux only has
> a struct kind for the same type name; in such a case we'd want to
> support a match provided the fwd's kflag indicated a struct fwd.
yep, makes sense (though highly unlikely)
>
> In fact looking at the code we're missing logic for the opposite
> condition (fwd only in base, struct in distilled base). I'll fix that.
I'd error out on this, this looks like a weird nonsensical case (but
technically can happen with BTF, of course).
>
> The other case is an enum in distilled base matching an enum64
> or an enum.
I mentioned that in a previous email, we should just ignore the enum
vs enum64 difference as long as their size matches.
>
> > - add a custom utility bsearch_unique, that behaves like bsearch,
> > but returns NULL if entry is non-unique with regards to current
> > predicate (e.g. use bsearch but also check neighbors);
> > - for each type D in the distilled base:
> > - use bsearch_unique to find entry E in 'base_info_sorted'
> > that matches D.{kind,name,size} sub-tuple;
> > - if E exists, set id_map[D] := E.id;
> > - if E does not exist:
> > - if id_map[D] == BTF_IS_EMBEDDED, report an error;
> > - if id_map[D] != BTF_IS_EMBEDDED:
> > - use bsearch_unique to find entry E in 'base_info_sorted'
> > that matches D.{kind,name} sub-tuple;
> > - if E exists, set id_map[D] := E.id;
> > - otherwise, report an error.
> >
> > This allows to:
> > - flip the search order, potentially gaining some speed;
> > - drop the 'base_name_cnt' array and logic;
> > - handle the above hypothetical name conflict example.
> >
>
> I think flipping the search order could gain search speed, but only at
> the expense of slowing things down overall due to the extra cost of
> having to sort so many more elements. I suspect it will mostly be a
> wash, though numbers above seem to suggest sorting distilled base may
> have an edge when we consider both search and sort. The question is
> probably which sort/search order is most amenable to handling the data
> and helping us deal with the edge cases like duplicates.
>
> With the existing scheme, I think catching cases of name duplicates in
> distilled base BTF and name/size duplicates in base BTF for types we
> want to relocate from distilled base BTF and erroring out would suffice;
> basically the following applied to this patch (patch 3 in the series)
>
> diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
> index f2e91cdfb5cc..4e282ee8f183 100644
> --- a/tools/lib/bpf/btf_relocate.c
> +++ b/tools/lib/bpf/btf_relocate.c
> @@ -113,6 +113,7 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
>  {
>  	struct btf_name_info *dist_base_info_sorted;
>  	struct btf_type *base_t, *dist_t, *split_t;
> +	const char *last_name = NULL;
>  	__u8 *base_name_cnt = NULL;
>  	int err = 0;
>  	__u32 id;
> @@ -136,6 +137,19 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
>  	qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted),
>  	      cmp_btf_name_size);
>
> +	/* It is possible - though highly unlikely - that duplicate-named types
> +	 * end up in distilled base BTF; error out if this is the case.
> +	 */
> +	for (id = 1; id < r->nr_dist_base_types; id++) {
> +		if (last_name == dist_base_info_sorted[id].name) {
technically this a) has to take kind into account and b) is legal even
with the same kind
On some older kernel version we'd have that with ring_buffer, for
example. There were two completely independent struct ring_buffer
types.
I thought we would do all the relocation-time size check shenanigans
to resolve all this. I might have missed some nuance, though.
> +			pr_warn("Multiple distilled base types [%u], [%u] share name '%s'; cannot relocate with base BTF.\n",
> +				id - 1, id, last_name);
> +			err = -EINVAL;
> +			goto done;
> +		}
> +		last_name = dist_base_info_sorted[id].name;
> +	}
> +
> +
[...]
> > Wdyt?
> >
>
> My personal take is that it would suffice to error out in some of the
> edge cases, but I'm open to other approaches too. Hopefully some of the
> data above helps us understand the costs of this approach at least. Thanks!
>
I wouldn't change the overall relocation algorithm, but better
detection of ambiguities and reporting would definitely be useful.
> Alan
>
> [1] https://github.com/oracle/dtrace-utils
>
> >> + struct btf_name_info *dist_base_info_sorted;
> >> + struct btf_type *base_t, *dist_t, *split_t;
> >> + __u8 *base_name_cnt = NULL;
> >> + int err = 0;
> >> + __u32 id;
> >> +
[...]
Please try to trim larger chunks of code/discussion that are not
relevant anymore.
* Re: [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently
2024-05-28 12:24 ` [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently Alan Maguire
@ 2024-05-31 18:57 ` Andrii Nakryiko
2024-06-01 8:04 ` Eduard Zingerman
0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-05-31 18:57 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Tue, May 28, 2024 at 5:26 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> From: Eduard Zingerman <eddyz87@gmail.com>
>
> Update btf_parse_elf() to check if .BTF.base section is present.
> The logic is as follows:
>
> if .BTF.base section exists:
> distilled_base := btf_new(.BTF.base)
> if distilled_base:
> btf := btf_new(.BTF, .base_btf=distilled_base)
> if base_btf:
> btf_relocate(btf, base_btf)
> else:
> btf := btf_new(.BTF)
> return btf
>
> In other words:
> - if .BTF.base section exists, load BTF from it and use it as a base
> for .BTF load;
> - if base_btf is specified and .BTF.base section exist, relocate newly
> loaded .BTF against base_btf.
>
> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
> ---
> tools/lib/bpf/btf.c | 151 +++++++++++++++++++++++++++++---------------
> tools/lib/bpf/btf.h | 1 +
> 2 files changed, 102 insertions(+), 50 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index cb762d7a5dd7..b57f74eedda0 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -114,7 +114,10 @@ struct btf {
> /* a set of unique strings */
> struct strset *strs_set;
> /* whether strings are already deduplicated */
> - bool strs_deduped;
> + unsigned strs_deduped:1;
> +
> + /* whether base_btf should be freed in btf_free for this instance */
> + unsigned owns_base:1;
nit: let's not do bit counting (i.e., bit fields for bool flags) on
rather big things like struct btf, which are only a few of them and
4/8 extra bytes just doesn't matter compared to all the other memory
used for actual data.
>
> /* BTF object FD, if loaded into kernel */
> int fd;
> @@ -969,6 +972,8 @@ void btf__free(struct btf *btf)
> free(btf->raw_data);
> free(btf->raw_data_swapped);
> free(btf->type_offs);
> + if (btf->owns_base)
> + btf__free(btf->base_btf);
> free(btf);
> }
>
> @@ -1084,53 +1089,38 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
> return libbpf_ptr(btf_new(data, size, base_btf));
> }
>
> -static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
> - struct btf_ext **btf_ext)
> +struct elf_sections_info {
> + Elf_Data *btf_data;
> + Elf_Data *btf_ext_data;
> + Elf_Data *btf_base_data;
bikeshedding time: elf_sections_info -> btf_elf_data (or
btf_elf_secs), btf_data -> btf, btf_ext_data -> btf_ext, btf_base_data
-> btf_base ?
> +};
> +
> +static int btf_find_elf_sections(Elf *elf, const char *path, struct elf_sections_info *info)
> {
> - Elf_Data *btf_data = NULL, *btf_ext_data = NULL;
> - int err = 0, fd = -1, idx = 0;
> - struct btf *btf = NULL;
> Elf_Scn *scn = NULL;
> - Elf *elf = NULL;
> + Elf_Data *data;
> GElf_Ehdr ehdr;
> size_t shstrndx;
> + int idx = 0;
[...]
> + if (!info.btf_data) {
> pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
> err = -ENODATA;
> goto done;
> }
> - btf = btf_new(btf_data->d_buf, btf_data->d_size, base_btf);
> +
> + if (info.btf_base_data) {
> + distilled_base_btf = btf_new(info.btf_base_data->d_buf, info.btf_base_data->d_size,
> + NULL);
with the above bikeshedding suggestion, and distilled_base_btf ->
dist_base_btf, let's get it to be a less verbose single-line statement
> + err = libbpf_get_error(distilled_base_btf);
boo to using libbpf_get_error() in new code. btf_new() is internal, so
IS_ERR()/PTR_ERR(), please
pw-bot: cr
> + if (err) {
> + distilled_base_btf = NULL;
> + goto done;
> + }
> + }
> +
> + btf = btf_new(info.btf_data->d_buf, info.btf_data->d_size,
> + distilled_base_btf ? distilled_base_btf : base_btf);
dist_base_btf ?: base_btf
> err = libbpf_get_error(btf);
ditto, IS_ERR/PTR_ERR
> if (err)
> goto done;
>
> + if (distilled_base_btf && base_btf) {
> + err = btf__relocate(btf, base_btf);
> + if (err)
> + goto done;
> + btf__free(distilled_base_btf);
> + distilled_base_btf = NULL;
> + }
> +
> + if (distilled_base_btf)
> + btf->owns_base = true;
should we reset this to false when changing base in btf__relocate()?
> +
> switch (gelf_getclass(elf)) {
> case ELFCLASS32:
> btf__set_pointer_size(btf, 4);
[...]
* Re: [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-05-28 12:24 ` [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
2024-05-31 9:34 ` Eduard Zingerman
@ 2024-05-31 19:04 ` Andrii Nakryiko
2024-06-01 1:59 ` Andrii Nakryiko
1 sibling, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-05-31 19:04 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Tue, May 28, 2024 at 5:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Share relocation implementation with the kernel. As part of this,
> we also need the type/string visitation functions so add them to a
> btf_common.c file that also gets shared with the kernel. Relocation
> code in kernel and userspace is identical save for the implementation
> of the reparenting of split BTF to the relocated base BTF and
> retrieval of BTF header from "struct btf"; these small functions
> need separate user-space and kernel implementations.
>
> One other wrinkle on the kernel side is we have to map .BTF.ids in
> modules as they were generated with the type ids used at BTF encoding
> time. btf_relocate() optionally returns an array mapping from old BTF
> ids to relocated ids, so we use that to fix up these references where
> needed for kfuncs.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> include/linux/btf.h | 45 ++++++++++
> kernel/bpf/Makefile | 10 ++-
> kernel/bpf/btf.c | 168 +++++++++++++++++++++++++----------
> tools/lib/bpf/Build | 2 +-
> tools/lib/bpf/btf.c | 130 ---------------------------
> tools/lib/bpf/btf_iter.c | 143 +++++++++++++++++++++++++++++
> tools/lib/bpf/btf_relocate.c | 23 +++++
> 7 files changed, 344 insertions(+), 177 deletions(-)
> create mode 100644 tools/lib/bpf/btf_iter.c
>
[...]
> +static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
> +{
> + return (struct btf_decl_tag *)(t + 1);
> +}
> +
> static inline int btf_id_cmp_func(const void *a, const void *b)
> {
> const int *pa = a, *pb = b;
> @@ -515,9 +528,17 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
> }
> #endif
>
> +typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
> +typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
> +
let me take a quick stab at implementing type/str field iterator in
libbpf. If I don't get stuck anywhere, maybe you can just rebase on
that and avoid the callback hell and the need for this
callback-vs-iter churn in the kernel code as well
> #ifdef CONFIG_BPF_SYSCALL
> const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
> +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
> +int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
> +int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
> const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> +const char *btf_str_by_offset(const struct btf *btf, u32 offset);
> struct btf *btf_parse_vmlinux(void);
> struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
> u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
[...]
* Re: [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF
2024-05-28 12:24 ` [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF Alan Maguire
@ 2024-05-31 19:06 ` Andrii Nakryiko
2024-06-02 13:41 ` Alan Maguire
0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-05-31 19:06 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Tue, May 28, 2024 at 5:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Support creation of module BTF along with distilled base BTF;
> the latter is stored in a .BTF.base ELF section and supplements
> split BTF references to base BTF with information about base types,
> allowing for later relocation of split BTF with a (possibly
> changed) base. resolve_btfids detects the presence of a .BTF.base
> section and will use it instead of the base BTF it is passed in
> BTF id resolution.
>
> Modules will be built with a distilled .BTF.base section for external
> module build, i.e.
>
> make -C . M=path2/module
>
> ...while in-tree module build as part of a normal kernel build will
> not generate distilled base BTF; this is because in-tree modules
> change with the kernel and do not require BTF relocation for the
> running vmlinux.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> scripts/Makefile.btf | 5 +++++
> scripts/Makefile.modfinal | 2 +-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
> index bca8a8f26ea4..191b4903e864 100644
> --- a/scripts/Makefile.btf
> +++ b/scripts/Makefile.btf
> @@ -21,8 +21,13 @@ else
> # Switch to using --btf_features for v1.26 and later.
> pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func
>
> +ifneq ($(KBUILD_EXTMOD),)
> +module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base
Remind me, please. What's the state of pahole patches? Are they
waiting on these libbpf changes to land first, right?
> +endif
> +
> endif
>
> pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
>
> export PAHOLE_FLAGS := $(pahole-flags-y)
> +export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
> index 79fcf2731686..6d2b8da98ee5 100644
> --- a/scripts/Makefile.modfinal
> +++ b/scripts/Makefile.modfinal
> @@ -39,7 +39,7 @@ quiet_cmd_btf_ko = BTF [M] $@
> if [ ! -f vmlinux ]; then \
> printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
> else \
> - LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
> + LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \
> $(RESOLVE_BTFIDS) -b vmlinux $@; \
> fi;
>
> --
> 2.31.1
>
* Re: [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-05-31 19:04 ` Andrii Nakryiko
@ 2024-06-01 1:59 ` Andrii Nakryiko
2024-06-02 13:42 ` Alan Maguire
0 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2024-06-01 1:59 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On Fri, May 31, 2024 at 12:04 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, May 28, 2024 at 5:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> > Share relocation implementation with the kernel. As part of this,
> > we also need the type/string visitation functions so add them to a
> > btf_common.c file that also gets shared with the kernel. Relocation
> > code in kernel and userspace is identical save for the implementation
> > of the reparenting of split BTF to the relocated base BTF and
> > retrieval of BTF header from "struct btf"; these small functions
> > need separate user-space and kernel implementations.
> >
> > One other wrinkle on the kernel side is we have to map .BTF.ids in
> > modules as they were generated with the type ids used at BTF encoding
> > time. btf_relocate() optionally returns an array mapping from old BTF
> > ids to relocated ids, so we use that to fix up these references where
> > needed for kfuncs.
> >
> > Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> > ---
> > include/linux/btf.h | 45 ++++++++++
> > kernel/bpf/Makefile | 10 ++-
> > kernel/bpf/btf.c | 168 +++++++++++++++++++++++++----------
> > tools/lib/bpf/Build | 2 +-
> > tools/lib/bpf/btf.c | 130 ---------------------------
> > tools/lib/bpf/btf_iter.c | 143 +++++++++++++++++++++++++++++
> > tools/lib/bpf/btf_relocate.c | 23 +++++
> > 7 files changed, 344 insertions(+), 177 deletions(-)
> > create mode 100644 tools/lib/bpf/btf_iter.c
> >
>
> [...]
>
> > +static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
> > +{
> > + return (struct btf_decl_tag *)(t + 1);
> > +}
> > +
> > static inline int btf_id_cmp_func(const void *a, const void *b)
> > {
> > const int *pa = a, *pb = b;
> > @@ -515,9 +528,17 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
> > }
> > #endif
> >
> > +typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
> > +typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
> > +
>
> let me take a quick stab at implementing type/str field iterator in
> libbpf. If I don't get stuck anywhere, maybe you can just rebase on
> that and avoid the callback hell and the need for this
> callback-vs-iter churn in the kernel code as well
>
Sent it out as one RFC patch (which unfortunately makes it harder to
see iterator logic, sorry; but I ran out of time to split it
properly), see [0].
It is especially nice when per-field logic is very simple (like
bpftool's gen.c logic, where we just remap ID). Please take a look and
let me know what you think.
[0] https://patchwork.kernel.org/project/netdevbpf/patch/20240601014505.3443241-1-andrii@kernel.org/
> > #ifdef CONFIG_BPF_SYSCALL
> > const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> > +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
> > +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
> > +int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
> > +int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
> > const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> > +const char *btf_str_by_offset(const struct btf *btf, u32 offset);
> > struct btf *btf_parse_vmlinux(void);
> > struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
> > u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
>
> [...]
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-05-31 15:38 ` Alan Maguire
2024-05-31 18:44 ` Andrii Nakryiko
@ 2024-06-01 7:56 ` Eduard Zingerman
2024-06-02 13:47 ` Alan Maguire
1 sibling, 1 reply; 29+ messages in thread
From: Eduard Zingerman @ 2024-06-01 7:56 UTC (permalink / raw)
To: Alan Maguire
Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
houtao1, bpf, masahiroy, mcgrof, nathan
On Fri, 2024-05-31 at 16:38 +0100, Alan Maguire wrote:
Hi Alan,
> The in-kernel sort reports n*log2(n) + 0.37*n + o(n) comparisons on
> average; for base BTF that means sorting requires at least
>
> Base: 14307*14+0.37*14307 = 205592 comparisons
> Dist: 600*9+0.37*600 = 5622 comparisons
>
> So we get an inversion of the above results, with (unless I'm
> miscalculating something), sorting distilled base BTF requiring fewer
> comparisons overall across both sort+search.
>
> Sort Comparisons Search comparisons Total
> ======================================================================
> 5622 (distilled) 128763 (to base) 134385
> 205592 (base) 8400 (to distilled) 213992
It was absolutely stupid of me to not include the base sort cost into
the calculations, really embarrassing. Thank you for pointing this out
and my apologies for suggesting such nonsense.
[...]
> > The algorithm might not handle name duplicates in the distilled BTF well,
> > e.g. in theory, the following is a valid C code
> >
> > struct foo { int f; }; // sizeof(struct foo) == 4
> > typedef int foo; // sizeof(foo) == 4
> >
> > Suppose that these types are a part of the distilled BTF.
> > Depending on which one would end up first in 'dist_base_info_sorted'
> > bsearch might fail to find one or the other.
>
> In the case of distilled base BTF, only struct, union, enum, enum64,
> int, float and fwd can be present. Size matches would have to be between
> one of these kinds I think, but are still possible nevertheless.
As Andrii noted in a sibling reply, there is still a slim possibility
for name duplicates in the distilled base. Imo, if we can catch the
corner case we should.
> > Also, algorithm does not report an error if there are several
> > types with the same name and size in the base BTF.
>
> Yep, while we have to handle this, it only becomes an ambiguity problem
> if distilled base BTF refers to one of such types. On my vmlinux I see
> the following duplicate name/size STRUCTs
As you noted, this situation is really easy to catch by checking if
id_map slot is already occupied, so it should be checked.
[...]
> struct elf_thread_core_info___2;
>
> struct elf_note_info___2 {
> struct elf_thread_core_info___2 *thread;
> struct memelfnote psinfo;
> struct memelfnote signote;
> struct memelfnote auxv;
> struct memelfnote files;
> compat_siginfo_t csigdata;
> size_t size;
> int thread_notes;
> };
>
> Both of these share self-reference, either directly or indirectly so
> maybe it's a corner-case of dedup we're missing. I'll dig into these later.
This is interesting indeed.
> > I suggest to modify the algorithm as follows:
> > - let 'base_info_sorted' be a set of tuples {kind,name,size,id}
> > corresponding to the base BTF, sorted by kind, name and size;
>
> That was my first thought, but we can't always search by kind; for
> example it's possible the distilled base has a fwd and vmlinux only has
> a struct kind for the same type name; in such a case we'd want to
> support a match provided the fwd's kflag indicated a struct fwd.
>
> In fact looking at the code we're missing logic for the opposite
> condition (fwd only in base, struct in distilled base). I'll fix that.
>
> The other case is an enum in distilled base matching an enum64
> or an enum.
I think it could be possible to do some kinds normalization
(e.g. represent fwd's as zero sized structs or unions in
btf_name_info).
I'll try to implement this and get back to you on Monday.
[...]
> I think flipping the search order could gain search speed, but only at
> the expense of slowing things down overall due to the extra cost of
> having to sort so many more elements. I suspect it will mostly be a
> wash, though numbers above seem to suggest sorting distilled base may
> have an edge when we consider both search and sort. The question is
> probably which sort/search order is most amenable to handling the data
> and helping us deal with the edge cases like duplicates.
Yes, you are absolutely correct.
[...]
> @@ -136,6 +137,19 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> 	qsort(dist_base_info_sorted, r->nr_dist_base_types,
> 	      sizeof(*dist_base_info_sorted), cmp_btf_name_size);
>
> +	/* It is possible - though highly unlikely - that duplicate-named
> +	 * types end up in distilled base BTF; error out if this is the case.
> +	 */
> +	for (id = 1; id < r->nr_dist_base_types; id++) {
> +		if (last_name == dist_base_info_sorted[id].name) {
> +			pr_warn("Multiple distilled base types [%u], [%u] share name '%s'; cannot relocate with base BTF.\n",
> +				id - 1, id, last_name);
> +			err = -EINVAL;
> +			goto done;
> +		}
> +		last_name = dist_base_info_sorted[id].name;
> +	}
> +
Nit: this rejects a case when both distilled types are embedded and a
counterpart for each could be found in base. But that's a bit
inconvenient to check for in the current framework. Probably not
important.
> 	/* Mark distilled base struct/union members of split BTF structs/unions
> 	 * in id_map with BTF_IS_EMBEDDED; this signals that these types
> 	 * need to match both name and size, otherwise embedding the base
> @@ -272,6 +286,21 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> 		default:
> 			continue;
> 		}
> +		if (r->id_map[dist_name_info->id] &&
> +		    r->id_map[dist_name_info->id] != BTF_IS_EMBEDDED) {
> +			/* we already have a match; this tells us that
> +			 * multiple base types of the same name
> +			 * have the same size, since for cases where
> +			 * multiple types have the same name we match
> +			 * on name and size. In this case, we have
> +			 * no way of determining which to relocate
> +			 * to in base BTF, so error out.
> +			 */
> +			pr_warn("distilled base BTF type '%s' [%u], size %u has multiple candidates of the same size (ids [%u, %u]) in base BTF\n",
> +				base_name_info.name, dist_name_info->id, base_t->size,
> +				id, r->id_map[dist_name_info->id]);
> +			err = -EINVAL;
> +			goto done;
> +		}
I think this hunk should be added.
[...]
Best regards,
Eduard
* Re: [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently
2024-05-31 18:57 ` Andrii Nakryiko
@ 2024-06-01 8:04 ` Eduard Zingerman
0 siblings, 0 replies; 29+ messages in thread
From: Eduard Zingerman @ 2024-06-01 8:04 UTC (permalink / raw)
To: Andrii Nakryiko, Alan Maguire
Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
houtao1, bpf, masahiroy, mcgrof, nathan
On Fri, 2024-05-31 at 11:57 -0700, Andrii Nakryiko wrote:
[...]
> > @@ -1084,53 +1089,38 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
> > return libbpf_ptr(btf_new(data, size, base_btf));
> > }
> >
> > -static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
> > - struct btf_ext **btf_ext)
> > +struct elf_sections_info {
> > + Elf_Data *btf_data;
> > + Elf_Data *btf_ext_data;
> > + Elf_Data *btf_base_data;
>
> bikeshedding time: elf_sections_info -> btf_elf_data (or
> btf_elf_secs), btf_data -> btf, btf_ext_data -> btf_ext, btf_base_data
> -> btf_base ?
As you and Alan see fit.
[...]
> > - btf = btf_new(btf_data->d_buf, btf_data->d_size, base_btf);
> > +
> > + if (info.btf_base_data) {
> > + distilled_base_btf = btf_new(info.btf_base_data->d_buf, info.btf_base_data->d_size,
> > + NULL);
>
> with the above bikeshedding suggestion, and distilled_base_btf ->
> dist_base_btf, let's get it to be a less verbose single-line statement
>
> > + err = libbpf_get_error(distilled_base_btf);
>
> boo to using libbpf_get_error() in new code. btf_new() is internal, so
> IS_ERR()/PTR_ERR(), please
Noted.
[...]
> > + if (distilled_base_btf)
> > + btf->owns_base = true;
>
> should we reset this to false when changing base in btf__relocate()?
It should, good catch!
[...]
* Re: [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF
2024-05-31 19:06 ` Andrii Nakryiko
@ 2024-06-02 13:41 ` Alan Maguire
0 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-06-02 13:41 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On 31/05/2024 20:06, Andrii Nakryiko wrote:
> On Tue, May 28, 2024 at 5:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Support creation of module BTF along with distilled base BTF;
>> the latter is stored in a .BTF.base ELF section and supplements
>> split BTF references to base BTF with information about base types,
>> allowing for later relocation of split BTF with a (possibly
>> changed) base. resolve_btfids detects the presence of a .BTF.base
>> section and will use it instead of the base BTF it is passed in
>> BTF id resolution.
>>
>> Modules will be built with a distilled .BTF.base section for external
>> module build, i.e.
>>
>> make -C . M=path2/module
>>
>> ...while in-tree module build as part of a normal kernel build will
>> not generate distilled base BTF; this is because in-tree modules
>> change with the kernel and do not require BTF relocation for the
>> running vmlinux.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> scripts/Makefile.btf | 5 +++++
>> scripts/Makefile.modfinal | 2 +-
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
>> index bca8a8f26ea4..191b4903e864 100644
>> --- a/scripts/Makefile.btf
>> +++ b/scripts/Makefile.btf
>> @@ -21,8 +21,13 @@ else
>> # Switch to using --btf_features for v1.26 and later.
>> pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func
>>
>> +ifneq ($(KBUILD_EXTMOD),)
>> +module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base
>
> Remind me, please. What's the state of pahole patches? Are they
> waiting on these libbpf changes to land first, right?
>
Exactly. The idea would be this: we land the libbpf patches first, and
then update the dependent commit in the dwarves repo to point at one
with the relocation APIs. In the interim - where we have the code in the
library but not in pahole - everything continues to work as before. It's
just that the above --btf_feature creating distilled BTF is ignored,
which means we don't generate any distilled BTF in any modules until
that support is in the pahole used.
Thanks!
Alan
>> +endif
>> +
>> endif
>>
>> pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
>>
>> export PAHOLE_FLAGS := $(pahole-flags-y)
>> +export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
>> diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
>> index 79fcf2731686..6d2b8da98ee5 100644
>> --- a/scripts/Makefile.modfinal
>> +++ b/scripts/Makefile.modfinal
>> @@ -39,7 +39,7 @@ quiet_cmd_btf_ko = BTF [M] $@
>> if [ ! -f vmlinux ]; then \
>> printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
>> else \
>> - LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
>> + LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \
>> $(RESOLVE_BTFIDS) -b vmlinux $@; \
>> fi;
>>
>> --
>> 2.31.1
>>
* Re: [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel
2024-06-01 1:59 ` Andrii Nakryiko
@ 2024-06-02 13:42 ` Alan Maguire
0 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-06-02 13:42 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
haoluo, houtao1, bpf, masahiroy, mcgrof, nathan
On 01/06/2024 02:59, Andrii Nakryiko wrote:
> On Fri, May 31, 2024 at 12:04 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> On Tue, May 28, 2024 at 5:25 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>
>>> Share relocation implementation with the kernel. As part of this,
>>> we also need the type/string visitation functions so add them to a
>>> btf_common.c file that also gets shared with the kernel. Relocation
>>> code in kernel and userspace is identical save for the implementation
>>> of the reparenting of split BTF to the relocated base BTF and
>>> retrieval of BTF header from "struct btf"; these small functions
>>> need separate user-space and kernel implementations.
>>>
>>> One other wrinkle on the kernel side is we have to map .BTF.ids in
>>> modules as they were generated with the type ids used at BTF encoding
>>> time. btf_relocate() optionally returns an array mapping from old BTF
>>> ids to relocated ids, so we use that to fix up these references where
>>> needed for kfuncs.
>>>
>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>> ---
>>> include/linux/btf.h | 45 ++++++++++
>>> kernel/bpf/Makefile | 10 ++-
>>> kernel/bpf/btf.c | 168 +++++++++++++++++++++++++----------
>>> tools/lib/bpf/Build | 2 +-
>>> tools/lib/bpf/btf.c | 130 ---------------------------
>>> tools/lib/bpf/btf_iter.c | 143 +++++++++++++++++++++++++++++
>>> tools/lib/bpf/btf_relocate.c | 23 +++++
>>> 7 files changed, 344 insertions(+), 177 deletions(-)
>>> create mode 100644 tools/lib/bpf/btf_iter.c
>>>
>>
>> [...]
>>
>>> +static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
>>> +{
>>> + return (struct btf_decl_tag *)(t + 1);
>>> +}
>>> +
>>> static inline int btf_id_cmp_func(const void *a, const void *b)
>>> {
>>> const int *pa = a, *pb = b;
>>> @@ -515,9 +528,17 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
>>> }
>>> #endif
>>>
>>> +typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
>>> +typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
>>> +
>>
>> let me take a quick stab at implementing type/str field iterator in
>> libbpf. If I don't get stuck anywhere, maybe you can just rebase on
>> that and avoid the callback hell and the need for this
>> callback-vs-iter churn in the kernel code as well
>>
>
> Sent it out as one RFC patch (which unfortunately makes it harder to
> see iterator logic, sorry; but I ran out of time to split it
> properly), see [0].
> It is especially nice when per-field logic is very simple (like
> bpftool's gen.c logic, where we just remap ID). Please take a look and
> let me know what you think.
>
> [0] https://patchwork.kernel.org/project/netdevbpf/patch/20240601014505.3443241-1-andrii@kernel.org/
>
thanks for this! Looking now...
>>> #ifdef CONFIG_BPF_SYSCALL
>>> const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>>> +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
>>> +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
>>> +int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
>>> +int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
>>> const char *btf_name_by_offset(const struct btf *btf, u32 offset);
>>> +const char *btf_str_by_offset(const struct btf *btf, u32 offset);
>>> struct btf *btf_parse_vmlinux(void);
>>> struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
>>> u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
>>
>> [...]
* Re: [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation
2024-06-01 7:56 ` Eduard Zingerman
@ 2024-06-02 13:47 ` Alan Maguire
0 siblings, 0 replies; 29+ messages in thread
From: Alan Maguire @ 2024-06-02 13:47 UTC (permalink / raw)
To: Eduard Zingerman
Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
houtao1, bpf, masahiroy, mcgrof, nathan
On 01/06/2024 08:56, Eduard Zingerman wrote:
> On Fri, 2024-05-31 at 16:38 +0100, Alan Maguire wrote:
>
> Hi Alan,
>
>> The in-kernel sort reports n*log2(n) + 0.37*n + o(n) comparisons on
>> average; for base BTF that means sorting requires at least
>>
>> Base: 14307*14+0.37*14307 = 205592 comparisons
>> Dist: 600*9+0.37*600 = 5622 comparisons
>>
>> So we get an inversion of the above results, with (unless I'm
>> miscalculating something), sorting distilled base BTF requiring fewer
>> comparisons overall across both sort+search.
>>
>> Sort Comparisons Search comparisons Total
>> ======================================================================
>> 5622 (distilled) 128763 (to base) 134385
>> 205592 (base) 8400 (to distilled) 213992
>
> It was absolutely stupid of me to not include the base sort cost into
> the calculations, really embarrassing. Thank you for pointing this out
> and my apologies for suggesting such nonsense.
>
No apologies necessary! Your feedback was a great reminder to actually
check the relative overheads properly (which I had neglected to do in
depth), and it just turns out in this case that the relative proportions
between base and distilled base make sorting the distilled base the
right approach. So this was absolutely a valid theoretical question, and
I'm very glad you asked it.
> [...]
>
>>> The algorithm might not handle name duplicates in the distilled BTF well,
>>> e.g. in theory, the following is a valid C code
>>>
>>> struct foo { int f; }; // sizeof(struct foo) == 4
>>> typedef int foo; // sizeof(foo) == 4
>>>
>>> Suppose that these types are a part of the distilled BTF.
>>> Depending on which one would end up first in 'dist_base_info_sorted'
>>> bsearch might fail to find one or the other.
>>
>> In the case of distilled base BTF, only struct, union, enum, enum64,
>> int, float and fwd can be present. Size matches would have to be between
>> one of these kinds I think, but are still possible nevertheless.
>
> As Andrii noted in a sibling reply, there is still a slim possibility
> for name duplicates in the distilled base. Imo, if we can catch the
> corner case we should.
>
>>> Also, algorithm does not report an error if there are several
>>> types with the same name and size in the base BTF.
>>
>> Yep, while we have to handle this, it only becomes an ambiguity problem
>> if distilled base BTF refers to one of such types. On my vmlinux I see
>> the following duplicate name/size STRUCTs
>
> As you noted, this situation is really easy to catch by checking if
> id_map slot is already occupied, so it should be checked.
>
> [...]
>
>> struct elf_thread_core_info___2;
>>
>> struct elf_note_info___2 {
>> struct elf_thread_core_info___2 *thread;
>> struct memelfnote psinfo;
>> struct memelfnote signote;
>> struct memelfnote auxv;
>> struct memelfnote files;
>> compat_siginfo_t csigdata;
>> size_t size;
>> int thread_notes;
>> };
>>
>> Both of these share self-reference, either directly or indirectly so
>> maybe it's a corner-case of dedup we're missing. I'll dig into these later.
>
> This is interesting indeed.
>
>>> I suggest to modify the algorithm as follows:
>>> - let 'base_info_sorted' be a set of tuples {kind,name,size,id}
>>> corresponding to the base BTF, sorted by kind, name and size;
>>
>> That was my first thought, but we can't always search by kind; for
>> example it's possible the distilled base has a fwd and vmlinux only has
>> a struct kind for the same type name; in such a case we'd want to
>> support a match provided the fwd's kflag indicated a struct fwd.
>>
>> In fact looking at the code we're missing logic for the opposite
>> condition (fwd only in base, struct in distilled base). I'll fix that.
>>
>> The other case is an enum in distilled base matching an enum64
>> or an enum.
>
> I think it could be possible to do some kinds normalization
> (e.g. represent fwd's as zero sized structs or unions in
> btf_name_info).
> I'll try to implement this and get back to you on Monday.
>
> [...]
>
>> I think flipping the search order could gain search speed, but only at
>> the expense of slowing things down overall due to the extra cost of
>> having to sort so many more elements. I suspect it will mostly be a
>> wash, though numbers above seem to suggest sorting distilled base may
>> have an edge when we consider both search and sort. The question is
>> probably which sort/search order is most amenable to handling the data
>> and helping us deal with the edge cases like duplicates.
>
> Yes, you are absolutely correct.
>
> [...]
>
>> @@ -136,6 +137,19 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
>> 	qsort(dist_base_info_sorted, r->nr_dist_base_types,
>> 	      sizeof(*dist_base_info_sorted), cmp_btf_name_size);
>>
>> +	/* It is possible - though highly unlikely - that duplicate-named
>> +	 * types end up in distilled base BTF; error out if this is the case.
>> +	 */
>> +	for (id = 1; id < r->nr_dist_base_types; id++) {
>> +		if (last_name == dist_base_info_sorted[id].name) {
>> +			pr_warn("Multiple distilled base types [%u], [%u] share name '%s'; cannot relocate with base BTF.\n",
>> +				id - 1, id, last_name);
>> +			err = -EINVAL;
>> +			goto done;
>> +		}
>> +		last_name = dist_base_info_sorted[id].name;
>> +	}
>> +
>
> Nit: this rejects a case when both distilled types are embedded and a
> counterpart for each could be found in base. But that's a bit
> inconvenient to check for in the current framework. Probably not
> important.
>
>> 	/* Mark distilled base struct/union members of split BTF structs/unions
>> 	 * in id_map with BTF_IS_EMBEDDED; this signals that these types
>> 	 * need to match both name and size, otherwise embedding the base
>> @@ -272,6 +286,21 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r)
>> 		default:
>> 			continue;
>> 		}
>> +		if (r->id_map[dist_name_info->id] &&
>> +		    r->id_map[dist_name_info->id] != BTF_IS_EMBEDDED) {
>> +			/* we already have a match; this tells us that
>> +			 * multiple base types of the same name
>> +			 * have the same size, since for cases where
>> +			 * multiple types have the same name we match
>> +			 * on name and size. In this case, we have
>> +			 * no way of determining which to relocate
>> +			 * to in base BTF, so error out.
>> +			 */
>> +			pr_warn("distilled base BTF type '%s' [%u], size %u has multiple candidates of the same size (ids [%u, %u]) in base BTF\n",
>> +				base_name_info.name, dist_name_info->id, base_t->size,
>> +				id, r->id_map[dist_name_info->id]);
>> +			err = -EINVAL;
>> +			goto done;
>> +		}
>
> I think this hunk should be added.
>
Sure, will do! Thanks again!
Alan
> [...]
>
> Best regards,
> Eduard
Thread overview: 29+ messages (thread ended ~2024-06-02 13:48 UTC)
2024-05-28 12:23 [PATCH v5 bpf-next 0/9] bpf: support resilient split BTF Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 1/9] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
2024-05-30 19:58 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 2/9] selftests/bpf: test distilled base, split BTF generation Alan Maguire
2024-05-30 20:23 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 3/9] libbpf: split BTF relocation Alan Maguire
2024-05-31 2:22 ` Eduard Zingerman
2024-05-31 15:38 ` Alan Maguire
2024-05-31 18:44 ` Andrii Nakryiko
2024-06-01 7:56 ` Eduard Zingerman
2024-06-02 13:47 ` Alan Maguire
2024-05-31 18:21 ` Andrii Nakryiko
2024-05-28 12:24 ` [PATCH v5 bpf-next 4/9] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
2024-05-31 2:23 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 5/9] libbpf: make btf_parse_elf process .BTF.base transparently Alan Maguire
2024-05-31 18:57 ` Andrii Nakryiko
2024-06-01 8:04 ` Eduard Zingerman
2024-05-28 12:24 ` [PATCH v5 bpf-next 6/9] resolve_btfids: handle presence of .BTF.base section Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 7/9] module, bpf: store BTF base pointer in struct module Alan Maguire
2024-05-28 18:27 ` Luis Chamberlain
2024-05-28 12:24 ` [PATCH v5 bpf-next 8/9] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
2024-05-31 9:34 ` Eduard Zingerman
2024-05-31 16:13 ` Alan Maguire
2024-05-31 19:04 ` Andrii Nakryiko
2024-06-01 1:59 ` Andrii Nakryiko
2024-06-02 13:42 ` Alan Maguire
2024-05-28 12:24 ` [PATCH v5 bpf-next 9/9] kbuild,bpf: add module-specific pahole flags for distilled base BTF Alan Maguire
2024-05-31 19:06 ` Andrii Nakryiko
2024-06-02 13:41 ` Alan Maguire