bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
@ 2024-05-17 10:22 Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Split BPF Type Format (BTF) provides huge advantages in that kernel
modules only have to provide type information for types that they do not
share with the core kernel; for core kernel types, split BTF refers to
core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
uses that structure (or a pointer to it) simply needs to refer to the
core kernel type id, saving the need to define the structure and its many
dependents.  This cuts down on duplication and makes BTF as compact
as possible.

However, there is a downside.  This scheme requires the references from
split BTF to base BTF to be valid not just at encoding time, but at use
time (when the module is loaded).  Even a small change in kernel types
can perturb the type ids in core kernel BTF, and due to pahole's
parallel processing of compilation units, even an unchanged kernel can
have different type ids if BTF is re-generated.  So we have a robustness
problem for split BTF for cases where a module is not always compiled at
the same time as the kernel.  This problem is particularly acute for
distros which generally want module builders to be able to compile a
module for the lifetime of a Linux stable-based release, and have it
continue to be valid over the lifetime of that release, even as changes
in data structures (and hence BTF types) accrue.  Today it's not
possible to generate BTF for modules that works beyond the initial
kernel it is compiled against - kernel bugfixes etc invalidate the split
BTF references to vmlinux BTF, and BTF is no longer usable for the
module.

The goal of this series is to provide options to provide additional
context for cases like this.  That context comes in the form of
distilled base BTF; it stands in for the base BTF, and contains
information about the types referenced from split BTF, but not their
full descriptions.  The modified split BTF will refer to type ids in
this .BTF.base section, and when the kernel loads such modules it
will use that .BTF.base to map references from split BTF to the
equivalent current vmlinux base BTF types.  Once this relocation
process has succeeded, the module BTF available in /sys/kernel/btf
will look exactly as if it was built with the current vmlinux;
references to base types will be fixed up etc.

A module builder - using this series along with the pahole changes -
can then build a module with distilled base BTF via an out-of-tree
module build, i.e.

make -C . M=path/2/module

The module will have a .BTF section (the split BTF) and a
.BTF.base section.  The latter is small in size - distilled base
BTF does not need full struct/union/enum information for named
types for example.  For 2667 modules built with distilled base BTF,
the average size observed was 1556 bytes (stddev 1563).  The overall
size added to this 2667 modules was 5.3Mb.

Note that for the in-tree modules, this approach is not needed as
split and base BTF in the case of in-tree modules are always built
and re-built together.

The series first focuses on generating split BTF with distilled base
BTF, and provides btf__parse_opts() which allows specification
of the section name from which to read BTF data, since we now have
both .BTF and .BTF.base sections that can contain such data.

Then we add support to resolve_btfids for generating the .BTF.ids
section with reference to the .BTF.base section - this ensures the
.BTF.ids match those used in the split/base BTF.

Finally the series provides the mechanism for relocating split BTF with
a new base; the distilled base BTF is used to map the references to base
BTF in the split BTF to the new base.  For the kernel, this relocation
process happens at module load time, and we relocate split BTF
references to point at types in the current vmlinux BTF.  As part of
this, .BTF.ids references need to be mapped also.

So concretely, what happens is

- we generate split BTF in the .BTF section of a module that refers to
  types in the .BTF.base section as base types; the latter are not full
  type descriptions but provide information about the base type.  So
  a STRUCT sk_buff would be represented as a FWD struct sk_buff in
  distilled base BTF for example.
- when the module is loaded, the split BTF is relocated with vmlinux
  BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
  in vmlinux BTF and map all split BTF references to the distilled base
  FWD sk_buff, replacing them with references to the vmlinux BTF
  STRUCT sk_buff.

Support is also added to bpftool to be able to display split BTF
relative to its .BTF.base section, and also to display the relocated
form via the "-R path_to_base_btf".

A previous approach to this problem [1] utilized standalone BTF for such
cases - where the BTF is not defined relative to base BTF so there is no
relocation required.  The problem with that approach is that from
the verifier perspective, some types are special, and having a custom
representation of a core kernel type that did not necessarily match the
current representation is not tenable.  So the approach taken here was
to preserve the split BTF model while minimizing the representation of
the context needed to relocate split and current vmlinux BTF.

To generate distilled .BTF.base sections the associated dwarves
patch (to be applied on the "next" branch there) is needed.
Without it, things will still work but modules will not be built
with a .BTF.base section.

Changes since v3[3]:

- distill now checks for duplicate-named struct/unions and records
  them as a sized struct/union to help identify which of the
  multiple base BTF structs/unions it refers to (Eduard, patch 1)
- added test support for multiple name handling (Eduard, patch 2)
- simplified the string mapping when updating split BTF to use
  base BTF instead of distilled base.  Since the only string
  references split BTF can make to base BTF are the names of
  the base types, create a string map from distilled string
  offset -> base BTF string offset and update string offsets
  by visiting all strings in split BTF; this saves having to
  do costly searches of base BTF (Eduard, patch 7,10)
- fixed bpftool manpage and indentation issues (Quentin, patch 11)

Also explored Eduard's suggestion of doing an implicit fallback
to checking for .BTF.base section in btf__parse() when it is
called to get base BTF.  However while it is doable, it turned
out to be difficult operationally.  Since fallback is implicit
we do not know the source of the BTF - was it from .BTF or
.BTF.base? In bpftool, we want to try first standalone BTF,
then split, then split with distilled base.  Having a way
to explicitly request .BTF.base via btf__parse_opts() fits
that model better.

Changes since v2[4]:

- submitted patch to use --btf_features in Makefile.btf for pahole
  v1.26 and later separately (Andrii).  That has landed in bpf-next
  now.
- distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
  the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
- moved to distilling only named types, augmenting split BTF with
  associated reference types; this simplifies greatly the distilled
  base BTF and the mapping operation between distilled and base
  BTF when relocating (most of the series changes, Andrii)
- relocation now iterates over base BTF, looking for matches based
  on name in distilled BTF.  Distilled BTF is pre-sorted by name
  (Andrii, patch 8)
- removed most redundant compabitiliby checks aside from struct
  size for base types/embedded structs and kind compatibility
  (since we only match on name) (Andrii, patch 8)
- btf__parse_opts() now replaces btf_parse() internally in libbpf
  (Eduard, patch 3)

Changes since RFC [5]:

- updated terminology; we replace clunky "base reference" BTF with
  distilling base BTF into a .BTF.base section. Similarly BTF
  reconcilation becomes BTF relocation (Andrii, most patches)
- add distilled base BTF by default for out-of-tree modules
  (Alexei, patch 8)
- distill algorithm updated to record size of embedded struct/union
  by recording it as a 0-vlen STRUCT/UNION with size preserved
  (Andrii, patch 2)
- verify size match on relocation for such STRUCT/UNIONs (Andrii,
  patch 9)
- with embedded STRUCT/UNION recording size, we can have bpftool
  dump a header representation using .BTF.base + .BTF sections
  rather than special-casing and refusing to use "format c" for
  that case (patch 5)
- match enum with enum64 and vice versa (Andrii, patch 9)
- ensure that resolve_btfids works with BTF without .BTF.base
  section (patch 7)
- update tests to cover embedded types, arrays and function
  prototypes (patches 3, 12)

[1] https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/
[2] https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/
[3] https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/
[4] https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/
[5] https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/

Alan Maguire (11):
  libbpf: add btf__distill_base() creating split BTF with distilled base
    BTF
  selftests/bpf: test distilled base, split BTF generation
  libbpf: add btf__parse_opts() API for flexible BTF parsing
  bpftool: support displaying raw split BTF using base BTF section as
    base
  resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
    used
  kbuild, bpf: add module-specific pahole/resolve_btfids flags for
    distilled base BTF
  libbpf: split BTF relocation
  selftests/bpf: extend distilled BTF tests to cover BTF relocation
  module, bpf: store BTF base pointer in struct module
  libbpf,bpf: share BTF relocate-related code with kernel
  bpftool: support displaying relocated-with-base split BTF

 include/linux/btf.h                           |  45 ++
 include/linux/module.h                        |   2 +
 kernel/bpf/Makefile                           |   8 +
 kernel/bpf/btf.c                              | 166 +++--
 kernel/module/main.c                          |   5 +-
 scripts/Makefile.btf                          |   7 +
 scripts/Makefile.modfinal                     |   4 +-
 .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
 tools/bpf/bpftool/btf.c                       |  19 +-
 tools/bpf/bpftool/main.c                      |  14 +-
 tools/bpf/bpftool/main.h                      |   2 +
 tools/bpf/resolve_btfids/main.c               |  28 +-
 tools/lib/bpf/Build                           |   2 +-
 tools/lib/bpf/btf.c                           | 605 +++++++++++++-----
 tools/lib/bpf/btf.h                           |  59 ++
 tools/lib/bpf/btf_common.c                    | 143 +++++
 tools/lib/bpf/btf_relocate.c                  | 341 ++++++++++
 tools/lib/bpf/libbpf.map                      |   3 +
 tools/lib/bpf/libbpf_internal.h               |   3 +
 .../selftests/bpf/prog_tests/btf_distill.c    | 346 ++++++++++
 21 files changed, 1612 insertions(+), 212 deletions(-)
 create mode 100644 tools/lib/bpf/btf_common.c
 create mode 100644 tools/lib/bpf/btf_relocate.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c

-- 
2.31.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-21 21:48   ` Andrii Nakryiko
  2024-05-22 18:00   ` Kui-Feng Lee
  2024-05-17 10:22 ` [PATCH v4 bpf-next 02/11] selftests/bpf: test distilled base, split BTF generation Alan Maguire
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

To support more robust split BTF, adding supplemental context for the
base BTF type ids that split BTF refers to is required.  Without such
references, a simple shuffling of base BTF type ids (without any other
significant change) invalidates the split BTF.  Here the attempt is made
to store additional context to make split BTF more robust.

This context comes in the form of distilled base BTF providing minimal
information (name and - in some cases - size) for base INTs, FLOATs,
STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
points at that base and contains any additional types needed (such as
TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
information constitutes the minimal BTF representation needed to
disambiguate or remove split BTF references to base BTF.  The rules
are as follows:

- INT, FLOAT are recorded in full.
- if a named base BTF STRUCT or UNION is referred to from split BTF, it
  will be encoded either as a zero-member sized STRUCT/UNION (preserving
  size for later relocation checks) or as a named FWD.  Only base BTF
  STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
  that have multiple STRUCT/UNION instances of the same name need to
  preserve size information, so a FWD representation will be used in
  most cases.
- if an ENUM[64] is named, a ENUM forward representation (an ENUM
  with no values) is used.
- in all other cases, the type is added to the new split BTF.

Avoiding struct/union/enum/enum64 expansion is important to keep the
distilled base BTF representation to a minimum size.

When successful, new representations of the distilled base BTF and new
split BTF that refers to it are returned.  Both need to be freed by the
caller.

So to take a simple example, with split BTF with a type referring
to "struct sk_buff", we will generate distilled base BTF with a
FWD struct sk_buff, and the split BTF will refer to it instead.

Tools like pahole can utilize such split BTF to populate the .BTF
section (split BTF) and an additional .BTF.base section.  Then
when the split BTF is loaded, the distilled base BTF can be used
to relocate split BTF to reference the current (and possibly changed)
base BTF.

So for example if "struct sk_buff" was id 502 when the split BTF was
originally generated,  we can use the distilled base BTF to see that
id 502 refers to a "struct sk_buff" and replace instances of id 502
with the current (relocated) base BTF sk_buff type id.

Distilled base BTF is small; when building a kernel with all modules
using distilled base BTF as a test, ovreall module size grew by only
5.3Mb total across ~2700 modules.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/btf.h      |  20 ++
 tools/lib/bpf/libbpf.map |   1 +
 3 files changed, 424 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 2d0840ef599a..953929d196c3 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
 	return 0;
 }
 
-int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
 {
-	struct btf_pipe p = { .src = src_btf, .dst = btf };
 	struct btf_type *t;
 	int sz, err;
 
@@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
 		return libbpf_err(sz);
 
 	/* deconstruct BTF, if necessary, and invalidate raw_data */
-	if (btf_ensure_modifiable(btf))
+	if (btf_ensure_modifiable(p->dst))
 		return libbpf_err(-ENOMEM);
 
-	t = btf_add_type_mem(btf, sz);
+	t = btf_add_type_mem(p->dst, sz);
 	if (!t)
 		return libbpf_err(-ENOMEM);
 
 	memcpy(t, src_type, sz);
 
-	err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
+	err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
 	if (err)
 		return libbpf_err(err);
 
-	return btf_commit_type(btf, sz);
+	return btf_commit_type(p->dst, sz);
+}
+
+int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
+{
+	struct btf_pipe p = { .src = src_btf, .dst = btf };
+
+	return btf_add_type(&p, src_type);
 }
 
 static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
@@ -5212,3 +5218,394 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
 
 	return 0;
 }
+
+#define BTF_NEEDS_SIZE	(1 << 31)	/* flag set if either struct/union is
+					 * embedded - and thus size info must
+					 * be preserved - or if there are
+					 * multiple instances of the same
+					 * struct/union - where size can be
+					 * used to clarify which is wanted.
+					 */
+#define BTF_ID(id)		(id & ~BTF_NEEDS_SIZE)
+
+struct btf_distill {
+	struct btf_pipe pipe;
+	int *ids;
+	unsigned int split_start_id;
+	unsigned int split_start_str;
+	unsigned int diff_id;
+};
+
+/* Check if a member of a split BTF struct/union refers to a base BTF
+ * struct/union.  Members can be const/restrict/volatile/typedef
+ * reference types, but if a pointer is encountered, type is no longer
+ * considered embedded.
+ */
+static int btf_find_embedded_composite_type_ids(__u32 *id, void *ctx)
+{
+	struct btf_distill *dist = ctx;
+	const struct btf_type *t;
+	__u32 next_id = *id;
+
+	do {
+		if (next_id == 0)
+			return 0;
+		t = btf_type_by_id(dist->pipe.src, next_id);
+		switch (btf_kind(t)) {
+		case BTF_KIND_CONST:
+		case BTF_KIND_RESTRICT:
+		case BTF_KIND_VOLATILE:
+		case BTF_KIND_TYPEDEF:
+		case BTF_KIND_TYPE_TAG:
+			next_id = t->type;
+			break;
+		case BTF_KIND_ARRAY: {
+			struct btf_array *a = btf_array(t);
+
+			next_id = a->type;
+			break;
+		}
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			dist->ids[next_id] |= BTF_NEEDS_SIZE;
+			return 0;
+		default:
+			return 0;
+		}
+
+	} while (1);
+
+	return 0;
+}
+
+/* Check if composite type has a duplicate-named type; if it does, retain
+ * size information to help guide later relocation towards the correct type.
+ * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
+ * one is size 112, the other 16.  Having size information allows relocation
+ * to choose the right one.
+ */
+static int btf_mark_composite_dups(struct btf_distill *dist, __u32 id)
+{
+	__u8 *cnt = calloc(dist->split_start_str, sizeof(__u8));
+	struct btf_type *t;
+	int i;
+
+	if (!cnt)
+		return -ENOMEM;
+
+	/* First pass; collect name counts for composite types. */
+	for (i = 1; i < dist->split_start_id; i++) {
+		t = btf_type_by_id(dist->pipe.src, i);
+		if (!btf_is_composite(t) || !t->name_off)
+			continue;
+		if (cnt[t->name_off] < 255)
+			cnt[t->name_off]++;
+	}
+	/* Second pass; mark composite types with multiple instances of the
+	 * same name as needing size information.
+	 */
+	for (i = 1; i < dist->split_start_id; i++) {
+		/* id not needed or is already preserving size information */
+		if (!dist->ids[i] || (dist->ids[i] & BTF_NEEDS_SIZE))
+			continue;
+		t = btf_type_by_id(dist->pipe.src, i);
+		if (!btf_is_composite(t) || !t->name_off)
+			continue;
+		if (cnt[t->name_off] > 1)
+			dist->ids[i] |= BTF_NEEDS_SIZE;
+	}
+	free(cnt);
+
+	return 0;
+}
+
+static bool btf_is_eligible_named_fwd(const struct btf_type *t)
+{
+	return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
+}
+
+static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
+{
+	struct btf_distill *dist = ctx;
+	struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
+	int err;
+
+	if (!*id)
+		return 0;
+	/* split BTF id, not needed */
+	if (*id >= dist->split_start_id)
+		return 0;
+	/* already added ? */
+	if (BTF_ID(dist->ids[*id]) > 0)
+		return 0;
+
+	/* only a subset of base BTF types should be referenced from split
+	 * BTF; ensure nothing unexpected is referenced.
+	 */
+	switch (btf_kind(t)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_FLOAT:
+	case BTF_KIND_FWD:
+	case BTF_KIND_ARRAY:
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_ENUM:
+	case BTF_KIND_ENUM64:
+	case BTF_KIND_PTR:
+	case BTF_KIND_CONST:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_FUNC_PROTO:
+	case BTF_KIND_TYPE_TAG:
+		dist->ids[*id] |= *id;
+		break;
+	default:
+		pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
+			*id, btf_kind(t));
+		return -EINVAL;
+	}
+
+	/* struct/union members not needed, except for anonymous structs
+	 * and unions, which we need since name won't help us determine
+	 * matches; so if a named struct/union, no need to recurse
+	 * into members.
+	 */
+	if (btf_is_eligible_named_fwd(t))
+		return 0;
+
+	/* ensure references in type are added also. */
+	err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
+	if (err < 0)
+		return err;
+	return 0;
+}
+
+static int btf_add_distilled_types(struct btf_distill *dist)
+{
+	bool adding_to_base = dist->pipe.dst->start_id == 1;
+	int id = btf__type_cnt(dist->pipe.dst);
+	struct btf_type *t;
+	int i, err = 0;
+
+	/* Add types for each of the required references to either distilled
+	 * base or split BTF, depending on type characteristics.
+	 */
+	for (i = 1; i < dist->split_start_id; i++) {
+		const char *name;
+		int kind;
+
+		if (!BTF_ID(dist->ids[i]))
+			continue;
+		t = btf_type_by_id(dist->pipe.src, i);
+		kind = btf_kind(t);
+		name = btf__name_by_offset(dist->pipe.src, t->name_off);
+
+		/* Named int, float, fwd struct, union, enum[64] are added to
+		 * base; everything else is added to split BTF.
+		 */
+		switch (kind) {
+		case BTF_KIND_INT:
+		case BTF_KIND_FLOAT:
+		case BTF_KIND_FWD:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+		case BTF_KIND_ENUM:
+		case BTF_KIND_ENUM64:
+			if ((adding_to_base && !t->name_off) || (!adding_to_base && t->name_off))
+				continue;
+			break;
+		default:
+			if (adding_to_base)
+				continue;
+			break;
+		}
+		if (dist->ids[i] & BTF_NEEDS_SIZE) {
+			/* If a named struct/union in base BTF is referenced as a type
+			 * in split BTF without use of a pointer - i.e. as an embedded
+			 * struct/union - add an empty struct/union preserving size
+			 * since size must be consistent when relocating split and
+			 * possibly changed base BTF.  Similarly, when a struct/union
+			 * has multiple instances of the same name in the original
+			 * base BTF, retain size to help relocation later pick the
+			 * right struct/union.
+			 */
+			err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
+		} else if (btf_is_eligible_named_fwd(t)) {
+			/* If not embedded, use a fwd for named struct/unions since we
+			 * can match via name without any other details.
+			 */
+			switch (kind) {
+			case BTF_KIND_STRUCT:
+				err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_STRUCT);
+				break;
+			case BTF_KIND_UNION:
+				err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_UNION);
+				break;
+			case BTF_KIND_ENUM:
+				err = btf__add_enum(dist->pipe.dst, name, t->size);
+				break;
+			case BTF_KIND_ENUM64:
+				err = btf__add_enum(dist->pipe.dst, name, t->size);
+				break;
+			default:
+				pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
+					btf_kind(t));
+				return -EINVAL;
+			}
+		} else {
+			err = btf_add_type(&dist->pipe, t);
+		}
+		if (err < 0)
+			break;
+		dist->ids[i] = id++;
+	}
+	return err;
+}
+
+/* Split BTF ids without a mapping will be shifted downwards since distilled
+ * base BTF is smaller than the original base BTF.  For those that have a
+ * mapping (either to base or updated split BTF), update the id based on
+ * that mapping.
+ */
+static int btf_update_distilled_type_ids(__u32 *id, void *ctx)
+{
+	struct btf_distill *dist = ctx;
+
+	if (BTF_ID(dist->ids[*id]))
+		*id = BTF_ID(dist->ids[*id]);
+	else if (*id >= dist->split_start_id)
+		*id -= dist->diff_id;
+	return 0;
+}
+
+/* Create updated split BTF with distilled base BTF; distilled base BTF
+ * consists of BTF information required to clarify the types that split
+ * BTF refers to, omitting unneeded details.  Specifically it will contain
+ * base types and forward declarations of named structs, unions and enumerated
+ * types. Associated reference types like pointers, arrays and anonymous
+ * structs, unions and enumerated types will be added to split BTF.
+ *
+ * The only case where structs, unions or enumerated types are fully represented
+ * is when they are anonymous; in such cases, the anonymous type is added to
+ * split BTF in full.
+ *
+ * We return newly-created split BTF where the split BTF refers to a newly-created
+ * distilled base BTF. Both must be freed separately by the caller.
+ */
+int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
+		      struct btf **new_split_btf)
+{
+	struct btf *new_base = NULL, *new_split = NULL;
+	const struct btf *old_base;
+	unsigned int n = btf__type_cnt(src_btf);
+	struct btf_distill dist = {};
+	struct btf_type *t;
+	int i, err = 0;
+
+	/* src BTF must be split BTF. */
+	old_base = btf__base_btf(src_btf);
+	if (!new_base_btf || !new_split_btf || !old_base)
+		return libbpf_err(-EINVAL);
+
+	new_base = btf__new_empty();
+	if (!new_base)
+		return libbpf_err(-ENOMEM);
+	dist.ids = calloc(n, sizeof(*dist.ids));
+	if (!dist.ids) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	dist.pipe.src = src_btf;
+	dist.pipe.dst = new_base;
+	dist.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL);
+	if (IS_ERR(dist.pipe.str_off_map)) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+	dist.split_start_id = btf__type_cnt(old_base);
+	dist.split_start_str = old_base->hdr->str_len;
+
+	/* Pass over src split BTF; generate the list of base BTF
+	 * type ids it references; these will constitute our distilled
+	 * BTF set to be distributed over base and split BTF as appropriate.
+	 */
+	for (i = src_btf->start_id; i < n; i++) {
+		t = btf_type_by_id(src_btf, i);
+
+		/* check if members of struct/union in split BTF refer to base BTF
+		 * struct/union; if so, we will use an empty sized struct to represent
+		 * it rather than a FWD because its size must match on later BTF
+		 * relocation.
+		 */
+		if (btf_is_composite(t)) {
+			err = btf_type_visit_type_ids(t, btf_find_embedded_composite_type_ids,
+						      &dist);
+			if (err < 0)
+				goto err_out;
+		}
+		err = btf_type_visit_type_ids(t,  btf_add_distilled_type_ids, &dist);
+		if (err < 0)
+			goto err_out;
+	}
+	/* Next check the distilled type id list for any struct/unions that
+	 * have multiple instances of the same name in base BTF; size must be
+	 * preserved for those cases also to guide relocation matching.
+	 */
+	err = btf_mark_composite_dups(&dist, i);
+	if (err < 0)
+		goto err_out;
+
+	/* Next add types for each of the required references to base BTF and split BTF in turn. */
+	err = btf_add_distilled_types(&dist);
+	if (err < 0)
+		goto err_out;
+	/* now create new split BTF with distilled base BTF as its base; we end up with
+	 * split BTF that has base BTF that represents enough about its base references
+	 * to allow it to be relocated with the base BTF available.
+	 */
+	new_split = btf__new_empty_split(new_base);
+	if (!new_split_btf) {
+		err = -errno;
+		goto err_out;
+	}
+	dist.pipe.dst = new_split;
+	/* First add all split types */
+	for (i = src_btf->start_id; i < n; i++) {
+		t = btf_type_by_id(src_btf, i);
+		err = btf_add_type(&dist.pipe, t);
+		if (err < 0)
+			goto err_out;
+	}
+	/* Now add distilled types to split BTF that are not added to base. */
+	err = btf_add_distilled_types(&dist);
+	if (err < 0)
+		goto err_out;
+
+	/* all split BTF ids will be shifted downwards since there are less base BTF ids
+	 * in distilled base BTF.
+	 */
+	dist.diff_id = dist.split_start_id - btf__type_cnt(new_base);
+
+	n = btf__type_cnt(new_split);
+	/* Now update base/split BTF ids. */
+	for (i = 1; i < n; i++) {
+		t = btf_type_by_id(new_split, i);
+
+		err = btf_type_visit_type_ids(t, btf_update_distilled_type_ids, &dist);
+		if (err < 0)
+			goto err_out;
+	}
+	free(dist.ids);
+	hashmap__free(dist.pipe.str_off_map);
+	*new_base_btf = new_base;
+	*new_split_btf = new_split;
+	return 0;
+err_out:
+	free(dist.ids);
+	if (!IS_ERR(dist.pipe.str_off_map))
+		hashmap__free(dist.pipe.str_off_map);
+	btf__free(new_split);
+	btf__free(new_base);
+	return libbpf_err(err);
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e6880d91c84..f3f149a09088 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -107,6 +107,26 @@ LIBBPF_API struct btf *btf__new_empty(void);
  */
 LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf);
 
+/**
+ * @brief **btf__distill_base()** creates new versions of the split BTF
+ * *src_btf* and its base BTF. The new base BTF will only contain the types
+ * needed to improve robustness of the split BTF to small changes in base BTF.
+ * When that split BTF is loaded against a (possibly changed) base, this
+ * distilled base BTF will help update references to that (possibly changed)
+ * base BTF.
+ *
+ * Both the new split and its associated new base BTF must be freed by
+ * the caller.
+ *
+ * If successful, 0 is returned and **new_base_btf** and **new_split_btf**
+ * will point at new base/split BTF. Both the new split and its associated
+ * new base BTF must be freed by the caller.
+ *
+ * A negative value is returned on error.
+ */
+LIBBPF_API int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
+				 struct btf **new_split_btf);
+
 LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext);
 LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf);
 LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c1ce8aa3520b..9e69d6e2a512 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -419,6 +419,7 @@ LIBBPF_1.4.0 {
 
 LIBBPF_1.5.0 {
 	global:
+		btf__distill_base;
 		bpf_program__attach_sockmap;
 		ring__consume_n;
 		ring_buffer__consume_n;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 02/11] selftests/bpf: test distilled base, split BTF generation
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Test generation of split+distilled base BTF, ensuring that

- base BTF STRUCTs which are embedded in split BTF structs are
  represented as 0-member sized structs, allowing size checking
- base BTF UNION which has a duplicate-named UNION is represented
  as a 0-member sized union, helping clarify which union is referred
  to
- FWDs are used in place of full named struct/union declarations
- FWDs are used in place of full named enum declarations
- anonymous struct/unions are represented in full in split BTF
- anonymous enums are represented in full in split BTF
- types unreferenced from split BTF are not present in distilled
  base BTF

Also test that with vmlinux BTF and split BTF based upon it,
we only represent needed base types referenced from split BTF
in distilled base.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 .../selftests/bpf/prog_tests/btf_distill.c    | 279 ++++++++++++++++++
 1 file changed, 279 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
new file mode 100644
index 000000000000..4ab522df041c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
@@ -0,0 +1,279 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "btf_helpers.h"
+
+/* Fabricate base, split BTF with references to base types needed; then create
+ * split BTF with distilled base BTF and ensure expectations are met:
+ *  - only referenced base types from split BTF are present
+ *  - struct/union/enum are represented as FWDs unless anonymous, when they
+ *    are represented in full, or if embedded in a split BTF struct or duplicated
+ *    case they are represented by a STRUCT/UNION with specified size and vlen=0.
+ */
+static void test_distilled_base(void)
+{
+	struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL;
+
+	btf1 = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
+		return;
+
+	btf__add_int(btf1, "int", 4, BTF_INT_SIGNED);	/* [1] int */
+	btf__add_ptr(btf1, 1);				/* [2] ptr to int */
+	btf__add_struct(btf1, "s1", 8);			/* [3] struct s1 { */
+	btf__add_field(btf1, "f1", 2, 0, 0);		/*      int *f1; */
+							/* } */
+	btf__add_struct(btf1, "", 12);			/* [4] struct { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+	btf__add_field(btf1, "f2", 3, 32, 0);		/*	struct s1 f2; */
+							/* } */
+	btf__add_int(btf1, "unsigned int", 4, 0);	/* [5] unsigned int */
+	btf__add_union(btf1, "u1", 12);			/* [6] union u1 { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+	btf__add_field(btf1, "f2", 2, 0, 0);		/*	int *f2; */
+							/* } */
+	btf__add_union(btf1, "", 4);			/* [7] union { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	btf__add_enum(btf1, "e1", 4);			/* [8] enum e1 { */
+	btf__add_enum_value(btf1, "v1", 1);		/*	v1 = 1; */
+							/* } */
+	btf__add_enum(btf1, "", 4);			/* [9] enum { */
+	btf__add_enum_value(btf1, "av1", 2);		/*	av1 = 2; */
+							/* } */
+	btf__add_enum64(btf1, "e641", 8, true);		/* [10] enum64 { */
+	btf__add_enum64_value(btf1, "v1", 1024);	/*	v1 = 1024; */
+							/* } */
+	btf__add_enum64(btf1, "", 8, true);		/* [11] enum64 { */
+	btf__add_enum64_value(btf1, "v1", 1025);	/*	v1 = 1025; */
+							/* } */
+	btf__add_struct(btf1, "unneeded", 4);		/* [12] struct unneeded { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	btf__add_struct(btf1, "embedded", 4);		/* [13] struct embedded { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	btf__add_func_proto(btf1, 1);			/* [14] int (*)(int *p1); */
+	btf__add_func_param(btf1, "p1", 1);
+
+	btf__add_array(btf1, 1, 1, 3);			/* [15] int [3]; */
+
+	btf__add_struct(btf1, "from_proto", 4);		/* [16] struct from_proto { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	int f1; */
+							/* } */
+	btf__add_union(btf1, "u1", 4);			/* [17] union u1 { */
+	btf__add_field(btf1, "f1", 1, 0, 0);		/*	 int f1; */
+							/* } */
+	VALIDATE_RAW_BTF(
+		btf1,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[13] STRUCT 'embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=1",
+		"[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+		"[16] STRUCT 'from_proto' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[17] UNION 'u1' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0");
+
+	btf2 = btf__new_empty_split(btf1);
+	if (!ASSERT_OK_PTR(btf2, "empty_split_btf"))
+		goto cleanup;
+
+	btf__add_ptr(btf2, 3);				/* [18] ptr to struct s1 */
+	/* add ptr to struct anon */
+	btf__add_ptr(btf2, 4);				/* [19] ptr to struct (anon) */
+	btf__add_const(btf2, 6);			/* [20] const union u1 */
+	btf__add_restrict(btf2, 7);			/* [21] restrict union (anon) */
+	btf__add_volatile(btf2, 8);			/* [22] volatile enum e1 */
+	btf__add_typedef(btf2, "et", 9);		/* [23] typedef enum (anon) */
+	btf__add_const(btf2, 10);			/* [24] const enum64 e641 */
+	btf__add_ptr(btf2, 11);				/* [25] restrict enum64 (anon) */
+	btf__add_struct(btf2, "with_embedded", 4);	/* [26] struct with_embedded { */
+	btf__add_field(btf2, "f1", 13, 0, 0);		/*	struct embedded f1; */
+							/* } */
+	btf__add_func(btf2, "fn", BTF_FUNC_STATIC, 14);	/* [27] int fn(int p1); */
+	btf__add_typedef(btf2, "arraytype", 15);	/* [28] typedef int[3] foo; */
+	btf__add_func_proto(btf2, 1);			/* [29] int (*)(struct from proto p1); */
+	btf__add_func_param(btf2, "p1", 16);
+
+	VALIDATE_RAW_BTF(
+		btf2,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[13] STRUCT 'embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=1",
+		"[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+		"[16] STRUCT 'from_proto' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[17] UNION 'u1' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[18] PTR '(anon)' type_id=3",
+		"[19] PTR '(anon)' type_id=4",
+		"[20] CONST '(anon)' type_id=6",
+		"[21] RESTRICT '(anon)' type_id=7",
+		"[22] VOLATILE '(anon)' type_id=8",
+		"[23] TYPEDEF 'et' type_id=9",
+		"[24] CONST '(anon)' type_id=10",
+		"[25] PTR '(anon)' type_id=11",
+		"[26] STRUCT 'with_embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=13 bits_offset=0",
+		"[27] FUNC 'fn' type_id=14 linkage=static",
+		"[28] TYPEDEF 'arraytype' type_id=15",
+		"[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=16");
+
+	if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4),
+		       "distilled_base") ||
+	    !ASSERT_OK_PTR(btf3, "distilled_base") ||
+	    !ASSERT_OK_PTR(btf4, "distilled_split") ||
+	    !ASSERT_EQ(8, btf__type_cnt(btf3), "distilled_base_type_cnt"))
+		goto cleanup;
+
+	VALIDATE_RAW_BTF(
+		btf4,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] FWD 's1' fwd_kind=struct",
+		"[3] UNION 'u1' size=12 vlen=0",
+		"[4] ENUM 'e1' encoding=UNSIGNED size=4 vlen=0",
+		"[5] ENUM 'e641' encoding=UNSIGNED size=8 vlen=0",
+		"[6] STRUCT 'embedded' size=4 vlen=0",
+		"[7] FWD 'from_proto' fwd_kind=struct",
+		/* split BTF; these types should match split BTF above from 17-28, with
+		 * updated type id references
+		 */
+		"[8] PTR '(anon)' type_id=2",
+		"[9] PTR '(anon)' type_id=20",
+		"[10] CONST '(anon)' type_id=3",
+		"[11] RESTRICT '(anon)' type_id=21",
+		"[12] VOLATILE '(anon)' type_id=4",
+		"[13] TYPEDEF 'et' type_id=22",
+		"[14] CONST '(anon)' type_id=5",
+		"[15] PTR '(anon)' type_id=23",
+		"[16] STRUCT 'with_embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=6 bits_offset=0",
+		"[17] FUNC 'fn' type_id=24 linkage=static",
+		"[18] TYPEDEF 'arraytype' type_id=25",
+		"[19] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=7",
+		/* split BTF types added from original base BTF below */
+		"[20] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=32",
+		"[21] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[22] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[23] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[24] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=1",
+		"[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
+
+cleanup:
+	btf__free(btf4);
+	btf__free(btf3);
+	btf__free(btf2);
+	btf__free(btf1);
+}
+
+/* create split reference BTF from vmlinux + split BTF with a few type references;
+ * ensure the resultant split reference BTF is as expected, containing only types
+ * needed to disambiguate references from split BTF.
+ */
+static void test_distilled_base_vmlinux(void)
+{
+	struct btf *split_btf = NULL, *vmlinux_btf = btf__load_vmlinux_btf();
+	struct btf *split_dist = NULL, *base_dist = NULL;
+	__s32 int_id, sk_buff_id;
+
+	if (!ASSERT_OK_PTR(vmlinux_btf, "load_vmlinux"))
+		return;
+	int_id = btf__find_by_name_kind(vmlinux_btf, "int", BTF_KIND_INT);
+	if (!ASSERT_GT(int_id, 0, "find_int"))
+		goto cleanup;
+	sk_buff_id = btf__find_by_name_kind(vmlinux_btf, "sk_buff", BTF_KIND_STRUCT);
+	if (!ASSERT_GT(sk_buff_id, 0, "find_sk_buff_id"))
+		goto cleanup;
+	split_btf = btf__new_empty_split(vmlinux_btf);
+	if (!ASSERT_OK_PTR(split_btf, "new_split"))
+		goto cleanup;
+	btf__add_typedef(split_btf, "myint", int_id);
+	btf__add_ptr(split_btf, sk_buff_id);
+
+	if (!ASSERT_EQ(btf__distill_base(split_btf, &base_dist, &split_dist), 0,
+		       "distill_vmlinux_base"))
+		goto cleanup;
+
+	if (!ASSERT_OK_PTR(split_dist, "split_distilled") ||
+	    !ASSERT_OK_PTR(base_dist, "base_dist"))
+		goto cleanup;
+	VALIDATE_RAW_BTF(
+		split_dist,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] FWD 'sk_buff' fwd_kind=struct",
+		"[3] TYPEDEF 'myint' type_id=1",
+		"[4] PTR '(anon)' type_id=2");
+
+cleanup:
+	btf__free(split_dist);
+	btf__free(base_dist);
+	btf__free(split_btf);
+	btf__free(vmlinux_btf);
+}
+
+void test_btf_distill(void)
+{
+	if (test__start_subtest("distilled_base"))
+		test_distilled_base();
+	if (test__start_subtest("distilled_base_vmlinux"))
+		test_distilled_base_vmlinux();
+}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 02/11] selftests/bpf: test distilled base, split BTF generation Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-21 22:01   ` Andrii Nakryiko
  2024-05-17 10:22 ` [PATCH v4 bpf-next 04/11] bpftool: support displaying raw split BTF using base BTF section as base Alan Maguire
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Options cover existing parsing scenarios (ELF, raw, retrieving
.BTF.ext) and also allow specification of the ELF section name
containing BTF.  This will allow consumers to retrieve BTF from
.BTF.base sections (BTF_BASE_ELF_SEC) also.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/btf.c      | 49 +++++++++++++++++++++++++++-------------
 tools/lib/bpf/btf.h      | 31 +++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 65 insertions(+), 16 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 953929d196c3..feba071087a5 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1084,7 +1084,7 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf)
 	return libbpf_ptr(btf_new(data, size, base_btf));
 }
 
-static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
+static struct btf *btf_parse_elf(const char *path, const char *btf_sec, struct btf *base_btf,
 				 struct btf_ext **btf_ext)
 {
 	Elf_Data *btf_data = NULL, *btf_ext_data = NULL;
@@ -1146,7 +1146,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 				idx, path);
 			goto done;
 		}
-		if (strcmp(name, BTF_ELF_SEC) == 0) {
+		if (strcmp(name, btf_sec) == 0) {
 			btf_data = elf_getdata(scn, 0);
 			if (!btf_data) {
 				pr_warn("failed to get section(%d, %s) data from %s\n",
@@ -1166,7 +1166,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 	}
 
 	if (!btf_data) {
-		pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path);
+		pr_warn("failed to find '%s' ELF section in %s\n", btf_sec, path);
 		err = -ENODATA;
 		goto done;
 	}
@@ -1212,12 +1212,12 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf,
 
 struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext)
 {
-	return libbpf_ptr(btf_parse_elf(path, NULL, btf_ext));
+	return libbpf_ptr(btf_parse_elf(path, BTF_ELF_SEC, NULL, btf_ext));
 }
 
 struct btf *btf__parse_elf_split(const char *path, struct btf *base_btf)
 {
-	return libbpf_ptr(btf_parse_elf(path, base_btf, NULL));
+	return libbpf_ptr(btf_parse_elf(path, BTF_ELF_SEC, base_btf, NULL));
 }
 
 static struct btf *btf_parse_raw(const char *path, struct btf *base_btf)
@@ -1293,31 +1293,48 @@ struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf)
 	return libbpf_ptr(btf_parse_raw(path, base_btf));
 }
 
-static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext)
+struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts)
 {
-	struct btf *btf;
+	struct btf *btf, *base_btf;
+	const char *btf_sec;
+	struct btf_ext **btf_ext;
 	int err;
 
+	if (!OPTS_VALID(opts, btf_parse_opts))
+		return libbpf_err_ptr(-EINVAL);
+	base_btf = OPTS_GET(opts, base_btf, NULL);
+	btf_sec = OPTS_GET(opts, btf_sec, NULL);
+	btf_ext = OPTS_GET(opts, btf_ext, NULL);
+
 	if (btf_ext)
 		*btf_ext = NULL;
-
-	btf = btf_parse_raw(path, base_btf);
+	if (!btf_sec) {
+		btf = btf_parse_raw(path, base_btf);
+		err = libbpf_get_error(btf);
+		if (!err)
+			return btf;
+		if (err != -EPROTO)
+			return libbpf_err_ptr(err);
+	}
+	btf = btf_parse_elf(path, btf_sec ?: BTF_ELF_SEC, base_btf, btf_ext);
 	err = libbpf_get_error(btf);
-	if (!err)
-		return btf;
-	if (err != -EPROTO)
-		return ERR_PTR(err);
-	return btf_parse_elf(path, base_btf, btf_ext);
+	if (err)
+		return libbpf_err_ptr(err);
+	return btf;
 }
 
 struct btf *btf__parse(const char *path, struct btf_ext **btf_ext)
 {
-	return libbpf_ptr(btf_parse(path, NULL, btf_ext));
+	LIBBPF_OPTS(btf_parse_opts, opts, .btf_ext = btf_ext);
+
+	return btf__parse_opts(path, &opts);
 }
 
 struct btf *btf__parse_split(const char *path, struct btf *base_btf)
 {
-	return libbpf_ptr(btf_parse(path, base_btf, NULL));
+	LIBBPF_OPTS(btf_parse_opts, opts, .base_btf = base_btf);
+
+	return btf__parse_opts(path, &opts);
 }
 
 static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index f3f149a09088..8e1702ad5ef4 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -18,6 +18,7 @@ extern "C" {
 
 #define BTF_ELF_SEC ".BTF"
 #define BTF_EXT_ELF_SEC ".BTF.ext"
+#define BTF_BASE_ELF_SEC ".BTF.base"
 #define MAPS_ELF_SEC ".maps"
 
 struct btf;
@@ -134,6 +135,36 @@ LIBBPF_API struct btf *btf__parse_elf_split(const char *path, struct btf *base_b
 LIBBPF_API struct btf *btf__parse_raw(const char *path);
 LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_btf);
 
+struct btf_parse_opts {
+	size_t sz;
+	/* use base BTF to parse split BTF */
+	struct btf *base_btf;
+	/* retrieve optional .BTF.ext info */
+	struct btf_ext **btf_ext;
+	/* BTF section name; if NULL, try parsing raw BTF, falling back to parsing
+	 * .BTF ELF section if that fails also.  If set, parse named ELF section.
+	 */
+	const char *btf_sec;
+	size_t :0;
+};
+
+#define btf_parse_opts__last_field btf_sec
+
+/* @brief **btf__parse_opts()** parses BTF information from either a
+ * raw BTF file (*btf_sec* is NULL) or from the specified BTF section,
+ * also retrieving  .BTF.ext info if *btf_ext* is non-NULL.  If
+ * *base_btf* is specified, use it to parse split BTF from the
+ * specified location.
+ *
+ * @return new BTF object instance which has to be eventually freed with
+ * **btf__free()**
+ *
+ * On error, NULL is returned, with the `errno` variable set to the
+ * error code.
+ */
+
+LIBBPF_API struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts);
+
 LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
 LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 9e69d6e2a512..fd7bfeaba542 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -420,6 +420,7 @@ LIBBPF_1.4.0 {
 LIBBPF_1.5.0 {
 	global:
 		btf__distill_base;
+		btf__parse_opts;
 		bpf_program__attach_sockmap;
 		ring__consume_n;
 		ring_buffer__consume_n;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 04/11] bpftool: support displaying raw split BTF using base BTF section as base
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (2 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 05/11] resolve_btfids: use .BTF.base ELF section as base BTF if -B option is used Alan Maguire
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire, Quentin Monnet

If no base BTF can be found, fall back to checking for the .BTF.base
section and use it to display split BTF.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
---
 tools/bpf/bpftool/btf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index af047dedde38..67be216023e8 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -756,6 +756,14 @@ static int do_dump(int argc, char **argv)
 			base = get_vmlinux_btf_from_sysfs();
 
 		btf = btf__parse_split(*argv, base ?: base_btf);
+		/* Finally check for presence of base BTF section */
+		if (!btf && !base && !base_btf) {
+			LIBBPF_OPTS(btf_parse_opts, optp, .btf_sec = BTF_BASE_ELF_SEC);
+
+			base_btf = btf__parse_opts(*argv, &optp);
+			if (base_btf)
+				btf = btf__parse_split(*argv, base_btf);
+		}
 		if (!btf) {
 			err = -errno;
 			p_err("failed to load BTF from %s: %s",
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 05/11] resolve_btfids: use .BTF.base ELF section as base BTF if -B option is used
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (3 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 04/11] bpftool: support displaying raw split BTF using base BTF section as base Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 06/11] kbuild, bpf: add module-specific pahole/resolve_btfids flags for distilled base BTF Alan Maguire
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

When resolving BTF ids, use the BTF in the module .BTF.base section
when passed the -B option.  Both references to base BTF from split
BTF and BTF ids will be relocated for base vmlinux on module load.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/bpf/resolve_btfids/main.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
index d9520cb826b3..4f2ff52c3ea7 100644
--- a/tools/bpf/resolve_btfids/main.c
+++ b/tools/bpf/resolve_btfids/main.c
@@ -115,6 +115,7 @@ struct object {
 	const char *path;
 	const char *btf;
 	const char *base_btf_path;
+	int base;
 
 	struct {
 		int		 fd;
@@ -527,22 +528,37 @@ static int symbols_resolve(struct object *obj)
 	int nr_unions   = obj->nr_unions;
 	int nr_funcs    = obj->nr_funcs;
 	struct btf *base_btf = NULL;
-	int err, type_id;
+	int err = 0, type_id;
 	struct btf *btf;
 	__u32 nr_types;
 
 	if (obj->base_btf_path) {
-		base_btf = btf__parse(obj->base_btf_path, NULL);
-		err = libbpf_get_error(base_btf);
+		LIBBPF_OPTS(btf_parse_opts, optp);
+		const char *path;
+
+		if (obj->base) {
+			optp.btf_sec = BTF_BASE_ELF_SEC;
+			path = obj->path;
+			base_btf = btf__parse_opts(path, &optp);
+			/* fall back to normal base parsing if no BTF_BASE_ELF_SEC */
+		}
+		if (!base_btf) {
+			optp.btf_sec = BTF_ELF_SEC;
+			path = obj->base_btf_path;
+			base_btf = btf__parse_opts(path, &optp);
+		}
+		if (!base_btf)
+			err = -errno;
 		if (err) {
 			pr_err("FAILED: load base BTF from %s: %s\n",
-			       obj->base_btf_path, strerror(-err));
+			       path, strerror(-err));
 			return -1;
 		}
 	}
 
 	btf = btf__parse_split(obj->btf ?: obj->path, base_btf);
-	err = libbpf_get_error(btf);
+	if (!btf)
+		err = -errno;
 	if (err) {
 		pr_err("FAILED: load BTF from %s: %s\n",
 			obj->btf ?: obj->path, strerror(-err));
@@ -781,6 +797,8 @@ int main(int argc, const char **argv)
 			   "BTF data"),
 		OPT_STRING('b', "btf_base", &obj.base_btf_path, "file",
 			   "path of file providing base BTF"),
+		OPT_INCR('B', "base", &obj.base,
+			 "use " BTF_BASE_ELF_SEC " ELF section BTF as base"),
 		OPT_END()
 	};
 	int err = -1;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 06/11] kbuild, bpf: add module-specific pahole/resolve_btfids flags for distilled base BTF
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (4 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 05/11] resolve_btfids: use .BTF.base ELF section as base BTF if -B option is used Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation Alan Maguire
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Support creation of module BTF along with distilled base BTF;
the latter is stored in a .BTF.base ELF section and supplements
split BTF references to base BTF with information about base types,
allowing for later relocation of split BTF with a (possibly
changed) base.  resolve_btfids uses the "-B" option to specify
that the BTF.ids section should be populated with split BTF
relative to the added .BTF.base section rather than relative
to the vmlinux base.

Modules will be built with a distilled .BTF.base section for external
module build, i.e.

make -C. -M=path2/module

...while in-tree module build as part of a normal kernel build will
not generate distilled base BTF; this is because in-tree modules
change with the kernel and do not require BTF relocation for the
running vmlinux.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 scripts/Makefile.btf      | 7 +++++++
 scripts/Makefile.modfinal | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index bca8a8f26ea4..9f1e3163e718 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -21,8 +21,15 @@ else
 # Switch to using --btf_features for v1.26 and later.
 pahole-flags-$(call test-ge, $(pahole-ver), 126)  = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func
 
+ifneq ($(KBUILD_EXTMOD),)
+module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base
+module-resolve-btfids-flags-$(call test-ge, $(pahole-ver), 126) = -B
+endif
+
 endif
 
 pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE)		+= --lang_exclude=rust
 
 export PAHOLE_FLAGS := $(pahole-flags-y)
+export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
+export MODULE_RESOLVE_BTFIDS_FLAGS := $(module-resolve-btfids-flags-y)
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 79fcf2731686..86189ddf39e8 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -39,8 +39,8 @@ quiet_cmd_btf_ko = BTF [M] $@
 	if [ ! -f vmlinux ]; then					\
 		printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
 	else								\
-		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \
-		$(RESOLVE_BTFIDS) -b vmlinux $@; 			\
+		LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \
+		$(RESOLVE_BTFIDS) $(MODULE_RESOLVE_BTFIDS_FLAGS) -b vmlinux $@; 			\
 	fi;
 
 # Same as newer-prereqs, but allows to exclude specified extra dependencies
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (5 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 06/11] kbuild, bpf: add module-specific pahole/resolve_btfids flags for distilled base BTF Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-21 22:34   ` Andrii Nakryiko
  2024-05-23  1:06   ` Kui-Feng Lee
  2024-05-17 10:22 ` [PATCH v4 bpf-next 08/11] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Map distilled base BTF type ids referenced in split BTF and their
references to the base BTF passed in, and if the mapping succeeds,
reparent the split BTF to the base BTF.

Relocation is done by first verifying that distilled base BTF
only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
UNION kinds; then we sort these to speed lookups.  Once sorted,
the base BTF is iterated, and for each relevant kind we check
for an equivalent in distilled base BTF.  When found, the
mapping from distilled -> base BTF id and string offset is recorded.

Once all mappings are established, we can update type ids
and string offsets in split BTF and reparent it to the new base.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 tools/lib/bpf/Build             |   2 +-
 tools/lib/bpf/btf.c             |  17 ++
 tools/lib/bpf/btf.h             |   8 +
 tools/lib/bpf/btf_relocate.c    | 318 ++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map        |   1 +
 tools/lib/bpf/libbpf_internal.h |   3 +
 6 files changed, 348 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/bpf/btf_relocate.c

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index b6619199a706..336da6844d42 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o features.o
+	    usdt.o zip.o elf.o features.o btf_relocate.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index feba071087a5..a4d6c46cc251 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5626,3 +5626,20 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf,
 	btf__free(new_base);
 	return libbpf_err(err);
 }
+
+const struct btf_header *btf_header(const struct btf *btf)
+{
+	return btf->hdr;
+}
+
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf)
+{
+	btf->base_btf = (struct btf *)base_btf;
+	btf->start_id = btf__type_cnt(base_btf);
+	btf->start_str_off = base_btf->hdr->str_len;
+}
+
+int btf__relocate(struct btf *btf, const struct btf *base_btf)
+{
+	return btf_relocate(btf, base_btf, NULL);
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 8e1702ad5ef4..f75db650e426 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -282,6 +282,14 @@ struct btf_dedup_opts {
 
 LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
 
+/**
+ * @brief **btf__relocate()** will check the split BTF *btf* for references
+ * to base BTF kinds, and verify those references are compatible with
+ * *base_btf*; if they are, *btf* is adjusted such that is re-parented to
+ * *base_btf* and type ids and strings are adjusted to accommodate this.
+ */
+LIBBPF_API int btf__relocate(struct btf *btf, const struct btf *base_btf);
+
 struct btf_dump;
 
 struct btf_dump_opts {
diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
new file mode 100644
index 000000000000..c06851f05472
--- /dev/null
+++ b/tools/lib/bpf/btf_relocate.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
+#include "btf.h"
+#include "bpf.h"
+#include "libbpf.h"
+#include "libbpf_internal.h"
+
+struct btf;
+
+struct btf_relocate {
+	__u32 search_id;				/* must be first field; see search below */
+	struct btf *btf;
+	const struct btf *base_btf;
+	const struct btf *dist_base_btf;
+	unsigned int nr_base_types;
+	unsigned int nr_split_types;
+	unsigned int nr_dist_base_types;
+	int str_start;
+	int str_diff;
+	__u32 *map;
+	__u32 *str_map;
+	__u32 *dist_base_index;
+};
+
+static int btf_relocate_rewrite_type_id(__u32 *id, void *ctx)
+{
+	struct btf_relocate *r = ctx;
+
+	*id = r->map[*id];
+	return 0;
+}
+
+/* Simple string comparison used for sorting within BTF, since all distilled types are
+ * named.
+ */
+static int cmp_btf_types(const void *id1, const void *id2, void *priv)
+{
+	const struct btf *btf = priv;
+	const struct btf_type *t1 = btf_type_by_id(btf, *(__u32 *)id1);
+	const struct btf_type *t2 = btf_type_by_id(btf, *(__u32 *)id2);
+
+	return strcmp(btf__name_by_offset(btf, t1->name_off),
+		      btf__name_by_offset(btf, t2->name_off));
+}
+
+/* Comparison between base BTF type (search type) and distilled base types (target).
+ * Because there is no bsearch_r() we need to use the search key - which also is
+ * the first element of struct btf_relocate * - as a means to retrieve the
+ * struct btf_relocate *.
+ */
+static int cmp_base_and_distilled_btf_types(const void *idbase, const void *iddist)
+{
+	struct btf_relocate *r = (struct btf_relocate *)idbase;
+	const struct btf_type *tbase = btf_type_by_id(r->base_btf, *(__u32 *)idbase);
+	const struct btf_type *tdist = btf_type_by_id(r->dist_base_btf, *(__u32 *)iddist);
+
+	return strcmp(btf__name_by_offset(r->base_btf, tbase->name_off),
+		      btf__name_by_offset(r->dist_base_btf, tdist->name_off));
+}
+
+/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
+ * through base BTF looking up distilled type (using binary search) equivalents.
+ */
+static int btf_relocate_map_distilled_base(struct btf_relocate *r)
+{
+	struct btf_type *t;
+	const char *name;
+	__u32 id;
+
+	/* generate a sort index array of type ids sorted by name for distilled
+	 * base BTF to speed lookups.
+	 */
+	for (id = 1; id < r->nr_dist_base_types; id++)
+		r->dist_base_index[id] = id;
+	qsort_r(r->dist_base_index, r->nr_dist_base_types, sizeof(__u32), cmp_btf_types,
+		(struct btf *)r->dist_base_btf);
+
+	for (id = 1; id < r->nr_base_types; id++) {
+		struct btf_type *dist_t;
+		int dist_kind, kind;
+		bool compat_kind;
+		__u32 *dist_id;
+
+		t = btf_type_by_id(r->base_btf, id);
+		kind = btf_kind(t);
+		/* distilled base consists of named types only. */
+		if (!t->name_off)
+			continue;
+		switch (kind) {
+		case BTF_KIND_INT:
+		case BTF_KIND_FLOAT:
+		case BTF_KIND_ENUM:
+		case BTF_KIND_ENUM64:
+		case BTF_KIND_FWD:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			break;
+		default:
+			continue;
+		}
+		r->search_id = id;
+		dist_id = bsearch(&r->search_id, r->dist_base_index, r->nr_dist_base_types,
+				  sizeof(__u32), cmp_base_and_distilled_btf_types);
+		if (!dist_id)
+			continue;
+		if (!*dist_id || *dist_id > r->nr_dist_base_types) {
+			pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
+				id, *dist_id);
+			return -EINVAL;
+		}
+		/* validate that kinds are compatible */
+		dist_t = btf_type_by_id(r->dist_base_btf, *dist_id);
+		dist_kind = btf_kind(dist_t);
+		name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
+		compat_kind = dist_kind == kind;
+		if (!compat_kind) {
+			switch (dist_kind) {
+			case BTF_KIND_FWD:
+				compat_kind = kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+				break;
+			case BTF_KIND_ENUM:
+				compat_kind = kind == BTF_KIND_ENUM64;
+				break;
+			default:
+				break;
+			}
+			if (!compat_kind) {
+				pr_warn("kind incompatibility (%d != %d) between distilled base type '%s'[%d] and base type [%d]\n",
+					dist_kind, kind, name, *dist_id, id);
+				return -EINVAL;
+			}
+		}
+		/* validate that int, float struct, union sizes are compatible;
+		 * distilled base BTF encodes an empty STRUCT/UNION with
+		 * specific size for cases where a type is embedded in a split
+		 * type (so has to preserve size info).  Do not error out
+		 * on mismatch as another size match may occur for an
+		 * identically-named type.
+		 */
+		switch (btf_kind(dist_t)) {
+		case BTF_KIND_INT:
+			if (*(__u32 *)(t + 1) != *(__u32 *)(dist_t + 1))
+				continue;
+			if (t->size != dist_t->size)
+				continue;
+			break;
+		case BTF_KIND_FLOAT:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+			if (t->size != dist_t->size)
+				continue;
+			break;
+		default:
+			break;
+		}
+		/* map id and name */
+		r->map[*dist_id] = id;
+		r->str_map[dist_t->name_off] = t->name_off;
+	}
+	/* ensure all distilled BTF ids have a mapping... */
+	for (id = 1; id < r->nr_dist_base_types; id++) {
+		if (r->map[id])
+			continue;
+		t = btf_type_by_id(r->dist_base_btf, id);
+		name = btf__name_by_offset(r->dist_base_btf, t->name_off);
+		pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
+			name, id);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/* distilled base should only have named int/float/enum/fwd/struct/union types. */
+static int btf_relocate_validate_distilled_base(struct btf_relocate *r)
+{
+	unsigned int i;
+
+	for (i = 1; i < r->nr_dist_base_types; i++) {
+		struct btf_type *t = btf_type_by_id(r->dist_base_btf, i);
+		int kind = btf_kind(t);
+
+		switch (kind) {
+		case BTF_KIND_INT:
+		case BTF_KIND_FLOAT:
+		case BTF_KIND_ENUM:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+		case BTF_KIND_FWD:
+			if (t->name_off)
+				break;
+			pr_warn("type [%d], kind [%d] is invalid for distilled base BTF; it is anonymous\n",
+				i, kind);
+			return -EINVAL;
+		default:
+			pr_warn("type [%d] in distilled based BTF has unexpected kind [%d]\n",
+				i, kind);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int btf_rewrite_strs(__u32 *str_off, void *ctx)
+{
+	struct btf_relocate *r = ctx;
+	int off;
+
+	if (!*str_off)
+		return 0;
+	if (*str_off >= r->str_start) {
+		*str_off += r->str_diff;
+	} else {
+		off = r->str_map[*str_off];
+		if (!off) {
+			pr_warn("string '%s' [offset %d] is not mapped to base BTF",
+				btf__str_by_offset(r->btf, off), *str_off);
+			return -ENOENT;
+		}
+		*str_off = off;
+	}
+	return 0;
+}
+
+static int btf_relocate_finalize(struct btf_relocate *r)
+{
+	const struct btf_header *dist_base_hdr;
+	const struct btf_header *base_hdr;
+	struct btf_type *t;
+	int i, err;
+
+	dist_base_hdr = btf_header(r->dist_base_btf);
+	base_hdr = btf_header(r->base_btf);
+	r->str_start = dist_base_hdr->str_len;
+	r->str_diff = base_hdr->str_len - dist_base_hdr->str_len;
+	for (i = 0; i < r->nr_split_types; i++) {
+		t = btf_type_by_id(r->btf, i + r->nr_dist_base_types);
+		err = btf_type_visit_str_offs(t, btf_rewrite_strs, r);
+		if (err)
+			break;
+	}
+	btf_set_base_btf(r->btf, r->base_btf);
+
+	return err;
+}
+
+/* If successful, output of relocation is updated BTF with base BTF pointing
+ * at base_btf, and type ids, strings adjusted accordingly
+ */
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids)
+{
+	unsigned int nr_types = btf__type_cnt(btf);
+	struct btf_relocate r = {};
+	struct btf_type *t;
+	int diff_id, err = 0;
+	__u32 id, i;
+
+	r.dist_base_btf = btf__base_btf(btf);
+	if (!base_btf || r.dist_base_btf == base_btf)
+		return 0;
+
+	r.nr_dist_base_types = btf__type_cnt(r.dist_base_btf);
+	r.nr_base_types = btf__type_cnt(base_btf);
+	r.nr_split_types = nr_types - r.nr_dist_base_types;
+	r.btf = btf;
+	r.base_btf = base_btf;
+
+	r.map = calloc(nr_types, sizeof(*r.map));
+	r.str_map = calloc(btf_header(r.dist_base_btf)->str_len, sizeof(*r.str_map));
+	r.dist_base_index = calloc(r.nr_dist_base_types, sizeof(*r.dist_base_index));
+	if (!r.map || !r.str_map || !r.dist_base_index) {
+		err = -ENOMEM;
+		goto err_out;
+	}
+
+	err = btf_relocate_validate_distilled_base(&r);
+	if (err)
+		goto err_out;
+
+	diff_id = r.nr_base_types - r.nr_dist_base_types;
+	/* Split BTF ids will start from after last base BTF id. */
+	for (id = r.nr_dist_base_types; id < nr_types; id++)
+		r.map[id] = id + diff_id;
+
+	/* Build a map from distilled base ids to actual base BTF ids; it is used
+	 * to update split BTF id references.
+	 */
+	err = btf_relocate_map_distilled_base(&r);
+	if (err)
+		goto err_out;
+
+	/* Next, rewrite type ids in split BTF, replacing split ids with updated
+	 * ids based on number of types in base BTF, and base ids with
+	 * relocated ids from base_btf.
+	 */
+	for (i = 0, id = r.nr_dist_base_types; i < r.nr_split_types; i++, id++) {
+		t = btf_type_by_id(btf, id);
+		err = btf_type_visit_type_ids(t, btf_relocate_rewrite_type_id, &r);
+		if (err)
+			goto err_out;
+	}
+	/* Finally reset base BTF to base_btf; as part of this operation, string
+	 * offsets are also updated, and we are done.
+	 */
+	err = btf_relocate_finalize(&r);
+err_out:
+	if (!err && map_ids)
+		*map_ids = r.map;
+	else
+		free(r.map);
+	free(r.str_map);
+	free(r.dist_base_index);
+	return err;
+}
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index fd7bfeaba542..849cbe0def00 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -421,6 +421,7 @@ LIBBPF_1.5.0 {
 	global:
 		btf__distill_base;
 		btf__parse_opts;
+		btf__relocate;
 		bpf_program__attach_sockmap;
 		ring__consume_n;
 		ring_buffer__consume_n;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index a0dcfb82e455..3b98edae254f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -234,6 +234,9 @@ struct btf_type;
 struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id);
 const char *btf_kind_str(const struct btf_type *t);
 const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
+const struct btf_header *btf_header(const struct btf *btf);
+void btf_set_base_btf(struct btf *btf, const struct btf *base_btf);
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
 
 static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t)
 {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 08/11] selftests/bpf: extend distilled BTF tests to cover BTF relocation
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (6 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 09/11] module, bpf: store BTF base pointer in struct module Alan Maguire
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Ensure relocated BTF looks as expected; in this case identical to
original split BTF, with a few duplicate anonymous types added to
split BTF by the relocation process.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
---
 .../selftests/bpf/prog_tests/btf_distill.c    | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
index 4ab522df041c..442627719edc 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
@@ -218,6 +218,73 @@ static void test_distilled_base(void)
 		"\t'p1' type_id=1",
 		"[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
 
+	if (!ASSERT_EQ(btf__relocate(btf4, btf1), 0, "relocate_split"))
+		goto cleanup;
+
+	VALIDATE_RAW_BTF(
+		btf4,
+		"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+		"[2] PTR '(anon)' type_id=1",
+		"[3] STRUCT 's1' size=8 vlen=1\n"
+		"\t'f1' type_id=2 bits_offset=0",
+		"[4] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)",
+		"[6] UNION 'u1' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=2 bits_offset=0",
+		"[7] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'v1' val=1",
+		"[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1024",
+		"[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[12] STRUCT 'unneeded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[13] STRUCT 'embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=1",
+		"[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3",
+		"[16] STRUCT 'from_proto' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[17] UNION 'u1' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[18] PTR '(anon)' type_id=3",
+		"[19] PTR '(anon)' type_id=30",
+		"[20] CONST '(anon)' type_id=6",
+		"[21] RESTRICT '(anon)' type_id=31",
+		"[22] VOLATILE '(anon)' type_id=8",
+		"[23] TYPEDEF 'et' type_id=32",
+		"[24] CONST '(anon)' type_id=10",
+		"[25] PTR '(anon)' type_id=33",
+		"[26] STRUCT 'with_embedded' size=4 vlen=1\n"
+		"\t'f1' type_id=13 bits_offset=0",
+		"[27] FUNC 'fn' type_id=34 linkage=static",
+		"[28] TYPEDEF 'arraytype' type_id=35",
+		"[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=16",
+		/* below here are (duplicate) anon base types added by distill
+		 * process to split BTF.
+		 */
+		"[30] STRUCT '(anon)' size=12 vlen=2\n"
+		"\t'f1' type_id=1 bits_offset=0\n"
+		"\t'f2' type_id=3 bits_offset=32",
+		"[31] UNION '(anon)' size=4 vlen=1\n"
+		"\t'f1' type_id=1 bits_offset=0",
+		"[32] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n"
+		"\t'av1' val=2",
+		"[33] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n"
+		"\t'v1' val=1025",
+		"[34] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+		"\t'p1' type_id=1",
+		"[35] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3");
+
 cleanup:
 	btf__free(btf4);
 	btf__free(btf3);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 09/11] module, bpf: store BTF base pointer in struct module
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (7 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 08/11] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-17 10:22 ` [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

...as this will allow split BTF modules with a base BTF
representation (rather than the full vmlinux BTF at time of
BTF encoding) to resolve their references to kernel types in a
way that is more resilient to small changes in kernel types.

This will allow modules that are not built every time the kernel
is to provide more resilient BTF, rather than have it invalidated
every time BTF ids for core kernel types change.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/linux/module.h | 2 ++
 kernel/module/main.c   | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/module.h b/include/linux/module.h
index 1153b0d99a80..f127a79a95d9 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -510,6 +510,8 @@ struct module {
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	unsigned int btf_data_size;
 	void *btf_data;
+	unsigned int btf_base_data_size;
+	void *btf_base_data;
 #endif
 #ifdef CONFIG_JUMP_LABEL
 	struct jump_entry *jump_entries;
diff --git a/kernel/module/main.c b/kernel/module/main.c
index e1e8a7a9d6c1..e18683abec07 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2148,6 +2148,8 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 #endif
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 	mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size);
+	mod->btf_base_data = any_section_objs(info, ".BTF.base", 1,
+					      &mod->btf_base_data_size);
 #endif
 #ifdef CONFIG_JUMP_LABEL
 	mod->jump_entries = section_objs(info, "__jump_table",
@@ -2587,8 +2589,9 @@ static noinline int do_init_module(struct module *mod)
 	}
 
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
-	/* .BTF is not SHF_ALLOC and will get removed, so sanitize pointer */
+	/* .BTF is not SHF_ALLOC and will get removed, so sanitize pointers */
 	mod->btf_data = NULL;
+	mod->btf_base_data = NULL;
 #endif
 	/*
 	 * We want to free module_init, but be aware that kallsyms may be
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (8 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 09/11] module, bpf: store BTF base pointer in struct module Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-21 22:59   ` Andrii Nakryiko
  2024-05-17 10:22 ` [PATCH v4 bpf-next 11/11] bpftool: support displaying relocated-with-base split BTF Alan Maguire
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire

Share relocation implementation with the kernel.  As part of this,
we also need the type/string visitation functions so add them to a
btf_common.c file that also gets shared with the kernel. Relocation
code in kernel and userspace is identical save for the impementation
of the reparenting of split BTF to the relocated base BTF and
retrieval of BTF header from "struct btf"; these small functions
need separate user-space and kernel implementations.

One other wrinkle on the kernel side is we have to map .BTF.ids in
modules as they were generated with the type ids used at BTF encoding
time. btf_relocate() optionally returns an array mapping from old BTF
ids to relocated ids, so we use that to fix up these references where
needed for kfuncs.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
 include/linux/btf.h          |  45 ++++++++++
 kernel/bpf/Makefile          |   8 ++
 kernel/bpf/btf.c             | 166 +++++++++++++++++++++++++----------
 tools/lib/bpf/Build          |   2 +-
 tools/lib/bpf/btf.c          | 130 ---------------------------
 tools/lib/bpf/btf_common.c   | 143 ++++++++++++++++++++++++++++++
 tools/lib/bpf/btf_relocate.c |  23 +++++
 7 files changed, 341 insertions(+), 176 deletions(-)
 create mode 100644 tools/lib/bpf/btf_common.c

diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9e56fd12a9f..69aa760f5972 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -140,6 +140,8 @@ extern const struct file_operations btf_fops;
 const char *btf_get_name(const struct btf *btf);
 void btf_get(struct btf *btf);
 void btf_put(struct btf *btf);
+const struct btf_header *btf_header(const struct btf *btf);
+void btf_set_base_btf(struct btf *btf, struct btf *base_btf);
 int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
 struct btf *btf_get_by_fd(int fd);
 int btf_get_info_by_fd(const struct btf *btf,
@@ -214,6 +216,7 @@ bool btf_is_kernel(const struct btf *btf);
 bool btf_is_module(const struct btf *btf);
 struct module *btf_try_get_module(const struct btf *btf);
 u32 btf_nr_types(const struct btf *btf);
+struct btf *btf_base_btf(const struct btf *btf);
 bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s,
 			   const struct btf_member *m,
 			   u32 expected_offset, u32 expected_size);
@@ -339,6 +342,11 @@ static inline u8 btf_int_offset(const struct btf_type *t)
 	return BTF_INT_OFFSET(*(u32 *)(t + 1));
 }
 
+static inline __u8 btf_int_bits(const struct btf_type *t)
+{
+	return BTF_INT_BITS(*(__u32 *)(t + 1));
+}
+
 static inline bool btf_type_is_scalar(const struct btf_type *t)
 {
 	return btf_type_is_int(t) || btf_type_is_enum(t);
@@ -478,6 +486,11 @@ static inline struct btf_param *btf_params(const struct btf_type *t)
 	return (struct btf_param *)(t + 1);
 }
 
+static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
+{
+	return (struct btf_decl_tag *)(t + 1);
+}
+
 static inline int btf_id_cmp_func(const void *a, const void *b)
 {
 	const int *pa = a, *pb = b;
@@ -515,9 +528,17 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *
 }
 #endif
 
+typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx);
+typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx);
+
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
+void btf_set_base_btf(struct btf *btf, struct btf *base_btf);
+int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
+const char *btf_str_by_offset(const struct btf *btf, u32 offset);
 struct btf *btf_parse_vmlinux(void);
 struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
 u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
@@ -543,6 +564,30 @@ static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 {
 	return NULL;
 }
+
+static inline void btf_set_base_btf(struct btf *btf, struct btf *base_btf)
+{
+	return;
+}
+
+static inline int btf_relocate(void *log, struct btf *btf, const struct btf *base_btf,
+			       __u32 **map_ids)
+{
+	return 0;
+}
+
+static inline int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit,
+					  void *ctx)
+{
+	return 0;
+}
+
+static inline int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit,
+					  void *ctx)
+{
+	return 0;
+}
+
 static inline const char *btf_name_by_offset(const struct btf *btf,
 					     u32 offset)
 {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7eb9ad3a3ae6..612eef1228ca 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -52,3 +52,11 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
 obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
 $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_common.o
+$(obj)/btf_common.o: $(srctree)/tools/lib/bpf/btf_common.c FORCE
+	$(call if_changed_rule,cc_o_c)
+
+obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
+$(obj)/btf_relocate.o: $(srctree)/tools/lib/bpf/btf_relocate.c FORCE
+	$(call if_changed_rule,cc_o_c)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 821063660d9f..ebc127da4d79 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -274,6 +274,7 @@ struct btf {
 	u32 start_str_off; /* first string offset (0 for base BTF) */
 	char name[MODULE_NAME_LEN];
 	bool kernel_btf;
+	__u32 *base_map; /* map from distilled base BTF -> vmlinux BTF ids */
 };
 
 enum verifier_phase {
@@ -530,6 +531,11 @@ static bool btf_type_is_decl_tag_target(const struct btf_type *t)
 	       btf_type_is_var(t) || btf_type_is_typedef(t);
 }
 
+static bool btf_is_vmlinux(const struct btf *btf)
+{
+	return btf->kernel_btf && !btf->base_btf;
+}
+
 u32 btf_nr_types(const struct btf *btf)
 {
 	u32 total = 0;
@@ -772,7 +778,7 @@ static bool __btf_name_char_ok(char c, bool first)
 	return true;
 }
 
-static const char *btf_str_by_offset(const struct btf *btf, u32 offset)
+const char *btf_str_by_offset(const struct btf *btf, u32 offset)
 {
 	while (offset < btf->start_str_off)
 		btf = btf->base_btf;
@@ -1735,7 +1741,12 @@ static void btf_free(struct btf *btf)
 	kvfree(btf->types);
 	kvfree(btf->resolved_sizes);
 	kvfree(btf->resolved_ids);
-	kvfree(btf->data);
+	/* vmlinux does not allocate btf->data, it simply points it at
+	 * __start_BTF.
+	 */
+	if (!btf_is_vmlinux(btf))
+		kvfree(btf->data);
+	kvfree(btf->base_map);
 	kfree(btf);
 }
 
@@ -1764,6 +1775,23 @@ void btf_put(struct btf *btf)
 	}
 }
 
+struct btf *btf_base_btf(const struct btf *btf)
+{
+	return btf->base_btf;
+}
+
+const struct btf_header *btf_header(const struct btf *btf)
+{
+	return &btf->hdr;
+}
+
+void btf_set_base_btf(struct btf *btf, struct btf *base_btf)
+{
+	btf->base_btf = base_btf;
+	btf->start_id = btf_nr_types(base_btf);
+	btf->start_str_off = base_btf->hdr.str_len;
+}
+
 static int env_resolve_init(struct btf_verifier_env *env)
 {
 	struct btf *btf = env->btf;
@@ -5982,23 +6010,15 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
 BTF_ID_LIST(bpf_ctx_convert_btf_id)
 BTF_ID(struct, bpf_ctx_convert)
 
-struct btf *btf_parse_vmlinux(void)
+static struct btf *btf_parse_base(struct btf_verifier_env *env, const char *name,
+				  void *data, unsigned int data_size)
 {
-	struct btf_verifier_env *env = NULL;
-	struct bpf_verifier_log *log;
 	struct btf *btf = NULL;
 	int err;
 
 	if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF))
 		return ERR_PTR(-ENOENT);
 
-	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
-	if (!env)
-		return ERR_PTR(-ENOMEM);
-
-	log = &env->log;
-	log->level = BPF_LOG_KERNEL;
-
 	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
 	if (!btf) {
 		err = -ENOMEM;
@@ -6006,10 +6026,10 @@ struct btf *btf_parse_vmlinux(void)
 	}
 	env->btf = btf;
 
-	btf->data = __start_BTF;
-	btf->data_size = __stop_BTF - __start_BTF;
+	btf->data = data;
+	btf->data_size = data_size;
 	btf->kernel_btf = true;
-	snprintf(btf->name, sizeof(btf->name), "vmlinux");
+	snprintf(btf->name, sizeof(btf->name), "%s", name);
 
 	err = btf_parse_hdr(env);
 	if (err)
@@ -6029,20 +6049,11 @@ struct btf *btf_parse_vmlinux(void)
 	if (err)
 		goto errout;
 
-	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
-	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
-
 	refcount_set(&btf->refcnt, 1);
 
-	err = btf_alloc_id(btf);
-	if (err)
-		goto errout;
-
-	btf_verifier_env_free(env);
 	return btf;
 
 errout:
-	btf_verifier_env_free(env);
 	if (btf) {
 		kvfree(btf->types);
 		kfree(btf);
@@ -6050,19 +6061,59 @@ struct btf *btf_parse_vmlinux(void)
 	return ERR_PTR(err);
 }
 
+struct btf *btf_parse_vmlinux(void)
+{
+	struct btf_verifier_env *env = NULL;
+	struct bpf_verifier_log *log;
+	struct btf *btf;
+	int err;
+
+	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+	if (!env)
+		return ERR_PTR(-ENOMEM);
+
+	log = &env->log;
+	log->level = BPF_LOG_KERNEL;
+	btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF);
+	if (!IS_ERR(btf)) {
+		/* btf_parse_vmlinux() runs under bpf_verifier_lock */
+		bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
+		err = btf_alloc_id(btf);
+		if (err) {
+			btf_free(btf);
+			btf = ERR_PTR(err);
+		}
+	}
+	btf_verifier_env_free(env);
+	return btf;
+}
+
 #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
 
-static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
+/* If .BTF_ids section was created with distilled base BTF, both base and
+ * split BTF ids will need to be mapped to actual base/split ids for
+ * BTF now that it has been relocated.
+ */
+static __u32 btf_id_map(const struct btf *btf, __u32 id)
 {
+	if (!btf->base_btf || !btf->base_map)
+		return id;
+	return btf->base_map[id];
+}
+
+static struct btf *btf_parse_module(const char *module_name, const void *data,
+				    unsigned int data_size, void *base_data,
+				    unsigned int base_data_size)
+{
+	struct btf *btf = NULL, *vmlinux_btf, *base_btf = NULL;
 	struct btf_verifier_env *env = NULL;
 	struct bpf_verifier_log *log;
-	struct btf *btf = NULL, *base_btf;
-	int err;
+	int err = 0;
 
-	base_btf = bpf_get_btf_vmlinux();
-	if (IS_ERR(base_btf))
-		return base_btf;
-	if (!base_btf)
+	vmlinux_btf = bpf_get_btf_vmlinux();
+	if (IS_ERR(vmlinux_btf))
+		return vmlinux_btf;
+	if (!vmlinux_btf)
 		return ERR_PTR(-EINVAL);
 
 	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
@@ -6072,6 +6123,16 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	log = &env->log;
 	log->level = BPF_LOG_KERNEL;
 
+	if (base_data) {
+		base_btf = btf_parse_base(env, ".BTF.base", base_data, base_data_size);
+		if (IS_ERR(base_btf)) {
+			err = PTR_ERR(base_btf);
+			goto errout;
+		}
+	} else {
+		base_btf = vmlinux_btf;
+	}
+
 	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
 	if (!btf) {
 		err = -ENOMEM;
@@ -6111,12 +6172,22 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u
 	if (err)
 		goto errout;
 
+	if (base_btf != vmlinux_btf) {
+		err = btf_relocate(btf, vmlinux_btf, &btf->base_map);
+		if (err)
+			goto errout;
+		btf_free(base_btf);
+		base_btf = vmlinux_btf;
+	}
+
 	btf_verifier_env_free(env);
 	refcount_set(&btf->refcnt, 1);
 	return btf;
 
 errout:
 	btf_verifier_env_free(env);
+	if (base_btf != vmlinux_btf)
+		btf_free(base_btf);
 	if (btf) {
 		kvfree(btf->data);
 		kvfree(btf->types);
@@ -7669,7 +7740,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
 			err = -ENOMEM;
 			goto out;
 		}
-		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size);
+		btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
+				       mod->btf_base_data, mod->btf_base_data_size);
 		if (IS_ERR(btf)) {
 			kfree(btf_mod);
 			if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
@@ -7993,7 +8065,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 	bool add_filter = !!kset->filter;
 	struct btf_kfunc_set_tab *tab;
 	struct btf_id_set8 *set;
-	u32 set_cnt;
+	u32 set_cnt, i;
 	int ret;
 
 	if (hook >= BTF_KFUNC_HOOK_MAX) {
@@ -8039,21 +8111,15 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 		goto end;
 	}
 
-	/* We don't need to allocate, concatenate, and sort module sets, because
-	 * only one is allowed per hook. Hence, we can directly assign the
-	 * pointer and return.
-	 */
-	if (!vmlinux_set) {
-		tab->sets[hook] = add_set;
-		goto do_add_filter;
-	}
-
 	/* In case of vmlinux sets, there may be more than one set being
 	 * registered per hook. To create a unified set, we allocate a new set
 	 * and concatenate all individual sets being registered. While each set
 	 * is individually sorted, they may become unsorted when concatenated,
 	 * hence re-sorting the final set again is required to make binary
 	 * searching the set using btf_id_set8_contains function work.
+	 *
+	 * For module sets, we need to allocate as we may need to relocate
+	 * BTF ids.
 	 */
 	set_cnt = set ? set->cnt : 0;
 
@@ -8083,11 +8149,14 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook,
 
 	/* Concatenate the two sets */
 	memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0]));
+	/* Now that the set is copied, update with relocated BTF ids */
+	for (i = set->cnt; i < set->cnt + add_set->cnt; i++)
+		set->pairs[i].id = btf_id_map(btf, set->pairs[i].id);
+
 	set->cnt += add_set->cnt;
 
 	sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL);
 
-do_add_filter:
 	if (add_filter) {
 		hook_filter = &tab->hook_filters[hook];
 		hook_filter->filters[hook_filter->nr_filters++] = kset->filter;
@@ -8207,7 +8276,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook,
 		return PTR_ERR(btf);
 
 	for (i = 0; i < kset->set->cnt; i++) {
-		ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id,
+		ret = btf_check_kfunc_protos(btf, btf_id_map(btf, kset->set->pairs[i].id),
 					     kset->set->pairs[i].flags);
 		if (ret)
 			goto err_out;
@@ -8306,7 +8375,7 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
 {
 	struct btf_id_dtor_kfunc_tab *tab;
 	struct btf *btf;
-	u32 tab_cnt;
+	u32 tab_cnt, i;
 	int ret;
 
 	btf = btf_get_module_btf(owner);
@@ -8357,6 +8426,13 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c
 	btf->dtor_kfunc_tab = tab;
 
 	memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0]));
+
+	/* remap BTF ids based on BTF relocation (if any) */
+	for (i = tab_cnt; i < tab_cnt + add_cnt; i++) {
+		tab->dtors[i].btf_id = btf_id_map(btf, tab->dtors[i].btf_id);
+		tab->dtors[i].kfunc_btf_id = btf_id_map(btf, tab->dtors[i].kfunc_btf_id);
+	}
+
 	tab->cnt += add_cnt;
 
 	sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL);
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index 336da6844d42..567abaa52131 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o features.o btf_relocate.o
+	    usdt.o zip.o elf.o features.o btf_common.o btf_relocate.o
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index a4d6c46cc251..fc2a60759244 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5026,136 +5026,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
 	return btf__parse_split(path, vmlinux_btf);
 }
 
-int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
-{
-	int i, n, err;
-
-	switch (btf_kind(t)) {
-	case BTF_KIND_INT:
-	case BTF_KIND_FLOAT:
-	case BTF_KIND_ENUM:
-	case BTF_KIND_ENUM64:
-		return 0;
-
-	case BTF_KIND_FWD:
-	case BTF_KIND_CONST:
-	case BTF_KIND_VOLATILE:
-	case BTF_KIND_RESTRICT:
-	case BTF_KIND_PTR:
-	case BTF_KIND_TYPEDEF:
-	case BTF_KIND_FUNC:
-	case BTF_KIND_VAR:
-	case BTF_KIND_DECL_TAG:
-	case BTF_KIND_TYPE_TAG:
-		return visit(&t->type, ctx);
-
-	case BTF_KIND_ARRAY: {
-		struct btf_array *a = btf_array(t);
-
-		err = visit(&a->type, ctx);
-		err = err ?: visit(&a->index_type, ctx);
-		return err;
-	}
-
-	case BTF_KIND_STRUCT:
-	case BTF_KIND_UNION: {
-		struct btf_member *m = btf_members(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	case BTF_KIND_FUNC_PROTO: {
-		struct btf_param *m = btf_params(t);
-
-		err = visit(&t->type, ctx);
-		if (err)
-			return err;
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	case BTF_KIND_DATASEC: {
-		struct btf_var_secinfo *m = btf_var_secinfos(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->type, ctx);
-			if (err)
-				return err;
-		}
-		return 0;
-	}
-
-	default:
-		return -EINVAL;
-	}
-}
-
-int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
-{
-	int i, n, err;
-
-	err = visit(&t->name_off, ctx);
-	if (err)
-		return err;
-
-	switch (btf_kind(t)) {
-	case BTF_KIND_STRUCT:
-	case BTF_KIND_UNION: {
-		struct btf_member *m = btf_members(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_ENUM: {
-		struct btf_enum *m = btf_enum(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_ENUM64: {
-		struct btf_enum64 *m = btf_enum64(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	case BTF_KIND_FUNC_PROTO: {
-		struct btf_param *m = btf_params(t);
-
-		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
-			err = visit(&m->name_off, ctx);
-			if (err)
-				return err;
-		}
-		break;
-	}
-	default:
-		break;
-	}
-
-	return 0;
-}
-
 int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
 {
 	const struct btf_ext_info *seg;
diff --git a/tools/lib/bpf/btf_common.c b/tools/lib/bpf/btf_common.c
new file mode 100644
index 000000000000..93f207257dce
--- /dev/null
+++ b/tools/lib/bpf/btf_common.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2021 Facebook */
+/* Copyright (c) 2024, Oracle and/or its affiliates. */
+
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/btf.h>
+
+#define btf_var_secinfos	btf_type_var_secinfo
+
+#else
+#include "btf.h"
+#include "libbpf_internal.h"
+#endif
+
+int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx)
+{
+	int i, n, err;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_FLOAT:
+	case BTF_KIND_ENUM:
+	case BTF_KIND_ENUM64:
+		return 0;
+
+	case BTF_KIND_FWD:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+	case BTF_KIND_VAR:
+	case BTF_KIND_DECL_TAG:
+	case BTF_KIND_TYPE_TAG:
+		return visit(&t->type, ctx);
+
+	case BTF_KIND_ARRAY: {
+		struct btf_array *a = btf_array(t);
+
+		err = visit(&a->type, ctx);
+		err = err ?: visit(&a->index_type, ctx);
+		return err;
+	}
+
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *m = btf_members(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *m = btf_params(t);
+
+		err = visit(&t->type, ctx);
+		if (err)
+			return err;
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	case BTF_KIND_DATASEC: {
+		struct btf_var_secinfo *m = btf_var_secinfos(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->type, ctx);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	default:
+		return -EINVAL;
+	}
+}
+
+int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx)
+{
+	int i, n, err;
+
+	err = visit(&t->name_off, ctx);
+	if (err)
+		return err;
+
+	switch (btf_kind(t)) {
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *m = btf_members(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_ENUM: {
+		struct btf_enum *m = btf_enum(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_ENUM64: {
+		struct btf_enum64 *m = btf_enum64(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *m = btf_params(t);
+
+		for (i = 0, n = btf_vlen(t); i < n; i++, m++) {
+			err = visit(&m->name_off, ctx);
+			if (err)
+				return err;
+		}
+		break;
+	}
+	default:
+		break;
+	}
+
+	return 0;
+}
diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
index c06851f05472..a6534832296d 100644
--- a/tools/lib/bpf/btf_relocate.c
+++ b/tools/lib/bpf/btf_relocate.c
@@ -5,11 +5,34 @@
 #define _GNU_SOURCE
 #endif
 
+#ifdef __KERNEL__
+#include <linux/bpf.h>
+#include <linux/bsearch.h>
+#include <linux/btf.h>
+#include <linux/sort.h>
+#include <linux/string.h>
+#include <linux/bpf_verifier.h>
+
+#define btf_type_by_id				(struct btf_type *)btf_type_by_id
+#define btf__type_cnt				btf_nr_types
+#define btf__base_btf				btf_base_btf
+#define btf__name_by_offset			btf_name_by_offset
+#define btf__str_by_offset			btf_str_by_offset
+#define btf_kflag				btf_type_kflag
+
+#define calloc(nmemb, sz)			kvcalloc(nmemb, sz, GFP_KERNEL | __GFP_NOWARN)
+#define free(ptr)				kvfree(ptr)
+#define qsort_r(base, num, sz, cmp, priv)	sort_r(base, num, sz, (cmp_r_func_t)cmp, NULL, priv)
+
+#else
+
 #include "btf.h"
 #include "bpf.h"
 #include "libbpf.h"
 #include "libbpf_internal.h"
 
+#endif /* __KERNEL__ */
+
 struct btf;
 
 struct btf_relocate {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4 bpf-next 11/11] bpftool: support displaying relocated-with-base split BTF
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (9 preceding siblings ...)
  2024-05-17 10:22 ` [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
@ 2024-05-17 10:22 ` Alan Maguire
  2024-05-22  9:04   ` Quentin Monnet
       [not found] ` <CA+JHD93=ZcVN4GxepbRF6SLorWJjw0gCgJZUYxQG5hxFehdHUw@mail.gmail.com>
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 10:22 UTC (permalink / raw)
  To: andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan, Alan Maguire, Quentin Monnet

If the -R <base_btf> option is used, we can display BTF that has been
generated with distilled base BTF in its relocated form.  For example
for bpf_testmod.ko (which is built as an out-of-tree module, so has
a distilled .BTF.base section:

bpftool btf dump file bpf_testmod.ko

Alternatively, we can display content relocated with
(a possibly changed) base BTF via

bpftool btf dump -R /sys/kernel/btf/vmlinux bpf_testmod.ko

The latter mirrors how the kernel will handle such split
BTF; it relocates its representation with the running
kernel, and if successful, renumbers BTF ids to reference
the current vmlinux BTF.

Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
---
 tools/bpf/bpftool/Documentation/bpftool-btf.rst | 15 ++++++++++++++-
 tools/bpf/bpftool/bash-completion/bpftool       |  7 ++++---
 tools/bpf/bpftool/btf.c                         | 11 ++++++++++-
 tools/bpf/bpftool/main.c                        | 14 +++++++++++++-
 tools/bpf/bpftool/main.h                        |  2 ++
 5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
index 3f6bca03ad2e..b11abebeae81 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
@@ -16,7 +16,7 @@ SYNOPSIS
 
 **bpftool** [*OPTIONS*] **btf** *COMMAND*
 
-*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
+*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } { **-R** | **relocate-base-btf** } }
 
 *COMMANDS* := { **dump** | **help** }
 
@@ -87,6 +87,19 @@ OPTIONS
     BTF object is passed through other handles, this option becomes
     necessary.
 
+-R, --relocate-base-btf *FILE*
+    When split BTF is generated with distilled base BTF for relocation,
+    the latter is stored in a .BTF.base section and allows us to later
+    relocate split BTF and a potentially-changed base BTF by using
+    information in the .BTF.base section about the base types referenced
+    from split BTF.  Relocation is carried out against the split BTF
+    supplied via this parameter and the split BTF will then refer to
+    the base types supplied in *FILE*.
+
+    If this option is not used, split BTF is shown relative to the
+    .BTF.base, which contains just enough information to support later
+    relocation.
+
 EXAMPLES
 ========
 **# bpftool btf dump id 1226**
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index be99d49b8714..702335848e3c 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -262,7 +262,7 @@ _bpftool()
     # Deal with options
     if [[ ${words[cword]} == -* ]]; then
         local c='--version --json --pretty --bpffs --mapcompat --debug \
-            --use-loader --base-btf'
+            --use-loader --base-btf --relocate-base-btf'
         COMPREPLY=( $( compgen -W "$c" -- "$cur" ) )
         return 0
     fi
@@ -283,7 +283,7 @@ _bpftool()
             _sysfs_get_netdevs
             return 0
             ;;
-        file|pinned|-B|--base-btf)
+        file|pinned|-B|-R|--base-btf|--relocate-base-btf)
             _filedir
             return 0
             ;;
@@ -297,7 +297,8 @@ _bpftool()
     local i pprev
     for (( i=1; i < ${#words[@]}; )); do
         if [[ ${words[i]::1} == - ]] &&
-            [[ ${words[i]} != "-B" ]] && [[ ${words[i]} != "--base-btf" ]]; then
+            [[ ${words[i]} != "-B" ]] && [[ ${words[i]} != "--base-btf" ]] &&
+            [[ ${words[i]} != "-R" ]] && [[ ${words[i]} != "--relocate-base-btf" ]]; then
             words=( "${words[@]:0:i}" "${words[@]:i+1}" )
             [[ $i -le $cword ]] && cword=$(( cword - 1 ))
         else
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 67be216023e8..aaa8e4cb9ffd 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -763,6 +763,14 @@ static int do_dump(int argc, char **argv)
 			base_btf = btf__parse_opts(*argv, &optp);
 			if (base_btf)
 				btf = btf__parse_split(*argv, base_btf);
+			if (btf && relocate_base_btf) {
+				err = btf__relocate(btf, relocate_base_btf);
+				if (err) {
+					p_err("could not relocate BTF from '%s' with base BTF '%s': %s\n",
+					      *argv, relocate_base_btf_path, strerror(-err));
+					goto done;
+				}
+			}
 		}
 		if (!btf) {
 			err = -errno;
@@ -1203,7 +1211,8 @@ static int do_help(int argc, char **argv)
 		"       " HELP_SPEC_MAP "\n"
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       " HELP_SPEC_OPTIONS " |\n"
-		"                    {-B|--base-btf} }\n"
+		"                    {-B|--base-btf} |\n"
+		"                    {-R|--relocate-base-btf} }\n"
 		"",
 		bin_name, "btf");
 
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 08d0ac543c67..69d4906bec5c 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -32,6 +32,8 @@ bool verifier_logs;
 bool relaxed_maps;
 bool use_loader;
 struct btf *base_btf;
+struct btf *relocate_base_btf;
+const char *relocate_base_btf_path;
 struct hashmap *refs_table;
 
 static void __noreturn clean_and_exit(int i)
@@ -448,6 +450,7 @@ int main(int argc, char **argv)
 		{ "debug",	no_argument,	NULL,	'd' },
 		{ "use-loader",	no_argument,	NULL,	'L' },
 		{ "base-btf",	required_argument, NULL, 'B' },
+		{ "relocate-base-btf", required_argument, NULL, 'R' },
 		{ 0 }
 	};
 	bool version_requested = false;
@@ -473,7 +476,7 @@ int main(int argc, char **argv)
 	bin_name = "bpftool";
 
 	opterr = 0;
-	while ((opt = getopt_long(argc, argv, "VhpjfLmndB:l",
+	while ((opt = getopt_long(argc, argv, "VhpjfLmndB:lR:",
 				  options, NULL)) >= 0) {
 		switch (opt) {
 		case 'V':
@@ -519,6 +522,15 @@ int main(int argc, char **argv)
 		case 'L':
 			use_loader = true;
 			break;
+		case 'R':
+			relocate_base_btf_path = optarg;
+			relocate_base_btf = btf__parse(optarg, NULL);
+			if (!relocate_base_btf) {
+				p_err("failed to parse base BTF for relocation at '%s': %d\n",
+				      optarg, -errno);
+				return -1;
+			}
+			break;
 		default:
 			p_err("unrecognized option '%s'", argv[optind - 1]);
 			if (json_output)
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 9eb764fe4cc8..bbf8194a2d76 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -83,6 +83,8 @@ extern bool verifier_logs;
 extern bool relaxed_maps;
 extern bool use_loader;
 extern struct btf *base_btf;
+extern struct btf *relocate_base_btf;
+extern const char *relocate_base_btf_path;
 extern struct hashmap *refs_table;
 
 void __printf(1, 2) p_err(const char *fmt, ...);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
       [not found] ` <CA+JHD93=ZcVN4GxepbRF6SLorWJjw0gCgJZUYxQG5hxFehdHUw@mail.gmail.com>
@ 2024-05-17 11:56   ` Alan Maguire
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-17 11:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andrii Nakryiko, Jiri Olsa, Arnaldo de Melo, Quentin Monnet,
	Eddy Z, mykolal, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, yonghong.song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, houtao1, bpf,
	Masahiro Yamada, Luis R. Rodriguez, Nathan Chancellor

On 17/05/2024 12:11, Arnaldo Carvalho de Melo wrote:
> On Fri, May 17, 2024, 7:23 AM Alan Maguire <alan.maguire@oracle.com
> <mailto:alan.maguire@oracle.com>> wrote:
> 
>     Split BPF Type Format (BTF) provides huge advantages in that kernel
>     modules only have to provide type information for types that they do not
>     share with the core kernel; for core kernel types, split BTF refers to
>     core kernel BTF type ids.  So for a STRUCT sk_buff, a module that
>     uses that structure (or a pointer to it) simply needs to refer to the
>     core kernel type id, saving the need to define the structure and its
>     many
>     dependents.  This cuts down on duplication and makes BTF as compact
>     as possible.
> 
>     However, there is a downside.  This scheme requires the references from
>     split BTF to base BTF to be valid not just at encoding time, but at use
>     time (when the module is loaded).  Even a small change in kernel types
>     can perturb the type ids in core kernel BTF, and due to pahole's
>     parallel processing of compilation units, even an unchanged kernel can
>     have different type ids if BTF is re-generated.
> 
> 
> 
> I think it would be informative to mention the recently added
> "reproducible_build" feature, i.e. rephrase to "... if the
> reproducible_build isn't selected via --btf_features..." in the relevant
> documentation.
>

Yeah, sorry this part should have been updated after the
reproducible_build feature landed.

> - Arnaldo
> 
> Sent from smartphone, still on my way back home from LSF/MM+BPF
> 
>       So we have a robustness
>     problem for split BTF for cases where a module is not always compiled at
>     the same time as the kernel.  This problem is particularly acute for
>     distros which generally want module builders to be able to compile a
>     module for the lifetime of a Linux stable-based release, and have it
>     continue to be valid over the lifetime of that release, even as changes
>     in data structures (and hence BTF types) accrue.  Today it's not
>     possible to generate BTF for modules that works beyond the initial
>     kernel it is compiled against - kernel bugfixes etc invalidate the split
>     BTF references to vmlinux BTF, and BTF is no longer usable for the
>     module.
> 
>     The goal of this series is to provide options to provide additional
>     context for cases like this.  That context comes in the form of
>     distilled base BTF; it stands in for the base BTF, and contains
>     information about the types referenced from split BTF, but not their
>     full descriptions.  The modified split BTF will refer to type ids in
>     this .BTF.base section, and when the kernel loads such modules it
>     will use that .BTF.base to map references from split BTF to the
>     equivalent current vmlinux base BTF types.  Once this relocation
>     process has succeeded, the module BTF available in /sys/kernel/btf
>     will look exactly as if it was built with the current vmlinux;
>     references to base types will be fixed up etc.
> 
>     A module builder - using this series along with the pahole changes -
>     can then build a module with distilled base BTF via an out-of-tree
>     module build, i.e.
> 
>     make -C . M=path/2/module
> 
>     The module will have a .BTF section (the split BTF) and a
>     .BTF.base section.  The latter is small in size - distilled base
>     BTF does not need full struct/union/enum information for named
>     types for example.  For 2667 modules built with distilled base BTF,
>     the average size observed was 1556 bytes (stddev 1563).  The overall
>     size added to this 2667 modules was 5.3Mb.
> 
>     Note that for the in-tree modules, this approach is not needed as
>     split and base BTF in the case of in-tree modules are always built
>     and re-built together.
> 
>     The series first focuses on generating split BTF with distilled base
>     BTF, and provides btf__parse_opts() which allows specification
>     of the section name from which to read BTF data, since we now have
>     both .BTF and .BTF.base sections that can contain such data.
> 
>     Then we add support to resolve_btfids for generating the .BTF.ids
>     section with reference to the .BTF.base section - this ensures the
>     .BTF.ids match those used in the split/base BTF.
> 
>     Finally the series provides the mechanism for relocating split BTF with
>     a new base; the distilled base BTF is used to map the references to base
>     BTF in the split BTF to the new base.  For the kernel, this relocation
>     process happens at module load time, and we relocate split BTF
>     references to point at types in the current vmlinux BTF.  As part of
>     this, .BTF.ids references need to be mapped also.
> 
>     So concretely, what happens is
> 
>     - we generate split BTF in the .BTF section of a module that refers to
>       types in the .BTF.base section as base types; the latter are not full
>       type descriptions but provide information about the base type.  So
>       a STRUCT sk_buff would be represented as a FWD struct sk_buff in
>       distilled base BTF for example.
>     - when the module is loaded, the split BTF is relocated with vmlinux
>       BTF; in the case of the FWD struct sk_buff, we find the STRUCT sk_buff
>       in vmlinux BTF and map all split BTF references to the distilled base
>       FWD sk_buff, replacing them with references to the vmlinux BTF
>       STRUCT sk_buff.
> 
>     Support is also added to bpftool to be able to display split BTF
>     relative to its .BTF.base section, and also to display the relocated
>     form via the "-R path_to_base_btf".
> 
>     A previous approach to this problem [1] utilized standalone BTF for such
>     cases - where the BTF is not defined relative to base BTF so there is no
>     relocation required.  The problem with that approach is that from
>     the verifier perspective, some types are special, and having a custom
>     representation of a core kernel type that did not necessarily match the
>     current representation is not tenable.  So the approach taken here was
>     to preserve the split BTF model while minimizing the representation of
>     the context needed to relocate split and current vmlinux BTF.
> 
>     To generate distilled .BTF.base sections the associated dwarves
>     patch (to be applied on the "next" branch there) is needed.
>     Without it, things will still work but modules will not be built
>     with a .BTF.base section.
> 
>     Changes since v3[3]:
> 
>     - distill now checks for duplicate-named struct/unions and records
>       them as a sized struct/union to help identify which of the
>       multiple base BTF structs/unions it refers to (Eduard, patch 1)
>     - added test support for multiple name handling (Eduard, patch 2)
>     - simplified the string mapping when updating split BTF to use
>       base BTF instead of distilled base.  Since the only string
>       references split BTF can make to base BTF are the names of
>       the base types, create a string map from distilled string
>       offset -> base BTF string offset and update string offsets
>       by visiting all strings in split BTF; this saves having to
>       do costly searches of base BTF (Eduard, patch 7,10)
>     - fixed bpftool manpage and indentation issues (Quentin, patch 11)
> 
>     Also explored Eduard's suggestion of doing an implicit fallback
>     to checking for .BTF.base section in btf__parse() when it is
>     called to get base BTF.  However while it is doable, it turned
>     out to be difficult operationally.  Since fallback is implicit
>     we do not know the source of the BTF - was it from .BTF or
>     .BTF.base? In bpftool, we want to try first standalone BTF,
>     then split, then split with distilled base.  Having a way
>     to explicitly request .BTF.base via btf__parse_opts() fits
>     that model better.
> 
>     Changes since v2[4]:
> 
>     - submitted patch to use --btf_features in Makefile.btf for pahole
>       v1.26 and later separately (Andrii).  That has landed in bpf-next
>       now.
>     - distilled base now encodes ENUM64 as fwd ENUM (size 8), eliminating
>       the need for support for ENUM64 in btf__add_fwd (patch 1, Andrii)
>     - moved to distilling only named types, augmenting split BTF with
>       associated reference types; this simplifies greatly the distilled
>       base BTF and the mapping operation between distilled and base
>       BTF when relocating (most of the series changes, Andrii)
>     - relocation now iterates over base BTF, looking for matches based
>       on name in distilled BTF.  Distilled BTF is pre-sorted by name
>       (Andrii, patch 8)
>     - removed most redundant compabitiliby checks aside from struct
>       size for base types/embedded structs and kind compatibility
>       (since we only match on name) (Andrii, patch 8)
>     - btf__parse_opts() now replaces btf_parse() internally in libbpf
>       (Eduard, patch 3)
> 
>     Changes since RFC [5]:
> 
>     - updated terminology; we replace clunky "base reference" BTF with
>       distilling base BTF into a .BTF.base section. Similarly BTF
>       reconcilation becomes BTF relocation (Andrii, most patches)
>     - add distilled base BTF by default for out-of-tree modules
>       (Alexei, patch 8)
>     - distill algorithm updated to record size of embedded struct/union
>       by recording it as a 0-vlen STRUCT/UNION with size preserved
>       (Andrii, patch 2)
>     - verify size match on relocation for such STRUCT/UNIONs (Andrii,
>       patch 9)
>     - with embedded STRUCT/UNION recording size, we can have bpftool
>       dump a header representation using .BTF.base + .BTF sections
>       rather than special-casing and refusing to use "format c" for
>       that case (patch 5)
>     - match enum with enum64 and vice versa (Andrii, patch 9)
>     - ensure that resolve_btfids works with BTF without .BTF.base
>       section (patch 7)
>     - update tests to cover embedded types, arrays and function
>       prototypes (patches 3, 12)
> 
>     [1]
>     https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20231112124834.388735-14-alan.maguire@oracle.com/>
>     [2]
>     https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240501175035.2476830-1-alan.maguire@oracle.com/>
>     [3]
>     https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240510103052.850012-1-alan.maguire@oracle.com/>
>     [4]
>     https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240424154806.3417662-1-alan.maguire@oracle.com/>
>     [5]
>     https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/ <https://lore.kernel.org/bpf/20240322102455.98558-1-alan.maguire@oracle.com/>
> 
>     Alan Maguire (11):
>       libbpf: add btf__distill_base() creating split BTF with distilled base
>         BTF
>       selftests/bpf: test distilled base, split BTF generation
>       libbpf: add btf__parse_opts() API for flexible BTF parsing
>       bpftool: support displaying raw split BTF using base BTF section as
>         base
>       resolve_btfids: use .BTF.base ELF section as base BTF if -B option is
>         used
>       kbuild, bpf: add module-specific pahole/resolve_btfids flags for
>         distilled base BTF
>       libbpf: split BTF relocation
>       selftests/bpf: extend distilled BTF tests to cover BTF relocation
>       module, bpf: store BTF base pointer in struct module
>       libbpf,bpf: share BTF relocate-related code with kernel
>       bpftool: support displaying relocated-with-base split BTF
> 
>      include/linux/btf.h                           |  45 ++
>      include/linux/module.h                        |   2 +
>      kernel/bpf/Makefile                           |   8 +
>      kernel/bpf/btf.c                              | 166 +++--
>      kernel/module/main.c                          |   5 +-
>      scripts/Makefile.btf                          |   7 +
>      scripts/Makefile.modfinal                     |   4 +-
>      .../bpf/bpftool/Documentation/bpftool-btf.rst |  15 +-
>      tools/bpf/bpftool/bash-completion/bpftool     |   7 +-
>      tools/bpf/bpftool/btf.c                       |  19 +-
>      tools/bpf/bpftool/main.c                      |  14 +-
>      tools/bpf/bpftool/main.h                      |   2 +
>      tools/bpf/resolve_btfids/main.c               |  28 +-
>      tools/lib/bpf/Build                           |   2 +-
>      tools/lib/bpf/btf.c                           | 605 +++++++++++++-----
>      tools/lib/bpf/btf.h                           |  59 ++
>      tools/lib/bpf/btf_common.c                    | 143 +++++
>      tools/lib/bpf/btf_relocate.c                  | 341 ++++++++++
>      tools/lib/bpf/libbpf.map                      |   3 +
>      tools/lib/bpf/libbpf_internal.h               |   3 +
>      .../selftests/bpf/prog_tests/btf_distill.c    | 346 ++++++++++
>      21 files changed, 1612 insertions(+), 212 deletions(-)
>      create mode 100644 tools/lib/bpf/btf_common.c
>      create mode 100644 tools/lib/bpf/btf_relocate.c
>      create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c
> 
>     -- 
>     2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (11 preceding siblings ...)
       [not found] ` <CA+JHD93=ZcVN4GxepbRF6SLorWJjw0gCgJZUYxQG5hxFehdHUw@mail.gmail.com>
@ 2024-05-17 21:09 ` Eduard Zingerman
  2024-05-20  9:36   ` Alan Maguire
  2024-05-18  2:38 ` Eduard Zingerman
  13 siblings, 1 reply; 33+ messages in thread
From: Eduard Zingerman @ 2024-05-17 21:09 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme, quentin
  Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
[...]

> Changes since v3[3]:
> 
> - distill now checks for duplicate-named struct/unions and records
>   them as a sized struct/union to help identify which of the
>   multiple base BTF structs/unions it refers to (Eduard, patch 1)

Hi Alan,

Sorry, a little bit more on this topic.
- In patch #1 two kinds of structs get BTF_KIND_STRUCT declaration
  with size: those embedded and those with ambiguous name.
- In patch #7 btf_relocate_map_distilled_base() unconditionally
  requires size field for BTF_KIND_STRUCT to match.

This might hinder portability in the following scenario:
- base type is referred to only by pointer;
- base type has ambiguous name (in the old kernel used for base BTF
  generation);
- base type size is different in the new kernel when module is loaded.

There is also a scenario when type name is not ambiguous in the old
kernel, but becomes ambiguous in the new kernel.

So, what I had in mind when commented in v3 was:
- if distilled base has FWD for a structure and an ambiguous name,
  fail to relocate;
- if distilled base has STRUCT for a structure, find a unique pair
  name/size or fail to relocate.

This covers scenario #1 but ignores scenario #2 and requires minimal
changes for v3 design.

An alternative would be to e.g. keep STRUCT with size for all
structures in the base BTF and to compute "embedded" flag during
relocation:
- if distilled base STRUCT is embedded, search for a unique pair
  name/size or fail to relocate;
- if distilled base STRUCT is not embedded, search for a uniquely
  named struct, if that fails search for a unique pair name/size,
  or fail to relocate.

If we consider above to much of a hassle, I think v3 design + size
check for STRUCT is better because it is a bit simpler.

Wdyt?

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
                   ` (12 preceding siblings ...)
  2024-05-17 21:09 ` Eduard Zingerman
@ 2024-05-18  2:38 ` Eduard Zingerman
  2024-05-21  9:15   ` Alan Maguire
  13 siblings, 1 reply; 33+ messages in thread
From: Eduard Zingerman @ 2024-05-18  2:38 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme, quentin
  Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:

(Also, please note that CI fails for this series).

[...]

> Also explored Eduard's suggestion of doing an implicit fallback
> to checking for .BTF.base section in btf__parse() when it is
> called to get base BTF.  However while it is doable, it turned
> out to be difficult operationally.  Since fallback is implicit
> we do not know the source of the BTF - was it from .BTF or
> .BTF.base? In bpftool, we want to try first standalone BTF,
> then split, then split with distilled base.  Having a way
> to explicitly request .BTF.base via btf__parse_opts() fits
> that model better.

I don't think this is the case. Here is what I mean:
https://github.com/eddyz87/bpf/tree/distilled-base-alternative-parse-elf

The branch above is a modification for btf_parse_elf() and a few
reverts on top of this patch-set.

I modified btf_parse_elf() to follow the logic below:

| base_btf   | .BTF.base | Effect                                      |
| specified? | present?  |                                             |
|------------+-----------+---------------------------------------------|
| no         | no        | load btf from .BTF                          |
|------------+-----------+---------------------------------------------|
| yes        | no        | load btf from .BTF using base_btf as base   |
|            |           |                                             |
|------------+-----------+---------------------------------------------|
| no         | yes       | load btf from .BTF using .BTF.base as base  |
|            |           |                                             |
|------------+-----------+---------------------------------------------|
| yes        | yes       | load btf from .BTF using .BTF.base as base, |
|            |           | relocate btf against base_btf               |

When organized like that, there is no need to modify libbpf clients to
work with split BTF.

The `bpftool btf dump file ./btf_testmod.ko` would print non-relocated BTF.
The `bpftool btf -B ../../../vmlinux dump file ./btf_testmod.ko` would
print relocated BTF, no need for separate -R flag.
Imo, loading split BTF w/o relocation when .BTF.base is present
is interesting only for debug purposes and could be handled separately
as all building blocks are present in the library.

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-17 21:09 ` Eduard Zingerman
@ 2024-05-20  9:36   ` Alan Maguire
  0 siblings, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-20  9:36 UTC (permalink / raw)
  To: Eduard Zingerman, andrii, jolsa, acme, quentin
  Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

On 17/05/2024 22:09, Eduard Zingerman wrote:
> On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
> [...]
> 
>> Changes since v3[3]:
>>
>> - distill now checks for duplicate-named struct/unions and records
>>   them as a sized struct/union to help identify which of the
>>   multiple base BTF structs/unions it refers to (Eduard, patch 1)
> 
> Hi Alan,
> 
> Sorry, a little bit more on this topic.
> - In patch #1 two kinds of structs get BTF_KIND_STRUCT declaration
>   with size: those embedded and those with ambiguous name.
> - In patch #7 btf_relocate_map_distilled_base() unconditionally
>   requires size field for BTF_KIND_STRUCT to match.
> 
> This might hinder portability in the following scenario:
> - base type is referred to only by pointer;
> - base type has ambiguous name (in the old kernel used for base BTF
>   generation);
> - base type size is different in the new kernel when module is loaded.
> 
> There is also a scenario when type name is not ambiguous in the old
> kernel, but becomes ambiguous in the new kernel.
> 
> So, what I had in mind when commented in v3 was:
> - if distilled base has FWD for a structure and an ambiguous name,
>   fail to relocate;
> - if distilled base has STRUCT for a structure, find a unique pair
>   name/size or fail to relocate.
> 
> This covers scenario #1 but ignores scenario #2 and requires minimal
> changes for v3 design.
> 
> An alternative would be to e.g. keep STRUCT with size for all
> structures in the base BTF and to compute "embedded" flag during
> relocation:
> - if distilled base STRUCT is embedded, search for a unique pair
>   name/size or fail to relocate;
> - if distilled base STRUCT is not embedded, search for a uniquely
>   named struct, if that fails search for a unique pair name/size,
>   or fail to relocate.
>

Hi Eduard,

IIRC I think Andrii suggested something like the above; then the idea
was to use a 0 STRUCT size for cases where we didn't care about matching
size rather than using a mix of FWDs and STRUCTs. As you say though,
conditions can be different in the target base such that we might have
wished we had recorded additional size information, and we can only
really know that at relocation time.

That being the case, I think the most robust approach is as you suggest
to always record STRUCT/UNION size at distillation time and then only
require size matches for the embedded and duplicate name-kind cases.
This requires moving the embeddedness/dup logic from distillation to
relocation.

> If we consider above to much of a hassle, I think v3 design + size
> check for STRUCT is better because it is a bit simpler.
> 
> Wdyt?
>

The main goal from my side is to try and ensure we maximize the
opportunities to reconcile split and base BTF where possible. The scheme
you suggest above - always recording size but selectively using it in
matching - seems like the best way to achieve that. Thanks!

Alan

> [...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-18  2:38 ` Eduard Zingerman
@ 2024-05-21  9:15   ` Alan Maguire
  2024-05-21 16:19     ` Eduard Zingerman
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Maguire @ 2024-05-21  9:15 UTC (permalink / raw)
  To: Eduard Zingerman, andrii, jolsa, acme, quentin
  Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

On 18/05/2024 03:38, Eduard Zingerman wrote:
> On Fri, 2024-05-17 at 11:22 +0100, Alan Maguire wrote:
> 
> (Also, please note that CI fails for this series).
> 
> [...]
> 
>> Also explored Eduard's suggestion of doing an implicit fallback
>> to checking for .BTF.base section in btf__parse() when it is
>> called to get base BTF.  However while it is doable, it turned
>> out to be difficult operationally.  Since fallback is implicit
>> we do not know the source of the BTF - was it from .BTF or
>> .BTF.base? In bpftool, we want to try first standalone BTF,
>> then split, then split with distilled base.  Having a way
>> to explicitly request .BTF.base via btf__parse_opts() fits
>> that model better.
> 
> I don't think this is the case. Here is what I mean:
> https://github.com/eddyz87/bpf/tree/distilled-base-alternative-parse-elf
> 
> The branch above is a modification for btf_parse_elf() and a few
> reverts on top of this patch-set.
> 
> I modified btf_parse_elf() to follow the logic below:
> 
> | base_btf   | .BTF.base | Effect                                      |
> | specified? | present?  |                                             |
> |------------+-----------+---------------------------------------------|
> | no         | no        | load btf from .BTF                          |
> |------------+-----------+---------------------------------------------|
> | yes        | no        | load btf from .BTF using base_btf as base   |
> |            |           |                                             |
> |------------+-----------+---------------------------------------------|
> | no         | yes       | load btf from .BTF using .BTF.base as base  |
> |            |           |                                             |
> |------------+-----------+---------------------------------------------|
> | yes        | yes       | load btf from .BTF using .BTF.base as base, |
> |            |           | relocate btf against base_btf               |
> 
> When organized like that, there is no need to modify libbpf clients to
> work with split BTF.
> 
> The `bpftool btf dump file ./btf_testmod.ko` would print non-relocated BTF.
> The `bpftool btf -B ../../../vmlinux dump file ./btf_testmod.ko` would
> print relocated BTF, no need for separate -R flag.
> Imo, loading split BTF w/o relocation when .BTF.base is present
> is interesting only for debug purposes and could be handled separately
> as all building blocks are present in the library.
> 

This is a neat approach, and as you say it eliminates the need to modify
bpftool to handle distilled base BTF and relocation.  The only wrinkle
is resolve_btfids; we call resolve_btfids for modules with a "-B
vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
both a split and base BTF. According to the approach outlined above,
we'd relocate split BTF - originally relative to .BTF.base - to be
relative to vmlinux BTF, but in the case of resolve_btfids we don't want
that relocation. We want the BTF ids to reflect the distilled base BTF
ids since they will be relocated later on module load along with the
split BTF references themselves.

We can handle this by having a -R flag to skip relocation; it would
simply ensure we first try calling btf__parse(), falling back to
btf__parse_split(); we need the fallback logic as it is possible the
pahole version didn't add .BTF.base sections. This logic would only be
activated for out-of-tree module builds so seems acceptable to me. If
that makes sense, with your permission I can rework the series to
include your BTF parsing patch.

Thanks!

Alan
> [...]
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21  9:15   ` Alan Maguire
@ 2024-05-21 16:19     ` Eduard Zingerman
  2024-05-21 18:54       ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Eduard Zingerman @ 2024-05-21 16:19 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme, quentin
  Cc: mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

On Tue, 2024-05-21 at 10:15 +0100, Alan Maguire wrote:

[...]

> This is a neat approach, and as you say it eliminates the need to modify
> bpftool to handle distilled base BTF and relocation.  The only wrinkle
> is resolve_btfids; we call resolve_btfids for modules with a "-B
> vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
> both a split and base BTF. According to the approach outlined above,
> we'd relocate split BTF - originally relative to .BTF.base - to be
> relative to vmlinux BTF, but in the case of resolve_btfids we don't want
> that relocation. We want the BTF ids to reflect the distilled base BTF
> ids since they will be relocated later on module load along with the
> split BTF references themselves.

You are correct, I missed this detail, resolve_btfids needs distilled
base instead of vmlinux for out of tree modules.

> We can handle this by having a -R flag to skip relocation; it would
> simply ensure we first try calling btf__parse(), falling back to
> btf__parse_split(); we need the fallback logic as it is possible the
> pahole version didn't add .BTF.base sections. This logic would only be
> activated for out-of-tree module builds so seems acceptable to me. If
> that makes sense, with your permission I can rework the series to
> include your BTF parsing patch.

Makes sense to me, but I'm curious whether you and Andrii consider
this a good interface, compared to _opts version.

Thanks,
Eduard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 16:19     ` Eduard Zingerman
@ 2024-05-21 18:54       ` Andrii Nakryiko
  2024-05-21 19:08         ` Eduard Zingerman
  0 siblings, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 18:54 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, andrii, jolsa, acme, quentin, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Tue, May 21, 2024 at 9:19 AM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 10:15 +0100, Alan Maguire wrote:
>
> [...]
>
> > This is a neat approach, and as you say it eliminates the need to modify
> > bpftool to handle distilled base BTF and relocation.  The only wrinkle
> > is resolve_btfids; we call resolve_btfids for modules with a "-B
> > vmlinux" argument, so in that case we'd be calling btf_parse_elf() with
> > both a split and base BTF. According to the approach outlined above,
> > we'd relocate split BTF - originally relative to .BTF.base - to be
> > relative to vmlinux BTF, but in the case of resolve_btfids we don't want
> > that relocation. We want the BTF ids to reflect the distilled base BTF
> > ids since they will be relocated later on module load along with the
> > split BTF references themselves.
>
> You are correct, I missed this detail, resolve_btfids needs distilled
> base instead of vmlinux for out of tree modules.
>
> > We can handle this by having a -R flag to skip relocation; it would
> > simply ensure we first try calling btf__parse(), falling back to
> > btf__parse_split(); we need the fallback logic as it is possible the
> > pahole version didn't add .BTF.base sections. This logic would only be
> > activated for out-of-tree module builds so seems acceptable to me. If
> > that makes sense, with your permission I can rework the series to
> > include your BTF parsing patch.
>
> Makes sense to me, but I'm curious whether you and Andrii consider
> this a good interface, compared to _opts version.
>

Hey, sorry for delays, just getting back to reviewing patches, I will
go through this revision today.

I'm probably leaning towards not doing automatic relocations in
btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
feature, if we need to teach resolve_btfids and bpftool to do
something extra for that case (i.e., call another API for relocation,
if necessary), then it's fine, doesn't seems like a problem.

Much worse is having to do some workarounds to prevent an API from
doing some undesired extra steps (like in resolve_btfids not wanting a
relocation). Orthogonality FTW, IMO.


> Thanks,
> Eduard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 18:54       ` Andrii Nakryiko
@ 2024-05-21 19:08         ` Eduard Zingerman
  2024-05-21 22:01           ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Eduard Zingerman @ 2024-05-21 19:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, andrii, jolsa, acme, quentin, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:

[...]

> I'm probably leaning towards not doing automatic relocations in
> btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> feature, if we need to teach resolve_btfids and bpftool to do
> something extra for that case (i.e., call another API for relocation,
> if necessary), then it's fine, doesn't seems like a problem.

My point is that with current implementation it does not even make
sense to call btf__parse() for an ELF with distilled base,
because it would fail.

And selecting BTF encoding based on a few retries seems like a kludge
if there is a simple way to check if distilled base has to be used
(presence of the .BTF.base section).

> Much worse is having to do some workarounds to prevent an API from
> doing some undesired extra steps (like in resolve_btfids not wanting a
> relocation). Orthogonality FTW, IMO.

For resolve_btfids it is a bit different, imo.
It does want some base: for in-tree modules it wants vmlinux,
for out-of-tree it wants distilled base.
So it has to be adjusted either way.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
  2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
@ 2024-05-21 21:48   ` Andrii Nakryiko
  2024-05-22 16:42     ` Alan Maguire
  2024-05-22 18:00   ` Kui-Feng Lee
  1 sibling, 1 reply; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 21:48 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required.  Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF.  Here the attempt is made
> to store additional context to make split BTF more robust.
>
> This context comes in the form of distilled base BTF providing minimal
> information (name and - in some cases - size) for base INTs, FLOATs,
> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> points at that base and contains any additional types needed (such as
> TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
> information constitutes the minimal BTF representation needed to
> disambiguate or remove split BTF references to base BTF.  The rules
> are as follows:
>
> - INT, FLOAT are recorded in full.
> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
>   will be encoded either as a zero-member sized STRUCT/UNION (preserving
>   size for later relocation checks) or as a named FWD.  Only base BTF
>   STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
>   that have multiple STRUCT/UNION instances of the same name need to
>   preserve size information, so a FWD representation will be used in
>   most cases.
> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
>   with no values) is used.
> - in all other cases, the type is added to the new split BTF.
>
> Avoiding struct/union/enum/enum64 expansion is important to keep the
> distilled base BTF representation to a minimum size.
>
> When successful, new representations of the distilled base BTF and new
> split BTF that refers to it are returned.  Both need to be freed by the
> caller.
>
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate distilled base BTF with a
> FWD struct sk_buff, and the split BTF will refer to it instead.
>
> Tools like pahole can utilize such split BTF to populate the .BTF
> section (split BTF) and an additional .BTF.base section.  Then
> when the split BTF is loaded, the distilled base BTF can be used
> to relocate split BTF to reference the current (and possibly changed)
> base BTF.
>
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated,  we can use the distilled base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (relocated) base BTF sk_buff type id.
>
> Distilled base BTF is small; when building a kernel with all modules
> using distilled base BTF as a test, ovreall module size grew by only

typo: overall

> 5.3Mb total across ~2700 modules.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
>  tools/lib/bpf/btf.h      |  20 ++
>  tools/lib/bpf/libbpf.map |   1 +
>  3 files changed, 424 insertions(+), 6 deletions(-)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..953929d196c3 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
>         return 0;
>  }
>
> -int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
>  {
> -       struct btf_pipe p = { .src = src_btf, .dst = btf };
>         struct btf_type *t;
>         int sz, err;
>
> @@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
>                 return libbpf_err(sz);
>
>         /* deconstruct BTF, if necessary, and invalidate raw_data */
> -       if (btf_ensure_modifiable(btf))
> +       if (btf_ensure_modifiable(p->dst))
>                 return libbpf_err(-ENOMEM);
>
> -       t = btf_add_type_mem(btf, sz);
> +       t = btf_add_type_mem(p->dst, sz);
>         if (!t)
>                 return libbpf_err(-ENOMEM);
>
>         memcpy(t, src_type, sz);
>
> -       err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
> +       err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
>         if (err)
>                 return libbpf_err(err);
>
> -       return btf_commit_type(btf, sz);
> +       return btf_commit_type(p->dst, sz);
> +}
> +
> +int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +{
> +       struct btf_pipe p = { .src = src_btf, .dst = btf };
> +
> +       return btf_add_type(&p, src_type);
>  }
>
>  static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
> @@ -5212,3 +5218,394 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>
>         return 0;
>  }
> +
> +#define BTF_NEEDS_SIZE (1 << 31)       /* flag set if either struct/union is
> +                                        * embedded - and thus size info must
> +                                        * be preserved - or if there are
> +                                        * multiple instances of the same
> +                                        * struct/union - where size can be
> +                                        * used to clarify which is wanted.
> +                                        */

please use full line comment in front of #define

> +#define BTF_ID(id)             (id & ~BTF_NEEDS_SIZE)
> +
> +struct btf_distill {
> +       struct btf_pipe pipe;
> +       int *ids;

reading the rest of the code, this BTF_NEEDS_SIZE and BTF_ID() macro
use was quite distracting. I'm wondering if it would be better to use
a simple struct with bitfields here, e.g.,

struct type_state {
    int id: 31;
    bool needs_size;
};

struct dist_state *states;

Same memory usage, same efficiency, but more readable code. WDYT?

> +       unsigned int split_start_id;
> +       unsigned int split_start_str;
> +       unsigned int diff_id;
> +};
> +
> +/* Check if a member of a split BTF struct/union refers to a base BTF

nit: comments uses "check" terminology, function name uses "find",
while really it "marks" some time as embedded... Let's use consistent
terminology (where mark seems like the most fitting, IMO)

> + * struct/union.  Members can be const/restrict/volatile/typedef
> + * reference types, but if a pointer is encountered, type is no longer
> + * considered embedded.
> + */
> +static int btf_find_embedded_composite_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_distill *dist = ctx;
> +       const struct btf_type *t;
> +       __u32 next_id = *id;
> +
> +       do {
> +               if (next_id == 0)
> +                       return 0;
> +               t = btf_type_by_id(dist->pipe.src, next_id);
> +               switch (btf_kind(t)) {
> +               case BTF_KIND_CONST:
> +               case BTF_KIND_RESTRICT:
> +               case BTF_KIND_VOLATILE:
> +               case BTF_KIND_TYPEDEF:
> +               case BTF_KIND_TYPE_TAG:
> +                       next_id = t->type;
> +                       break;
> +               case BTF_KIND_ARRAY: {
> +                       struct btf_array *a = btf_array(t);
> +
> +                       next_id = a->type;
> +                       break;
> +               }
> +               case BTF_KIND_STRUCT:
> +               case BTF_KIND_UNION:
> +                       dist->ids[next_id] |= BTF_NEEDS_SIZE;
> +                       return 0;
> +               default:
> +                       return 0;
> +               }
> +
> +       } while (1);

nit: while (true)

> +
> +       return 0;
> +}
> +
> +/* Check if composite type has a duplicate-named type; if it does, retain

see above about check vs mark, here at least the function name uses "mark" :)

> + * size information to help guide later relocation towards the correct type.
> + * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
> + * one is size 112, the other 16.  Having size information allows relocation
> + * to choose the right one.

re: this dma_chan and similar cases. Is it ever a problem where a
module actually needs two dma_chan in distilled base BTF? Name
conflicts do happen, but intuitively I'd expect this to happen between
some vmlinux-internal (so to speak, i.e., not the type used in
exported functions/types) and kernel module-specific types. It would
be awkward for the same module to use two different types that are
named the same.

Have you seen this as a problem in practice? What if for now we just
error out if there are two conflicting types required in distilled
BTF?

> + */
> +static int btf_mark_composite_dups(struct btf_distill *dist, __u32 id)
> +{
> +       __u8 *cnt = calloc(dist->split_start_str, sizeof(__u8));

nit: we generally follow that initialization of variable shouldn't be
doing non-trivial operations. So let's do calloc() as a separate
statement outside of variable declaration.

> +       struct btf_type *t;
> +       int i;
> +
> +       if (!cnt)
> +               return -ENOMEM;
> +
> +       /* First pass; collect name counts for composite types. */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               if (!btf_is_composite(t) || !t->name_off)
> +                       continue;
> +               if (cnt[t->name_off] < 255)
> +                       cnt[t->name_off]++;
> +       }
> +       /* Second pass; mark composite types with multiple instances of the
> +        * same name as needing size information.
> +        */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               /* id not needed or is already preserving size information */
> +               if (!dist->ids[i] || (dist->ids[i] & BTF_NEEDS_SIZE))
> +                       continue;
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               if (!btf_is_composite(t) || !t->name_off)
> +                       continue;
> +               if (cnt[t->name_off] > 1)
> +                       dist->ids[i] |= BTF_NEEDS_SIZE;
> +       }
> +       free(cnt);
> +
> +       return 0;
> +}
> +
> +static bool btf_is_eligible_named_fwd(const struct btf_type *t)
> +{
> +       return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
> +}
> +
> +static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
> +{
> +       struct btf_distill *dist = ctx;
> +       struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
> +       int err;
> +
> +       if (!*id)
> +               return 0;
> +       /* split BTF id, not needed */
> +       if (*id >= dist->split_start_id)
> +               return 0;
> +       /* already added ? */
> +       if (BTF_ID(dist->ids[*id]) > 0)
> +               return 0;
> +
> +       /* only a subset of base BTF types should be referenced from split
> +        * BTF; ensure nothing unexpected is referenced.
> +        */
> +       switch (btf_kind(t)) {
> +       case BTF_KIND_INT:
> +       case BTF_KIND_FLOAT:
> +       case BTF_KIND_FWD:
> +       case BTF_KIND_ARRAY:
> +       case BTF_KIND_STRUCT:
> +       case BTF_KIND_UNION:
> +       case BTF_KIND_TYPEDEF:
> +       case BTF_KIND_ENUM:
> +       case BTF_KIND_ENUM64:
> +       case BTF_KIND_PTR:
> +       case BTF_KIND_CONST:
> +       case BTF_KIND_RESTRICT:
> +       case BTF_KIND_VOLATILE:
> +       case BTF_KIND_FUNC_PROTO:
> +       case BTF_KIND_TYPE_TAG:
> +               dist->ids[*id] |= *id;
> +               break;
> +       default:
> +               pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
> +                       *id, btf_kind(t));
> +               return -EINVAL;
> +       }
> +
> +       /* struct/union members not needed, except for anonymous structs
> +        * and unions, which we need since name won't help us determine
> +        * matches; so if a named struct/union, no need to recurse
> +        * into members.
> +        */

is this an outdated comment? we shouldn't have anonymous types in the
distilled base, right?

> +       if (btf_is_eligible_named_fwd(t))
> +               return 0;
> +
> +       /* ensure references in type are added also. */
> +       err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
> +       if (err < 0)
> +               return err;
> +       return 0;

nit: could be just `return btf_type_visit_type_ids(...);`?

> +}
> +
> +static int btf_add_distilled_types(struct btf_distill *dist)
> +{
> +       bool adding_to_base = dist->pipe.dst->start_id == 1;
> +       int id = btf__type_cnt(dist->pipe.dst);
> +       struct btf_type *t;
> +       int i, err = 0;
> +
> +       /* Add types for each of the required references to either distilled
> +        * base or split BTF, depending on type characteristics.
> +        */
> +       for (i = 1; i < dist->split_start_id; i++) {
> +               const char *name;
> +               int kind;
> +
> +               if (!BTF_ID(dist->ids[i]))
> +                       continue;
> +               t = btf_type_by_id(dist->pipe.src, i);
> +               kind = btf_kind(t);
> +               name = btf__name_by_offset(dist->pipe.src, t->name_off);
> +
> +               /* Named int, float, fwd struct, union, enum[64] are added to
> +                * base; everything else is added to split BTF.
> +                */
> +               switch (kind) {
> +               case BTF_KIND_INT:
> +               case BTF_KIND_FLOAT:
> +               case BTF_KIND_FWD:
> +               case BTF_KIND_STRUCT:
> +               case BTF_KIND_UNION:
> +               case BTF_KIND_ENUM:
> +               case BTF_KIND_ENUM64:
> +                       if ((adding_to_base && !t->name_off) || (!adding_to_base && t->name_off))
> +                               continue;
> +                       break;
> +               default:
> +                       if (adding_to_base)
> +                               continue;
> +                       break;
> +               }
> +               if (dist->ids[i] & BTF_NEEDS_SIZE) {
> +                       /* If a named struct/union in base BTF is referenced as a type
> +                        * in split BTF without use of a pointer - i.e. as an embedded
> +                        * struct/union - add an empty struct/union preserving size
> +                        * since size must be consistent when relocating split and
> +                        * possibly changed base BTF.  Similarly, when a struct/union
> +                        * has multiple instances of the same name in the original
> +                        * base BTF, retain size to help relocation later pick the
> +                        * right struct/union.
> +                        */
> +                       err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
> +               } else if (btf_is_eligible_named_fwd(t)) {
> +                       /* If not embedded, use a fwd for named struct/unions since we
> +                        * can match via name without any other details.
> +                        */
> +                       switch (kind) {
> +                       case BTF_KIND_STRUCT:
> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_STRUCT);
> +                               break;
> +                       case BTF_KIND_UNION:
> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_UNION);
> +                               break;
> +                       case BTF_KIND_ENUM:
> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
> +                               break;
> +                       case BTF_KIND_ENUM64:

nit: combine ENUM/ENUM64 cases?

> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
> +                               break;
> +                       default:
> +                               pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
> +                                       btf_kind(t));
> +                               return -EINVAL;
> +                       }
> +               } else {
> +                       err = btf_add_type(&dist->pipe, t);

So this should never happen if adding_to_base == true, is that right?
Can we check this? Or am I missing something as usual?

> +               }
> +               if (err < 0)
> +                       break;
> +               dist->ids[i] = id++;
> +       }
> +       return err;
> +}
> +

[...]

> +       n = btf__type_cnt(new_split);
> +       /* Now update base/split BTF ids. */
> +       for (i = 1; i < n; i++) {
> +               t = btf_type_by_id(new_split, i);
> +
> +               err = btf_type_visit_type_ids(t, btf_update_distilled_type_ids, &dist);
> +               if (err < 0)
> +                       goto err_out;
> +       }
> +       free(dist.ids);
> +       hashmap__free(dist.pipe.str_off_map);
> +       *new_base_btf = new_base;
> +       *new_split_btf = new_split;
> +       return 0;
> +err_out:
> +       free(dist.ids);
> +       if (!IS_ERR(dist.pipe.str_off_map))

you don't need to check this, hashmap__free() works for IS_ERR()
pointers as well

> +               hashmap__free(dist.pipe.str_off_map);
> +       btf__free(new_split);
> +       btf__free(new_base);
> +       return libbpf_err(err);
> +}

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 19:08         ` Eduard Zingerman
@ 2024-05-21 22:01           ` Andrii Nakryiko
  2024-05-21 22:15             ` Eduard Zingerman
  2024-05-22 16:16             ` Alan Maguire
  0 siblings, 2 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 22:01 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, andrii, jolsa, acme, quentin, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > I'm probably leaning towards not doing automatic relocations in
> > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > feature, if we need to teach resolve_btfids and bpftool to do
> > something extra for that case (i.e., call another API for relocation,
> > if necessary), then it's fine, doesn't seems like a problem.
>
> My point is that with current implementation it does not even make
> sense to call btf__parse() for an ELF with distilled base,
> because it would fail.

True (unless application loaded .BTF.base as stand-alone BTF first,
but it's pretty advanced scenario)

>
> And selecting BTF encoding based on a few retries seems like a kludge
> if there is a simple way to check if distilled base has to be used
> (presence of the .BTF.base section).

agreed

>
> > Much worse is having to do some workarounds to prevent an API from
> > doing some undesired extra steps (like in resolve_btfids not wanting a
> > relocation). Orthogonality FTW, IMO.
>
> For resolve_btfids it is a bit different, imo.
> It does want some base: for in-tree modules it wants vmlinux,
> for out-of-tree it wants distilled base.
> So it has to be adjusted either way.

Ok, so I read some more code and re-read your discussion w/ Alan. I
agree with your proposal, I think it's logical (even if relocation
does feel a bit "extra" for "parse"-like API, but ok, whatever).

I see what you are saying about resolve_btfids needing the changes
either way, and that's true. But instead of adding (unnecessary, IMO)
-R argument, resolve_btfids should be able to detect .BTF.base section
presence and infer that this is distilled BTF case, and thus proceed
with ignoring `-B <vmlinux>` argument (we can even complain that `-B
vmlinux` is specified if distilled BTF is used, not sure.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing
  2024-05-17 10:22 ` [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
@ 2024-05-21 22:01   ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 22:01 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Options cover existing parsing scenarios (ELF, raw, retrieving
> .BTF.ext) and also allow specification of the ELF section name
> containing BTF.  This will allow consumers to retrieve BTF from
> .BTF.base sections (BTF_BASE_ELF_SEC) also.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/btf.c      | 49 +++++++++++++++++++++++++++-------------
>  tools/lib/bpf/btf.h      | 31 +++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 65 insertions(+), 16 deletions(-)
>

[...]

> -static struct btf *btf_parse(const char *path, struct btf *base_btf, struct btf_ext **btf_ext)
> +struct btf *btf__parse_opts(const char *path, struct btf_parse_opts *opts)
>  {
> -       struct btf *btf;
> +       struct btf *btf, *base_btf;
> +       const char *btf_sec;
> +       struct btf_ext **btf_ext;
>         int err;
>
> +       if (!OPTS_VALID(opts, btf_parse_opts))
> +               return libbpf_err_ptr(-EINVAL);
> +       base_btf = OPTS_GET(opts, base_btf, NULL);
> +       btf_sec = OPTS_GET(opts, btf_sec, NULL);
> +       btf_ext = OPTS_GET(opts, btf_ext, NULL);
> +
>         if (btf_ext)
>                 *btf_ext = NULL;
> -
> -       btf = btf_parse_raw(path, base_btf);
> +       if (!btf_sec) {
> +               btf = btf_parse_raw(path, base_btf);
> +               err = libbpf_get_error(btf);

IS_ERR/PTR_ERR, btf_parse_raw is internal function, not a public API,
so we shouldn't be using libbpf_get_error() here

> +               if (!err)
> +                       return btf;
> +               if (err != -EPROTO)
> +                       return libbpf_err_ptr(err);
> +       }
> +       btf = btf_parse_elf(path, btf_sec ?: BTF_ELF_SEC, base_btf, btf_ext);
>         err = libbpf_get_error(btf);
> -       if (!err)
> -               return btf;
> -       if (err != -EPROTO)
> -               return ERR_PTR(err);
> -       return btf_parse_elf(path, base_btf, btf_ext);
> +       if (err)
> +               return libbpf_err_ptr(err);
> +       return btf;
>  }

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 22:01           ` Andrii Nakryiko
@ 2024-05-21 22:15             ` Eduard Zingerman
  2024-05-21 22:36               ` Andrii Nakryiko
  2024-05-22 16:16             ` Alan Maguire
  1 sibling, 1 reply; 33+ messages in thread
From: Eduard Zingerman @ 2024-05-21 22:15 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alan Maguire, andrii, jolsa, acme, quentin, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Tue, 2024-05-21 at 15:01 -0700, Andrii Nakryiko wrote:
> On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > 
> > On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
> > 
> > [...]
> > 
> > > I'm probably leaning towards not doing automatic relocations in
> > > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > > feature, if we need to teach resolve_btfids and bpftool to do
> > > something extra for that case (i.e., call another API for relocation,
> > > if necessary), then it's fine, doesn't seems like a problem.
> > 
> > My point is that with current implementation it does not even make
> > sense to call btf__parse() for an ELF with distilled base,
> > because it would fail.
> 
> True (unless application loaded .BTF.base as stand-alone BTF first,
> but it's pretty advanced scenario)

In this scenario .BTF.base would be relocated against .BTF.base,
which is useless but not a failure.
Maybe having the _opts() variant with additional degree of control
(e.g. whether to ignore .BTF.base) is interesting as well.
On the other hand, for such use-cases libbpf provides btf__parse()
that accepts raw binary input, and application can extract ELF
contents by itself.

[...]

> I see what you are saying about resolve_btfids needing the changes
> either way, and that's true. But instead of adding (unnecessary, IMO)
> -R argument, resolve_btfids should be able to detect .BTF.base section
> presence and infer that this is distilled BTF case, and thus proceed
> with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> vmlinux` is specified if distilled BTF is used, not sure.

+1 for complaining about -B vmlinux when .BTF.base should be used.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation
  2024-05-17 10:22 ` [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation Alan Maguire
@ 2024-05-21 22:34   ` Andrii Nakryiko
  2024-05-23  1:06   ` Kui-Feng Lee
  1 sibling, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 22:34 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Map distilled base BTF type ids referenced in split BTF and their
> references to the base BTF passed in, and if the mapping succeeds,
> reparent the split BTF to the base BTF.
>
> Relocation is done by first verifying that distilled base BTF
> only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
> UNION kinds; then we sort these to speed lookups.  Once sorted,
> the base BTF is iterated, and for each relevant kind we check
> for an equivalent in distilled base BTF.  When found, the
> mapping from distilled -> base BTF id and string offset is recorded.
>
> Once all mappings are established, we can update type ids
> and string offsets in split BTF and reparent it to the new base.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  tools/lib/bpf/Build             |   2 +-
>  tools/lib/bpf/btf.c             |  17 ++
>  tools/lib/bpf/btf.h             |   8 +
>  tools/lib/bpf/btf_relocate.c    | 318 ++++++++++++++++++++++++++++++++
>  tools/lib/bpf/libbpf.map        |   1 +
>  tools/lib/bpf/libbpf_internal.h |   3 +
>  6 files changed, 348 insertions(+), 1 deletion(-)
>  create mode 100644 tools/lib/bpf/btf_relocate.c

[...]

>  LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
>
> +/**
> + * @brief **btf__relocate()** will check the split BTF *btf* for references
> + * to base BTF kinds, and verify those references are compatible with
> + * *base_btf*; if they are, *btf* is adjusted such that is re-parented to
> + * *base_btf* and type ids and strings are adjusted to accommodate this.
> + */

add boilerplate regarding return results?..

> +LIBBPF_API int btf__relocate(struct btf *btf, const struct btf *base_btf);
> +
>  struct btf_dump;
>
>  struct btf_dump_opts {
> diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c
> new file mode 100644
> index 000000000000..c06851f05472
> --- /dev/null
> +++ b/tools/lib/bpf/btf_relocate.c
> @@ -0,0 +1,318 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024, Oracle and/or its affiliates. */
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
> +
> +#include "btf.h"
> +#include "bpf.h"
> +#include "libbpf.h"
> +#include "libbpf_internal.h"
> +
> +struct btf;
> +
> +struct btf_relocate {
> +       __u32 search_id;                                /* must be first field; see search below */

just put that comment before the field, why this horizontal placement?

[...]

> +
> +/* Comparison between base BTF type (search type) and distilled base types (target).
> + * Because there is no bsearch_r() we need to use the search key - which also is
> + * the first element of struct btf_relocate * - as a means to retrieve the
> + * struct btf_relocate *.
> + */
> +static int cmp_base_and_distilled_btf_types(const void *idbase, const void *iddist)
> +{
> +       struct btf_relocate *r = (struct btf_relocate *)idbase;
> +       const struct btf_type *tbase = btf_type_by_id(r->base_btf, *(__u32 *)idbase);
> +       const struct btf_type *tdist = btf_type_by_id(r->dist_base_btf, *(__u32 *)iddist);

boo, id_base or base_id, id_dist or dist_id, we went through such
naming already, I believe :)

I'd also use base_t and dist_t, like you do below with dist_t already

> +
> +       return strcmp(btf__name_by_offset(r->base_btf, tbase->name_off),
> +                     btf__name_by_offset(r->dist_base_btf, tdist->name_off));
> +}
> +
> +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
> + * through base BTF looking up distilled type (using binary search) equivalents.
> + */
> +static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> +{
> +       struct btf_type *t;
> +       const char *name;
> +       __u32 id;
> +
> +       /* generate a sort index array of type ids sorted by name for distilled
> +        * base BTF to speed lookups.
> +        */
> +       for (id = 1; id < r->nr_dist_base_types; id++)
> +               r->dist_base_index[id] = id;
> +       qsort_r(r->dist_base_index, r->nr_dist_base_types, sizeof(__u32), cmp_btf_types,
> +               (struct btf *)r->dist_base_btf);

Is qsort_r() supported in musl and in Android'd libc implementation?
I'd rather not have to scramble to fix the build for them after
release.

> +

[...]

> +               r->search_id = id;
> +               dist_id = bsearch(&r->search_id, r->dist_base_index, r->nr_dist_base_types,
> +                                 sizeof(__u32), cmp_base_and_distilled_btf_types);
> +               if (!dist_id)
> +                       continue;
> +               if (!*dist_id || *dist_id > r->nr_dist_base_types) {

>=

> +                       pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
> +                               id, *dist_id);
> +                       return -EINVAL;
> +               }
> +               /* validate that kinds are compatible */
> +               dist_t = btf_type_by_id(r->dist_base_btf, *dist_id);
> +               dist_kind = btf_kind(dist_t);
> +               name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
> +               compat_kind = dist_kind == kind;
> +               if (!compat_kind) {
> +                       switch (dist_kind) {
> +                       case BTF_KIND_FWD:
> +                               compat_kind = kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;

well, not quite. If we have FWD with kflag, then we should match it to
BTF_KIND_UNION, and otherwise to STRUCT. We shouldn't fix them.

also do we match FWD in *base BTF* with FWD in *distilled base BTF*?
That seems a bit wrong, no?


> +                               break;
> +                       case BTF_KIND_ENUM:
> +                               compat_kind = kind == BTF_KIND_ENUM64;
> +                               break;
> +                       default:
> +                               break;
> +                       }
> +                       if (!compat_kind) {
> +                               pr_warn("kind incompatibility (%d != %d) between distilled base type '%s'[%d] and base type [%d]\n",
> +                                       dist_kind, kind, name, *dist_id, id);
> +                               return -EINVAL;
> +                       }
> +               }

umm, what if we are !compat_kind here? go to next or error out, but
there has to be a check


> +               /* validate that int, float struct, union sizes are compatible;
> +                * distilled base BTF encodes an empty STRUCT/UNION with
> +                * specific size for cases where a type is embedded in a split
> +                * type (so has to preserve size info).  Do not error out
> +                * on mismatch as another size match may occur for an
> +                * identically-named type.
> +                */
> +               switch (btf_kind(dist_t)) {
> +               case BTF_KIND_INT:
> +                       if (*(__u32 *)(t + 1) != *(__u32 *)(dist_t + 1))
> +                               continue;
> +                       if (t->size != dist_t->size)
> +                               continue;
> +                       break;
> +               case BTF_KIND_FLOAT:
> +               case BTF_KIND_STRUCT:
> +               case BTF_KIND_UNION:
> +                       if (t->size != dist_t->size)
> +                               continue;
> +                       break;
> +               default:
> +                       break;
> +               }

I don't know, I feel like all these compatibility checks would be
cleaner to handle as part of single switch based on btf_kind(dist_t).
This split between that big if and switch is error-prone and hard to
follow

> +               /* map id and name */
> +               r->map[*dist_id] = id;
> +               r->str_map[dist_t->name_off] = t->name_off;
> +       }
> +       /* ensure all distilled BTF ids have a mapping... */
> +       for (id = 1; id < r->nr_dist_base_types; id++) {
> +               if (r->map[id])
> +                       continue;
> +               t = btf_type_by_id(r->dist_base_btf, id);
> +               name = btf__name_by_offset(r->dist_base_btf, t->name_off);
> +               pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
> +                       name, id);
> +               return -EINVAL;
> +       }
> +       return 0;
> +}
> +

[...]

> +static int btf_rewrite_strs(__u32 *str_off, void *ctx)
> +{
> +       struct btf_relocate *r = ctx;
> +       int off;
> +
> +       if (!*str_off)
> +               return 0;
> +       if (*str_off >= r->str_start) {
> +               *str_off += r->str_diff;
> +       } else {
> +               off = r->str_map[*str_off];
> +               if (!off) {
> +                       pr_warn("string '%s' [offset %d] is not mapped to base BTF",
> +                               btf__str_by_offset(r->btf, off), *str_off);

str_off is __u32, but you are using %d

> +                       return -ENOENT;
> +               }
> +               *str_off = off;
> +       }
> +       return 0;
> +}
> +
> +static int btf_relocate_finalize(struct btf_relocate *r)
> +{
> +       const struct btf_header *dist_base_hdr;
> +       const struct btf_header *base_hdr;
> +       struct btf_type *t;
> +       int i, err;
> +
> +       dist_base_hdr = btf_header(r->dist_base_btf);
> +       base_hdr = btf_header(r->base_btf);
> +       r->str_start = dist_base_hdr->str_len;
> +       r->str_diff = base_hdr->str_len - dist_base_hdr->str_len;

it's subjective, but I find str_diff a bit harder to follow compared
to just storing str_old_start and str_new_start, and then doing
obvious translation

str_off = str_off - str_old_start + str_new_start;

This is obvious and will work for any condition, whether old_start is
smaller or bigger than new_start. Same idea for ID translation.

Not a big deal, but I thought I'd call this out.

> +       for (i = 0; i < r->nr_split_types; i++) {
> +               t = btf_type_by_id(r->btf, i + r->nr_dist_base_types);
> +               err = btf_type_visit_str_offs(t, btf_rewrite_strs, r);
> +               if (err)
> +                       break;

return err? Why do we want to set btf_set_base_btf() in case of an error?

> +       }
> +       btf_set_base_btf(r->btf, r->base_btf);
> +
> +       return err;
> +}
> +
> +/* If successful, output of relocation is updated BTF with base BTF pointing
> + * at base_btf, and type ids, strings adjusted accordingly
> + */
> +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids)
> +{
> +       unsigned int nr_types = btf__type_cnt(btf);
> +       struct btf_relocate r = {};
> +       struct btf_type *t;
> +       int diff_id, err = 0;
> +       __u32 id, i;
> +
> +       r.dist_base_btf = btf__base_btf(btf);
> +       if (!base_btf || r.dist_base_btf == base_btf)
> +               return 0;

Why is this not an error condition? Users shouldn't be calling
relocate on something that shouldn't be relocated.

> +
> +       r.nr_dist_base_types = btf__type_cnt(r.dist_base_btf);
> +       r.nr_base_types = btf__type_cnt(base_btf);
> +       r.nr_split_types = nr_types - r.nr_dist_base_types;
> +       r.btf = btf;
> +       r.base_btf = base_btf;
> +
> +       r.map = calloc(nr_types, sizeof(*r.map));

Is this an ID map? Then maybe call it id_map to be symmetrical to str_map?

> +       r.str_map = calloc(btf_header(r.dist_base_btf)->str_len, sizeof(*r.str_map));
> +       r.dist_base_index = calloc(r.nr_dist_base_types, sizeof(*r.dist_base_index));
> +       if (!r.map || !r.str_map || !r.dist_base_index) {
> +               err = -ENOMEM;
> +               goto err_out;
> +       }
> +
> +       err = btf_relocate_validate_distilled_base(&r);
> +       if (err)
> +               goto err_out;
> +
> +       diff_id = r.nr_base_types - r.nr_dist_base_types;
> +       /* Split BTF ids will start from after last base BTF id. */
> +       for (id = r.nr_dist_base_types; id < nr_types; id++)
> +               r.map[id] = id + diff_id;
> +
> +       /* Build a map from distilled base ids to actual base BTF ids; it is used
> +        * to update split BTF id references.
> +        */
> +       err = btf_relocate_map_distilled_base(&r);
> +       if (err)
> +               goto err_out;
> +
> +       /* Next, rewrite type ids in split BTF, replacing split ids with updated
> +        * ids based on number of types in base BTF, and base ids with
> +        * relocated ids from base_btf.
> +        */
> +       for (i = 0, id = r.nr_dist_base_types; i < r.nr_split_types; i++, id++) {
> +               t = btf_type_by_id(btf, id);
> +               err = btf_type_visit_type_ids(t, btf_relocate_rewrite_type_id, &r);
> +               if (err)
> +                       goto err_out;
> +       }
> +       /* Finally reset base BTF to base_btf; as part of this operation, string
> +        * offsets are also updated, and we are done.
> +        */
> +       err = btf_relocate_finalize(&r);
> +err_out:
> +       if (!err && map_ids)
> +               *map_ids = r.map;
> +       else
> +               free(r.map);

this is a bit convoluted. maybe something like


    err = btf_relocate_finalize(&r);
    if (err)
        goto err_out;

    if (map_ids) {
        *map_ids = r.map;
        r.map = NULL;
    }

err_out:
    ... all the free()s unconditionally ...


(even just doing only error case for err_out and duplicating a few
free()'s in success path seems nicer)

> +       free(r.str_map);
> +       free(r.dist_base_index);
> +       return err;
> +}

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 22:15             ` Eduard Zingerman
@ 2024-05-21 22:36               ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 22:36 UTC (permalink / raw)
  To: Eduard Zingerman
  Cc: Alan Maguire, andrii, jolsa, acme, quentin, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Tue, May 21, 2024 at 3:15 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-05-21 at 15:01 -0700, Andrii Nakryiko wrote:
> > On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
> > >
> > > [...]
> > >
> > > > I'm probably leaning towards not doing automatic relocations in
> > > > btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
> > > > feature, if we need to teach resolve_btfids and bpftool to do
> > > > something extra for that case (i.e., call another API for relocation,
> > > > if necessary), then it's fine, doesn't seems like a problem.
> > >
> > > My point is that with current implementation it does not even make
> > > sense to call btf__parse() for an ELF with distilled base,
> > > because it would fail.
> >
> > True (unless application loaded .BTF.base as stand-alone BTF first,
> > but it's pretty advanced scenario)
>
> In this scenario .BTF.base would be relocated against .BTF.base,
> which is useless but not a failure.
> Maybe having the _opts() variant with additional degree of control
> (e.g. whether to ignore .BTF.base) is interesting as well.
> On the other hand, for such use-cases libbpf provides btf__parse()
> that accepts raw binary input, and application can extract ELF
> contents by itself.

I was just being pedantic :) We can always add a new _opts() variant
later, if necessary. I don't think this scenario is going to be a real
scenario, so no need to worry about it too much.

>
> [...]
>
> > I see what you are saying about resolve_btfids needing the changes
> > either way, and that's true. But instead of adding (unnecessary, IMO)
> > -R argument, resolve_btfids should be able to detect .BTF.base section
> > presence and infer that this is distilled BTF case, and thus proceed
> > with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> > vmlinux` is specified if distilled BTF is used, not sure.
>
> +1 for complaining about -B vmlinux when .BTF.base should be used.

ok, let's

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel
  2024-05-17 10:22 ` [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
@ 2024-05-21 22:59   ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-21 22:59 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Fri, May 17, 2024 at 3:24 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Share relocation implementation with the kernel.  As part of this,
> we also need the type/string visitation functions so add them to a
> btf_common.c file that also gets shared with the kernel. Relocation
> code in kernel and userspace is identical save for the impementation

typo: implementation

> of the reparenting of split BTF to the relocated base BTF and
> retrieval of BTF header from "struct btf"; these small functions
> need separate user-space and kernel implementations.
>
> One other wrinkle on the kernel side is we have to map .BTF.ids in
> modules as they were generated with the type ids used at BTF encoding
> time. btf_relocate() optionally returns an array mapping from old BTF
> ids to relocated ids, so we use that to fix up these references where
> needed for kfuncs.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  include/linux/btf.h          |  45 ++++++++++
>  kernel/bpf/Makefile          |   8 ++
>  kernel/bpf/btf.c             | 166 +++++++++++++++++++++++++----------
>  tools/lib/bpf/Build          |   2 +-
>  tools/lib/bpf/btf.c          | 130 ---------------------------
>  tools/lib/bpf/btf_common.c   | 143 ++++++++++++++++++++++++++++++

not a big fan of "btf_common" name, it tells nothing about what that
is about. Thinking a bit ahead, we are going to replace all those
callback-calling visitor helpers with iterators soon, so maybe we can
call this btf_iter.c (or at least btf_utils.c) for a more meaningful
name?

>  tools/lib/bpf/btf_relocate.c |  23 +++++
>  7 files changed, 341 insertions(+), 176 deletions(-)
>  create mode 100644 tools/lib/bpf/btf_common.c
>

[...]

>  #ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
> +void btf_set_base_btf(struct btf *btf, struct btf *base_btf);
> +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids);
> +int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx);
> +int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> +const char *btf_str_by_offset(const struct btf *btf, u32 offset);
>  struct btf *btf_parse_vmlinux(void);
>  struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog);
>  u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id,
> @@ -543,6 +564,30 @@ static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>  {
>         return NULL;
>  }
> +
> +static inline void btf_set_base_btf(struct btf *btf, struct btf *base_btf)
> +{
> +       return;
> +}
> +
> +static inline int btf_relocate(void *log, struct btf *btf, const struct btf *base_btf,
> +                              __u32 **map_ids)
> +{
> +       return 0;

-EOPNOTSUPP?

> +}
> +
> +static inline int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit,
> +                                         void *ctx)
> +{
> +       return 0;

ditto

> +}
> +
> +static inline int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit,
> +                                         void *ctx)
> +{
> +       return 0;

ditto

> +}
> +
>  static inline const char *btf_name_by_offset(const struct btf *btf,
>                                              u32 offset)
>  {
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 7eb9ad3a3ae6..612eef1228ca 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -52,3 +52,11 @@ obj-$(CONFIG_BPF_PRELOAD) += preload/
>  obj-$(CONFIG_BPF_SYSCALL) += relo_core.o
>  $(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE
>         $(call if_changed_rule,cc_o_c)
> +
> +obj-$(CONFIG_BPF_SYSCALL) += btf_common.o
> +$(obj)/btf_common.o: $(srctree)/tools/lib/bpf/btf_common.c FORCE
> +       $(call if_changed_rule,cc_o_c)
> +
> +obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o
> +$(obj)/btf_relocate.o: $(srctree)/tools/lib/bpf/btf_relocate.c FORCE
> +       $(call if_changed_rule,cc_o_c)

I believe make should allow us to do this with one rule, see all those
magical % rules

> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 821063660d9f..ebc127da4d79 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -274,6 +274,7 @@ struct btf {
>         u32 start_str_off; /* first string offset (0 for base BTF) */
>         char name[MODULE_NAME_LEN];
>         bool kernel_btf;
> +       __u32 *base_map; /* map from distilled base BTF -> vmlinux BTF ids */

please point out that it's an ID map in the name, so base_id_map. map
by itself is confusing.

>  };
>
>  enum verifier_phase {
> @@ -530,6 +531,11 @@ static bool btf_type_is_decl_tag_target(const struct btf_type *t)
>                btf_type_is_var(t) || btf_type_is_typedef(t);
>  }
>
> +static bool btf_is_vmlinux(const struct btf *btf)
> +{
> +       return btf->kernel_btf && !btf->base_btf;

there is actually a helper like this somewhere in the kernel. I can't
recall the name, but it checks the name ("vmlinux"), it would be nice
to avoid duplication of logic

> +}
> +
>  u32 btf_nr_types(const struct btf *btf)
>  {
>         u32 total = 0;
> @@ -772,7 +778,7 @@ static bool __btf_name_char_ok(char c, bool first)
>         return true;
>  }
>

[...]

> +struct btf *btf_parse_vmlinux(void)
> +{
> +       struct btf_verifier_env *env = NULL;
> +       struct bpf_verifier_log *log;
> +       struct btf *btf;
> +       int err;
> +
> +       env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
> +       if (!env)
> +               return ERR_PTR(-ENOMEM);
> +
> +       log = &env->log;
> +       log->level = BPF_LOG_KERNEL;
> +       btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF);
> +       if (!IS_ERR(btf)) {

nit: let's keep success case logic linear, instead of nesting it. It's
better to have a few goto err_out for error condition, but have a
linear unnested steps for success

> +               /* btf_parse_vmlinux() runs under bpf_verifier_lock */
> +               bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
> +               err = btf_alloc_id(btf);
> +               if (err) {
> +                       btf_free(btf);
> +                       btf = ERR_PTR(err);
> +               }
> +       }
> +       btf_verifier_env_free(env);
> +       return btf;
> +}
> +
>  #ifdef CONFIG_DEBUG_INFO_BTF_MODULES
>
> -static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size)
> +/* If .BTF_ids section was created with distilled base BTF, both base and
> + * split BTF ids will need to be mapped to actual base/split ids for
> + * BTF now that it has been relocated.
> + */
> +static __u32 btf_id_map(const struct btf *btf, __u32 id)

... and this should be named "btf_map_id" (prefix + verb + subject),
IMO (or even better btf_remap_id or btf_relocate_id, actually)

>  {
> +       if (!btf->base_btf || !btf->base_map)
> +               return id;
> +       return btf->base_map[id];
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 11/11] bpftool: support displaying relocated-with-base split BTF
  2024-05-17 10:22 ` [PATCH v4 bpf-next 11/11] bpftool: support displaying relocated-with-base split BTF Alan Maguire
@ 2024-05-22  9:04   ` Quentin Monnet
  0 siblings, 0 replies; 33+ messages in thread
From: Quentin Monnet @ 2024-05-22  9:04 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan

Hi Alan, I see in your cover letter that you tried to address the nits
from my review on v3 for patch 11, but it seems the changes got lost at
some point, please double-check.


2024-05-17 11:23 UTC+0100 ~ Alan Maguire <alan.maguire@oracle.com>
> If the -R <base_btf> option is used, we can display BTF that has been
> generated with distilled base BTF in its relocated form.  For example
> for bpf_testmod.ko (which is built as an out-of-tree module, so has
> a distilled .BTF.base section:
> 
> bpftool btf dump file bpf_testmod.ko
> 
> Alternatively, we can display content relocated with
> (a possibly changed) base BTF via
> 
> bpftool btf dump -R /sys/kernel/btf/vmlinux bpf_testmod.ko
> 
> The latter mirrors how the kernel will handle such split
> BTF; it relocates its representation with the running
> kernel, and if successful, renumbers BTF ids to reference
> the current vmlinux BTF.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Reviewed-by: Quentin Monnet <qmo@kernel.org>
> ---
>  tools/bpf/bpftool/Documentation/bpftool-btf.rst | 15 ++++++++++++++-
>  tools/bpf/bpftool/bash-completion/bpftool       |  7 ++++---
>  tools/bpf/bpftool/btf.c                         | 11 ++++++++++-
>  tools/bpf/bpftool/main.c                        | 14 +++++++++++++-
>  tools/bpf/bpftool/main.h                        |  2 ++
>  5 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
> index 3f6bca03ad2e..b11abebeae81 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
> @@ -16,7 +16,7 @@ SYNOPSIS
>  
>  **bpftool** [*OPTIONS*] **btf** *COMMAND*
>  
> -*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } }
> +*OPTIONS* := { |COMMON_OPTIONS| | { **-B** | **--base-btf** } { **-R** | **relocate-base-btf** } }

Please add the missing double-dash on --relocate-base-btf

[...]

> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> index 08d0ac543c67..69d4906bec5c 100644
> --- a/tools/bpf/bpftool/main.c
> +++ b/tools/bpf/bpftool/main.c
> @@ -32,6 +32,8 @@ bool verifier_logs;
>  bool relaxed_maps;
>  bool use_loader;
>  struct btf *base_btf;
> +struct btf *relocate_base_btf;
> +const char *relocate_base_btf_path;
>  struct hashmap *refs_table;
>  
>  static void __noreturn clean_and_exit(int i)
> @@ -448,6 +450,7 @@ int main(int argc, char **argv)
>  		{ "debug",	no_argument,	NULL,	'd' },
>  		{ "use-loader",	no_argument,	NULL,	'L' },
>  		{ "base-btf",	required_argument, NULL, 'B' },
> +		{ "relocate-base-btf", required_argument, NULL, 'R' },

And please use tabs here.

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF
  2024-05-21 22:01           ` Andrii Nakryiko
  2024-05-21 22:15             ` Eduard Zingerman
@ 2024-05-22 16:16             ` Alan Maguire
  1 sibling, 0 replies; 33+ messages in thread
From: Alan Maguire @ 2024-05-22 16:16 UTC (permalink / raw)
  To: Andrii Nakryiko, Eduard Zingerman
  Cc: andrii, jolsa, acme, quentin, mykolal, ast, daniel, martin.lau,
	song, yonghong.song, john.fastabend, kpsingh, sdf, haoluo,
	houtao1, bpf, masahiroy, mcgrof, nathan

On 21/05/2024 23:01, Andrii Nakryiko wrote:
> On Tue, May 21, 2024 at 12:08 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>>
>> On Tue, 2024-05-21 at 11:54 -0700, Andrii Nakryiko wrote:
>>
>> [...]
>>
>>> I'm probably leaning towards not doing automatic relocations in
>>> btf__parse(), tbh. Distilled BTF is a rather special kernel-specific
>>> feature, if we need to teach resolve_btfids and bpftool to do
>>> something extra for that case (i.e., call another API for relocation,
>>> if necessary), then it's fine, doesn't seems like a problem.
>>
>> My point is that with current implementation it does not even make
>> sense to call btf__parse() for an ELF with distilled base,
>> because it would fail.
> 
> True (unless application loaded .BTF.base as stand-alone BTF first,
> but it's pretty advanced scenario)
> 
>>
>> And selecting BTF encoding based on a few retries seems like a kludge
>> if there is a simple way to check if distilled base has to be used
>> (presence of the .BTF.base section).
> 
> agreed
> 
>>
>>> Much worse is having to do some workarounds to prevent an API from
>>> doing some undesired extra steps (like in resolve_btfids not wanting a
>>> relocation). Orthogonality FTW, IMO.
>>
>> For resolve_btfids it is a bit different, imo.
>> It does want some base: for in-tree modules it wants vmlinux,
>> for out-of-tree it wants distilled base.
>> So it has to be adjusted either way.
> 
> Ok, so I read some more code and re-read your discussion w/ Alan. I
> agree with your proposal, I think it's logical (even if relocation
> does feel a bit "extra" for "parse"-like API, but ok, whatever).
> 
> I see what you are saying about resolve_btfids needing the changes
> either way, and that's true. But instead of adding (unnecessary, IMO)
> -R argument, resolve_btfids should be able to detect .BTF.base section
> presence and infer that this is distilled BTF case, and thus proceed
> with ignoring `-B <vmlinux>` argument (we can even complain that `-B
> vmlinux` is specified if distilled BTF is used, not sure.

That's a good idea; we already have ELF parsing in resolve_btfids, so as
we can detect the .BTF.base presence there easily, no need for a
parameter. I put together a quick proof-of-concept and it works well.

This simplifies things considerably; we can eliminate the bpftool
patches completely since ELF parsing does the right thing already, and
the resolve_btfids change is small, with no user interface-visible
changes there.

Thanks!

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
  2024-05-21 21:48   ` Andrii Nakryiko
@ 2024-05-22 16:42     ` Alan Maguire
  2024-05-22 16:57       ` Andrii Nakryiko
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Maguire @ 2024-05-22 16:42 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On 21/05/2024 22:48, Andrii Nakryiko wrote:
> On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> To support more robust split BTF, adding supplemental context for the
>> base BTF type ids that split BTF refers to is required.  Without such
>> references, a simple shuffling of base BTF type ids (without any other
>> significant change) invalidates the split BTF.  Here the attempt is made
>> to store additional context to make split BTF more robust.
>>
>> This context comes in the form of distilled base BTF providing minimal
>> information (name and - in some cases - size) for base INTs, FLOATs,
>> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
>> points at that base and contains any additional types needed (such as
>> TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
>> information constitutes the minimal BTF representation needed to
>> disambiguate or remove split BTF references to base BTF.  The rules
>> are as follows:
>>
>> - INT, FLOAT are recorded in full.
>> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
>>   will be encoded either as a zero-member sized STRUCT/UNION (preserving
>>   size for later relocation checks) or as a named FWD.  Only base BTF
>>   STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
>>   that have multiple STRUCT/UNION instances of the same name need to
>>   preserve size information, so a FWD representation will be used in
>>   most cases.
>> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
>>   with no values) is used.
>> - in all other cases, the type is added to the new split BTF.
>>
>> Avoiding struct/union/enum/enum64 expansion is important to keep the
>> distilled base BTF representation to a minimum size.
>>
>> When successful, new representations of the distilled base BTF and new
>> split BTF that refers to it are returned.  Both need to be freed by the
>> caller.
>>
>> So to take a simple example, with split BTF with a type referring
>> to "struct sk_buff", we will generate distilled base BTF with a
>> FWD struct sk_buff, and the split BTF will refer to it instead.
>>
>> Tools like pahole can utilize such split BTF to populate the .BTF
>> section (split BTF) and an additional .BTF.base section.  Then
>> when the split BTF is loaded, the distilled base BTF can be used
>> to relocate split BTF to reference the current (and possibly changed)
>> base BTF.
>>
>> So for example if "struct sk_buff" was id 502 when the split BTF was
>> originally generated,  we can use the distilled base BTF to see that
>> id 502 refers to a "struct sk_buff" and replace instances of id 502
>> with the current (relocated) base BTF sk_buff type id.
>>
>> Distilled base BTF is small; when building a kernel with all modules
>> using distilled base BTF as a test, ovreall module size grew by only
> 
> typo: overall
> 
>> 5.3Mb total across ~2700 modules.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>>  tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
>>  tools/lib/bpf/btf.h      |  20 ++
>>  tools/lib/bpf/libbpf.map |   1 +
>>  3 files changed, 424 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
>> index 2d0840ef599a..953929d196c3 100644
>> --- a/tools/lib/bpf/btf.c
>> +++ b/tools/lib/bpf/btf.c
>> @@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
>>         return 0;
>>  }
>>
>> -int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
>> +static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
>>  {
>> -       struct btf_pipe p = { .src = src_btf, .dst = btf };
>>         struct btf_type *t;
>>         int sz, err;
>>
>> @@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
>>                 return libbpf_err(sz);
>>
>>         /* deconstruct BTF, if necessary, and invalidate raw_data */
>> -       if (btf_ensure_modifiable(btf))
>> +       if (btf_ensure_modifiable(p->dst))
>>                 return libbpf_err(-ENOMEM);
>>
>> -       t = btf_add_type_mem(btf, sz);
>> +       t = btf_add_type_mem(p->dst, sz);
>>         if (!t)
>>                 return libbpf_err(-ENOMEM);
>>
>>         memcpy(t, src_type, sz);
>>
>> -       err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
>> +       err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
>>         if (err)
>>                 return libbpf_err(err);
>>
>> -       return btf_commit_type(btf, sz);
>> +       return btf_commit_type(p->dst, sz);
>> +}
>> +
>> +int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
>> +{
>> +       struct btf_pipe p = { .src = src_btf, .dst = btf };
>> +
>> +       return btf_add_type(&p, src_type);
>>  }
>>
>>  static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
>> @@ -5212,3 +5218,394 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>>
>>         return 0;
>>  }
>> +
>> +#define BTF_NEEDS_SIZE (1 << 31)       /* flag set if either struct/union is
>> +                                        * embedded - and thus size info must
>> +                                        * be preserved - or if there are
>> +                                        * multiple instances of the same
>> +                                        * struct/union - where size can be
>> +                                        * used to clarify which is wanted.
>> +                                        */
> 
> please use full line comment in front of #define
>

yep, will do.

>> +#define BTF_ID(id)             (id & ~BTF_NEEDS_SIZE)
>> +
>> +struct btf_distill {
>> +       struct btf_pipe pipe;
>> +       int *ids;
> 
> reading the rest of the code, this BTF_NEEDS_SIZE and BTF_ID() macro
> use was quite distracting. I'm wondering if it would be better to use
> a simple struct with bitfields here, e.g.,
> 
> struct type_state {
>     int id: 31;
>     bool needs_size;
> };
> 
> struct dist_state *states;
> 
> Same memory usage, same efficiency, but more readable code. WDYT?
>

Yeah, I saw Eduard used that approach elsewhere, it's much neater.
However in other discussion with Eduard we talked about moving the
embedded/duplicate validation to relocation time. The motivation for
that is that if we record sizes for all STRUCTs and UNIONs, we can then
be selective about size checks at relocation time.  This is particularly
relevant for the duplicate name case since it's possible a name either
is no longer duplicate in the new vmlinux, or in the new vmlinux has a
duplicate where the original vmlinux did not. In other words it's the
"new world" of the vmlinux we're trying to relocate with that we're
really concerned with, so preserving all size information until we see
that new vmlinux and can spot duplicates and embedded types where the
size checks need to be enforced makes sense I think.

In that context, we mark embedded types prior to assigning mappings, and
don't mark duplicate names at all in the map since it is duplicates in
the new vmlinux we're interested in. So the way I'd approached it is to
have a BTF_IS_EMBEDDED (casting -1) value that the map values would set
when they found a split BTF struct/union with an embedded base BTF type.
That later gets overwritten when we do the map assignments so it never
gets exposed to the user and we can still return a __u32 array of BTF
ids to the caller.
>> +       unsigned int split_start_id;
>> +       unsigned int split_start_str;
>> +       unsigned int diff_id;
>> +};
>> +
>> +/* Check if a member of a split BTF struct/union refers to a base BTF
> 
> nit: comments uses "check" terminology, function name uses "find",
> while really it "marks" some time as embedded... Let's use consistent
> terminology (where mark seems like the most fitting, IMO)
> 

Sure, I'll try and be more consistent around naming here.

>> + * struct/union.  Members can be const/restrict/volatile/typedef
>> + * reference types, but if a pointer is encountered, type is no longer
>> + * considered embedded.
>> + */
>> +static int btf_find_embedded_composite_type_ids(__u32 *id, void *ctx)
>> +{
>> +       struct btf_distill *dist = ctx;
>> +       const struct btf_type *t;
>> +       __u32 next_id = *id;
>> +
>> +       do {
>> +               if (next_id == 0)
>> +                       return 0;
>> +               t = btf_type_by_id(dist->pipe.src, next_id);
>> +               switch (btf_kind(t)) {
>> +               case BTF_KIND_CONST:
>> +               case BTF_KIND_RESTRICT:
>> +               case BTF_KIND_VOLATILE:
>> +               case BTF_KIND_TYPEDEF:
>> +               case BTF_KIND_TYPE_TAG:
>> +                       next_id = t->type;
>> +                       break;
>> +               case BTF_KIND_ARRAY: {
>> +                       struct btf_array *a = btf_array(t);
>> +
>> +                       next_id = a->type;
>> +                       break;
>> +               }
>> +               case BTF_KIND_STRUCT:
>> +               case BTF_KIND_UNION:
>> +                       dist->ids[next_id] |= BTF_NEEDS_SIZE;
>> +                       return 0;
>> +               default:
>> +                       return 0;
>> +               }
>> +
>> +       } while (1);
> 
> nit: while (true)
> 
>> +
>> +       return 0;
>> +}
>> +
>> +/* Check if composite type has a duplicate-named type; if it does, retain
> 
> see above about check vs mark, here at least the function name uses "mark" :)
> 
>> + * size information to help guide later relocation towards the correct type.
>> + * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
>> + * one is size 112, the other 16.  Having size information allows relocation
>> + * to choose the right one.
> 
> re: this dma_chan and similar cases. Is it ever a problem where a
> module actually needs two dma_chan in distilled base BTF? Name
> conflicts do happen, but intuitively I'd expect this to happen between
> some vmlinux-internal (so to speak, i.e., not the type used in
> exported functions/types) and kernel module-specific types. It would
> be awkward for the same module to use two different types that are
> named the same.
> 
> Have you seen this as a problem in practice? What if for now we just
> error out if there are two conflicting types required in distilled
> BTF?

Funnily enough I ran into it with "dma_chan" itself when trying to load
an in-tree module which I'd forced (along with the rest of the tree) to
be built with distilled base BTF.  And it was also an embedded struct
too, so we got 2 for 1 there ;-)

But it does seem like it's a legitimate problem, so if we can address it
I think it'd be worth trying. Even the size checks (applied at
relocation time) would be better than nothing I think.

The only other way I could think that we could disambiguate things would
be to add the duplicate-named type to the split BTF (as we do with the
anonymous structs).  That would remove the ambiguity, but expose us to
the risk of having to pull in a lot more types. I can't imagine a
duplicate-named type would ever figure in a kfunc signature, so it
should be safe to do. But I think either way we probably have to have
some sort of answer for this problem.

> 
>> + */
>> +static int btf_mark_composite_dups(struct btf_distill *dist, __u32 id)
>> +{
>> +       __u8 *cnt = calloc(dist->split_start_str, sizeof(__u8));
> 
> nit: we generally follow that initialization of variable shouldn't be
> doing non-trivial operations. So let's do calloc() as a separate
> statement outside of variable declaration.
> 

Sure, will do.

>> +       struct btf_type *t;
>> +       int i;
>> +
>> +       if (!cnt)
>> +               return -ENOMEM;
>> +
>> +       /* First pass; collect name counts for composite types. */
>> +       for (i = 1; i < dist->split_start_id; i++) {
>> +               t = btf_type_by_id(dist->pipe.src, i);
>> +               if (!btf_is_composite(t) || !t->name_off)
>> +                       continue;
>> +               if (cnt[t->name_off] < 255)
>> +                       cnt[t->name_off]++;
>> +       }
>> +       /* Second pass; mark composite types with multiple instances of the
>> +        * same name as needing size information.
>> +        */
>> +       for (i = 1; i < dist->split_start_id; i++) {
>> +               /* id not needed or is already preserving size information */
>> +               if (!dist->ids[i] || (dist->ids[i] & BTF_NEEDS_SIZE))
>> +                       continue;
>> +               t = btf_type_by_id(dist->pipe.src, i);
>> +               if (!btf_is_composite(t) || !t->name_off)
>> +                       continue;
>> +               if (cnt[t->name_off] > 1)
>> +                       dist->ids[i] |= BTF_NEEDS_SIZE;
>> +       }
>> +       free(cnt);
>> +
>> +       return 0;
>> +}
>> +
>> +static bool btf_is_eligible_named_fwd(const struct btf_type *t)
>> +{
>> +       return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
>> +}
>> +
>> +static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
>> +{
>> +       struct btf_distill *dist = ctx;
>> +       struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
>> +       int err;
>> +
>> +       if (!*id)
>> +               return 0;
>> +       /* split BTF id, not needed */
>> +       if (*id >= dist->split_start_id)
>> +               return 0;
>> +       /* already added ? */
>> +       if (BTF_ID(dist->ids[*id]) > 0)
>> +               return 0;
>> +
>> +       /* only a subset of base BTF types should be referenced from split
>> +        * BTF; ensure nothing unexpected is referenced.
>> +        */
>> +       switch (btf_kind(t)) {
>> +       case BTF_KIND_INT:
>> +       case BTF_KIND_FLOAT:
>> +       case BTF_KIND_FWD:
>> +       case BTF_KIND_ARRAY:
>> +       case BTF_KIND_STRUCT:
>> +       case BTF_KIND_UNION:
>> +       case BTF_KIND_TYPEDEF:
>> +       case BTF_KIND_ENUM:
>> +       case BTF_KIND_ENUM64:
>> +       case BTF_KIND_PTR:
>> +       case BTF_KIND_CONST:
>> +       case BTF_KIND_RESTRICT:
>> +       case BTF_KIND_VOLATILE:
>> +       case BTF_KIND_FUNC_PROTO:
>> +       case BTF_KIND_TYPE_TAG:
>> +               dist->ids[*id] |= *id;
>> +               break;
>> +       default:
>> +               pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
>> +                       *id, btf_kind(t));
>> +               return -EINVAL;
>> +       }
>> +
>> +       /* struct/union members not needed, except for anonymous structs
>> +        * and unions, which we need since name won't help us determine
>> +        * matches; so if a named struct/union, no need to recurse
>> +        * into members.
>> +        */
> 
> is this an outdated comment? we shouldn't have anonymous types in the
> distilled base, right?

Yep, they go into split BTF. However, we might need to add their members
to our distilled base; for example if we hadn't yet added an int and had

	struct {
		int field;
	};

...we'd want to make sure we added an INT to our distilled base. I'll
try and clarify the comment a bit more.

> 
>> +       if (btf_is_eligible_named_fwd(t))
>> +               return 0;
>> +
>> +       /* ensure references in type are added also. */
>> +       err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
>> +       if (err < 0)
>> +               return err;
>> +       return 0;
> 
> nit: could be just `return btf_type_visit_type_ids(...);`?
> 

good idea.

>> +}
>> +
>> +static int btf_add_distilled_types(struct btf_distill *dist)
>> +{
>> +       bool adding_to_base = dist->pipe.dst->start_id == 1;
>> +       int id = btf__type_cnt(dist->pipe.dst);
>> +       struct btf_type *t;
>> +       int i, err = 0;
>> +
>> +       /* Add types for each of the required references to either distilled
>> +        * base or split BTF, depending on type characteristics.
>> +        */
>> +       for (i = 1; i < dist->split_start_id; i++) {
>> +               const char *name;
>> +               int kind;
>> +
>> +               if (!BTF_ID(dist->ids[i]))
>> +                       continue;
>> +               t = btf_type_by_id(dist->pipe.src, i);
>> +               kind = btf_kind(t);
>> +               name = btf__name_by_offset(dist->pipe.src, t->name_off);
>> +
>> +               /* Named int, float, fwd struct, union, enum[64] are added to
>> +                * base; everything else is added to split BTF.
>> +                */
>> +               switch (kind) {
>> +               case BTF_KIND_INT:
>> +               case BTF_KIND_FLOAT:
>> +               case BTF_KIND_FWD:
>> +               case BTF_KIND_STRUCT:
>> +               case BTF_KIND_UNION:
>> +               case BTF_KIND_ENUM:
>> +               case BTF_KIND_ENUM64:
>> +                       if ((adding_to_base && !t->name_off) || (!adding_to_base && t->name_off))
>> +                               continue;
>> +                       break;
>> +               default:
>> +                       if (adding_to_base)
>> +                               continue;
>> +                       break;
>> +               }
>> +               if (dist->ids[i] & BTF_NEEDS_SIZE) {
>> +                       /* If a named struct/union in base BTF is referenced as a type
>> +                        * in split BTF without use of a pointer - i.e. as an embedded
>> +                        * struct/union - add an empty struct/union preserving size
>> +                        * since size must be consistent when relocating split and
>> +                        * possibly changed base BTF.  Similarly, when a struct/union
>> +                        * has multiple instances of the same name in the original
>> +                        * base BTF, retain size to help relocation later pick the
>> +                        * right struct/union.
>> +                        */
>> +                       err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
>> +               } else if (btf_is_eligible_named_fwd(t)) {
>> +                       /* If not embedded, use a fwd for named struct/unions since we
>> +                        * can match via name without any other details.
>> +                        */
>> +                       switch (kind) {
>> +                       case BTF_KIND_STRUCT:
>> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_STRUCT);
>> +                               break;
>> +                       case BTF_KIND_UNION:
>> +                               err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_UNION);
>> +                               break;
>> +                       case BTF_KIND_ENUM:
>> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
>> +                               break;
>> +                       case BTF_KIND_ENUM64:
> 
> nit: combine ENUM/ENUM64 cases?
>

yep, good catch.

>> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
>> +                               break;
>> +                       default:
>> +                               pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
>> +                                       btf_kind(t));
>> +                               return -EINVAL;
>> +                       }
>> +               } else {
>> +                       err = btf_add_type(&dist->pipe, t);
> 
> So this should never happen if adding_to_base == true, is that right?
> Can we check this? Or am I missing something as usual?
> 

We're currently adding INTs and FLOATs to the base also, so they are two
cases where we end up here I think.

>> +               }
>> +               if (err < 0)
>> +                       break;
>> +               dist->ids[i] = id++;
>> +       }
>> +       return err;
>> +}
>> +
> 
> [...]
> 
>> +       n = btf__type_cnt(new_split);
>> +       /* Now update base/split BTF ids. */
>> +       for (i = 1; i < n; i++) {
>> +               t = btf_type_by_id(new_split, i);
>> +
>> +               err = btf_type_visit_type_ids(t, btf_update_distilled_type_ids, &dist);
>> +               if (err < 0)
>> +                       goto err_out;
>> +       }
>> +       free(dist.ids);
>> +       hashmap__free(dist.pipe.str_off_map);
>> +       *new_base_btf = new_base;
>> +       *new_split_btf = new_split;
>> +       return 0;
>> +err_out:
>> +       free(dist.ids);
>> +       if (!IS_ERR(dist.pipe.str_off_map))
> 
> you don't need to check this, hashmap__free() works for IS_ERR()
> pointers as well
>

ah, good to know, thanks!


>> +               hashmap__free(dist.pipe.str_off_map);
>> +       btf__free(new_split);
>> +       btf__free(new_base);
>> +       return libbpf_err(err);
>> +}
> 
> [...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
  2024-05-22 16:42     ` Alan Maguire
@ 2024-05-22 16:57       ` Andrii Nakryiko
  0 siblings, 0 replies; 33+ messages in thread
From: Andrii Nakryiko @ 2024-05-22 16:57 UTC (permalink / raw)
  To: Alan Maguire
  Cc: andrii, jolsa, acme, quentin, eddyz87, mykolal, ast, daniel,
	martin.lau, song, yonghong.song, john.fastabend, kpsingh, sdf,
	haoluo, houtao1, bpf, masahiroy, mcgrof, nathan

On Wed, May 22, 2024 at 9:42 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 21/05/2024 22:48, Andrii Nakryiko wrote:
> > On Fri, May 17, 2024 at 3:23 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> To support more robust split BTF, adding supplemental context for the
> >> base BTF type ids that split BTF refers to is required.  Without such
> >> references, a simple shuffling of base BTF type ids (without any other
> >> significant change) invalidates the split BTF.  Here the attempt is made
> >> to store additional context to make split BTF more robust.
> >>
> >> This context comes in the form of distilled base BTF providing minimal
> >> information (name and - in some cases - size) for base INTs, FLOATs,
> >> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> >> points at that base and contains any additional types needed (such as
> >> TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
> >> information constitutes the minimal BTF representation needed to
> >> disambiguate or remove split BTF references to base BTF.  The rules
> >> are as follows:
> >>
> >> - INT, FLOAT are recorded in full.
> >> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
> >>   will be encoded either as a zero-member sized STRUCT/UNION (preserving
> >>   size for later relocation checks) or as a named FWD.  Only base BTF
> >>   STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
> >>   that have multiple STRUCT/UNION instances of the same name need to
> >>   preserve size information, so a FWD representation will be used in
> >>   most cases.
> >> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
> >>   with no values) is used.
> >> - in all other cases, the type is added to the new split BTF.
> >>
> >> Avoiding struct/union/enum/enum64 expansion is important to keep the
> >> distilled base BTF representation to a minimum size.
> >>
> >> When successful, new representations of the distilled base BTF and new
> >> split BTF that refers to it are returned.  Both need to be freed by the
> >> caller.
> >>
> >> So to take a simple example, with split BTF with a type referring
> >> to "struct sk_buff", we will generate distilled base BTF with a
> >> FWD struct sk_buff, and the split BTF will refer to it instead.
> >>
> >> Tools like pahole can utilize such split BTF to populate the .BTF
> >> section (split BTF) and an additional .BTF.base section.  Then
> >> when the split BTF is loaded, the distilled base BTF can be used
> >> to relocate split BTF to reference the current (and possibly changed)
> >> base BTF.
> >>
> >> So for example if "struct sk_buff" was id 502 when the split BTF was
> >> originally generated,  we can use the distilled base BTF to see that
> >> id 502 refers to a "struct sk_buff" and replace instances of id 502
> >> with the current (relocated) base BTF sk_buff type id.
> >>
> >> Distilled base BTF is small; when building a kernel with all modules
> >> using distilled base BTF as a test, ovreall module size grew by only
> >
> > typo: overall
> >
> >> 5.3Mb total across ~2700 modules.
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >>  tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
> >>  tools/lib/bpf/btf.h      |  20 ++
> >>  tools/lib/bpf/libbpf.map |   1 +
> >>  3 files changed, 424 insertions(+), 6 deletions(-)
> >>

[...]

> >> +#define BTF_ID(id)             (id & ~BTF_NEEDS_SIZE)
> >> +
> >> +struct btf_distill {
> >> +       struct btf_pipe pipe;
> >> +       int *ids;
> >
> > reading the rest of the code, this BTF_NEEDS_SIZE and BTF_ID() macro
> > use was quite distracting. I'm wondering if it would be better to use
> > a simple struct with bitfields here, e.g.,
> >
> > struct type_state {
> >     int id: 31;
> >     bool needs_size;
> > };
> >
> > struct dist_state *states;
> >
> > Same memory usage, same efficiency, but more readable code. WDYT?
> >
>
> Yeah, I saw Eduard used that approach elsewhere, it's much neater.
> However in other discussion with Eduard we talked about moving the
> embedded/duplicate validation to relocation time. The motivation for
> that is that if we record sizes for all STRUCTs and UNIONs, we can then
> be selective about size checks at relocation time.  This is particularly
> relevant for the duplicate name case since it's possible a name either
> is no longer duplicate in the new vmlinux, or in the new vmlinux has a
> duplicate where the original vmlinux did not. In other words it's the
> "new world" of the vmlinux we're trying to relocate with that we're
> really concerned with, so preserving all size information until we see
> that new vmlinux and can spot duplicates and embedded types where the
> size checks need to be enforced makes sense I think.

Sure, I think it makes sense (though let's see how much more
complexity we add to relocation code).

>
> In that context, we mark embedded types prior to assigning mappings, and
> don't mark duplicate names at all in the map since it is duplicates in
> the new vmlinux we're interested in. So the way I'd approached it is to
> have a BTF_IS_EMBEDDED (casting -1) value that the map values would set
> when they found a split BTF struct/union with an embedded base BTF type.
> That later gets overwritten when we do the map assignments so it never
> gets exposed to the user and we can still return a __u32 array of BTF
> ids to the caller.

My only question/concern is duplicate named entries in distilled base.
How will that work with binary search (at least the exact variant you
are using). But I'm not sure I understand all the nuances of what you
agreed on.

[...]

> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +/* Check if composite type has a duplicate-named type; if it does, retain
> >
> > see above about check vs mark, here at least the function name uses "mark" :)
> >
> >> + * size information to help guide later relocation towards the correct type.
> >> + * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
> >> + * one is size 112, the other 16.  Having size information allows relocation
> >> + * to choose the right one.
> >
> > re: this dma_chan and similar cases. Is it ever a problem where a
> > module actually needs two dma_chan in distilled base BTF? Name
> > conflicts do happen, but intuitively I'd expect this to happen between
> > some vmlinux-internal (so to speak, i.e., not the type used in
> > exported functions/types) and kernel module-specific types. It would
> > be awkward for the same module to use two different types that are
> > named the same.
> >
> > Have you seen this as a problem in practice? What if for now we just
> > error out if there are two conflicting types required in distilled
> > BTF?
>
> Funnily enough I ran into it with "dma_chan" itself when trying to load
> an in-tree module which I'd forced (along with the rest of the tree) to
> be built with distilled base BTF.  And it was also an embedded struct
> too, so we got 2 for 1 there ;-)
>
> But it does seem like it's a legitimate problem, so if we can address it
> I think it'd be worth trying. Even the size checks (applied at
> relocation time) would be better than nothing I think.
>
> The only other way I could think that we could disambiguate things would
> be to add the duplicate-named type to the split BTF (as we do with the
> anonymous structs).  That would remove the ambiguity, but expose us to
> the risk of having to pull in a lot more types. I can't imagine a
> duplicate-named type would ever figure in a kfunc signature, so it
> should be safe to do. But I think either way we probably have to have
> some sort of answer for this problem.
>

Ok, just keep in mind that duplicated names in distilled base BTF do
cause issues when "joining" actual vmlinux BTF and distilled base BTF.

[...]

> >> +
> >> +       /* struct/union members not needed, except for anonymous structs
> >> +        * and unions, which we need since name won't help us determine
> >> +        * matches; so if a named struct/union, no need to recurse
> >> +        * into members.
> >> +        */
> >
> > is this an outdated comment? we shouldn't have anonymous types in the
> > distilled base, right?
>
> Yep, they go into split BTF. However, we might need to add their members
> to our distilled base; for example if we hadn't yet added an int and had
>
>         struct {
>                 int field;
>         };
>
> ...we'd want to make sure we added an INT to our distilled base. I'll
> try and clarify the comment a bit more.

I see, it makes sense. Yes, let's add a comment, it's a bit subtle.

>
> >
> >> +       if (btf_is_eligible_named_fwd(t))
> >> +               return 0;
> >> +

[...]

> >> +                               err = btf__add_enum(dist->pipe.dst, name, t->size);
> >> +                               break;
> >> +                       default:
> >> +                               pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
> >> +                                       btf_kind(t));
> >> +                               return -EINVAL;
> >> +                       }
> >> +               } else {
> >> +                       err = btf_add_type(&dist->pipe, t);
> >
> > So this should never happen if adding_to_base == true, is that right?
> > Can we check this? Or am I missing something as usual?
> >
>
> We're currently adding INTs and FLOATs to the base also, so they are two
> cases where we end up here I think.

let's have them as another explicit case and then warn for everything
we don't expect to be added? I'm trying to prevent silent problems we
haven't thought about or that will appear in the future due to BTF
extension, so let's be conservative everywhere.

>
> >> +               }
> >> +               if (err < 0)
> >> +                       break;
> >> +               dist->ids[i] = id++;
> >> +       }
> >> +       return err;
> >> +}
> >> +
> >

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF
  2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
  2024-05-21 21:48   ` Andrii Nakryiko
@ 2024-05-22 18:00   ` Kui-Feng Lee
  1 sibling, 0 replies; 33+ messages in thread
From: Kui-Feng Lee @ 2024-05-22 18:00 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan



On 5/17/24 03:22, Alan Maguire wrote:
> To support more robust split BTF, adding supplemental context for the
> base BTF type ids that split BTF refers to is required.  Without such
> references, a simple shuffling of base BTF type ids (without any other
> significant change) invalidates the split BTF.  Here the attempt is made
> to store additional context to make split BTF more robust.
> 
> This context comes in the form of distilled base BTF providing minimal
> information (name and - in some cases - size) for base INTs, FLOATs,
> STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that
> points at that base and contains any additional types needed (such as
> TYPEDEF, PTR and anonymous STRUCT/UNION declarations).  This
> information constitutes the minimal BTF representation needed to
> disambiguate or remove split BTF references to base BTF.  The rules
> are as follows:
> 
> - INT, FLOAT are recorded in full.
> - if a named base BTF STRUCT or UNION is referred to from split BTF, it
>    will be encoded either as a zero-member sized STRUCT/UNION (preserving
>    size for later relocation checks) or as a named FWD.  Only base BTF
>    STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or
>    that have multiple STRUCT/UNION instances of the same name need to
>    preserve size information, so a FWD representation will be used in
>    most cases.
> - if an ENUM[64] is named, a ENUM forward representation (an ENUM
>    with no values) is used.
> - in all other cases, the type is added to the new split BTF.
> 
> Avoiding struct/union/enum/enum64 expansion is important to keep the
> distilled base BTF representation to a minimum size.
> 
> When successful, new representations of the distilled base BTF and new
> split BTF that refers to it are returned.  Both need to be freed by the
> caller.
> 
> So to take a simple example, with split BTF with a type referring
> to "struct sk_buff", we will generate distilled base BTF with a
> FWD struct sk_buff, and the split BTF will refer to it instead.
> 
> Tools like pahole can utilize such split BTF to populate the .BTF
> section (split BTF) and an additional .BTF.base section.  Then
> when the split BTF is loaded, the distilled base BTF can be used
> to relocate split BTF to reference the current (and possibly changed)
> base BTF.
> 
> So for example if "struct sk_buff" was id 502 when the split BTF was
> originally generated,  we can use the distilled base BTF to see that
> id 502 refers to a "struct sk_buff" and replace instances of id 502
> with the current (relocated) base BTF sk_buff type id.
> 
> Distilled base BTF is small; when building a kernel with all modules
> using distilled base BTF as a test, ovreall module size grew by only
> 5.3Mb total across ~2700 modules.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>   tools/lib/bpf/btf.c      | 409 ++++++++++++++++++++++++++++++++++++++-
>   tools/lib/bpf/btf.h      |  20 ++
>   tools/lib/bpf/libbpf.map |   1 +
>   3 files changed, 424 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 2d0840ef599a..953929d196c3 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1771,9 +1771,8 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx)
>   	return 0;
>   }
>   
> -int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type)
>   {
> -	struct btf_pipe p = { .src = src_btf, .dst = btf };
>   	struct btf_type *t;
>   	int sz, err;
>   
> @@ -1782,20 +1781,27 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t
>   		return libbpf_err(sz);
>   
>   	/* deconstruct BTF, if necessary, and invalidate raw_data */
> -	if (btf_ensure_modifiable(btf))
> +	if (btf_ensure_modifiable(p->dst))
>   		return libbpf_err(-ENOMEM);
>   
> -	t = btf_add_type_mem(btf, sz);
> +	t = btf_add_type_mem(p->dst, sz);
>   	if (!t)
>   		return libbpf_err(-ENOMEM);
>   
>   	memcpy(t, src_type, sz);
>   
> -	err = btf_type_visit_str_offs(t, btf_rewrite_str, &p);
> +	err = btf_type_visit_str_offs(t, btf_rewrite_str, p);
>   	if (err)
>   		return libbpf_err(err);
>   
> -	return btf_commit_type(btf, sz);
> +	return btf_commit_type(p->dst, sz);
> +}
> +
> +int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type)
> +{
> +	struct btf_pipe p = { .src = src_btf, .dst = btf };
> +
> +	return btf_add_type(&p, src_type);
>   }
>   
>   static int btf_rewrite_type_ids(__u32 *type_id, void *ctx)
> @@ -5212,3 +5218,394 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void
>   
>   	return 0;
>   }
> +
> +#define BTF_NEEDS_SIZE	(1 << 31)	/* flag set if either struct/union is
> +					 * embedded - and thus size info must
> +					 * be preserved - or if there are
> +					 * multiple instances of the same
> +					 * struct/union - where size can be
> +					 * used to clarify which is wanted.
> +					 */
> +#define BTF_ID(id)		(id & ~BTF_NEEDS_SIZE)
> +
> +struct btf_distill {
> +	struct btf_pipe pipe;
> +	int *ids;
> +	unsigned int split_start_id;
> +	unsigned int split_start_str;
> +	unsigned int diff_id;
> +};
> +
> +/* Check if a member of a split BTF struct/union refers to a base BTF
> + * struct/union.  Members can be const/restrict/volatile/typedef
> + * reference types, but if a pointer is encountered, type is no longer
> + * considered embedded.
> + */
> +static int btf_find_embedded_composite_type_ids(__u32 *id, void *ctx)
> +{
> +	struct btf_distill *dist = ctx;
> +	const struct btf_type *t;
> +	__u32 next_id = *id;
> +
> +	do {
> +		if (next_id == 0)
> +			return 0;
> +		t = btf_type_by_id(dist->pipe.src, next_id);
> +		switch (btf_kind(t)) {
> +		case BTF_KIND_CONST:
> +		case BTF_KIND_RESTRICT:
> +		case BTF_KIND_VOLATILE:
> +		case BTF_KIND_TYPEDEF:
> +		case BTF_KIND_TYPE_TAG:
> +			next_id = t->type;
> +			break;
> +		case BTF_KIND_ARRAY: {
> +			struct btf_array *a = btf_array(t);
> +
> +			next_id = a->type;
> +			break;
> +		}
> +		case BTF_KIND_STRUCT:
> +		case BTF_KIND_UNION:
> +			dist->ids[next_id] |= BTF_NEEDS_SIZE;
> +			return 0;
> +		default:
> +			return 0;
> +		}
> +
> +	} while (1);
> +
> +	return 0;
> +}
> +
> +/* Check if composite type has a duplicate-named type; if it does, retain
> + * size information to help guide later relocation towards the correct type.
> + * For example there are duplicate 'dma_chan' structs in vmlinux BTF;
> + * one is size 112, the other 16.  Having size information allows relocation
> + * to choose the right one.
> + */
> +static int btf_mark_composite_dups(struct btf_distill *dist, __u32 id)
> +{
> +	__u8 *cnt = calloc(dist->split_start_str, sizeof(__u8));
> +	struct btf_type *t;
> +	int i;
> +
> +	if (!cnt)
> +		return -ENOMEM;
> +
> +	/* First pass; collect name counts for composite types. */
> +	for (i = 1; i < dist->split_start_id; i++) {
> +		t = btf_type_by_id(dist->pipe.src, i);
> +		if (!btf_is_composite(t) || !t->name_off)
> +			continue;
> +		if (cnt[t->name_off] < 255)
> +			cnt[t->name_off]++;
> +	}
> +	/* Second pass; mark composite types with multiple instances of the
> +	 * same name as needing size information.
> +	 */
> +	for (i = 1; i < dist->split_start_id; i++) {
> +		/* id not needed or is already preserving size information */
> +		if (!dist->ids[i] || (dist->ids[i] & BTF_NEEDS_SIZE))
> +			continue;
> +		t = btf_type_by_id(dist->pipe.src, i);
> +		if (!btf_is_composite(t) || !t->name_off)
> +			continue;
> +		if (cnt[t->name_off] > 1)
> +			dist->ids[i] |= BTF_NEEDS_SIZE;
> +	}
> +	free(cnt);
> +
> +	return 0;
> +}
> +
> +static bool btf_is_eligible_named_fwd(const struct btf_type *t)
> +{
> +	return (btf_is_composite(t) || btf_is_any_enum(t)) && t->name_off != 0;
> +}
> +
> +static int btf_add_distilled_type_ids(__u32 *id, void *ctx)
> +{
> +	struct btf_distill *dist = ctx;
> +	struct btf_type *t = btf_type_by_id(dist->pipe.src, *id);
> +	int err;
> +
> +	if (!*id)
> +		return 0;
> +	/* split BTF id, not needed */
> +	if (*id >= dist->split_start_id)
> +		return 0;
> +	/* already added ? */
> +	if (BTF_ID(dist->ids[*id]) > 0)
> +		return 0;
> +
> +	/* only a subset of base BTF types should be referenced from split
> +	 * BTF; ensure nothing unexpected is referenced.
> +	 */
> +	switch (btf_kind(t)) {
> +	case BTF_KIND_INT:
> +	case BTF_KIND_FLOAT:
> +	case BTF_KIND_FWD:
> +	case BTF_KIND_ARRAY:
> +	case BTF_KIND_STRUCT:
> +	case BTF_KIND_UNION:
> +	case BTF_KIND_TYPEDEF:
> +	case BTF_KIND_ENUM:
> +	case BTF_KIND_ENUM64:
> +	case BTF_KIND_PTR:
> +	case BTF_KIND_CONST:
> +	case BTF_KIND_RESTRICT:
> +	case BTF_KIND_VOLATILE:
> +	case BTF_KIND_FUNC_PROTO:
> +	case BTF_KIND_TYPE_TAG:
> +		dist->ids[*id] |= *id;
> +		break;
> +	default:
> +		pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n",
> +			*id, btf_kind(t));
> +		return -EINVAL;
> +	}
> +
> +	/* struct/union members not needed, except for anonymous structs
> +	 * and unions, which we need since name won't help us determine
> +	 * matches; so if a named struct/union, no need to recurse
> +	 * into members.
> +	 */
> +	if (btf_is_eligible_named_fwd(t))
> +		return 0;
> +
> +	/* ensure references in type are added also. */
> +	err = btf_type_visit_type_ids(t, btf_add_distilled_type_ids, ctx);
> +	if (err < 0)
> +		return err;
> +	return 0;
> +}
> +
> +static int btf_add_distilled_types(struct btf_distill *dist)
> +{
> +	bool adding_to_base = dist->pipe.dst->start_id == 1;
> +	int id = btf__type_cnt(dist->pipe.dst);
> +	struct btf_type *t;
> +	int i, err = 0;
> +
> +	/* Add types for each of the required references to either distilled
> +	 * base or split BTF, depending on type characteristics.
> +	 */
> +	for (i = 1; i < dist->split_start_id; i++) {
> +		const char *name;
> +		int kind;
> +
> +		if (!BTF_ID(dist->ids[i]))
> +			continue;
> +		t = btf_type_by_id(dist->pipe.src, i);
> +		kind = btf_kind(t);
> +		name = btf__name_by_offset(dist->pipe.src, t->name_off);
> +
> +		/* Named int, float, fwd struct, union, enum[64] are added to
> +		 * base; everything else is added to split BTF.
> +		 */
> +		switch (kind) {
> +		case BTF_KIND_INT:
> +		case BTF_KIND_FLOAT:
> +		case BTF_KIND_FWD:
> +		case BTF_KIND_STRUCT:
> +		case BTF_KIND_UNION:
> +		case BTF_KIND_ENUM:
> +		case BTF_KIND_ENUM64:
> +			if ((adding_to_base && !t->name_off) || (!adding_to_base && t->name_off))
> +				continue;
> +			break;
> +		default:
> +			if (adding_to_base)
> +				continue;
> +			break;
> +		}
> +		if (dist->ids[i] & BTF_NEEDS_SIZE) {
> +			/* If a named struct/union in base BTF is referenced as a type
> +			 * in split BTF without use of a pointer - i.e. as an embedded
> +			 * struct/union - add an empty struct/union preserving size
> +			 * since size must be consistent when relocating split and
> +			 * possibly changed base BTF.  Similarly, when a struct/union
> +			 * has multiple instances of the same name in the original
> +			 * base BTF, retain size to help relocation later pick the
> +			 * right struct/union.
> +			 */
> +			err = btf_add_composite(dist->pipe.dst, kind, name, t->size);
> +		} else if (btf_is_eligible_named_fwd(t)) {
> +			/* If not embedded, use a fwd for named struct/unions since we
> +			 * can match via name without any other details.
> +			 */
> +			switch (kind) {
> +			case BTF_KIND_STRUCT:
> +				err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_STRUCT);
> +				break;
> +			case BTF_KIND_UNION:
> +				err = btf__add_fwd(dist->pipe.dst, name, BTF_FWD_UNION);
> +				break;
> +			case BTF_KIND_ENUM:
> +				err = btf__add_enum(dist->pipe.dst, name, t->size);
> +				break;
> +			case BTF_KIND_ENUM64:
> +				err = btf__add_enum(dist->pipe.dst, name, t->size);

Should this be btf__add_enum64() instead of btf__add_enum()?

> +				break;
> +			default:
> +				pr_warn("unexpected kind [%u] when creating distilled base BTF.\n",
> +					btf_kind(t));
> +				return -EINVAL;
> +			}
> +		} else {
> +			err = btf_add_type(&dist->pipe, t);
> +		}
> +		if (err < 0)
> +			break;
> +		dist->ids[i] = id++;
> +	}
> +	return err;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation
  2024-05-17 10:22 ` [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation Alan Maguire
  2024-05-21 22:34   ` Andrii Nakryiko
@ 2024-05-23  1:06   ` Kui-Feng Lee
  1 sibling, 0 replies; 33+ messages in thread
From: Kui-Feng Lee @ 2024-05-23  1:06 UTC (permalink / raw)
  To: Alan Maguire, andrii, jolsa, acme, quentin
  Cc: eddyz87, mykolal, ast, daniel, martin.lau, song, yonghong.song,
	john.fastabend, kpsingh, sdf, haoluo, houtao1, bpf, masahiroy,
	mcgrof, nathan



On 5/17/24 03:22, Alan Maguire wrote:
> Map distilled base BTF type ids referenced in split BTF and their
> references to the base BTF passed in, and if the mapping succeeds,
> reparent the split BTF to the base BTF.
> 
> Relocation is done by first verifying that distilled base BTF
> only consists of named INT, FLOAT, ENUM, FWD, STRUCT and
> UNION kinds; then we sort these to speed lookups.  Once sorted,
> the base BTF is iterated, and for each relevant kind we check
> for an equivalent in distilled base BTF.  When found, the
> mapping from distilled -> base BTF id and string offset is recorded.
> 
> Once all mappings are established, we can update type ids
> and string offsets in split BTF and reparent it to the new base.
> 
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
[...]
> +/* Comparison between base BTF type (search type) and distilled base types (target).
> + * Because there is no bsearch_r() we need to use the search key - which also is
> + * the first element of struct btf_relocate * - as a means to retrieve the
> + * struct btf_relocate *.
> + */
> +static int cmp_base_and_distilled_btf_types(const void *idbase, const void *iddist)
> +{
> +	struct btf_relocate *r = (struct btf_relocate *)idbase;
> +	const struct btf_type *tbase = btf_type_by_id(r->base_btf, *(__u32 *)idbase);

"*(__u32 *)idbase" together with the previous line is a little difficult
to decrypt. Using "r->search_id" here is more intuitive, easier to read.

> +	const struct btf_type *tdist = btf_type_by_id(r->dist_base_btf, *(__u32 *)iddist);
> +
> +	return strcmp(btf__name_by_offset(r->base_btf, tbase->name_off),
> +		      btf__name_by_offset(r->dist_base_btf, tdist->name_off));
> +}
> +
> +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate
> + * through base BTF looking up distilled type (using binary search) equivalents.
> + */
> +static int btf_relocate_map_distilled_base(struct btf_relocate *r)
> +{
> +	struct btf_type *t;
> +	const char *name;
> +	__u32 id;
> +
> +	/* generate a sort index array of type ids sorted by name for distilled
> +	 * base BTF to speed lookups.
> +	 */
> +	for (id = 1; id < r->nr_dist_base_types; id++)
> +		r->dist_base_index[id] = id;
> +	qsort_r(r->dist_base_index, r->nr_dist_base_types, sizeof(__u32), cmp_btf_types,
> +		(struct btf *)r->dist_base_btf);
> +
> +	for (id = 1; id < r->nr_base_types; id++) {
> +		struct btf_type *dist_t;
> +		int dist_kind, kind;
> +		bool compat_kind;
> +		__u32 *dist_id;
> +
> +		t = btf_type_by_id(r->base_btf, id);
> +		kind = btf_kind(t);
> +		/* distilled base consists of named types only. */
> +		if (!t->name_off)
> +			continue;
> +		switch (kind) {
> +		case BTF_KIND_INT:
> +		case BTF_KIND_FLOAT:
> +		case BTF_KIND_ENUM:
> +		case BTF_KIND_ENUM64:
> +		case BTF_KIND_FWD:
> +		case BTF_KIND_STRUCT:
> +		case BTF_KIND_UNION:
> +			break;
> +		default:
> +			continue;
> +		}
> +		r->search_id = id;
> +		dist_id = bsearch(&r->search_id, r->dist_base_index, r->nr_dist_base_types,
> +				  sizeof(__u32), cmp_base_and_distilled_btf_types);
> +		if (!dist_id)
> +			continue;
> +		if (!*dist_id || *dist_id > r->nr_dist_base_types) {
> +			pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n",
> +				id, *dist_id);
> +			return -EINVAL;
> +		}
> +		/* validate that kinds are compatible */
> +		dist_t = btf_type_by_id(r->dist_base_btf, *dist_id);
> +		dist_kind = btf_kind(dist_t);
> +		name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off);
> +		compat_kind = dist_kind == kind;
> +		if (!compat_kind) {
> +			switch (dist_kind) {
> +			case BTF_KIND_FWD:
> +				compat_kind = kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> +				break;
> +			case BTF_KIND_ENUM:
> +				compat_kind = kind == BTF_KIND_ENUM64;
> +				break;
> +			default:
> +				break;
> +			}
> +			if (!compat_kind) {
> +				pr_warn("kind incompatibility (%d != %d) between distilled base type '%s'[%d] and base type [%d]\n",
> +					dist_kind, kind, name, *dist_id, id);
> +				return -EINVAL;
> +			}
> +		}
> +		/* validate that int, float struct, union sizes are compatible;
> +		 * distilled base BTF encodes an empty STRUCT/UNION with
> +		 * specific size for cases where a type is embedded in a split
> +		 * type (so has to preserve size info).  Do not error out
> +		 * on mismatch as another size match may occur for an
> +		 * identically-named type.
> +		 */
> +		switch (btf_kind(dist_t)) {
> +		case BTF_KIND_INT:
> +			if (*(__u32 *)(t + 1) != *(__u32 *)(dist_t + 1))
> +				continue;

I know we have code like this here and there. But, could we just use
btf_int_encoding() and btf_int_offset() or invent another function to
return this value and make this comparison more meaningful?
Or just a line of comment to explain what it is.

> +			if (t->size != dist_t->size)
> +				continue;
> +			break;
> +		case BTF_KIND_FLOAT:
> +		case BTF_KIND_STRUCT:
> +		case BTF_KIND_UNION:
> +			if (t->size != dist_t->size)
> +				continue;
> +			break;
> +		default:
> +			break;
> +		}
> +		/* map id and name */
> +		r->map[*dist_id] = id;
> +		r->str_map[dist_t->name_off] = t->name_off;
> +	}
> +	/* ensure all distilled BTF ids have a mapping... */
> +	for (id = 1; id < r->nr_dist_base_types; id++) {
> +		if (r->map[id])
> +			continue;
> +		t = btf_type_by_id(r->dist_base_btf, id);
> +		name = btf__name_by_offset(r->dist_base_btf, t->name_off);
> +		pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n",
> +			name, id);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
[...]


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-05-23  1:06 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-17 10:22 [PATCH v4 bpf-next 00/11] bpf: support resilient split BTF Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 01/11] libbpf: add btf__distill_base() creating split BTF with distilled base BTF Alan Maguire
2024-05-21 21:48   ` Andrii Nakryiko
2024-05-22 16:42     ` Alan Maguire
2024-05-22 16:57       ` Andrii Nakryiko
2024-05-22 18:00   ` Kui-Feng Lee
2024-05-17 10:22 ` [PATCH v4 bpf-next 02/11] selftests/bpf: test distilled base, split BTF generation Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 03/11] libbpf: add btf__parse_opts() API for flexible BTF parsing Alan Maguire
2024-05-21 22:01   ` Andrii Nakryiko
2024-05-17 10:22 ` [PATCH v4 bpf-next 04/11] bpftool: support displaying raw split BTF using base BTF section as base Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 05/11] resolve_btfids: use .BTF.base ELF section as base BTF if -B option is used Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 06/11] kbuild, bpf: add module-specific pahole/resolve_btfids flags for distilled base BTF Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 07/11] libbpf: split BTF relocation Alan Maguire
2024-05-21 22:34   ` Andrii Nakryiko
2024-05-23  1:06   ` Kui-Feng Lee
2024-05-17 10:22 ` [PATCH v4 bpf-next 08/11] selftests/bpf: extend distilled BTF tests to cover " Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 09/11] module, bpf: store BTF base pointer in struct module Alan Maguire
2024-05-17 10:22 ` [PATCH v4 bpf-next 10/11] libbpf,bpf: share BTF relocate-related code with kernel Alan Maguire
2024-05-21 22:59   ` Andrii Nakryiko
2024-05-17 10:22 ` [PATCH v4 bpf-next 11/11] bpftool: support displaying relocated-with-base split BTF Alan Maguire
2024-05-22  9:04   ` Quentin Monnet
     [not found] ` <CA+JHD93=ZcVN4GxepbRF6SLorWJjw0gCgJZUYxQG5hxFehdHUw@mail.gmail.com>
2024-05-17 11:56   ` [PATCH v4 bpf-next 00/11] bpf: support resilient " Alan Maguire
2024-05-17 21:09 ` Eduard Zingerman
2024-05-20  9:36   ` Alan Maguire
2024-05-18  2:38 ` Eduard Zingerman
2024-05-21  9:15   ` Alan Maguire
2024-05-21 16:19     ` Eduard Zingerman
2024-05-21 18:54       ` Andrii Nakryiko
2024-05-21 19:08         ` Eduard Zingerman
2024-05-21 22:01           ` Andrii Nakryiko
2024-05-21 22:15             ` Eduard Zingerman
2024-05-21 22:36               ` Andrii Nakryiko
2024-05-22 16:16             ` Alan Maguire

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).