* [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
@ 2025-10-08 17:34 ` Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
2025-10-23 0:56 ` Eduard Zingerman
2025-10-08 17:34 ` [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
` (15 subsequent siblings)
16 siblings, 2 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:34 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
to help represent location information for functions.
BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
location; either via a register, or register+offset or a
constant value.
BTF_KIND_LOC_PROTO represents location information about a location
with multiple BTF_KIND_LOC_PARAMs.
And finally BTF_KIND_LOCSEC is a set of location sites, each
of which has
- a name (function name)
- a function prototype specifying which types are associated
with parameters
- a location prototype specifying where to find those parameters
- an address offset
This can be used to represent
- a fully-inlined function
- a partially-inlined function where some _LOC_PROTOs represent
inlined sites as above and others have normal _FUNC representations
- a function with optimized parameters; again the FUNC_PROTO
represents the original function, with LOC info telling us
where to obtain each parameter (or 0 if the parameter is
unobtainable)
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/linux/btf.h | 29 +++++-
include/uapi/linux/btf.h | 85 ++++++++++++++++-
kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
4 files changed, 359 insertions(+), 8 deletions(-)
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f06976ffb63f..65091c6aff4b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -487,6 +487,33 @@ static inline struct btf_enum64 *btf_enum64(const struct btf_type *t)
return (struct btf_enum64 *)(t + 1);
}
+static inline struct btf_loc_param *btf_loc_param(const struct btf_type *t)
+{
+ return (struct btf_loc_param *)(t + 1);
+}
+
+static inline __s32 btf_loc_param_size(const struct btf_type *t)
+{
+ return (__s32)t->size;
+}
+
+static inline __u64 btf_loc_param_value(const struct btf_type *t)
+{
+ __u32 *v = (__u32 *)(t + 1);
+
+ return *v + ((__u64)(*(v + 1)) << 32);
+}
+
+static inline __u32 *btf_loc_params(const struct btf_type *t)
+{
+ return (__u32 *)(t + 1);
+}
+
+static inline struct btf_loc *btf_type_loc_secinfo(const struct btf_type *t)
+{
+ return (struct btf_loc *)(t + 1);
+}
+
static inline const struct btf_var_secinfo *btf_type_var_secinfo(
const struct btf_type *t)
{
@@ -552,7 +579,7 @@ struct btf_field_desc {
/* member struct size, or zero, if no members */
int m_sz;
/* repeated per-member offsets */
- int m_off_cnt, m_offs[1];
+ int m_off_cnt, m_offs[2];
};
struct btf_field_iter {
diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
index 266d4ffa6c07..a74b9d202847 100644
--- a/include/uapi/linux/btf.h
+++ b/include/uapi/linux/btf.h
@@ -37,14 +37,16 @@ struct btf_type {
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union, enum, fwd, enum64,
- * decl_tag and type_tag
+ * decl_tag, type_tag and loc
*/
__u32 info;
- /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
+ /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC, ENUM64
+ * and LOC.
+ *
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
- * FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
+ * FUNC, FUNC_PROTO, VAR, DECL_TAG, TYPE_TAG and LOC_PROTO.
* "type" is a type_id referring to another type.
*/
union {
@@ -78,6 +80,9 @@ enum {
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
BTF_KIND_TYPE_TAG = 18, /* Type Tag */
BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
+ BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
+ BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
+ BTF_KIND_LOCSEC = 22, /* Location section */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
@@ -198,4 +203,78 @@ struct btf_enum64 {
__u32 val_hi32;
};
+/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
+ * and is followed by a singular "struct btf_loc_param". type/size specifies
+ * the size of the associated location value. The size value should be
+ * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
+ * 8 byte value for example.
+ *
+ * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
+ * a register, possibly dereferencing it with the specified offset.
+ *
+ * "struct btf_type" is followed by a "struct btf_loc_param" which consists
+ * of either the 64-bit value or the register number, offset etc.
+ * Interpretation depends on whether the kind_flag is set as described above.
+ */
+
+/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
+ * values of the specific size, for example -8 is an 8-byte signed value.
+ */
+#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
+
+/* location param specified by reg + offset is a dereference */
+#define BTF_LOC_FLAG_REG_DEREF 0x1
+/* next location param is needed to specify parameter location also; for example
+ * when two registers are used to store a 16-byte struct by value.
+ */
+#define BTF_LOC_FLAG_CONTINUE 0x2
+
+struct btf_loc_param {
+ union {
+ struct {
+ __u16 reg; /* register number */
+ __u16 flags; /* register dereference */
+ __s32 offset; /* offset from register-stored address */
+ };
+ struct {
+ __u32 val_lo32; /* lo 32 bits of 64-bit value */
+ __u32 val_hi32; /* hi 32 bits of 64-bit value */
+ };
+ };
+};
+
+/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
+ * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
+ * a vlen-specified number of __u32 which specify the associated
+ * BTF_KIND_LOC_PARAM for each function parameter associated with the
+ * location. The type should either be 0 (no location info) or point at
+ * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
+ * represent a single function parameter; in such a case each should specify
+ * BTF_LOC_FLAG_CONTINUE.
+ *
+ * The type field in the associated "struct btf_type" should point at an
+ * associated BTF_KIND_FUNC_PROTO.
+ */
+
+/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
+ * containing location site-specific information;
+ *
+ * - name associated with the location (name_off)
+ * - function prototype type id (func_proto)
+ * - location prototype type id (loc_proto)
+ * - address offset (offset)
+ */
+
+struct btf_loc {
+ __u32 name_off;
+ __u32 func_proto;
+ __u32 loc_proto;
+ __u32 offset;
+};
+
+/* helps libbpf know that location declarations are present; libbpf
+ * can then work around absence if this value is not set.
+ */
+#define BTF_KIND_LOC_UAPI_DEFINED 1
+
#endif /* _UAPI__LINUX_BTF_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0de8fc8a0e0b..29cec549f119 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -342,6 +342,9 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
[BTF_KIND_DECL_TAG] = "DECL_TAG",
[BTF_KIND_TYPE_TAG] = "TYPE_TAG",
[BTF_KIND_ENUM64] = "ENUM64",
+ [BTF_KIND_LOC_PARAM] = "LOC_PARAM",
+ [BTF_KIND_LOC_PROTO] = "LOC_PROTO",
+ [BTF_KIND_LOCSEC] = "LOCSEC"
};
const char *btf_type_str(const struct btf_type *t)
@@ -509,11 +512,27 @@ static bool btf_type_is_decl_tag(const struct btf_type *t)
return BTF_INFO_KIND(t->info) == BTF_KIND_DECL_TAG;
}
+static bool btf_type_is_loc_param(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_LOC_PARAM;
+}
+
+static bool btf_type_is_loc_proto(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_LOC_PROTO;
+}
+
+static bool btf_type_is_locsec(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info) == BTF_KIND_LOCSEC;
+}
+
static bool btf_type_nosize(const struct btf_type *t)
{
return btf_type_is_void(t) || btf_type_is_fwd(t) ||
btf_type_is_func(t) || btf_type_is_func_proto(t) ||
- btf_type_is_decl_tag(t);
+ btf_type_is_decl_tag(t) || btf_type_is_loc_param(t) ||
+ btf_type_is_loc_proto(t) || btf_type_is_locsec(t);
}
static bool btf_type_nosize_or_null(const struct btf_type *t)
@@ -4524,6 +4543,150 @@ static const struct btf_kind_operations enum64_ops = {
.show = btf_enum64_show,
};
+static s32 btf_loc_param_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ const struct btf_loc_param *p = btf_loc_param(t);
+ u32 meta_needed;
+ s32 size;
+
+ meta_needed = sizeof(*p);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (t->name_off) {
+ btf_verifier_log_type(env, t, "Invalid name");
+ return -EINVAL;
+ }
+ if (btf_type_vlen(t)) {
+ btf_verifier_log_type(env, t, "Invalid vlen");
+ return -EINVAL;
+ }
+ size = btf_loc_param_size(t);
+ if (size < 0)
+ size = -size;
+ if (size > 8 || !is_power_of_2(size)) {
+ btf_verifier_log_type(env, t, "Unexpected size");
+ return -EINVAL;
+ }
+
+ return meta_needed;
+}
+
+static void btf_loc_param_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ const struct btf_loc_param *p = btf_loc_param(t);
+
+ if (btf_type_kflag(t))
+ btf_verifier_log(env, "type=%u const=%lld", t->type, btf_loc_param_value(t));
+ else
+ btf_verifier_log(env, "type=%u reg=%u flags=%d offset %u",
+ t->type, p->reg, p->flags, p->offset);
+}
+
+static const struct btf_kind_operations loc_param_ops = {
+ .check_meta = btf_loc_param_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_df_check_member,
+ .check_kflag_member = btf_df_check_kflag_member,
+ .log_details = btf_loc_param_log,
+ .show = btf_df_show,
+};
+
+static s32 btf_loc_proto_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ u32 meta_needed;
+
+ meta_needed = sizeof(__u32) * btf_type_vlen(t);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+
+ if (t->name_off) {
+ btf_verifier_log_type(env, t, "Invalid name");
+ return -EINVAL;
+ }
+ return meta_needed;
+}
+
+static void btf_loc_proto_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ const __u32 *params = btf_loc_params(t);
+ u16 nr_params = btf_type_vlen(t), i;
+
+ btf_verifier_log(env, "loc_proto locs=(");
+ for (i = 0; i < nr_params; i++, params++) {
+ btf_verifier_log(env, "type=%u%s", *params,
+ i + 1 == nr_params ? ")" : ", ");
+ }
+}
+
+static const struct btf_kind_operations loc_proto_ops = {
+ .check_meta = btf_loc_proto_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_df_check_member,
+ .check_kflag_member = btf_df_check_kflag_member,
+ .log_details = btf_loc_proto_log,
+ .show = btf_df_show,
+};
+
+static s32 btf_locsec_check_meta(struct btf_verifier_env *env,
+ const struct btf_type *t,
+ u32 meta_left)
+{
+ u32 meta_needed;
+
+ meta_needed = sizeof(struct btf_loc) * btf_type_vlen(t);
+
+ if (meta_left < meta_needed) {
+ btf_verifier_log_basic(env, t,
+ "meta_left:%u meta_needed:%u",
+ meta_left, meta_needed);
+ return -EINVAL;
+ }
+ return meta_needed;
+}
+
+static void btf_locsec_log(struct btf_verifier_env *env,
+ const struct btf_type *t)
+{
+ const struct btf_loc *loc = btf_type_loc_secinfo(t);
+ u16 nr_locs = btf_type_vlen(t), i;
+ const struct btf *btf = env->btf;
+
+ btf_verifier_log(env, "locsec %s locs=(",
+ __btf_name_by_offset(btf, t->name_off));
+ for (i = 0; i < nr_locs; i++, loc++) {
+ btf_verifier_log(env, "\n\tname=%s func_proto %u loc_proto %u offset 0x%x%s",
+ __btf_name_by_offset(btf, loc->name_off),
+ loc->func_proto, loc->loc_proto, loc->offset,
+ i + 1 == nr_locs ? ")" : ", ");
+ }
+}
+
+static const struct btf_kind_operations locsec_ops = {
+ .check_meta = btf_locsec_check_meta,
+ .resolve = btf_df_resolve,
+ .check_member = btf_df_check_member,
+ .check_kflag_member = btf_df_check_kflag_member,
+ .log_details = btf_locsec_log,
+ .show = btf_df_show,
+};
+
static s32 btf_func_proto_check_meta(struct btf_verifier_env *env,
const struct btf_type *t,
u32 meta_left)
@@ -5193,6 +5356,9 @@ static const struct btf_kind_operations * const kind_ops[NR_BTF_KINDS] = {
[BTF_KIND_DECL_TAG] = &decl_tag_ops,
[BTF_KIND_TYPE_TAG] = &modifier_ops,
[BTF_KIND_ENUM64] = &enum64_ops,
+ [BTF_KIND_LOC_PARAM] = &loc_param_ops,
+ [BTF_KIND_LOC_PROTO] = &loc_proto_ops,
+ [BTF_KIND_LOCSEC] = &locsec_ops
};
static s32 btf_check_meta(struct btf_verifier_env *env,
diff --git a/tools/include/uapi/linux/btf.h b/tools/include/uapi/linux/btf.h
index 266d4ffa6c07..a74b9d202847 100644
--- a/tools/include/uapi/linux/btf.h
+++ b/tools/include/uapi/linux/btf.h
@@ -37,14 +37,16 @@ struct btf_type {
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union, enum, fwd, enum64,
- * decl_tag and type_tag
+ * decl_tag, type_tag and loc
*/
__u32 info;
- /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
+ /* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC, ENUM64
+ * and LOC.
+ *
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
- * FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
+ * FUNC, FUNC_PROTO, VAR, DECL_TAG, TYPE_TAG and LOC_PROTO.
* "type" is a type_id referring to another type.
*/
union {
@@ -78,6 +80,9 @@ enum {
BTF_KIND_DECL_TAG = 17, /* Decl Tag */
BTF_KIND_TYPE_TAG = 18, /* Type Tag */
BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
+ BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
+ BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
+ BTF_KIND_LOCSEC = 22, /* Location section */
NR_BTF_KINDS,
BTF_KIND_MAX = NR_BTF_KINDS - 1,
@@ -198,4 +203,78 @@ struct btf_enum64 {
__u32 val_hi32;
};
+/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
+ * and is followed by a singular "struct btf_loc_param". type/size specifies
+ * the size of the associated location value. The size value should be
+ * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
+ * 8 byte value for example.
+ *
+ * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
+ * a register, possibly dereferencing it with the specified offset.
+ *
+ * "struct btf_type" is followed by a "struct btf_loc_param" which consists
+ * of either the 64-bit value or the register number, offset etc.
+ * Interpretation depends on whether the kind_flag is set as described above.
+ */
+
+/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
+ * values of the specific size, for example -8 is an 8-byte signed value.
+ */
+#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
+
+/* location param specified by reg + offset is a dereference */
+#define BTF_LOC_FLAG_REG_DEREF 0x1
+/* next location param is needed to specify parameter location also; for example
+ * when two registers are used to store a 16-byte struct by value.
+ */
+#define BTF_LOC_FLAG_CONTINUE 0x2
+
+struct btf_loc_param {
+ union {
+ struct {
+ __u16 reg; /* register number */
+ __u16 flags; /* register dereference */
+ __s32 offset; /* offset from register-stored address */
+ };
+ struct {
+ __u32 val_lo32; /* lo 32 bits of 64-bit value */
+ __u32 val_hi32; /* hi 32 bits of 64-bit value */
+ };
+ };
+};
+
+/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
+ * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
+ * a vlen-specified number of __u32 which specify the associated
+ * BTF_KIND_LOC_PARAM for each function parameter associated with the
+ * location. The type should either be 0 (no location info) or point at
+ * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
+ * represent a single function parameter; in such a case each should specify
+ * BTF_LOC_FLAG_CONTINUE.
+ *
+ * The type field in the associated "struct btf_type" should point at an
+ * associated BTF_KIND_FUNC_PROTO.
+ */
+
+/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
+ * containing location site-specific information;
+ *
+ * - name associated with the location (name_off)
+ * - function prototype type id (func_proto)
+ * - location prototype type id (loc_proto)
+ * - address offset (offset)
+ */
+
+struct btf_loc {
+ __u32 name_off;
+ __u32 func_proto;
+ __u32 loc_proto;
+ __u32 offset;
+};
+
+/* helps libbpf know that location declarations are present; libbpf
+ * can then work around absence if this value is not set.
+ */
+#define BTF_KIND_LOC_UAPI_DEFINED 1
+
#endif /* _UAPI__LINUX_BTF_H__ */
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-08 17:34 ` [RFC bpf-next 01/15] bpf: Extend UAPI to support location information Alan Maguire
@ 2025-10-16 18:36 ` Andrii Nakryiko
2025-10-17 8:43 ` Alan Maguire
2025-10-23 0:56 ` Eduard Zingerman
1 sibling, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:36 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
> to help represent location information for functions.
>
> BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
> location; either via a register, or register+offset or a
> constant value.
>
> BTF_KIND_LOC_PROTO represents location information about a location
> with multiple BTF_KIND_LOC_PARAMs.
>
> And finally BTF_KIND_LOCSEC is a set of location sites, each
> of which has
>
> - a name (function name)
> - a function prototype specifying which types are associated
> with parameters
> - a location prototype specifying where to find those parameters
> - an address offset
>
> This can be used to represent
>
> - a fully-inlined function
> - a partially-inlined function where some _LOC_PROTOs represent
> inlined sites as above and others have normal _FUNC representations
> - a function with optimized parameters; again the FUNC_PROTO
> represents the original function, with LOC info telling us
> where to obtain each parameter (or 0 if the parameter is
> unobtainable)
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> include/linux/btf.h | 29 +++++-
> include/uapi/linux/btf.h | 85 ++++++++++++++++-
> kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
> tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
> 4 files changed, 359 insertions(+), 8 deletions(-)
>
[...]
> @@ -78,6 +80,9 @@ enum {
> BTF_KIND_DECL_TAG = 17, /* Decl Tag */
> BTF_KIND_TYPE_TAG = 18, /* Type Tag */
> BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
> + BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
> + BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
> + BTF_KIND_LOCSEC = 22, /* Location section */
>
> NR_BTF_KINDS,
> BTF_KIND_MAX = NR_BTF_KINDS - 1,
> @@ -198,4 +203,78 @@ struct btf_enum64 {
> __u32 val_hi32;
> };
>
> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
what if we make LOC_PARAM variable-length (i.e., use vlen). We can
always have a fixed 4 bytes value that will contain an arg size, maybe
some flags, and an enum representing what kind of location spec it is
(constant, register, reg-deref, reg+off, reg+off-deref, etc). And then
depending on that enum we'll know how to interpret those vlen * 4
bytes. This will give us extensibility to support more complicated
expressions, when we will be ready to tackle them. Still nicely
dedupable, though. WDYT?
> + * and is followed by a singular "struct btf_loc_param". type/size specifies
> + * the size of the associated location value. The size value should be
> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
> + * 8 byte value for example.
> + *
> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
> + * a register, possibly dereferencing it with the specified offset.
> + *
> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
> + * of either the 64-bit value or the register number, offset etc.
> + * Interpretation depends on whether the kind_flag is set as described above.
> + */
> +
> +/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
> + * values of the specific size, for example -8 is an 8-byte signed value.
> + */
> +#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
> +
> +/* location param specified by reg + offset is a dereference */
> +#define BTF_LOC_FLAG_REG_DEREF 0x1
> +/* next location param is needed to specify parameter location also; for example
> + * when two registers are used to store a 16-byte struct by value.
> + */
> +#define BTF_LOC_FLAG_CONTINUE 0x2
> +
> +struct btf_loc_param {
> + union {
> + struct {
> + __u16 reg; /* register number */
> + __u16 flags; /* register dereference */
> + __s32 offset; /* offset from register-stored address */
> + };
> + struct {
> + __u32 val_lo32; /* lo 32 bits of 64-bit value */
> + __u32 val_hi32; /* hi 32 bits of 64-bit value */
> + };
> + };
> +};
> +
> +/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
> + * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
> + * a vlen-specified number of __u32 which specify the associated
> + * BTF_KIND_LOC_PARAM for each function parameter associated with the
> + * location. The type should either be 0 (no location info) or point at
> + * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
> + * represent a single function parameter; in such a case each should specify
> + * BTF_LOC_FLAG_CONTINUE.
> + *
> + * The type field in the associated "struct btf_type" should point at an
> + * associated BTF_KIND_FUNC_PROTO.
> + */
> +
> +/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
> + * containing location site-specific information;
> + *
> + * - name associated with the location (name_off)
> + * - function prototype type id (func_proto)
> + * - location prototype type id (loc_proto)
> + * - address offset (offset)
> + */
> +
> +struct btf_loc {
> + __u32 name_off;
> + __u32 func_proto;
> + __u32 loc_proto;
> + __u32 offset;
> +};
What is that offset relative to? Offset within the function in which
we were inlined? Do we know what that function is? I might have missed
how we represent that.
> +
> +/* helps libbpf know that location declarations are present; libbpf
> + * can then work around absence if this value is not set.
> + */
> +#define BTF_KIND_LOC_UAPI_DEFINED 1
> +
you don't mention that in the commit, I'll have to figure this out
from subsequent patches, but it would be nice to give an overview of
the purpose of this in this patch
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-17 8:43 ` Alan Maguire
2025-10-20 20:57 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 8:43 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
>> to help represent location information for functions.
>>
>> BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
>> location; either via a register, or register+offset or a
>> constant value.
>>
>> BTF_KIND_LOC_PROTO represents location information about a location
>> with multiple BTF_KIND_LOC_PARAMs.
>>
>> And finally BTF_KIND_LOCSEC is a set of location sites, each
>> of which has
>>
>> - a name (function name)
>> - a function prototype specifying which types are associated
>> with parameters
>> - a location prototype specifying where to find those parameters
>> - an address offset
>>
>> This can be used to represent
>>
>> - a fully-inlined function
>> - a partially-inlined function where some _LOC_PROTOs represent
>> inlined sites as above and others have normal _FUNC representations
>> - a function with optimized parameters; again the FUNC_PROTO
>> represents the original function, with LOC info telling us
>> where to obtain each parameter (or 0 if the parameter is
>> unobtainable)
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> include/linux/btf.h | 29 +++++-
>> include/uapi/linux/btf.h | 85 ++++++++++++++++-
>> kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
>> tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
>> 4 files changed, 359 insertions(+), 8 deletions(-)
>>
>
> [...]
>
>> @@ -78,6 +80,9 @@ enum {
>> BTF_KIND_DECL_TAG = 17, /* Decl Tag */
>> BTF_KIND_TYPE_TAG = 18, /* Type Tag */
>> BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
>> + BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
>> + BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
>> + BTF_KIND_LOCSEC = 22, /* Location section */
>>
>> NR_BTF_KINDS,
>> BTF_KIND_MAX = NR_BTF_KINDS - 1,
>> @@ -198,4 +203,78 @@ struct btf_enum64 {
>> __u32 val_hi32;
>> };
>>
>> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
>
> what if we make LOC_PARAM variable-length (i.e., use vlen). We can
> always have a fixed 4 bytes value that will contain an arg size, maybe
> some flags, and an enum representing what kind of location spec it is
> (constant, register, reg-deref, reg+off, reg+off-deref, etc). And then
> depending on that enum we'll know how to interpret those vlen * 4
> bytes. This will give us extensibility to support more complicated
> expressions, when we will be ready to tackle them. Still nicely
> dedupable, though. WDYT?
>
It's a great idea; extensibility is really important here as I hope we
can learn to cover some of the additional location cases we don't
currently. Also we can retire the whole "continue" flag thing for cases
like multi-register representations of structs; we can instead have a
vlen 2 representation with registers in each slot. What's also nice
about that is that it lines up the LOC_PROTO and FUNC_PROTO indices for
parameters so the same index in LOC_PROTO has its type in FUNC_PROTO.
In terms of specifics, I think removing the arg size from the type/size
btf_type field is a good thing as you suggest; having to reinterpret
negative values there is messy. So what about
/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0,
name_off and type/size are 0.
* It is followed by a singular "struct btf_loc_param" and a
vlen-specified set of "struct btf_loc_param_data".
*/
enum {
BTF_LOC_PARAM_REG_DATA,
BTF_LOC_PARAM_CONST_DATA,
};
struct btf_loc_param {
__u8 size; /* signed size; negative values represent signed
* values of the specified size, for example -8
* is an 8-byte signed value.
*/
__u8 data; /* interpret struct btf_loc_param_data */
__u16 flags;
};
struct btf_loc_param_data {
union {
struct {
__u16 reg; /* register number */
__u16 flags; /* register dereference */
__s32 offset; /* offset from
register-stored address */
};
struct {
__u32 val_lo32; /* lo 32 bits of 64-bit
value */
__u32 val_hi32; /* hi 32 bits of 64-bit
value */
};
};
};
I realize we have flags in two places (loc_param and loc_param_data for
registers); just in case we needed some sort of mix of register value
and register dereference I think that makes sense; haven't seen that in
practice yet though. Let me know if the above is what you have in mind.
>> + * and is followed by a singular "struct btf_loc_param". type/size specifies
>> + * the size of the associated location value. The size value should be
>> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
>> + * 8 byte value for example.
>> + *
>> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
>> + * a register, possibly dereferencing it with the specified offset.
>> + *
>> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
>> + * of either the 64-bit value or the register number, offset etc.
>> + * Interpretation depends on whether the kind_flag is set as described above.
>> + */
>> +
>> +/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
>> + * values of the specific size, for example -8 is an 8-byte signed value.
>> + */
>> +#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
>> +
>> +/* location param specified by reg + offset is a dereference */
>> +#define BTF_LOC_FLAG_REG_DEREF 0x1
>> +/* next location param is needed to specify parameter location also; for example
>> + * when two registers are used to store a 16-byte struct by value.
>> + */
>> +#define BTF_LOC_FLAG_CONTINUE 0x2
>> +
>> +struct btf_loc_param {
>> + union {
>> + struct {
>> + __u16 reg; /* register number */
>> + __u16 flags; /* register dereference */
>> + __s32 offset; /* offset from register-stored address */
>> + };
>> + struct {
>> + __u32 val_lo32; /* lo 32 bits of 64-bit value */
>> + __u32 val_hi32; /* hi 32 bits of 64-bit value */
>> + };
>> + };
>> +};
>> +
>> +/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
>> + * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
>> + * a vlen-specified number of __u32 which specify the associated
>> + * BTF_KIND_LOC_PARAM for each function parameter associated with the
>> + * location. The type should either be 0 (no location info) or point at
>> + * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
>> + * represent a single function parameter; in such a case each should specify
>> + * BTF_LOC_FLAG_CONTINUE.
>> + *
>> + * The type field in the associated "struct btf_type" should point at an
>> + * associated BTF_KIND_FUNC_PROTO.
>> + */
>> +
>> +/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
>> + * containing location site-specific information;
>> + *
>> + * - name associated with the location (name_off)
>> + * - function prototype type id (func_proto)
>> + * - location prototype type id (loc_proto)
>> + * - address offset (offset)
>> + */
>> +
>> +struct btf_loc {
>> + __u32 name_off;
>> + __u32 func_proto;
>> + __u32 loc_proto;
>> + __u32 offset;
>> +};
>
> What is that offset relative to? Offset within the function in which
> we were inlined? Do we know what that function is? I might have missed
> how we represent that.
The offset is relative to kernel base address (at compile-time the
address of .text, at runtime the address of _start). The reasoning is we
have to deal with kASLR which means any compile-time absolute address
will likely change when the kernel is loaded. So we cannot deal in raw
addresses, and to fixup the addresses we then gather kernel/module base
address at runtime to compute the actual location of the inline site.
See get_base_addr() in tools/lib/bpf/loc.c in patch 14 for an example of
how this is done.
Given this, it might make sense to have a convention where the LOCSEC
specifies the section name also, something like
"inline.text"
What do you think?
>
>> +
>> +/* helps libbpf know that location declarations are present; libbpf
>> + * can then work around absence if this value is not set.
>> + */
>> +#define BTF_KIND_LOC_UAPI_DEFINED 1
>> +
>
> you don't mention that in the commit, I'll have to figure this out
> from subsequent patches, but it would be nice to give an overview of
> the purpose of this in this patch
>
This is a bit ugly, but is intended to help deal with the situation -
which happens a lot with distros where we might want to build libbpf
without latest UAPI headers (some distros may not get new UAPI headers
for a while). The libbpf patches check if the above is defined, and if
not supply their own location-related definitions. If in turn libbpf
needs to define them, it defines BTF_KIND_LOC_LIBBPF_DEFINED. Finally
pahole - which needs to compile both with a checkpointed libbpf commit
and a libbpf that may be older and not have location definitions -
checks for either, and if not present does a similar set of declarations
to ensure compilation still succeeds. We use weak declarations of libbpf
location-related functions locally to check if they are available at
runtime; this dynamically determines if the inline feature is available.
Not pretty, but it will help avioid some of the issues we had with BTF
enum64 and compilation.
Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-17 8:43 ` Alan Maguire
@ 2025-10-20 20:57 ` Andrii Nakryiko
2025-10-23 8:17 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-20 20:57 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Fri, Oct 17, 2025 at 1:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 16/10/2025 19:36, Andrii Nakryiko wrote:
> > On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
> >> to help represent location information for functions.
> >>
> >> BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
> >> location; either via a register, or register+offset or a
> >> constant value.
> >>
> >> BTF_KIND_LOC_PROTO represents location information about a location
> >> with multiple BTF_KIND_LOC_PARAMs.
> >>
> >> And finally BTF_KIND_LOCSEC is a set of location sites, each
> >> of which has
> >>
> >> - a name (function name)
> >> - a function prototype specifying which types are associated
> >> with parameters
> >> - a location prototype specifying where to find those parameters
> >> - an address offset
> >>
> >> This can be used to represent
> >>
> >> - a fully-inlined function
> >> - a partially-inlined function where some _LOC_PROTOs represent
> >> inlined sites as above and others have normal _FUNC representations
> >> - a function with optimized parameters; again the FUNC_PROTO
> >> represents the original function, with LOC info telling us
> >> where to obtain each parameter (or 0 if the parameter is
> >> unobtainable)
> >>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >> include/linux/btf.h | 29 +++++-
> >> include/uapi/linux/btf.h | 85 ++++++++++++++++-
> >> kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
> >> tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
> >> 4 files changed, 359 insertions(+), 8 deletions(-)
> >>
> >
> > [...]
> >
> >> @@ -78,6 +80,9 @@ enum {
> >> BTF_KIND_DECL_TAG = 17, /* Decl Tag */
> >> BTF_KIND_TYPE_TAG = 18, /* Type Tag */
> >> BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
> >> + BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
> >> + BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
> >> + BTF_KIND_LOCSEC = 22, /* Location section */
> >>
> >> NR_BTF_KINDS,
> >> BTF_KIND_MAX = NR_BTF_KINDS - 1,
> >> @@ -198,4 +203,78 @@ struct btf_enum64 {
> >> __u32 val_hi32;
> >> };
> >>
> >> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
> >
> > what if we make LOC_PARAM variable-length (i.e., use vlen). We can
> > always have a fixed 4 bytes value that will contain an arg size, maybe
> > some flags, and an enum representing what kind of location spec it is
> > (constant, register, reg-deref, reg+off, reg+off-deref, etc). And then
> > depending on that enum we'll know how to interpret those vlen * 4
> > bytes. This will give us extensibility to support more complicated
> > expressions, when we will be ready to tackle them. Still nicely
> > dedupable, though. WDYT?
> >
>
> It's a great idea; extensibility is really important here as I hope we
> can learn to cover some of the additional location cases we don't
> currently. Also we can retire the whole "continue" flag thing for cases
> like multi-register representations of structs; we can instead have a
> vlen 2 representation with registers in each slot. What's also nice
> about that is that it lines up the LOC_PROTO and FUNC_PROTO indices for
> parameters so the same index in LOC_PROTO has its type in FUNC_PROTO.
>
> In terms of specifics, I think removing the arg size from the type/size
> btf_type field is a good thing as you suggest; having to reinterpret
> negative values there is messy. So what about
>
> /* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0,
> name_off and type/size are 0.
> * It is followed by a singular "struct btf_loc_param" and a
> vlen-specified set of "struct btf_loc_param_data".
> */
>
> enum {
nit: name this enum, so we can refer to it from comments
> BTF_LOC_PARAM_REG_DATA,
> BTF_LOC_PARAM_CONST_DATA,
> };
>
> struct btf_loc_param {
> __u8 size; /* signed size; negative values represent signed
> * values of the specified size, for example -8
> * is an 8-byte signed value.
> */
> __u8 data; /* interpret struct btf_loc_param_data */
e.g., this will mention that this is enum btf_loc_param_kind from the above
> __u16 flags;
> };
>
> struct btf_loc_param_data {
> union {
> struct {
> __u16 reg; /* register number */
> __u16 flags; /* register dereference */
> __s32 offset; /* offset from
> register-stored address */
> };
> struct {
> __u32 val_lo32; /* lo 32 bits of 64-bit
> value */
> __u32 val_hi32; /* hi 32 bits of 64-bit
> value */
> };
> };
> };
I'd actually specify that each vlen element is 4 byte long (that's
minimal reasonable size we can use to keep everything aligned well),
and then just specify how to interpret those values depending on that
loc_param_kind. I.e., for register we can use vlen=1, and say that
those 4 bytes define register number (or whatever we will use to
identify the register). But for reg+offset we have vlen=2, where first
is register as before, second is offset value. And so on.
>
> I realize we have flags in two places (loc_param and loc_param_data for
> registers); just in case we needed some sort of mix of register value
> and register dereference I think that makes sense; haven't seen that in
> practice yet though. Let me know if the above is what you have in mind.
see above, I think having spec for each kind of param location and
using minimal amount of data will give us this future-proof approach.
We don't even have to define flags until we have them, just specify
that all unused bits/bytes should be zero, until used in the future.
>
>
> >> + * and is followed by a singular "struct btf_loc_param". type/size specifies
> >> + * the size of the associated location value. The size value should be
> >> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
> >> + * 8 byte value for example.
> >> + *
> >> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
> >> + * a register, possibly dereferencing it with the specified offset.
> >> + *
> >> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
> >> + * of either the 64-bit value or the register number, offset etc.
> >> + * Interpretation depends on whether the kind_flag is set as described above.
> >> + */
> >> +
> >> +/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
> >> + * values of the specific size, for example -8 is an 8-byte signed value.
> >> + */
> >> +#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
> >> +
> >> +/* location param specified by reg + offset is a dereference */
> >> +#define BTF_LOC_FLAG_REG_DEREF 0x1
> >> +/* next location param is needed to specify parameter location also; for example
> >> + * when two registers are used to store a 16-byte struct by value.
> >> + */
> >> +#define BTF_LOC_FLAG_CONTINUE 0x2
> >> +
> >> +struct btf_loc_param {
> >> + union {
> >> + struct {
> >> + __u16 reg; /* register number */
> >> + __u16 flags; /* register dereference */
> >> + __s32 offset; /* offset from register-stored address */
> >> + };
> >> + struct {
> >> + __u32 val_lo32; /* lo 32 bits of 64-bit value */
> >> + __u32 val_hi32; /* hi 32 bits of 64-bit value */
> >> + };
> >> + };
> >> +};
> >> +
> >> +/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
> >> + * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
> >> + * a vlen-specified number of __u32 which specify the associated
> >> + * BTF_KIND_LOC_PARAM for each function parameter associated with the
> >> + * location. The type should either be 0 (no location info) or point at
> >> + * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
> >> + * represent a single function parameter; in such a case each should specify
> >> + * BTF_LOC_FLAG_CONTINUE.
> >> + *
> >> + * The type field in the associated "struct btf_type" should point at an
> >> + * associated BTF_KIND_FUNC_PROTO.
> >> + */
> >> +
> >> +/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
> >> + * containing location site-specific information;
> >> + *
> >> + * - name associated with the location (name_off)
> >> + * - function prototype type id (func_proto)
> >> + * - location prototype type id (loc_proto)
> >> + * - address offset (offset)
> >> + */
> >> +
> >> +struct btf_loc {
> >> + __u32 name_off;
> >> + __u32 func_proto;
> >> + __u32 loc_proto;
> >> + __u32 offset;
> >> +};
> >
> > What is that offset relative to? Offset within the function in which
> > we were inlined? Do we know what that function is? I might have missed
> > how we represent that.
>
> The offset is relative to kernel base address (at compile-time the
> address of .text, at runtime the address of _start). The reasoning is we
> have to deal with kASLR which means any compile-time absolute address
> will likely change when the kernel is loaded. So we cannot deal in raw
> addresses, and to fixup the addresses we then gather kernel/module base
> address at runtime to compute the actual location of the inline site.
> See get_base_addr() in tools/lib/bpf/loc.c in patch 14 for an example of
> how this is done.
this makes sense, but this should be documented, IMO
>
> Given this, it might make sense to have a convention where the LOCSEC
> specifies the section name also, something like
>
> "inline.text"
>
> What do you think?
hm... I'd specify offsets relative to the KASLR base, uniformly.
Section name is a somewhat superficial detail in terms of tracing
kernel functions, I don't know if it's that important to group
functions by ELF section. (unless I'm missing where this would be
important for correctness?)
>
> >
> >> +
> >> +/* helps libbpf know that location declarations are present; libbpf
> >> + * can then work around absence if this value is not set.
> >> + */
> >> +#define BTF_KIND_LOC_UAPI_DEFINED 1
> >> +
> >
> > you don't mention that in the commit, I'll have to figure this out
> > from subsequent patches, but it would be nice to give an overview of
> > the purpose of this in this patch
> >
>
> This is a bit ugly, but is intended to help deal with the situation -
> which happens a lot with distros where we might want to build libbpf
> without latest UAPI headers (some distros may not get new UAPI headers
> for a while). The libbpf patches check if the above is defined, and if
> not supply their own location-related definitions. If in turn libbpf
> needs to define them, it defines BTF_KIND_LOC_LIBBPF_DEFINED. Finally
> pahole - which needs to compile both with a checkpointed libbpf commit
> and a libbpf that may be older and not have location definitions -
> checks for either, and if not present does a similar set of declarations
> to ensure compilation still succeeds. We use weak declarations of libbpf
> location-related functions locally to check if they are available at
> runtime; this dynamically determines if the inline feature is available.
>
> Not pretty, but it will help avioid some of the issues we had with BTF
> enum64 and compilation.
um... all this is completely unnecessary because libbpf is supplying
its own freshest UAPI headers it needs in Github mirror under
include/uapi/linux subdir. Distros should use those UAPI headers to
build libbpf from source.
So the above BTF_KIND_LOC_UAPI_DEFINED hack is not necessary.
>
> Thanks!
>
> Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-20 20:57 ` Andrii Nakryiko
@ 2025-10-23 8:17 ` Alan Maguire
2025-11-05 0:43 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 8:17 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 20/10/2025 21:57, Andrii Nakryiko wrote:
> On Fri, Oct 17, 2025 at 1:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 16/10/2025 19:36, Andrii Nakryiko wrote:
>>> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
>>>> to help represent location information for functions.
>>>>
>>>> BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
>>>> location; either via a register, or register+offset or a
>>>> constant value.
>>>>
>>>> BTF_KIND_LOC_PROTO represents location information about a location
>>>> with multiple BTF_KIND_LOC_PARAMs.
>>>>
>>>> And finally BTF_KIND_LOCSEC is a set of location sites, each
>>>> of which has
>>>>
>>>> - a name (function name)
>>>> - a function prototype specifying which types are associated
>>>> with parameters
>>>> - a location prototype specifying where to find those parameters
>>>> - an address offset
>>>>
>>>> This can be used to represent
>>>>
>>>> - a fully-inlined function
>>>> - a partially-inlined function where some _LOC_PROTOs represent
>>>> inlined sites as above and others have normal _FUNC representations
>>>> - a function with optimized parameters; again the FUNC_PROTO
>>>> represents the original function, with LOC info telling us
>>>> where to obtain each parameter (or 0 if the parameter is
>>>> unobtainable)
>>>>
>>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>>>> ---
>>>> include/linux/btf.h | 29 +++++-
>>>> include/uapi/linux/btf.h | 85 ++++++++++++++++-
>>>> kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
>>>> tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
>>>> 4 files changed, 359 insertions(+), 8 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>> @@ -78,6 +80,9 @@ enum {
>>>> BTF_KIND_DECL_TAG = 17, /* Decl Tag */
>>>> BTF_KIND_TYPE_TAG = 18, /* Type Tag */
>>>> BTF_KIND_ENUM64 = 19, /* Enumeration up to 64-bit values */
>>>> + BTF_KIND_LOC_PARAM = 20, /* Location parameter information */
>>>> + BTF_KIND_LOC_PROTO = 21, /* Location prototype for site */
>>>> + BTF_KIND_LOCSEC = 22, /* Location section */
>>>>
>>>> NR_BTF_KINDS,
>>>> BTF_KIND_MAX = NR_BTF_KINDS - 1,
>>>> @@ -198,4 +203,78 @@ struct btf_enum64 {
>>>> __u32 val_hi32;
>>>> };
>>>>
>>>> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
>>>
>>> what if we make LOC_PARAM variable-length (i.e., use vlen). We can
>>> always have a fixed 4 bytes value that will contain an arg size, maybe
>>> some flags, and an enum representing what kind of location spec it is
>>> (constant, register, reg-deref, reg+off, reg+off-deref, etc). And then
>>> depending on that enum we'll know how to interpret those vlen * 4
>>> bytes. This will give us extensibility to support more complicated
>>> expressions, when we will be ready to tackle them. Still nicely
>>> dedupable, though. WDYT?
>>>
>>
>> It's a great idea; extensibility is really important here as I hope we
>> can learn to cover some of the additional location cases we don't
>> currently. Also we can retire the whole "continue" flag thing for cases
>> like multi-register representations of structs; we can instead have a
>> vlen 2 representation with registers in each slot. What's also nice
>> about that is that it lines up the LOC_PROTO and FUNC_PROTO indices for
>> parameters so the same index in LOC_PROTO has its type in FUNC_PROTO.
>>
>> In terms of specifics, I think removing the arg size from the type/size
>> btf_type field is a good thing as you suggest; having to reinterpret
>> negative values there is messy. So what about
>>
>> /* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0,
>> name_off and type/size are 0.
>> * It is followed by a singular "struct btf_loc_param" and a
>> vlen-specified set of "struct btf_loc_param_data".
>> */
>>
>> enum {
>
> nit: name this enum, so we can refer to it from comments
>
>> BTF_LOC_PARAM_REG_DATA,
>> BTF_LOC_PARAM_CONST_DATA,
>> };
>>
>> struct btf_loc_param {
>> __u8 size; /* signed size; negative values represent signed
>> * values of the specified size, for example -8
>> * is an 8-byte signed value.
>> */
>> __u8 data; /* interpret struct btf_loc_param_data */
>
> e.g., this will mention that this is enum btf_loc_param_kind from the above
>
>> __u16 flags;
>> };
>>
>> struct btf_loc_param_data {
>> union {
>> struct {
>> __u16 reg; /* register number */
>> __u16 flags; /* register dereference */
>> __s32 offset; /* offset from
>> register-stored address */
>> };
>> struct {
>> __u32 val_lo32; /* lo 32 bits of 64-bit
>> value */
>> __u32 val_hi32; /* hi 32 bits of 64-bit
>> value */
>> };
>> };
>> };
>
> I'd actually specify that each vlen element is 4 byte long (that's
> minimal reasonable size we can use to keep everything aligned well),
> and then just specify how to interpret those values depending on that
> loc_param_kind. I.e., for register we can use vlen=1, and say that
> those 4 bytes define register number (or whatever we will use to
> identify the register). But for reg+offset we have vlen=2, where first
> is register as before, second is offset value. And so on.
>
>>
>> I realize we have flags in two places (loc_param and loc_param_data for
>> registers); just in case we needed some sort of mix of register value
>> and register dereference I think that makes sense; haven't seen that in
>> practice yet though. Let me know if the above is what you have in mind.
>
> see above, I think having spec for each kind of param location and
> using minimal amount of data will give us this future-proof approach.
> We don't even have to define flags until we have them, just specify
> that all unused bits/bytes should be zero, until used in the future.
>
Sounds good on the enum+vlen specification. I've managed to cover all
but ~2100 locations in the kernel DWARF with the existing scheme, and
all those would work well for this approach too. I did a bit of
investigation and the remainder that aren't covered have between 2 and
20 location ops/values associated with them. A few are of the uncovered
cases are of the form (register_value & const_mask), ~register_value,
that sort of thing. I think we could cover some of the easier ones like
that with this scheme too, e.g. have a enum btf_loc_param_kind of
REG__WITH_CONST_OP or similar which has vlen 4 (reg#, op, 64 bits for
const). I think anything more complex than that we probably don't want
to worry about.
>>
>>
>>>> + * and is followed by a singular "struct btf_loc_param". type/size specifies
>>>> + * the size of the associated location value. The size value should be
>>>> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
>>>> + * 8 byte value for example.
>>>> + *
>>>> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
>>>> + * a register, possibly dereferencing it with the specified offset.
>>>> + *
>>>> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
>>>> + * of either the 64-bit value or the register number, offset etc.
>>>> + * Interpretation depends on whether the kind_flag is set as described above.
>>>> + */
>>>> +
>>>> +/* BTF_KIND_LOC_PARAM specifies a signed size; negative values represent signed
>>>> + * values of the specific size, for example -8 is an 8-byte signed value.
>>>> + */
>>>> +#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
>>>> +
>>>> +/* location param specified by reg + offset is a dereference */
>>>> +#define BTF_LOC_FLAG_REG_DEREF 0x1
>>>> +/* next location param is needed to specify parameter location also; for example
>>>> + * when two registers are used to store a 16-byte struct by value.
>>>> + */
>>>> +#define BTF_LOC_FLAG_CONTINUE 0x2
>>>> +
>>>> +struct btf_loc_param {
>>>> + union {
>>>> + struct {
>>>> + __u16 reg; /* register number */
>>>> + __u16 flags; /* register dereference */
>>>> + __s32 offset; /* offset from register-stored address */
>>>> + };
>>>> + struct {
>>>> + __u32 val_lo32; /* lo 32 bits of 64-bit value */
>>>> + __u32 val_hi32; /* hi 32 bits of 64-bit value */
>>>> + };
>>>> + };
>>>> +};
>>>> +
>>>> +/* BTF_KIND_LOC_PROTO specifies location prototypes; i.e. how locations relate
>>>> + * to parameters; a struct btf_type of BTF_KIND_LOC_PROTO is followed by a
>>>> + * a vlen-specified number of __u32 which specify the associated
>>>> + * BTF_KIND_LOC_PARAM for each function parameter associated with the
>>>> + * location. The type should either be 0 (no location info) or point at
>>>> + * a BTF_KIND_LOC_PARAM. Multiple BTF_KIND_LOC_PARAMs can be used to
>>>> + * represent a single function parameter; in such a case each should specify
>>>> + * BTF_LOC_FLAG_CONTINUE.
>>>> + *
>>>> + * The type field in the associated "struct btf_type" should point at an
>>>> + * associated BTF_KIND_FUNC_PROTO.
>>>> + */
>>>> +
>>>> +/* BTF_KIND_LOCSEC consists of vlen-specified number of "struct btf_loc"
>>>> + * containing location site-specific information;
>>>> + *
>>>> + * - name associated with the location (name_off)
>>>> + * - function prototype type id (func_proto)
>>>> + * - location prototype type id (loc_proto)
>>>> + * - address offset (offset)
>>>> + */
>>>> +
>>>> +struct btf_loc {
>>>> + __u32 name_off;
>>>> + __u32 func_proto;
>>>> + __u32 loc_proto;
>>>> + __u32 offset;
>>>> +};
>>>
>>> What is that offset relative to? Offset within the function in which
>>> we were inlined? Do we know what that function is? I might have missed
>>> how we represent that.
>>
>> The offset is relative to kernel base address (at compile-time the
>> address of .text, at runtime the address of _start). The reasoning is we
>> have to deal with kASLR which means any compile-time absolute address
>> will likely change when the kernel is loaded. So we cannot deal in raw
>> addresses, and to fixup the addresses we then gather kernel/module base
>> address at runtime to compute the actual location of the inline site.
>> See get_base_addr() in tools/lib/bpf/loc.c in patch 14 for an example of
>> how this is done.
>
> this makes sense, but this should be documented, IMO
>
Definitely. Will do next time.
>>
>> Given this, it might make sense to have a convention where the LOCSEC
>> specifies the section name also, something like
>>
>> "inline.text"
>>
>> What do you think?
>
> hm... I'd specify offsets relative to the KASLR base, uniformly.
> Section name is a somewhat superficial detail in terms of tracing
> kernel functions, I don't know if it's that important to group
> functions by ELF section. (unless I'm missing where this would be
> important for correctness?)
>
There are cases where the code lives in a different section from .text
but I think the main case I came across here was stuff like .init
sections in modules that don't hang around after module initialization,
so there's probably no need to handle them specially.
>>
>>>
>>>> +
>>>> +/* helps libbpf know that location declarations are present; libbpf
>>>> + * can then work around absence if this value is not set.
>>>> + */
>>>> +#define BTF_KIND_LOC_UAPI_DEFINED 1
>>>> +
>>>
>>> you don't mention that in the commit, I'll have to figure this out
>>> from subsequent patches, but it would be nice to give an overview of
>>> the purpose of this in this patch
>>>
>>
>> This is a bit ugly, but is intended to help deal with the situation -
>> which happens a lot with distros where we might want to build libbpf
>> without latest UAPI headers (some distros may not get new UAPI headers
>> for a while). The libbpf patches check if the above is defined, and if
>> not supply their own location-related definitions. If in turn libbpf
>> needs to define them, it defines BTF_KIND_LOC_LIBBPF_DEFINED. Finally
>> pahole - which needs to compile both with a checkpointed libbpf commit
>> and a libbpf that may be older and not have location definitions -
>> checks for either, and if not present does a similar set of declarations
>> to ensure compilation still succeeds. We use weak declarations of libbpf
>> location-related functions locally to check if they are available at
>> runtime; this dynamically determines if the inline feature is available.
>>
>> Not pretty, but it will help avioid some of the issues we had with BTF
>> enum64 and compilation.
>
> um... all this is completely unnecessary because libbpf is supplying
> its own freshest UAPI headers it needs in Github mirror under
> include/uapi/linux subdir. Distros should use those UAPI headers to
> build libbpf from source.
>
> So the above BTF_KIND_LOC_UAPI_DEFINED hack is not necessary.
>
Ok sounds good, but we do still have a problem for pahole; it can be
built against an external shared library (i.e. non-embedded) libbpf. It
might make more sense for pahole to include uapi headers from the synced
commit in case it is using non-embedded libbpf (in the non-embedded
libbpf case we don't even pull the libbpf git submodule so might need a
local copy).
Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-23 8:17 ` Alan Maguire
@ 2025-11-05 0:43 ` Andrii Nakryiko
0 siblings, 0 replies; 63+ messages in thread
From: Andrii Nakryiko @ 2025-11-05 0:43 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Thu, Oct 23, 2025 at 1:17 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 20/10/2025 21:57, Andrii Nakryiko wrote:
> > On Fri, Oct 17, 2025 at 1:43 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 16/10/2025 19:36, Andrii Nakryiko wrote:
> >>> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>>
> >>>> Add BTF_KIND_LOC_PARAM, BTF_KIND_LOC_PROTO and BTF_KIND_LOCSEC
> >>>> to help represent location information for functions.
> >>>>
> >>>> BTF_KIND_LOC_PARAM is used to represent how we retrieve data at a
> >>>> location; either via a register, or register+offset or a
> >>>> constant value.
> >>>>
> >>>> BTF_KIND_LOC_PROTO represents location information about a location
> >>>> with multiple BTF_KIND_LOC_PARAMs.
> >>>>
> >>>> And finally BTF_KIND_LOCSEC is a set of location sites, each
> >>>> of which has
> >>>>
> >>>> - a name (function name)
> >>>> - a function prototype specifying which types are associated
> >>>> with parameters
> >>>> - a location prototype specifying where to find those parameters
> >>>> - an address offset
> >>>>
> >>>> This can be used to represent
> >>>>
> >>>> - a fully-inlined function
> >>>> - a partially-inlined function where some _LOC_PROTOs represent
> >>>> inlined sites as above and others have normal _FUNC representations
> >>>> - a function with optimized parameters; again the FUNC_PROTO
> >>>> represents the original function, with LOC info telling us
> >>>> where to obtain each parameter (or 0 if the parameter is
> >>>> unobtainable)
> >>>>
> >>>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >>>> ---
> >>>> include/linux/btf.h | 29 +++++-
> >>>> include/uapi/linux/btf.h | 85 ++++++++++++++++-
> >>>> kernel/bpf/btf.c | 168 ++++++++++++++++++++++++++++++++-
> >>>> tools/include/uapi/linux/btf.h | 85 ++++++++++++++++-
> >>>> 4 files changed, 359 insertions(+), 8 deletions(-)
[...]
> >>
> >>>
> >>>> +
> >>>> +/* helps libbpf know that location declarations are present; libbpf
> >>>> + * can then work around absence if this value is not set.
> >>>> + */
> >>>> +#define BTF_KIND_LOC_UAPI_DEFINED 1
> >>>> +
> >>>
> >>> you don't mention that in the commit, I'll have to figure this out
> >>> from subsequent patches, but it would be nice to give an overview of
> >>> the purpose of this in this patch
> >>>
> >>
> >> This is a bit ugly, but is intended to help deal with the situation -
> >> which happens a lot with distros where we might want to build libbpf
> >> without latest UAPI headers (some distros may not get new UAPI headers
> >> for a while). The libbpf patches check if the above is defined, and if
> >> not supply their own location-related definitions. If in turn libbpf
> >> needs to define them, it defines BTF_KIND_LOC_LIBBPF_DEFINED. Finally
> >> pahole - which needs to compile both with a checkpointed libbpf commit
> >> and a libbpf that may be older and not have location definitions -
> >> checks for either, and if not present does a similar set of declarations
> >> to ensure compilation still succeeds. We use weak declarations of libbpf
> >> location-related functions locally to check if they are available at
> >> runtime; this dynamically determines if the inline feature is available.
> >>
> >> Not pretty, but it will help avioid some of the issues we had with BTF
> >> enum64 and compilation.
> >
> > um... all this is completely unnecessary because libbpf is supplying
> > its own freshest UAPI headers it needs in Github mirror under
> > include/uapi/linux subdir. Distros should use those UAPI headers to
> > build libbpf from source.
> >
> > So the above BTF_KIND_LOC_UAPI_DEFINED hack is not necessary.
> >
>
> Ok sounds good, but we do still have a problem for pahole; it can be
> built against an external shared library (i.e. non-embedded) libbpf. It
> might make more sense for pahole to include uapi headers from the synced
> commit in case it is using non-embedded libbpf (in the non-embedded
> libbpf case we don't even pull the libbpf git submodule so might need a
> local copy).
In the years I've been around the BPF ecosystem, I haven't seen a
single case outside of systemd's super dynamic plugin-like setup where
libbpf as a shared library would make sense. I don't think we should
add random #defines in kernel UAPI just to make such self-imposed
painful setups easier. And yes, I know how distros love their shared
libraries... :)
>
> Thanks!
>
> Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-08 17:34 ` [RFC bpf-next 01/15] bpf: Extend UAPI to support location information Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-23 0:56 ` Eduard Zingerman
2025-10-23 8:35 ` Alan Maguire
1 sibling, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 0:56 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
[...]
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index f06976ffb63f..65091c6aff4b 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
[...]
> > @@ -552,7 +579,7 @@ struct btf_field_desc {
> /* member struct size, or zero, if no members */
> int m_sz;
> /* repeated per-member offsets */
> - int m_off_cnt, m_offs[1];
> + int m_off_cnt, m_offs[2];
> };
Should this be a part of patch #2?
Commit message of the patch #2 explains why its needed.
> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
> index 266d4ffa6c07..a74b9d202847 100644
> --- a/include/uapi/linux/btf.h
> +++ b/include/uapi/linux/btf.h
[...]
> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
> + * and is followed by a singular "struct btf_loc_param". type/size specifies
> + * the size of the associated location value. The size value should be
> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
> + * 8 byte value for example.
Not sure it matters after Andrii's suggestion, but I find this
description a bit cryptic. Maybe just note that (s32)(t)->size
can be -8, -4, -2 for signed values, 2, 4, 8 for unsigned values,
and its absolute value denotes the size of the value in bytes?
+1 to Andrii's suggestion to use enum to represent btf_loc_param "tag".
Also, what register numbering scheme is used?
Probably should be mentioned in the docstring.
> + *
> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
> + * a register, possibly dereferencing it with the specified offset.
> + *
> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
> + * of either the 64-bit value or the register number, offset etc.
> + * Interpretation depends on whether the kind_flag is set as described above.
> + */
[...]
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 0de8fc8a0e0b..29cec549f119 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
[...]
> +static void btf_loc_param_log(struct btf_verifier_env *env,
> + const struct btf_type *t)
> +{
> + const struct btf_loc_param *p = btf_loc_param(t);
> +
> + if (btf_type_kflag(t))
> + btf_verifier_log(env, "type=%u const=%lld", t->type, btf_loc_param_value(t));
> + else
> + btf_verifier_log(env, "type=%u reg=%u flags=%d offset %u",
Nit: print flags in hex?
> + t->type, p->reg, p->flags, p->offset);
> +}
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 01/15] bpf: Extend UAPI to support location information
2025-10-23 0:56 ` Eduard Zingerman
@ 2025-10-23 8:35 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 8:35 UTC (permalink / raw)
To: Eduard Zingerman, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On 23/10/2025 01:56, Eduard Zingerman wrote:
> On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
>
> [...]
>
>> diff --git a/include/linux/btf.h b/include/linux/btf.h
>> index f06976ffb63f..65091c6aff4b 100644
>> --- a/include/linux/btf.h
>> +++ b/include/linux/btf.h
>
> [...]
>
>>> @@ -552,7 +579,7 @@ struct btf_field_desc {
>> /* member struct size, or zero, if no members */
>> int m_sz;
>> /* repeated per-member offsets */
>> - int m_off_cnt, m_offs[1];
>> + int m_off_cnt, m_offs[2];
>> };
>
> Should this be a part of patch #2?
> Commit message of the patch #2 explains why its needed.
>
Probably should be; I tried to keep the kernel stuff in patch 1 but this
particular change is more conceptually related to the changes in patch 2
alright given that we share field iteration. Can move all the field
iterator changes into a distinct patch for clarity.
>> diff --git a/include/uapi/linux/btf.h b/include/uapi/linux/btf.h
>> index 266d4ffa6c07..a74b9d202847 100644
>> --- a/include/uapi/linux/btf.h
>> +++ b/include/uapi/linux/btf.h
>
> [...]
>
>> +/* BTF_KIND_LOC_PARAM consists a btf_type specifying a vlen of 0, name_off is 0
>> + * and is followed by a singular "struct btf_loc_param". type/size specifies
>> + * the size of the associated location value. The size value should be
>> + * cast to a __s32 as negative sizes can be specified; -8 to indicate a signed
>> + * 8 byte value for example.
>
> Not sure it matters after Andrii's suggestion, but I find this
> description a bit cryptic. Maybe just note that (s32)(t)->size
> can be -8, -4, -2 for signed values, 2, 4, 8 for unsigned values,
> and its absolute value denotes the size of the value in bytes?
>
> +1 to Andrii's suggestion to use enum to represent btf_loc_param "tag".
>
> Also, what register numbering scheme is used?
> Probably should be mentioned in the docstring.
>
Good point; it's basically the register numbering we get from DWARF to
ensure it's arch-agnostic. Regs 0-31 map the same way they do for DWARF
and reg 33 is fp.
>> + *
>> + * If kind_flag is 1 the btf_loc is a constant value, otherwise it represents
>> + * a register, possibly dereferencing it with the specified offset.
>> + *
>> + * "struct btf_type" is followed by a "struct btf_loc_param" which consists
>> + * of either the 64-bit value or the register number, offset etc.
>> + * Interpretation depends on whether the kind_flag is set as described above.
>> + */
>
> [...]
>
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 0de8fc8a0e0b..29cec549f119 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>
> [...]
>
>> +static void btf_loc_param_log(struct btf_verifier_env *env,
>> + const struct btf_type *t)
>> +{
>> + const struct btf_loc_param *p = btf_loc_param(t);
>> +
>> + if (btf_type_kflag(t))
>> + btf_verifier_log(env, "type=%u const=%lld", t->type, btf_loc_param_value(t));
>> + else
>> + btf_verifier_log(env, "type=%u reg=%u flags=%d offset %u",
>
> Nit: print flags in hex?
>
yep good idea. Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
2025-10-08 17:34 ` [RFC bpf-next 01/15] bpf: Extend UAPI to support location information Alan Maguire
@ 2025-10-08 17:34 ` Alan Maguire
2025-10-23 0:57 ` Eduard Zingerman
2025-10-23 19:18 ` Eduard Zingerman
2025-10-08 17:34 ` [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup() Alan Maguire
` (14 subsequent siblings)
16 siblings, 2 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:34 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add support for new kinds to libbpf. BTF_KIND_LOC_PARAM and
BTF_KIND_LOC_PROTO are dedup-able so add support for their
deduplication, whereas since BTF_KIND_LOCSEC contains a unique
offset it is not.
Other considerations: because BTF_KIND_LOCSEC has multiple
member type ids we need to increase the number of member elements
to 2 in the field iterator.
Add APIs to add location param, location prototypes and location
sections.
For BTF relocation we add location info to split BTF.
One small thing noticed during testing; the test for adding_to_base
relies on the split BTF start id being > 1; however it is possible
to have empty distilled base BTF, so this test should be generalized
to check for the base BTF pointer (it will be non-NULL for split
BTF even if the base BTF is empty).
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/btf.c | 339 +++++++++++++++++++++++++++++++-
tools/lib/bpf/btf.h | 90 +++++++++
tools/lib/bpf/btf_dump.c | 10 +-
tools/lib/bpf/btf_iter.c | 23 +++
tools/lib/bpf/libbpf.map | 5 +
tools/lib/bpf/libbpf_internal.h | 4 +-
6 files changed, 464 insertions(+), 7 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 18907f0fcf9f..0abd7831d6b4 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -327,6 +327,12 @@ static int btf_type_size(const struct btf_type *t)
return base_size + vlen * sizeof(struct btf_var_secinfo);
case BTF_KIND_DECL_TAG:
return base_size + sizeof(struct btf_decl_tag);
+ case BTF_KIND_LOC_PARAM:
+ return base_size + sizeof(__u64);
+ case BTF_KIND_LOC_PROTO:
+ return base_size + vlen * sizeof(__u32);
+ case BTF_KIND_LOCSEC:
+ return base_size + vlen * sizeof(struct btf_loc);
default:
pr_debug("Unsupported BTF_KIND:%u\n", btf_kind(t));
return -EINVAL;
@@ -343,12 +349,15 @@ static void btf_bswap_type_base(struct btf_type *t)
static int btf_bswap_type_rest(struct btf_type *t)
{
struct btf_var_secinfo *v;
+ struct btf_loc_param *lp;
struct btf_enum64 *e64;
struct btf_member *m;
struct btf_array *a;
struct btf_param *p;
struct btf_enum *e;
+ struct btf_loc *l;
__u16 vlen = btf_vlen(t);
+ __u32 *ids;
int i;
switch (btf_kind(t)) {
@@ -411,6 +420,30 @@ static int btf_bswap_type_rest(struct btf_type *t)
case BTF_KIND_DECL_TAG:
btf_decl_tag(t)->component_idx = bswap_32(btf_decl_tag(t)->component_idx);
return 0;
+ case BTF_KIND_LOC_PARAM:
+ lp = btf_loc_param(t);
+ if (btf_kflag(t)) {
+ lp->val_lo32 = bswap_32(lp->val_lo32);
+ lp->val_hi32 = bswap_32(lp->val_hi32);
+ } else {
+ lp->reg = bswap_16(lp->reg);
+ lp->flags = bswap_16(lp->flags);
+ lp->offset = bswap_32(lp->offset);
+ }
+ return 0;
+ case BTF_KIND_LOC_PROTO:
+ for (i = 0, ids = btf_loc_proto_params(t); i < vlen; i++, ids++)
+ *ids = bswap_32(*ids);
+ return 0;
+ case BTF_KIND_LOCSEC:
+ for (i = 0, l = btf_locsec_locs(t); i < vlen; i++, l++) {
+ l->name_off = bswap_32(l->name_off);
+ l->func_proto = bswap_32(l->func_proto);
+ l->loc_proto = bswap_32(l->loc_proto);
+ l->offset = bswap_32(l->offset);
+ }
+ return 0;
+
default:
pr_debug("Unsupported BTF_KIND:%u\n", btf_kind(t));
return -EINVAL;
@@ -588,6 +621,34 @@ static int btf_validate_type(const struct btf *btf, const struct btf_type *t, __
}
break;
}
+ case BTF_KIND_LOC_PARAM:
+ break;
+ case BTF_KIND_LOC_PROTO: {
+ __u32 *p = btf_loc_proto_params(t);
+
+ n = btf_vlen(t);
+ for (i = 0; i < n; i++, p++) {
+ err = btf_validate_id(btf, *p, id);
+ if (err)
+ return err;
+ }
+ break;
+ }
+ case BTF_KIND_LOCSEC: {
+ const struct btf_loc *l = btf_locsec_locs(t);
+
+ n = btf_vlen(t);
+ for (i = 0; i < n; i++, l++) {
+ err = btf_validate_str(btf, l->name_off, "loc name", id);
+ if (!err)
+ err = btf_validate_id(btf, l->func_proto, id);
+ if (!err)
+ btf_validate_id(btf, l->loc_proto, id);
+ if (err)
+ return err;
+ }
+ break;
+ }
default:
pr_warn("btf: type [%u]: unrecognized kind %u\n", id, kind);
return -EINVAL;
@@ -2993,6 +3054,183 @@ int btf__add_decl_attr(struct btf *btf, const char *value, int ref_type_id,
return btf_add_decl_tag(btf, value, ref_type_id, component_idx, 1);
}
+/*
+ * Append new BTF_KIND_LOC_PARAM with either
+ * - *value* set as __u64 value following btf_type, with info->kflag set to 1
+ * if *is_value* is true; or
+ * - *reg* number, *flags* and *offset* set if *is_value* is set to 0, and
+ * info->kflag set to 0.
+ * Returns:
+ * - >0, type ID of newly added BTF type;
+ * - <0, on error.
+ */
+int btf__add_loc_param(struct btf *btf, __s32 size, bool is_value, __u64 value,
+ __u16 reg, __u16 flags, __s32 offset)
+{
+ struct btf_loc_param *p;
+ struct btf_type *t;
+ int sz;
+
+ if (btf_ensure_modifiable(btf))
+ return libbpf_err(-ENOMEM);
+
+ sz = sizeof(struct btf_type) + sizeof(__u64);
+ t = btf_add_type_mem(btf, sz);
+ if (!t)
+ return libbpf_err(-ENOMEM);
+
+ t->name_off = 0;
+ t->info = btf_type_info(BTF_KIND_LOC_PARAM, 0, is_value);
+ t->size = size;
+
+ p = btf_loc_param(t);
+
+ if (is_value) {
+ p->val_lo32 = value & 0xffffffff;
+ p->val_hi32 = value >> 32;
+ } else {
+ p->reg = reg;
+ p->flags = flags;
+ p->offset = offset;
+ }
+ return btf_commit_type(btf, sz);
+}
+
+/*
+ * Append new BTF_KIND_LOC_PROTO
+ *
+ * The prototype is then populated with 0 or more BTF_KIND_LOC_PARAMs via
+ * btf__add_loc_proto_param(); similar to how btf__add_func_param() adds
+ * parameters to a FUNC_PROTO.
+ *
+ * Returns:
+ * - >0, type ID of newly added BTF type;
+ * - <0, on error.
+ */
+int btf__add_loc_proto(struct btf *btf)
+{
+ struct btf_type *t;
+
+ if (btf_ensure_modifiable(btf))
+ return libbpf_err(-ENOMEM);
+
+ t = btf_add_type_mem(btf, sizeof(struct btf_type));
+ if (!t)
+ return libbpf_err(-ENOMEM);
+
+ t->name_off = 0;
+ t->info = btf_type_info(BTF_KIND_LOC_PROTO, 0, 0);
+ t->size = 0;
+
+ return btf_commit_type(btf, sizeof(struct btf_type));
+}
+
+int btf__add_loc_proto_param(struct btf *btf, __u32 id)
+{
+ struct btf_type *t;
+ __u32 *p;
+ int sz;
+
+ if (validate_type_id(id))
+ return libbpf_err(-EINVAL);
+
+ /* last type should be BTF_KIND_LOC_PROTO */
+ if (btf->nr_types == 0)
+ return libbpf_err(-EINVAL);
+ t = btf_last_type(btf);
+ if (!btf_is_loc_proto(t))
+ return libbpf_err(-EINVAL);
+
+ /* decompose and invalidate raw data */
+ if (btf_ensure_modifiable(btf))
+ return libbpf_err(-ENOMEM);
+
+ sz = sizeof(__u32);
+ p = btf_add_type_mem(btf, sz);
+ if (!p)
+ return libbpf_err(-ENOMEM);
+ *p = id;
+
+ /* update parent type's vlen */
+ t = btf_last_type(btf);
+ btf_type_inc_vlen(t);
+
+ btf->hdr->type_len += sz;
+ btf->hdr->str_off += sz;
+ return 0;
+}
+
+int btf__add_locsec(struct btf *btf, const char *name)
+{
+ struct btf_type *t;
+ int name_off = 0;
+
+ if (btf_ensure_modifiable(btf))
+ return libbpf_err(-ENOMEM);
+
+ if (name && name[0]) {
+ name_off = btf__add_str(btf, name);
+ if (name_off < 0)
+ return name_off;
+ }
+ t = btf_add_type_mem(btf, sizeof(struct btf_type));
+ if (!t)
+ return libbpf_err(-ENOMEM);
+
+ t->name_off = name_off;
+ t->info = btf_type_info(BTF_KIND_LOCSEC, 0, 0);
+ t->size = 0;
+
+ return btf_commit_type(btf, sizeof(struct btf_type));
+}
+
+int btf__add_locsec_loc(struct btf *btf, const char *name, __u32 func_proto, __u32 loc_proto,
+ __u32 offset)
+{
+ struct btf_type *t;
+ struct btf_loc *l;
+ int name_off, sz;
+
+ if (!name || !name[0])
+ return libbpf_err(-EINVAL);
+
+ if (validate_type_id(func_proto) || validate_type_id(loc_proto))
+ return libbpf_err(-EINVAL);
+
+ /* last type should be BTF_KIND_LOCSEC */
+ if (btf->nr_types == 0)
+ return libbpf_err(-EINVAL);
+ t = btf_last_type(btf);
+ if (!btf_is_locsec(t))
+ return libbpf_err(-EINVAL);
+
+ /* decompose and invalidate raw data */
+ if (btf_ensure_modifiable(btf))
+ return libbpf_err(-ENOMEM);
+
+ name_off = btf__add_str(btf, name);
+ if (name_off < 0)
+ return name_off;
+
+ sz = sizeof(*l);
+ l = btf_add_type_mem(btf, sz);
+ if (!l)
+ return libbpf_err(-ENOMEM);
+
+ l->name_off = name_off;
+ l->func_proto = func_proto;
+ l->loc_proto = loc_proto;
+ l->offset = offset;
+
+ /* update parent type's vlen */
+ t = btf_last_type(btf);
+ btf_type_inc_vlen(t);
+
+ btf->hdr->type_len += sz;
+ btf->hdr->str_off += sz;
+ return 0;
+}
+
struct btf_ext_sec_info_param {
__u32 off;
__u32 len;
@@ -3760,8 +3998,8 @@ static struct btf_dedup *btf_dedup_new(struct btf *btf, const struct btf_dedup_o
for (i = 1; i < type_cnt; i++) {
struct btf_type *t = btf_type_by_id(d->btf, i);
- /* VAR and DATASEC are never deduped and are self-canonical */
- if (btf_is_var(t) || btf_is_datasec(t))
+ /* VAR DATASEC and LOCSEC are never deduped and are self-canonical */
+ if (btf_is_var(t) || btf_is_datasec(t) || btf_is_locsec(t))
d->map[i] = i;
else
d->map[i] = BTF_UNPROCESSED_ID;
@@ -4001,6 +4239,26 @@ static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2)
return btf_equal_enum64_members(t1, t2);
}
+static long btf_hash_loc_proto(struct btf_type *t)
+{
+ __u32 *p = btf_loc_proto_params(t);
+ long h = btf_hash_common(t);
+ int i, vlen = btf_vlen(t);
+
+ for (i = 0; i < vlen; i++, p++)
+ h = hash_combine(h, *p);
+ return h;
+}
+
+static bool btf_equal_loc_param(struct btf_type *t1, struct btf_type *t2)
+{
+ if (!btf_equal_common(t1, t2))
+ return false;
+ return btf_kflag(t1) == btf_kflag(t2) &&
+ t1->size == t2->size &&
+ btf_loc_param_value(t1) == btf_loc_param_value(t2);
+}
+
static inline bool btf_is_enum_fwd(struct btf_type *t)
{
return btf_is_any_enum(t) && btf_vlen(t) == 0;
@@ -4214,7 +4472,8 @@ static int btf_dedup_prep(struct btf_dedup *d)
switch (btf_kind(t)) {
case BTF_KIND_VAR:
case BTF_KIND_DATASEC:
- /* VAR and DATASEC are never hash/deduplicated */
+ case BTF_KIND_LOCSEC:
+ /* VAR DATASEC and LOCSEC are never hash/deduplicated */
continue;
case BTF_KIND_CONST:
case BTF_KIND_VOLATILE:
@@ -4245,6 +4504,12 @@ static int btf_dedup_prep(struct btf_dedup *d)
case BTF_KIND_FUNC_PROTO:
h = btf_hash_fnproto(t);
break;
+ case BTF_KIND_LOC_PARAM:
+ h = btf_hash_common(t);
+ break;
+ case BTF_KIND_LOC_PROTO:
+ h = btf_hash_loc_proto(t);
+ break;
default:
pr_debug("unknown kind %d for type [%d]\n", btf_kind(t), type_id);
return -EINVAL;
@@ -4287,6 +4552,8 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
case BTF_KIND_DATASEC:
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
return 0;
case BTF_KIND_INT:
@@ -4336,6 +4603,18 @@ static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
}
break;
+ case BTF_KIND_LOC_PARAM:
+ h = btf_hash_common(t);
+ for_each_dedup_cand(d, hash_entry, h) {
+ cand_id = hash_entry->value;
+ cand = btf_type_by_id(d->btf, cand_id);
+ if (btf_equal_loc_param(t, cand)) {
+ new_id = cand_id;
+ break;
+ }
+ }
+ break;
+
default:
return -EINVAL;
}
@@ -4749,6 +5028,13 @@ static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
return 1;
}
+ case BTF_KIND_LOC_PARAM:
+ return btf_equal_loc_param(cand_type, canon_type);
+
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
+ return 0;
+
default:
return -EINVAL;
}
@@ -5075,6 +5361,45 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
break;
}
+ case BTF_KIND_LOC_PROTO: {
+ __u32 *p1, *p2;
+ __u16 i, vlen;
+
+ p1 = btf_loc_proto_params(t);
+ vlen = btf_vlen(t);
+
+ for (i = 0; i < vlen; i++, p1++) {
+ ref_type_id = btf_dedup_ref_type(d, *p1);
+ if (ref_type_id < 0)
+ return ref_type_id;
+ *p1 = ref_type_id;
+ }
+
+ h = btf_hash_loc_proto(t);
+ for_each_dedup_cand(d, hash_entry, h) {
+ cand_id = hash_entry->value;
+ cand = btf_type_by_id(d->btf, cand_id);
+ if (!btf_equal_common(t, cand))
+ continue;
+ vlen = btf_vlen(cand);
+ p1 = btf_loc_proto_params(t);
+ p2 = btf_loc_proto_params(cand);
+ if (vlen == 0) {
+ new_id = cand_id;
+ break;
+ }
+ for (i = 0; i < vlen; i++, p1++, p2++) {
+ if (*p1 != *p2)
+ break;
+ new_id = cand_id;
+ break;
+ }
+ if (new_id == cand_id)
+ break;
+ }
+ break;
+ }
+
default:
return -EINVAL;
}
@@ -5555,6 +5880,8 @@ static int btf_add_distilled_type_ids(struct btf_distill *dist, __u32 i)
case BTF_KIND_VOLATILE:
case BTF_KIND_FUNC_PROTO:
case BTF_KIND_TYPE_TAG:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
dist->id_map[*id] = *id;
break;
default:
@@ -5580,12 +5907,11 @@ static int btf_add_distilled_type_ids(struct btf_distill *dist, __u32 i)
static int btf_add_distilled_types(struct btf_distill *dist)
{
- bool adding_to_base = dist->pipe.dst->start_id == 1;
+ bool adding_to_base = dist->pipe.dst->base_btf == NULL;
int id = btf__type_cnt(dist->pipe.dst);
struct btf_type *t;
int i, err = 0;
-
/* Add types for each of the required references to either distilled
* base or split BTF, depending on type characteristics.
*/
@@ -5650,6 +5976,9 @@ static int btf_add_distilled_types(struct btf_distill *dist)
case BTF_KIND_VOLATILE:
case BTF_KIND_FUNC_PROTO:
case BTF_KIND_TYPE_TAG:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
/* All other types are added to split BTF. */
if (adding_to_base)
continue;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index ccfd905f03df..0f55518a2be0 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -247,6 +247,18 @@ LIBBPF_API int btf__add_decl_tag(struct btf *btf, const char *value, int ref_typ
LIBBPF_API int btf__add_decl_attr(struct btf *btf, const char *value, int ref_type_id,
int component_idx);
+LIBBPF_API int btf__add_loc_param(struct btf *btf, __s32 size, bool is_value, __u64 value,
+ __u16 reg, __u16 flags, __s32 offset);
+
+LIBBPF_API int btf__add_loc_proto(struct btf *btf);
+
+LIBBPF_API int btf__add_loc_proto_param(struct btf *btf, __u32 id);
+
+LIBBPF_API int btf__add_locsec(struct btf *btf, const char *name);
+
+LIBBPF_API int btf__add_locsec_loc(struct btf *btf, const char *name, __u32 func_proto,
+ __u32 loc_proto, __u32 offset);
+
struct btf_dedup_opts {
size_t sz;
/* optional .BTF.ext info to dedup along the main BTF info */
@@ -360,6 +372,42 @@ btf_dump__dump_type_data(struct btf_dump *d, __u32 id,
#define BTF_KIND_TYPE_TAG 18 /* Type Tag */
#define BTF_KIND_ENUM64 19 /* Enum for up-to 64bit values */
+#ifndef BTF_KIND_LOC_UAPI_DEFINED
+#define BTF_KIND_LOC_LIBBPF_DEFINED
+#define BTF_KIND_LOC_PARAM 20
+#define BTF_KIND_LOC_PROTO 21
+#define BTF_KIND_LOCSEC 22
+
+#define BTF_TYPE_LOC_PARAM_SIZE(t) ((__s32)((t)->size))
+#define BTF_LOC_FLAG_DEREF 0x1
+#define BTF_LOC_FLAG_CONTINUE 0x2
+
+struct btf_loc_param {
+ union {
+ struct {
+ __u16 reg; /* register number */
+ __u16 flags; /* register dereference */
+ __s32 offset; /* offset from register-stored address */
+ };
+ struct {
+ __u32 val_lo32; /* lo 32 bits of 64-bit value */
+ __u32 val_hi32; /* hi 32 bits of 64-bit value */
+ };
+ };
+};
+
+struct btf_loc {
+ __u32 name_off;
+ __u32 func_proto;
+ __u32 loc_proto;
+ __u32 offset;
+};
+
+#else
+struct btf_loc_param;
+struct btf_loc;
+#endif
+
static inline __u16 btf_kind(const struct btf_type *t)
{
return BTF_INFO_KIND(t->info);
@@ -497,6 +545,21 @@ static inline bool btf_is_any_enum(const struct btf_type *t)
return btf_is_enum(t) || btf_is_enum64(t);
}
+static inline bool btf_is_loc_param(const struct btf_type *t)
+{
+ return btf_kind(t) == BTF_KIND_LOC_PARAM;
+}
+
+static inline bool btf_is_loc_proto(const struct btf_type *t)
+{
+ return btf_kind(t) == BTF_KIND_LOC_PROTO;
+}
+
+static inline bool btf_is_locsec(const struct btf_type *t)
+{
+ return btf_kind(t) == BTF_KIND_LOCSEC;
+}
+
static inline bool btf_kind_core_compat(const struct btf_type *t1,
const struct btf_type *t2)
{
@@ -611,6 +674,33 @@ static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t)
return (struct btf_decl_tag *)(t + 1);
}
+static inline struct btf_loc_param *btf_loc_param(const struct btf_type *t)
+{
+ return (struct btf_loc_param *)(t + 1);
+}
+
+static inline __s32 btf_loc_param_size(const struct btf_type *t)
+{
+ return (__s32)t->size;
+}
+
+static inline __u64 btf_loc_param_value(const struct btf_type *t)
+{
+ struct btf_loc_param *p = btf_loc_param(t);
+
+ return p->val_lo32 | (((__u64)(p->val_hi32)) << 32);
+}
+
+static inline __u32 *btf_loc_proto_params(const struct btf_type *t)
+{
+ return (__u32 *)(t + 1);
+}
+
+static inline struct btf_loc *btf_locsec_locs(const struct btf_type *t)
+{
+ return (struct btf_loc *)(t + 1);
+}
+
#ifdef __cplusplus
} /* extern "C" */
#endif
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
index 6388392f49a0..95bdda2f4a2d 100644
--- a/tools/lib/bpf/btf_dump.c
+++ b/tools/lib/bpf/btf_dump.c
@@ -328,6 +328,9 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
case BTF_KIND_ENUM64:
case BTF_KIND_FWD:
case BTF_KIND_FLOAT:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
break;
case BTF_KIND_VOLATILE:
@@ -339,7 +342,6 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
case BTF_KIND_VAR:
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
- d->type_states[t->type].referenced = 1;
break;
case BTF_KIND_ARRAY: {
@@ -609,6 +611,9 @@ static int btf_dump_order_type(struct btf_dump *d, __u32 id, bool through_ptr)
case BTF_KIND_VAR:
case BTF_KIND_DATASEC:
case BTF_KIND_DECL_TAG:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
d->type_states[id].order_state = ORDERED;
return 0;
@@ -2516,6 +2521,9 @@ static int btf_dump_dump_type_data(struct btf_dump *d,
case BTF_KIND_FUNC:
case BTF_KIND_FUNC_PROTO:
case BTF_KIND_DECL_TAG:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
+ case BTF_KIND_LOCSEC:
err = btf_dump_unsupported_data(d, t, id);
break;
case BTF_KIND_INT:
diff --git a/tools/lib/bpf/btf_iter.c b/tools/lib/bpf/btf_iter.c
index 9a6c822c2294..e9a865d84d35 100644
--- a/tools/lib/bpf/btf_iter.c
+++ b/tools/lib/bpf/btf_iter.c
@@ -29,6 +29,7 @@ int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t,
case BTF_KIND_FLOAT:
case BTF_KIND_ENUM:
case BTF_KIND_ENUM64:
+ case BTF_KIND_LOC_PARAM:
it->desc = (struct btf_field_desc) {};
break;
case BTF_KIND_FWD:
@@ -71,6 +72,19 @@ int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t,
1, {offsetof(struct btf_var_secinfo, type)}
};
break;
+ case BTF_KIND_LOC_PROTO:
+ it->desc = (struct btf_field_desc) {
+ 0, {},
+ sizeof(__u32),
+ 1, {0}};
+ break;
+ case BTF_KIND_LOCSEC:
+ it->desc = (struct btf_field_desc) {
+ 0, {},
+ sizeof(struct btf_loc),
+ 2, {offsetof(struct btf_loc, func_proto),
+ offsetof(struct btf_loc, loc_proto)}};
+ break;
default:
return -EINVAL;
}
@@ -94,6 +108,8 @@ int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t,
case BTF_KIND_DECL_TAG:
case BTF_KIND_TYPE_TAG:
case BTF_KIND_DATASEC:
+ case BTF_KIND_LOC_PARAM:
+ case BTF_KIND_LOC_PROTO:
it->desc = (struct btf_field_desc) {
1, {offsetof(struct btf_type, name_off)}
};
@@ -127,6 +143,13 @@ int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t,
1, {offsetof(struct btf_param, name_off)}
};
break;
+ case BTF_KIND_LOCSEC:
+ it->desc = (struct btf_field_desc) {
+ 1, {offsetof(struct btf_type, name_off)},
+ sizeof(struct btf_loc),
+ 1, {offsetof(struct btf_loc, name_off)}
+ };
+ break;
default:
return -EINVAL;
}
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8ed8749907d4..82a0d2ff1176 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -451,4 +451,9 @@ LIBBPF_1.7.0 {
global:
bpf_map__set_exclusive_program;
bpf_map__exclusive_program;
+ btf__add_loc_param;
+ btf__add_loc_proto;
+ btf__add_loc_proto_param;
+ btf__add_locsec;
+ btf__add_locsec_loc;
} LIBBPF_1.6.0;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 35b2527bedec..2a05518265e9 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -562,7 +562,9 @@ struct btf_field_desc {
/* member struct size, or zero, if no members */
int m_sz;
/* repeated per-member offsets */
- int m_off_cnt, m_offs[1];
+ int m_off_cnt, m_offs[2];
+ /* singular entity size after btf_type, if any */
+ int s_sz;
};
struct btf_field_iter {
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-08 17:34 ` [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
@ 2025-10-23 0:57 ` Eduard Zingerman
2025-10-23 19:18 ` Eduard Zingerman
1 sibling, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 0:57 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
[...]
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 18907f0fcf9f..0abd7831d6b4 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
[...]
> @@ -588,6 +621,34 @@ static int btf_validate_type(const struct btf *btf, const struct btf_type *t, __
offtopic: we should probably switch this function to use field and string iterators.
> }
> break;
> }
> + case BTF_KIND_LOC_PARAM:
> + break;
> + case BTF_KIND_LOC_PROTO: {
> + __u32 *p = btf_loc_proto_params(t);
> +
> + n = btf_vlen(t);
> + for (i = 0; i < n; i++, p++) {
> + err = btf_validate_id(btf, *p, id);
> + if (err)
> + return err;
> + }
> + break;
> + }
> + case BTF_KIND_LOCSEC: {
> + const struct btf_loc *l = btf_locsec_locs(t);
> +
> + n = btf_vlen(t);
> + for (i = 0; i < n; i++, l++) {
> + err = btf_validate_str(btf, l->name_off, "loc name", id);
> + if (!err)
> + err = btf_validate_id(btf, l->func_proto, id);
> + if (!err)
> + btf_validate_id(btf, l->loc_proto, id);
^^^^
Missing `err =`?
> + if (err)
> + return err;
> + }
> + break;
> + }
> default:
> pr_warn("btf: type [%u]: unrecognized kind %u\n", id, kind);
> return -EINVAL;
> @@ -2993,6 +3054,183 @@ int btf__add_decl_attr(struct btf *btf, const char *value, int ref_type_id,
> return btf_add_decl_tag(btf, value, ref_type_id, component_idx, 1);
> }
>
> +/*
> + * Append new BTF_KIND_LOC_PARAM with either
> + * - *value* set as __u64 value following btf_type, with info->kflag set to 1
> + * if *is_value* is true; or
> + * - *reg* number, *flags* and *offset* set if *is_value* is set to 0, and
> + * info->kflag set to 0.
> + * Returns:
> + * - >0, type ID of newly added BTF type;
> + * - <0, on error.
> + */
> +int btf__add_loc_param(struct btf *btf, __s32 size, bool is_value, __u64 value,
> + __u16 reg, __u16 flags, __s32 offset)
Probably, would be more convenient to have several functions, e.g.:
- btf__add_loc_param_const()
- btf__add_loc_param_reg()
- btf__add_loc_param_deref()
Should `size` be some kind of enum?
E.g. with values like S64, S32, ..., U64.
So the the usage would be like:
btf__add_loc_param_const(btf, U64, 0xdeadbeef);
Wdyt?
> +{
> + struct btf_loc_param *p;
> + struct btf_type *t;
> + int sz;
> +
> + if (btf_ensure_modifiable(btf))
> + return libbpf_err(-ENOMEM);
> +
> + sz = sizeof(struct btf_type) + sizeof(__u64);
> + t = btf_add_type_mem(btf, sz);
> + if (!t)
> + return libbpf_err(-ENOMEM);
> +
> + t->name_off = 0;
> + t->info = btf_type_info(BTF_KIND_LOC_PARAM, 0, is_value);
> + t->size = size;
> +
> + p = btf_loc_param(t);
> +
> + if (is_value) {
> + p->val_lo32 = value & 0xffffffff;
> + p->val_hi32 = value >> 32;
> + } else {
> + p->reg = reg;
> + p->flags = flags;
> + p->offset = offset;
> + }
> + return btf_commit_type(btf, sz);
> +}
[...]
> +int btf__add_locsec_loc(struct btf *btf, const char *name, __u32 func_proto, __u32 loc_proto,
> + __u32 offset)
> +{
> + struct btf_type *t;
> + struct btf_loc *l;
> + int name_off, sz;
> +
> + if (!name || !name[0])
> + return libbpf_err(-EINVAL);
> +
> + if (validate_type_id(func_proto) || validate_type_id(loc_proto))
> + return libbpf_err(-EINVAL);
> +
> + /* last type should be BTF_KIND_LOCSEC */
> + if (btf->nr_types == 0)
> + return libbpf_err(-EINVAL);
> + t = btf_last_type(btf);
> + if (!btf_is_locsec(t))
> + return libbpf_err(-EINVAL);
> +
> + /* decompose and invalidate raw data */
> + if (btf_ensure_modifiable(btf))
> + return libbpf_err(-ENOMEM);
> +
> + name_off = btf__add_str(btf, name);
> + if (name_off < 0)
> + return name_off;
> +
> + sz = sizeof(*l);
> + l = btf_add_type_mem(btf, sz);
> + if (!l)
> + return libbpf_err(-ENOMEM);
> +
> + l->name_off = name_off;
> + l->func_proto = func_proto;
> + l->loc_proto = loc_proto;
> + l->offset = offset;
> +
> + /* update parent type's vlen */
> + t = btf_last_type(btf);
> + btf_type_inc_vlen(t);
Since vlen is only u16, maybe check for overflow and report an error here?
> +
> + btf->hdr->type_len += sz;
> + btf->hdr->str_off += sz;
> + return 0;
> +}
> +
> struct btf_ext_sec_info_param {
> __u32 off;
> __u32 len;
[...]
> @@ -5075,6 +5361,45 @@ static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
> break;
> }
>
> + case BTF_KIND_LOC_PROTO: {
> + __u32 *p1, *p2;
> + __u16 i, vlen;
> +
> + p1 = btf_loc_proto_params(t);
> + vlen = btf_vlen(t);
> +
> + for (i = 0; i < vlen; i++, p1++) {
> + ref_type_id = btf_dedup_ref_type(d, *p1);
> + if (ref_type_id < 0)
> + return ref_type_id;
> + *p1 = ref_type_id;
> + }
> +
> + h = btf_hash_loc_proto(t);
> + for_each_dedup_cand(d, hash_entry, h) {
> + cand_id = hash_entry->value;
> + cand = btf_type_by_id(d->btf, cand_id);
> + if (!btf_equal_common(t, cand))
> + continue;
Nit: having btf_equal_loc_proto() would have been more readable here.
> + vlen = btf_vlen(cand);
> + p1 = btf_loc_proto_params(t);
> + p2 = btf_loc_proto_params(cand);
> + if (vlen == 0) {
> + new_id = cand_id;
> + break;
> + }
> + for (i = 0; i < vlen; i++, p1++, p2++) {
> + if (*p1 != *p2)
> + break;
> + new_id = cand_id;
> + break;
> + }
> + if (new_id == cand_id)
> + break;
Why `break` and not `continue`?
Also, BTF_KIND_FUNC_PROTO does not have this special case, why the difference?
> + }
> + break;
> + }
> +
> default:
> return -EINVAL;
> }
[...]
> diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c
> index 6388392f49a0..95bdda2f4a2d 100644
> --- a/tools/lib/bpf/btf_dump.c
> +++ b/tools/lib/bpf/btf_dump.c
> @@ -328,6 +328,9 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
> case BTF_KIND_ENUM64:
> case BTF_KIND_FWD:
> case BTF_KIND_FLOAT:
> + case BTF_KIND_LOC_PARAM:
> + case BTF_KIND_LOC_PROTO:
> + case BTF_KIND_LOCSEC:
> break;
>
> case BTF_KIND_VOLATILE:
> @@ -339,7 +342,6 @@ static int btf_dump_mark_referenced(struct btf_dump *d)
> case BTF_KIND_VAR:
> case BTF_KIND_DECL_TAG:
> case BTF_KIND_TYPE_TAG:
> - d->type_states[t->type].referenced = 1;
> break;
This change seems unrelated, why is it necessary?
>
> case BTF_KIND_ARRAY: {
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-08 17:34 ` [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
2025-10-23 0:57 ` Eduard Zingerman
@ 2025-10-23 19:18 ` Eduard Zingerman
2025-10-23 19:59 ` Eduard Zingerman
1 sibling, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 19:18 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
[...]
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 18907f0fcf9f..0abd7831d6b4 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
[...]
> @@ -588,6 +621,34 @@ static int btf_validate_type(const struct btf *btf, const struct btf_type *t, __
> }
> break;
> }
> + case BTF_KIND_LOC_PARAM:
> + break;
> + case BTF_KIND_LOC_PROTO: {
> + __u32 *p = btf_loc_proto_params(t);
> +
> + n = btf_vlen(t);
> + for (i = 0; i < n; i++, p++) {
> + err = btf_validate_id(btf, *p, id);
> + if (err)
> + return err;
> + }
> + break;
> + }
> + case BTF_KIND_LOCSEC: {
> + const struct btf_loc *l = btf_locsec_locs(t);
> +
> + n = btf_vlen(t);
> + for (i = 0; i < n; i++, l++) {
> + err = btf_validate_str(btf, l->name_off, "loc name", id);
> + if (!err)
> + err = btf_validate_id(btf, l->func_proto, id);
> + if (!err)
> + btf_validate_id(btf, l->loc_proto, id);
> + if (err)
> + return err;
> + }
> + break;
Do we want to also check that number of parameters in loc_proto is the
same (or less then) number of parameters in func_proto?
Also, would it make sense to support a case when e.g. parameters #1
and #3 are in known locations, but parameter #2 is absent?
> + }
> default:
> pr_warn("btf: type [%u]: unrecognized kind %u\n", id, kind);
> return -EINVAL;
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-23 19:18 ` Eduard Zingerman
@ 2025-10-23 19:59 ` Eduard Zingerman
0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 19:59 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Thu, 2025-10-23 at 12:18 -0700, Eduard Zingerman wrote:
> On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
>
> [...]
>
> > diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> > index 18907f0fcf9f..0abd7831d6b4 100644
> > --- a/tools/lib/bpf/btf.c
> > +++ b/tools/lib/bpf/btf.c
>
> [...]
>
> > @@ -588,6 +621,34 @@ static int btf_validate_type(const struct btf *btf, const struct btf_type *t, __
> > }
> > break;
> > }
> > + case BTF_KIND_LOC_PARAM:
> > + break;
> > + case BTF_KIND_LOC_PROTO: {
> > + __u32 *p = btf_loc_proto_params(t);
> > +
> > + n = btf_vlen(t);
> > + for (i = 0; i < n; i++, p++) {
> > + err = btf_validate_id(btf, *p, id);
> > + if (err)
> > + return err;
> > + }
> > + break;
> > + }
> > + case BTF_KIND_LOCSEC: {
> > + const struct btf_loc *l = btf_locsec_locs(t);
> > +
> > + n = btf_vlen(t);
> > + for (i = 0; i < n; i++, l++) {
> > + err = btf_validate_str(btf, l->name_off, "loc name", id);
> > + if (!err)
> > + err = btf_validate_id(btf, l->func_proto, id);
> > + if (!err)
> > + btf_validate_id(btf, l->loc_proto, id);
> > + if (err)
> > + return err;
> > + }
> > + break;
>
> Do we want to also check that number of parameters in loc_proto is the
> same (or less then) number of parameters in func_proto?
> Also, would it make sense to support a case when e.g. parameters #1
> and #3 are in known locations, but parameter #2 is absent?
Doodling with [1], it looks like ~4% of inline locations have such
partial information for gcc compiled kernel (~19K out of ~480K).
For clang compiled kernel numbers are much smaller: 0.8% (~4K out of 464K).
[1] https://github.com/eddyz87/inline-address-printer/
>
> > + }
> > default:
> > pr_warn("btf: type [%u]: unrecognized kind %u\n", id, kind);
> > return -EINVAL;
>
> [...]
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup()
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
2025-10-08 17:34 ` [RFC bpf-next 01/15] bpf: Extend UAPI to support location information Alan Maguire
2025-10-08 17:34 ` [RFC bpf-next 02/15] libbpf: Add support for BTF kinds LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
@ 2025-10-08 17:34 ` Alan Maguire
2025-10-16 18:39 ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF Alan Maguire
` (13 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:34 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
When creating split BTF for the .BTF.extra section to record location
information, we need to add function prototypes that refer to base BTF
(vmlinux) types. However since .BTF.extra is split BTF we have a
problem; since collecting those type ids for the parameters, the base
vmlinux BTF has been deduplicated so the type ids are stale. As a
result it is valuable to be able to access the map from old->new type
ids that is constructed as part of deduplication. This allows us to
update the out-of-date type ids in the FUNC_PROTOs.
In order to pass the map back, we need to fill out all of the hypot
map mappings; as an optimization normal dedup only computes type id
mappings needed in existing BTF type id references.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/btf.c | 35 ++++++++++++++++++++++++++++++++++-
tools/lib/bpf/btf.h | 5 ++++-
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 0abd7831d6b4..6b06fb60d39a 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -3650,6 +3650,8 @@ static int btf_dedup_ref_types(struct btf_dedup *d);
static int btf_dedup_resolve_fwds(struct btf_dedup *d);
static int btf_dedup_compact_types(struct btf_dedup *d);
static int btf_dedup_remap_types(struct btf_dedup *d);
+static int btf_dedup_remap_type_id(__u32 *type_id, void *ctx);
+static int btf_dedup_save_map(struct btf_dedup *d, __u32 **save_map);
/*
* Deduplicate BTF types and strings.
@@ -3850,6 +3852,15 @@ int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts)
}
done:
+ if (!err) {
+ if (opts && opts->dedup_map && opts->dedup_map_sz) {
+ err = btf_dedup_save_map(d, opts->dedup_map);
+ if (err >= 0) {
+ *opts->dedup_map_sz = err;
+ err = 0;
+ }
+ }
+ }
btf_dedup_free(d);
return libbpf_err(err);
}
@@ -3880,6 +3891,7 @@ struct btf_dedup {
__u32 *hypot_list;
size_t hypot_cnt;
size_t hypot_cap;
+ size_t hypot_map_cnt;
/* Whether hypothetical mapping, if successful, would need to adjust
* already canonicalized types (due to a new forward declaration to
* concrete type resolution). In such case, during split BTF dedup
@@ -4010,6 +4022,7 @@ static struct btf_dedup *btf_dedup_new(struct btf *btf, const struct btf_dedup_o
err = -ENOMEM;
goto done;
}
+ d->hypot_map_cnt = type_cnt;
for (i = 0; i < type_cnt; i++)
d->hypot_map[i] = BTF_UNPROCESSED_ID;
@@ -5628,7 +5641,6 @@ static int btf_dedup_remap_type_id(__u32 *type_id, void *ctx)
new_type_id = d->hypot_map[resolved_type_id];
if (new_type_id > BTF_MAX_NR_TYPES)
return -EINVAL;
-
*type_id = new_type_id;
return 0;
}
@@ -5678,6 +5690,27 @@ static int btf_dedup_remap_types(struct btf_dedup *d)
return 0;
}
+/* retrieve a copy of map and avoid it being freed during btf_dedup_free(). */
+static int btf_dedup_save_map(struct btf_dedup *d, __u32 **save_map)
+{
+ __u32 i, resolved_id;
+
+ /* only existing references in BTF that needed to be adjusted are
+ * mapped in the hypot map; fill in the rest.
+ */
+ for (i = 0; i < d->hypot_map_cnt; i++) {
+ if (d->hypot_map[i] <= BTF_MAX_NR_TYPES)
+ continue;
+ resolved_id = resolve_type_id(d, i);
+ d->hypot_map[i] = d->hypot_map[resolved_id];
+ }
+ *save_map = d->hypot_map;
+ /* ensure btf_dedup_free() will not free hypot map; it belongs to caller */
+ d->hypot_map = NULL;
+
+ return d->hypot_map_cnt;
+}
+
/*
* Probe few well-known locations for vmlinux kernel image and try to load BTF
* data out of it to use for target BTF.
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 0f55518a2be0..082b010c0228 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -265,9 +265,12 @@ struct btf_dedup_opts {
struct btf_ext *btf_ext;
/* force hash collisions (used for testing) */
bool force_collisions;
+ /* return dedup mapping array (from original -> new id) */
+ __u32 **dedup_map;
+ size_t *dedup_map_sz;
size_t :0;
};
-#define btf_dedup_opts__last_field force_collisions
+#define btf_dedup_opts__last_field dedup_map_sz
LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts);
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup()
2025-10-08 17:34 ` [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup() Alan Maguire
@ 2025-10-16 18:39 ` Andrii Nakryiko
2025-10-17 8:56 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:39 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> When creating split BTF for the .BTF.extra section to record location
> information, we need to add function prototypes that refer to base BTF
> (vmlinux) types. However since .BTF.extra is split BTF we have a
> problem; since collecting those type ids for the parameters, the base
> vmlinux BTF has been deduplicated so the type ids are stale. As a
> result it is valuable to be able to access the map from old->new type
> ids that is constructed as part of deduplication. This allows us to
> update the out-of-date type ids in the FUNC_PROTOs.
>
> In order to pass the map back, we need to fill out all of the hypot
> map mappings; as an optimization normal dedup only computes type id
> mappings needed in existing BTF type id references.
I probably should look at pahole patches to find out myself, but I'm
going to be lazy here. ;) Wouldn't you want to generate .BTF.extra
after base BTF was generated and deduped? Or is it too inconvenient?
Can you please elaborate a bit with more info?
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/lib/bpf/btf.c | 35 ++++++++++++++++++++++++++++++++++-
> tools/lib/bpf/btf.h | 5 ++++-
> 2 files changed, 38 insertions(+), 2 deletions(-)
>
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup()
2025-10-16 18:39 ` Andrii Nakryiko
@ 2025-10-17 8:56 ` Alan Maguire
2025-10-20 21:03 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 8:56 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:39, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> When creating split BTF for the .BTF.extra section to record location
>> information, we need to add function prototypes that refer to base BTF
>> (vmlinux) types. However since .BTF.extra is split BTF we have a
>> problem; since collecting those type ids for the parameters, the base
>> vmlinux BTF has been deduplicated so the type ids are stale. As a
>> result it is valuable to be able to access the map from old->new type
>> ids that is constructed as part of deduplication. This allows us to
>> update the out-of-date type ids in the FUNC_PROTOs.
>>
>> In order to pass the map back, we need to fill out all of the hypot
>> map mappings; as an optimization normal dedup only computes type id
>> mappings needed in existing BTF type id references.
>
> I probably should look at pahole patches to find out myself, but I'm
> going to be lazy here. ;) Wouldn't you want to generate .BTF.extra
> after base BTF was generated and deduped? Or is it too inconvenient?
> Can you please elaborate a bit with more info?
>
Yep, the BTF.extra is indeed generated after base BTF+dedup, but the
problem is we need to cache info about inline sites as we process DWARF
CUs and collect inline info. Specifically at that time we need to cache
info about function prototypes associated with inlines, and this is done
- like it is done for real functions - via btf_encoder__save_func(). It
saves a representation of the function prototype using BTF ids of
function parameters, and these are pre-dedup BTF ids.
And it's those BTF ids that are the problem. When we dedup with
FUNC_PROTOs in the same BTF, all the id references get fixed up, but
because we now have stale type id references in FUNC_PROTOs in the split
BTF.extra (that were not fixed up by dedup) since we didn't dedup this
split BTF yet, we are stuck.
There are other alternatives here I suppose, but they seemed equally
bad/worse.
One is to rescan all the CUs for later inline site representation once
vmlinux/module dedup is done. That would make pahole much slower as CU
processing is the most time-consuming aspect of its operation. It seemed
better to collect inline info at the same time we collect everything else.
Another is to put the FUNC_PROTOs (that are only needed for inline
sites) into the vmlinux/module BTF. That would work, but even that would
exhibit the same problem as even those FUNC_PROTO type id references
would also get remapped by vmlinux/module dedup.
So it's not an ideal solution, but I couldn't figure out an easier one
I'm afraid.
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup()
2025-10-17 8:56 ` Alan Maguire
@ 2025-10-20 21:03 ` Andrii Nakryiko
2025-10-23 8:25 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-20 21:03 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Fri, Oct 17, 2025 at 1:57 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 16/10/2025 19:39, Andrii Nakryiko wrote:
> > On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> When creating split BTF for the .BTF.extra section to record location
> >> information, we need to add function prototypes that refer to base BTF
> >> (vmlinux) types. However since .BTF.extra is split BTF we have a
> >> problem; since collecting those type ids for the parameters, the base
> >> vmlinux BTF has been deduplicated so the type ids are stale. As a
> >> result it is valuable to be able to access the map from old->new type
> >> ids that is constructed as part of deduplication. This allows us to
> >> update the out-of-date type ids in the FUNC_PROTOs.
> >>
> >> In order to pass the map back, we need to fill out all of the hypot
> >> map mappings; as an optimization normal dedup only computes type id
> >> mappings needed in existing BTF type id references.
> >
> > I probably should look at pahole patches to find out myself, but I'm
> > going to be lazy here. ;) Wouldn't you want to generate .BTF.extra
> > after base BTF was generated and deduped? Or is it too inconvenient?
> > Can you please elaborate a bit with more info?
> >
>
> Yep, the BTF.extra is indeed generated after base BTF+dedup, but the
> problem is we need to cache info about inline sites as we process DWARF
> CUs and collect inline info. Specifically at that time we need to cache
> info about function prototypes associated with inlines, and this is done
> - like it is done for real functions - via btf_encoder__save_func(). It
> saves a representation of the function prototype using BTF ids of
> function parameters, and these are pre-dedup BTF ids.
>
> And it's those BTF ids that are the problem. When we dedup with
> FUNC_PROTOs in the same BTF, all the id references get fixed up, but
> because we now have stale type id references in FUNC_PROTOs in the split
> BTF.extra (that were not fixed up by dedup) since we didn't dedup this
> split BTF yet, we are stuck.
>
> There are other alternatives here I suppose, but they seemed equally
> bad/worse.
>
> One is to rescan all the CUs for later inline site representation once
> vmlinux/module dedup is done. That would make pahole much slower as CU
> processing is the most time-consuming aspect of its operation. It seemed
> better to collect inline info at the same time we collect everything else.
>
> Another is to put the FUNC_PROTOs (that are only needed for inline
> sites) into the vmlinux/module BTF. That would work, but even that would
> exhibit the same problem as even those FUNC_PROTO type id references
> would also get remapped by vmlinux/module dedup.
>
> So it's not an ideal solution, but I couldn't figure out an easier one
> I'm afraid.
Ok, this makes sense at the conceptual level. This might be useful
overall. But I don't like the implementation, sorry.
The size of mapping "table" is fixed, it's btf__type_cnt(). So just
make caller allocate u32 array of that size, and pass it in. Libbpf
will then maintain/populate provided array with original type ID ->
deduped type ID with an absolutely minimal amount of overhead and
extra code.
so just
__u32 dedup_map;
size_t dedup_map_cnt;
inside btf_dedup_opts ? (and we request user to specify count just to
avoid surprises, we do know the size, but user should know it as well)
>
> Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup()
2025-10-20 21:03 ` Andrii Nakryiko
@ 2025-10-23 8:25 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 8:25 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 20/10/2025 22:03, Andrii Nakryiko wrote:
> On Fri, Oct 17, 2025 at 1:57 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 16/10/2025 19:39, Andrii Nakryiko wrote:
>>> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> When creating split BTF for the .BTF.extra section to record location
>>>> information, we need to add function prototypes that refer to base BTF
>>>> (vmlinux) types. However since .BTF.extra is split BTF we have a
>>>> problem; since collecting those type ids for the parameters, the base
>>>> vmlinux BTF has been deduplicated so the type ids are stale. As a
>>>> result it is valuable to be able to access the map from old->new type
>>>> ids that is constructed as part of deduplication. This allows us to
>>>> update the out-of-date type ids in the FUNC_PROTOs.
>>>>
>>>> In order to pass the map back, we need to fill out all of the hypot
>>>> map mappings; as an optimization normal dedup only computes type id
>>>> mappings needed in existing BTF type id references.
>>>
>>> I probably should look at pahole patches to find out myself, but I'm
>>> going to be lazy here. ;) Wouldn't you want to generate .BTF.extra
>>> after base BTF was generated and deduped? Or is it too inconvenient?
>>> Can you please elaborate a bit with more info?
>>>
>>
>> Yep, the BTF.extra is indeed generated after base BTF+dedup, but the
>> problem is we need to cache info about inline sites as we process DWARF
>> CUs and collect inline info. Specifically at that time we need to cache
>> info about function prototypes associated with inlines, and this is done
>> - like it is done for real functions - via btf_encoder__save_func(). It
>> saves a representation of the function prototype using BTF ids of
>> function parameters, and these are pre-dedup BTF ids.
>>
>> And it's those BTF ids that are the problem. When we dedup with
>> FUNC_PROTOs in the same BTF, all the id references get fixed up, but
>> because we now have stale type id references in FUNC_PROTOs in the split
>> BTF.extra (that were not fixed up by dedup) since we didn't dedup this
>> split BTF yet, we are stuck.
>>
>> There are other alternatives here I suppose, but they seemed equally
>> bad/worse.
>>
>> One is to rescan all the CUs for later inline site representation once
>> vmlinux/module dedup is done. That would make pahole much slower as CU
>> processing is the most time-consuming aspect of its operation. It seemed
>> better to collect inline info at the same time we collect everything else.
>>
>> Another is to put the FUNC_PROTOs (that are only needed for inline
>> sites) into the vmlinux/module BTF. That would work, but even that would
>> exhibit the same problem as even those FUNC_PROTO type id references
>> would also get remapped by vmlinux/module dedup.
>>
>> So it's not an ideal solution, but I couldn't figure out an easier one
>> I'm afraid.
>
> Ok, this makes sense at the conceptual level. This might be useful
> overall. But I don't like the implementation, sorry.
>
> The size of mapping "table" is fixed, it's btf__type_cnt(). So just
> make caller allocate u32 array of that size, and pass it in. Libbpf
> will then maintain/populate provided array with original type ID ->
> deduped type ID with an absolutely minimal amount of overhead and
> extra code.
>
> so just
>
> __u32 dedup_map;
> size_t dedup_map_cnt;
>
> inside btf_dedup_opts ? (and we request user to specify count just to
> avoid surprises, we do know the size, but user should know it as well)
sounds good, will adjust in next version. Thanks!
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (2 preceding siblings ...)
2025-10-08 17:34 ` [RFC bpf-next 03/15] libbpf: Add option to retrieve map from old->new ids from btf__dedup() Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
` (12 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
When creating multi-split BTF we correctly set the start string offset
to be the size of the base string section plus the base BTF start
string offset; the latter is needed for multi-split BTF since the
offset is non-zero there.
Unfortunately the BTF parsing case needed that logic and it was
missed.
Fixes: 4e29128a9ace ("libbpf/btf: Fix string handling to support
multi-split BTF")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/btf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 6b06fb60d39a..62d80e8e81bf 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1122,7 +1122,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, b
if (base_btf) {
btf->base_btf = base_btf;
btf->start_id = btf__type_cnt(base_btf);
- btf->start_str_off = base_btf->hdr->str_len;
+ btf->start_str_off = base_btf->hdr->str_len + base_btf->start_str_off;
}
if (is_mmap) {
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF
2025-10-08 17:35 ` [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF Alan Maguire
@ 2025-10-16 18:36 ` Andrii Nakryiko
2025-10-17 13:47 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:36 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> When creating multi-split BTF we correctly set the start string offset
> to be the size of the base string section plus the base BTF start
> string offset; the latter is needed for multi-split BTF since the
> offset is non-zero there.
>
> Unfortunately the BTF parsing case needed that logic and it was
> missed.
>
> Fixes: 4e29128a9ace ("libbpf/btf: Fix string handling to support
> multi-split BTF")
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/lib/bpf/btf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
please send the fix separately
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 6b06fb60d39a..62d80e8e81bf 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -1122,7 +1122,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, b
> if (base_btf) {
> btf->base_btf = base_btf;
> btf->start_id = btf__type_cnt(base_btf);
> - btf->start_str_off = base_btf->hdr->str_len;
> + btf->start_str_off = base_btf->hdr->str_len + base_btf->start_str_off;
> }
>
> if (is_mmap) {
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-17 13:47 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 13:47 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> When creating multi-split BTF we correctly set the start string offset
>> to be the size of the base string section plus the base BTF start
>> string offset; the latter is needed for multi-split BTF since the
>> offset is non-zero there.
>>
>> Unfortunately the BTF parsing case needed that logic and it was
>> missed.
>>
>> Fixes: 4e29128a9ace ("libbpf/btf: Fix string handling to support
>> multi-split BTF")
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> tools/lib/bpf/btf.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>
> please send the fix separately
>
sure, will do. Thanks!
>
>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
>> index 6b06fb60d39a..62d80e8e81bf 100644
>> --- a/tools/lib/bpf/btf.c
>> +++ b/tools/lib/bpf/btf.c
>> @@ -1122,7 +1122,7 @@ static struct btf *btf_new(const void *data, __u32 size, struct btf *base_btf, b
>> if (base_btf) {
>> btf->base_btf = base_btf;
>> btf->start_id = btf__type_cnt(base_btf);
>> - btf->start_str_off = base_btf->hdr->str_len;
>> + btf->start_str_off = base_btf->hdr->str_len + base_btf->start_str_off;
>> }
>>
>> if (is_mmap) {
>> --
>> 2.39.3
>>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (3 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 04/15] libbpf: Fix parsing of multi-split BTF Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-23 0:57 ` Eduard Zingerman
2025-10-08 17:35 ` [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs Alan Maguire
` (11 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
In raw mode ensure we can dump new BTF kinds in normal/json format.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/bpf/bpftool/btf.c | 95 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 946612029dee..23b773659ad8 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -50,6 +50,9 @@ static const char * const btf_kind_str[NR_BTF_KINDS] = {
[BTF_KIND_DECL_TAG] = "DECL_TAG",
[BTF_KIND_TYPE_TAG] = "TYPE_TAG",
[BTF_KIND_ENUM64] = "ENUM64",
+ [BTF_KIND_LOC_PARAM] = "LOC_PARAM",
+ [BTF_KIND_LOC_PROTO] = "LOC_PROTO",
+ [BTF_KIND_LOCSEC] = "LOCSEC",
};
struct sort_datum {
@@ -420,6 +423,98 @@ static int dump_btf_type(const struct btf *btf, __u32 id,
}
break;
}
+ case BTF_KIND_LOC_PARAM: {
+ const struct btf_loc_param *p = btf_loc_param(t);
+ __s32 sz = (__s32)t->size;
+
+ if (btf_kflag(t)) {
+ __u64 uval = btf_loc_param_value(t);
+ __s64 sval = (__s64)uval;
+
+ if (json_output) {
+ jsonw_int_field(w, "size", sz);
+ if (sz >= 0)
+ jsonw_uint_field(w, "value", uval);
+ else
+ jsonw_int_field(w, "value", sval);
+ } else {
+ if (sz >= 0)
+ printf(" size=%d value=%llu", sz, uval);
+ else
+ printf(" size=%d value=%lld", sz, sval);
+ }
+ } else {
+ if (json_output) {
+ jsonw_int_field(w, "size", sz);
+ jsonw_uint_field(w, "reg", p->reg);
+ jsonw_uint_field(w, "flags", p->flags);
+ jsonw_int_field(w, "offset", p->offset);
+ } else {
+ printf(" size=%d reg=%u flags=0x%x offset=%d",
+ sz, p->reg, p->flags, p->offset);
+ }
+ }
+ break;
+ }
+
+ case BTF_KIND_LOC_PROTO: {
+ __u32 *params = btf_loc_proto_params(t);
+ __u16 vlen = BTF_INFO_VLEN(t->info);
+ int i;
+
+ if (json_output) {
+ jsonw_uint_field(w, "vlen", vlen);
+ jsonw_name(w, "params");
+ jsonw_start_array(w);
+ } else {
+ printf(" vlen=%u", vlen);
+ }
+
+ for (i = 0; i < vlen; i++, params++) {
+ if (json_output) {
+ jsonw_start_object(w);
+ jsonw_uint_field(w, "type_id", *params);
+ jsonw_end_object(w);
+ } else {
+ printf("\n\t type_id=%u", *params);
+ }
+ }
+ if (json_output)
+ jsonw_end_array(w);
+ break;
+ }
+
+ case BTF_KIND_LOCSEC: {
+ __u16 vlen = BTF_INFO_VLEN(t->info);
+ struct btf_loc *locs = btf_locsec_locs(t);
+ int i;
+
+ if (json_output) {
+ jsonw_uint_field(w, "vlen", vlen);
+ jsonw_name(w, "locs");
+ jsonw_start_array(w);
+ } else {
+ printf(" vlen=%u", vlen);
+ }
+
+ for (i = 0; i < vlen; i++, locs++) {
+ if (json_output) {
+ jsonw_start_object(w);
+ jsonw_string_field(w, "name", btf_str(btf, locs->name_off));
+ jsonw_uint_field(w, "func_proto_type_id", locs->func_proto);
+ jsonw_uint_field(w, "loc_proto_type_id", locs->loc_proto);
+ jsonw_uint_field(w, "offset", locs->offset);
+ jsonw_end_object(w);
+ } else {
+ printf("\n\t '%s' func_proto_type_id=%u loc_proto_type_id=%u offset=%u",
+ btf_str(btf, locs->name_off),
+ locs->func_proto, locs->loc_proto, locs->offset);
+ }
+ }
+ if (json_output)
+ jsonw_end_array(w);
+ break;
+ }
default:
break;
}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-08 17:35 ` [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
@ 2025-10-23 0:57 ` Eduard Zingerman
2025-10-23 8:38 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 0:57 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:35 +0100, Alan Maguire wrote:
[...]
> @@ -420,6 +423,98 @@ static int dump_btf_type(const struct btf *btf, __u32 id,
> }
> break;
> }
> + case BTF_KIND_LOC_PARAM: {
> + const struct btf_loc_param *p = btf_loc_param(t);
> + __s32 sz = (__s32)t->size;
> +
> + if (btf_kflag(t)) {
> + __u64 uval = btf_loc_param_value(t);
> + __s64 sval = (__s64)uval;
> +
> + if (json_output) {
> + jsonw_int_field(w, "size", sz);
> + if (sz >= 0)
> + jsonw_uint_field(w, "value", uval);
> + else
> + jsonw_int_field(w, "value", sval);
> + } else {
> + if (sz >= 0)
> + printf(" size=%d value=%llu", sz, uval);
> + else
> + printf(" size=%d value=%lld", sz, sval);
> + }
> + } else {
> + if (json_output) {
> + jsonw_int_field(w, "size", sz);
> + jsonw_uint_field(w, "reg", p->reg);
> + jsonw_uint_field(w, "flags", p->flags);
> + jsonw_int_field(w, "offset", p->offset);
> + } else {
> + printf(" size=%d reg=%u flags=0x%x offset=%d",
> + sz, p->reg, p->flags, p->offset);
Did you consider printing this in a more user readable form?
E.g. `*(u64 *)(rbp - 8)`?
> + }
> + }
> + break;
> + }
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-23 0:57 ` Eduard Zingerman
@ 2025-10-23 8:38 ` Alan Maguire
2025-10-23 8:50 ` Eduard Zingerman
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 8:38 UTC (permalink / raw)
To: Eduard Zingerman, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On 23/10/2025 01:57, Eduard Zingerman wrote:
> On Wed, 2025-10-08 at 18:35 +0100, Alan Maguire wrote:
>
> [...]
>
>> @@ -420,6 +423,98 @@ static int dump_btf_type(const struct btf *btf, __u32 id,
>> }
>> break;
>> }
>> + case BTF_KIND_LOC_PARAM: {
>> + const struct btf_loc_param *p = btf_loc_param(t);
>> + __s32 sz = (__s32)t->size;
>> +
>> + if (btf_kflag(t)) {
>> + __u64 uval = btf_loc_param_value(t);
>> + __s64 sval = (__s64)uval;
>> +
>> + if (json_output) {
>> + jsonw_int_field(w, "size", sz);
>> + if (sz >= 0)
>> + jsonw_uint_field(w, "value", uval);
>> + else
>> + jsonw_int_field(w, "value", sval);
>> + } else {
>> + if (sz >= 0)
>> + printf(" size=%d value=%llu", sz, uval);
>> + else
>> + printf(" size=%d value=%lld", sz, sval);
>> + }
>> + } else {
>> + if (json_output) {
>> + jsonw_int_field(w, "size", sz);
>> + jsonw_uint_field(w, "reg", p->reg);
>> + jsonw_uint_field(w, "flags", p->flags);
>> + jsonw_int_field(w, "offset", p->offset);
>> + } else {
>> + printf(" size=%d reg=%u flags=0x%x offset=%d",
>> + sz, p->reg, p->flags, p->offset);
>
> Did you consider printing this in a more user readable form?
> E.g. `*(u64 *)(rbp - 8)`?
>
That's a good idea. However currently we use register numbers we get
from DWARF, so would it be more confusing to see something like
*(u64 *)(reg1 -8)
Not sure (we could translate reg# -> regname but I'm not sure where the
right place to host such a translation might be).
>> + }
>> + }
>> + break;
>> + }
>
> [...]
>
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC
2025-10-23 8:38 ` Alan Maguire
@ 2025-10-23 8:50 ` Eduard Zingerman
0 siblings, 0 replies; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 8:50 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Thu, 2025-10-23 at 09:38 +0100, Alan Maguire wrote:
> On 23/10/2025 01:57, Eduard Zingerman wrote:
> > On Wed, 2025-10-08 at 18:35 +0100, Alan Maguire wrote:
> >
> > [...]
> >
> > > @@ -420,6 +423,98 @@ static int dump_btf_type(const struct btf *btf, __u32 id,
> > > }
> > > break;
> > > }
> > > + case BTF_KIND_LOC_PARAM: {
> > > + const struct btf_loc_param *p = btf_loc_param(t);
> > > + __s32 sz = (__s32)t->size;
> > > +
> > > + if (btf_kflag(t)) {
> > > + __u64 uval = btf_loc_param_value(t);
> > > + __s64 sval = (__s64)uval;
> > > +
> > > + if (json_output) {
> > > + jsonw_int_field(w, "size", sz);
> > > + if (sz >= 0)
> > > + jsonw_uint_field(w, "value", uval);
> > > + else
> > > + jsonw_int_field(w, "value", sval);
> > > + } else {
> > > + if (sz >= 0)
> > > + printf(" size=%d value=%llu", sz, uval);
> > > + else
> > > + printf(" size=%d value=%lld", sz, sval);
> > > + }
> > > + } else {
> > > + if (json_output) {
> > > + jsonw_int_field(w, "size", sz);
> > > + jsonw_uint_field(w, "reg", p->reg);
> > > + jsonw_uint_field(w, "flags", p->flags);
> > > + jsonw_int_field(w, "offset", p->offset);
> > > + } else {
> > > + printf(" size=%d reg=%u flags=0x%x offset=%d",
> > > + sz, p->reg, p->flags, p->offset);
> >
> > Did you consider printing this in a more user readable form?
> > E.g. `*(u64 *)(rbp - 8)`?
> >
>
> That's a good idea. However currently we use register numbers we get
> from DWARF, so would it be more confusing to see something like
>
> *(u64 *)(reg1 -8)
>
> Not sure (we could translate reg# -> regname but I'm not sure where the
> right place to host such a translation might be).
We can start with hosting the table in bpftool.
Additionally, bpftool can check which architecture the ELF file
containing BPF is built for, and decide whether to apply the
translation.
For x86 you can just grab the table from here:
https://github.com/eddyz87/inline-address-printer/blob/master/main.c#L107
> > > + }
> > > + }
> > > + break;
> > > + }
> >
> > [...]
> >
> >
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (4 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 05/15] bpftool: Add ability to dump LOC_PARAM, LOC_PROTO and LOCSEC Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 07/15] selftests/bpf: Test helper support for BTF_KIND_LOC[_PARAM|_PROTO|SEC] Alan Maguire
` (10 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
For bpftool to be able to dump .BTF.extra data in /sys/kernel/btf_extra
for modules, it needs to support multi-split BTF because the
parent-child relationship of BTF extra data for modules is
vmlinux BTF data
module BTF data
module BTF extra data
So for example to dump BTF extra info for xfs we would run
$ bpftool btf dump -B /sys/kernel/btf/vmlinux -B /sys/kernel/btf/xfs file /sys/kernel/btf_extra/xfs
Multiple bases are specified with the vmlinux base BTF first (parent)
followed by the xfs BTF (child), and finally the XFS BTF extra.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/bpf/bpftool/main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index a829a6a49037..aa16560b4157 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -514,7 +514,8 @@ int main(int argc, char **argv)
verifier_logs = true;
break;
case 'B':
- base_btf = btf__parse(optarg, NULL);
+ /* handle multi-split BTF */
+ base_btf = btf__parse_split(optarg, base_btf);
if (!base_btf) {
p_err("failed to parse base BTF at '%s': %d\n",
optarg, -errno);
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs
2025-10-08 17:35 ` [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs Alan Maguire
@ 2025-10-16 18:36 ` Andrii Nakryiko
2025-10-17 13:47 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:36 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> For bpftool to be able to dump .BTF.extra data in /sys/kernel/btf_extra
> for modules, it needs to support multi-split BTF because the
> parent-child relationship of BTF extra data for modules is
>
> vmlinux BTF data
> module BTF data
> module BTF extra data
>
> So for example to dump BTF extra info for xfs we would run
>
> $ bpftool btf dump -B /sys/kernel/btf/vmlinux -B /sys/kernel/btf/xfs file /sys/kernel/btf_extra/xfs
>
> Multiple bases are specified with the vmlinux base BTF first (parent)
> followed by the xfs BTF (child), and finally the XFS BTF extra.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/bpf/bpftool/main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
we'll need to update documentation to mention that order of -B matters
and how it is treated
> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> index a829a6a49037..aa16560b4157 100644
> --- a/tools/bpf/bpftool/main.c
> +++ b/tools/bpf/bpftool/main.c
> @@ -514,7 +514,8 @@ int main(int argc, char **argv)
> verifier_logs = true;
> break;
> case 'B':
> - base_btf = btf__parse(optarg, NULL);
> + /* handle multi-split BTF */
> + base_btf = btf__parse_split(optarg, base_btf);
> if (!base_btf) {
> p_err("failed to parse base BTF at '%s': %d\n",
> optarg, -errno);
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-17 13:47 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 13:47 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> For bpftool to be able to dump .BTF.extra data in /sys/kernel/btf_extra
>> for modules, it needs to support multi-split BTF because the
>> parent-child relationship of BTF extra data for modules is
>>
>> vmlinux BTF data
>> module BTF data
>> module BTF extra data
>>
>> So for example to dump BTF extra info for xfs we would run
>>
>> $ bpftool btf dump -B /sys/kernel/btf/vmlinux -B /sys/kernel/btf/xfs file /sys/kernel/btf_extra/xfs
>>
>> Multiple bases are specified with the vmlinux base BTF first (parent)
>> followed by the xfs BTF (child), and finally the XFS BTF extra.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> tools/bpf/bpftool/main.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>
> we'll need to update documentation to mention that order of -B matters
> and how it is treated
>
yep, good point, I'll add a documentation patch in the next round.
>
>> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
>> index a829a6a49037..aa16560b4157 100644
>> --- a/tools/bpf/bpftool/main.c
>> +++ b/tools/bpf/bpftool/main.c
>> @@ -514,7 +514,8 @@ int main(int argc, char **argv)
>> verifier_logs = true;
>> break;
>> case 'B':
>> - base_btf = btf__parse(optarg, NULL);
>> + /* handle multi-split BTF */
>> + base_btf = btf__parse_split(optarg, base_btf);
>> if (!base_btf) {
>> p_err("failed to parse base BTF at '%s': %d\n",
>> optarg, -errno);
>> --
>> 2.39.3
>>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 07/15] selftests/bpf: Test helper support for BTF_KIND_LOC[_PARAM|_PROTO|SEC]
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (5 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 06/15] bpftool: Handle multi-split BTF by supporting multiple base BTFs Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 08/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to field iter tests Alan Maguire
` (9 subsequent siblings)
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add support to dump, encode and validate new location-related kinds.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/testing/selftests/bpf/btf_helpers.c | 43 ++++++++++++++++++++++-
tools/testing/selftests/bpf/test_btf.h | 15 ++++++++
2 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/btf_helpers.c b/tools/testing/selftests/bpf/btf_helpers.c
index 1c1c2c26690a..90455ef8ab0f 100644
--- a/tools/testing/selftests/bpf/btf_helpers.c
+++ b/tools/testing/selftests/bpf/btf_helpers.c
@@ -27,11 +27,14 @@ static const char * const btf_kind_str_mapping[] = {
[BTF_KIND_DECL_TAG] = "DECL_TAG",
[BTF_KIND_TYPE_TAG] = "TYPE_TAG",
[BTF_KIND_ENUM64] = "ENUM64",
+ [BTF_KIND_LOC_PARAM] = "LOC_PARAM",
+ [BTF_KIND_LOC_PROTO] = "LOC_PROTO",
+ [BTF_KIND_LOCSEC] = "LOCSEC",
};
static const char *btf_kind_str(__u16 kind)
{
- if (kind > BTF_KIND_ENUM64)
+ if (kind > BTF_KIND_LOCSEC)
return "UNKNOWN";
return btf_kind_str_mapping[kind];
}
@@ -203,6 +206,44 @@ int fprintf_btf_type_raw(FILE *out, const struct btf *btf, __u32 id)
fprintf(out, " type_id=%u component_idx=%d",
t->type, btf_decl_tag(t)->component_idx);
break;
+ case BTF_KIND_LOC_PARAM: {
+ struct btf_loc_param *p = btf_loc_param(t);
+ __s32 sz = (__s32)t->size;
+
+ if (btf_kflag(t)) {
+ __u64 uval = btf_loc_param_value(t);
+
+ if (sz >= 0) {
+ fprintf(out, " size=%d value=%llu", sz, uval);
+ } else {
+ __s64 sval = (__s64)uval;
+
+ fprintf(out, " size=%d value=%lld", sz, sval);
+ }
+ } else {
+ fprintf(out, " size=%d reg=%u flags=0x%x offset=%d",
+ sz, p->reg, p->flags, p->offset);
+ }
+ break;
+ }
+ case BTF_KIND_LOC_PROTO: {
+ const __u32 *p = btf_loc_proto_params(t);
+
+ fprintf(out, " vlen=%u", vlen);
+ for (i = 0; i < vlen; i++, p++)
+ fprintf(out, "\n\ttype_id=%u", *p);
+ break;
+ }
+ case BTF_KIND_LOCSEC: {
+ const struct btf_loc *l = btf_locsec_locs(t);
+
+ fprintf(out, " vlen=%u", vlen);
+ for (i = 0; i < vlen; i++, l++) {
+ fprintf(out, "\n\t'%s' func_proto_type_id=%u loc_proto_type_id=%u offset=%d",
+ btf_str(btf, l->name_off), l->func_proto, l->loc_proto, l->offset);
+ }
+ break;
+ }
default:
break;
}
diff --git a/tools/testing/selftests/bpf/test_btf.h b/tools/testing/selftests/bpf/test_btf.h
index e65889ab4adf..6e9bc6fe6702 100644
--- a/tools/testing/selftests/bpf/test_btf.h
+++ b/tools/testing/selftests/bpf/test_btf.h
@@ -84,4 +84,19 @@
#define BTF_TYPE_TAG_ENC(value, type) \
BTF_TYPE_ENC(value, BTF_INFO_ENC(BTF_KIND_TYPE_TAG, 0, 0), type)
+#define BTF_LOC_PARAM_ENC(sz, kflag, value) \
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_LOC, kflag, 0), (__u32)sz), \
+ (value >> 32), (value & 0xffffffff)
+
+#define BTF_LOC_PROTO_ENC(nargs) \
+ BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_LOC_PROTO, 0, nargs), 0)
+
+#define BTF_LOC_PROTO_PARAM_ENCODE(param) (param)
+
+#define BTF_LOCSEC_ENC(name, nlocs) \
+ BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_LOCSEC, 0, nlocs), 0)
+
+#define BTF_LOCSEC_LOC_ENCODE(name, func_proto, loc_proto, offset) \
+ (name), (func_proto), (loc_proto), (offset)
+
#endif /* _TEST_BTF_H */
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* [RFC bpf-next 08/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to field iter tests
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (6 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 07/15] selftests/bpf: Test helper support for BTF_KIND_LOC[_PARAM|_PROTO|SEC] Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 09/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to dedup split tests Alan Maguire
` (8 subsequent siblings)
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
BTF_KIND_LOC[_PARAM|_PROTO|SEC] need to work with field iteration, so extend
the selftest to cover these and ensure iteration over all types
and names succeeds.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
.../selftests/bpf/prog_tests/btf_field_iter.c | 26 ++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_field_iter.c b/tools/testing/selftests/bpf/prog_tests/btf_field_iter.c
index 32159d3eb281..12f8030dd31a 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_field_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_field_iter.c
@@ -31,8 +31,11 @@ struct field_data {
{ .ids = { 11 }, .strs = { "decltag" } },
{ .ids = { 6 }, .strs = { "typetag" } },
{ .ids = {}, .strs = { "e64", "eval1", "eval2", "eval3" } },
- { .ids = { 15, 16 }, .strs = { "datasec1" } }
-
+ { .ids = { 15, 16 }, .strs = { "datasec1" } },
+ { .ids = {}, .strs = { "" } },
+ { .ids = {}, .strs = { "" } },
+ { .ids = { 22, 23 },.strs = { "" } },
+ { .ids = { 13, 24 },.strs = { ".loc", "func" } }
};
/* Fabricate BTF with various types and check BTF field iteration finds types,
@@ -88,6 +91,16 @@ void test_btf_field_iter(void)
btf__add_datasec_var_info(btf, 15, 0, 4);
btf__add_datasec_var_info(btf, 16, 4, 8);
+ btf__add_loc_param(btf, -4, true, -1, 0, 0, 0); /* [22] loc value -1 */
+ btf__add_loc_param(btf, 8, false, 0, 1, 0, 0); /* [23] loc reg 1 */
+
+ btf__add_loc_proto(btf); /* [24] loc proto */
+ btf__add_loc_proto_param(btf, 22); /* param value -1, */
+ btf__add_loc_proto_param(btf, 23); /* param reg 1 */
+
+ btf__add_locsec(btf, ".loc"); /* [25] locsec ".loc" */
+ btf__add_locsec_loc(btf, "func", 13, 24, 128); /* "func" */
+
VALIDATE_RAW_BTF(
btf,
"[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
@@ -123,7 +136,14 @@ void test_btf_field_iter(void)
"\t'eval3' val=3000",
"[21] DATASEC 'datasec1' size=12 vlen=2\n"
"\ttype_id=15 offset=0 size=4\n"
- "\ttype_id=16 offset=4 size=8");
+ "\ttype_id=16 offset=4 size=8",
+ "[22] LOC_PARAM '(anon)' size=-4 value=-1",
+ "[23] LOC_PARAM '(anon)' size=8 reg=1 flags=0x0 offset=0",
+ "[24] LOC_PROTO '(anon)' vlen=2\n"
+ "\ttype_id=22\n"
+ "\ttype_id=23",
+ "[25] LOCSEC '.loc' vlen=1\n"
+ "\t'func' func_proto_type_id=13 loc_proto_type_id=24 offset=128");
for (id = 1; id < btf__type_cnt(btf); id++) {
struct btf_type *t = btf_type_by_id(btf, id);
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* [RFC bpf-next 09/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to dedup split tests
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (7 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 08/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to field iter tests Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 10/15] selftests/bpf: BTF distill tests to ensure LOC[_PARAM|_PROTO] add to split BTF Alan Maguire
` (7 subsequent siblings)
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Ensure that location params/protos are deduplicated and location
sections are not, and that references to deduplicated locations within
location prototypes and sections are updated after deduplication.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
.../bpf/prog_tests/btf_dedup_split.c | 93 +++++++++++++++++++
1 file changed, 93 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c b/tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
index 5bc15bb6b7ce..583d24ce0752 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dedup_split.c
@@ -539,6 +539,97 @@ static void test_split_module(void)
btf__free(vmlinux_btf);
}
+static void test_split_loc(void)
+{
+ //const struct btf_type *t;
+ struct btf *btf1, *btf2;
+ int err;
+
+ btf1 = btf__new_empty();
+ if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
+ return;
+
+ btf__set_pointer_size(btf1, 8); /* enforce 64-bit arch */
+
+
+ btf__add_int(btf1, "long", 8, BTF_INT_SIGNED); /* [1] long */
+ btf__add_ptr(btf1, 1); /* [2] ptr to long */
+ btf__add_func_proto(btf1, 1); /* [3] long (*)(long, long *); */
+ btf__add_func_param(btf1, "p1", 1);
+ btf__add_func_param(btf1, "p2", 2);
+ btf__add_loc_param(btf1, -8, true, -9223372036854775807, 0, 0, 0);
+ /* [4] loc value */
+ btf__add_loc_param(btf1, 8, false, 0, 1, 0, 0); /* [5] loc reg 1 */
+ btf__add_loc_proto(btf1); /* [6] loc proto */
+ btf__add_loc_proto_param(btf1, 4); /* param value */
+ btf__add_loc_proto_param(btf1, 5); /* param reg 1 */
+
+ VALIDATE_RAW_BTF(
+ btf1,
+ "[1] INT 'long' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] FUNC_PROTO '(anon)' ret_type_id=1 vlen=2\n"
+ "\t'p1' type_id=1\n"
+ "\t'p2' type_id=2",
+ "[4] LOC_PARAM '(anon)' size=-8 value=-9223372036854775807",
+ "[5] LOC_PARAM '(anon)' size=8 reg=1 flags=0x0 offset=0",
+ "[6] LOC_PROTO '(anon)' vlen=2\n"
+ "\ttype_id=4\n"
+ "\ttype_id=5");
+
+ btf2 = btf__new_empty_split(btf1);
+ if (!ASSERT_OK_PTR(btf2, "empty_split_btf"))
+ goto cleanup;
+ btf__add_loc_param(btf2, 8, false, 0, 1, 0, 0); /* [7] loc reg 1 */
+ btf__add_loc_proto(btf2); /* [8] loc proto */
+ btf__add_loc_proto_param(btf2, 4); /* param value */
+ btf__add_loc_proto_param(btf2, 7); /* param reg 1 */
+ btf__add_locsec(btf2, ".locs"); /* [9] locsec ".locs" */
+ btf__add_locsec_loc(btf2, "foo", 3, 8, 128);
+
+ VALIDATE_RAW_BTF(
+ btf2,
+ "[1] INT 'long' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] FUNC_PROTO '(anon)' ret_type_id=1 vlen=2\n"
+ "\t'p1' type_id=1\n"
+ "\t'p2' type_id=2",
+ "[4] LOC_PARAM '(anon)' size=-8 value=-9223372036854775807",
+ "[5] LOC_PARAM '(anon)' size=8 reg=1 flags=0x0 offset=0",
+ "[6] LOC_PROTO '(anon)' vlen=2\n"
+ "\ttype_id=4\n"
+ "\ttype_id=5",
+ "[7] LOC_PARAM '(anon)' size=8 reg=1 flags=0x0 offset=0",
+ "[8] LOC_PROTO '(anon)' vlen=2\n"
+ "\ttype_id=4\n"
+ "\ttype_id=7",
+ "[9] LOCSEC '.locs' vlen=1\n"
+ "\t'foo' func_proto_type_id=3 loc_proto_type_id=8 offset=128");
+
+ err = btf__dedup(btf2, NULL);
+ if (!ASSERT_OK(err, "btf_dedup"))
+ goto cleanup;
+
+ VALIDATE_RAW_BTF(
+ btf2,
+ "[1] INT 'long' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED",
+ "[2] PTR '(anon)' type_id=1",
+ "[3] FUNC_PROTO '(anon)' ret_type_id=1 vlen=2\n"
+ "\t'p1' type_id=1\n"
+ "\t'p2' type_id=2",
+ "[4] LOC_PARAM '(anon)' size=-8 value=-9223372036854775807",
+ "[5] LOC_PARAM '(anon)' size=8 reg=1 flags=0x0 offset=0",
+ "[6] LOC_PROTO '(anon)' vlen=2\n"
+ "\ttype_id=4\n"
+ "\ttype_id=5",
+ "[7] LOCSEC '.locs' vlen=1\n"
+ "\t'foo' func_proto_type_id=3 loc_proto_type_id=6 offset=128");
+
+cleanup:
+ btf__free(btf2);
+ btf__free(btf1);
+}
+
void test_btf_dedup_split()
{
if (test__start_subtest("split_simple"))
@@ -551,4 +642,6 @@ void test_btf_dedup_split()
test_split_dup_struct_in_cu();
if (test__start_subtest("split_module"))
test_split_module();
+ if (test__start_subtest("split_loc"))
+ test_split_loc();
}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* [RFC bpf-next 10/15] selftests/bpf: BTF distill tests to ensure LOC[_PARAM|_PROTO] add to split BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (8 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 09/15] selftests/bpf: Add LOC_PARAM, LOC_PROTO, LOCSEC to dedup split tests Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 11/15] kbuild: Add support for extra BTF Alan Maguire
` (6 subsequent siblings)
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
When creating distilled BTF, BTF_KIND_LOC_PARAM and BTF_KIND_LOC_PROTO
should be added to split BTF. This means potentially some duplication
of location information, but only for out-of-tree modules that use
distilled base/split BTF.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
.../selftests/bpf/prog_tests/btf_distill.c | 68 +++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
index fb67ae195a73..1dd26ec79b69 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c
@@ -671,6 +671,72 @@ static void test_distilled_base_embedded_err(void)
btf__free(btf1);
}
+/* LOC_PARAM, LOC_PROTO should be added to split BTF. */
+static void test_distilled_loc(void)
+{
+ struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL;
+
+ btf1 = btf__new_empty();
+ if (!ASSERT_OK_PTR(btf1, "empty_main_btf"))
+ return;
+
+ btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */
+ btf__add_func_proto(btf1, 1); /* [2] int (*)(int); */
+ btf__add_func_param(btf1, "p1", 1);
+ btf__add_loc_param(btf1, -4, true, -1, 0, 0, 0);/* [3] loc value */
+
+ VALIDATE_RAW_BTF(
+ btf1,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[3] LOC_PARAM '(anon)' size=-4 value=-1");
+
+ btf2 = btf__new_empty_split(btf1);
+ if (!ASSERT_OK_PTR(btf2, "empty_split_btf"))
+ goto cleanup;
+
+ btf__add_loc_proto(btf2); /* [4] loc proto */
+ btf__add_loc_proto_param(btf2, 3); /* param value */
+
+ btf__add_locsec(btf2, ".locs"); /* [5] locsec */
+ btf__add_locsec_loc(btf2, "foo", 2, 4, 256); /* "foo" offset 256 */
+ VALIDATE_RAW_BTF(
+ btf2,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ "[2] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[3] LOC_PARAM '(anon)' size=-4 value=-1",
+ "[4] LOC_PROTO '(anon)' vlen=1\n"
+ "\ttype_id=3",
+ "[5] LOCSEC '.locs' vlen=1\n"
+ "\t'foo' func_proto_type_id=2 loc_proto_type_id=4 offset=256");
+
+ if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4),
+ "distilled_base") ||
+ !ASSERT_OK_PTR(btf3, "distilled_base") ||
+ !ASSERT_OK_PTR(btf4, "distilled_split") ||
+ !ASSERT_EQ(2, btf__type_cnt(btf3), "distilled_base_type_cnt"))
+ goto cleanup;
+
+ VALIDATE_RAW_BTF(
+ btf4,
+ "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED",
+ /* remainder is split BTF */
+ "[2] LOC_PROTO '(anon)' vlen=1\n"
+ "\ttype_id=5",
+ "[3] LOCSEC '.locs' vlen=1\n"
+ "\t'foo' func_proto_type_id=4 loc_proto_type_id=2 offset=256",
+ "[4] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n"
+ "\t'p1' type_id=1",
+ "[5] LOC_PARAM '(anon)' size=-4 value=-1");
+cleanup:
+ btf__free(btf4);
+ btf__free(btf3);
+ btf__free(btf2);
+ btf__free(btf1);
+}
+
void test_btf_distill(void)
{
if (test__start_subtest("distilled_base"))
@@ -689,4 +755,6 @@ void test_btf_distill(void)
test_distilled_base_vmlinux();
if (test__start_subtest("distilled_endianness"))
test_distilled_endianness();
+ if (test__start_subtest("distilled_loc"))
+ test_distilled_loc();
}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* [RFC bpf-next 11/15] kbuild: Add support for extra BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (9 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 10/15] selftests/bpf: BTF distill tests to ensure LOC[_PARAM|_PROTO] add to split BTF Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-08 17:35 ` [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m Alan Maguire
` (5 subsequent siblings)
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
information
.BTF.extra sections wil be used to add additional BTF information
for inlines etc. .BTF.extra sections are split BTF relative to
kernel/module BTF and are enabled via CONFIG_DEBUG_INFO_BTF_EXTRA.
It is bool for now but will become tristate in a later patch when
support for a separate module is added (vmlinux .BTF.extra is
9Mb so 'y' is not a good option for most cases since it will
bloat vmlinux size).
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/asm-generic/vmlinux.lds.h | 4 ++++
lib/Kconfig.debug | 18 ++++++++++++++++++
scripts/Makefile.btf | 8 ++++++++
scripts/link-vmlinux.sh | 13 ++++++++++---
4 files changed, 40 insertions(+), 3 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 8a9a2e732a65..523cf20327c1 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -675,6 +675,10 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
. = ALIGN(PAGE_SIZE); \
.BTF_ids : AT(ADDR(.BTF_ids) - LOAD_OFFSET) { \
*(.BTF_ids) \
+ } \
+ . = ALIGN(PAGE_SIZE); \
+ .BTF.extra : AT(ADDR(.BTF.extra) - LOAD_OFFSET) { \
+ BOUNDED_SECTION_BY(.BTF.extra, _BTF_extra) \
}
#else
#define BTF
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 3034e294d50d..0d8b713c94ea 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -409,6 +409,13 @@ config PAHOLE_HAS_LANG_EXCLUDE
otherwise it would emit malformed kernel and module binaries when
using DEBUG_INFO_BTF_MODULES.
+config PAHOLE_HAS_INLINE
+ def_bool PAHOLE_VERSION >= 130
+ help
+ Support for the "inline" feature in BTF is available; it supports
+ encoding information about inline sites in BTF; their location
+ and information to help retrieve their associated parameters.
+
config DEBUG_INFO_BTF_MODULES
bool "Generate BTF type information for kernel modules"
default y
@@ -426,6 +433,17 @@ config MODULE_ALLOW_BTF_MISMATCH
this option will still load module BTF where possible but ignore
it when a mismatch is found.
+config DEBUG_INFO_BTF_EXTRA
+ bool "Provide extra information about inline sites in BTF"
+ default n
+ depends on DEBUG_INFO_BTF && PAHOLE_HAS_INLINE
+ help
+ Generate information about inline sites in .BTF.extra sections
+ which consist of split BTF relative to kernel/module BTF;
+ this BTF is made available to the user in /sys/kernel/btf.extra
+ where the filename corresponds to the kernel (vmlinux) or the
+ module the inline info refers to.
+
config GDB_SCRIPTS
bool "Provide GDB scripts for kernel debugging"
help
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index db76335dd917..5ca98446d8b5 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -2,6 +2,7 @@
pahole-ver := $(CONFIG_PAHOLE_VERSION)
pahole-flags-y :=
+btf-extra :=
JOBS := $(patsubst -j%,%,$(filter -j%,$(MAKEFLAGS)))
@@ -25,8 +26,14 @@ pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j$(JOBS) --btf_features=enc
pahole-flags-$(call test-ge, $(pahole-ver), 130) += --btf_features=attributes
+btf-extra =
ifneq ($(KBUILD_EXTMOD),)
module-pahole-flags-$(call test-ge, $(pahole-ver), 128) += --btf_features=distilled_base
+else
+ifneq ($(CONFIG_DEBUG_INFO_BTF_EXTRA),)
+pahole-flags-$(call test-ge, $(pahole-ver), 130) += --btf_features=inline.extra
+btf-extra := y
+endif
endif
endif
@@ -35,3 +42,4 @@ pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
export PAHOLE_FLAGS := $(pahole-flags-y)
export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
+export BTF_EXTRA := $(btf-extra)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 433849ff7529..f88b356fe270 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -105,22 +105,29 @@ vmlinux_link()
${kallsymso} ${btf_vmlinux_bin_o} ${arch_vmlinux_o} ${ldlibs}
}
-# generate .BTF typeinfo from DWARF debuginfo
+# generate .BTF typeinfo from DWARF debuginfo. Optionally add .BTF.extra
+# section if BTF_EXTRA is set.
# ${1} - vmlinux image
gen_btf()
{
local btf_data=${1}.btf.o
+ local btf_extra_flags=
info BTF "${btf_data}"
LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
- # Create ${btf_data} which contains just .BTF section but no symbols. Add
+
+ if [ -n "${BTF_EXTRA}" ]; then
+ btf_extra_flags="--only-section=.BTF.extra --set-section-flags .BTF.extra=alloc,readonly"
+ fi
+
+ # Create ${btf_data} which contains just .BTF sections but no symbols. Add
# SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all
# deletes all symbols including __start_BTF and __stop_BTF, which will
# be redefined in the linker script. Add 2>/dev/null to suppress GNU
# objcopy warnings: "empty loadable segment detected at ..."
${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
- --strip-all ${1} "${btf_data}" 2>/dev/null
+ ${btf_extra_flags} --strip-all ${1} "${btf_data}" 2>/dev/null
# Change e_type to ET_REL so that it can be used to link final vmlinux.
# GNU ld 2.35+ and lld do not allow an ET_EXEC input.
if is_enabled CONFIG_CPU_BIG_ENDIAN; then
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (10 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 11/15] kbuild: Add support for extra BTF Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-16 18:37 ` Andrii Nakryiko
2025-10-23 0:58 ` Eduard Zingerman
2025-10-08 17:35 ` [RFC bpf-next 13/15] libbpf: add API to load extra BTF Alan Maguire
` (4 subsequent siblings)
16 siblings, 2 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Allow module-based delivery of potentially large vmlinux .BTF.extra section;
section; also support visibility of BTF data in kernel, modules in
/sys/kernel/btf_extra.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
include/linux/bpf.h | 1 +
include/linux/btf.h | 2 +
include/linux/module.h | 4 ++
kernel/bpf/Makefile | 1 +
kernel/bpf/btf.c | 114 +++++++++++++++++++++++++++-----------
| 25 +++++++++
kernel/bpf/sysfs_btf.c | 21 ++++++-
kernel/module/main.c | 4 ++
lib/Kconfig.debug | 2 +-
scripts/Makefile.btf | 3 +-
scripts/Makefile.modfinal | 5 ++
scripts/link-vmlinux.sh | 6 ++
12 files changed, 154 insertions(+), 34 deletions(-)
create mode 100644 kernel/bpf/btf_extra.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a98c83346134..7a15fc077642 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -63,6 +63,7 @@ struct inode;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
extern struct kobject *btf_kobj;
+extern struct kobject *btf_extra_kobj;
extern struct bpf_mem_alloc bpf_global_ma, bpf_global_percpu_ma;
extern bool bpf_global_ma_set;
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 65091c6aff4b..3684f6266b1c 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -621,6 +621,8 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty
bool btf_types_are_same(const struct btf *btf1, u32 id1,
const struct btf *btf2, u32 id2);
int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx);
+struct bin_attribute *sysfs_btf_add(struct kobject *kobj, const char *name,
+ void *data, size_t data_size);
static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t)
{
diff --git a/include/linux/module.h b/include/linux/module.h
index e135cc79acee..c2fceaf392c5 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -512,6 +512,10 @@ struct module {
unsigned int btf_base_data_size;
void *btf_data;
void *btf_base_data;
+#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF_EXTRA)
+ unsigned int btf_extra_data_size;
+ void *btf_extra_data;
+#endif
#endif
#ifdef CONFIG_JUMP_LABEL
struct jump_entry *jump_entries;
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 7fd0badfacb1..08bf991560d7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
endif
ifeq ($(CONFIG_SYSFS),y)
obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
+obj-$(CONFIG_DEBUG_INFO_BTF_EXTRA) += btf_extra.o
endif
ifeq ($(CONFIG_BPF_JIT),y)
obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 29cec549f119..749e04c679c6 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -8323,12 +8323,42 @@ enum {
BTF_MODULE_F_LIVE = (1 << 0),
};
+#if IS_ENABLED(CONFIG_SYSFS)
+struct bin_attribute *sysfs_btf_add(struct kobject *kobj, const char *name,
+ void *data, size_t data_size)
+{
+ struct bin_attribute *attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+ int err;
+
+ if (!attr)
+ return ERR_PTR(-ENOMEM);
+
+ sysfs_bin_attr_init(attr);
+ attr->attr.name = name;
+ attr->attr.mode = 0444;
+ attr->size = data_size;
+ attr->private = data;
+ attr->read = sysfs_bin_attr_simple_read;
+
+ err = sysfs_create_bin_file(kobj, attr);
+ if (err) {
+ pr_warn("failed to register module [%s] BTF in sysfs : %d\n", name, err);
+ kfree(attr);
+ return ERR_PTR(err);
+ }
+ return attr;
+}
+
+#endif
+
#ifdef CONFIG_DEBUG_INFO_BTF_MODULES
struct btf_module {
struct list_head list;
struct module *module;
struct btf *btf;
struct bin_attribute *sysfs_attr;
+ void *btf_extra_data;
+ struct bin_attribute *sysfs_extra_attr;
int flags;
};
@@ -8342,12 +8372,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
{
struct btf_module *btf_mod, *tmp;
struct module *mod = module;
- struct btf *btf;
+ struct bin_attribute *attr;
+ struct btf *btf = NULL;
int err = 0;
- if (mod->btf_data_size == 0 ||
- (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
- op != MODULE_STATE_GOING))
+ if (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
+ op != MODULE_STATE_GOING)
goto out;
switch (op) {
@@ -8357,8 +8387,10 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
err = -ENOMEM;
goto out;
}
- btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
- mod->btf_base_data, mod->btf_base_data_size);
+ if (mod->btf_data_size > 0) {
+ btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
+ mod->btf_base_data, mod->btf_base_data_size);
+ }
if (IS_ERR(btf)) {
kfree(btf_mod);
if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
@@ -8370,7 +8402,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
}
goto out;
}
- err = btf_alloc_id(btf);
+ if (btf)
+ err = btf_alloc_id(btf);
if (err) {
btf_free(btf);
kfree(btf_mod);
@@ -8384,32 +8417,45 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
list_add(&btf_mod->list, &btf_modules);
mutex_unlock(&btf_module_mutex);
- if (IS_ENABLED(CONFIG_SYSFS)) {
- struct bin_attribute *attr;
-
- attr = kzalloc(sizeof(*attr), GFP_KERNEL);
- if (!attr)
- goto out;
-
- sysfs_bin_attr_init(attr);
- attr->attr.name = btf->name;
- attr->attr.mode = 0444;
- attr->size = btf->data_size;
- attr->private = btf->data;
- attr->read = sysfs_bin_attr_simple_read;
-
- err = sysfs_create_bin_file(btf_kobj, attr);
- if (err) {
- pr_warn("failed to register module [%s] BTF in sysfs: %d\n",
- mod->name, err);
- kfree(attr);
- err = 0;
+ if (IS_ENABLED(CONFIG_SYSFS) && btf) {
+ attr = sysfs_btf_add(btf_kobj, btf->name, btf->data, btf->data_size);
+ if (IS_ERR(attr)) {
+ err = PTR_ERR(attr);
goto out;
}
-
btf_mod->sysfs_attr = attr;
}
+#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF_EXTRA)
+ if (mod->btf_extra_data_size > 0) {
+ const char *name = mod->name;
+ void *data;
+ /* vmlinux .BTF.extra is SHF_ALLOC; other modules
+ * are not, so for them we need to kvmemdup() the data.
+ */
+ if (strcmp(mod->name, "btf_extra") == 0) {
+ name = "vmlinux";
+ data = mod->btf_extra_data;
+ } else {
+ data = kvmemdup(mod->btf_extra_data, mod->btf_extra_data_size,
+ GFP_KERNEL | __GFP_NOWARN);
+ if (!data) {
+ err = -ENOMEM;
+ goto out;
+ }
+ btf_mod->btf_extra_data = data;
+ }
+ attr = sysfs_btf_add(btf_extra_kobj, name, data,
+ mod->btf_extra_data_size);
+ if (IS_ERR(attr)) {
+ err = PTR_ERR(attr);
+ kfree(btf_mod->sysfs_attr);
+ kvfree(btf_mod->btf_extra_data);
+ goto out;
+ }
+ btf_mod->sysfs_extra_attr = attr;
+ }
+#endif
break;
case MODULE_STATE_LIVE:
mutex_lock(&btf_module_mutex);
@@ -8431,9 +8477,15 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
list_del(&btf_mod->list);
if (btf_mod->sysfs_attr)
sysfs_remove_bin_file(btf_kobj, btf_mod->sysfs_attr);
- purge_cand_cache(btf_mod->btf);
- btf_put(btf_mod->btf);
- kfree(btf_mod->sysfs_attr);
+ if (btf_mod->btf_extra_data)
+ kvfree(btf_mod->btf_extra_data);
+ if (btf_mod->sysfs_extra_attr)
+ sysfs_remove_bin_file(btf_extra_kobj, btf_mod->sysfs_extra_attr);
+ if (btf_mod->btf) {
+ purge_cand_cache(btf_mod->btf);
+ btf_put(btf_mod->btf);
+ kfree(btf_mod->sysfs_attr);
+ }
kfree(btf_mod);
break;
}
--git a/kernel/bpf/btf_extra.c b/kernel/bpf/btf_extra.c
new file mode 100644
index 000000000000..f50616801be9
--- /dev/null
+++ b/kernel/bpf/btf_extra.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025, Oracle and/or its affiliates. */
+/*
+ * Provide extra kernel BTF information for use by BPF tools.
+ *
+ * Can be built as a module to support cases where vmlinux .BTF.extra
+ * section size in the vmlinux image is too much.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+
+static int __init btf_extra_init(void)
+{
+ return 0;
+}
+subsys_initcall(btf_extra_init);
+
+static void __exit btf_extra_exit(void)
+{
+}
+module_exit(btf_extra_exit);
+
+MODULE_DESCRIPTION("Extra BTF information");
+MODULE_LICENSE("GPL v2");
diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
index 9cbe15ce3540..0298a0936b9f 100644
--- a/kernel/bpf/sysfs_btf.c
+++ b/kernel/bpf/sysfs_btf.c
@@ -49,7 +49,15 @@ static struct bin_attribute bin_attr_btf_vmlinux __ro_after_init = {
.mmap = btf_sysfs_vmlinux_mmap,
};
-struct kobject *btf_kobj;
+struct kobject *btf_kobj, *btf_extra_kobj;
+
+#if IS_BUILTIN(CONFIG_DEBUG_INFO_BTF_EXTRA)
+/* See scripts/link-vmlinux.sh, gen_btf() func for details */
+extern char __start_BTF_extra[];
+extern char __stop_BTF_extra[];
+
+struct bin_attribute *extra_attr;
+#endif
static int __init btf_vmlinux_init(void)
{
@@ -62,6 +70,17 @@ static int __init btf_vmlinux_init(void)
btf_kobj = kobject_create_and_add("btf", kernel_kobj);
if (!btf_kobj)
return -ENOMEM;
+ if (IS_ENABLED(CONFIG_DEBUG_INFO_BTF_EXTRA)) {
+ btf_extra_kobj = kobject_create_and_add("btf_extra", kernel_kobj);
+ if (!btf_extra_kobj)
+ return -ENOMEM;
+#if IS_BUILTIN(CONFIG_DEBUG_INFO_BTF_EXTRA)
+ extra_attr = sysfs_btf_add(btf_extra_kobj, "vmlinux", __start_BTF_extra,
+ __stop_BTF_extra - __start_BTF_extra);
+ if (IS_ERR(extra_attr))
+ return PTR_ERR(extra_attr);
+#endif
+ }
return sysfs_create_bin_file(btf_kobj, &bin_attr_btf_vmlinux);
}
diff --git a/kernel/module/main.c b/kernel/module/main.c
index c66b26184936..0766f5e09020 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2648,6 +2648,10 @@ static int find_module_sections(struct module *mod, struct load_info *info)
mod->btf_base_data = any_section_objs(info, ".BTF.base", 1,
&mod->btf_base_data_size);
#endif
+#if IS_ENABLED(CONFIG_DEBUG_INFO_BTF_EXTRA)
+ mod->btf_extra_data = any_section_objs(info, ".BTF.extra", 1,
+ &mod->btf_extra_data_size);
+#endif
#ifdef CONFIG_JUMP_LABEL
mod->jump_entries = section_objs(info, "__jump_table",
sizeof(*mod->jump_entries),
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0d8b713c94ea..8ddf921a4b0e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -434,7 +434,7 @@ config MODULE_ALLOW_BTF_MISMATCH
it when a mismatch is found.
config DEBUG_INFO_BTF_EXTRA
- bool "Provide extra information about inline sites in BTF"
+ tristate "Provide extra information about inline sites in BTF"
default n
depends on DEBUG_INFO_BTF && PAHOLE_HAS_INLINE
help
diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf
index 5ca98446d8b5..791794b65d67 100644
--- a/scripts/Makefile.btf
+++ b/scripts/Makefile.btf
@@ -32,7 +32,7 @@ module-pahole-flags-$(call test-ge, $(pahole-ver), 128) += --btf_features=distil
else
ifneq ($(CONFIG_DEBUG_INFO_BTF_EXTRA),)
pahole-flags-$(call test-ge, $(pahole-ver), 130) += --btf_features=inline.extra
-btf-extra := y
+btf-extra := $(CONFIG_DEBUG_INFO_BTF_EXTRA)
endif
endif
@@ -43,3 +43,4 @@ pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust
export PAHOLE_FLAGS := $(pahole-flags-y)
export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y)
export BTF_EXTRA := $(btf-extra)
+export VMLINUX_BTF_EXTRA := .tmp_vmlinux_btf_extra
diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal
index 542ba462ed3e..e522ae9090ea 100644
--- a/scripts/Makefile.modfinal
+++ b/scripts/Makefile.modfinal
@@ -34,10 +34,15 @@ quiet_cmd_ld_ko_o = LD [M] $@
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-T $(objtree)/scripts/module.lds -o $@ $(filter %.o, $^)
+btf_vmlinux_bin_o := .tmp_vmlinux1.btf.o
+
quiet_cmd_btf_ko = BTF [M] $@
cmd_btf_ko = \
if [ ! -f $(objtree)/vmlinux ]; then \
printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \
+ elif [ $@ == "kernel/bpf/btf_extra.ko" ]; then \
+ ${OBJCOPY} --add-section .BTF.extra=${VMLINUX_BTF_EXTRA} \
+ --set-section-flags .BTF.extra=alloc,readonly $@ ; \
else \
LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base $(objtree)/vmlinux $@; \
$(RESOLVE_BTFIDS) -b $(objtree)/vmlinux $@; \
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index f88b356fe270..afda64aeed3d 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -137,6 +137,12 @@ gen_btf()
fi
printf "${et_rel}" | dd of="${btf_data}" conv=notrunc bs=1 seek=16 status=none
+ if [ "$BTF_EXTRA" = "m" ]; then
+ # vmlinux BTF extra will be delivered via the btf_extra.ko
+ # module; ensure it is not linked into vmlinux.
+ $OBJCOPY -O binary --only-section=.BTF.extra ${btf_data} ${VMLINUX_BTF_EXTRA}
+ $OBJCOPY --remove-section=.BTF.extra ${btf_data}
+ fi
btf_vmlinux_bin_o=${btf_data}
}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-08 17:35 ` [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m Alan Maguire
@ 2025-10-16 18:37 ` Andrii Nakryiko
2025-10-17 13:54 ` Alan Maguire
2025-10-23 0:58 ` Eduard Zingerman
1 sibling, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:37 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Allow module-based delivery of potentially large vmlinux .BTF.extra section;
> section; also support visibility of BTF data in kernel, modules in
> /sys/kernel/btf_extra.
>
nit: whatever naming we pick, I'd keep all the BTF exposed under the
same /sys/kernel/btf/ directory. And then use suffixes to denote extra
subsets of BTF. E.g., vmlinux is base vmlinux BTF, vmlinux.funcs (or
whatever we will agree on) will be split BTF on top of vmlinux BTF
with all this function information. Same for kernel modules: <module>
for "base module BTF" (which is itself split on top of vmlinux, of
course), and <module>.funcs for this extra func info stuff,
(multi-)split on top of <module> BTF itself.
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> include/linux/bpf.h | 1 +
> include/linux/btf.h | 2 +
> include/linux/module.h | 4 ++
> kernel/bpf/Makefile | 1 +
> kernel/bpf/btf.c | 114 +++++++++++++++++++++++++++-----------
> kernel/bpf/btf_extra.c | 25 +++++++++
> kernel/bpf/sysfs_btf.c | 21 ++++++-
> kernel/module/main.c | 4 ++
> lib/Kconfig.debug | 2 +-
> scripts/Makefile.btf | 3 +-
> scripts/Makefile.modfinal | 5 ++
> scripts/link-vmlinux.sh | 6 ++
> 12 files changed, 154 insertions(+), 34 deletions(-)
> create mode 100644 kernel/bpf/btf_extra.c
>
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-16 18:37 ` Andrii Nakryiko
@ 2025-10-17 13:54 ` Alan Maguire
2025-10-20 21:05 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 13:54 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:37, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Allow module-based delivery of potentially large vmlinux .BTF.extra section;
>> section; also support visibility of BTF data in kernel, modules in
>> /sys/kernel/btf_extra.
>>
>
> nit: whatever naming we pick, I'd keep all the BTF exposed under the
> same /sys/kernel/btf/ directory. And then use suffixes to denote extra
> subsets of BTF. E.g., vmlinux is base vmlinux BTF, vmlinux.funcs (or
> whatever we will agree on) will be split BTF on top of vmlinux BTF
> with all this function information. Same for kernel modules: <module>
> for "base module BTF" (which is itself split on top of vmlinux, of
> course), and <module>.funcs for this extra func info stuff,
> (multi-)split on top of <module> BTF itself.
>
I went back and forth on this; my only hesitation in adding to
/sys/kernel/btf was that I was concerned existing tools might make
assumptions about its contents; i.e.
- vmlinux is kernel BTF
- everything else is module BTF relative to base BTF
If we're not too worried about that we can put it in the same directory
with the "." connoting split BTF relative to the prefix
/sys/kernel/btf/[vmlinux|module].func_info
Don't think a "." is valid in a module name, so there should never be
name clashes.
For completeness another possibility is
/sys/kernel/btf/func.info/[vmlinux|module_name]
However I'm happy to adjust to whateer seems most intuitive. Thanks!
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> include/linux/bpf.h | 1 +
>> include/linux/btf.h | 2 +
>> include/linux/module.h | 4 ++
>> kernel/bpf/Makefile | 1 +
>> kernel/bpf/btf.c | 114 +++++++++++++++++++++++++++-----------
>> kernel/bpf/btf_extra.c | 25 +++++++++
>> kernel/bpf/sysfs_btf.c | 21 ++++++-
>> kernel/module/main.c | 4 ++
>> lib/Kconfig.debug | 2 +-
>> scripts/Makefile.btf | 3 +-
>> scripts/Makefile.modfinal | 5 ++
>> scripts/link-vmlinux.sh | 6 ++
>> 12 files changed, 154 insertions(+), 34 deletions(-)
>> create mode 100644 kernel/bpf/btf_extra.c
>>
>
> [...]
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-17 13:54 ` Alan Maguire
@ 2025-10-20 21:05 ` Andrii Nakryiko
0 siblings, 0 replies; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-20 21:05 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Fri, Oct 17, 2025 at 6:54 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 16/10/2025 19:37, Andrii Nakryiko wrote:
> > On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Allow module-based delivery of potentially large vmlinux .BTF.extra section;
> >> section; also support visibility of BTF data in kernel, modules in
> >> /sys/kernel/btf_extra.
> >>
> >
> > nit: whatever naming we pick, I'd keep all the BTF exposed under the
> > same /sys/kernel/btf/ directory. And then use suffixes to denote extra
> > subsets of BTF. E.g., vmlinux is base vmlinux BTF, vmlinux.funcs (or
> > whatever we will agree on) will be split BTF on top of vmlinux BTF
> > with all this function information. Same for kernel modules: <module>
> > for "base module BTF" (which is itself split on top of vmlinux, of
> > course), and <module>.funcs for this extra func info stuff,
> > (multi-)split on top of <module> BTF itself.
> >
>
> I went back and forth on this; my only hesitation in adding to
> /sys/kernel/btf was that I was concerned existing tools might make
> assumptions about its contents; i.e.
>
> - vmlinux is kernel BTF
> - everything else is module BTF relative to base BTF
>
> If we're not too worried about that we can put it in the same directory
> with the "." connoting split BTF relative to the prefix
>
> /sys/kernel/btf/[vmlinux|module].func_info
>
> Don't think a "." is valid in a module name, so there should never be
> name clashes.
Yep, I don't think there is any compatibility problem to worry about.
Let's go with .suffix as a generic way to specify extensions/subsets
of BTF.
>
> For completeness another possibility is
>
> /sys/kernel/btf/func.info/[vmlinux|module_name]
>
> However I'm happy to adjust to whateer seems most intuitive. Thanks!
>
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >> include/linux/bpf.h | 1 +
> >> include/linux/btf.h | 2 +
> >> include/linux/module.h | 4 ++
> >> kernel/bpf/Makefile | 1 +
> >> kernel/bpf/btf.c | 114 +++++++++++++++++++++++++++-----------
> >> kernel/bpf/btf_extra.c | 25 +++++++++
> >> kernel/bpf/sysfs_btf.c | 21 ++++++-
> >> kernel/module/main.c | 4 ++
> >> lib/Kconfig.debug | 2 +-
> >> scripts/Makefile.btf | 3 +-
> >> scripts/Makefile.modfinal | 5 ++
> >> scripts/link-vmlinux.sh | 6 ++
> >> 12 files changed, 154 insertions(+), 34 deletions(-)
> >> create mode 100644 kernel/bpf/btf_extra.c
> >>
> >
> > [...]
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-08 17:35 ` [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m Alan Maguire
2025-10-16 18:37 ` Andrii Nakryiko
@ 2025-10-23 0:58 ` Eduard Zingerman
2025-10-23 12:00 ` Alan Maguire
1 sibling, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 0:58 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:35 +0100, Alan Maguire wrote:
[...]
> diff --git a/include/linux/module.h b/include/linux/module.h
> index e135cc79acee..c2fceaf392c5 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
[...]
> @@ -8342,12 +8372,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> {
> struct btf_module *btf_mod, *tmp;
> struct module *mod = module;
> - struct btf *btf;
> + struct bin_attribute *attr;
> + struct btf *btf = NULL;
> int err = 0;
>
> - if (mod->btf_data_size == 0 ||
> - (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
> - op != MODULE_STATE_GOING))
> + if (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
> + op != MODULE_STATE_GOING)
> goto out;
Removing this check leads to:
case MODULE_STATE_COMING:
...
btf_mod->btf = btf;
list_add(new: &btf_mod->list, head: &btf_modules);
Even when `btf` is NULL. Why is it necessary?
>
> switch (op) {
> @@ -8357,8 +8387,10 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> err = -ENOMEM;
> goto out;
> }
> - btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
> - mod->btf_base_data, mod->btf_base_data_size);
> + if (mod->btf_data_size > 0) {
> + btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
> + mod->btf_base_data, mod->btf_base_data_size);
> + }
> if (IS_ERR(btf)) {
> kfree(btf_mod);
> if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
> @@ -8370,7 +8402,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> }
> goto out;
> }
> - err = btf_alloc_id(btf);
> + if (btf)
> + err = btf_alloc_id(btf);
> if (err) {
> btf_free(btf);
> kfree(btf_mod);
> @@ -8384,32 +8417,45 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
> list_add(&btf_mod->list, &btf_modules);
> mutex_unlock(&btf_module_mutex);
>
[...]
Apologies for delayed response, will read through the rest of the
series and pahole changes tomorrow.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m
2025-10-23 0:58 ` Eduard Zingerman
@ 2025-10-23 12:00 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 12:00 UTC (permalink / raw)
To: Eduard Zingerman, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On 23/10/2025 01:58, Eduard Zingerman wrote:
> On Wed, 2025-10-08 at 18:35 +0100, Alan Maguire wrote:
>
> [...]
>
>> diff --git a/include/linux/module.h b/include/linux/module.h
>> index e135cc79acee..c2fceaf392c5 100644
>> --- a/include/linux/module.h
>> +++ b/include/linux/module.h
>
> [...]
>
>> @@ -8342,12 +8372,12 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>> {
>> struct btf_module *btf_mod, *tmp;
>> struct module *mod = module;
>> - struct btf *btf;
>> + struct bin_attribute *attr;
>> + struct btf *btf = NULL;
>> int err = 0;
>>
>> - if (mod->btf_data_size == 0 ||
>> - (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
>> - op != MODULE_STATE_GOING))
>> + if (op != MODULE_STATE_COMING && op != MODULE_STATE_LIVE &&
>> + op != MODULE_STATE_GOING)
>> goto out;
>
> Removing this check leads to:
>
> case MODULE_STATE_COMING:
> ...
> btf_mod->btf = btf;
> list_add(new: &btf_mod->list, head: &btf_modules);
>
> Even when `btf` is NULL. Why is it necessary?
>
The reason here is we need a btf_mod list entry for the btf_extra.ko
module; unlike other cases it has a .BTF.extra section (objcopy'ed into
the module to save bloating vmlinux with the large .BTF.extra section)
but no .BTF section.
>>
>> switch (op) {
>> @@ -8357,8 +8387,10 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>> err = -ENOMEM;
>> goto out;
>> }
>> - btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
>> - mod->btf_base_data, mod->btf_base_data_size);
>> + if (mod->btf_data_size > 0) {
>> + btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size,
>> + mod->btf_base_data, mod->btf_base_data_size);
>> + }
>> if (IS_ERR(btf)) {
>> kfree(btf_mod);
>> if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) {
>> @@ -8370,7 +8402,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>> }
>> goto out;
>> }
>> - err = btf_alloc_id(btf);
>> + if (btf)
>> + err = btf_alloc_id(btf);
>> if (err) {
>> btf_free(btf);
>> kfree(btf_mod);
>> @@ -8384,32 +8417,45 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op,
>> list_add(&btf_mod->list, &btf_modules);
>> mutex_unlock(&btf_module_mutex);
>>
>
> [...]
>
> Apologies for delayed response, will read through the rest of the
> series and pahole changes tomorrow.
>
No need to apologize, thanks for taking a look!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 13/15] libbpf: add API to load extra BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (11 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 12/15] kbuild, module, bpf: Support CONFIG_DEBUG_INFO_BTF_EXTRA=m Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-16 18:37 ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 14/15] libbpf: add support for BTF location attachment Alan Maguire
` (3 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add btf__load_btf_extra() function to load extra BTF relative to
base passed in. Base can be vmlinux or module BTF.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/btf.c | 8 ++++++++
tools/lib/bpf/btf.h | 1 +
tools/lib/bpf/libbpf.map | 1 +
3 files changed, 10 insertions(+)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 62d80e8e81bf..028fbb0e03be 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -5783,6 +5783,14 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
return btf__parse_split(path, vmlinux_btf);
}
+struct btf *btf__load_btf_extra(const char *name, struct btf *base)
+{
+ char path[80];
+
+ snprintf(path, sizeof(path), "/sys/kernel/btf_extra/%s", name);
+ return btf__parse_split(path, base);
+}
+
int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
{
const struct btf_ext_info *seg;
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 082b010c0228..f8ec3a59fca0 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -138,6 +138,7 @@ LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_b
LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
+LIBBPF_API struct btf *btf__load_btf_extra(const char *name, struct btf *base);
LIBBPF_API struct btf *btf__load_from_kernel_by_id(__u32 id);
LIBBPF_API struct btf *btf__load_from_kernel_by_id_split(__u32 id, struct btf *base_btf);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 82a0d2ff1176..5f5cf9773205 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -456,4 +456,5 @@ LIBBPF_1.7.0 {
btf__add_loc_proto_param;
btf__add_locsec;
btf__add_locsec_loc;
+ btf__load_btf_extra;
} LIBBPF_1.6.0;
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 13/15] libbpf: add API to load extra BTF
2025-10-08 17:35 ` [RFC bpf-next 13/15] libbpf: add API to load extra BTF Alan Maguire
@ 2025-10-16 18:37 ` Andrii Nakryiko
2025-10-17 13:55 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:37 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Add btf__load_btf_extra() function to load extra BTF relative to
> base passed in. Base can be vmlinux or module BTF.
>
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/lib/bpf/btf.c | 8 ++++++++
> tools/lib/bpf/btf.h | 1 +
> tools/lib/bpf/libbpf.map | 1 +
> 3 files changed, 10 insertions(+)
>
> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
> index 62d80e8e81bf..028fbb0e03be 100644
> --- a/tools/lib/bpf/btf.c
> +++ b/tools/lib/bpf/btf.c
> @@ -5783,6 +5783,14 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
> return btf__parse_split(path, vmlinux_btf);
> }
>
> +struct btf *btf__load_btf_extra(const char *name, struct btf *base)
> +{
> + char path[80];
> +
> + snprintf(path, sizeof(path), "/sys/kernel/btf_extra/%s", name);
> + return btf__parse_split(path, base);
> +}
> +
why do we need a dedicated libbpf API for loading split BTF?..
> int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
> {
> const struct btf_ext_info *seg;
> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
> index 082b010c0228..f8ec3a59fca0 100644
> --- a/tools/lib/bpf/btf.h
> +++ b/tools/lib/bpf/btf.h
> @@ -138,6 +138,7 @@ LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_b
>
> LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
> LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
> +LIBBPF_API struct btf *btf__load_btf_extra(const char *name, struct btf *base);
>
> LIBBPF_API struct btf *btf__load_from_kernel_by_id(__u32 id);
> LIBBPF_API struct btf *btf__load_from_kernel_by_id_split(__u32 id, struct btf *base_btf);
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 82a0d2ff1176..5f5cf9773205 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -456,4 +456,5 @@ LIBBPF_1.7.0 {
> btf__add_loc_proto_param;
> btf__add_locsec;
> btf__add_locsec_loc;
> + btf__load_btf_extra;
> } LIBBPF_1.6.0;
> --
> 2.39.3
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 13/15] libbpf: add API to load extra BTF
2025-10-16 18:37 ` Andrii Nakryiko
@ 2025-10-17 13:55 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 13:55 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:37, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Add btf__load_btf_extra() function to load extra BTF relative to
>> base passed in. Base can be vmlinux or module BTF.
>>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> tools/lib/bpf/btf.c | 8 ++++++++
>> tools/lib/bpf/btf.h | 1 +
>> tools/lib/bpf/libbpf.map | 1 +
>> 3 files changed, 10 insertions(+)
>>
>> diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
>> index 62d80e8e81bf..028fbb0e03be 100644
>> --- a/tools/lib/bpf/btf.c
>> +++ b/tools/lib/bpf/btf.c
>> @@ -5783,6 +5783,14 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt
>> return btf__parse_split(path, vmlinux_btf);
>> }
>>
>> +struct btf *btf__load_btf_extra(const char *name, struct btf *base)
>> +{
>> + char path[80];
>> +
>> + snprintf(path, sizeof(path), "/sys/kernel/btf_extra/%s", name);
>> + return btf__parse_split(path, base);
>> +}
>> +
>
> why do we need a dedicated libbpf API for loading split BTF?..
>
>
that's a result of the design choice of using /sys/kernel/btf.extra as
the home for the multi-split BTF. If we moved away from that and just
had /sys/kernel/btf we could drop it I think.
>> int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx)
>> {
>> const struct btf_ext_info *seg;
>> diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
>> index 082b010c0228..f8ec3a59fca0 100644
>> --- a/tools/lib/bpf/btf.h
>> +++ b/tools/lib/bpf/btf.h
>> @@ -138,6 +138,7 @@ LIBBPF_API struct btf *btf__parse_raw_split(const char *path, struct btf *base_b
>>
>> LIBBPF_API struct btf *btf__load_vmlinux_btf(void);
>> LIBBPF_API struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_btf);
>> +LIBBPF_API struct btf *btf__load_btf_extra(const char *name, struct btf *base);
>>
>> LIBBPF_API struct btf *btf__load_from_kernel_by_id(__u32 id);
>> LIBBPF_API struct btf *btf__load_from_kernel_by_id_split(__u32 id, struct btf *base_btf);
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index 82a0d2ff1176..5f5cf9773205 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -456,4 +456,5 @@ LIBBPF_1.7.0 {
>> btf__add_loc_proto_param;
>> btf__add_locsec;
>> btf__add_locsec_loc;
>> + btf__load_btf_extra;
>> } LIBBPF_1.6.0;
>> --
>> 2.39.3
>>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 14/15] libbpf: add support for BTF location attachment
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (12 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 13/15] libbpf: add API to load extra BTF Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
2025-10-08 17:35 ` [RFC bpf-next 15/15] selftests/bpf: Add test tracing inline site using SEC("kloc") Alan Maguire
` (2 subsequent siblings)
16 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add support for BTF-based location attachment via multiple kprobes
attaching to each instance of an inline site. Note this is not kprobe
multi attach since that requires fprobe on entry and sites are within
functions. Implementation similar to USDT manager where we use BTF
to create a location manager and populate expected arg values with
metadata based upon BTF_KIND_LOC_PARAM/LOC_PROTOs.
Add new auto-attach SEC("kloc/module:name") where the module is
vmlinux/kernel module and the name is the name of the associated
location; all sites associated with that name will be attached via
kprobes for tracing.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/Makefile | 2 +-
tools/lib/bpf/libbpf.c | 76 +++-
tools/lib/bpf/libbpf.h | 27 ++
tools/lib/bpf/libbpf.map | 1 +
tools/lib/bpf/libbpf_internal.h | 7 +
tools/lib/bpf/loc.bpf.h | 297 +++++++++++++++
tools/lib/bpf/loc.c | 653 ++++++++++++++++++++++++++++++++
8 files changed, 1062 insertions(+), 3 deletions(-)
create mode 100644 tools/lib/bpf/loc.bpf.h
create mode 100644 tools/lib/bpf/loc.c
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index c80204bb72a2..df216ccb015b 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1,4 +1,4 @@
libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_utils.o \
netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
- usdt.o zip.o elf.o features.o btf_iter.o btf_relocate.o
+ usdt.o zip.o elf.o features.o btf_iter.o btf_relocate.o loc.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 168140f8e646..b22be124edc3 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -234,7 +234,7 @@ install_lib: all_cmd
SRC_HDRS := bpf.h libbpf.h btf.h libbpf_common.h libbpf_legacy.h \
bpf_helpers.h bpf_tracing.h bpf_endian.h bpf_core_read.h \
- skel_internal.h libbpf_version.h usdt.bpf.h
+ skel_internal.h libbpf_version.h usdt.bpf.h loc.bpf.h
GEN_HDRS := $(BPF_GENERATED)
INSTALL_PFX := $(DESTDIR)$(prefix)/include/bpf
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index dd3b2f57082d..1605b95844cf 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -407,6 +407,7 @@ enum sec_def_flags {
SEC_XDP_FRAGS = 16,
/* Setup proper attach type for usdt probes. */
SEC_USDT = 32,
+ SEC_LOC = 33,
};
struct bpf_sec_def {
@@ -671,6 +672,8 @@ struct elf_state {
struct usdt_manager;
+struct loc_manager;
+
enum bpf_object_state {
OBJ_OPEN,
OBJ_PREPARED,
@@ -733,6 +736,7 @@ struct bpf_object {
size_t fd_array_cnt;
struct usdt_manager *usdt_man;
+ struct loc_manager *loc_man;
int arena_map_idx;
void *arena_data;
@@ -9190,6 +9194,8 @@ void bpf_object__close(struct bpf_object *obj)
usdt_manager_free(obj->usdt_man);
obj->usdt_man = NULL;
+ loc_manager_free(obj->loc_man);
+ obj->loc_man = NULL;
bpf_gen__free(obj->gen_loader);
bpf_object__elf_finish(obj);
@@ -9561,6 +9567,7 @@ static int attach_kprobe_session(const struct bpf_program *prog, long cookie, st
static int attach_uprobe_multi(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_lsm(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_link **link);
+static int attach_kloc(const struct bpf_program *prog, long cookie, struct bpf_link **link);
static const struct bpf_sec_def section_defs[] = {
SEC_DEF("socket", SOCKET_FILTER, 0, SEC_NONE),
@@ -9666,6 +9673,7 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("struct_ops.s+", STRUCT_OPS, 0, SEC_SLEEPABLE),
SEC_DEF("sk_lookup", SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE),
SEC_DEF("netfilter", NETFILTER, BPF_NETFILTER, SEC_NONE),
+ SEC_DEF("kloc+", KPROBE, 0, SEC_NONE, attach_kloc),
};
int libbpf_register_prog_handler(const char *sec,
@@ -11155,7 +11163,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
attr.size = attr_sz;
attr.type = type;
attr.config |= (__u64)ref_ctr_off << PERF_UPROBE_REF_CTR_OFFSET_SHIFT;
- attr.config1 = ptr_to_u64(name); /* kprobe_func or uprobe_path */
+ attr.config1 = name ? ptr_to_u64(name) : 0; /* kprobe_func or uprobe_path */
attr.config2 = offset; /* kprobe_addr or probe_offset */
/* pid filter is meaningful only for uprobes */
@@ -12601,6 +12609,72 @@ static int attach_usdt(const struct bpf_program *prog, long cookie, struct bpf_l
return err;
}
+struct bpf_link *bpf_program__attach_kloc(const struct bpf_program *prog,
+ const char *module, const char *name,
+ const struct bpf_kloc_opts *opts)
+{
+ struct bpf_object *obj = prog->obj;
+ struct bpf_link *link;
+ __u64 loc_cookie;
+ int err;
+
+ if (!OPTS_VALID(opts, bpf_kloc_opts))
+ return libbpf_err_ptr(-EINVAL);
+
+ if (bpf_program__fd(prog) < 0) {
+ pr_warn("prog '%s': can't attach BPF program without FD (was it loaded?)\n",
+ prog->name);
+ return libbpf_err_ptr(-EINVAL);
+ }
+ if (!module || !name)
+ return libbpf_err_ptr(-EINVAL);
+
+ /* loc manager is instantiated lazily on first loc attach. It will
+ * be destroyed together with BPF object in bpf_object__close().
+ */
+ if (IS_ERR(obj->loc_man))
+ return libbpf_ptr(obj->loc_man);
+ if (!obj->loc_man) {
+ obj->loc_man = loc_manager_new(obj);
+ if (IS_ERR(obj->loc_man))
+ return libbpf_ptr(obj->loc_man);
+ }
+
+ loc_cookie = OPTS_GET(opts, loc_cookie, 0);
+ link = loc_manager_attach_kloc(obj->loc_man, prog, module, name, loc_cookie);
+ err = libbpf_get_error(link);
+ if (err)
+ return libbpf_err_ptr(err);
+ return link;
+}
+
+static int attach_kloc(const struct bpf_program *prog, long cookie, struct bpf_link **link)
+{
+ char *module = NULL, *name = NULL;
+ const char *sec_name;
+ int n, err;
+
+ sec_name = bpf_program__section_name(prog);
+ if (strcmp(sec_name, "kloc") == 0) {
+ /* no auto-attach for just SEC("kloc") */
+ *link = NULL;
+ return 0;
+ }
+
+ n = sscanf(sec_name, "kloc/%m[^:]:%m[^:]", &module, &name);
+ if (n != 2) {
+ pr_warn("invalid section '%s', expected SEC(\"kloc/<module>:<name>\")\n",
+ sec_name);
+ err = -EINVAL;
+ } else {
+ *link = bpf_program__attach_kloc(prog, module, name, NULL);
+ err = libbpf_get_error(*link);
+ }
+ free(module);
+ free(name);
+ return err;
+}
+
static int determine_tracepoint_id(const char *tp_category,
const char *tp_name)
{
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5118d0a90e24..3a5b7ef212a5 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -799,6 +799,33 @@ bpf_program__attach_usdt(const struct bpf_program *prog,
const char *usdt_provider, const char *usdt_name,
const struct bpf_usdt_opts *opts);
+struct bpf_kloc_opts {
+ /* size of this struct, for forward/backward compatibility */
+ size_t sz;
+ /* custom user-provided value fetchable through loc_cookie() */
+ __u64 loc_cookie;
+ size_t:0;
+};
+#define bpf_kloc_opts__last_field loc_cookie
+
+/**
+ * @brief **bpf_program__attach_kloc()** attaches to the location
+ * named *name* in *module* (which can be "vmlinux" or a module name).
+ * Attaches to all locations associated with *name*.
+ *
+ * @param prog BPF program to attach
+ * @param module name
+ * @param name Location name
+ * @param opts Options for altering program attachment
+ * @return Reference to the newly created BPF link: or NULL is returned on error
+ *
+ * error code is stored in errno
+ */
+LIBBPF_API struct bpf_link *
+bpf_program__attach_kloc(const struct bpf_program *prog,
+ const char *module, const char *name,
+ const struct bpf_kloc_opts *opts);
+
struct bpf_tracepoint_opts {
/* size of this struct, for forward/backward compatibility */
size_t sz;
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 5f5cf9773205..94f8a6f8e00f 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -457,4 +457,5 @@ LIBBPF_1.7.0 {
btf__add_locsec;
btf__add_locsec_loc;
btf__load_btf_extra;
+ bpf_program__attach_kloc;
} LIBBPF_1.6.0;
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 2a05518265e9..654337524bdc 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -719,6 +719,13 @@ struct bpf_link * usdt_manager_attach_usdt(struct usdt_manager *man,
const char *usdt_provider, const char *usdt_name,
__u64 usdt_cookie);
+struct loc_manager *loc_manager_new(struct bpf_object *obj);
+void loc_manager_free(struct loc_manager *man);
+struct bpf_link *loc_manager_attach_kloc(struct loc_manager *man,
+ const struct bpf_program *prog,
+ const char *loc_mod, const char *loc_name,
+ __u64 loc_cookie);
+
static inline bool is_pow_of_2(size_t x)
{
return x && (x & (x - 1)) == 0;
diff --git a/tools/lib/bpf/loc.bpf.h b/tools/lib/bpf/loc.bpf.h
new file mode 100644
index 000000000000..65dcff3ea513
--- /dev/null
+++ b/tools/lib/bpf/loc.bpf.h
@@ -0,0 +1,297 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/* Copyright (c) 2025, Oracle and/or its affiliates. */
+#ifndef __LOC_BPF_H__
+#define __LOC_BPF_H__
+
+#include <linux/errno.h>
+#include "bpf_helpers.h"
+#include "bpf_tracing.h"
+
+/* Below types and maps are internal implementation details of libbpf's loc
+ * support and are subjects to change. Also, bpf_loc_xxx() API helpers should
+ * be considered an unstable API as well and might be adjusted based on user
+ * feedback from using libbpf's location support in production.
+ *
+ * This is based heavily upon usdt.bpf.h.
+ */
+
+/* User can override BPF_LOC_MAX_SPEC_CNT to change default size of internal
+ * map that keeps track of location argument specifications. This might be
+ * necessary if there are a lot of location attachments.
+ */
+#ifndef BPF_LOC_MAX_SPEC_CNT
+#define BPF_LOC_MAX_SPEC_CNT 256
+#endif
+/* User can override BPF_LOC_MAX_IP_CNT to change default size of internal
+ * map that keeps track of IP (memory address) mapping to loc argument
+ * specification.
+ * Note, if kernel supports BPF cookies, this map is not used and could be
+ * resized all the way to 1 to save a bit of memory.
+ */
+#ifndef BPF_LOC_MAX_IP_CNT
+#define BPF_LOC_MAX_IP_CNT (4 * BPF_LOC_MAX_SPEC_CNT)
+#endif
+
+enum __bpf_loc_arg_type {
+ BPF_LOC_ARG_UNAVAILABLE,
+ BPF_LOC_ARG_CONST,
+ BPF_LOC_ARG_REG,
+ BPF_LOC_ARG_REG_DEREF,
+ BPF_LOC_ARG_REG_MULTI,
+};
+
+struct __bpf_loc_arg_spec {
+ /* u64 scalar interpreted depending on arg_type, see below */
+ __u64 val_off;
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ enum __bpf_loc_arg_type arg_type: 8;
+ /* reserved for future use, keeps reg_off offset stable */
+ __u32 __reserved: 24;
+#else
+ __u32 __reserved: 24;
+ enum __bpf_loc_arg_type arg_type: 8;
+#endif
+ /* offset of referenced register within struct pt_regs */
+ union {
+ short reg_off;
+ short reg_offs[2];
+ };
+ /* whether arg should be interpreted as signed value */
+ bool arg_signed;
+ /* number of bits that need to be cleared and, optionally,
+ * sign-extended to cast arguments that are 1, 2, or 4 bytes
+ * long into final 8-byte u64/s64 value returned to user
+ */
+ char arg_bitshift;
+};
+
+/* should match LOC_MAX_ARG_CNT in loc.c exactly */
+#define BPF_LOC_MAX_ARG_CNT 12
+struct __bpf_loc_spec {
+ struct __bpf_loc_arg_spec args[BPF_LOC_MAX_ARG_CNT];
+ __u64 loc_cookie;
+ short arg_cnt;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __uint(max_entries, BPF_LOC_MAX_SPEC_CNT);
+ __type(key, int);
+ __type(value, struct __bpf_loc_spec);
+} __bpf_loc_specs SEC(".maps") __weak;
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, BPF_LOC_MAX_IP_CNT);
+ __type(key, long);
+ __type(value, __u32);
+} __bpf_loc_ip_to_spec_id SEC(".maps") __weak;
+
+extern const _Bool LINUX_HAS_BPF_COOKIE __kconfig;
+
+static __always_inline
+int __bpf_loc_spec_id(struct pt_regs *ctx)
+{
+ if (!LINUX_HAS_BPF_COOKIE) {
+ long ip = PT_REGS_IP(ctx);
+ int *spec_id_ptr;
+
+ spec_id_ptr = bpf_map_lookup_elem(&__bpf_loc_ip_to_spec_id, &ip);
+ return spec_id_ptr ? *spec_id_ptr : -ESRCH;
+ }
+
+ return bpf_get_attach_cookie(ctx);
+}
+
+/* Return number of loc arguments defined for currently traced loc. */
+__weak __hidden
+int bpf_loc_arg_cnt(struct pt_regs *ctx)
+{
+ struct __bpf_loc_spec *spec;
+ int spec_id;
+
+ spec_id = __bpf_loc_spec_id(ctx);
+ if (spec_id < 0)
+ return -ESRCH;
+
+ spec = bpf_map_lookup_elem(&__bpf_loc_specs, &spec_id);
+ if (!spec)
+ return -ESRCH;
+
+ return spec->arg_cnt;
+}
+
+/* Returns the size in bytes of the #*arg_num* (zero-indexed) loc argument.
+ * Returns negative error if argument is not found or arg_num is invalid.
+ */
+static __always_inline
+int bpf_loc_arg_size(struct pt_regs *ctx, __u64 arg_num)
+{
+ struct __bpf_loc_arg_spec *arg_spec;
+ struct __bpf_loc_spec *spec;
+ int spec_id;
+
+ spec_id = __bpf_loc_spec_id(ctx);
+ if (spec_id < 0)
+ return -ESRCH;
+
+ spec = bpf_map_lookup_elem(&__bpf_loc_specs, &spec_id);
+ if (!spec)
+ return -ESRCH;
+
+ if (arg_num >= BPF_LOC_MAX_ARG_CNT)
+ return -ENOENT;
+ barrier_var(arg_num);
+ if (arg_num >= spec->arg_cnt)
+ return -ENOENT;
+
+ arg_spec = &spec->args[arg_num];
+
+ /* arg_spec->arg_bitshift = 64 - arg_sz * 8
+ * so: arg_sz = (64 - arg_spec->arg_bitshift) / 8
+ */
+ return (unsigned int)(64 - arg_spec->arg_bitshift) / 8;
+}
+
+/* Fetch loc argument #*arg_num* (zero-indexed) and put its value into *res.
+ * Returns 0 on success; negative error, otherwise.
+ * On error *res is guaranteed to be set to zero.
+ */
+__weak __hidden
+int bpf_loc_arg(struct pt_regs *ctx, __u64 arg_num, long *res)
+{
+ struct __bpf_loc_spec *spec;
+ struct __bpf_loc_arg_spec *arg_spec;
+ unsigned long val;
+ int err, spec_id;
+
+ *res = 0;
+
+ spec_id = __bpf_loc_spec_id(ctx);
+ if (spec_id < 0)
+ return -ESRCH;
+
+ spec = bpf_map_lookup_elem(&__bpf_loc_specs, &spec_id);
+ if (!spec)
+ return -ESRCH;
+
+ if (arg_num >= BPF_LOC_MAX_ARG_CNT)
+ return -ENOENT;
+ barrier_var(arg_num);
+ if (arg_num >= spec->arg_cnt)
+ return -ENOENT;
+
+ arg_spec = &spec->args[arg_num];
+ switch (arg_spec->arg_type) {
+ case BPF_LOC_ARG_UNAVAILABLE:
+ *res = 0;
+ return -ENOENT;
+ case BPF_LOC_ARG_CONST:
+ /* Arg is just a constant ("-4@$-9" in loc arg spec).
+ * value is recorded in arg_spec->val_off directly.
+ */
+ val = arg_spec->val_off;
+ break;
+ case BPF_LOC_ARG_REG:
+ /* Arg is stored directly in a register, so we read the
+ * contents of that register directly from struct pt_regs.
+ * To keep things simple user-space parts record
+ * offsetof(struct pt_regs, <regname>) in arg_spec->reg_off.
+ */
+ err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+ if (err)
+ return err;
+ break;
+ case BPF_LOC_ARG_REG_DEREF:
+ /* Arg is in memory addressed by register, plus some offset
+ * Register is identified like with BPF_LOC_ARG_REG case,
+ * and the offset is in arg_spec->val_off. We first fetch
+ * register contents from pt_regs, then do another probe read
+ * to fetch argument value itself.
+ */
+ err = bpf_probe_read_kernel(&val, sizeof(val), (void *)ctx + arg_spec->reg_off);
+ if (err)
+ return err;
+ err = bpf_probe_read_kernel(&val, sizeof(val), (void *)val + arg_spec->val_off);
+ if (err)
+ return err;
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ val >>= arg_spec->arg_bitshift;
+#endif
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* cast arg from 1, 2, or 4 bytes to final 8 byte size clearing
+ * necessary upper arg_bitshift bits, with sign extension if argument
+ * is signed
+ */
+ val <<= arg_spec->arg_bitshift;
+ if (arg_spec->arg_signed)
+ val = ((long)val) >> arg_spec->arg_bitshift;
+ else
+ val = val >> arg_spec->arg_bitshift;
+ *res = val;
+ return 0;
+}
+
+/* Retrieve user-specified cookie value provided during attach as
+ * bpf_loc_opts.loc_cookie. This serves the same purpose as BPF cookie
+ * returned by bpf_get_attach_cookie(). Libbpf's support for locs is itself
+ * utilizing BPF cookies internally, so user can't use BPF cookie directly
+ * for loc programs and has to use bpf_loc_cookie() API instead.
+ */
+__weak __hidden
+long bpf_loc_cookie(struct pt_regs *ctx)
+{
+ struct __bpf_loc_spec *spec;
+ int spec_id;
+
+ spec_id = __bpf_loc_spec_id(ctx);
+ if (spec_id < 0)
+ return 0;
+
+ spec = bpf_map_lookup_elem(&__bpf_loc_specs, &spec_id);
+ if (!spec)
+ return 0;
+
+ return spec->loc_cookie;
+}
+
+/* we rely on ___bpf_apply() and ___bpf_narg() macros already defined in bpf_tracing.h */
+#define ___bpf_loc_args0() ctx
+#define ___bpf_loc_args1(x) ___bpf_loc_args0(), ({ long _x; bpf_loc_arg(ctx, 0, &_x); _x; })
+#define ___bpf_loc_args2(x, args...) ___bpf_loc_args1(args), ({ long _x; bpf_loc_arg(ctx, 1, &_x); _x; })
+#define ___bpf_loc_args3(x, args...) ___bpf_loc_args2(args), ({ long _x; bpf_loc_arg(ctx, 2, &_x); _x; })
+#define ___bpf_loc_args4(x, args...) ___bpf_loc_args3(args), ({ long _x; bpf_loc_arg(ctx, 3, &_x); _x; })
+#define ___bpf_loc_args5(x, args...) ___bpf_loc_args4(args), ({ long _x; bpf_loc_arg(ctx, 4, &_x); _x; })
+#define ___bpf_loc_args6(x, args...) ___bpf_loc_args5(args), ({ long _x; bpf_loc_arg(ctx, 5, &_x); _x; })
+#define ___bpf_loc_args7(x, args...) ___bpf_loc_args6(args), ({ long _x; bpf_loc_arg(ctx, 6, &_x); _x; })
+#define ___bpf_loc_args8(x, args...) ___bpf_loc_args7(args), ({ long _x; bpf_loc_arg(ctx, 7, &_x); _x; })
+#define ___bpf_loc_args9(x, args...) ___bpf_loc_args8(args), ({ long _x; bpf_loc_arg(ctx, 8, &_x); _x; })
+#define ___bpf_loc_args10(x, args...) ___bpf_loc_args9(args), ({ long _x; bpf_loc_arg(ctx, 9, &_x); _x; })
+#define ___bpf_loc_args11(x, args...) ___bpf_loc_args10(args), ({ long _x; bpf_loc_arg(ctx, 10, &_x); _x; })
+#define ___bpf_loc_args12(x, args...) ___bpf_loc_args11(args), ({ long _x; bpf_loc_arg(ctx, 11, &_x); _x; })
+#define ___bpf_loc_args(args...) ___bpf_apply(___bpf_loc_args, ___bpf_narg(args))(args)
+
+/*
+ * BPF_KLOC serves the same purpose for loc handlers as BPF_PROG for
+ * tp_btf/fentry/fexit BPF programs and BPF_KPROBE for kprobes.
+ * Original struct pt_regs * context is preserved as 'ctx' argument.
+ */
+#define BPF_KLOC(name, args...) \
+name(struct pt_regs *ctx); \
+static __always_inline typeof(name(0)) \
+____##name(struct pt_regs *ctx, ##args); \
+typeof(name(0)) name(struct pt_regs *ctx) \
+{ \
+ _Pragma("GCC diagnostic push") \
+ _Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
+ return ____##name(___bpf_loc_args(args)); \
+ _Pragma("GCC diagnostic pop") \
+} \
+static __always_inline typeof(name(0)) \
+____##name(struct pt_regs *ctx, ##args)
+
+#endif /* __LOC_BPF_H__ */
diff --git a/tools/lib/bpf/loc.c b/tools/lib/bpf/loc.c
new file mode 100644
index 000000000000..345b248bb52e
--- /dev/null
+++ b/tools/lib/bpf/loc.c
@@ -0,0 +1,653 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+/* Copyright (c) 2025, Oracle and/or its affiliates. */
+#include <ctype.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <linux/ptrace.h>
+#include <linux/kernel.h>
+
+/* s8 will be marked as poison while it's a reg of riscv */
+#if defined(__riscv)
+#define rv_s8 s8
+#endif
+
+#include "bpf.h"
+#include "btf.h"
+#include "libbpf.h"
+#include "libbpf_common.h"
+#include "libbpf_internal.h"
+
+/* Location implementation is very similar to usdt.c; key difference
+ * is the data specifying how to retrieve parameters for a target is
+ * in BTF.
+ */
+
+/* should match exactly enum __bpf_loc_arg_type from loc.bpf.h */
+enum loc_arg_type {
+ BPF_LOC_ARG_UNAVAILABLE,
+ BPF_LOC_ARG_CONST,
+ BPF_LOC_ARG_REG,
+ BPF_LOC_ARG_REG_DEREF,
+ BPF_LOC_ARG_REG_MULTI,
+};
+
+/* should match exactly struct __bpf_loc_arg_spec from loc.bpf.h */
+struct loc_arg_spec {
+ /* u64 scalar interpreted depending on arg_type, see below */
+ __u64 val_off;
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ enum loc_arg_type arg_type: 8;
+ /* reserved for future use, keeps reg_off offset stable */
+ __u32 __reserved: 24;
+#else
+ __u32 __reserved: 24;
+ enum loc_arg_type arg_type: 8;
+#endif
+ /* offset of referenced register within struct pt_regs */
+ union {
+ short reg_off;
+ short reg_offs[2];
+ };
+ /* whether arg should be interpreted as signed value */
+ bool arg_signed;
+ /* number of bits that need to be cleared and, optionally,
+ * sign-extended to cast arguments that are 1, 2, or 4 bytes
+ * long into final 8-byte u64/s64 value returned to user
+ */
+ char arg_bitshift;
+};
+
+#define LOC_MAX_ARG_CNT 12
+struct loc_spec {
+ struct loc_arg_spec args[LOC_MAX_ARG_CNT];
+ __u64 loc_cookie;
+ short arg_cnt;
+};
+
+struct loc_target {
+ long abs_ip;
+ struct loc_spec spec;
+};
+
+struct loc_manager {
+ struct bpf_map *specs_map;
+ struct bpf_map *ip_to_spec_id_map;
+ int *free_spec_ids;
+ size_t free_spec_cnt;
+ size_t next_free_spec_id;
+ struct loc_target *targets;
+ size_t target_cnt;
+ bool has_bpf_cookie;
+};
+
+static int get_base_addr(const char *mod, __u64 *base_addr)
+{
+ bool is_vmlinux = strcmp(mod, "vmlinux") == 0;
+ const char *file = is_vmlinux ? "/proc/kallsyms" : "/proc/modules";
+ char name[PATH_MAX], type;
+ int err = -ENOENT;
+ FILE *f = NULL;
+ long addr;
+
+ *base_addr = 0;
+
+ f = fopen(file, "r");
+ if (!f) {
+ pr_warn("loc: cannot open '%s' (err %s)\n", file, errstr(-errno));
+ return -errno;
+ }
+ if (is_vmlinux) {
+ while (fscanf(f, "%lx %c %499s%*[^\n]\n", &addr, &type, name) == 3) {
+ if (strcmp(name, "_text") != 0)
+ continue;
+ *base_addr = addr;
+ err = 0;
+ break;
+ }
+ } else {
+ while (fscanf(f, "%s %*s %*s %*s %*s 0x%lx\n", name, &addr) == 5) {
+ if (strcmp(name, mod) != 0)
+ continue;
+ *base_addr = addr;
+ err = 0;
+ break;
+ }
+ }
+ fclose(f);
+ if (err)
+ pr_warn("loc: could not find base addr for '%s'\n", mod);
+ return err;
+}
+
+void loc_manager_free(struct loc_manager *man)
+{
+ if (IS_ERR_OR_NULL(man))
+ return;
+
+ free(man->free_spec_ids);
+ free(man);
+}
+
+struct loc_manager *loc_manager_new(struct bpf_object *obj)
+{
+ struct loc_manager *man = NULL;
+ struct bpf_map *specs_map, *ip_to_spec_id_map;
+
+ specs_map = bpf_object__find_map_by_name(obj, "__bpf_loc_specs");
+ ip_to_spec_id_map = bpf_object__find_map_by_name(obj, "__bpf_loc_ip_to_spec_id");
+ if (!specs_map || !ip_to_spec_id_map) {
+ pr_warn("loc: failed to find location support BPF maps, did you forget to include bpf/loc.bpf.h?\n");
+ return ERR_PTR(-ESRCH);
+ }
+
+ man = calloc(1, sizeof(*man));
+ if (!man)
+ return ERR_PTR(-ENOMEM);
+ man->specs_map = specs_map;
+ man->ip_to_spec_id_map = ip_to_spec_id_map;
+
+ /* Detect if BPF cookie is supported for kprobes.
+ * We don't need IP-to-ID mapping if we can use BPF cookies.
+ * Added in: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value")
+ */
+ man->has_bpf_cookie = kernel_supports(obj, FEAT_BPF_COOKIE);
+
+ return man;
+}
+
+struct bpf_link_loc {
+ struct bpf_link link;
+
+ struct loc_manager *loc_man;
+
+ size_t spec_cnt;
+ int *spec_ids;
+
+ size_t kprobe_cnt;
+ struct {
+ long abs_ip;
+ struct bpf_link *link;
+ } *kprobes;
+};
+
+static int bpf_link_loc_detach(struct bpf_link *link)
+{
+ struct bpf_link_loc *loc_link = container_of(link, struct bpf_link_loc, link);
+ struct loc_manager *man = loc_link->loc_man;
+ int i;
+
+ for (i = 0; i < loc_link->kprobe_cnt; i++) {
+ /* detach underlying kprobe link */
+ bpf_link__destroy(loc_link->kprobes[i].link);
+ /* there is no need to update specs map because it will be
+ * unconditionally overwritten on subsequent loc attaches,
+ * but if BPF cookies are not used we need to remove entry
+ * from ip_to_spec_id map, otherwise we'll run into false
+ * conflicting IP errors
+ */
+ if (!man->has_bpf_cookie) {
+ /* not much we can do about errors here */
+ (void)bpf_map_delete_elem(bpf_map__fd(man->ip_to_spec_id_map),
+ &loc_link->kprobes[i].abs_ip);
+ }
+ }
+
+ /* try to return the list of previously used spec IDs to loc_manager
+ * for future reuse for subsequent loc attaches
+ */
+ if (!man->free_spec_ids) {
+ /* if there were no free spec IDs yet, just transfer our IDs */
+ man->free_spec_ids = loc_link->spec_ids;
+ man->free_spec_cnt = loc_link->spec_cnt;
+ loc_link->spec_ids = NULL;
+ } else {
+ /* otherwise concat IDs */
+ size_t new_cnt = man->free_spec_cnt + loc_link->spec_cnt;
+ int *new_free_ids;
+
+ new_free_ids = libbpf_reallocarray(man->free_spec_ids, new_cnt,
+ sizeof(*new_free_ids));
+ /* If we couldn't resize free_spec_ids, we'll just leak
+ * a bunch of free IDs; this is very unlikely to happen and if
+ * system is so exhausted on memory, it's the least of user's
+ * concerns, probably.
+ * So just do our best here to return those IDs to usdt_manager.
+ * Another edge case when we can legitimately get NULL is when
+ * new_cnt is zero, which can happen in some edge cases, so we
+ * need to be careful about that.
+ */
+ if (new_free_ids || new_cnt == 0) {
+ memcpy(new_free_ids + man->free_spec_cnt, loc_link->spec_ids,
+ loc_link->spec_cnt * sizeof(*loc_link->spec_ids));
+ man->free_spec_ids = new_free_ids;
+ man->free_spec_cnt = new_cnt;
+ }
+ }
+
+ return 0;
+}
+
+static void bpf_link_loc_dealloc(struct bpf_link *link)
+{
+ struct bpf_link_loc *loc_link = container_of(link, struct bpf_link_loc, link);
+
+ free(loc_link->spec_ids);
+ free(loc_link->kprobes);
+ free(loc_link);
+}
+
+static int allocate_spec_id(struct loc_manager *man, struct bpf_link_loc *link,
+ struct loc_target *target, int *spec_id, bool *is_new)
+{
+ void *new_ids;
+
+ new_ids = libbpf_reallocarray(link->spec_ids, link->spec_cnt + 1, sizeof(*link->spec_ids));
+ if (!new_ids)
+ return -ENOMEM;
+ link->spec_ids = new_ids;
+
+ /* get next free spec ID, giving preference to free list, if not empty */
+ if (man->free_spec_cnt) {
+ *spec_id = man->free_spec_ids[man->free_spec_cnt - 1];
+ man->free_spec_cnt--;
+ } else {
+ /* don't allocate spec ID bigger than what fits in specs map */
+ if (man->next_free_spec_id >= bpf_map__max_entries(man->specs_map))
+ return -E2BIG;
+
+ *spec_id = man->next_free_spec_id;
+ man->next_free_spec_id++;
+ }
+
+ /* remember new spec ID in the link for later return back to free list on detach */
+ link->spec_ids[link->spec_cnt] = *spec_id;
+ link->spec_cnt++;
+ *is_new = true;
+ return 0;
+}
+
+static int collect_loc_targets(struct loc_manager *man, const char *mod, const char *name,
+ __u64 loc_cookie, struct loc_target **out_targets,
+ size_t *out_target_cnt);
+
+struct bpf_link *loc_manager_attach_kloc(struct loc_manager *man, const struct bpf_program *prog,
+ const char *loc_mod, const char *loc_name,
+ __u64 loc_cookie)
+{
+ int i, err, spec_map_fd, ip_map_fd;
+
+ LIBBPF_OPTS(bpf_kprobe_opts, opts);
+ struct bpf_link_loc *link = NULL;
+ struct loc_target *targets = NULL;
+ __u64 *cookies = NULL;
+ size_t target_cnt = 0;
+
+ spec_map_fd = bpf_map__fd(man->specs_map);
+ ip_map_fd = bpf_map__fd(man->ip_to_spec_id_map);
+
+ err = collect_loc_targets(man, loc_mod, loc_name, loc_cookie, &targets, &target_cnt);
+ if (err <= 0) {
+ err = (err == 0) ? -ENOENT : err;
+ goto err_out;
+ }
+ err = 0;
+
+ link = calloc(1, sizeof(*link));
+ if (!link) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ link->loc_man = man;
+ link->link.detach = &bpf_link_loc_detach;
+ link->link.dealloc = &bpf_link_loc_dealloc;
+
+ link->kprobes = calloc(target_cnt, sizeof(*link->kprobes));
+ if (!link->kprobes) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+
+ for (i = 0; i < target_cnt; i++) {
+ struct loc_target *target = &targets[i];
+ struct bpf_link *kprobe_link;
+ bool is_new;
+ int spec_id;
+
+ /* Spec ID can be either reused or newly allocated. */
+ err = allocate_spec_id(man, link, target, &spec_id, &is_new);
+ if (err)
+ goto err_out;
+
+ if (is_new && bpf_map_update_elem(spec_map_fd, &spec_id, &target->spec, BPF_ANY)) {
+ err = -errno;
+ pr_warn("loc: failed to set loc spec #%d for '%s:%s' in : %s\n",
+ spec_id, loc_mod, loc_name, errstr(err));
+ goto err_out;
+ }
+ if (!man->has_bpf_cookie &&
+ bpf_map_update_elem(ip_map_fd, &target->abs_ip, &spec_id, BPF_NOEXIST)) {
+ err = -errno;
+ if (err == -EEXIST) {
+ pr_warn("loc: IP collision detected for spec #%d for '%s:%s''\n",
+ spec_id, loc_mod, loc_name);
+ } else {
+ pr_warn("loc: failed to map IP 0x%lx to spec #%d for '%s:%s': %s\n",
+ target->abs_ip, spec_id, loc_mod, loc_name,
+ errstr(err));
+ }
+ goto err_out;
+ }
+
+
+ opts.bpf_cookie = man->has_bpf_cookie ? spec_id : 0;
+ opts.offset = target->abs_ip;
+ kprobe_link = bpf_program__attach_kprobe_opts(prog, NULL, &opts);
+ err = libbpf_get_error(kprobe_link);
+ if (err) {
+ pr_warn("loc: failed to attach kprobe #%d for '%s:%s': %s\n",
+ i, loc_mod, loc_name, errstr(err));
+ goto err_out;
+ }
+
+ link->kprobes[i].link = kprobe_link;
+ link->kprobes[i].abs_ip = target->abs_ip;
+ link->kprobe_cnt++;
+ }
+
+ return &link->link;
+
+err_out:
+ pr_warn("loc: failed to attach to all loc targets for '%s:%s': %d\n",
+ loc_mod, loc_name, err);
+ free(cookies);
+
+ if (link)
+ bpf_link__destroy(&link->link);
+ free(targets);
+ return libbpf_err_ptr(err);
+}
+
+/* Architecture-specific logic for parsing location info */
+
+#if defined(__x86_64__) || defined(__i386__)
+
+static int calc_pt_regs_off(uint16_t num)
+{
+ size_t reg_map[] = {
+#ifdef __x86_64__
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg64)
+#else
+#define reg_off(reg64, reg32) offsetof(struct pt_regs, reg32)
+#endif
+ reg_off(rax, eax),
+ reg_off(rdx, edx),
+ reg_off(rcx, ecx),
+ reg_off(rbx, ebx),
+ reg_off(rsi, esi),
+ reg_off(rdi, edi),
+ reg_off(rbp, ebp),
+ reg_off(rsp, esp),
+ offsetof(struct pt_regs, r8),
+ offsetof(struct pt_regs, r9),
+ offsetof(struct pt_regs, r10),
+ offsetof(struct pt_regs, r11),
+ offsetof(struct pt_regs, r12),
+ offsetof(struct pt_regs, r13),
+ offsetof(struct pt_regs, r14),
+ offsetof(struct pt_regs, r15),
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ -ENOENT,
+ reg_off(rbp, ebp)
+ };
+
+ if (num > ARRAY_SIZE(reg_map) || reg_map[num] == -ENOENT) {
+ pr_warn("loc: unsupported register '%d'\n", num);
+ return -ENOENT;
+ }
+ return reg_map[num];
+}
+
+#elif defined(__aarch64__)
+
+static int calc_pt_regs_off(int num)
+{
+ if (num >= 0 && num < 31)
+ return offsetof(struct user_pt_regs, regs[reg_num]);
+ else if (num == 33)
+ return offsetof(struct user_pt_regs, sp);
+ pr_warn("loc: unsupported register '%d'\n", num);
+ return -ENOENT;
+}
+
+#else
+static int calc_pt_regs_off(int num)
+{
+ pr_warn("loc: unsupported platform (register '%d')\n", num);
+ return -ENOENT;
+}
+#endif
+
+static int parse_loc_arg(const struct btf_type *t, __u64 base_addr, short arg_num, struct loc_arg_spec *arg)
+{
+ const struct btf_loc_param *lp = t ? btf_loc_param(t) : NULL;
+ int reg_off, arg_sz;
+ bool is_const;
+
+ if (!t) {
+ arg->arg_type = BPF_LOC_ARG_UNAVAILABLE;
+ return 0;
+ }
+ is_const = btf_kflag(t);
+ arg_sz = BTF_TYPE_LOC_PARAM_SIZE(t);
+ if (arg_sz < 0) {
+ arg->arg_signed = true;
+ arg_sz = -arg_sz;
+ }
+ switch (arg_sz) {
+ case 1: case 2: case 4: case 8:
+ arg->arg_bitshift = 64 - arg_sz * 8;
+ break;
+ default:
+ pr_warn("loc: unsupported arg #%d size: %d\n",
+ arg_num, arg_sz);
+ return -EINVAL;
+ }
+
+ if (is_const) {
+ arg->arg_type = BPF_LOC_ARG_CONST;
+ arg->val_off = lp->val_lo32 | ((__u64)lp->val_hi32 << 32);
+ if (arg_sz == 8)
+ arg->val_off += base_addr;
+ } else {
+ reg_off = calc_pt_regs_off(lp->reg);
+ if (reg_off < 0)
+ return reg_off;
+ if (arg->arg_type == BPF_LOC_ARG_REG_MULTI) {
+ arg->reg_offs[1] = reg_off;
+ } else {
+ if (lp->flags & BTF_LOC_FLAG_CONTINUE)
+ arg->arg_type = BPF_LOC_ARG_REG_MULTI;
+ else
+ arg->arg_type = BPF_LOC_ARG_REG;
+ arg->reg_off = reg_off;
+ }
+ arg->val_off = 0;
+ if (lp->flags & BTF_LOC_FLAG_REG_DEREF) {
+ arg->arg_type = BPF_LOC_ARG_REG_DEREF;
+ arg->val_off = lp->offset;
+ }
+ }
+ if (lp->flags & BTF_LOC_FLAG_CONTINUE)
+ return 1;
+ return 0;
+}
+
+static int parse_loc_spec(struct btf *btf, __u64 base_addr, const char *name,
+ __u32 func_proto, __u32 loc_proto, __u32 offset,
+ __u64 loc_cookie, struct loc_spec *spec)
+{
+ struct loc_arg_spec *arg;
+ const struct btf_param *p;
+ const struct btf_type *t;
+ int ret, i;
+ __u32 *l;
+
+ pr_debug("loc: parsing spec for '%s': func_proto_id %u loc_proto_id %u abs_ip 0x%llx\n",
+ name, func_proto, loc_proto, base_addr + offset);
+ spec->loc_cookie = loc_cookie;
+
+ t = btf__type_by_id(btf, func_proto);
+ if (!t) {
+ pr_warn("loc: unknown func proto %u for '%s'\n", func_proto, name);
+ return -EINVAL;
+ }
+ spec->arg_cnt = btf_vlen(t);
+ if (spec->arg_cnt >= LOC_MAX_ARG_CNT) {
+ pr_warn("loc: too many loc arguments (> %d) for '%s'\n",
+ LOC_MAX_ARG_CNT, name);
+ return -E2BIG;
+ }
+ p = btf_params(t);
+
+ t = btf__type_by_id(btf, loc_proto);
+ if (!t) {
+ pr_warn("loc: unknown loc proto %u for '%s'\n", func_proto, name);
+ return -EINVAL;
+ }
+ l = btf_loc_proto_params(t);
+
+ for (i = 0 ; i < spec->arg_cnt; i++, l++, p++) {
+ __u64 addr = 0;
+
+ arg = &spec->args[i];
+ if (*l == 0) {
+ t = NULL;
+ } else {
+ __u32 id;
+
+ /* use func proto to determine if the value
+ * is an address; if so we need to add base addr
+ * to value.
+ */
+ for (id = p->type;
+ (t = btf__type_by_id(btf, id)) != NULL;
+ id = t->type) {
+ if (!btf_is_mod(t) && !btf_is_typedef(t))
+ break;
+ }
+ if (t && btf_is_ptr(t))
+ addr = base_addr;
+
+ t = btf__type_by_id(btf, *l);
+ if (!t) {
+ pr_warn("loc: unknown type id %u for '%s'\n",
+ *l, name);
+ return -EINVAL;
+ }
+ }
+ ret = parse_loc_arg(t, addr, i, arg);
+ if (ret < 0)
+ return ret;
+ /* multi-reg location param? */
+ if (ret > 0) {
+ l++;
+ t = btf__type_by_id(btf, *l);
+
+ ret = parse_loc_arg(t, addr, i, arg);
+ }
+ if (ret < 0)
+ return ret;
+ }
+ return 0;
+}
+
+static int collect_loc_targets(struct loc_manager *man, const char *mod, const char *name,
+ __u64 loc_cookie, struct loc_target **out_targets,
+ size_t *out_target_cnt)
+{
+ struct loc_target *tmp, *targets = NULL;
+ struct btf *base_btf, *btf = NULL;
+ __u32 start_id, type_cnt;
+ size_t target_cnt = 0;
+ __u64 base_addr = 0;
+ int err = 0;
+ __u32 i, j;
+
+ base_btf = btf__load_vmlinux_btf();
+ if (!IS_ERR(base_btf) && strcmp(mod, "vmlinux") != 0)
+ base_btf = btf__load_module_btf(mod, base_btf);
+ if (IS_ERR(base_btf))
+ return PTR_ERR(base_btf);
+ btf = btf__load_btf_extra(mod, base_btf);
+ if (IS_ERR(btf)) {
+ btf__free(base_btf);
+ return PTR_ERR(btf);
+ }
+
+ start_id = base_btf ? btf__type_cnt(base_btf) : 1;
+ type_cnt = btf__type_cnt(btf);
+
+ err = get_base_addr(mod, &base_addr);
+ if (err)
+ goto err_out;
+
+ for (i = start_id; i < type_cnt; i++) {
+ struct loc_target *target;
+ const struct btf_type *t;
+ const struct btf_loc *l;
+ const char *locname;
+ int vlen;
+
+ t = btf__type_by_id(btf, i);
+ if (!btf_is_locsec(t))
+ continue;
+ vlen = btf_vlen(t);
+ l = btf_locsec_locs(t);
+
+ for (j = 0; j < vlen; j++, l++) {
+ locname = btf__name_by_offset(btf, l->name_off);
+ if (!locname || strcmp(name, locname) != 0)
+ continue;
+ tmp = libbpf_reallocarray(targets, target_cnt + 1, sizeof(*targets));
+ if (!tmp) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ targets = tmp;
+ target = &targets[target_cnt];
+ memset(target, 0, sizeof(*target));
+ target->abs_ip = base_addr + l->offset;
+ err = parse_loc_spec(btf, base_addr, locname,
+ l->func_proto, l->loc_proto, l->offset,
+ loc_cookie, &target->spec);
+ if (err)
+ goto err_out;
+ target_cnt++;
+
+ }
+ }
+ *out_targets = targets;
+ *out_target_cnt = target_cnt;
+ return target_cnt;
+err_out:
+ free(targets);
+ return err;
+}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 14/15] libbpf: add support for BTF location attachment
2025-10-08 17:35 ` [RFC bpf-next 14/15] libbpf: add support for BTF location attachment Alan Maguire
@ 2025-10-16 18:36 ` Andrii Nakryiko
2025-10-17 14:02 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:36 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> Add support for BTF-based location attachment via multiple kprobes
> attaching to each instance of an inline site. Note this is not kprobe
> multi attach since that requires fprobe on entry and sites are within
> functions. Implementation similar to USDT manager where we use BTF
> to create a location manager and populate expected arg values with
> metadata based upon BTF_KIND_LOC_PARAM/LOC_PROTOs.
>
> Add new auto-attach SEC("kloc/module:name") where the module is
> vmlinux/kernel module and the name is the name of the associated
> location; all sites associated with that name will be attached via
> kprobes for tracing.
>
If kernel ends up supporting something like this natively, then all
this is irrelevant.
But I'd test-drive this in a purpose-built tracing tool like bpftrace
before committing to baking this into libbpf from the get-go.
Generally speaking, I feel like we need a tracing-focused companion
library to libbpf for stuff like this. And it can take care of extra
utilities like parsing DWARF, kallsyms, ELF symbols, etc. All the
different stuff that is required for powerful BPF-based kernel and
user space tracing, but is not per se BPF itself. libbpf' USDT support
is sort of on the edge of what I'd consider acceptable to be provided
by libbpf, and that's mostly because USDT is stable and
well-established technology that people coming from BCC assume should
be baked into BPF library.
> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> ---
> tools/lib/bpf/Build | 2 +-
> tools/lib/bpf/Makefile | 2 +-
> tools/lib/bpf/libbpf.c | 76 +++-
> tools/lib/bpf/libbpf.h | 27 ++
> tools/lib/bpf/libbpf.map | 1 +
> tools/lib/bpf/libbpf_internal.h | 7 +
> tools/lib/bpf/loc.bpf.h | 297 +++++++++++++++
> tools/lib/bpf/loc.c | 653 ++++++++++++++++++++++++++++++++
> 8 files changed, 1062 insertions(+), 3 deletions(-)
> create mode 100644 tools/lib/bpf/loc.bpf.h
> create mode 100644 tools/lib/bpf/loc.c
>
[...]
> diff --git a/tools/lib/bpf/loc.bpf.h b/tools/lib/bpf/loc.bpf.h
> new file mode 100644
> index 000000000000..65dcff3ea513
> --- /dev/null
> +++ b/tools/lib/bpf/loc.bpf.h
> @@ -0,0 +1,297 @@
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> +/* Copyright (c) 2025, Oracle and/or its affiliates. */
> +#ifndef __LOC_BPF_H__
> +#define __LOC_BPF_H__
> +
> +#include <linux/errno.h>
> +#include "bpf_helpers.h"
> +#include "bpf_tracing.h"
> +
> +/* Below types and maps are internal implementation details of libbpf's loc
> + * support and are subjects to change. Also, bpf_loc_xxx() API helpers should
> + * be considered an unstable API as well and might be adjusted based on user
> + * feedback from using libbpf's location support in production.
> + *
> + * This is based heavily upon usdt.bpf.h.
> + */
> +
> +/* User can override BPF_LOC_MAX_SPEC_CNT to change default size of internal
> + * map that keeps track of location argument specifications. This might be
> + * necessary if there are a lot of location attachments.
> + */
> +#ifndef BPF_LOC_MAX_SPEC_CNT
> +#define BPF_LOC_MAX_SPEC_CNT 256
> +#endif
> +/* User can override BPF_LOC_MAX_IP_CNT to change default size of internal
> + * map that keeps track of IP (memory address) mapping to loc argument
> + * specification.
> + * Note, if kernel supports BPF cookies, this map is not used and could be
> + * resized all the way to 1 to save a bit of memory.
is this just a copy/paste of really we will try to support kernels
without BPF cookies for something bleeding edge like this?..
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 14/15] libbpf: add support for BTF location attachment
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-17 14:02 ` Alan Maguire
2025-10-20 21:07 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-17 14:02 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> Add support for BTF-based location attachment via multiple kprobes
>> attaching to each instance of an inline site. Note this is not kprobe
>> multi attach since that requires fprobe on entry and sites are within
>> functions. Implementation similar to USDT manager where we use BTF
>> to create a location manager and populate expected arg values with
>> metadata based upon BTF_KIND_LOC_PARAM/LOC_PROTOs.
>>
>> Add new auto-attach SEC("kloc/module:name") where the module is
>> vmlinux/kernel module and the name is the name of the associated
>> location; all sites associated with that name will be attached via
>> kprobes for tracing.
>>
>
> If kernel ends up supporting something like this natively, then all
> this is irrelevant.
>
> But I'd test-drive this in a purpose-built tracing tool like bpftrace
> before committing to baking this into libbpf from the get-go.
>
> Generally speaking, I feel like we need a tracing-focused companion
> library to libbpf for stuff like this. And it can take care of extra
> utilities like parsing DWARF, kallsyms, ELF symbols, etc. All the
> different stuff that is required for powerful BPF-based kernel and
> user space tracing, but is not per se BPF itself. libbpf' USDT support
> is sort of on the edge of what I'd consider acceptable to be provided
> by libbpf, and that's mostly because USDT is stable and
> well-established technology that people coming from BCC assume should
> be baked into BPF library.
>
Yeah, that makes total sense; the implementation is really just there to
facilitate in-tree testing. We could move it to selftests and have
custom ELF section handling there to support it though without adding to
libbpf. It would definitely be good to have some in-tree facilities for
testing to ensure the metadata about inlines is not broken though.
Ideally this would be done by adding inline sites to bpf_testmod but the
RFC series did not support distilled/relocated BTF (which is what
bpf_testmod uses). Next round should hopefully have that support so we
can exercise inline sites more fully.
>
>> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
>> ---
>> tools/lib/bpf/Build | 2 +-
>> tools/lib/bpf/Makefile | 2 +-
>> tools/lib/bpf/libbpf.c | 76 +++-
>> tools/lib/bpf/libbpf.h | 27 ++
>> tools/lib/bpf/libbpf.map | 1 +
>> tools/lib/bpf/libbpf_internal.h | 7 +
>> tools/lib/bpf/loc.bpf.h | 297 +++++++++++++++
>> tools/lib/bpf/loc.c | 653 ++++++++++++++++++++++++++++++++
>> 8 files changed, 1062 insertions(+), 3 deletions(-)
>> create mode 100644 tools/lib/bpf/loc.bpf.h
>> create mode 100644 tools/lib/bpf/loc.c
>>
>
> [...]
>
>> diff --git a/tools/lib/bpf/loc.bpf.h b/tools/lib/bpf/loc.bpf.h
>> new file mode 100644
>> index 000000000000..65dcff3ea513
>> --- /dev/null
>> +++ b/tools/lib/bpf/loc.bpf.h
>> @@ -0,0 +1,297 @@
>> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
>> +/* Copyright (c) 2025, Oracle and/or its affiliates. */
>> +#ifndef __LOC_BPF_H__
>> +#define __LOC_BPF_H__
>> +
>> +#include <linux/errno.h>
>> +#include "bpf_helpers.h"
>> +#include "bpf_tracing.h"
>> +
>> +/* Below types and maps are internal implementation details of libbpf's loc
>> + * support and are subjects to change. Also, bpf_loc_xxx() API helpers should
>> + * be considered an unstable API as well and might be adjusted based on user
>> + * feedback from using libbpf's location support in production.
>> + *
>> + * This is based heavily upon usdt.bpf.h.
>> + */
>> +
>> +/* User can override BPF_LOC_MAX_SPEC_CNT to change default size of internal
>> + * map that keeps track of location argument specifications. This might be
>> + * necessary if there are a lot of location attachments.
>> + */
>> +#ifndef BPF_LOC_MAX_SPEC_CNT
>> +#define BPF_LOC_MAX_SPEC_CNT 256
>> +#endif
>> +/* User can override BPF_LOC_MAX_IP_CNT to change default size of internal
>> + * map that keeps track of IP (memory address) mapping to loc argument
>> + * specification.
>> + * Note, if kernel supports BPF cookies, this map is not used and could be
>> + * resized all the way to 1 to save a bit of memory.
>
> is this just a copy/paste of really we will try to support kernels
> without BPF cookies for something bleeding edge like this?..
>
Yeah, copy-paste; it seems unlikely that a kernel would have location
data and not have BPF cookie support. Even given the fact distros
backport stuff it's generally fixes not features like this. If we end up
moving some testing code to selftests I'll simplify removing no-cookie
workarounds. Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 14/15] libbpf: add support for BTF location attachment
2025-10-17 14:02 ` Alan Maguire
@ 2025-10-20 21:07 ` Andrii Nakryiko
0 siblings, 0 replies; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-20 21:07 UTC (permalink / raw)
To: Alan Maguire
Cc: ast, daniel, andrii, martin.lau, acme, ttreyer, yonghong.song,
song, john.fastabend, kpsingh, sdf, haoluo, jolsa, qmo,
ihor.solodrai, david.faust, jose.marchesi, bpf
On Fri, Oct 17, 2025 at 7:02 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 16/10/2025 19:36, Andrii Nakryiko wrote:
> > On Wed, Oct 8, 2025 at 10:36 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> Add support for BTF-based location attachment via multiple kprobes
> >> attaching to each instance of an inline site. Note this is not kprobe
> >> multi attach since that requires fprobe on entry and sites are within
> >> functions. Implementation similar to USDT manager where we use BTF
> >> to create a location manager and populate expected arg values with
> >> metadata based upon BTF_KIND_LOC_PARAM/LOC_PROTOs.
> >>
> >> Add new auto-attach SEC("kloc/module:name") where the module is
> >> vmlinux/kernel module and the name is the name of the associated
> >> location; all sites associated with that name will be attached via
> >> kprobes for tracing.
> >>
> >
> > If kernel ends up supporting something like this natively, then all
> > this is irrelevant.
> >
> > But I'd test-drive this in a purpose-built tracing tool like bpftrace
> > before committing to baking this into libbpf from the get-go.
> >
> > Generally speaking, I feel like we need a tracing-focused companion
> > library to libbpf for stuff like this. And it can take care of extra
> > utilities like parsing DWARF, kallsyms, ELF symbols, etc. All the
> > different stuff that is required for powerful BPF-based kernel and
> > user space tracing, but is not per se BPF itself. libbpf' USDT support
> > is sort of on the edge of what I'd consider acceptable to be provided
> > by libbpf, and that's mostly because USDT is stable and
> > well-established technology that people coming from BCC assume should
> > be baked into BPF library.
> >
>
> Yeah, that makes total sense; the implementation is really just there to
> facilitate in-tree testing. We could move it to selftests and have
> custom ELF section handling there to support it though without adding to
> libbpf. It would definitely be good to have some in-tree facilities for
> testing to ensure the metadata about inlines is not broken though.
> Ideally this would be done by adding inline sites to bpf_testmod but the
> RFC series did not support distilled/relocated BTF (which is what
> bpf_testmod uses). Next round should hopefully have that support so we
> can exercise inline sites more fully.
TBH, for selftests we don't really need to invent BPF_USDT()-style
macros and such. I'd keep it simple and have some explicit global
variables-based approach to lookup a few values at correct locations.
No need to be really fancy here, IMO.
>
> >
> >> Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
> >> ---
> >> tools/lib/bpf/Build | 2 +-
> >> tools/lib/bpf/Makefile | 2 +-
> >> tools/lib/bpf/libbpf.c | 76 +++-
> >> tools/lib/bpf/libbpf.h | 27 ++
> >> tools/lib/bpf/libbpf.map | 1 +
> >> tools/lib/bpf/libbpf_internal.h | 7 +
> >> tools/lib/bpf/loc.bpf.h | 297 +++++++++++++++
> >> tools/lib/bpf/loc.c | 653 ++++++++++++++++++++++++++++++++
> >> 8 files changed, 1062 insertions(+), 3 deletions(-)
> >> create mode 100644 tools/lib/bpf/loc.bpf.h
> >> create mode 100644 tools/lib/bpf/loc.c
> >>
> >
> > [...]
> >
> >> diff --git a/tools/lib/bpf/loc.bpf.h b/tools/lib/bpf/loc.bpf.h
> >> new file mode 100644
> >> index 000000000000..65dcff3ea513
> >> --- /dev/null
> >> +++ b/tools/lib/bpf/loc.bpf.h
> >> @@ -0,0 +1,297 @@
> >> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
> >> +/* Copyright (c) 2025, Oracle and/or its affiliates. */
> >> +#ifndef __LOC_BPF_H__
> >> +#define __LOC_BPF_H__
> >> +
> >> +#include <linux/errno.h>
> >> +#include "bpf_helpers.h"
> >> +#include "bpf_tracing.h"
> >> +
> >> +/* Below types and maps are internal implementation details of libbpf's loc
> >> + * support and are subjects to change. Also, bpf_loc_xxx() API helpers should
> >> + * be considered an unstable API as well and might be adjusted based on user
> >> + * feedback from using libbpf's location support in production.
> >> + *
> >> + * This is based heavily upon usdt.bpf.h.
> >> + */
> >> +
> >> +/* User can override BPF_LOC_MAX_SPEC_CNT to change default size of internal
> >> + * map that keeps track of location argument specifications. This might be
> >> + * necessary if there are a lot of location attachments.
> >> + */
> >> +#ifndef BPF_LOC_MAX_SPEC_CNT
> >> +#define BPF_LOC_MAX_SPEC_CNT 256
> >> +#endif
> >> +/* User can override BPF_LOC_MAX_IP_CNT to change default size of internal
> >> + * map that keeps track of IP (memory address) mapping to loc argument
> >> + * specification.
> >> + * Note, if kernel supports BPF cookies, this map is not used and could be
> >> + * resized all the way to 1 to save a bit of memory.
> >
> > is this just a copy/paste of really we will try to support kernels
> > without BPF cookies for something bleeding edge like this?..
> >
>
> Yeah, copy-paste; it seems unlikely that a kernel would have location
> data and not have BPF cookie support. Even given the fact distros
> backport stuff it's generally fixes not features like this. If we end up
> moving some testing code to selftests I'll simplify removing no-cookie
> workarounds. Thanks!
yeah, going forward we can assume BPF cookies are available, they are
pretty fundamental
>
> Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* [RFC bpf-next 15/15] selftests/bpf: Add test tracing inline site using SEC("kloc")
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (13 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 14/15] libbpf: add support for BTF location attachment Alan Maguire
@ 2025-10-08 17:35 ` Alan Maguire
2025-10-12 23:45 ` [RFC bpf-next 00/15] support inline tracing with BTF Alexei Starovoitov
2025-10-23 22:32 ` Eduard Zingerman
16 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-08 17:35 UTC (permalink / raw)
To: ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf, Alan Maguire
Add a test tracing a vmlinux inline function called from
__sys_bpf() and ensure one of its arguments - if available -
is as expected.
A simple test as a starting point but it does demonstrate the
viability of the approach.
Ideally we would add a bunch of inlines to bpf_testmod, but
need to have BTF distillation/relocation working for .BTF.extra
sections; that is not yet implemented.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
---
tools/testing/selftests/bpf/prog_tests/kloc.c | 51 +++++++++++++++++++
tools/testing/selftests/bpf/progs/kloc.c | 36 +++++++++++++
2 files changed, 87 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/kloc.c
create mode 100644 tools/testing/selftests/bpf/progs/kloc.c
diff --git a/tools/testing/selftests/bpf/prog_tests/kloc.c b/tools/testing/selftests/bpf/prog_tests/kloc.c
new file mode 100644
index 000000000000..a88a43f25909
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kloc.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025, Oracle and/or its affiliates. */
+
+#include <test_progs.h>
+#include <sys/stat.h>
+
+#include "kloc.skel.h"
+
+void test_kloc(void)
+{
+ int err = 0, duration = 0;
+ struct kloc *skel;
+ struct stat sb;
+
+ /* If CONFIG_DEBUG_INFO_BTF_EXTRA=m , ensure vmlinux BTF extra is
+ * loaded.
+ */
+ system("modprobe btf_extra");
+
+ /* Kernel may not have been compiled with extra BTF info or pahole
+ * may not have support.
+ */
+ if (stat("/sys/kernel/btf_extra/vmlinux", &sb) != 0)
+ test__skip();
+
+ skel = kloc__open_and_load();
+ if (CHECK(!skel, "skel_load", "skeleton failed: %d\n", err))
+ goto cleanup;
+
+ skel->bss->test_pid = getpid();
+
+ err = kloc__attach(skel);
+ if (!ASSERT_OK(err, "attach"))
+ goto cleanup;
+ /* trigger bpf syscall to trigger kloc */
+ (void) bpf_obj_get("/sys/fs/bpf/noexist");
+
+ ASSERT_GT(skel->bss->kloc_triggered, 0, "verify kloc was triggered");
+
+ /* this is a conditional since it is possible the size parameter
+ * is not available at the inline site.
+ *
+ * Expected size here is that from bpf_obj_get_opts(); see
+ * tools/lib/bpf/bpf.c.
+ */
+ if (skel->bss->kloc_size > 0)
+ ASSERT_EQ(skel->bss->kloc_size, offsetofend(union bpf_attr, path_fd), "verify kloc size set");
+
+cleanup:
+ kloc__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/kloc.c b/tools/testing/selftests/bpf/progs/kloc.c
new file mode 100644
index 000000000000..8007e53f3210
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kloc.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025, Oracle and/or its affiliates. */
+
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+#include <bpf/loc.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+int kloc_triggered;
+size_t kloc_size;
+int test_pid;
+
+/* This function is inlined to __sys_bpf() and we trigger a call to
+ * it via bpf_obj_get_opts().
+ */
+SEC("kloc/vmlinux:copy_from_bpfptr_offset")
+int BPF_KLOC(trace_copy_from_bpfptr_offset, void *dst, void *uattr, size_t offset, size_t size)
+{
+ int pid = bpf_get_current_pid_tgid() >> 32;
+ long s;
+
+ if (test_pid != pid)
+ return 0;
+
+ kloc_triggered++;
+
+ /* is arg available? */
+ if (bpf_loc_arg(ctx, 3, &s) == 0)
+ kloc_size = size;
+
+ return 0;
+}
--
2.39.3
^ permalink raw reply related [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (14 preceding siblings ...)
2025-10-08 17:35 ` [RFC bpf-next 15/15] selftests/bpf: Add test tracing inline site using SEC("kloc") Alan Maguire
@ 2025-10-12 23:45 ` Alexei Starovoitov
2025-10-13 7:38 ` Alan Maguire
2025-10-23 22:32 ` Eduard Zingerman
16 siblings, 1 reply; 63+ messages in thread
From: Alexei Starovoitov @ 2025-10-12 23:45 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Arnaldo Carvalho de Melo, Thierry Treyer,
Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> In terms of BTF encoding, we wind up with 12010 LOC_PARAM which are
> referenced in various combinations from 37061 LOC_PROTO. We see that
> given that there are over 400,000 inline sites, deduplication has
> considerably cut down on the overhead of representing this information.
Looking at loc_param and loc_proto... they could have been 8 bytes
smaller easily. So the math there is (12k+37k) * 8 ~= 400k byte
is not worth saving, since locsec dominates the size anyway ?
Having a common struct btf_type for all of them also helps dedup, I guess ?
A bit uncomfortable choice, but probably ok.
> LOCSEC will be 443354*16 bytes, i.e. 6.76 Mb. Between extra FUNC_PROTO,
> LOC_PROTO, LOC_PARAM and LOCSECs we wind up adding 9.2Mb to accommodate
> 443354 inline sites and all their metadata. This works out as
> approximately 22 bytes to fully represent each inline site, so we can
> see the benefits of deduplication of LOC_PARAM and LOC_PROTOs in this scheme.
>
> When vmlinux BTF inline-related info (FUNC_PROTO, LOC_PARAM, LOC_PROTO
> and LOCSECs are delivered via a module (btf_extra.ko.gz), the on-disk
> size of that module with compression drops from 9.2Mb to 2.8Mb.
>
> Modules also provide .BTF.extra info in their .BTF.extra sections; we
> can see the stats for these as follows:
>
> $ find . -name *.ko|xargs objdump -h |grep ".BTF.extra"|awk '{ sum += strtonum("0x"$3); count++ } END { print "total (kbytes): " sum/1024 " num modules: " count " average(kbytes): " sum/1024/count}'
> total (kbytes): 46653.5 num modules: 3044 average(kbytes): 15.3264
>
> So we add 46Mb of .BTF.extra data in total across 3044 modules, averaging
> 15kbytes per module.
>
> Future work/questions
>
> - the same scheme could be used to represent functions with optimized-out
> parameters (which we leave out of BTF encoding), hence the more general
> "location" term (as opposed to calling them inlines)
> - perhaps we should have a separate CONFIG_DEBUG_INFO_BTF_EXTRA_MODULES=y|n
> as we do with CONFIG_DEBUG_INFO_BTF_MODULES?
> - .BTF.extra is probably a bad name, given that we have .BTF.ext already...
yeah. 'extra' doesn't really fit. Especially since that will be a hard coded
name of the special module.
Maybe "BTF.inline_info" for section name and "btf_inline_info.ko" ?
The partially inlined functions were the biggest footgun so far.
Missing fully inlined is painful, but it's not a footgun.
So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
user space is not enough. It's great and, probably, can be supported,
but the kernel should use this "BTF.inline_info" as well to
preserve "backward compatibility" for functions that were
not-inlined in an older kernel and got partially inlined in a new kernel.
If we could use kprobe-multi then usdt-like bpf_loc_arg() would
make a lot of sense, but since libbpf has to attach a bunch
of regular kprobes it seems to me the kernel support is more appropriate
for the whole thing.
I mean when the kernel processes SEC("fentry/foo") into partially
inlined function "foo" it should use fentry for "foo" and
automatically add kprobe into inlined callsites and automatically
generated code that collects arguments from appropriate registers
and make "fentry/foo" behave like "foo" was not inlined at all.
Arguably, we can use a new attach type.
If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
of regular kprobes from libbpf is unnecessary.
The kernel can do the same transparently and prepare the args
depending on location.
If some of the callsites are missing args it can fail the whole operation.
Of course, doing the whole thing from libbpf feels good,
since we're burdening the kernel with extra complexity,
but lack of kprobe-multi changes the way to think about this trade off.
Whether we decide that the kernel should do it or stay with bpf_loc_arg()
the first few patches and pahole support can/should be landed first.
Just .02 so far. Need to understand the whole thing better.
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-12 23:45 ` [RFC bpf-next 00/15] support inline tracing with BTF Alexei Starovoitov
@ 2025-10-13 7:38 ` Alan Maguire
2025-10-14 0:12 ` Alexei Starovoitov
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-13 7:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Arnaldo Carvalho de Melo, Thierry Treyer,
Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Jiri Olsa, Quentin Monnet,
Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On 13/10/2025 00:45, Alexei Starovoitov wrote:
> On Wed, Oct 8, 2025 at 10:35 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> In terms of BTF encoding, we wind up with 12010 LOC_PARAM which are
>> referenced in various combinations from 37061 LOC_PROTO. We see that
>> given that there are over 400,000 inline sites, deduplication has
>> considerably cut down on the overhead of representing this information.
>
> Looking at loc_param and loc_proto... they could have been 8 bytes
> smaller easily. So the math there is (12k+37k) * 8 ~= 400k byte
> is not worth saving, since locsec dominates the size anyway ?
Yeah, LOCSEC dominates, and making the params and prototypes easily
dedup-able really helps; they wind up totalling less than 1Mb for all
inline sites. The LOCSEC are over two-thirds of the size, then the
majority of the remainder is function prototypes and string names.
So LOC_PROTO/LOC_PARAM turn out to be one of the smallest components
thanks to dedup.
> Having a common struct btf_type for all of them also helps dedup, I guess ?
> A bit uncomfortable choice, but probably ok.
>
Yeah.
>> LOCSEC will be 443354*16 bytes, i.e. 6.76 Mb. Between extra FUNC_PROTO,
>> LOC_PROTO, LOC_PARAM and LOCSECs we wind up adding 9.2Mb to accommodate
>> 443354 inline sites and all their metadata. This works out as
>> approximately 22 bytes to fully represent each inline site, so we can
>> see the benefits of deduplication of LOC_PARAM and LOC_PROTOs in this scheme.
>>
>> When vmlinux BTF inline-related info (FUNC_PROTO, LOC_PARAM, LOC_PROTO
>> and LOCSECs are delivered via a module (btf_extra.ko.gz), the on-disk
>> size of that module with compression drops from 9.2Mb to 2.8Mb.
>>
>> Modules also provide .BTF.extra info in their .BTF.extra sections; we
>> can see the stats for these as follows:
>>
>> $ find . -name *.ko|xargs objdump -h |grep ".BTF.extra"|awk '{ sum += strtonum("0x"$3); count++ } END { print "total (kbytes): " sum/1024 " num modules: " count " average(kbytes): " sum/1024/count}'
>> total (kbytes): 46653.5 num modules: 3044 average(kbytes): 15.3264
>>
>> So we add 46Mb of .BTF.extra data in total across 3044 modules, averaging
>> 15kbytes per module.
>>
>> Future work/questions
>>
>> - the same scheme could be used to represent functions with optimized-out
>> parameters (which we leave out of BTF encoding), hence the more general
>> "location" term (as opposed to calling them inlines)
>> - perhaps we should have a separate CONFIG_DEBUG_INFO_BTF_EXTRA_MODULES=y|n
>> as we do with CONFIG_DEBUG_INFO_BTF_MODULES?
>> - .BTF.extra is probably a bad name, given that we have .BTF.ext already...
>
> yeah. 'extra' doesn't really fit. Especially since that will be a hard coded
> name of the special module.
> Maybe "BTF.inline_info" for section name and "btf_inline_info.ko" ?
>
I was trying to avoid being specific about inlines since the same
approach works for function sites with optimized-out parameters and they
could be easily added to the representation (and probably should be in a
future version of this series). Another "extra" source of info
potentially is the (non per-cpu) global variables that Stephen sent
patches for a while back and the feeling was it was too big to add to
vmlinux BTF proper.
But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> The partially inlined functions were the biggest footgun so far.
> Missing fully inlined is painful, but it's not a footgun.
> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> user space is not enough. It's great and, probably, can be supported,
> but the kernel should use this "BTF.inline_info" as well to
> preserve "backward compatibility" for functions that were
> not-inlined in an older kernel and got partially inlined in a new kernel.
>
That would be great; we'd need to teach the kernel to handle multi-split
BTF but I would hope that wouldn't be too tricky.
> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> make a lot of sense, but since libbpf has to attach a bunch
> of regular kprobes it seems to me the kernel support is more appropriate
> for the whole thing.
I'm happy with either a userspace or kernel-based approach; the main aim
is to provide this functionality in as straightforward a form as
possible to tracers/libbpf. I have to confess I didn't follow the whole
kprobe multi progress, but at one stage that was more kprobe-based
right? Would there be any value in exploring a flavour of kprobe-multi
that didn't use fprobe and might work for this sort of use case? As you
say if we had that keeping a user-space based approach might be more
attractive as an option.
> I mean when the kernel processes SEC("fentry/foo") into partially
> inlined function "foo" it should use fentry for "foo" and
> automatically add kprobe into inlined callsites and automatically
> generated code that collects arguments from appropriate registers
> and make "fentry/foo" behave like "foo" was not inlined at all.
> Arguably, we can use a new attach type.
> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
> of regular kprobes from libbpf is unnecessary.
> The kernel can do the same transparently and prepare the args
> depending on location.
> If some of the callsites are missing args it can fail the whole operation.
There's a few options here but I think having attach modes which are
selectable - either best effort or all-or-none would both be needed I
think. The other thing that can go wrong (apart from args missing) is an
inline site can be within a "notrace" function but looking at the code,
kprobe attachment seems to handle that already (by failing) which is great.
> Of course, doing the whole thing from libbpf feels good,
> since we're burdening the kernel with extra complexity,
> but lack of kprobe-multi changes the way to think about this trade off.
>
> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
> the first few patches and pahole support can/should be landed first.
>
Sounds great! Having patches 1-10 would be useful as that would allow us
in turn to update pahole's libbpf submodule commit to generate location
data, which would then allow us to update kbuild and start using it for
attach. So we can focus on generating the inline info first, and then
think about how we want to present that info to consumers.
> Just .02 so far. Need to understand the whole thing better.
Sure, thanks for the feedback! BTW the GNU cauldron videos are online
already so the presentation [1] about this is available now for folks
who missed it. I'd be happy to do a BPF office hours too of course if
that would be helpful in ironing out the details.
[1]
https://www.youtube.com/watch?v=03FiWIcic_g&list=PL_GiHdX17WtxuKn7QYme8EfbBS-RKSn0w&t=1640s
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-13 7:38 ` Alan Maguire
@ 2025-10-14 0:12 ` Alexei Starovoitov
2025-10-14 9:58 ` Alan Maguire
2025-10-14 11:52 ` Jiri Olsa
0 siblings, 2 replies; 63+ messages in thread
From: Alexei Starovoitov @ 2025-10-14 0:12 UTC (permalink / raw)
To: Alan Maguire, Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Arnaldo Carvalho de Melo, Thierry Treyer,
Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Quentin Monnet, Ihor Solodrai,
David Faust, Jose E. Marchesi, bpf
On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
>
> I was trying to avoid being specific about inlines since the same
> approach works for function sites with optimized-out parameters and they
> could be easily added to the representation (and probably should be in a
> future version of this series). Another "extra" source of info
> potentially is the (non per-cpu) global variables that Stephen sent
> patches for a while back and the feeling was it was too big to add to
> vmlinux BTF proper.
>
> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
aux is too abstract and doesn't convey any meaning.
How about "BTF.func_info" ? It will cover inlined and optimized funcs.
Thinking more about reuse of struct btf_type for these...
After sleeping on it it feels a bit awkward today, since if they're
types they suppose to be in one table with other types,
searchable and so on, but we actually don't want them there.
btf_find_*() isn't fast and people are trying to optimize it.
Also if we teach the kernel to use these loc-s they probably
should be in a separate table.
global non per-cpu vars fit into current BTF's datasec concept,
so they can be another kernel module with a different name.
I guess one can argue that LOCSEC is similar to DATASEC.
Both need their own search tables separate from the main type table.
>
> > The partially inlined functions were the biggest footgun so far.
> > Missing fully inlined is painful, but it's not a footgun.
> > So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> > user space is not enough. It's great and, probably, can be supported,
> > but the kernel should use this "BTF.inline_info" as well to
> > preserve "backward compatibility" for functions that were
> > not-inlined in an older kernel and got partially inlined in a new kernel.
> >
>
> That would be great; we'd need to teach the kernel to handle multi-split
> BTF but I would hope that wouldn't be too tricky.
>
> > If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> > make a lot of sense, but since libbpf has to attach a bunch
> > of regular kprobes it seems to me the kernel support is more appropriate
> > for the whole thing.
>
> I'm happy with either a userspace or kernel-based approach; the main aim
> is to provide this functionality in as straightforward a form as
> possible to tracers/libbpf. I have to confess I didn't follow the whole
> kprobe multi progress, but at one stage that was more kprobe-based
> right? Would there be any value in exploring a flavour of kprobe-multi
> that didn't use fprobe and might work for this sort of use case? As you
> say if we had that keeping a user-space based approach might be more
> attractive as an option.
Agree.
Jiri,
how hard would it be to make multi-kprobe work on arbitrary IPs ?
>
> > I mean when the kernel processes SEC("fentry/foo") into partially
> > inlined function "foo" it should use fentry for "foo" and
> > automatically add kprobe into inlined callsites and automatically
> > generated code that collects arguments from appropriate registers
> > and make "fentry/foo" behave like "foo" was not inlined at all.
> > Arguably, we can use a new attach type.
> > If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
> > of regular kprobes from libbpf is unnecessary.
> > The kernel can do the same transparently and prepare the args
> > depending on location.
> > If some of the callsites are missing args it can fail the whole operation.
>
> There's a few options here but I think having attach modes which are
> selectable - either best effort or all-or-none would both be needed I
> think.
Exactly. For partially inlined we would need all-or-none,
but I see a case where somebody would want to say:
"pls attach to all places where foo() is called and since
it's inlined the actual entry point may not be accurate and it's ok".
The latter would probably need a flag in tracing tools like bpftrace.
I think all-or-none is a better default.
> > Of course, doing the whole thing from libbpf feels good,
> > since we're burdening the kernel with extra complexity,
> > but lack of kprobe-multi changes the way to think about this trade off.
> >
> > Whether we decide that the kernel should do it or stay with bpf_loc_arg()
> > the first few patches and pahole support can/should be landed first.
> >
>
> Sounds great! Having patches 1-10 would be useful as that would allow us
> in turn to update pahole's libbpf submodule commit to generate location
> data, which would then allow us to update kbuild and start using it for
> attach. So we can focus on generating the inline info first, and then
> think about how we want to present that info to consumers.
Yep. Please post pahole patches for review. I doubt folks
will look into your git tree ;)
> Sure, thanks for the feedback! BTW the GNU cauldron videos are online
> already so the presentation [1] about this is available now for folks
> who missed it. I'd be happy to do a BPF office hours too of course if
> that would be helpful in ironing out the details.
Can you share the slides too ?
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 0:12 ` Alexei Starovoitov
@ 2025-10-14 9:58 ` Alan Maguire
2025-10-16 18:36 ` Andrii Nakryiko
2025-10-14 11:52 ` Jiri Olsa
1 sibling, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-14 9:58 UTC (permalink / raw)
To: Alexei Starovoitov, Jiri Olsa
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Arnaldo Carvalho de Melo, Thierry Treyer,
Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Quentin Monnet, Ihor Solodrai,
David Faust, Jose E. Marchesi, bpf
On 14/10/2025 01:12, Alexei Starovoitov wrote:
> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>>
>> I was trying to avoid being specific about inlines since the same
>> approach works for function sites with optimized-out parameters and they
>> could be easily added to the representation (and probably should be in a
>> future version of this series). Another "extra" source of info
>> potentially is the (non per-cpu) global variables that Stephen sent
>> patches for a while back and the feeling was it was too big to add to
>> vmlinux BTF proper.
>>
>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>
> aux is too abstract and doesn't convey any meaning.
> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>
Sure, works for me.
> Thinking more about reuse of struct btf_type for these...
> After sleeping on it it feels a bit awkward today, since if they're
> types they suppose to be in one table with other types,
> searchable and so on, but we actually don't want them there.
> btf_find_*() isn't fast and people are trying to optimize it.
> Also if we teach the kernel to use these loc-s they probably
> should be in a separate table.
>
The BTF with location info is a separate split BTF, so it won't regress
search times of vmlinux/module BTF. Searching by name isn't really a
need for the non-LOCSEC cases; None of the FUNC_PROTO, LOC_PROTO and
LOC_PARAM have names, so the searching that will be done to deal with
inlines will all be within the LOCSEC representations for the inlines,
and from there it'll just be id-based lookup.
Currently the LOCSECs are sorted internally by address, but we could
change that to be by name given that name-based lookup is the much more
likely search mode.
One limitation we hit is that the max BTF vlen number is not sufficient
to represent all the inlines in one LOCSEC; we max out at specifying a
vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
LOCSECs. That was just a workaround before, but for faster name-based
lookup we could perhaps make use of the multiple LOCSECs by grouping
them by sorted function names. So if the first LOCSEC was called
inline.a and the next LOCSEC inline.c or whatever we'd know locations
named a*, b* are in that first LOCSEC and then do a binary search within
it. We could limit the number of LOCSECs to some reasonable upper bound
like 1024 and this would mean we'd binary search between ~400 LOCSECs
first and then - once we'd found the right one - within it to optimize
lookup time.
> global non per-cpu vars fit into current BTF's datasec concept,
> so they can be another kernel module with a different name.
>
> I guess one can argue that LOCSEC is similar to DATASEC.
> Both need their own search tables separate from the main type table.
>
Right though we could use a hybrid approach of using the LOCSEC name +
multiple LOCSECs (which we need anyway) to speed things up.
>>
>>> The partially inlined functions were the biggest footgun so far.
>>> Missing fully inlined is painful, but it's not a footgun.
>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>> user space is not enough. It's great and, probably, can be supported,
>>> but the kernel should use this "BTF.inline_info" as well to
>>> preserve "backward compatibility" for functions that were
>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>
>>
>> That would be great; we'd need to teach the kernel to handle multi-split
>> BTF but I would hope that wouldn't be too tricky.
>>
>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>> make a lot of sense, but since libbpf has to attach a bunch
>>> of regular kprobes it seems to me the kernel support is more appropriate
>>> for the whole thing.
>>
>> I'm happy with either a userspace or kernel-based approach; the main aim
>> is to provide this functionality in as straightforward a form as
>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>> kprobe multi progress, but at one stage that was more kprobe-based
>> right? Would there be any value in exploring a flavour of kprobe-multi
>> that didn't use fprobe and might work for this sort of use case? As you
>> say if we had that keeping a user-space based approach might be more
>> attractive as an option.
>
> Agree.
>
> Jiri,
> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>
>>
>>> I mean when the kernel processes SEC("fentry/foo") into partially
>>> inlined function "foo" it should use fentry for "foo" and
>>> automatically add kprobe into inlined callsites and automatically
>>> generated code that collects arguments from appropriate registers
>>> and make "fentry/foo" behave like "foo" was not inlined at all.
>>> Arguably, we can use a new attach type.
>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
>>> of regular kprobes from libbpf is unnecessary.
>>> The kernel can do the same transparently and prepare the args
>>> depending on location.
>>> If some of the callsites are missing args it can fail the whole operation.
>>
>> There's a few options here but I think having attach modes which are
>> selectable - either best effort or all-or-none would both be needed I
>> think.
>
> Exactly. For partially inlined we would need all-or-none,
> but I see a case where somebody would want to say:
> "pls attach to all places where foo() is called and since
> it's inlined the actual entry point may not be accurate and it's ok".
>
> The latter would probably need a flag in tracing tools like bpftrace.
> I think all-or-none is a better default.
>
Yep, agree.
>>> Of course, doing the whole thing from libbpf feels good,
>>> since we're burdening the kernel with extra complexity,
>>> but lack of kprobe-multi changes the way to think about this trade off.
>>>
>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
>>> the first few patches and pahole support can/should be landed first.
>>>
>>
>> Sounds great! Having patches 1-10 would be useful as that would allow us
>> in turn to update pahole's libbpf submodule commit to generate location
>> data, which would then allow us to update kbuild and start using it for
>> attach. So we can focus on generating the inline info first, and then
>> think about how we want to present that info to consumers.
>
> Yep. Please post pahole patches for review. I doubt folks
> will look into your git tree ;)
>
Will do; just chasing a bug found in CI, once that's fixed I'll send
them out.
>> Sure, thanks for the feedback! BTW the GNU cauldron videos are online
>> already so the presentation [1] about this is available now for folks
>> who missed it. I'd be happy to do a BPF office hours too of course if
>> that would be helpful in ironing out the details.
>
> Can you share the slides too ?
Sure; thanks to Jose they are available here now:
https://conf.gnu-tools-cauldron.org/opo25/talk/SBMUWN/
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 9:58 ` Alan Maguire
@ 2025-10-16 18:36 ` Andrii Nakryiko
2025-10-23 14:37 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-16 18:36 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 14/10/2025 01:12, Alexei Starovoitov wrote:
> > On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >>
> >> I was trying to avoid being specific about inlines since the same
> >> approach works for function sites with optimized-out parameters and they
> >> could be easily added to the representation (and probably should be in a
> >> future version of this series). Another "extra" source of info
> >> potentially is the (non per-cpu) global variables that Stephen sent
> >> patches for a while back and the feeling was it was too big to add to
> >> vmlinux BTF proper.
> >>
> >> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> >
> > aux is too abstract and doesn't convey any meaning.
> > How about "BTF.func_info" ? It will cover inlined and optimized funcs.
> >
>
> Sure, works for me.
>
> > Thinking more about reuse of struct btf_type for these...
> > After sleeping on it it feels a bit awkward today, since if they're
> > types they suppose to be in one table with other types,
> > searchable and so on, but we actually don't want them there.
> > btf_find_*() isn't fast and people are trying to optimize it.
> > Also if we teach the kernel to use these loc-s they probably
> > should be in a separate table.
> >
>
> The BTF with location info is a separate split BTF, so it won't regress
> search times of vmlinux/module BTF. Searching by name isn't really a
> need for the non-LOCSEC cases; None of the FUNC_PROTO, LOC_PROTO and
> LOC_PARAM have names, so the searching that will be done to deal with
> inlines will all be within the LOCSEC representations for the inlines,
> and from there it'll just be id-based lookup.
>
> Currently the LOCSECs are sorted internally by address, but we could
> change that to be by name given that name-based lookup is the much more
> likely search mode.
>
> One limitation we hit is that the max BTF vlen number is not sufficient
> to represent all the inlines in one LOCSEC; we max out at specifying a
> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
We have this, currently:
/* Max # of struct/union/enum members or func args */
#define BTF_MAX_VLEN 0xffff
struct btf_type {
__u32 name_off;
/* "info" bits arrangement
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-28: kind (e.g. int, ptr, array...etc)
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union, enum, fwd, enum64,
* decl_tag and type_tag
*/
Note those unused 16-23 bits. We can use them to extend vlen up to 8
million, which should hopefully be good enough? This split by name
prefix sounds unnecessarily convoluted, tbh.
> LOCSECs. That was just a workaround before, but for faster name-based
> lookup we could perhaps make use of the multiple LOCSECs by grouping
> them by sorted function names. So if the first LOCSEC was called
> inline.a and the next LOCSEC inline.c or whatever we'd know locations
> named a*, b* are in that first LOCSEC and then do a binary search within
> it. We could limit the number of LOCSECs to some reasonable upper bound
> like 1024 and this would mean we'd binary search between ~400 LOCSECs
> first and then - once we'd found the right one - within it to optimize
> lookup time.
>
> > global non per-cpu vars fit into current BTF's datasec concept,
> > so they can be another kernel module with a different name.
> >
> > I guess one can argue that LOCSEC is similar to DATASEC.
> > Both need their own search tables separate from the main type table.
> >
>
> Right though we could use a hybrid approach of using the LOCSEC name +
> multiple LOCSECs (which we need anyway) to speed things up.
> >>
> >>> The partially inlined functions were the biggest footgun so far.
> >>> Missing fully inlined is painful, but it's not a footgun.
> >>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> >>> user space is not enough. It's great and, probably, can be supported,
> >>> but the kernel should use this "BTF.inline_info" as well to
> >>> preserve "backward compatibility" for functions that were
> >>> not-inlined in an older kernel and got partially inlined in a new kernel.
> >>>
> >>
> >> That would be great; we'd need to teach the kernel to handle multi-split
> >> BTF but I would hope that wouldn't be too tricky.
> >>
> >>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> >>> make a lot of sense, but since libbpf has to attach a bunch
> >>> of regular kprobes it seems to me the kernel support is more appropriate
> >>> for the whole thing.
> >>
> >> I'm happy with either a userspace or kernel-based approach; the main aim
> >> is to provide this functionality in as straightforward a form as
> >> possible to tracers/libbpf. I have to confess I didn't follow the whole
> >> kprobe multi progress, but at one stage that was more kprobe-based
> >> right? Would there be any value in exploring a flavour of kprobe-multi
> >> that didn't use fprobe and might work for this sort of use case? As you
> >> say if we had that keeping a user-space based approach might be more
> >> attractive as an option.
> >
> > Agree.
> >
> > Jiri,
> > how hard would it be to make multi-kprobe work on arbitrary IPs ?
> >
> >>
> >>> I mean when the kernel processes SEC("fentry/foo") into partially
> >>> inlined function "foo" it should use fentry for "foo" and
> >>> automatically add kprobe into inlined callsites and automatically
> >>> generated code that collects arguments from appropriate registers
> >>> and make "fentry/foo" behave like "foo" was not inlined at all.
> >>> Arguably, we can use a new attach type.
> >>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
> >>> of regular kprobes from libbpf is unnecessary.
> >>> The kernel can do the same transparently and prepare the args
> >>> depending on location.
> >>> If some of the callsites are missing args it can fail the whole operation.
> >>
> >> There's a few options here but I think having attach modes which are
> >> selectable - either best effort or all-or-none would both be needed I
> >> think.
> >
> > Exactly. For partially inlined we would need all-or-none,
> > but I see a case where somebody would want to say:
> > "pls attach to all places where foo() is called and since
> > it's inlined the actual entry point may not be accurate and it's ok".
> >
> > The latter would probably need a flag in tracing tools like bpftrace.
> > I think all-or-none is a better default.
> >
>
> Yep, agree.
>
> >>> Of course, doing the whole thing from libbpf feels good,
> >>> since we're burdening the kernel with extra complexity,
> >>> but lack of kprobe-multi changes the way to think about this trade off.
> >>>
> >>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
> >>> the first few patches and pahole support can/should be landed first.
> >>>
> >>
> >> Sounds great! Having patches 1-10 would be useful as that would allow us
> >> in turn to update pahole's libbpf submodule commit to generate location
> >> data, which would then allow us to update kbuild and start using it for
> >> attach. So we can focus on generating the inline info first, and then
> >> think about how we want to present that info to consumers.
> >
> > Yep. Please post pahole patches for review. I doubt folks
> > will look into your git tree ;)
> >
>
BTW, what happened to the self-described BTF patches? With these
additions we are going to break all the BTF-based tooling one more
time. Let's add minimal amount of changes to BTF to allow tools to
skip unknown BTF types and dump the rest? I don't remember all the
details by now, was there any major blocker last time? I feel like
that minimal approach of fixed size + vlen * vlen_size would still
work even for all these newly added types (even with the alternative
for LOC_PARAM I mention in the corresponding patch).
> Will do; just chasing a bug found in CI, once that's fixed I'll send
> them out.
>
> >> Sure, thanks for the feedback! BTW the GNU cauldron videos are online
> >> already so the presentation [1] about this is available now for folks
> >> who missed it. I'd be happy to do a BPF office hours too of course if
> >> that would be helpful in ironing out the details.
> >
> > Can you share the slides too ?
>
>
> Sure; thanks to Jose they are available here now:
>
> https://conf.gnu-tools-cauldron.org/opo25/talk/SBMUWN/
>
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-16 18:36 ` Andrii Nakryiko
@ 2025-10-23 14:37 ` Alan Maguire
2025-10-23 16:16 ` Andrii Nakryiko
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-23 14:37 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On 16/10/2025 19:36, Andrii Nakryiko wrote:
> On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 14/10/2025 01:12, Alexei Starovoitov wrote:
>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>>
>>>> I was trying to avoid being specific about inlines since the same
>>>> approach works for function sites with optimized-out parameters and they
>>>> could be easily added to the representation (and probably should be in a
>>>> future version of this series). Another "extra" source of info
>>>> potentially is the (non per-cpu) global variables that Stephen sent
>>>> patches for a while back and the feeling was it was too big to add to
>>>> vmlinux BTF proper.
>>>>
>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>>
>>> aux is too abstract and doesn't convey any meaning.
>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>>
>>
>> Sure, works for me.
>>
>>> Thinking more about reuse of struct btf_type for these...
>>> After sleeping on it it feels a bit awkward today, since if they're
>>> types they suppose to be in one table with other types,
>>> searchable and so on, but we actually don't want them there.
>>> btf_find_*() isn't fast and people are trying to optimize it.
>>> Also if we teach the kernel to use these loc-s they probably
>>> should be in a separate table.
>>>
>>
>> The BTF with location info is a separate split BTF, so it won't regress
>> search times of vmlinux/module BTF. Searching by name isn't really a
>> need for the non-LOCSEC cases; None of the FUNC_PROTO, LOC_PROTO and
>> LOC_PARAM have names, so the searching that will be done to deal with
>> inlines will all be within the LOCSEC representations for the inlines,
>> and from there it'll just be id-based lookup.
>>
>> Currently the LOCSECs are sorted internally by address, but we could
>> change that to be by name given that name-based lookup is the much more
>> likely search mode.
>>
>> One limitation we hit is that the max BTF vlen number is not sufficient
>> to represent all the inlines in one LOCSEC; we max out at specifying a
>> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
>
> We have this, currently:
>
>
> /* Max # of struct/union/enum members or func args */
> #define BTF_MAX_VLEN 0xffff
>
> struct btf_type {
> __u32 name_off;
> /* "info" bits arrangement
> * bits 0-15: vlen (e.g. # of struct's members)
> * bits 16-23: unused
> * bits 24-28: kind (e.g. int, ptr, array...etc)
> * bits 29-30: unused
> * bit 31: kind_flag, currently used by
> * struct, union, enum, fwd, enum64,
> * decl_tag and type_tag
> */
>
>
> Note those unused 16-23 bits. We can use them to extend vlen up to 8
> million, which should hopefully be good enough? This split by name
> prefix sounds unnecessarily convoluted, tbh.
>
That would be great! Do you have a preference for how libbpf might
handle this? Currently we have
static inline __u16 btf_vlen(const struct btf_type *t)
{
return BTF_INFO_VLEN(t->info);
}
As a result many consumers (in libbpf and elsewhere) use a __u16 for the
vlen value. Would it make sense to add
static inline __u32 btf_extended_vlen(const struct btf_type *t)
{
return BTF_INFO_VLEN(t->info);
}
perhaps?
>
>
>> LOCSECs. That was just a workaround before, but for faster name-based
>> lookup we could perhaps make use of the multiple LOCSECs by grouping
>> them by sorted function names. So if the first LOCSEC was called
>> inline.a and the next LOCSEC inline.c or whatever we'd know locations
>> named a*, b* are in that first LOCSEC and then do a binary search within
>> it. We could limit the number of LOCSECs to some reasonable upper bound
>> like 1024 and this would mean we'd binary search between ~400 LOCSECs
>> first and then - once we'd found the right one - within it to optimize
>> lookup time.
>>
>>> global non per-cpu vars fit into current BTF's datasec concept,
>>> so they can be another kernel module with a different name.
>>>
>>> I guess one can argue that LOCSEC is similar to DATASEC.
>>> Both need their own search tables separate from the main type table.
>>>
>>
>> Right though we could use a hybrid approach of using the LOCSEC name +
>> multiple LOCSECs (which we need anyway) to speed things up.
>>>>
>>>>> The partially inlined functions were the biggest footgun so far.
>>>>> Missing fully inlined is painful, but it's not a footgun.
>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>>> user space is not enough. It's great and, probably, can be supported,
>>>>> but the kernel should use this "BTF.inline_info" as well to
>>>>> preserve "backward compatibility" for functions that were
>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>>
>>>>
>>>> That would be great; we'd need to teach the kernel to handle multi-split
>>>> BTF but I would hope that wouldn't be too tricky.
>>>>
>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>>> for the whole thing.
>>>>
>>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>>> is to provide this functionality in as straightforward a form as
>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>>> kprobe multi progress, but at one stage that was more kprobe-based
>>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>>> that didn't use fprobe and might work for this sort of use case? As you
>>>> say if we had that keeping a user-space based approach might be more
>>>> attractive as an option.
>>>
>>> Agree.
>>>
>>> Jiri,
>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>>>
>>>>
>>>>> I mean when the kernel processes SEC("fentry/foo") into partially
>>>>> inlined function "foo" it should use fentry for "foo" and
>>>>> automatically add kprobe into inlined callsites and automatically
>>>>> generated code that collects arguments from appropriate registers
>>>>> and make "fentry/foo" behave like "foo" was not inlined at all.
>>>>> Arguably, we can use a new attach type.
>>>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
>>>>> of regular kprobes from libbpf is unnecessary.
>>>>> The kernel can do the same transparently and prepare the args
>>>>> depending on location.
>>>>> If some of the callsites are missing args it can fail the whole operation.
>>>>
>>>> There's a few options here but I think having attach modes which are
>>>> selectable - either best effort or all-or-none would both be needed I
>>>> think.
>>>
>>> Exactly. For partially inlined we would need all-or-none,
>>> but I see a case where somebody would want to say:
>>> "pls attach to all places where foo() is called and since
>>> it's inlined the actual entry point may not be accurate and it's ok".
>>>
>>> The latter would probably need a flag in tracing tools like bpftrace.
>>> I think all-or-none is a better default.
>>>
>>
>> Yep, agree.
>>
>>>>> Of course, doing the whole thing from libbpf feels good,
>>>>> since we're burdening the kernel with extra complexity,
>>>>> but lack of kprobe-multi changes the way to think about this trade off.
>>>>>
>>>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
>>>>> the first few patches and pahole support can/should be landed first.
>>>>>
>>>>
>>>> Sounds great! Having patches 1-10 would be useful as that would allow us
>>>> in turn to update pahole's libbpf submodule commit to generate location
>>>> data, which would then allow us to update kbuild and start using it for
>>>> attach. So we can focus on generating the inline info first, and then
>>>> think about how we want to present that info to consumers.
>>>
>>> Yep. Please post pahole patches for review. I doubt folks
>>> will look into your git tree ;)
>>>
>>
>
> BTW, what happened to the self-described BTF patches? With these
> additions we are going to break all the BTF-based tooling one more
> time. Let's add minimal amount of changes to BTF to allow tools to
> skip unknown BTF types and dump the rest? I don't remember all the
> details by now, was there any major blocker last time? I feel like
> that minimal approach of fixed size + vlen * vlen_size would still
> work even for all these newly added types (even with the alternative
> for LOC_PARAM I mention in the corresponding patch).
>
>
Yep that scheme would still work. The reason I didn't prioritize it here
is that the BTF with new LOC kinds is separate from the BTF that legacy
tools would be looking at, but I'd be happy to revive it if it'd help.
Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-23 14:37 ` Alan Maguire
@ 2025-10-23 16:16 ` Andrii Nakryiko
2025-10-24 11:53 ` Alan Maguire
0 siblings, 1 reply; 63+ messages in thread
From: Andrii Nakryiko @ 2025-10-23 16:16 UTC (permalink / raw)
To: Alan Maguire
Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On Thu, Oct 23, 2025 at 7:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On 16/10/2025 19:36, Andrii Nakryiko wrote:
> > On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>
> >> On 14/10/2025 01:12, Alexei Starovoitov wrote:
> >>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>>
> >>>>
> >>>> I was trying to avoid being specific about inlines since the same
> >>>> approach works for function sites with optimized-out parameters and they
> >>>> could be easily added to the representation (and probably should be in a
> >>>> future version of this series). Another "extra" source of info
> >>>> potentially is the (non per-cpu) global variables that Stephen sent
> >>>> patches for a while back and the feeling was it was too big to add to
> >>>> vmlinux BTF proper.
> >>>>
> >>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> >>>
> >>> aux is too abstract and doesn't convey any meaning.
> >>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
> >>>
> >>
> >> Sure, works for me.
> >>
> >>> Thinking more about reuse of struct btf_type for these...
> >>> After sleeping on it it feels a bit awkward today, since if they're
> >>> types they suppose to be in one table with other types,
> >>> searchable and so on, but we actually don't want them there.
> >>> btf_find_*() isn't fast and people are trying to optimize it.
> >>> Also if we teach the kernel to use these loc-s they probably
> >>> should be in a separate table.
> >>>
> >>
> >> The BTF with location info is a separate split BTF, so it won't regress
> >> search times of vmlinux/module BTF. Searching by name isn't really a
> >> need for the non-LOCSEC cases; None of the FUNC_PROTO, LOC_PROTO and
> >> LOC_PARAM have names, so the searching that will be done to deal with
> >> inlines will all be within the LOCSEC representations for the inlines,
> >> and from there it'll just be id-based lookup.
> >>
> >> Currently the LOCSECs are sorted internally by address, but we could
> >> change that to be by name given that name-based lookup is the much more
> >> likely search mode.
> >>
> >> One limitation we hit is that the max BTF vlen number is not sufficient
> >> to represent all the inlines in one LOCSEC; we max out at specifying a
> >> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
> >
> > We have this, currently:
> >
> >
> > /* Max # of struct/union/enum members or func args */
> > #define BTF_MAX_VLEN 0xffff
> >
> > struct btf_type {
> > __u32 name_off;
> > /* "info" bits arrangement
> > * bits 0-15: vlen (e.g. # of struct's members)
> > * bits 16-23: unused
> > * bits 24-28: kind (e.g. int, ptr, array...etc)
> > * bits 29-30: unused
> > * bit 31: kind_flag, currently used by
> > * struct, union, enum, fwd, enum64,
> > * decl_tag and type_tag
> > */
> >
> >
> > Note those unused 16-23 bits. We can use them to extend vlen up to 8
> > million, which should hopefully be good enough? This split by name
> > prefix sounds unnecessarily convoluted, tbh.
> >
>
> That would be great! Do you have a preference for how libbpf might
> handle this? Currently we have
>
>
> static inline __u16 btf_vlen(const struct btf_type *t)
> {
> return BTF_INFO_VLEN(t->info);
> }
>
> As a result many consumers (in libbpf and elsewhere) use a __u16 for the
> vlen value. Would it make sense to add
>
> static inline __u32 btf_extended_vlen(const struct btf_type *t)
> {
> return BTF_INFO_VLEN(t->info);
> }
>
> perhaps?
just update btf_vlen() to return __u32 and use more bits. Those bits
should be all zeroes today, so all this should be backwards
compatible.
>
>
> >
> >
> >> LOCSECs. That was just a workaround before, but for faster name-based
> >> lookup we could perhaps make use of the multiple LOCSECs by grouping
> >> them by sorted function names. So if the first LOCSEC was called
> >> inline.a and the next LOCSEC inline.c or whatever we'd know locations
> >> named a*, b* are in that first LOCSEC and then do a binary search within
> >> it. We could limit the number of LOCSECs to some reasonable upper bound
> >> like 1024 and this would mean we'd binary search between ~400 LOCSECs
> >> first and then - once we'd found the right one - within it to optimize
> >> lookup time.
> >>
> >>> global non per-cpu vars fit into current BTF's datasec concept,
> >>> so they can be another kernel module with a different name.
> >>>
> >>> I guess one can argue that LOCSEC is similar to DATASEC.
> >>> Both need their own search tables separate from the main type table.
> >>>
> >>
> >> Right though we could use a hybrid approach of using the LOCSEC name +
> >> multiple LOCSECs (which we need anyway) to speed things up.
> >>>>
> >>>>> The partially inlined functions were the biggest footgun so far.
> >>>>> Missing fully inlined is painful, but it's not a footgun.
> >>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> >>>>> user space is not enough. It's great and, probably, can be supported,
> >>>>> but the kernel should use this "BTF.inline_info" as well to
> >>>>> preserve "backward compatibility" for functions that were
> >>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
> >>>>>
> >>>>
> >>>> That would be great; we'd need to teach the kernel to handle multi-split
> >>>> BTF but I would hope that wouldn't be too tricky.
> >>>>
> >>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> >>>>> make a lot of sense, but since libbpf has to attach a bunch
> >>>>> of regular kprobes it seems to me the kernel support is more appropriate
> >>>>> for the whole thing.
> >>>>
> >>>> I'm happy with either a userspace or kernel-based approach; the main aim
> >>>> is to provide this functionality in as straightforward a form as
> >>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
> >>>> kprobe multi progress, but at one stage that was more kprobe-based
> >>>> right? Would there be any value in exploring a flavour of kprobe-multi
> >>>> that didn't use fprobe and might work for this sort of use case? As you
> >>>> say if we had that keeping a user-space based approach might be more
> >>>> attractive as an option.
> >>>
> >>> Agree.
> >>>
> >>> Jiri,
> >>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
> >>>
> >>>>
> >>>>> I mean when the kernel processes SEC("fentry/foo") into partially
> >>>>> inlined function "foo" it should use fentry for "foo" and
> >>>>> automatically add kprobe into inlined callsites and automatically
> >>>>> generated code that collects arguments from appropriate registers
> >>>>> and make "fentry/foo" behave like "foo" was not inlined at all.
> >>>>> Arguably, we can use a new attach type.
> >>>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
> >>>>> of regular kprobes from libbpf is unnecessary.
> >>>>> The kernel can do the same transparently and prepare the args
> >>>>> depending on location.
> >>>>> If some of the callsites are missing args it can fail the whole operation.
> >>>>
> >>>> There's a few options here but I think having attach modes which are
> >>>> selectable - either best effort or all-or-none would both be needed I
> >>>> think.
> >>>
> >>> Exactly. For partially inlined we would need all-or-none,
> >>> but I see a case where somebody would want to say:
> >>> "pls attach to all places where foo() is called and since
> >>> it's inlined the actual entry point may not be accurate and it's ok".
> >>>
> >>> The latter would probably need a flag in tracing tools like bpftrace.
> >>> I think all-or-none is a better default.
> >>>
> >>
> >> Yep, agree.
> >>
> >>>>> Of course, doing the whole thing from libbpf feels good,
> >>>>> since we're burdening the kernel with extra complexity,
> >>>>> but lack of kprobe-multi changes the way to think about this trade off.
> >>>>>
> >>>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
> >>>>> the first few patches and pahole support can/should be landed first.
> >>>>>
> >>>>
> >>>> Sounds great! Having patches 1-10 would be useful as that would allow us
> >>>> in turn to update pahole's libbpf submodule commit to generate location
> >>>> data, which would then allow us to update kbuild and start using it for
> >>>> attach. So we can focus on generating the inline info first, and then
> >>>> think about how we want to present that info to consumers.
> >>>
> >>> Yep. Please post pahole patches for review. I doubt folks
> >>> will look into your git tree ;)
> >>>
> >>
> >
> > BTW, what happened to the self-described BTF patches? With these
> > additions we are going to break all the BTF-based tooling one more
> > time. Let's add minimal amount of changes to BTF to allow tools to
> > skip unknown BTF types and dump the rest? I don't remember all the
> > details by now, was there any major blocker last time? I feel like
> > that minimal approach of fixed size + vlen * vlen_size would still
> > work even for all these newly added types (even with the alternative
> > for LOC_PARAM I mention in the corresponding patch).
> >
> >
>
> Yep that scheme would still work. The reason I didn't prioritize it here
> is that the BTF with new LOC kinds is separate from the BTF that legacy
> tools would be looking at, but I'd be happy to revive it if it'd help.
We are coming up on another big BTF update, so I think it's time to
add this minimal self-describing info and teach bpftool and other
tools to understand this, so that going forward we can add new types
without breaking anything. So yeah, I think we should revive and land
it roughly in the same time frame.
>
> Thanks!
>
> Alan
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-23 16:16 ` Andrii Nakryiko
@ 2025-10-24 11:53 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-24 11:53 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf
On 23/10/2025 17:16, Andrii Nakryiko wrote:
> On Thu, Oct 23, 2025 at 7:37 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>
>> On 16/10/2025 19:36, Andrii Nakryiko wrote:
>>> On Tue, Oct 14, 2025 at 2:58 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> On 14/10/2025 01:12, Alexei Starovoitov wrote:
>>>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> I was trying to avoid being specific about inlines since the same
>>>>>> approach works for function sites with optimized-out parameters and they
>>>>>> could be easily added to the representation (and probably should be in a
>>>>>> future version of this series). Another "extra" source of info
>>>>>> potentially is the (non per-cpu) global variables that Stephen sent
>>>>>> patches for a while back and the feeling was it was too big to add to
>>>>>> vmlinux BTF proper.
>>>>>>
>>>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>>>>
>>>>> aux is too abstract and doesn't convey any meaning.
>>>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>>>>
>>>>
>>>> Sure, works for me.
>>>>
>>>>> Thinking more about reuse of struct btf_type for these...
>>>>> After sleeping on it it feels a bit awkward today, since if they're
>>>>> types they suppose to be in one table with other types,
>>>>> searchable and so on, but we actually don't want them there.
>>>>> btf_find_*() isn't fast and people are trying to optimize it.
>>>>> Also if we teach the kernel to use these loc-s they probably
>>>>> should be in a separate table.
>>>>>
>>>>
>>>> The BTF with location info is a separate split BTF, so it won't regress
>>>> search times of vmlinux/module BTF. Searching by name isn't really a
>>>> need for the non-LOCSEC cases; None of the FUNC_PROTO, LOC_PROTO and
>>>> LOC_PARAM have names, so the searching that will be done to deal with
>>>> inlines will all be within the LOCSEC representations for the inlines,
>>>> and from there it'll just be id-based lookup.
>>>>
>>>> Currently the LOCSECs are sorted internally by address, but we could
>>>> change that to be by name given that name-based lookup is the much more
>>>> likely search mode.
>>>>
>>>> One limitation we hit is that the max BTF vlen number is not sufficient
>>>> to represent all the inlines in one LOCSEC; we max out at specifying a
>>>> vlen of 65535, and need over 400000 LOCSEC entries. So we add multiple
>>>
>>> We have this, currently:
>>>
>>>
>>> /* Max # of struct/union/enum members or func args */
>>> #define BTF_MAX_VLEN 0xffff
>>>
>>> struct btf_type {
>>> __u32 name_off;
>>> /* "info" bits arrangement
>>> * bits 0-15: vlen (e.g. # of struct's members)
>>> * bits 16-23: unused
>>> * bits 24-28: kind (e.g. int, ptr, array...etc)
>>> * bits 29-30: unused
>>> * bit 31: kind_flag, currently used by
>>> * struct, union, enum, fwd, enum64,
>>> * decl_tag and type_tag
>>> */
>>>
>>>
>>> Note those unused 16-23 bits. We can use them to extend vlen up to 8
>>> million, which should hopefully be good enough? This split by name
>>> prefix sounds unnecessarily convoluted, tbh.
>>>
>>
>> That would be great! Do you have a preference for how libbpf might
>> handle this? Currently we have
>>
>>
>> static inline __u16 btf_vlen(const struct btf_type *t)
>> {
>> return BTF_INFO_VLEN(t->info);
>> }
>>
>> As a result many consumers (in libbpf and elsewhere) use a __u16 for the
>> vlen value. Would it make sense to add
>>
>> static inline __u32 btf_extended_vlen(const struct btf_type *t)
>> {
>> return BTF_INFO_VLEN(t->info);
>> }
>>
>> perhaps?
>
> just update btf_vlen() to return __u32 and use more bits. Those bits
> should be all zeroes today, so all this should be backwards
> compatible.
>
>>
>>
>>>
>>>
>>>> LOCSECs. That was just a workaround before, but for faster name-based
>>>> lookup we could perhaps make use of the multiple LOCSECs by grouping
>>>> them by sorted function names. So if the first LOCSEC was called
>>>> inline.a and the next LOCSEC inline.c or whatever we'd know locations
>>>> named a*, b* are in that first LOCSEC and then do a binary search within
>>>> it. We could limit the number of LOCSECs to some reasonable upper bound
>>>> like 1024 and this would mean we'd binary search between ~400 LOCSECs
>>>> first and then - once we'd found the right one - within it to optimize
>>>> lookup time.
>>>>
>>>>> global non per-cpu vars fit into current BTF's datasec concept,
>>>>> so they can be another kernel module with a different name.
>>>>>
>>>>> I guess one can argue that LOCSEC is similar to DATASEC.
>>>>> Both need their own search tables separate from the main type table.
>>>>>
>>>>
>>>> Right though we could use a hybrid approach of using the LOCSEC name +
>>>> multiple LOCSECs (which we need anyway) to speed things up.
>>>>>>
>>>>>>> The partially inlined functions were the biggest footgun so far.
>>>>>>> Missing fully inlined is painful, but it's not a footgun.
>>>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>>>>> user space is not enough. It's great and, probably, can be supported,
>>>>>>> but the kernel should use this "BTF.inline_info" as well to
>>>>>>> preserve "backward compatibility" for functions that were
>>>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>>>>
>>>>>>
>>>>>> That would be great; we'd need to teach the kernel to handle multi-split
>>>>>> BTF but I would hope that wouldn't be too tricky.
>>>>>>
>>>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>>>>> for the whole thing.
>>>>>>
>>>>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>>>>> is to provide this functionality in as straightforward a form as
>>>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>>>>> kprobe multi progress, but at one stage that was more kprobe-based
>>>>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>>>>> that didn't use fprobe and might work for this sort of use case? As you
>>>>>> say if we had that keeping a user-space based approach might be more
>>>>>> attractive as an option.
>>>>>
>>>>> Agree.
>>>>>
>>>>> Jiri,
>>>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>>>>>
>>>>>>
>>>>>>> I mean when the kernel processes SEC("fentry/foo") into partially
>>>>>>> inlined function "foo" it should use fentry for "foo" and
>>>>>>> automatically add kprobe into inlined callsites and automatically
>>>>>>> generated code that collects arguments from appropriate registers
>>>>>>> and make "fentry/foo" behave like "foo" was not inlined at all.
>>>>>>> Arguably, we can use a new attach type.
>>>>>>> If we teach the kernel to do that then doing bpf_loc_arg() and a bunch
>>>>>>> of regular kprobes from libbpf is unnecessary.
>>>>>>> The kernel can do the same transparently and prepare the args
>>>>>>> depending on location.
>>>>>>> If some of the callsites are missing args it can fail the whole operation.
>>>>>>
>>>>>> There's a few options here but I think having attach modes which are
>>>>>> selectable - either best effort or all-or-none would both be needed I
>>>>>> think.
>>>>>
>>>>> Exactly. For partially inlined we would need all-or-none,
>>>>> but I see a case where somebody would want to say:
>>>>> "pls attach to all places where foo() is called and since
>>>>> it's inlined the actual entry point may not be accurate and it's ok".
>>>>>
>>>>> The latter would probably need a flag in tracing tools like bpftrace.
>>>>> I think all-or-none is a better default.
>>>>>
>>>>
>>>> Yep, agree.
>>>>
>>>>>>> Of course, doing the whole thing from libbpf feels good,
>>>>>>> since we're burdening the kernel with extra complexity,
>>>>>>> but lack of kprobe-multi changes the way to think about this trade off.
>>>>>>>
>>>>>>> Whether we decide that the kernel should do it or stay with bpf_loc_arg()
>>>>>>> the first few patches and pahole support can/should be landed first.
>>>>>>>
>>>>>>
>>>>>> Sounds great! Having patches 1-10 would be useful as that would allow us
>>>>>> in turn to update pahole's libbpf submodule commit to generate location
>>>>>> data, which would then allow us to update kbuild and start using it for
>>>>>> attach. So we can focus on generating the inline info first, and then
>>>>>> think about how we want to present that info to consumers.
>>>>>
>>>>> Yep. Please post pahole patches for review. I doubt folks
>>>>> will look into your git tree ;)
>>>>>
>>>>
>>>
>>> BTW, what happened to the self-described BTF patches? With these
>>> additions we are going to break all the BTF-based tooling one more
>>> time. Let's add minimal amount of changes to BTF to allow tools to
>>> skip unknown BTF types and dump the rest? I don't remember all the
>>> details by now, was there any major blocker last time? I feel like
>>> that minimal approach of fixed size + vlen * vlen_size would still
>>> work even for all these newly added types (even with the alternative
>>> for LOC_PARAM I mention in the corresponding patch).
>>>
>>>
>>
>> Yep that scheme would still work. The reason I didn't prioritize it here
>> is that the BTF with new LOC kinds is separate from the BTF that legacy
>> tools would be looking at, but I'd be happy to revive it if it'd help.
>
> We are coming up on another big BTF update, so I think it's time to
> add this minimal self-describing info and teach bpftool and other
> tools to understand this, so that going forward we can add new types
> without breaking anything. So yeah, I think we should revive and land
> it roughly in the same time frame.
>
Ok sounds good, I'll work on reviving that series as a prerequisite for
the location stuff ASAP.
One other BTF UAPI issue maybe we should look at; should we steal one
more bit for BTF kind representation; we currently have room for 32 and
are using 19, with 3 more for the location stuff. Feels like we should
move to supporting 64 kinds by stealing a bit there too, what do you
think? That would still leave us with one unused bit in "info".
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 0:12 ` Alexei Starovoitov
2025-10-14 9:58 ` Alan Maguire
@ 2025-10-14 11:52 ` Jiri Olsa
2025-10-14 14:55 ` Alan Maguire
1 sibling, 1 reply; 63+ messages in thread
From: Jiri Olsa @ 2025-10-14 11:52 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Arnaldo Carvalho de Melo,
Thierry Treyer, Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Quentin Monnet, Ihor Solodrai,
David Faust, Jose E. Marchesi, bpf, Masami Hiramatsu,
Steven Rostedt
On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >
> >
> > I was trying to avoid being specific about inlines since the same
> > approach works for function sites with optimized-out parameters and they
> > could be easily added to the representation (and probably should be in a
> > future version of this series). Another "extra" source of info
> > potentially is the (non per-cpu) global variables that Stephen sent
> > patches for a while back and the feeling was it was too big to add to
> > vmlinux BTF proper.
> >
> > But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>
> aux is too abstract and doesn't convey any meaning.
> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>
> Thinking more about reuse of struct btf_type for these...
> After sleeping on it it feels a bit awkward today, since if they're
> types they suppose to be in one table with other types,
> searchable and so on, but we actually don't want them there.
> btf_find_*() isn't fast and people are trying to optimize it.
> Also if we teach the kernel to use these loc-s they probably
> should be in a separate table.
>
> global non per-cpu vars fit into current BTF's datasec concept,
> so they can be another kernel module with a different name.
>
> I guess one can argue that LOCSEC is similar to DATASEC.
> Both need their own search tables separate from the main type table.
>
> >
> > > The partially inlined functions were the biggest footgun so far.
> > > Missing fully inlined is painful, but it's not a footgun.
> > > So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> > > user space is not enough. It's great and, probably, can be supported,
> > > but the kernel should use this "BTF.inline_info" as well to
> > > preserve "backward compatibility" for functions that were
> > > not-inlined in an older kernel and got partially inlined in a new kernel.
> > >
> >
> > That would be great; we'd need to teach the kernel to handle multi-split
> > BTF but I would hope that wouldn't be too tricky.
> >
> > > If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> > > make a lot of sense, but since libbpf has to attach a bunch
> > > of regular kprobes it seems to me the kernel support is more appropriate
> > > for the whole thing.
> >
> > I'm happy with either a userspace or kernel-based approach; the main aim
> > is to provide this functionality in as straightforward a form as
> > possible to tracers/libbpf. I have to confess I didn't follow the whole
> > kprobe multi progress, but at one stage that was more kprobe-based
> > right? Would there be any value in exploring a flavour of kprobe-multi
> > that didn't use fprobe and might work for this sort of use case? As you
> > say if we had that keeping a user-space based approach might be more
> > attractive as an option.
>
> Agree.
>
> Jiri,
> how hard would it be to make multi-kprobe work on arbitrary IPs ?
multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
but it can do that only on the entry of ftrace-able functions which
have nop5 hooks at the entry
attaching anywhere else requires standard kprobe and the attach time
(and execution time) will be bad
would be great if inlined functions kept the nop5/fentry hooks ;-)
but that's probably not that simple
jirka
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 11:52 ` Jiri Olsa
@ 2025-10-14 14:55 ` Alan Maguire
2025-10-14 23:04 ` Masami Hiramatsu
2025-10-15 14:17 ` Jiri Olsa
0 siblings, 2 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-14 14:55 UTC (permalink / raw)
To: Jiri Olsa, Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Arnaldo Carvalho de Melo, Thierry Treyer,
Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Quentin Monnet, Ihor Solodrai,
David Faust, Jose E. Marchesi, bpf, Masami Hiramatsu,
Steven Rostedt
On 14/10/2025 12:52, Jiri Olsa wrote:
> On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>
>>>
>>> I was trying to avoid being specific about inlines since the same
>>> approach works for function sites with optimized-out parameters and they
>>> could be easily added to the representation (and probably should be in a
>>> future version of this series). Another "extra" source of info
>>> potentially is the (non per-cpu) global variables that Stephen sent
>>> patches for a while back and the feeling was it was too big to add to
>>> vmlinux BTF proper.
>>>
>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>
>> aux is too abstract and doesn't convey any meaning.
>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>
>> Thinking more about reuse of struct btf_type for these...
>> After sleeping on it it feels a bit awkward today, since if they're
>> types they suppose to be in one table with other types,
>> searchable and so on, but we actually don't want them there.
>> btf_find_*() isn't fast and people are trying to optimize it.
>> Also if we teach the kernel to use these loc-s they probably
>> should be in a separate table.
>>
>> global non per-cpu vars fit into current BTF's datasec concept,
>> so they can be another kernel module with a different name.
>>
>> I guess one can argue that LOCSEC is similar to DATASEC.
>> Both need their own search tables separate from the main type table.
>>
>>>
>>>> The partially inlined functions were the biggest footgun so far.
>>>> Missing fully inlined is painful, but it's not a footgun.
>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>> user space is not enough. It's great and, probably, can be supported,
>>>> but the kernel should use this "BTF.inline_info" as well to
>>>> preserve "backward compatibility" for functions that were
>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>
>>>
>>> That would be great; we'd need to teach the kernel to handle multi-split
>>> BTF but I would hope that wouldn't be too tricky.
>>>
>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>> for the whole thing.
>>>
>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>> is to provide this functionality in as straightforward a form as
>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>> kprobe multi progress, but at one stage that was more kprobe-based
>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>> that didn't use fprobe and might work for this sort of use case? As you
>>> say if we had that keeping a user-space based approach might be more
>>> attractive as an option.
>>
>> Agree.
>>
>> Jiri,
>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>
> multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
> but it can do that only on the entry of ftrace-able functions which
> have nop5 hooks at the entry
>
> attaching anywhere else requires standard kprobe and the attach time
> (and execution time) will be bad
>
> would be great if inlined functions kept the nop5/fentry hooks ;-)
> but that's probably not that simple
>
Yeah, if it was doable - and with metadata about inline sites it
certainly _seems_ possible - it does seem to work against the reason we
inline stuff (saving overheads). Steve mentioned this as a possibility
at GNU cauldron too if I remember, so worth discussing of course!
I was thinking about something simpler to be honest; a flavour of kprobe
multi that used kprobes under the hood in kernel to be suitable for
inline sites without any tweaking of the sites. So there is a kprobe
performance penalty if you're tracing, but none otherwise.
Thanks!
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 14:55 ` Alan Maguire
@ 2025-10-14 23:04 ` Masami Hiramatsu
2025-10-15 14:17 ` Jiri Olsa
1 sibling, 0 replies; 63+ messages in thread
From: Masami Hiramatsu @ 2025-10-14 23:04 UTC (permalink / raw)
To: Alan Maguire
Cc: Jiri Olsa, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf,
Masami Hiramatsu, Steven Rostedt
On Tue, 14 Oct 2025 15:55:53 +0100
Alan Maguire <alan.maguire@oracle.com> wrote:
> On 14/10/2025 12:52, Jiri Olsa wrote:
> > On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
> >> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>
> >>>
> >>> I was trying to avoid being specific about inlines since the same
> >>> approach works for function sites with optimized-out parameters and they
> >>> could be easily added to the representation (and probably should be in a
> >>> future version of this series). Another "extra" source of info
> >>> potentially is the (non per-cpu) global variables that Stephen sent
> >>> patches for a while back and the feeling was it was too big to add to
> >>> vmlinux BTF proper.
> >>>
> >>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> >>
> >> aux is too abstract and doesn't convey any meaning.
> >> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
> >>
> >> Thinking more about reuse of struct btf_type for these...
> >> After sleeping on it it feels a bit awkward today, since if they're
> >> types they suppose to be in one table with other types,
> >> searchable and so on, but we actually don't want them there.
> >> btf_find_*() isn't fast and people are trying to optimize it.
> >> Also if we teach the kernel to use these loc-s they probably
> >> should be in a separate table.
> >>
> >> global non per-cpu vars fit into current BTF's datasec concept,
> >> so they can be another kernel module with a different name.
> >>
> >> I guess one can argue that LOCSEC is similar to DATASEC.
> >> Both need their own search tables separate from the main type table.
> >>
> >>>
> >>>> The partially inlined functions were the biggest footgun so far.
> >>>> Missing fully inlined is painful, but it's not a footgun.
> >>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> >>>> user space is not enough. It's great and, probably, can be supported,
> >>>> but the kernel should use this "BTF.inline_info" as well to
> >>>> preserve "backward compatibility" for functions that were
> >>>> not-inlined in an older kernel and got partially inlined in a new kernel.
> >>>>
> >>>
> >>> That would be great; we'd need to teach the kernel to handle multi-split
> >>> BTF but I would hope that wouldn't be too tricky.
> >>>
> >>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> >>>> make a lot of sense, but since libbpf has to attach a bunch
> >>>> of regular kprobes it seems to me the kernel support is more appropriate
> >>>> for the whole thing.
> >>>
> >>> I'm happy with either a userspace or kernel-based approach; the main aim
> >>> is to provide this functionality in as straightforward a form as
> >>> possible to tracers/libbpf. I have to confess I didn't follow the whole
> >>> kprobe multi progress, but at one stage that was more kprobe-based
> >>> right? Would there be any value in exploring a flavour of kprobe-multi
> >>> that didn't use fprobe and might work for this sort of use case? As you
> >>> say if we had that keeping a user-space based approach might be more
> >>> attractive as an option.
> >>
> >> Agree.
> >>
> >> Jiri,
> >> how hard would it be to make multi-kprobe work on arbitrary IPs ?
> >
> > multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
> > but it can do that only on the entry of ftrace-able functions which
> > have nop5 hooks at the entry
> >
> > attaching anywhere else requires standard kprobe and the attach time
> > (and execution time) will be bad
> >
> > would be great if inlined functions kept the nop5/fentry hooks ;-)
> > but that's probably not that simple
> >
>
> Yeah, if it was doable - and with metadata about inline sites it
> certainly _seems_ possible - it does seem to work against the reason we
> inline stuff (saving overheads). Steve mentioned this as a possibility
> at GNU cauldron too if I remember, so worth discussing of course!
IMHO, it may be hard to insert nop (actually it is mcount call) to the
inlined function, because the inlined function code is optimized with
the caller code. In this case, it is hard to find where is the original
entry code. For example, a specific entry code can be skipped even if
a part of its body is used. Thus we need to put mcount calls for each
(but there can be a code which calls both entry and the body).
But if we can work with compilers, since it knows how it optimizes the
code, it may be possible.
>
> I was thinking about something simpler to be honest; a flavour of kprobe
> multi that used kprobes under the hood in kernel to be suitable for
> inline sites without any tweaking of the sites. So there is a kprobe
> performance penalty if you're tracing, but none otherwise.
It is possible if we can find the actual entry points. Of course using
kprobes means it will take a time (overhead) to insert the sw
breakpoints and to sync the code.
Thank you,
>
> Thanks!
>
> Alan
--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-14 14:55 ` Alan Maguire
2025-10-14 23:04 ` Masami Hiramatsu
@ 2025-10-15 14:17 ` Jiri Olsa
2025-10-15 15:19 ` Alan Maguire
1 sibling, 1 reply; 63+ messages in thread
From: Jiri Olsa @ 2025-10-15 14:17 UTC (permalink / raw)
To: Alan Maguire
Cc: Jiri Olsa, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf,
Masami Hiramatsu, Steven Rostedt
On Tue, Oct 14, 2025 at 03:55:53PM +0100, Alan Maguire wrote:
> On 14/10/2025 12:52, Jiri Olsa wrote:
> > On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
> >> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>
> >>>
> >>> I was trying to avoid being specific about inlines since the same
> >>> approach works for function sites with optimized-out parameters and they
> >>> could be easily added to the representation (and probably should be in a
> >>> future version of this series). Another "extra" source of info
> >>> potentially is the (non per-cpu) global variables that Stephen sent
> >>> patches for a while back and the feeling was it was too big to add to
> >>> vmlinux BTF proper.
> >>>
> >>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> >>
> >> aux is too abstract and doesn't convey any meaning.
> >> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
> >>
> >> Thinking more about reuse of struct btf_type for these...
> >> After sleeping on it it feels a bit awkward today, since if they're
> >> types they suppose to be in one table with other types,
> >> searchable and so on, but we actually don't want them there.
> >> btf_find_*() isn't fast and people are trying to optimize it.
> >> Also if we teach the kernel to use these loc-s they probably
> >> should be in a separate table.
> >>
> >> global non per-cpu vars fit into current BTF's datasec concept,
> >> so they can be another kernel module with a different name.
> >>
> >> I guess one can argue that LOCSEC is similar to DATASEC.
> >> Both need their own search tables separate from the main type table.
> >>
> >>>
> >>>> The partially inlined functions were the biggest footgun so far.
> >>>> Missing fully inlined is painful, but it's not a footgun.
> >>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> >>>> user space is not enough. It's great and, probably, can be supported,
> >>>> but the kernel should use this "BTF.inline_info" as well to
> >>>> preserve "backward compatibility" for functions that were
> >>>> not-inlined in an older kernel and got partially inlined in a new kernel.
> >>>>
> >>>
> >>> That would be great; we'd need to teach the kernel to handle multi-split
> >>> BTF but I would hope that wouldn't be too tricky.
> >>>
> >>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> >>>> make a lot of sense, but since libbpf has to attach a bunch
> >>>> of regular kprobes it seems to me the kernel support is more appropriate
> >>>> for the whole thing.
> >>>
> >>> I'm happy with either a userspace or kernel-based approach; the main aim
> >>> is to provide this functionality in as straightforward a form as
> >>> possible to tracers/libbpf. I have to confess I didn't follow the whole
> >>> kprobe multi progress, but at one stage that was more kprobe-based
> >>> right? Would there be any value in exploring a flavour of kprobe-multi
> >>> that didn't use fprobe and might work for this sort of use case? As you
> >>> say if we had that keeping a user-space based approach might be more
> >>> attractive as an option.
> >>
> >> Agree.
> >>
> >> Jiri,
> >> how hard would it be to make multi-kprobe work on arbitrary IPs ?
> >
> > multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
> > but it can do that only on the entry of ftrace-able functions which
> > have nop5 hooks at the entry
> >
> > attaching anywhere else requires standard kprobe and the attach time
> > (and execution time) will be bad
> >
> > would be great if inlined functions kept the nop5/fentry hooks ;-)
> > but that's probably not that simple
> >
>
> Yeah, if it was doable - and with metadata about inline sites it
> certainly _seems_ possible - it does seem to work against the reason we
> inline stuff (saving overheads). Steve mentioned this as a possibility
> at GNU cauldron too if I remember, so worth discussing of course!
>
> I was thinking about something simpler to be honest; a flavour of kprobe
> multi that used kprobes under the hood in kernel to be suitable for
> inline sites without any tweaking of the sites. So there is a kprobe
> performance penalty if you're tracing, but none otherwise.
so you mean we'd still use kprobe_multi api and its code would use fprobe
for ftrace-able functions and standard kprobe for the rest?
jirka
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-15 14:17 ` Jiri Olsa
@ 2025-10-15 15:19 ` Alan Maguire
2025-10-15 18:35 ` Jiri Olsa
0 siblings, 1 reply; 63+ messages in thread
From: Alan Maguire @ 2025-10-15 15:19 UTC (permalink / raw)
To: Jiri Olsa
Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Arnaldo Carvalho de Melo,
Thierry Treyer, Yonghong Song, Song Liu, John Fastabend, KP Singh,
Stanislav Fomichev, Hao Luo, Quentin Monnet, Ihor Solodrai,
David Faust, Jose E. Marchesi, bpf, Masami Hiramatsu,
Steven Rostedt
On 15/10/2025 15:17, Jiri Olsa wrote:
> On Tue, Oct 14, 2025 at 03:55:53PM +0100, Alan Maguire wrote:
>> On 14/10/2025 12:52, Jiri Olsa wrote:
>>> On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
>>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>>
>>>>>
>>>>> I was trying to avoid being specific about inlines since the same
>>>>> approach works for function sites with optimized-out parameters and they
>>>>> could be easily added to the representation (and probably should be in a
>>>>> future version of this series). Another "extra" source of info
>>>>> potentially is the (non per-cpu) global variables that Stephen sent
>>>>> patches for a while back and the feeling was it was too big to add to
>>>>> vmlinux BTF proper.
>>>>>
>>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
>>>>
>>>> aux is too abstract and doesn't convey any meaning.
>>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
>>>>
>>>> Thinking more about reuse of struct btf_type for these...
>>>> After sleeping on it it feels a bit awkward today, since if they're
>>>> types they suppose to be in one table with other types,
>>>> searchable and so on, but we actually don't want them there.
>>>> btf_find_*() isn't fast and people are trying to optimize it.
>>>> Also if we teach the kernel to use these loc-s they probably
>>>> should be in a separate table.
>>>>
>>>> global non per-cpu vars fit into current BTF's datasec concept,
>>>> so they can be another kernel module with a different name.
>>>>
>>>> I guess one can argue that LOCSEC is similar to DATASEC.
>>>> Both need their own search tables separate from the main type table.
>>>>
>>>>>
>>>>>> The partially inlined functions were the biggest footgun so far.
>>>>>> Missing fully inlined is painful, but it's not a footgun.
>>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
>>>>>> user space is not enough. It's great and, probably, can be supported,
>>>>>> but the kernel should use this "BTF.inline_info" as well to
>>>>>> preserve "backward compatibility" for functions that were
>>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
>>>>>>
>>>>>
>>>>> That would be great; we'd need to teach the kernel to handle multi-split
>>>>> BTF but I would hope that wouldn't be too tricky.
>>>>>
>>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
>>>>>> make a lot of sense, but since libbpf has to attach a bunch
>>>>>> of regular kprobes it seems to me the kernel support is more appropriate
>>>>>> for the whole thing.
>>>>>
>>>>> I'm happy with either a userspace or kernel-based approach; the main aim
>>>>> is to provide this functionality in as straightforward a form as
>>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
>>>>> kprobe multi progress, but at one stage that was more kprobe-based
>>>>> right? Would there be any value in exploring a flavour of kprobe-multi
>>>>> that didn't use fprobe and might work for this sort of use case? As you
>>>>> say if we had that keeping a user-space based approach might be more
>>>>> attractive as an option.
>>>>
>>>> Agree.
>>>>
>>>> Jiri,
>>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
>>>
>>> multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
>>> but it can do that only on the entry of ftrace-able functions which
>>> have nop5 hooks at the entry
>>>
>>> attaching anywhere else requires standard kprobe and the attach time
>>> (and execution time) will be bad
>>>
>>> would be great if inlined functions kept the nop5/fentry hooks ;-)
>>> but that's probably not that simple
>>>
>>
>> Yeah, if it was doable - and with metadata about inline sites it
>> certainly _seems_ possible - it does seem to work against the reason we
>> inline stuff (saving overheads). Steve mentioned this as a possibility
>> at GNU cauldron too if I remember, so worth discussing of course!
>>
>> I was thinking about something simpler to be honest; a flavour of kprobe
>> multi that used kprobes under the hood in kernel to be suitable for
>> inline sites without any tweaking of the sites. So there is a kprobe
>> performance penalty if you're tracing, but none otherwise.
>
> so you mean we'd still use kprobe_multi api and its code would use fprobe
> for ftrace-able functions and standard kprobe for the rest?
>
> jirka
Yeah, if possible. For the kernel inline sites we'd be dealing in raw
addresses rather than function names so that in itself might be enough
of a hint that it's not an fprobe site, so I guess it could be framed as
an extension of kprobe multi to support a mix of fprobe-able and
non-fprobe-able sites. Not sure how feasible that is though.
Alan
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-15 15:19 ` Alan Maguire
@ 2025-10-15 18:35 ` Jiri Olsa
0 siblings, 0 replies; 63+ messages in thread
From: Jiri Olsa @ 2025-10-15 18:35 UTC (permalink / raw)
To: Alan Maguire
Cc: Jiri Olsa, Alexei Starovoitov, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Arnaldo Carvalho de Melo, Thierry Treyer, Yonghong Song, Song Liu,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
Quentin Monnet, Ihor Solodrai, David Faust, Jose E. Marchesi, bpf,
Masami Hiramatsu, Steven Rostedt
On Wed, Oct 15, 2025 at 04:19:29PM +0100, Alan Maguire wrote:
> On 15/10/2025 15:17, Jiri Olsa wrote:
> > On Tue, Oct 14, 2025 at 03:55:53PM +0100, Alan Maguire wrote:
> >> On 14/10/2025 12:52, Jiri Olsa wrote:
> >>> On Mon, Oct 13, 2025 at 05:12:45PM -0700, Alexei Starovoitov wrote:
> >>>> On Mon, Oct 13, 2025 at 12:38 AM Alan Maguire <alan.maguire@oracle.com> wrote:
> >>>>>
> >>>>>
> >>>>> I was trying to avoid being specific about inlines since the same
> >>>>> approach works for function sites with optimized-out parameters and they
> >>>>> could be easily added to the representation (and probably should be in a
> >>>>> future version of this series). Another "extra" source of info
> >>>>> potentially is the (non per-cpu) global variables that Stephen sent
> >>>>> patches for a while back and the feeling was it was too big to add to
> >>>>> vmlinux BTF proper.
> >>>>>
> >>>>> But extra is a terrible name. .BTF.aux for auxiliary info perhaps?
> >>>>
> >>>> aux is too abstract and doesn't convey any meaning.
> >>>> How about "BTF.func_info" ? It will cover inlined and optimized funcs.
> >>>>
> >>>> Thinking more about reuse of struct btf_type for these...
> >>>> After sleeping on it it feels a bit awkward today, since if they're
> >>>> types they suppose to be in one table with other types,
> >>>> searchable and so on, but we actually don't want them there.
> >>>> btf_find_*() isn't fast and people are trying to optimize it.
> >>>> Also if we teach the kernel to use these loc-s they probably
> >>>> should be in a separate table.
> >>>>
> >>>> global non per-cpu vars fit into current BTF's datasec concept,
> >>>> so they can be another kernel module with a different name.
> >>>>
> >>>> I guess one can argue that LOCSEC is similar to DATASEC.
> >>>> Both need their own search tables separate from the main type table.
> >>>>
> >>>>>
> >>>>>> The partially inlined functions were the biggest footgun so far.
> >>>>>> Missing fully inlined is painful, but it's not a footgun.
> >>>>>> So I think doing "kloc" and usdt-like bpf_loc_arg() completely in
> >>>>>> user space is not enough. It's great and, probably, can be supported,
> >>>>>> but the kernel should use this "BTF.inline_info" as well to
> >>>>>> preserve "backward compatibility" for functions that were
> >>>>>> not-inlined in an older kernel and got partially inlined in a new kernel.
> >>>>>>
> >>>>>
> >>>>> That would be great; we'd need to teach the kernel to handle multi-split
> >>>>> BTF but I would hope that wouldn't be too tricky.
> >>>>>
> >>>>>> If we could use kprobe-multi then usdt-like bpf_loc_arg() would
> >>>>>> make a lot of sense, but since libbpf has to attach a bunch
> >>>>>> of regular kprobes it seems to me the kernel support is more appropriate
> >>>>>> for the whole thing.
> >>>>>
> >>>>> I'm happy with either a userspace or kernel-based approach; the main aim
> >>>>> is to provide this functionality in as straightforward a form as
> >>>>> possible to tracers/libbpf. I have to confess I didn't follow the whole
> >>>>> kprobe multi progress, but at one stage that was more kprobe-based
> >>>>> right? Would there be any value in exploring a flavour of kprobe-multi
> >>>>> that didn't use fprobe and might work for this sort of use case? As you
> >>>>> say if we had that keeping a user-space based approach might be more
> >>>>> attractive as an option.
> >>>>
> >>>> Agree.
> >>>>
> >>>> Jiri,
> >>>> how hard would it be to make multi-kprobe work on arbitrary IPs ?
> >>>
> >>> multi-kprobe uses fprobe which uses ftrace/fgraph fast api to attach,
> >>> but it can do that only on the entry of ftrace-able functions which
> >>> have nop5 hooks at the entry
> >>>
> >>> attaching anywhere else requires standard kprobe and the attach time
> >>> (and execution time) will be bad
> >>>
> >>> would be great if inlined functions kept the nop5/fentry hooks ;-)
> >>> but that's probably not that simple
> >>>
> >>
> >> Yeah, if it was doable - and with metadata about inline sites it
> >> certainly _seems_ possible - it does seem to work against the reason we
> >> inline stuff (saving overheads). Steve mentioned this as a possibility
> >> at GNU cauldron too if I remember, so worth discussing of course!
> >>
> >> I was thinking about something simpler to be honest; a flavour of kprobe
> >> multi that used kprobes under the hood in kernel to be suitable for
> >> inline sites without any tweaking of the sites. So there is a kprobe
> >> performance penalty if you're tracing, but none otherwise.
> >
> > so you mean we'd still use kprobe_multi api and its code would use fprobe
> > for ftrace-able functions and standard kprobe for the rest?
> >
> > jirka
>
> Yeah, if possible. For the kernel inline sites we'd be dealing in raw
> addresses rather than function names so that in itself might be enough
> of a hint that it's not an fprobe site, so I guess it could be framed as
> an extension of kprobe multi to support a mix of fprobe-able and
> non-fprobe-able sites. Not sure how feasible that is though.
that seems doable, kprobe-multi api already supports both symbols and addresses,
and because ftrace keeps track of each ftrace-able function we can tell which
is which via ftrace_location call
looks like there's also register_kprobes call that registers kprobes for multiple
addresses
I wonder what's the standard kprobe attach slowdow nowaday, it was substantial few
years back, will check to have an idea
jirka
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-08 17:34 [RFC bpf-next 00/15] support inline tracing with BTF Alan Maguire
` (15 preceding siblings ...)
2025-10-12 23:45 ` [RFC bpf-next 00/15] support inline tracing with BTF Alexei Starovoitov
@ 2025-10-23 22:32 ` Eduard Zingerman
2025-10-24 12:54 ` Alan Maguire
16 siblings, 1 reply; 63+ messages in thread
From: Eduard Zingerman @ 2025-10-23 22:32 UTC (permalink / raw)
To: Alan Maguire, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
> The Linux kernel is heavily inlined. As a result, function-focused
> observability means it can be difficult to map from code to system
> behaviour when tracing. A large number of functions effectively
> "disappear" at compile-time; approximately 100,000 are inlined to
> 443,000 sites in the gcc-14-built x86_64 kernel I have been testing
> with for example. This greatly outnumbers the number of available
> functions that were _not_ inlined. This disappearing act has
> traditionally been carried out on static functions but with
> Link-Time Optimization (LTO) non-static functions also become eligible
> for such optimization.
I looked at patches 1-12 and at pahole changes. Overall the changes
make sense to me, it's great that this is finally moving forward.
Left some minor comments in the thread and on the github.
Could you please post pahole changes to the mailing list,
to facilitate wider discussion?
(I'd like Ihor to take a look at the btf_encoder).
[...]
^ permalink raw reply [flat|nested] 63+ messages in thread* Re: [RFC bpf-next 00/15] support inline tracing with BTF
2025-10-23 22:32 ` Eduard Zingerman
@ 2025-10-24 12:54 ` Alan Maguire
0 siblings, 0 replies; 63+ messages in thread
From: Alan Maguire @ 2025-10-24 12:54 UTC (permalink / raw)
To: Eduard Zingerman, ast, daniel, andrii
Cc: martin.lau, acme, ttreyer, yonghong.song, song, john.fastabend,
kpsingh, sdf, haoluo, jolsa, qmo, ihor.solodrai, david.faust,
jose.marchesi, bpf
On 23/10/2025 23:32, Eduard Zingerman wrote:
> On Wed, 2025-10-08 at 18:34 +0100, Alan Maguire wrote:
>> The Linux kernel is heavily inlined. As a result, function-focused
>> observability means it can be difficult to map from code to system
>> behaviour when tracing. A large number of functions effectively
>> "disappear" at compile-time; approximately 100,000 are inlined to
>> 443,000 sites in the gcc-14-built x86_64 kernel I have been testing
>> with for example. This greatly outnumbers the number of available
>> functions that were _not_ inlined. This disappearing act has
>> traditionally been carried out on static functions but with
>> Link-Time Optimization (LTO) non-static functions also become eligible
>> for such optimization.
>
> I looked at patches 1-12 and at pahole changes. Overall the changes
> make sense to me, it's great that this is finally moving forward.
> Left some minor comments in the thread and on the github.
>
> Could you please post pahole changes to the mailing list,
> to facilitate wider discussion?
> (I'd like Ihor to take a look at the btf_encoder).
>
done; see [1]. There are a few issues here that will be fixed next time
around, but it should at least give a sense of how pahole will handle
inline encoding. Thanks!
[1]
https://lore.kernel.org/dwarves/20251024073328.370457-1-alan.maguire@oracle.com/
> [...]
^ permalink raw reply [flat|nested] 63+ messages in thread