bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
@ 2025-07-29 15:33 Steven Rostedt
  2025-07-29 15:40 ` Steven Rostedt
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Steven Rostedt @ 2025-07-29 15:33 UTC (permalink / raw)
  To: LKML, Linux trace kernel, bpf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Douglas Raillard, Tom Zanussi,
	Andrew Morton, Thomas Gleixner, Ian Rogers, aahringo

From: Steven Rostedt <rostedt@goodmis.org>

Add syntax to the FETCHARGS parsing of probes to allow the use of
structure and member names to get the offsets to dereference pointers.

Currently, a dereference must be a number, where the user has to figure
out manually the offset of a member of a structure that they want to
reference. For example, to get the size of a kmem_cache that was passed to
the function kmem_cache_alloc_noprof, one would need to do:

 # cd /sys/kernel/tracing
 # echo 'f:cache kmem_cache_alloc_noprof size=+0x18($arg1):u32' >> dynamic_events

This requires knowing that the offset of size is 0x18, which can be found
with gdb:

  (gdb) p &((struct kmem_cache *)0)->size
  $1 = (unsigned int *) 0x18

If BTF is in the kernel, it can be used to find this with names, where the
user doesn't need to find the actual offset:

 # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events

Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:

  +STRUCT.MEMBER[.MEMBER[..]]

The delimiter is '.' and the first item is the structure name. Then the
member of the structure to get the offset of. If that member is an
embedded structure, another '.MEMBER' may be added to get the offset of
its members with respect to the original value.

  "+kmem_cache.size($arg1)" is equivalent to:

  (*(struct kmem_cache *)$arg1).size

Anonymous structures are also handled:

  # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events

Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:

  (*(struct net_device *)((*(struct sk_buff *)($skbaddr))->dev)->name)

Note that "dev" of struct sk_buff is inside an anonymous structure:

struct sk_buff {
	union {
		struct {
			/* These two members must be first to match sk_buff_head. */
			struct sk_buff		*next;
			struct sk_buff		*prev;

			union {
				struct net_device	*dev;
				[..]
			};
		};
		[..]
	};

This will allow up to three deep of anonymous structures before it will
fail to find a member.

The above produces:

    sshd-session-1080    [000] b..5.  1526.337161: xmit: (net.net_dev_xmit) arg1="enp7s0"

And nested structures can be found by adding more members to the arg:

  # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events

The above is equivalent to:

  *((*(struct dentry *)(*(struct file *)$arg2)->f_path.dentry)->d_name.name)

And produces:

       trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
---
 Documentation/trace/kprobetrace.rst |   3 +
 kernel/trace/trace_btf.c            | 106 ++++++++++++++++++++++++++++
 kernel/trace/trace_btf.h            |  10 +++
 kernel/trace/trace_probe.c          |   7 +-
 4 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 3b6791c17e9b..00273157100c 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -54,6 +54,8 @@ Synopsis of kprobe_events
   $retval	: Fetch return value.(\*2)
   $comm		: Fetch current task comm.
   +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
+  +STRUCT.MEMBER[.MEMBER[..]](FETCHARG) : If BTF is supported, Fetch memory
+		  at FETCHARG + the offset of MEMBER inside of STRUCT.(\*5)
   \IMM		: Store an immediate value to the argument.
   NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
   FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
@@ -70,6 +72,7 @@ Synopsis of kprobe_events
         accesses one register.
   (\*3) this is useful for fetching a field of data structures.
   (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
+  (\*5) +STRUCT.MEMBER(FETCHARG) is equivalent to (*(struct STRUCT *)(FETCHARG)).MEMBER
 
 Function arguments at kretprobe
 -------------------------------
diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
index 5bbdbcbbde3c..b69404451410 100644
--- a/kernel/trace/trace_btf.c
+++ b/kernel/trace/trace_btf.c
@@ -120,3 +120,109 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
 	return member;
 }
 
+#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
+
+static int find_member(const char *ptr, struct btf *btf,
+		       const struct btf_type **type, int level)
+{
+	const struct btf_member *member;
+	const struct btf_type *t = *type;
+	int i;
+
+	/* Max of 3 depth of anonymous structures */
+	if (level > 3)
+		return -1;
+
+	for_each_member(i, t, member) {
+		const char *tname = btf_name_by_offset(btf, member->name_off);
+
+		if (strcmp(ptr, tname) == 0) {
+			*type = btf_type_by_id(btf, member->type);
+			return BITS_ROUNDDOWN_BYTES(member->offset);
+		}
+
+		/* Handle anonymous structures */
+		if (strlen(tname))
+			continue;
+
+		*type = btf_type_by_id(btf, member->type);
+		if (btf_type_is_struct(*type)) {
+			int offset = find_member(ptr, btf, type, level + 1);
+
+			if (offset < 0)
+				continue;
+
+			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
+		}
+	}
+
+	return -1;
+}
+
+/**
+ * btf_find_offset - Find an offset of a member for a structure
+ * @arg: A structure name followed by one or more members
+ * @offset_p: A pointer to where to store the offset
+ *
+ * Will parse @arg with the expected format of: struct.member[[.member]..]
+ * It is delimited by '.'. The first item must be a structure type.
+ * The next are its members. If the member is also of a structure type it
+ * another member may follow ".member".
+ *
+ * Note, @arg is modified but will be put back to what it was on return.
+ *
+ * Returns: 0 on success and -EINVAL if no '.' is present
+ *    or -ENXIO if the structure or member is not found.
+ *    Returns -EINVAL if BTF is not defined.
+ *  On success, @offset_p will contain the offset of the member specified
+ *    by @arg.
+ */
+int btf_find_offset(char *arg, long *offset_p)
+{
+	const struct btf_type *t;
+	struct btf *btf;
+	long offset = 0;
+	char *ptr;
+	int ret;
+	s32 id;
+
+	ptr = strchr(arg, '.');
+	if (!ptr)
+		return -EINVAL;
+
+	*ptr = '\0';
+
+	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);
+	if (id < 0)
+		goto error;
+
+	/* Get BTF_KIND_FUNC type */
+	t = btf_type_by_id(btf, id);
+
+	/* May allow more than one member, as long as they are structures */
+	do {
+		if (!t || !btf_type_is_struct(t))
+			goto error;
+
+		*ptr++ = '.';
+		arg = ptr;
+		ptr = strchr(ptr, '.');
+		if (ptr)
+			*ptr = '\0';
+
+		ret = find_member(arg, btf, &t, 0);
+		if (ret < 0)
+			goto error;
+
+		offset += ret;
+
+	} while (ptr);
+
+	*offset_p = offset;
+	return 0;
+
+error:
+	if (ptr)
+		*ptr = '.';
+	return -ENXIO;
+}
diff --git a/kernel/trace/trace_btf.h b/kernel/trace/trace_btf.h
index 4bc44bc261e6..7b0797a6050b 100644
--- a/kernel/trace/trace_btf.h
+++ b/kernel/trace/trace_btf.h
@@ -9,3 +9,13 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
 						const struct btf_type *type,
 						const char *member_name,
 						u32 *anon_offset);
+
+#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
+/* Will modify arg, but will put it back before returning. */
+int btf_find_offset(char *arg, long *offset);
+#else
+static inline int btf_find_offset(char *arg, long *offset)
+{
+	return -EINVAL;
+}
+#endif
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 424751cdf31f..4c13e51ea481 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -1137,7 +1137,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 
 	case '+':	/* deref memory */
 	case '-':
-		if (arg[1] == 'u') {
+		if (arg[1] == 'u' && isdigit(arg[2])) {
 			deref = FETCH_OP_UDEREF;
 			arg[1] = arg[0];
 			arg++;
@@ -1150,7 +1150,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 			return -EINVAL;
 		}
 		*tmp = '\0';
-		ret = kstrtol(arg, 0, &offset);
+		if (arg[0] != '-' && !isdigit(*arg))
+			ret = btf_find_offset(arg, &offset);
+		else
+			ret = kstrtol(arg, 0, &offset);
 		if (ret) {
 			trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
 			break;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-29 15:33 [PATCH] tracing/probes: Allow use of BTF names to dereference pointers Steven Rostedt
@ 2025-07-29 15:40 ` Steven Rostedt
  2025-07-30 13:53 ` Masami Hiramatsu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2025-07-29 15:40 UTC (permalink / raw)
  To: LKML, Linux trace kernel, bpf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Douglas Raillard, Tom Zanussi,
	Andrew Morton, Thomas Gleixner, Ian Rogers, aahringo

On Tue, 29 Jul 2025 11:33:35 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Anonymous structures are also handled:
> 
>   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events
> 
> Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:
> 
>   (*(struct net_device *)((*(struct sk_buff *)($skbaddr))->dev)->name)

The above in wrong. It is equivalent to:

  (*(struct net_device *)((*(struct sk_buff *)($skbaddr)).dev).name)

I purposely tried to not use "->" but then failed to do so :-p

> 
> And nested structures can be found by adding more members to the arg:
> 
>   # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events
> 
> The above is equivalent to:
> 
>   *((*(struct dentry *)(*(struct file *)$arg2)->f_path.dentry)->d_name.name)

And this is supposed to be:

   *((*(struct dentry *)(*(struct file *)$arg2).f_path.dentry).d_name.name)

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-29 15:33 [PATCH] tracing/probes: Allow use of BTF names to dereference pointers Steven Rostedt
  2025-07-29 15:40 ` Steven Rostedt
@ 2025-07-30 13:53 ` Masami Hiramatsu
  2025-07-30 14:33   ` Steven Rostedt
  2025-07-31 11:44 ` Douglas Raillard
  2025-07-31 21:52 ` Jiri Olsa
  3 siblings, 1 reply; 9+ messages in thread
From: Masami Hiramatsu @ 2025-07-30 13:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Douglas Raillard, Tom Zanussi, Andrew Morton,
	Thomas Gleixner, Ian Rogers, aahringo

On Tue, 29 Jul 2025 11:33:35 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add syntax to the FETCHARGS parsing of probes to allow the use of
> structure and member names to get the offsets to dereference pointers.
> 
> Currently, a dereference must be a number, where the user has to figure
> out manually the offset of a member of a structure that they want to
> reference. For example, to get the size of a kmem_cache that was passed to
> the function kmem_cache_alloc_noprof, one would need to do:
> 
>  # cd /sys/kernel/tracing
>  # echo 'f:cache kmem_cache_alloc_noprof size=+0x18($arg1):u32' >> dynamic_events
> 
> This requires knowing that the offset of size is 0x18, which can be found
> with gdb:
> 
>   (gdb) p &((struct kmem_cache *)0)->size
>   $1 = (unsigned int *) 0x18
> 
> If BTF is in the kernel, it can be used to find this with names, where the
> user doesn't need to find the actual offset:
> 
>  # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events

Great! This is something like a evolution of assembler to "symbolic"
assembler. ;)

> 
> Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> 
>   +STRUCT.MEMBER[.MEMBER[..]]

Yeah, and using '.' delimiter looks nice to me.

> 
> The delimiter is '.' and the first item is the structure name. Then the
> member of the structure to get the offset of. If that member is an
> embedded structure, another '.MEMBER' may be added to get the offset of
> its members with respect to the original value.
> 
>   "+kmem_cache.size($arg1)" is equivalent to:
> 
>   (*(struct kmem_cache *)$arg1).size
> 
> Anonymous structures are also handled:
> 
>   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events

So this only replaces the "offset" part. So we still need to use
+OFFS() syntax for dereferencing the pointer.

> 
> Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:
> 
>   (*(struct net_device *)((*(struct sk_buff *)($skbaddr))->dev)->name)
> 
> Note that "dev" of struct sk_buff is inside an anonymous structure:
> 
> struct sk_buff {
> 	union {
> 		struct {
> 			/* These two members must be first to match sk_buff_head. */
> 			struct sk_buff		*next;
> 			struct sk_buff		*prev;
> 
> 			union {
> 				struct net_device	*dev;
> 				[..]
> 			};
> 		};
> 		[..]
> 	};
> 
> This will allow up to three deep of anonymous structures before it will
> fail to find a member.
> 
> The above produces:
> 
>     sshd-session-1080    [000] b..5.  1526.337161: xmit: (net.net_dev_xmit) arg1="enp7s0"
> 
> And nested structures can be found by adding more members to the arg:
> 
>   # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events
> 
> The above is equivalent to:
> 
>   *((*(struct dentry *)(*(struct file *)$arg2)->f_path.dentry)->d_name.name)
> 
> And produces:
> 
>        trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> 

OK, the desgin looks good to me. I have some comments below.


> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>  Documentation/trace/kprobetrace.rst |   3 +
>  kernel/trace/trace_btf.c            | 106 ++++++++++++++++++++++++++++
>  kernel/trace/trace_btf.h            |  10 +++
>  kernel/trace/trace_probe.c          |   7 +-
>  4 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
> index 3b6791c17e9b..00273157100c 100644
> --- a/Documentation/trace/kprobetrace.rst
> +++ b/Documentation/trace/kprobetrace.rst
> @@ -54,6 +54,8 @@ Synopsis of kprobe_events
>    $retval	: Fetch return value.(\*2)
>    $comm		: Fetch current task comm.
>    +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
> +  +STRUCT.MEMBER[.MEMBER[..]](FETCHARG) : If BTF is supported, Fetch memory
> +		  at FETCHARG + the offset of MEMBER inside of STRUCT.(\*5)
>    \IMM		: Store an immediate value to the argument.
>    NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
>    FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
> @@ -70,6 +72,7 @@ Synopsis of kprobe_events
>          accesses one register.
>    (\*3) this is useful for fetching a field of data structures.
>    (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
> +  (\*5) +STRUCT.MEMBER(FETCHARG) is equivalent to (*(struct STRUCT *)(FETCHARG)).MEMBER
>  
>  Function arguments at kretprobe
>  -------------------------------
> diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
> index 5bbdbcbbde3c..b69404451410 100644
> --- a/kernel/trace/trace_btf.c
> +++ b/kernel/trace/trace_btf.c
> @@ -120,3 +120,109 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
>  	return member;
>  }
>  
> +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> +
> +static int find_member(const char *ptr, struct btf *btf,
> +		       const struct btf_type **type, int level)
> +{
> +	const struct btf_member *member;
> +	const struct btf_type *t = *type;
> +	int i;
> +
> +	/* Max of 3 depth of anonymous structures */
> +	if (level > 3)
> +		return -1;

Please return an error code, maybe this is -E2BIG?

> +
> +	for_each_member(i, t, member) {
> +		const char *tname = btf_name_by_offset(btf, member->name_off);
> +
> +		if (strcmp(ptr, tname) == 0) {
> +			*type = btf_type_by_id(btf, member->type);
> +			return BITS_ROUNDDOWN_BYTES(member->offset);
> +		}
> +
> +		/* Handle anonymous structures */
> +		if (strlen(tname))
> +			continue;
> +
> +		*type = btf_type_by_id(btf, member->type);
> +		if (btf_type_is_struct(*type)) {
> +			int offset = find_member(ptr, btf, type, level + 1);
> +
> +			if (offset < 0)
> +				continue;
> +
> +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
> +		}
> +	}
> +
> +	return -1;

	return -ENOENT;

> +}
> +
> +/**
> + * btf_find_offset - Find an offset of a member for a structure
> + * @arg: A structure name followed by one or more members
> + * @offset_p: A pointer to where to store the offset
> + *
> + * Will parse @arg with the expected format of: struct.member[[.member]..]
> + * It is delimited by '.'. The first item must be a structure type.
> + * The next are its members. If the member is also of a structure type it
> + * another member may follow ".member".
> + *
> + * Note, @arg is modified but will be put back to what it was on return.
> + *
> + * Returns: 0 on success and -EINVAL if no '.' is present
> + *    or -ENXIO if the structure or member is not found.
> + *    Returns -EINVAL if BTF is not defined.
> + *  On success, @offset_p will contain the offset of the member specified
> + *    by @arg.
> + */
> +int btf_find_offset(char *arg, long *offset_p)
> +{
> +	const struct btf_type *t;
> +	struct btf *btf;
> +	long offset = 0;
> +	char *ptr;
> +	int ret;
> +	s32 id;
> +
> +	ptr = strchr(arg, '.');
> +	if (!ptr)
> +		return -EINVAL;

Instead of just returning error, can't we log an error for helping user?

trace_probe_log_err(BYTE_OFFSET, ERROR_CODE);

The base offset is stored in ctx->offset, so you can use it.
ERROR_CODE is defined in trace_probe.h. 

Maybe you can add something like

	C(BAD_STRUCT_FMT,		"Symbolic offset must be +STRUCT.MEMBER format"),\

And for other cases, you can use

	C(BAD_BTF_TID,		"Failed to get BTF type info."),\

> +
> +	*ptr = '\0';
> +
> +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);
> +	if (id < 0)
> +		goto error;
> +
> +	/* Get BTF_KIND_FUNC type */
> +	t = btf_type_by_id(btf, id);
> +
> +	/* May allow more than one member, as long as they are structures */
> +	do {
> +		if (!t || !btf_type_is_struct(t))
> +			goto error;
> +
> +		*ptr++ = '.';
> +		arg = ptr;
> +		ptr = strchr(ptr, '.');
> +		if (ptr)
> +			*ptr = '\0';
> +
> +		ret = find_member(arg, btf, &t, 0);
> +		if (ret < 0)
> +			goto error;
> +
> +		offset += ret;
> +
> +	} while (ptr);
> +
> +	*offset_p = offset;
> +	return 0;
> +
> +error:
> +	if (ptr)
> +		*ptr = '.';
> +	return -ENXIO;
> +}
> diff --git a/kernel/trace/trace_btf.h b/kernel/trace/trace_btf.h
> index 4bc44bc261e6..7b0797a6050b 100644
> --- a/kernel/trace/trace_btf.h
> +++ b/kernel/trace/trace_btf.h
> @@ -9,3 +9,13 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
>  						const struct btf_type *type,
>  						const char *member_name,
>  						u32 *anon_offset);
> +
> +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> +/* Will modify arg, but will put it back before returning. */
> +int btf_find_offset(char *arg, long *offset);
> +#else
> +static inline int btf_find_offset(char *arg, long *offset)
> +{

Here also should use 

	C(NOSUP_BTFARG,		"BTF is not available or not supported"),	\


Thank you,

> +	return -EINVAL;
> +}
> +#endif
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 424751cdf31f..4c13e51ea481 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1137,7 +1137,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  
>  	case '+':	/* deref memory */
>  	case '-':
> -		if (arg[1] == 'u') {
> +		if (arg[1] == 'u' && isdigit(arg[2])) {
>  			deref = FETCH_OP_UDEREF;
>  			arg[1] = arg[0];
>  			arg++;
> @@ -1150,7 +1150,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>  			return -EINVAL;
>  		}
>  		*tmp = '\0';
> -		ret = kstrtol(arg, 0, &offset);
> +		if (arg[0] != '-' && !isdigit(*arg))
> +			ret = btf_find_offset(arg, &offset);
> +		else
> +			ret = kstrtol(arg, 0, &offset);
>  		if (ret) {
>  			trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
>  			break;
> -- 
> 2.47.2
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-30 13:53 ` Masami Hiramatsu
@ 2025-07-30 14:33   ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2025-07-30 14:33 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: LKML, Linux trace kernel, bpf, Mathieu Desnoyers, Mark Rutland,
	Peter Zijlstra, Namhyung Kim, Takaya Saeki, Douglas Raillard,
	Tom Zanussi, Andrew Morton, Thomas Gleixner, Ian Rogers, aahringo

On Wed, 30 Jul 2025 22:53:40 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:


> > If BTF is in the kernel, it can be used to find this with names, where the
> > user doesn't need to find the actual offset:
> > 
> >  # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events  
> 
> Great! This is something like a evolution of assembler to "symbolic"
> assembler. ;)

Yep!

> 
> > 
> > Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> > 
> >   +STRUCT.MEMBER[.MEMBER[..]]  
> 
> Yeah, and using '.' delimiter looks nice to me.

I know I initially suggested using ':' but then it hit me that a structure
uses '.' and that made a lot more sense. Also, it made parsing easier as I
had a hack to get around the ':' parsing that extracted the "type"
(like :u64 or :string)

> 
> > 
> > The delimiter is '.' and the first item is the structure name. Then the
> > member of the structure to get the offset of. If that member is an
> > embedded structure, another '.MEMBER' may be added to get the offset of
> > its members with respect to the original value.
> > 
> >   "+kmem_cache.size($arg1)" is equivalent to:
> > 
> >   (*(struct kmem_cache *)$arg1).size
> > 
> > Anonymous structures are also handled:
> > 
> >   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events  
> 
> So this only replaces the "offset" part. So we still need to use
> +OFFS() syntax for dereferencing the pointer.

Correct.


> > And produces:
> > 
> >        trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> >   
> 
> OK, the desgin looks good to me. I have some comments below.

I'd expected as much ;-)

This is for the next merge window so we have plenty of time.

> >  
> > +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> > +
> > +static int find_member(const char *ptr, struct btf *btf,
> > +		       const struct btf_type **type, int level)
> > +{
> > +	const struct btf_member *member;
> > +	const struct btf_type *t = *type;
> > +	int i;
> > +
> > +	/* Max of 3 depth of anonymous structures */
> > +	if (level > 3)
> > +		return -1;  
> 
> Please return an error code, maybe this is -E2BIG?

OK.

I was thinking that perhaps we should update the error_log to handle these
cases too.

> 
> > +
> > +	for_each_member(i, t, member) {
> > +		const char *tname = btf_name_by_offset(btf, member->name_off);
> > +
> > +		if (strcmp(ptr, tname) == 0) {
> > +			*type = btf_type_by_id(btf, member->type);
> > +			return BITS_ROUNDDOWN_BYTES(member->offset);
> > +		}
> > +
> > +		/* Handle anonymous structures */
> > +		if (strlen(tname))
> > +			continue;
> > +
> > +		*type = btf_type_by_id(btf, member->type);
> > +		if (btf_type_is_struct(*type)) {
> > +			int offset = find_member(ptr, btf, type, level + 1);
> > +
> > +			if (offset < 0)
> > +				continue;
> > +
> > +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
> > +		}
> > +	}
> > +
> > +	return -1;  
> 
> 	return -ENOENT;

Ah. This return doesn't propagate up to the caller. It's a static function
and just returns "not found". Which means that the "-E2BIG" will not be used.

If we want the -E2BIG to be used as well, then we can return the result of
this function.


> 
> > +}
> > +
> > +/**
> > + * btf_find_offset - Find an offset of a member for a structure
> > + * @arg: A structure name followed by one or more members
> > + * @offset_p: A pointer to where to store the offset
> > + *
> > + * Will parse @arg with the expected format of: struct.member[[.member]..]
> > + * It is delimited by '.'. The first item must be a structure type.
> > + * The next are its members. If the member is also of a structure type it
> > + * another member may follow ".member".
> > + *
> > + * Note, @arg is modified but will be put back to what it was on return.
> > + *
> > + * Returns: 0 on success and -EINVAL if no '.' is present
> > + *    or -ENXIO if the structure or member is not found.
> > + *    Returns -EINVAL if BTF is not defined.
> > + *  On success, @offset_p will contain the offset of the member specified
> > + *    by @arg.
> > + */
> > +int btf_find_offset(char *arg, long *offset_p)
> > +{
> > +	const struct btf_type *t;
> > +	struct btf *btf;
> > +	long offset = 0;
> > +	char *ptr;
> > +	int ret;
> > +	s32 id;
> > +
> > +	ptr = strchr(arg, '.');
> > +	if (!ptr)
> > +		return -EINVAL;  
> 
> Instead of just returning error, can't we log an error for helping user?

Yeah, but that will require the caller of this to handle it. I rather not
have the probe error logging code put in this file.

That means the negative numbers will need to mean something so that the
trace probe logic can know what to report.

> 
> trace_probe_log_err(BYTE_OFFSET, ERROR_CODE);
> 
> The base offset is stored in ctx->offset, so you can use it.
> ERROR_CODE is defined in trace_probe.h. 
> 
> Maybe you can add something like
> 
> 	C(BAD_STRUCT_FMT,		"Symbolic offset must be +STRUCT.MEMBER format"),\
> 
> And for other cases, you can use
> 
> 	C(BAD_BTF_TID,		"Failed to get BTF type info."),\

OK, so the caller can handle this (see below).

> 
> > +
> > +	*ptr = '\0';
> > +
> > +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);
> > +	if (id < 0)
> > +		goto error;
> > +
> > +	/* Get BTF_KIND_FUNC type */
> > +	t = btf_type_by_id(btf, id);
> > +
> > +	/* May allow more than one member, as long as they are structures */
> > +	do {
> > +		if (!t || !btf_type_is_struct(t))
> > +			goto error;
> > +
> > +		*ptr++ = '.';
> > +		arg = ptr;
> > +		ptr = strchr(ptr, '.');
> > +		if (ptr)
> > +			*ptr = '\0';
> > +
> > +		ret = find_member(arg, btf, &t, 0);
> > +		if (ret < 0)
> > +			goto error;
> > +
> > +		offset += ret;
> > +
> > +	} while (ptr);
> > +
> > +	*offset_p = offset;
> > +	return 0;
> > +
> > +error:
> > +	if (ptr)
> > +		*ptr = '.';
> > +	return -ENXIO;
> > +}
> > diff --git a/kernel/trace/trace_btf.h b/kernel/trace/trace_btf.h
> > index 4bc44bc261e6..7b0797a6050b 100644
> > --- a/kernel/trace/trace_btf.h
> > +++ b/kernel/trace/trace_btf.h
> > @@ -9,3 +9,13 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
> >  						const struct btf_type *type,
> >  						const char *member_name,
> >  						u32 *anon_offset);
> > +
> > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > +/* Will modify arg, but will put it back before returning. */
> > +int btf_find_offset(char *arg, long *offset);
> > +#else
> > +static inline int btf_find_offset(char *arg, long *offset)
> > +{  
> 
> Here also should use 
> 
> 	C(NOSUP_BTFARG,		"BTF is not available or not supported"),	\
> 

Again, this should be from the caller.

> 
> Thank you,
> 
> > +	return -EINVAL;

So this can return -ENODEV;


> > +}
> > +#endif
> > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > index 424751cdf31f..4c13e51ea481 100644
> > --- a/kernel/trace/trace_probe.c
> > +++ b/kernel/trace/trace_probe.c
> > @@ -1137,7 +1137,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  
> >  	case '+':	/* deref memory */
> >  	case '-':
> > -		if (arg[1] == 'u') {
> > +		if (arg[1] == 'u' && isdigit(arg[2])) {
> >  			deref = FETCH_OP_UDEREF;
> >  			arg[1] = arg[0];
> >  			arg++;
> > @@ -1150,7 +1150,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> >  			return -EINVAL;
> >  		}
> >  		*tmp = '\0';
> > -		ret = kstrtol(arg, 0, &offset);
> > +		if (arg[0] != '-' && !isdigit(*arg))
> > +			ret = btf_find_offset(arg, &offset);

Here we could have:

			if (ret < 0) {
				int err = 0;
				switch (ret) {
				case -ENODEV: err = NOSUP_BTFARG; break;
				case -E2BIG: err = MEMBER_TOO_DEEP; break;
				case -EINVAL: err = BAD_STRUCT_FMT; break;
				case -ENXIO: err = BAD_BTF_TID; break;
				}
				if (err)
					trace_probe_log_err(ctx->offset, err);
				return ret;
			}

-- Steve

			


> > +		else
> > +			ret = kstrtol(arg, 0, &offset);
> >  		if (ret) {
> >  			trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
> >  			break;
> > -- 
> > 2.47.2
> >   
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-29 15:33 [PATCH] tracing/probes: Allow use of BTF names to dereference pointers Steven Rostedt
  2025-07-29 15:40 ` Steven Rostedt
  2025-07-30 13:53 ` Masami Hiramatsu
@ 2025-07-31 11:44 ` Douglas Raillard
  2025-07-31 13:29   ` Steven Rostedt
  2025-07-31 21:52 ` Jiri Olsa
  3 siblings, 1 reply; 9+ messages in thread
From: Douglas Raillard @ 2025-07-31 11:44 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linux trace kernel, bpf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Tom Zanussi, Andrew Morton,
	Thomas Gleixner, Ian Rogers, aahringo

On 29-07-2025 16:33, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add syntax to the FETCHARGS parsing of probes to allow the use of
> structure and member names to get the offsets to dereference pointers.
> 
> Currently, a dereference must be a number, where the user has to figure
> out manually the offset of a member of a structure that they want to
> reference. For example, to get the size of a kmem_cache that was passed to
> the function kmem_cache_alloc_noprof, one would need to do:
> 
>   # cd /sys/kernel/tracing
>   # echo 'f:cache kmem_cache_alloc_noprof size=+0x18($arg1):u32' >> dynamic_events
> This requires knowing that the offset of size is 0x18, which can be found
> with gdb:
> 
>    (gdb) p &((struct kmem_cache *)0)->size
>    $1 = (unsigned int *) 0x18
> 
> If BTF is in the kernel, it can be used to find this with names, where the
> user doesn't need to find the actual offset:
> 
>   # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events
> 
> Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> 
>    +STRUCT.MEMBER[.MEMBER[..]]
> 
> The delimiter is '.' and the first item is the structure name. Then the
> member of the structure to get the offset of. If that member is an
> embedded structure, another '.MEMBER' may be added to get the offset of
> its members with respect to the original value.
> 
>    "+kmem_cache.size($arg1)" is equivalent to:
> 
>    (*(struct kmem_cache *)$arg1).size
> 
> Anonymous structures are also handled:
> 
>    # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events

Not sure how hard that would be but the type of the expression could probably be inferred from
BTF as well in some cases. Some cases may be ambiguous (like char* that could be either a buffer
to display as hex or a null-terminated ASCII string) but BTF would still allow to restrict
to something sensible (e.g. prevent u32 for a char*).

> 
> Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:
> 
>    (*(struct net_device *)((*(struct sk_buff *)($skbaddr))->dev)->name)
> > Note that "dev" of struct sk_buff is inside an anonymous structure:
> 
> struct sk_buff {
> 	union {
> 		struct {
> 			/* These two members must be first to match sk_buff_head. */
> 			struct sk_buff		*next;
> 			struct sk_buff		*prev;
> 
> 			union {
> 				struct net_device	*dev;
> 				[..]
> 			};
> 		};
> 		[..]
> 	};
> 
> This will allow up to three deep of anonymous structures before it will
> fail to find a member.
> 
> The above produces:
> 
>      sshd-session-1080    [000] b..5.  1526.337161: xmit: (net.net_dev_xmit) arg1="enp7s0"
> 
> And nested structures can be found by adding more members to the arg:
> 
>    # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events
> 
> The above is equivalent to:
> 
>    *((*(struct dentry *)(*(struct file *)$arg2)->f_path.dentry)->d_name.name)
> 
> And produces:
> 
>         trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> 
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
> ---
>   Documentation/trace/kprobetrace.rst |   3 +
>   kernel/trace/trace_btf.c            | 106 ++++++++++++++++++++++++++++
>   kernel/trace/trace_btf.h            |  10 +++
>   kernel/trace/trace_probe.c          |   7 +-
>   4 files changed, 124 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
> index 3b6791c17e9b..00273157100c 100644
> --- a/Documentation/trace/kprobetrace.rst
> +++ b/Documentation/trace/kprobetrace.rst
> @@ -54,6 +54,8 @@ Synopsis of kprobe_events
>     $retval	: Fetch return value.(\*2)
>     $comm		: Fetch current task comm.
>     +|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*3)(\*4)
> +  +STRUCT.MEMBER[.MEMBER[..]](FETCHARG) : If BTF is supported, Fetch memory
> +		  at FETCHARG + the offset of MEMBER inside of STRUCT.(\*5)
>     \IMM		: Store an immediate value to the argument.
>     NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
>     FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
> @@ -70,6 +72,7 @@ Synopsis of kprobe_events
>           accesses one register.
>     (\*3) this is useful for fetching a field of data structures.
>     (\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
> +  (\*5) +STRUCT.MEMBER(FETCHARG) is equivalent to (*(struct STRUCT *)(FETCHARG)).MEMBER
>   
>   Function arguments at kretprobe
>   -------------------------------
> diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
> index 5bbdbcbbde3c..b69404451410 100644
> --- a/kernel/trace/trace_btf.c
> +++ b/kernel/trace/trace_btf.c
> @@ -120,3 +120,109 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
>   	return member;
>   }
>   
> +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> +
> +static int find_member(const char *ptr, struct btf *btf,
> +		       const struct btf_type **type, int level)
> +{
> +	const struct btf_member *member;
> +	const struct btf_type *t = *type;
> +	int i;
> +
> +	/* Max of 3 depth of anonymous structures */
> +	if (level > 3)
> +		return -1;
> +
> +	for_each_member(i, t, member) {
> +		const char *tname = btf_name_by_offset(btf, member->name_off);
> +
> +		if (strcmp(ptr, tname) == 0) {
> +			*type = btf_type_by_id(btf, member->type);
> +			return BITS_ROUNDDOWN_BYTES(member->offset);

member->offset does not only contain the offset, and the offset may not be
a multiple of 8:
https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/btf.h#L126

 From the BTF spec (https://docs.kernel.org/bpf/btf.html):

If the kind_flag is set, the btf_member.offset contains
both member bitfield size and bit offset.
The bitfield size and bit offset are calculated as below.:

#define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
#define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)

> +		}
> +
> +		/* Handle anonymous structures */
> +		if (strlen(tname))
> +			continue;
> +
> +		*type = btf_type_by_id(btf, member->type);
> +		if (btf_type_is_struct(*type)) {
> +			int offset = find_member(ptr, btf, type, level + 1);
> +
> +			if (offset < 0)
> +				continue;
> +
> +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
> +		}
> +	}
> +
> +	return -1;
> +}
> +
> +/**
> + * btf_find_offset - Find an offset of a member for a structure
> + * @arg: A structure name followed by one or more members
> + * @offset_p: A pointer to where to store the offset
> + *
> + * Will parse @arg with the expected format of: struct.member[[.member]..]
> + * It is delimited by '.'. The first item must be a structure type.
> + * The next are its members. If the member is also of a structure type it
> + * another member may follow ".member".
> + *
> + * Note, @arg is modified but will be put back to what it was on return.
> + *
> + * Returns: 0 on success and -EINVAL if no '.' is present
> + *    or -ENXIO if the structure or member is not found.
> + *    Returns -EINVAL if BTF is not defined.
> + *  On success, @offset_p will contain the offset of the member specified
> + *    by @arg.
> + */
> +int btf_find_offset(char *arg, long *offset_p)
> +{
> +	const struct btf_type *t;
> +	struct btf *btf;
> +	long offset = 0;
> +	char *ptr;
> +	int ret;
> +	s32 id;
> +
> +	ptr = strchr(arg, '.');
> +	if (!ptr)
> +		return -EINVAL;
> +
> +	*ptr = '\0';
> +
> +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);
> +	if (id < 0)
> +		goto error;
> +
> +	/* Get BTF_KIND_FUNC type */
> +	t = btf_type_by_id(btf, id);
> +
> +	/* May allow more than one member, as long as they are structures */
> +	do {
> +		if (!t || !btf_type_is_struct(t))
> +			goto error;
> +
> +		*ptr++ = '.';
> +		arg = ptr;
> +		ptr = strchr(ptr, '.');
> +		if (ptr)
> +			*ptr = '\0';
> +
> +		ret = find_member(arg, btf, &t, 0);
> +		if (ret < 0)
> +			goto error;
> +
> +		offset += ret;
> +
> +	} while (ptr);
> +
> +	*offset_p = offset;
> +	return 0;
> +
> +error:
> +	if (ptr)
> +		*ptr = '.';
> +	return -ENXIO;
> +}
> diff --git a/kernel/trace/trace_btf.h b/kernel/trace/trace_btf.h
> index 4bc44bc261e6..7b0797a6050b 100644
> --- a/kernel/trace/trace_btf.h
> +++ b/kernel/trace/trace_btf.h
> @@ -9,3 +9,13 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
>   						const struct btf_type *type,
>   						const char *member_name,
>   						u32 *anon_offset);
> +
> +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> +/* Will modify arg, but will put it back before returning. */
> +int btf_find_offset(char *arg, long *offset);
> +#else
> +static inline int btf_find_offset(char *arg, long *offset)
> +{
> +	return -EINVAL;
> +}
> +#endif
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 424751cdf31f..4c13e51ea481 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -1137,7 +1137,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>   
>   	case '+':	/* deref memory */
>   	case '-':
> -		if (arg[1] == 'u') {
> +		if (arg[1] == 'u' && isdigit(arg[2])) {
>   			deref = FETCH_OP_UDEREF;
>   			arg[1] = arg[0];
>   			arg++;
> @@ -1150,7 +1150,10 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
>   			return -EINVAL;
>   		}
>   		*tmp = '\0';
> -		ret = kstrtol(arg, 0, &offset);
> +		if (arg[0] != '-' && !isdigit(*arg))
> +			ret = btf_find_offset(arg, &offset);
> +		else
> +			ret = kstrtol(arg, 0, &offset);
>   		if (ret) {
>   			trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
>   			break;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-31 11:44 ` Douglas Raillard
@ 2025-07-31 13:29   ` Steven Rostedt
  2025-07-31 15:42     ` Douglas Raillard
  0 siblings, 1 reply; 9+ messages in thread
From: Steven Rostedt @ 2025-07-31 13:29 UTC (permalink / raw)
  To: Douglas Raillard
  Cc: LKML, Linux trace kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers, aahringo

On Thu, 31 Jul 2025 12:44:49 +0100
Douglas Raillard <douglas.raillard@arm.com> wrote:

> > The delimiter is '.' and the first item is the structure name. Then the
> > member of the structure to get the offset of. If that member is an
> > embedded structure, another '.MEMBER' may be added to get the offset of
> > its members with respect to the original value.
> > 
> >    "+kmem_cache.size($arg1)" is equivalent to:
> > 
> >    (*(struct kmem_cache *)$arg1).size
> > 
> > Anonymous structures are also handled:
> > 
> >    # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events  
> 
> Not sure how hard that would be but the type of the expression could probably be inferred from
> BTF as well in some cases. Some cases may be ambiguous (like char* that could be either a buffer
> to display as hex or a null-terminated ASCII string) but BTF would still allow to restrict
> to something sensible (e.g. prevent u32 for a char*).

Hmm, should be possible, but would require passing that information back to
the caller of the BTF lookup function.



> > diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
> > index 5bbdbcbbde3c..b69404451410 100644
> > --- a/kernel/trace/trace_btf.c
> > +++ b/kernel/trace/trace_btf.c
> > @@ -120,3 +120,109 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
> >   	return member;
> >   }
> >   
> > +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
> > +
> > +static int find_member(const char *ptr, struct btf *btf,
> > +		       const struct btf_type **type, int level)
> > +{
> > +	const struct btf_member *member;
> > +	const struct btf_type *t = *type;
> > +	int i;
> > +
> > +	/* Max of 3 depth of anonymous structures */
> > +	if (level > 3)
> > +		return -1;
> > +
> > +	for_each_member(i, t, member) {
> > +		const char *tname = btf_name_by_offset(btf, member->name_off);
> > +
> > +		if (strcmp(ptr, tname) == 0) {
> > +			*type = btf_type_by_id(btf, member->type);
> > +			return BITS_ROUNDDOWN_BYTES(member->offset);  
> 
> member->offset does not only contain the offset, and the offset may not be
> a multiple of 8:
> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/btf.h#L126
> 
>  From the BTF spec (https://docs.kernel.org/bpf/btf.html):
> 
> If the kind_flag is set, the btf_member.offset contains
> both member bitfield size and bit offset.
> The bitfield size and bit offset are calculated as below.:
> 
> #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
> #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)

So basically just need to change that to:

		if (strcmp(ptr, tname) == 0) {
			int offset = BTF_MEMBER_BIT_OFFSET(member->offset);
			*type = btf_type_by_id(btf, member->type);
			return BITS_ROUNDDOWN_BYTES(offset);

?

> 
> > +		}
> > +
> > +		/* Handle anonymous structures */
> > +		if (strlen(tname))
> > +			continue;
> > +
> > +		*type = btf_type_by_id(btf, member->type);
> > +		if (btf_type_is_struct(*type)) {
> > +			int offset = find_member(ptr, btf, type, level + 1);
> > +
> > +			if (offset < 0)
> > +				continue;
> > +
> > +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);

And here too.

-- Steve

> > +		}
> > +	}
> > +
> > +	return -1;
> > +}
> > +

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-31 13:29   ` Steven Rostedt
@ 2025-07-31 15:42     ` Douglas Raillard
  0 siblings, 0 replies; 9+ messages in thread
From: Douglas Raillard @ 2025-07-31 15:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Tom Zanussi, Andrew Morton, Thomas Gleixner,
	Ian Rogers, aahringo

On 31-07-2025 14:29, Steven Rostedt wrote:
> On Thu, 31 Jul 2025 12:44:49 +0100
> Douglas Raillard <douglas.raillard@arm.com> wrote:
> 
>>> The delimiter is '.' and the first item is the structure name. Then the
>>> member of the structure to get the offset of. If that member is an
>>> embedded structure, another '.MEMBER' may be added to get the offset of
>>> its members with respect to the original value.
>>>
>>>     "+kmem_cache.size($arg1)" is equivalent to:
>>>
>>>     (*(struct kmem_cache *)$arg1).size
>>>
>>> Anonymous structures are also handled:
>>>
>>>     # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events
>>
>> Not sure how hard that would be but the type of the expression could probably be inferred from
>> BTF as well in some cases. Some cases may be ambiguous (like char* that could be either a buffer
>> to display as hex or a null-terminated ASCII string) but BTF would still allow to restrict
>> to something sensible (e.g. prevent u32 for a char*).
> 
> Hmm, should be possible, but would require passing that information back to
> the caller of the BTF lookup function.
> 
> 
> 
>>> diff --git a/kernel/trace/trace_btf.c b/kernel/trace/trace_btf.c
>>> index 5bbdbcbbde3c..b69404451410 100644
>>> --- a/kernel/trace/trace_btf.c
>>> +++ b/kernel/trace/trace_btf.c
>>> @@ -120,3 +120,109 @@ const struct btf_member *btf_find_struct_member(struct btf *btf,
>>>    	return member;
>>>    }
>>>    
>>> +#define BITS_ROUNDDOWN_BYTES(bits) ((bits) >> 3)
>>> +
>>> +static int find_member(const char *ptr, struct btf *btf,
>>> +		       const struct btf_type **type, int level)
>>> +{
>>> +	const struct btf_member *member;
>>> +	const struct btf_type *t = *type;
>>> +	int i;
>>> +
>>> +	/* Max of 3 depth of anonymous structures */
>>> +	if (level > 3)
>>> +		return -1;
>>> +
>>> +	for_each_member(i, t, member) {
>>> +		const char *tname = btf_name_by_offset(btf, member->name_off);
>>> +
>>> +		if (strcmp(ptr, tname) == 0) {
>>> +			*type = btf_type_by_id(btf, member->type);
>>> +			return BITS_ROUNDDOWN_BYTES(member->offset);
>>
>> member->offset does not only contain the offset, and the offset may not be
>> a multiple of 8:
>> https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/btf.h#L126
>>
>>   From the BTF spec (https://docs.kernel.org/bpf/btf.html):
>>
>> If the kind_flag is set, the btf_member.offset contains
>> both member bitfield size and bit offset.
>> The bitfield size and bit offset are calculated as below.:
>>
>> #define BTF_MEMBER_BITFIELD_SIZE(val)   ((val) >> 24)
>> #define BTF_MEMBER_BIT_OFFSET(val)      ((val) & 0xffffff)
> 
> So basically just need to change that to:
> 
> 		if (strcmp(ptr, tname) == 0) {
> 			int offset = BTF_MEMBER_BIT_OFFSET(member->offset);
> 			*type = btf_type_by_id(btf, member->type);
> 			return BITS_ROUNDDOWN_BYTES(offset);
> 
> ?

This would work in practice for now, but strictly speaking you should check
the kind_flag field in btf_type.info (bit 31) of the parent struct/union:
https://elixir.bootlin.com/linux/v6.16/source/include/uapi/linux/btf.h#L38
  __btf_member_bit_offset() seems to do exactly that.


While writing that, I realized there is another subtlety: BTF encodes int member offsets in 2 different ways:
1. Either their bit offset is encoded struct btf_member, and the btf_type of the member is an integer type with no leading padding bits.
2. Or the rounded-down offset is encoded in struct btf_member and the integer type contains some leading padding bits information:
https://docs.kernel.org/bpf/btf.html#btf-kind-int

The 2nd case is somewhat surprising but BTF_KIND_INT has 3 pieces of information:
1. The C signedness of the type.
2. The number of value bits of the type.
3. The offset of the 1st bit to interpret as being the value. Anything before is leading padding.

That means that the actual bit offset of an int member's value in a parent struct is:

   <offset of the member> + <offset of the type of the member>

You could technically have all members with btf_member.offset == 0 and then encode the actual values offsets in the btf_type of the members.

> 
>>
>>> +		}
>>> +
>>> +		/* Handle anonymous structures */
>>> +		if (strlen(tname))
>>> +			continue;
>>> +
>>> +		*type = btf_type_by_id(btf, member->type);
>>> +		if (btf_type_is_struct(*type)) {
>>> +			int offset = find_member(ptr, btf, type, level + 1);
>>> +
>>> +			if (offset < 0)
>>> +				continue;
>>> +
>>> +			return offset + BITS_ROUNDDOWN_BYTES(member->offset);
> 
> And here too.
> 
> -- Steve
> 
>>> +		}
>>> +	}
>>> +
>>> +	return -1;
>>> +}
>>> +


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-29 15:33 [PATCH] tracing/probes: Allow use of BTF names to dereference pointers Steven Rostedt
                   ` (2 preceding siblings ...)
  2025-07-31 11:44 ` Douglas Raillard
@ 2025-07-31 21:52 ` Jiri Olsa
  2025-07-31 22:15   ` Steven Rostedt
  3 siblings, 1 reply; 9+ messages in thread
From: Jiri Olsa @ 2025-07-31 21:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, Linux trace kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Douglas Raillard, Tom Zanussi, Andrew Morton,
	Thomas Gleixner, Ian Rogers, aahringo

On Tue, Jul 29, 2025 at 11:33:35AM -0400, Steven Rostedt wrote:

SNIP

> +/**
> + * btf_find_offset - Find an offset of a member for a structure
> + * @arg: A structure name followed by one or more members
> + * @offset_p: A pointer to where to store the offset
> + *
> + * Will parse @arg with the expected format of: struct.member[[.member]..]
> + * It is delimited by '.'. The first item must be a structure type.
> + * The next are its members. If the member is also of a structure type it
> + * another member may follow ".member".
> + *
> + * Note, @arg is modified but will be put back to what it was on return.
> + *
> + * Returns: 0 on success and -EINVAL if no '.' is present
> + *    or -ENXIO if the structure or member is not found.
> + *    Returns -EINVAL if BTF is not defined.
> + *  On success, @offset_p will contain the offset of the member specified
> + *    by @arg.
> + */
> +int btf_find_offset(char *arg, long *offset_p)
> +{
> +	const struct btf_type *t;
> +	struct btf *btf;
> +	long offset = 0;
> +	char *ptr;
> +	int ret;
> +	s32 id;
> +
> +	ptr = strchr(arg, '.');
> +	if (!ptr)
> +		return -EINVAL;
> +
> +	*ptr = '\0';
> +
> +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);

hi,
I think you need to call btf_put(btf) before return

jirka


> +	if (id < 0)
> +		goto error;
> +
> +	/* Get BTF_KIND_FUNC type */
> +	t = btf_type_by_id(btf, id);
> +
> +	/* May allow more than one member, as long as they are structures */
> +	do {
> +		if (!t || !btf_type_is_struct(t))
> +			goto error;
> +
> +		*ptr++ = '.';
> +		arg = ptr;
> +		ptr = strchr(ptr, '.');
> +		if (ptr)
> +			*ptr = '\0';
> +
> +		ret = find_member(arg, btf, &t, 0);
> +		if (ret < 0)
> +			goto error;
> +
> +		offset += ret;
> +
> +	} while (ptr);
> +
> +	*offset_p = offset;
> +	return 0;
> +
> +error:
> +	if (ptr)
> +		*ptr = '.';
> +	return -ENXIO;
> +}

SNIP

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tracing/probes: Allow use of BTF names to dereference pointers
  2025-07-31 21:52 ` Jiri Olsa
@ 2025-07-31 22:15   ` Steven Rostedt
  0 siblings, 0 replies; 9+ messages in thread
From: Steven Rostedt @ 2025-07-31 22:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: LKML, Linux trace kernel, bpf, Masami Hiramatsu,
	Mathieu Desnoyers, Mark Rutland, Peter Zijlstra, Namhyung Kim,
	Takaya Saeki, Douglas Raillard, Tom Zanussi, Andrew Morton,
	Thomas Gleixner, Ian Rogers, aahringo

On Thu, 31 Jul 2025 23:52:45 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> > +int btf_find_offset(char *arg, long *offset_p)
> > +{
> > +	const struct btf_type *t;
> > +	struct btf *btf;
> > +	long offset = 0;
> > +	char *ptr;
> > +	int ret;
> > +	s32 id;
> > +
> > +	ptr = strchr(arg, '.');
> > +	if (!ptr)
> > +		return -EINVAL;
> > +
> > +	*ptr = '\0';
> > +
> > +	id = bpf_find_btf_id(arg, BTF_KIND_STRUCT, &btf);  
> 
> hi,
> I think you need to call btf_put(btf) before return

I think you're correct ;-)

-- Steve

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-31 22:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29 15:33 [PATCH] tracing/probes: Allow use of BTF names to dereference pointers Steven Rostedt
2025-07-29 15:40 ` Steven Rostedt
2025-07-30 13:53 ` Masami Hiramatsu
2025-07-30 14:33   ` Steven Rostedt
2025-07-31 11:44 ` Douglas Raillard
2025-07-31 13:29   ` Steven Rostedt
2025-07-31 15:42     ` Douglas Raillard
2025-07-31 21:52 ` Jiri Olsa
2025-07-31 22:15   ` Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).