linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] tracing: Improbe BTF support on probe events
@ 2023-07-31  7:30 Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 1/9] tracing/probes: Support BTF argument on module functions Masami Hiramatsu (Google)
                   ` (8 more replies)
  0 siblings, 9 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

Hi,

Here is the 4th version of series to improve the BTF support on probe events.
The previous series is here:

https://lore.kernel.org/all/169037639315.607919.2613476171148037242.stgit@devnote2/

This version updates the btf_find_struct_member() to use a simple stack
to walk through the anonymous unions/structures instead of recursive call.
Also, returning int error code from query_btf_context()
if !CONFIG_PROBE_EVENTS_BTF_ARGS.

This series enables {f,k}probe events to access the members of data
structures using BTF from arguments and return value. This also adds
some new APIs to BTF so that other users can use BTF to find function
prototypes and the members of data structures.

This series can be applied on top of "probes/core" branch of
https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git/

You can also get this series from:

git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git topic/fprobe-event-ext

Thank you,

---

Masami Hiramatsu (Google) (9):
      tracing/probes: Support BTF argument on module functions
      bpf/btf: tracing: Move finding func-proto API and getting func-param API to BTF
      bpf/btf: Add a function to search a member of a struct/union
      tracing/probes: Support BTF based data structure field access
      tracing/probes: Support BTF field access from $retval
      tracing/probes: Add string type check with BTF
      tracing/fprobe-event: Assume fprobe is a return event by $retval
      selftests/ftrace: Add BTF fields access testcases
      Documentation: tracing: Update fprobe event example with BTF field


 Documentation/trace/fprobetrace.rst                |   64 ++-
 include/linux/btf.h                                |    8 
 kernel/bpf/btf.c                                   |   89 ++++
 kernel/trace/trace_eprobe.c                        |    4 
 kernel/trace/trace_fprobe.c                        |   59 ++
 kernel/trace/trace_kprobe.c                        |    1 
 kernel/trace/trace_probe.c                         |  494 +++++++++++++++-----
 kernel/trace/trace_probe.h                         |   27 +
 kernel/trace/trace_uprobe.c                        |    1 
 .../ftrace/test.d/dynevent/add_remove_btfarg.tc    |   14 +
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    6 
 11 files changed, 593 insertions(+), 174 deletions(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v4 1/9] tracing/probes: Support BTF argument on module functions
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
@ 2023-07-31  7:30 ` Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 2/9] bpf/btf: tracing: Move finding func-proto API and getting func-param API to BTF Masami Hiramatsu (Google)
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Since the btf returned from bpf_get_btf_vmlinux() only covers functions in
the vmlinux, BTF argument is not available on the functions in the modules.
Use bpf_find_btf_id() instead of bpf_get_btf_vmlinux()+btf_find_name_kind()
so that BTF argument can find the correct struct btf and btf_type in it.
With this fix, fprobe events can use `$arg*` on module functions as below

 # grep nf_log_ip_packet /proc/kallsyms
ffffffffa0005c00 t nf_log_ip_packet	[nf_log_syslog]
ffffffffa0005bf0 t __pfx_nf_log_ip_packet	[nf_log_syslog]
 # echo 'f nf_log_ip_packet $arg*' > dynamic_events
 # cat dynamic_events
f:fprobes/nf_log_ip_packet__entry nf_log_ip_packet net=net pf=pf hooknum=hooknum skb=skb in=in out=out loginfo=loginfo prefix=prefix

To support the module's btf which is removable, the struct btf needs to be
ref-counted. So this also records the btf in the traceprobe_parse_context
and returns the refcount when the parse has done.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Newly added.
---
 include/linux/btf.h         |    1 
 kernel/bpf/btf.c            |    2 -
 kernel/trace/trace_eprobe.c |    4 --
 kernel/trace/trace_fprobe.c |    1 
 kernel/trace/trace_kprobe.c |    1 
 kernel/trace/trace_probe.c  |  100 +++++++++++++++++++++++++------------------
 kernel/trace/trace_probe.h  |   14 +++++-
 kernel/trace/trace_uprobe.c |    1 
 8 files changed, 75 insertions(+), 49 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index cac9f304e27a..dbfe41a09c4b 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -211,6 +211,7 @@ struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
+s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,
 					       u32 id, u32 *res_id);
 const struct btf_type *btf_type_resolve_ptr(const struct btf *btf,
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 817204d53372..b9b0eb1189bb 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -552,7 +552,7 @@ s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind)
 	return -ENOENT;
 }
 
-static s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
+s32 bpf_find_btf_id(const char *name, u32 kind, struct btf **btf_p)
 {
 	struct btf *btf;
 	s32 ret;
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index a0a704ba27db..09bac836d72b 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -807,13 +807,11 @@ static int trace_eprobe_tp_update_arg(struct trace_eprobe *ep, const char *argv[
 	int ret;
 
 	ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], &ctx);
-	if (ret)
-		return ret;
-
 	/* Handle symbols "@" */
 	if (!ret)
 		ret = traceprobe_update_arg(&ep->tp.args[i]);
 
+	traceprobe_finish_parse(&ctx);
 	return ret;
 }
 
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index dfe2e546acdc..8f43f1f65b1b 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -1096,6 +1096,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])
 	}
 
 out:
+	traceprobe_finish_parse(&ctx);
 	trace_probe_log_clear();
 	kfree(new_argv);
 	kfree(symbol);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 23dba01831f7..cc822f69bfe8 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -907,6 +907,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
 	}
 
 out:
+	traceprobe_finish_parse(&ctx);
 	trace_probe_log_clear();
 	kfree(new_argv);
 	kfree(symbol);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index c68a72707852..ecbe28f8d676 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -304,16 +304,6 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
 
 #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
 
-static struct btf *traceprobe_get_btf(void)
-{
-	struct btf *btf = bpf_get_btf_vmlinux();
-
-	if (IS_ERR_OR_NULL(btf))
-		return NULL;
-
-	return btf;
-}
-
 static u32 btf_type_int(const struct btf_type *t)
 {
 	return *(u32 *)(t + 1);
@@ -371,42 +361,49 @@ static const char *type_from_btf_id(struct btf *btf, s32 id)
 	return NULL;
 }
 
-static const struct btf_type *find_btf_func_proto(const char *funcname)
+static const struct btf_type *find_btf_func_proto(const char *funcname,
+						  struct btf **btf_p)
 {
-	struct btf *btf = traceprobe_get_btf();
 	const struct btf_type *t;
+	struct btf *btf = NULL;
 	s32 id;
 
-	if (!btf || !funcname)
+	if (!funcname)
 		return ERR_PTR(-EINVAL);
 
-	id = btf_find_by_name_kind(btf, funcname, BTF_KIND_FUNC);
+	id = bpf_find_btf_id(funcname, BTF_KIND_FUNC, &btf);
 	if (id <= 0)
 		return ERR_PTR(-ENOENT);
 
 	/* Get BTF_KIND_FUNC type */
 	t = btf_type_by_id(btf, id);
 	if (!t || !btf_type_is_func(t))
-		return ERR_PTR(-ENOENT);
+		goto err;
 
 	/* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
 	t = btf_type_by_id(btf, t->type);
 	if (!t || !btf_type_is_func_proto(t))
-		return ERR_PTR(-ENOENT);
+		goto err;
 
+	*btf_p = btf;
 	return t;
+
+err:
+	btf_put(btf);
+	return ERR_PTR(-ENOENT);
 }
 
 static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr,
-						   bool tracepoint)
+						   struct btf **btf_p, bool tracepoint)
 {
 	const struct btf_param *param;
 	const struct btf_type *t;
+	struct btf *btf;
 
 	if (!funcname || !nr)
 		return ERR_PTR(-EINVAL);
 
-	t = find_btf_func_proto(funcname);
+	t = find_btf_func_proto(funcname, &btf);
 	if (IS_ERR(t))
 		return (const struct btf_param *)t;
 
@@ -419,29 +416,37 @@ static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr
 		param++;
 	}
 
-	if (*nr > 0)
+	if (*nr > 0) {
+		*btf_p = btf;
 		return param;
-	else
-		return NULL;
+	}
+
+	btf_put(btf);
+	return NULL;
+}
+
+static void clear_btf_context(struct traceprobe_parse_context *ctx)
+{
+	if (ctx->btf) {
+		btf_put(ctx->btf);
+		ctx->btf = NULL;
+		ctx->params = NULL;
+		ctx->nr_params = 0;
+	}
 }
 
 static int parse_btf_arg(const char *varname, struct fetch_insn *code,
 			 struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = traceprobe_get_btf();
 	const struct btf_param *params;
 	int i;
 
-	if (!btf) {
-		trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
-		return -EOPNOTSUPP;
-	}
-
 	if (WARN_ON_ONCE(!ctx->funcname))
 		return -EINVAL;
 
 	if (!ctx->params) {
-		params = find_btf_func_param(ctx->funcname, &ctx->nr_params,
+		params = find_btf_func_param(ctx->funcname,
+					     &ctx->nr_params, &ctx->btf,
 					     ctx->flags & TPARG_FL_TPOINT);
 		if (IS_ERR_OR_NULL(params)) {
 			trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
@@ -452,7 +457,7 @@ static int parse_btf_arg(const char *varname, struct fetch_insn *code,
 		params = ctx->params;
 
 	for (i = 0; i < ctx->nr_params; i++) {
-		const char *name = btf_name_by_offset(btf, params[i].name_off);
+		const char *name = btf_name_by_offset(ctx->btf, params[i].name_off);
 
 		if (name && !strcmp(name, varname)) {
 			code->op = FETCH_OP_ARG;
@@ -470,7 +475,7 @@ static int parse_btf_arg(const char *varname, struct fetch_insn *code,
 static const struct fetch_type *parse_btf_arg_type(int arg_idx,
 					struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = traceprobe_get_btf();
+	struct btf *btf = ctx->btf;
 	const char *typestr = NULL;
 
 	if (btf && ctx->params) {
@@ -485,14 +490,17 @@ static const struct fetch_type *parse_btf_arg_type(int arg_idx,
 static const struct fetch_type *parse_btf_retval_type(
 					struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = traceprobe_get_btf();
 	const char *typestr = NULL;
 	const struct btf_type *t;
+	struct btf *btf;
 
-	if (btf && ctx->funcname) {
-		t = find_btf_func_proto(ctx->funcname);
-		if (!IS_ERR(t))
+	if (ctx->funcname) {
+		/* Do not use ctx->btf, because it must be used with ctx->param */
+		t = find_btf_func_proto(ctx->funcname, &btf);
+		if (!IS_ERR(t)) {
 			typestr = type_from_btf_id(btf, t->type);
+			btf_put(btf);
+		}
 	}
 
 	return find_fetch_type(typestr, ctx->flags);
@@ -501,21 +509,25 @@ static const struct fetch_type *parse_btf_retval_type(
 static bool is_btf_retval_void(const char *funcname)
 {
 	const struct btf_type *t;
+	struct btf *btf;
+	bool ret;
 
-	t = find_btf_func_proto(funcname);
+	t = find_btf_func_proto(funcname, &btf);
 	if (IS_ERR(t))
 		return false;
 
-	return t->type == 0;
+	ret = (t->type == 0);
+	btf_put(btf);
+	return ret;
 }
 #else
-static struct btf *traceprobe_get_btf(void)
+static void clear_btf_context(struct traceprobe_parse_context *ctx)
 {
-	return NULL;
+	ctx->btf = NULL;
 }
 
 static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr,
-						   bool tracepoint)
+						   struct btf **btf_p, bool tracepoint)
 {
 	return ERR_PTR(-EOPNOTSUPP);
 }
@@ -1231,7 +1243,6 @@ static int sprint_nth_btf_arg(int idx, const char *type,
 			      char *buf, int bufsize,
 			      struct traceprobe_parse_context *ctx)
 {
-	struct btf *btf = traceprobe_get_btf();
 	const char *name;
 	int ret;
 
@@ -1239,7 +1250,7 @@ static int sprint_nth_btf_arg(int idx, const char *type,
 		trace_probe_log_err(0, NO_BTFARG);
 		return -ENOENT;
 	}
-	name = btf_name_by_offset(btf, ctx->params[idx].name_off);
+	name = btf_name_by_offset(ctx->btf, ctx->params[idx].name_off);
 	if (!name) {
 		trace_probe_log_err(0, NO_BTF_ENTRY);
 		return -ENOENT;
@@ -1271,7 +1282,7 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 		return NULL;
 	}
 
-	params = find_btf_func_param(ctx->funcname, &nr_params,
+	params = find_btf_func_param(ctx->funcname, &nr_params, &ctx->btf,
 				     ctx->flags & TPARG_FL_TPOINT);
 	if (IS_ERR_OR_NULL(params)) {
 		if (args_idx != -1) {
@@ -1337,6 +1348,11 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 	return ERR_PTR(ret);
 }
 
+void traceprobe_finish_parse(struct traceprobe_parse_context *ctx)
+{
+	clear_btf_context(ctx);
+}
+
 int traceprobe_update_arg(struct probe_arg *arg)
 {
 	struct fetch_insn *code = arg->code;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 01ea148723de..4dc91460a75d 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -383,9 +383,11 @@ static inline bool tparg_is_function_entry(unsigned int flags)
 
 struct traceprobe_parse_context {
 	struct trace_event_call *event;
-	const struct btf_param *params;
-	s32 nr_params;
-	const char *funcname;
+	/* BTF related parameters */
+	const char *funcname;		/* Function name in BTF */
+	const struct btf_param *params;	/* Parameter of the function */
+	s32 nr_params;			/* The number of the parameters */
+	struct btf *btf;		/* The BTF to be used */
 	unsigned int flags;
 	int offset;
 };
@@ -400,6 +402,12 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 extern int traceprobe_update_arg(struct probe_arg *arg);
 extern void traceprobe_free_probe_arg(struct probe_arg *arg);
 
+/*
+ * If either traceprobe_parse_probe_arg() or traceprobe_expand_meta_args() is called,
+ * this MUST be called for clean up the context and return a resource.
+ */
+void traceprobe_finish_parse(struct traceprobe_parse_context *ctx);
+
 extern int traceprobe_split_symbol_offset(char *symbol, long *offset);
 int traceprobe_parse_event_name(const char **pevent, const char **pgroup,
 				char *buf, int offset);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 688bf579f2f1..9790f8f0a32d 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -693,6 +693,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
 
 		trace_probe_log_set_index(i + 2);
 		ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i], &ctx);
+		traceprobe_finish_parse(&ctx);
 		if (ret)
 			goto error;
 	}


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 2/9] bpf/btf: tracing: Move finding func-proto API and getting func-param API to BTF
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 1/9] tracing/probes: Support BTF argument on module functions Masami Hiramatsu (Google)
@ 2023-07-31  7:30 ` Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union Masami Hiramatsu (Google)
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Move generic function-proto find API and getting function parameter API
to BTF library code from trace_probe.c. This will avoid redundant efforts
on different feature.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Remove perameter check.
  - Fix a typo.
  - Add a type check for btf_get_func_param() and add comment for that.
  - Use bpf_find_btf_id() and add bpf_put().
  - Move the code before btf_show() related code.
---
 include/linux/btf.h        |    4 ++++
 kernel/bpf/btf.c           |   47 +++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_probe.c |   50 +++++++++-----------------------------------
 3 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index dbfe41a09c4b..20e3a07eef8f 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -222,6 +222,10 @@ const struct btf_type *
 btf_resolve_size(const struct btf *btf, const struct btf_type *type,
 		 u32 *type_size);
 const char *btf_type_str(const struct btf_type *t);
+const struct btf_type *btf_find_func_proto(const char *func_name,
+					   struct btf **btf_p);
+const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
+					   s32 *nr);
 
 #define for_each_member(i, struct_type, member)			\
 	for (i = 0, member = btf_type_member(struct_type);	\
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index b9b0eb1189bb..f7b25c615269 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -911,6 +911,53 @@ static const struct btf_type *btf_type_skip_qualifiers(const struct btf *btf,
 	return t;
 }
 
+/*
+ * Find a function proto type by name, and return the btf_type with its btf
+ * in *@btf_p. Return NULL if not found.
+ * Note that caller has to call btf_put(*@btf_p) after using the btf_type.
+ */
+const struct btf_type *btf_find_func_proto(const char *func_name, struct btf **btf_p)
+{
+	const struct btf_type *t;
+	s32 id;
+
+	id = bpf_find_btf_id(func_name, BTF_KIND_FUNC, btf_p);
+	if (id < 0)
+		return NULL;
+
+	/* Get BTF_KIND_FUNC type */
+	t = btf_type_by_id(*btf_p, id);
+	if (!t || !btf_type_is_func(t))
+		goto err;
+
+	/* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
+	t = btf_type_by_id(*btf_p, t->type);
+	if (!t || !btf_type_is_func_proto(t))
+		goto err;
+
+	return t;
+err:
+	btf_put(*btf_p);
+	return NULL;
+}
+
+/*
+ * Get function parameter with the number of parameters.
+ * This can return NULL if the function has no parameters.
+ * It can return -EINVAL if the @func_proto is not a function proto type.
+ */
+const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s32 *nr)
+{
+	if (!btf_type_is_func_proto(func_proto))
+		return ERR_PTR(-EINVAL);
+
+	*nr = btf_type_vlen(func_proto);
+	if (*nr > 0)
+		return (const struct btf_param *)(func_proto + 1);
+	else
+		return NULL;
+}
+
 #define BTF_SHOW_MAX_ITER	10
 
 #define BTF_KIND_BIT(kind)	(1ULL << kind)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index ecbe28f8d676..21a228d88ebb 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -361,38 +361,6 @@ static const char *type_from_btf_id(struct btf *btf, s32 id)
 	return NULL;
 }
 
-static const struct btf_type *find_btf_func_proto(const char *funcname,
-						  struct btf **btf_p)
-{
-	const struct btf_type *t;
-	struct btf *btf = NULL;
-	s32 id;
-
-	if (!funcname)
-		return ERR_PTR(-EINVAL);
-
-	id = bpf_find_btf_id(funcname, BTF_KIND_FUNC, &btf);
-	if (id <= 0)
-		return ERR_PTR(-ENOENT);
-
-	/* Get BTF_KIND_FUNC type */
-	t = btf_type_by_id(btf, id);
-	if (!t || !btf_type_is_func(t))
-		goto err;
-
-	/* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
-	t = btf_type_by_id(btf, t->type);
-	if (!t || !btf_type_is_func_proto(t))
-		goto err;
-
-	*btf_p = btf;
-	return t;
-
-err:
-	btf_put(btf);
-	return ERR_PTR(-ENOENT);
-}
-
 static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr,
 						   struct btf **btf_p, bool tracepoint)
 {
@@ -403,12 +371,13 @@ static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr
 	if (!funcname || !nr)
 		return ERR_PTR(-EINVAL);
 
-	t = find_btf_func_proto(funcname, &btf);
-	if (IS_ERR(t))
+	t = btf_find_func_proto(funcname, &btf);
+	if (!t)
 		return (const struct btf_param *)t;
 
-	*nr = btf_type_vlen(t);
-	param = (const struct btf_param *)(t + 1);
+	param = btf_get_func_param(t, nr);
+	if (IS_ERR_OR_NULL(param))
+		goto err;
 
 	/* Hide the first 'data' argument of tracepoint */
 	if (tracepoint) {
@@ -421,6 +390,7 @@ static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr
 		return param;
 	}
 
+err:
 	btf_put(btf);
 	return NULL;
 }
@@ -496,8 +466,8 @@ static const struct fetch_type *parse_btf_retval_type(
 
 	if (ctx->funcname) {
 		/* Do not use ctx->btf, because it must be used with ctx->param */
-		t = find_btf_func_proto(ctx->funcname, &btf);
-		if (!IS_ERR(t)) {
+		t = btf_find_func_proto(ctx->funcname, &btf);
+		if (t) {
 			typestr = type_from_btf_id(btf, t->type);
 			btf_put(btf);
 		}
@@ -512,8 +482,8 @@ static bool is_btf_retval_void(const char *funcname)
 	struct btf *btf;
 	bool ret;
 
-	t = find_btf_func_proto(funcname, &btf);
-	if (IS_ERR(t))
+	t = btf_find_func_proto(funcname, &btf);
+	if (!t)
 		return false;
 
 	ret = (t->type == 0);


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 1/9] tracing/probes: Support BTF argument on module functions Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 2/9] bpf/btf: tracing: Move finding func-proto API and getting func-param API to BTF Masami Hiramatsu (Google)
@ 2023-07-31  7:30 ` Masami Hiramatsu (Google)
  2023-07-31 21:59   ` Alexei Starovoitov
  2023-07-31  7:30 ` [PATCH v4 4/9] tracing/probes: Support BTF based data structure field access Masami Hiramatsu (Google)
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add btf_find_struct_member() API to search a member of a given data structure
or union from the member's name.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
 Changes in v3:
  - Remove simple input check.
  - Fix unneeded IS_ERR_OR_NULL() check for btf_type_by_id().
  - Move the code next to btf_get_func_param().
  - Use for_each_member() macro instead of for-loop.
  - Use btf_type_skip_modifiers() instead of btf_type_by_id().
 Changes in v4:
  - Use a stack for searching in anonymous members instead of nested call.
---
 include/linux/btf.h |    3 +++
 kernel/bpf/btf.c    |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/include/linux/btf.h b/include/linux/btf.h
index 20e3a07eef8f..4b10d57ceee0 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -226,6 +226,9 @@ const struct btf_type *btf_find_func_proto(const char *func_name,
 					   struct btf **btf_p);
 const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
 					   s32 *nr);
+const struct btf_member *btf_find_struct_member(struct btf *btf,
+						const struct btf_type *type,
+						const char *member_name);
 
 #define for_each_member(i, struct_type, member)			\
 	for (i = 0, member = btf_type_member(struct_type);	\
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index f7b25c615269..8d81a4ffa67b 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -958,6 +958,46 @@ const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s3
 		return NULL;
 }
 
+#define BTF_ANON_STACK_MAX	16
+
+/*
+ * Find a member of data structure/union by name and return it.
+ * Return NULL if not found, or -EINVAL if parameter is invalid.
+ */
+const struct btf_member *btf_find_struct_member(struct btf *btf,
+						const struct btf_type *type,
+						const char *member_name)
+{
+	const struct btf_type *anon_stack[BTF_ANON_STACK_MAX];
+	const struct btf_member *member;
+	const char *name;
+	int i, top = 0;
+
+retry:
+	if (!btf_type_is_struct(type))
+		return ERR_PTR(-EINVAL);
+
+	for_each_member(i, type, member) {
+		if (!member->name_off) {
+			/* Anonymous union/struct: push it for later use */
+			type = btf_type_skip_modifiers(btf, member->type, NULL);
+			if (type && top < BTF_ANON_STACK_MAX)
+				anon_stack[top++] = type;
+		} else {
+			name = btf_name_by_offset(btf, member->name_off);
+			if (name && !strcmp(member_name, name))
+				return member;
+		}
+	}
+	if (top > 0) {
+		/* Pop from the anonymous stack and retry */
+		type = anon_stack[--top];
+		goto retry;
+	}
+
+	return NULL;
+}
+
 #define BTF_SHOW_MAX_ITER	10
 
 #define BTF_KIND_BIT(kind)	(1ULL << kind)


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 4/9] tracing/probes: Support BTF based data structure field access
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (2 preceding siblings ...)
  2023-07-31  7:30 ` [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union Masami Hiramatsu (Google)
@ 2023-07-31  7:30 ` Masami Hiramatsu (Google)
  2023-07-31  7:30 ` [PATCH v4 5/9] tracing/probes: Support BTF field access from $retval Masami Hiramatsu (Google)
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Using BTF to access the fields of a data structure. You can use this
for accessing the field with '->' or '.' operation with BTF argument.

 # echo 't sched_switch next=next->pid vruntime=next->se.vruntime' \
   > dynamic_events
 # echo 1 > events/tracepoints/sched_switch/enable
 # head -n 40 trace | tail
          <idle>-0       [000] d..3.   272.565382: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956533179
      kcompactd0-26      [000] d..3.   272.565406: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
          <idle>-0       [000] d..3.   273.069441: sched_switch: (__probestub_sched_switch+0x4/0x10) next=9 vruntime=956533179
     kworker/0:1-9       [000] d..3.   273.069464: sched_switch: (__probestub_sched_switch+0x4/0x10) next=26 vruntime=956579181
      kcompactd0-26      [000] d..3.   273.069480: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
          <idle>-0       [000] d..3.   273.141434: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956533179
    kworker/u2:1-22      [000] d..3.   273.141461: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0
          <idle>-0       [000] d..3.   273.480872: sched_switch: (__probestub_sched_switch+0x4/0x10) next=22 vruntime=956585857
    kworker/u2:1-22      [000] d..3.   273.480905: sched_switch: (__probestub_sched_switch+0x4/0x10) next=70 vruntime=959533179
              sh-70      [000] d..3.   273.481102: sched_switch: (__probestub_sched_switch+0x4/0x10) next=0 vruntime=0

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
 Changes in v2:
  - Use new BTF API for finding the member.
 Changes in v3:
  - Update according to previous changes in the series.
---
 kernel/trace/trace_probe.c |  226 +++++++++++++++++++++++++++++++++++++++-----
 kernel/trace/trace_probe.h |   11 ++
 2 files changed, 210 insertions(+), 27 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 21a228d88ebb..f6b855de4256 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -309,16 +309,14 @@ static u32 btf_type_int(const struct btf_type *t)
 	return *(u32 *)(t + 1);
 }
 
-static const char *type_from_btf_id(struct btf *btf, s32 id)
+static const char *fetch_type_from_btf_type(struct btf *btf,
+					const struct btf_type *type,
+					struct traceprobe_parse_context *ctx)
 {
-	const struct btf_type *t;
 	u32 intdata;
-	s32 tid;
 
 	/* TODO: const char * could be converted as a string */
-	t = btf_type_skip_modifiers(btf, id, &tid);
-
-	switch (BTF_INFO_KIND(t->info)) {
+	switch (BTF_INFO_KIND(type->info)) {
 	case BTF_KIND_ENUM:
 		/* enum is "int", so convert to "s32" */
 		return "s32";
@@ -331,7 +329,7 @@ static const char *type_from_btf_id(struct btf *btf, s32 id)
 		else
 			return "x32";
 	case BTF_KIND_INT:
-		intdata = btf_type_int(t);
+		intdata = btf_type_int(type);
 		if (BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED) {
 			switch (BTF_INT_BITS(intdata)) {
 			case 8:
@@ -354,6 +352,10 @@ static const char *type_from_btf_id(struct btf *btf, s32 id)
 			case 64:
 				return "u64";
 			}
+			/* bitfield, size is encoded in the type */
+			ctx->last_bitsize = BTF_INT_BITS(intdata);
+			ctx->last_bitoffs += BTF_INT_OFFSET(intdata);
+			return "u64";
 		}
 	}
 	/* TODO: support other types */
@@ -405,15 +407,132 @@ static void clear_btf_context(struct traceprobe_parse_context *ctx)
 	}
 }
 
-static int parse_btf_arg(const char *varname, struct fetch_insn *code,
+/* Return 1 if the field separater is arrow operator ('->') */
+static int split_next_field(char *varname, char **next_field,
+			    struct traceprobe_parse_context *ctx)
+{
+	char *field;
+	int ret = 0;
+
+	field = strpbrk(varname, ".-");
+	if (field) {
+		if (field[0] == '-' && field[1] == '>') {
+			field[0] = '\0';
+			field += 2;
+			ret = 1;
+		} else if (field[0] == '.') {
+			field[0] = '\0';
+			field += 1;
+		} else {
+			trace_probe_log_err(ctx->offset + field - varname, BAD_HYPHEN);
+			return -EINVAL;
+		}
+		*next_field = field;
+	}
+
+	return ret;
+}
+
+/*
+ * Parse the field of data structure. The @type must be a pointer type
+ * pointing the target data structure type.
+ */
+static int parse_btf_field(char *fieldname, const struct btf_type *type,
+			   struct fetch_insn **pcode, struct fetch_insn *end,
+			   struct traceprobe_parse_context *ctx)
+{
+	struct fetch_insn *code = *pcode;
+	const struct btf_member *field;
+	u32 bitoffs;
+	char *next;
+	int is_ptr;
+	s32 tid;
+
+	do {
+		/* Outer loop for solving arrow operator ('->') */
+		if (BTF_INFO_KIND(type->info) != BTF_KIND_PTR) {
+			trace_probe_log_err(ctx->offset, NO_PTR_STRCT);
+			return -EINVAL;
+		}
+		/* Convert a struct pointer type to a struct type */
+		type = btf_type_skip_modifiers(ctx->btf, type->type, &tid);
+		if (!type) {
+			trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+			return -EINVAL;
+		}
+
+		bitoffs = 0;
+		do {
+			/* Inner loop for solving dot operator ('.') */
+			next = NULL;
+			is_ptr = split_next_field(fieldname, &next, ctx);
+			if (is_ptr < 0)
+				return is_ptr;
+
+			field = btf_find_struct_member(ctx->btf, type, fieldname);
+			if (!field) {
+				trace_probe_log_err(ctx->offset, NO_BTF_FIELD);
+				return -ENOENT;
+			}
+
+			/* Accumulate the bit-offsets of the dot-connected fields */
+			if (btf_type_kflag(type)) {
+				bitoffs += BTF_MEMBER_BIT_OFFSET(field->offset);
+				ctx->last_bitsize = BTF_MEMBER_BITFIELD_SIZE(field->offset);
+			} else {
+				bitoffs += field->offset;
+				ctx->last_bitsize = 0;
+			}
+
+			type = btf_type_skip_modifiers(ctx->btf, field->type, &tid);
+			if (!type) {
+				trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+				return -EINVAL;
+			}
+
+			ctx->offset += next - fieldname;
+			fieldname = next;
+		} while (!is_ptr && fieldname);
+
+		if (++code == end) {
+			trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+			return -EINVAL;
+		}
+		code->op = FETCH_OP_DEREF;	/* TODO: user deref support */
+		code->offset = bitoffs / 8;
+		*pcode = code;
+
+		ctx->last_bitoffs = bitoffs % 8;
+		ctx->last_type = type;
+	} while (fieldname);
+
+	return 0;
+}
+
+static int parse_btf_arg(char *varname,
+			 struct fetch_insn **pcode, struct fetch_insn *end,
 			 struct traceprobe_parse_context *ctx)
 {
+	struct fetch_insn *code = *pcode;
 	const struct btf_param *params;
-	int i;
+	const struct btf_type *type;
+	char *field = NULL;
+	int i, is_ptr;
+	u32 tid;
 
 	if (WARN_ON_ONCE(!ctx->funcname))
 		return -EINVAL;
 
+	is_ptr = split_next_field(varname, &field, ctx);
+	if (is_ptr < 0)
+		return is_ptr;
+	if (!is_ptr && field) {
+		/* dot-connected field on an argument is not supported. */
+		trace_probe_log_err(ctx->offset + field - varname,
+				    NOSUP_DAT_ARG);
+		return -EOPNOTSUPP;
+	}
+
 	if (!ctx->params) {
 		params = find_btf_func_param(ctx->funcname,
 					     &ctx->nr_params, &ctx->btf,
@@ -435,24 +554,39 @@ static int parse_btf_arg(const char *varname, struct fetch_insn *code,
 				code->param = i + 1;
 			else
 				code->param = i;
-			return 0;
+
+			tid = params[i].type;
+			goto found;
 		}
 	}
 	trace_probe_log_err(ctx->offset, NO_BTFARG);
 	return -ENOENT;
+
+found:
+	type = btf_type_skip_modifiers(ctx->btf, tid, &tid);
+	if (!type) {
+		trace_probe_log_err(ctx->offset, BAD_BTF_TID);
+		return -EINVAL;
+	}
+	/* Initialize the last type information */
+	ctx->last_type = type;
+	ctx->last_bitoffs = 0;
+	ctx->last_bitsize = 0;
+	if (field) {
+		ctx->offset += field - varname;
+		return parse_btf_field(field, type, pcode, end, ctx);
+	}
+	return 0;
 }
 
-static const struct fetch_type *parse_btf_arg_type(int arg_idx,
+static const struct fetch_type *parse_btf_arg_type(
 					struct traceprobe_parse_context *ctx)
 {
 	struct btf *btf = ctx->btf;
 	const char *typestr = NULL;
 
-	if (btf && ctx->params) {
-		if (ctx->flags & TPARG_FL_TPOINT)
-			arg_idx--;
-		typestr = type_from_btf_id(btf, ctx->params[arg_idx].type);
-	}
+	if (btf && ctx->last_type)
+		typestr = fetch_type_from_btf_type(btf, ctx->last_type, ctx);
 
 	return find_fetch_type(typestr, ctx->flags);
 }
@@ -461,14 +595,16 @@ static const struct fetch_type *parse_btf_retval_type(
 					struct traceprobe_parse_context *ctx)
 {
 	const char *typestr = NULL;
-	const struct btf_type *t;
+	const struct btf_type *type;
 	struct btf *btf;
 
 	if (ctx->funcname) {
 		/* Do not use ctx->btf, because it must be used with ctx->param */
-		t = btf_find_func_proto(ctx->funcname, &btf);
-		if (t) {
-			typestr = type_from_btf_id(btf, t->type);
+		type = btf_find_func_proto(ctx->funcname, &btf);
+		if (type) {
+			type = btf_type_skip_modifiers(btf, type->type, NULL);
+			if (!IS_ERR_OR_NULL(type))
+				typestr = fetch_type_from_btf_type(btf, type, ctx);
 			btf_put(btf);
 		}
 	}
@@ -476,6 +612,28 @@ static const struct fetch_type *parse_btf_retval_type(
 	return find_fetch_type(typestr, ctx->flags);
 }
 
+static int parse_btf_bitfield(struct fetch_insn **pcode,
+			      struct traceprobe_parse_context *ctx)
+{
+	struct fetch_insn *code = *pcode;
+
+	if ((ctx->last_bitsize % 8 == 0) && ctx->last_bitoffs == 0)
+		return 0;
+
+	code++;
+	if (code->op != FETCH_OP_NOP) {
+		trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+		return -EINVAL;
+	}
+	*pcode = code;
+
+	code->op = FETCH_OP_MOD_BF;
+	code->lshift = 64 - (ctx->last_bitsize + ctx->last_bitoffs);
+	code->rshift = 64 - ctx->last_bitsize;
+	code->basesize = 64 / 8;
+	return 0;
+}
+
 static bool is_btf_retval_void(const char *funcname)
 {
 	const struct btf_type *t;
@@ -502,14 +660,22 @@ static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
-static int parse_btf_arg(const char *varname, struct fetch_insn *code,
+static int parse_btf_arg(char *varname,
+			 struct fetch_insn **pcode, struct fetch_insn *end,
 			 struct traceprobe_parse_context *ctx)
 {
 	trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
 	return -EOPNOTSUPP;
 }
 
-#define parse_btf_arg_type(idx, ctx)		\
+static int parse_btf_bitfield(struct fetch_insn **pcode,
+			      struct traceprobe_parse_context *ctx)
+{
+	trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+	return -EOPNOTSUPP;
+}
+
+#define parse_btf_arg_type(ctx)		\
 	find_fetch_type(NULL, ctx->flags)
 
 #define parse_btf_retval_type(ctx)		\
@@ -777,6 +943,8 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 
 			code->op = deref;
 			code->offset = offset;
+			/* Reset the last type if used */
+			ctx->last_type = NULL;
 		}
 		break;
 	case '\\':	/* Immediate value */
@@ -800,7 +968,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 				trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
 				return -EINVAL;
 			}
-			ret = parse_btf_arg(arg, code, ctx);
+			ret = parse_btf_arg(arg, pcode, end, ctx);
 			break;
 		}
 	}
@@ -946,6 +1114,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 		goto out;
 	code[FETCH_INSN_MAX - 1].op = FETCH_OP_END;
 
+	ctx->last_type = NULL;
 	ret = parse_probe_arg(arg, parg->type, &code, &code[FETCH_INSN_MAX - 1],
 			      ctx);
 	if (ret)
@@ -953,9 +1122,9 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 
 	/* Update storing type if BTF is available */
 	if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) && !t) {
-		if (code->op == FETCH_OP_ARG)
-			parg->type = parse_btf_arg_type(code->param, ctx);
-		else if (code->op == FETCH_OP_RETVAL)
+		if (ctx->last_type)
+			parg->type = parse_btf_arg_type(ctx);
+		else if (ctx->flags & TPARG_FL_RETURN)
 			parg->type = parse_btf_retval_type(ctx);
 	}
 
@@ -1030,6 +1199,11 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 			trace_probe_log_err(ctx->offset + t - arg, BAD_BITFIELD);
 			goto fail;
 		}
+	} else if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
+		   ctx->last_type) {
+		ret = parse_btf_bitfield(&code, ctx);
+		if (ret)
+			goto fail;
 	}
 	ret = -EINVAL;
 	/* Loop(Array) operation */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 4dc91460a75d..6111f1ffca6c 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -388,6 +388,9 @@ struct traceprobe_parse_context {
 	const struct btf_param *params;	/* Parameter of the function */
 	s32 nr_params;			/* The number of the parameters */
 	struct btf *btf;		/* The BTF to be used */
+	const struct btf_type *last_type;	/* Saved type */
+	u32 last_bitoffs;		/* Saved bitoffs */
+	u32 last_bitsize;		/* Saved bitsize */
 	unsigned int flags;
 	int offset;
 };
@@ -503,7 +506,13 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(BAD_VAR_ARGS,		"$arg* must be an independent parameter without name etc."),\
 	C(NOFENTRY_ARGS,	"$arg* can be used only on function entry"),	\
 	C(DOUBLE_ARGS,		"$arg* can be used only once in the parameters"),	\
-	C(ARGS_2LONG,		"$arg* failed because the argument list is too long"),
+	C(ARGS_2LONG,		"$arg* failed because the argument list is too long"),	\
+	C(ARGIDX_2BIG,		"$argN index is too big"),		\
+	C(NO_PTR_STRCT,		"This is not a pointer to union/structure."),	\
+	C(NOSUP_DAT_ARG,	"Non pointer structure/union argument is not supported."),\
+	C(BAD_HYPHEN,		"Failed to parse single hyphen. Forgot '>'?"),	\
+	C(NO_BTF_FIELD,		"This field is not found."),	\
+	C(BAD_BTF_TID,		"Failed to get BTF type info."),
 
 #undef C
 #define C(a, b)		TP_ERR_##a


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 5/9] tracing/probes: Support BTF field access from $retval
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (3 preceding siblings ...)
  2023-07-31  7:30 ` [PATCH v4 4/9] tracing/probes: Support BTF based data structure field access Masami Hiramatsu (Google)
@ 2023-07-31  7:30 ` Masami Hiramatsu (Google)
  2023-07-31  7:31 ` [PATCH v4 6/9] tracing/probes: Add string type check with BTF Masami Hiramatsu (Google)
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:30 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Support BTF argument on '$retval' for function return events including
kretprobe and fprobe for accessing the return value.
This also allows user to access its fields if the return value is a
pointer of a data structure.

E.g.
 # echo 'f getname_flags%return +0($retval->name):string' \
   > dynamic_events
 # echo 1 > events/fprobes/getname_flags__exit/enable
 # ls > /dev/null
 # head -n 40 trace | tail
              ls-87      [000] ...1.  8067.616101: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./function_profile_enabled"
              ls-87      [000] ...1.  8067.616108: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./trace_stat"
              ls-87      [000] ...1.  8067.616115: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_graph_notrace"
              ls-87      [000] ...1.  8067.616122: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_graph_function"
              ls-87      [000] ...1.  8067.616129: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_notrace"
              ls-87      [000] ...1.  8067.616135: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_filter"
              ls-87      [000] ...1.  8067.616143: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./touched_functions"
              ls-87      [000] ...1.  8067.616237: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./enabled_functions"
              ls-87      [000] ...1.  8067.616245: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./available_filter_functions"
              ls-87      [000] ...1.  8067.616253: getname_flags__exit: (vfs_fstatat+0x3c/0x70 <- getname_flags) arg1="./set_ftrace_notrace_pid"


Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v2:
  - Use '$retval' instead of 'retval' because it is confusing.
 Changes in v3:
  - Introduce query_btf_context() to cache the btf related data (function
    prototype) for using common field analyzing code with function
    parameters.
 Changes in v3.1:
  - Return int error code from query_btf_context() if !CONFIG_PROBE_EVENTS_BTF_ARGS
 Changes in v4:
  - Fix wrong BTF access if query_btf_context() is failed in parse_btf_arg().
  - Return error if $retval accesses a field but BTF is not found.
---
 kernel/trace/trace_probe.c |  187 ++++++++++++++++++++------------------------
 kernel/trace/trace_probe.h |    1 
 2 files changed, 86 insertions(+), 102 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index f6b855de4256..8ce7d8f04849 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -363,38 +363,46 @@ static const char *fetch_type_from_btf_type(struct btf *btf,
 	return NULL;
 }
 
-static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr,
-						   struct btf **btf_p, bool tracepoint)
+static int query_btf_context(struct traceprobe_parse_context *ctx)
 {
 	const struct btf_param *param;
-	const struct btf_type *t;
+	const struct btf_type *type;
 	struct btf *btf;
+	s32 nr;
 
-	if (!funcname || !nr)
-		return ERR_PTR(-EINVAL);
+	if (ctx->btf)
+		return 0;
 
-	t = btf_find_func_proto(funcname, &btf);
-	if (!t)
-		return (const struct btf_param *)t;
+	if (!ctx->funcname)
+		return -EINVAL;
 
-	param = btf_get_func_param(t, nr);
-	if (IS_ERR_OR_NULL(param))
-		goto err;
+	type = btf_find_func_proto(ctx->funcname, &btf);
+	if (!type)
+		return -ENOENT;
 
-	/* Hide the first 'data' argument of tracepoint */
-	if (tracepoint) {
-		(*nr)--;
-		param++;
+	ctx->btf = btf;
+	ctx->proto = type;
+
+	/* ctx->params is optional, since func(void) will not have params. */
+	nr = 0;
+	param = btf_get_func_param(type, &nr);
+	if (!IS_ERR_OR_NULL(param)) {
+		/* Hide the first 'data' argument of tracepoint */
+		if (ctx->flags & TPARG_FL_TPOINT) {
+			nr--;
+			param++;
+		}
 	}
 
-	if (*nr > 0) {
-		*btf_p = btf;
-		return param;
+	if (nr > 0) {
+		ctx->nr_params = nr;
+		ctx->params = param;
+	} else {
+		ctx->nr_params = 0;
+		ctx->params = NULL;
 	}
 
-err:
-	btf_put(btf);
-	return NULL;
+	return 0;
 }
 
 static void clear_btf_context(struct traceprobe_parse_context *ctx)
@@ -402,6 +410,7 @@ static void clear_btf_context(struct traceprobe_parse_context *ctx)
 	if (ctx->btf) {
 		btf_put(ctx->btf);
 		ctx->btf = NULL;
+		ctx->proto = NULL;
 		ctx->params = NULL;
 		ctx->nr_params = 0;
 	}
@@ -517,7 +526,7 @@ static int parse_btf_arg(char *varname,
 	const struct btf_param *params;
 	const struct btf_type *type;
 	char *field = NULL;
-	int i, is_ptr;
+	int i, is_ptr, ret;
 	u32 tid;
 
 	if (WARN_ON_ONCE(!ctx->funcname))
@@ -533,17 +542,37 @@ static int parse_btf_arg(char *varname,
 		return -EOPNOTSUPP;
 	}
 
-	if (!ctx->params) {
-		params = find_btf_func_param(ctx->funcname,
-					     &ctx->nr_params, &ctx->btf,
-					     ctx->flags & TPARG_FL_TPOINT);
-		if (IS_ERR_OR_NULL(params)) {
+	if (ctx->flags & TPARG_FL_RETURN) {
+		if (strcmp(varname, "$retval") != 0) {
+			trace_probe_log_err(ctx->offset, NO_BTFARG);
+			return -ENOENT;
+		}
+		code->op = FETCH_OP_RETVAL;
+		/* Check whether the function return type is not void */
+		if (query_btf_context(ctx) == 0) {
+			if (ctx->proto->type == 0) {
+				trace_probe_log_err(ctx->offset, NO_RETVAL);
+				return -ENOENT;
+			}
+			tid = ctx->proto->type;
+			goto found;
+		}
+		if (field) {
+			trace_probe_log_err(ctx->offset + field - varname,
+					    NO_BTF_ENTRY);
+			return -ENOENT;
+		}
+		return 0;
+	}
+
+	if (!ctx->btf) {
+		ret = query_btf_context(ctx);
+		if (ret < 0 || ctx->nr_params == 0) {
 			trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
 			return PTR_ERR(params);
 		}
-		ctx->params = params;
-	} else
-		params = ctx->params;
+	}
+	params = ctx->params;
 
 	for (i = 0; i < ctx->nr_params; i++) {
 		const char *name = btf_name_by_offset(ctx->btf, params[i].name_off);
@@ -554,7 +583,6 @@ static int parse_btf_arg(char *varname,
 				code->param = i + 1;
 			else
 				code->param = i;
-
 			tid = params[i].type;
 			goto found;
 		}
@@ -579,7 +607,7 @@ static int parse_btf_arg(char *varname,
 	return 0;
 }
 
-static const struct fetch_type *parse_btf_arg_type(
+static const struct fetch_type *find_fetch_type_from_btf_type(
 					struct traceprobe_parse_context *ctx)
 {
 	struct btf *btf = ctx->btf;
@@ -591,27 +619,6 @@ static const struct fetch_type *parse_btf_arg_type(
 	return find_fetch_type(typestr, ctx->flags);
 }
 
-static const struct fetch_type *parse_btf_retval_type(
-					struct traceprobe_parse_context *ctx)
-{
-	const char *typestr = NULL;
-	const struct btf_type *type;
-	struct btf *btf;
-
-	if (ctx->funcname) {
-		/* Do not use ctx->btf, because it must be used with ctx->param */
-		type = btf_find_func_proto(ctx->funcname, &btf);
-		if (type) {
-			type = btf_type_skip_modifiers(btf, type->type, NULL);
-			if (!IS_ERR_OR_NULL(type))
-				typestr = fetch_type_from_btf_type(btf, type, ctx);
-			btf_put(btf);
-		}
-	}
-
-	return find_fetch_type(typestr, ctx->flags);
-}
-
 static int parse_btf_bitfield(struct fetch_insn **pcode,
 			      struct traceprobe_parse_context *ctx)
 {
@@ -634,30 +641,15 @@ static int parse_btf_bitfield(struct fetch_insn **pcode,
 	return 0;
 }
 
-static bool is_btf_retval_void(const char *funcname)
-{
-	const struct btf_type *t;
-	struct btf *btf;
-	bool ret;
-
-	t = btf_find_func_proto(funcname, &btf);
-	if (!t)
-		return false;
-
-	ret = (t->type == 0);
-	btf_put(btf);
-	return ret;
-}
 #else
 static void clear_btf_context(struct traceprobe_parse_context *ctx)
 {
 	ctx->btf = NULL;
 }
 
-static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr,
-						   struct btf **btf_p, bool tracepoint)
+static int query_btf_context(struct traceprobe_parse_context *ctx)
 {
-	return ERR_PTR(-EOPNOTSUPP);
+	return -EOPNOTSUPP;
 }
 
 static int parse_btf_arg(char *varname,
@@ -675,24 +667,23 @@ static int parse_btf_bitfield(struct fetch_insn **pcode,
 	return -EOPNOTSUPP;
 }
 
-#define parse_btf_arg_type(ctx)		\
-	find_fetch_type(NULL, ctx->flags)
-
-#define parse_btf_retval_type(ctx)		\
+#define find_fetch_type_from_btf_type(ctx)		\
 	find_fetch_type(NULL, ctx->flags)
 
-#define is_btf_retval_void(funcname)	(false)
-
 #endif
 
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
 
-static int parse_probe_vars(char *arg, const struct fetch_type *t,
-			    struct fetch_insn *code,
+/* Parse $vars. @orig_arg points '$', which syncs to @ctx->offset */
+static int parse_probe_vars(char *orig_arg, const struct fetch_type *t,
+			    struct fetch_insn **pcode,
+			    struct fetch_insn *end,
 			    struct traceprobe_parse_context *ctx)
 {
-	unsigned long param;
+	struct fetch_insn *code = *pcode;
 	int err = TP_ERR_BAD_VAR;
+	char *arg = orig_arg + 1;
+	unsigned long param;
 	int ret = 0;
 	int len;
 
@@ -711,18 +702,17 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
 		goto inval;
 	}
 
-	if (strcmp(arg, "retval") == 0) {
-		if (ctx->flags & TPARG_FL_RETURN) {
-			if ((ctx->flags & TPARG_FL_KERNEL) &&
-			    is_btf_retval_void(ctx->funcname)) {
-				err = TP_ERR_NO_RETVAL;
-				goto inval;
-			}
+	if (str_has_prefix(arg, "retval")) {
+		if (!(ctx->flags & TPARG_FL_RETURN)) {
+			err = TP_ERR_RETVAL_ON_PROBE;
+			goto inval;
+		}
+		if (!(ctx->flags & TPARG_FL_KERNEL) ||
+		    !IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS)) {
 			code->op = FETCH_OP_RETVAL;
 			return 0;
 		}
-		err = TP_ERR_RETVAL_ON_PROBE;
-		goto inval;
+		return parse_btf_arg(orig_arg, pcode, end, ctx);
 	}
 
 	len = str_has_prefix(arg, "stack");
@@ -824,7 +814,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
 
 	switch (arg[0]) {
 	case '$':
-		ret = parse_probe_vars(arg + 1, type, code, ctx);
+		ret = parse_probe_vars(arg, type, pcode, end, ctx);
 		break;
 
 	case '%':	/* named register */
@@ -1121,12 +1111,9 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 		goto fail;
 
 	/* Update storing type if BTF is available */
-	if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) && !t) {
-		if (ctx->last_type)
-			parg->type = parse_btf_arg_type(ctx);
-		else if (ctx->flags & TPARG_FL_RETURN)
-			parg->type = parse_btf_retval_type(ctx);
-	}
+	if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
+	    !t && ctx->last_type)
+		parg->type = find_fetch_type_from_btf_type(ctx);
 
 	ret = -EINVAL;
 	/* Store operation */
@@ -1415,7 +1402,6 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 	const struct btf_param *params = NULL;
 	int i, j, n, used, ret, args_idx = -1;
 	const char **new_argv = NULL;
-	int nr_params;
 
 	ret = argv_has_var_arg(argc, argv, &args_idx, ctx);
 	if (ret < 0)
@@ -1426,9 +1412,8 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 		return NULL;
 	}
 
-	params = find_btf_func_param(ctx->funcname, &nr_params, &ctx->btf,
-				     ctx->flags & TPARG_FL_TPOINT);
-	if (IS_ERR_OR_NULL(params)) {
+	ret = query_btf_context(ctx);
+	if (ret < 0 || ctx->nr_params == 0) {
 		if (args_idx != -1) {
 			/* $arg* requires BTF info */
 			trace_probe_log_err(0, NOSUP_BTFARG);
@@ -1437,8 +1422,6 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 		*new_argc = argc;
 		return NULL;
 	}
-	ctx->params = params;
-	ctx->nr_params = nr_params;
 
 	if (args_idx >= 0)
 		*new_argc = argc + ctx->nr_params - 1;
@@ -1453,7 +1436,7 @@ const char **traceprobe_expand_meta_args(int argc, const char *argv[],
 	for (i = 0, j = 0; i < argc; i++) {
 		trace_probe_log_set_index(i + 2);
 		if (i == args_idx) {
-			for (n = 0; n < nr_params; n++) {
+			for (n = 0; n < ctx->nr_params; n++) {
 				ret = sprint_nth_btf_arg(n, "", buf + used,
 							 bufsize - used, ctx);
 				if (ret < 0)
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 6111f1ffca6c..9184c84833f8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -385,6 +385,7 @@ struct traceprobe_parse_context {
 	struct trace_event_call *event;
 	/* BTF related parameters */
 	const char *funcname;		/* Function name in BTF */
+	const struct btf_type  *proto;	/* Prototype of the function */
 	const struct btf_param *params;	/* Parameter of the function */
 	s32 nr_params;			/* The number of the parameters */
 	struct btf *btf;		/* The BTF to be used */


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 6/9] tracing/probes: Add string type check with BTF
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (4 preceding siblings ...)
  2023-07-31  7:30 ` [PATCH v4 5/9] tracing/probes: Support BTF field access from $retval Masami Hiramatsu (Google)
@ 2023-07-31  7:31 ` Masami Hiramatsu (Google)
  2023-07-31  7:31 ` [PATCH v4 7/9] tracing/fprobe-event: Assume fprobe is a return event by $retval Masami Hiramatsu (Google)
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:31 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add a string type checking with BTF information if possible.
This will check whether the given BTF argument (and field) is
signed char array or pointer to signed char. If not, it reject
the 'string' type. If it is pointer to signed char, it adds
a dereference opration so that it can correctly fetch the
string data from memory.

 # echo 'f getname_flags%return retval->name:string' >> dynamic_events
 # echo 't sched_switch next->comm:string' >> dynamic_events

The above cases, 'struct filename::name' is 'char *' and
'struct task_struct::comm' is 'char []'. But in both case,
user can specify ':string' to fetch the string data.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 Changes in v3:
  - Use ctx->btf instead of traceprobe_get_btf().
---
 kernel/trace/trace_probe.c |   89 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/trace_probe.h |    3 +
 2 files changed, 89 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8ce7d8f04849..46a604f786c5 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -309,6 +309,77 @@ static u32 btf_type_int(const struct btf_type *t)
 	return *(u32 *)(t + 1);
 }
 
+static bool btf_type_is_char_ptr(struct btf *btf, const struct btf_type *type)
+{
+	const struct btf_type *real_type;
+	u32 intdata;
+	s32 tid;
+
+	real_type = btf_type_skip_modifiers(btf, type->type, &tid);
+	if (!real_type)
+		return false;
+
+	if (BTF_INFO_KIND(real_type->info) != BTF_KIND_INT)
+		return false;
+
+	intdata = btf_type_int(real_type);
+	return !(BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED)
+		&& BTF_INT_BITS(intdata) == 8;
+}
+
+static bool btf_type_is_char_array(struct btf *btf, const struct btf_type *type)
+{
+	const struct btf_type *real_type;
+	const struct btf_array *array;
+	u32 intdata;
+	s32 tid;
+
+	if (BTF_INFO_KIND(type->info) != BTF_KIND_ARRAY)
+		return false;
+
+	array = (const struct btf_array *)(type + 1);
+
+	real_type = btf_type_skip_modifiers(btf, array->type, &tid);
+
+	intdata = btf_type_int(real_type);
+	return !(BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED)
+		&& BTF_INT_BITS(intdata) == 8;
+}
+
+static int check_prepare_btf_string_fetch(char *typename,
+				struct fetch_insn **pcode,
+				struct traceprobe_parse_context *ctx)
+{
+	struct btf *btf = ctx->btf;
+
+	if (!btf || !ctx->last_type)
+		return 0;
+
+	/* char [] does not need any change. */
+	if (btf_type_is_char_array(btf, ctx->last_type))
+		return 0;
+
+	/* char * requires dereference the pointer. */
+	if (btf_type_is_char_ptr(btf, ctx->last_type)) {
+		struct fetch_insn *code = *pcode + 1;
+
+		if (code->op == FETCH_OP_END) {
+			trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
+			return -E2BIG;
+		}
+		if (typename[0] == 'u')
+			code->op = FETCH_OP_UDEREF;
+		else
+			code->op = FETCH_OP_DEREF;
+		code->offset = 0;
+		*pcode = code;
+		return 0;
+	}
+	/* Other types are not available for string */
+	trace_probe_log_err(ctx->offset, BAD_TYPE4STR);
+	return -EINVAL;
+}
+
 static const char *fetch_type_from_btf_type(struct btf *btf,
 					const struct btf_type *type,
 					struct traceprobe_parse_context *ctx)
@@ -670,6 +741,13 @@ static int parse_btf_bitfield(struct fetch_insn **pcode,
 #define find_fetch_type_from_btf_type(ctx)		\
 	find_fetch_type(NULL, ctx->flags)
 
+static int check_prepare_btf_string_fetch(char *typename,
+				struct fetch_insn **pcode,
+				struct traceprobe_parse_context *ctx)
+{
+	return 0;
+}
+
 #endif
 
 #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
@@ -1112,8 +1190,15 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
 
 	/* Update storing type if BTF is available */
 	if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
-	    !t && ctx->last_type)
-		parg->type = find_fetch_type_from_btf_type(ctx);
+	    ctx->last_type) {
+		if (!t) {
+			parg->type = find_fetch_type_from_btf_type(ctx);
+		} else if (strstr(t, "string")) {
+			ret = check_prepare_btf_string_fetch(t, &code, ctx);
+			if (ret)
+				goto fail;
+		}
+	}
 
 	ret = -EINVAL;
 	/* Store operation */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 9184c84833f8..7f929482e8d4 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -513,7 +513,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
 	C(NOSUP_DAT_ARG,	"Non pointer structure/union argument is not supported."),\
 	C(BAD_HYPHEN,		"Failed to parse single hyphen. Forgot '>'?"),	\
 	C(NO_BTF_FIELD,		"This field is not found."),	\
-	C(BAD_BTF_TID,		"Failed to get BTF type info."),
+	C(BAD_BTF_TID,		"Failed to get BTF type info."),\
+	C(BAD_TYPE4STR,		"This type does not fit for string."),
 
 #undef C
 #define C(a, b)		TP_ERR_##a


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 7/9] tracing/fprobe-event: Assume fprobe is a return event by $retval
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (5 preceding siblings ...)
  2023-07-31  7:31 ` [PATCH v4 6/9] tracing/probes: Add string type check with BTF Masami Hiramatsu (Google)
@ 2023-07-31  7:31 ` Masami Hiramatsu (Google)
  2023-07-31  7:31 ` [PATCH v4 8/9] selftests/ftrace: Add BTF fields access testcases Masami Hiramatsu (Google)
  2023-07-31  7:31 ` [PATCH v4 9/9] Documentation: tracing: Update fprobe event example with BTF field Masami Hiramatsu (Google)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:31 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Assume the fprobe event is a return event if there is $retval is
used in the probe's argument without %return. e.g.

echo 'f:myevent vfs_read $retval' >> dynamic_events

then 'myevent' is a return probe event.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 kernel/trace/trace_fprobe.c                        |   58 +++++++++++++++-----
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    2 -
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 8f43f1f65b1b..8bfe23af9c73 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -898,6 +898,46 @@ static struct tracepoint *find_tracepoint(const char *tp_name)
 	return data.tpoint;
 }
 
+static int parse_symbol_and_return(int argc, const char *argv[],
+				   char **symbol, bool *is_return,
+				   bool is_tracepoint)
+{
+	char *tmp = strchr(argv[1], '%');
+	int i;
+
+	if (tmp) {
+		int len = tmp - argv[1];
+
+		if (!is_tracepoint && !strcmp(tmp, "%return")) {
+			*is_return = true;
+		} else {
+			trace_probe_log_err(len, BAD_ADDR_SUFFIX);
+			return -EINVAL;
+		}
+		*symbol = kmemdup_nul(argv[1], len, GFP_KERNEL);
+	} else
+		*symbol = kstrdup(argv[1], GFP_KERNEL);
+	if (!*symbol)
+		return -ENOMEM;
+
+	if (*is_return)
+		return 0;
+
+	/* If there is $retval, this should be a return fprobe. */
+	for (i = 2; i < argc; i++) {
+		tmp = strstr(argv[i], "$retval");
+		if (tmp && !isalnum(tmp[7]) && tmp[7] != '_') {
+			*is_return = true;
+			/*
+			 * NOTE: Don't check is_tracepoint here, because it will
+			 * be checked when the argument is parsed.
+			 */
+			break;
+		}
+	}
+	return 0;
+}
+
 static int __trace_fprobe_create(int argc, const char *argv[])
 {
 	/*
@@ -927,7 +967,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])
 	struct trace_fprobe *tf = NULL;
 	int i, len, new_argc = 0, ret = 0;
 	bool is_return = false;
-	char *symbol = NULL, *tmp = NULL;
+	char *symbol = NULL;
 	const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
 	const char **new_argv = NULL;
 	int maxactive = 0;
@@ -983,20 +1023,10 @@ static int __trace_fprobe_create(int argc, const char *argv[])
 	trace_probe_log_set_index(1);
 
 	/* a symbol(or tracepoint) must be specified */
-	symbol = kstrdup(argv[1], GFP_KERNEL);
-	if (!symbol)
-		return -ENOMEM;
+	ret = parse_symbol_and_return(argc, argv, &symbol, &is_return, is_tracepoint);
+	if (ret < 0)
+		goto parse_error;
 
-	tmp = strchr(symbol, '%');
-	if (tmp) {
-		if (!is_tracepoint && !strcmp(tmp, "%return")) {
-			*tmp = '\0';
-			is_return = true;
-		} else {
-			trace_probe_log_err(tmp - symbol, BAD_ADDR_SUFFIX);
-			goto parse_error;
-		}
-	}
 	if (!is_return && maxactive) {
 		trace_probe_log_set_index(0);
 		trace_probe_log_err(1, BAD_MAXACT_TYPE);
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 812f5b3f6055..72563b2e0812 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -30,11 +30,11 @@ check_error 'f:^ vfs_read'		# NO_EVENT_NAME
 check_error 'f:foo/^12345678901234567890123456789012345678901234567890123456789012345 vfs_read'	# EVENT_TOO_LONG
 check_error 'f:foo/^bar.1 vfs_read'	# BAD_EVENT_NAME
 
-check_error 'f vfs_read ^$retval'	# RETVAL_ON_PROBE
 check_error 'f vfs_read ^$stack10000'	# BAD_STACK_NUM
 
 check_error 'f vfs_read ^$arg10000'	# BAD_ARG_NUM
 
+check_error 'f vfs_read $retval ^$arg1' # BAD_VAR
 check_error 'f vfs_read ^$none_var'	# BAD_VAR
 check_error 'f vfs_read ^'$REG		# BAD_VAR
 


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 8/9] selftests/ftrace: Add BTF fields access testcases
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (6 preceding siblings ...)
  2023-07-31  7:31 ` [PATCH v4 7/9] tracing/fprobe-event: Assume fprobe is a return event by $retval Masami Hiramatsu (Google)
@ 2023-07-31  7:31 ` Masami Hiramatsu (Google)
  2023-07-31  7:31 ` [PATCH v4 9/9] Documentation: tracing: Update fprobe event example with BTF field Masami Hiramatsu (Google)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:31 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Add test cases for accessing the data structure fields using BTF info.
This includes the field access from parameters and retval, and accessing
string information.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
Changes in v2:
 - Use '$retval' instead of 'retval'.
 - Add a test that use both '$retval' and '$arg1' for fprobe.
Changes in v3:
 - Change a test case with a numeric value.
 - Add a test case with mixed '.' and '->' operators.
---
 .../ftrace/test.d/dynevent/add_remove_btfarg.tc    |   14 ++++++++++++++
 .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc |    4 ++++
 2 files changed, 18 insertions(+)

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
index f34b14ef9781..4bfd2f45db42 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
@@ -21,6 +21,9 @@ echo 0 > events/enable
 echo > dynamic_events
 
 TP=kfree
+TP2=kmem_cache_alloc
+TP3=getname_flags
+TP4=sched_wakeup
 
 if [ "$FPROBES" ] ; then
 echo "f:fpevent $TP object" >> dynamic_events
@@ -33,6 +36,7 @@ echo > dynamic_events
 
 echo "f:fpevent $TP "'$arg1' >> dynamic_events
 grep -q "fpevent.*object=object" dynamic_events
+
 echo > dynamic_events
 
 echo "f:fpevent $TP "'$arg*' >> dynamic_events
@@ -45,6 +49,16 @@ fi
 
 echo > dynamic_events
 
+echo "t:tpevent ${TP2} obj_size=s->object_size" >> dynamic_events
+echo "f:fpevent ${TP3}%return path=\$retval->name:string" >> dynamic_events
+echo "t:tpevent2 ${TP4} p->se.group_node.next->prev" >> dynamic_events
+
+grep -q "tpevent .*obj_size=s->object_size" dynamic_events
+grep -q "fpevent.*path=\$retval->name:string" dynamic_events
+grep -q 'tpevent2 .*p->se.group_node.next->prev' dynamic_events
+
+echo > dynamic_events
+
 if [ "$KPROBES" ] ; then
 echo "p:kpevent $TP object" >> dynamic_events
 grep -q "kpevent.*object=object" dynamic_events
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 72563b2e0812..49758f77c923 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -103,6 +103,10 @@ check_error 'f vfs_read%return ^$arg*'		# NOFENTRY_ARGS
 check_error 'f vfs_read ^hoge'			# NO_BTFARG
 check_error 'f kfree ^$arg10'			# NO_BTFARG (exceed the number of parameters)
 check_error 'f kfree%return ^$retval'		# NO_RETVAL
+check_error 'f vfs_read%return $retval->^foo'	# NO_PTR_STRCT
+check_error 'f vfs_read file->^foo'		# NO_BTF_FIELD
+check_error 'f vfs_read file^-.foo'		# BAD_HYPHEN
+check_error 'f vfs_read ^file:string'		# BAD_TYPE4STR
 else
 check_error 'f vfs_read ^$arg*'			# NOSUP_BTFARG
 check_error 't kfree ^$arg*'			# NOSUP_BTFARG


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4 9/9] Documentation: tracing: Update fprobe event example with BTF field
  2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
                   ` (7 preceding siblings ...)
  2023-07-31  7:31 ` [PATCH v4 8/9] selftests/ftrace: Add BTF fields access testcases Masami Hiramatsu (Google)
@ 2023-07-31  7:31 ` Masami Hiramatsu (Google)
  8 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu (Google) @ 2023-07-31  7:31 UTC (permalink / raw)
  To: linux-trace-kernel
  Cc: linux-kernel, Steven Rostedt, mhiramat, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Update fprobe event example with BTF data structure field specification.

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
---
 Changes in v2:
  - Remove 'retval' and use '$retval'.
 Changes in v3:
  - Add description about mixture of '.' and '->' usage.
---
 Documentation/trace/fprobetrace.rst |   64 +++++++++++++++++++++++++----------
 1 file changed, 46 insertions(+), 18 deletions(-)

diff --git a/Documentation/trace/fprobetrace.rst b/Documentation/trace/fprobetrace.rst
index 7297f9478459..8e9bebcf0a2e 100644
--- a/Documentation/trace/fprobetrace.rst
+++ b/Documentation/trace/fprobetrace.rst
@@ -79,9 +79,9 @@ automatically set by the given name. ::
  f:fprobes/myprobe vfs_read count=count pos=pos
 
 It also chooses the fetch type from BTF information. For example, in the above
-example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, both
-are converted to 64bit unsigned long, but only ``pos`` has "%Lx" print-format as
-below ::
+example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus,
+both are converted to 64bit unsigned long, but only ``pos`` has "%Lx"
+print-format as below ::
 
  # cat events/fprobes/myprobe/format
  name: myprobe
@@ -105,9 +105,47 @@ is expanded to all function arguments of the function or the tracepoint. ::
  # cat dynamic_events
  f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
 
-BTF also affects the ``$retval``. If user doesn't set any type, the retval type is
-automatically picked from the BTF. If the function returns ``void``, ``$retval``
-is rejected.
+BTF also affects the ``$retval``. If user doesn't set any type, the retval
+type is automatically picked from the BTF. If the function returns ``void``,
+``$retval`` is rejected.
+
+You can access the data fields of a data structure using allow operator ``->``
+(for pointer type) and dot operator ``.`` (for data structure type.)::
+
+# echo 't sched_switch preempt prev_pid=prev->pid next_pid=next->pid' >> dynamic_events
+
+The field access operators, ``->`` and ``.`` can be combined for accessing deeper
+members and other structure members pointed by the member. e.g. ``foo->bar.baz->qux``
+If there is non-name union member, you can directly access it as the C code does.
+For example::
+
+ struct {
+	union {
+	int a;
+	int b;
+	};
+ } *foo;
+
+To access ``a`` and ``b``, use ``foo->a`` and ``foo->b`` in this case.
+
+This data field access is available for the return value via ``$retval``,
+e.g. ``$retval->name``.
+
+For these BTF arguments and fields, ``:string`` and ``:ustring`` change the
+behavior. If these are used for BTF argument or field, it checks whether
+the BTF type of the argument or the data field is ``char *`` or ``char []``,
+or not.  If not, it rejects applying the string types. Also, with the BTF
+support, you don't need a memory dereference operator (``+0(PTR)``) for
+accessing the string pointed by a ``PTR``. It automatically adds the memory
+dereference operator according to the BTF type. e.g. ::
+
+# echo 't sched_switch prev->comm:string' >> dynamic_events
+# echo 'f getname_flags%return $retval->name:string' >> dynamic_events
+
+The ``prev->comm`` is an embedded char array in the data structure, and
+``$retval->name`` is a char pointer in the data structure. But in both
+cases, you can use ``:string`` type to get the string.
+
 
 Usage examples
 --------------
@@ -161,10 +199,10 @@ parameters. This means you can access any field values in the task
 structure pointed by the ``prev`` and ``next`` arguments.
 
 For example, usually ``task_struct::start_time`` is not traced, but with this
-traceprobe event, you can trace it as below.
+traceprobe event, you can trace that field as below.
 ::
 
-  # echo 't sched_switch comm=+1896(next):string start_time=+1728(next):u64' > dynamic_events
+  # echo 't sched_switch comm=next->comm:string next->start_time' > dynamic_events
   # head -n 20 trace | tail
  #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
  #              | |         |   |||||     |         |
@@ -176,13 +214,3 @@ traceprobe event, you can trace it as below.
            <idle>-0       [000] d..3.  5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
       kworker/0:1-14      [000] d..3.  5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
            <idle>-0       [000] d..3.  5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
-
-Currently, to find the offset of a specific field in the data structure,
-you need to build kernel with debuginfo and run `perf probe` command with
-`-D` option. e.g.
-::
-
- # perf probe -D "__probestub_sched_switch next->comm:string next->start_time"
- p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64
-
-And replace the ``%cx`` with the ``next``.


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-07-31  7:30 ` [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union Masami Hiramatsu (Google)
@ 2023-07-31 21:59   ` Alexei Starovoitov
  2023-07-31 23:57     ` Masami Hiramatsu
  2023-08-01  1:15     ` Steven Rostedt
  0 siblings, 2 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2023-07-31 21:59 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: linux-trace-kernel, LKML, Steven Rostedt, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

On Mon, Jul 31, 2023 at 12:30 AM Masami Hiramatsu (Google)
<mhiramat@kernel.org> wrote:
>
> From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
>
> Add btf_find_struct_member() API to search a member of a given data structure
> or union from the member's name.
>
> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> ---
>  Changes in v3:
>   - Remove simple input check.
>   - Fix unneeded IS_ERR_OR_NULL() check for btf_type_by_id().
>   - Move the code next to btf_get_func_param().
>   - Use for_each_member() macro instead of for-loop.
>   - Use btf_type_skip_modifiers() instead of btf_type_by_id().
>  Changes in v4:
>   - Use a stack for searching in anonymous members instead of nested call.
> ---
>  include/linux/btf.h |    3 +++
>  kernel/bpf/btf.c    |   40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 20e3a07eef8f..4b10d57ceee0 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -226,6 +226,9 @@ const struct btf_type *btf_find_func_proto(const char *func_name,
>                                            struct btf **btf_p);
>  const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
>                                            s32 *nr);
> +const struct btf_member *btf_find_struct_member(struct btf *btf,
> +                                               const struct btf_type *type,
> +                                               const char *member_name);
>
>  #define for_each_member(i, struct_type, member)                        \
>         for (i = 0, member = btf_type_member(struct_type);      \
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index f7b25c615269..8d81a4ffa67b 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -958,6 +958,46 @@ const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s3
>                 return NULL;
>  }
>
> +#define BTF_ANON_STACK_MAX     16
> +
> +/*
> + * Find a member of data structure/union by name and return it.
> + * Return NULL if not found, or -EINVAL if parameter is invalid.
> + */
> +const struct btf_member *btf_find_struct_member(struct btf *btf,
> +                                               const struct btf_type *type,
> +                                               const char *member_name)
> +{
> +       const struct btf_type *anon_stack[BTF_ANON_STACK_MAX];
> +       const struct btf_member *member;
> +       const char *name;
> +       int i, top = 0;
> +
> +retry:
> +       if (!btf_type_is_struct(type))
> +               return ERR_PTR(-EINVAL);
> +
> +       for_each_member(i, type, member) {
> +               if (!member->name_off) {
> +                       /* Anonymous union/struct: push it for later use */
> +                       type = btf_type_skip_modifiers(btf, member->type, NULL);
> +                       if (type && top < BTF_ANON_STACK_MAX)
> +                               anon_stack[top++] = type;
> +               } else {
> +                       name = btf_name_by_offset(btf, member->name_off);
> +                       if (name && !strcmp(member_name, name))
> +                               return member;
> +               }
> +       }
> +       if (top > 0) {
> +               /* Pop from the anonymous stack and retry */
> +               type = anon_stack[--top];
> +               goto retry;
> +       }

Looks good, but I don't see a test case for this.
The logic is a bit tricky. I'd like to have a selftest that covers it.

You probably need to drop Alan's reviewed-by, since the patch is quite
different from the time he reviewed it.

Assuming that is addressed. How do we merge the series?
The first 3 patches have serious conflicts with bpf trees.

Maybe send the first 3 with extra selftest for above recursion
targeting bpf-next then we can have a merge commit that Steven can pull
into tracing?

Or if we can have acks for patches 4-9 we can pull the whole set into bpf-next.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-07-31 21:59   ` Alexei Starovoitov
@ 2023-07-31 23:57     ` Masami Hiramatsu
  2023-08-01  0:29       ` Alexei Starovoitov
  2023-08-01  1:15     ` Steven Rostedt
  1 sibling, 1 reply; 47+ messages in thread
From: Masami Hiramatsu @ 2023-07-31 23:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-trace-kernel, LKML, Steven Rostedt, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

On Mon, 31 Jul 2023 14:59:47 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jul 31, 2023 at 12:30 AM Masami Hiramatsu (Google)
> <mhiramat@kernel.org> wrote:
> >
> > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> >
> > Add btf_find_struct_member() API to search a member of a given data structure
> > or union from the member's name.
> >
> > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > ---
> >  Changes in v3:
> >   - Remove simple input check.
> >   - Fix unneeded IS_ERR_OR_NULL() check for btf_type_by_id().
> >   - Move the code next to btf_get_func_param().
> >   - Use for_each_member() macro instead of for-loop.
> >   - Use btf_type_skip_modifiers() instead of btf_type_by_id().
> >  Changes in v4:
> >   - Use a stack for searching in anonymous members instead of nested call.
> > ---
> >  include/linux/btf.h |    3 +++
> >  kernel/bpf/btf.c    |   40 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 43 insertions(+)
> >
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index 20e3a07eef8f..4b10d57ceee0 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -226,6 +226,9 @@ const struct btf_type *btf_find_func_proto(const char *func_name,
> >                                            struct btf **btf_p);
> >  const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
> >                                            s32 *nr);
> > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > +                                               const struct btf_type *type,
> > +                                               const char *member_name);
> >
> >  #define for_each_member(i, struct_type, member)                        \
> >         for (i = 0, member = btf_type_member(struct_type);      \
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index f7b25c615269..8d81a4ffa67b 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -958,6 +958,46 @@ const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s3
> >                 return NULL;
> >  }
> >
> > +#define BTF_ANON_STACK_MAX     16
> > +
> > +/*
> > + * Find a member of data structure/union by name and return it.
> > + * Return NULL if not found, or -EINVAL if parameter is invalid.
> > + */
> > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > +                                               const struct btf_type *type,
> > +                                               const char *member_name)
> > +{
> > +       const struct btf_type *anon_stack[BTF_ANON_STACK_MAX];
> > +       const struct btf_member *member;
> > +       const char *name;
> > +       int i, top = 0;
> > +
> > +retry:
> > +       if (!btf_type_is_struct(type))
> > +               return ERR_PTR(-EINVAL);
> > +
> > +       for_each_member(i, type, member) {
> > +               if (!member->name_off) {
> > +                       /* Anonymous union/struct: push it for later use */
> > +                       type = btf_type_skip_modifiers(btf, member->type, NULL);
> > +                       if (type && top < BTF_ANON_STACK_MAX)
> > +                               anon_stack[top++] = type;
> > +               } else {
> > +                       name = btf_name_by_offset(btf, member->name_off);
> > +                       if (name && !strcmp(member_name, name))
> > +                               return member;
> > +               }
> > +       }
> > +       if (top > 0) {
> > +               /* Pop from the anonymous stack and retry */
> > +               type = anon_stack[--top];
> > +               goto retry;
> > +       }
> 
> Looks good, but I don't see a test case for this.
> The logic is a bit tricky. I'd like to have a selftest that covers it.

Thanks, and I agree about selftest.

> 
> You probably need to drop Alan's reviewed-by, since the patch is quite
> different from the time he reviewed it.

OK. BTW, I found a problem on this function. I guess the member->offset will
be the offset from the intermediate anonymous union, it is usually 0, but
I need the offset from the given structure. Thus the interface design must
be changed. Passing a 'u32 *offset' and set the correct offset in it. If
it has nested intermediate anonymous unions, that offset must also be pushed.

> 
> Assuming that is addressed. How do we merge the series?
> The first 3 patches have serious conflicts with bpf trees.
> 
> Maybe send the first 3 with extra selftest for above recursion
> targeting bpf-next then we can have a merge commit that Steven can pull
> into tracing?
> 
> Or if we can have acks for patches 4-9 we can pull the whole set into bpf-next.

That's a good question. I don't like splitting the whole series in 2 -next
branches. So I can send this to the bpf-next.
I need to work on another series(*) on fprobes which will not have conflicts with
this series. (*Replacing pt_regs with ftrace_regs on fprobe, which will take longer
time, and need to adjust with eBPF).

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-07-31 23:57     ` Masami Hiramatsu
@ 2023-08-01  0:29       ` Alexei Starovoitov
  2023-08-01 15:02         ` Masami Hiramatsu
  0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-01  0:29 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-trace-kernel, LKML, Steven Rostedt, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

On Mon, Jul 31, 2023 at 4:57 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Mon, 31 Jul 2023 14:59:47 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Mon, Jul 31, 2023 at 12:30 AM Masami Hiramatsu (Google)
> > <mhiramat@kernel.org> wrote:
> > >
> > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > >
> > > Add btf_find_struct_member() API to search a member of a given data structure
> > > or union from the member's name.
> > >
> > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > > ---
> > >  Changes in v3:
> > >   - Remove simple input check.
> > >   - Fix unneeded IS_ERR_OR_NULL() check for btf_type_by_id().
> > >   - Move the code next to btf_get_func_param().
> > >   - Use for_each_member() macro instead of for-loop.
> > >   - Use btf_type_skip_modifiers() instead of btf_type_by_id().
> > >  Changes in v4:
> > >   - Use a stack for searching in anonymous members instead of nested call.
> > > ---
> > >  include/linux/btf.h |    3 +++
> > >  kernel/bpf/btf.c    |   40 ++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 43 insertions(+)
> > >
> > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > index 20e3a07eef8f..4b10d57ceee0 100644
> > > --- a/include/linux/btf.h
> > > +++ b/include/linux/btf.h
> > > @@ -226,6 +226,9 @@ const struct btf_type *btf_find_func_proto(const char *func_name,
> > >                                            struct btf **btf_p);
> > >  const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
> > >                                            s32 *nr);
> > > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > > +                                               const struct btf_type *type,
> > > +                                               const char *member_name);
> > >
> > >  #define for_each_member(i, struct_type, member)                        \
> > >         for (i = 0, member = btf_type_member(struct_type);      \
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index f7b25c615269..8d81a4ffa67b 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -958,6 +958,46 @@ const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s3
> > >                 return NULL;
> > >  }
> > >
> > > +#define BTF_ANON_STACK_MAX     16
> > > +
> > > +/*
> > > + * Find a member of data structure/union by name and return it.
> > > + * Return NULL if not found, or -EINVAL if parameter is invalid.
> > > + */
> > > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > > +                                               const struct btf_type *type,
> > > +                                               const char *member_name)
> > > +{
> > > +       const struct btf_type *anon_stack[BTF_ANON_STACK_MAX];
> > > +       const struct btf_member *member;
> > > +       const char *name;
> > > +       int i, top = 0;
> > > +
> > > +retry:
> > > +       if (!btf_type_is_struct(type))
> > > +               return ERR_PTR(-EINVAL);
> > > +
> > > +       for_each_member(i, type, member) {
> > > +               if (!member->name_off) {
> > > +                       /* Anonymous union/struct: push it for later use */
> > > +                       type = btf_type_skip_modifiers(btf, member->type, NULL);
> > > +                       if (type && top < BTF_ANON_STACK_MAX)
> > > +                               anon_stack[top++] = type;
> > > +               } else {
> > > +                       name = btf_name_by_offset(btf, member->name_off);
> > > +                       if (name && !strcmp(member_name, name))
> > > +                               return member;
> > > +               }
> > > +       }
> > > +       if (top > 0) {
> > > +               /* Pop from the anonymous stack and retry */
> > > +               type = anon_stack[--top];
> > > +               goto retry;
> > > +       }
> >
> > Looks good, but I don't see a test case for this.
> > The logic is a bit tricky. I'd like to have a selftest that covers it.
>
> Thanks, and I agree about selftest.
>
> >
> > You probably need to drop Alan's reviewed-by, since the patch is quite
> > different from the time he reviewed it.
>
> OK. BTW, I found a problem on this function. I guess the member->offset will
> be the offset from the intermediate anonymous union, it is usually 0, but
> I need the offset from the given structure. Thus the interface design must
> be changed. Passing a 'u32 *offset' and set the correct offset in it. If
> it has nested intermediate anonymous unions, that offset must also be pushed.

With all that piling up have you considering reusing btf_struct_walk() ?
It's doing the opposite off -> btf_id while you need name -> btf_id.
But it will give an idea of overall complexity if you want to solve it
for nested arrays and struct/union.

> >
> > Assuming that is addressed. How do we merge the series?
> > The first 3 patches have serious conflicts with bpf trees.
> >
> > Maybe send the first 3 with extra selftest for above recursion
> > targeting bpf-next then we can have a merge commit that Steven can pull
> > into tracing?
> >
> > Or if we can have acks for patches 4-9 we can pull the whole set into bpf-next.
>
> That's a good question. I don't like splitting the whole series in 2 -next
> branches. So I can send this to the bpf-next.

Works for me.

> I need to work on another series(*) on fprobes which will not have conflicts with
> this series. (*Replacing pt_regs with ftrace_regs on fprobe, which will take longer
> time, and need to adjust with eBPF).

ftrace_regs?
Ouch. For bpf we rely on pt_regs being an argument.
fprobe should be 100% compatible replacement of kprobe-at-the-func-start.
If it diverges from that it's a big issue for bpf.
We'd have to remove all of fprobe usage.
I could be missing something, of course.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-07-31 21:59   ` Alexei Starovoitov
  2023-07-31 23:57     ` Masami Hiramatsu
@ 2023-08-01  1:15     ` Steven Rostedt
  2023-08-01  2:24       ` Alexei Starovoitov
  1 sibling, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-01  1:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Linus Torvalds

On Mon, 31 Jul 2023 14:59:47 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> Assuming that is addressed. How do we merge the series?
> The first 3 patches have serious conflicts with bpf trees.
> 
> Maybe send the first 3 with extra selftest for above recursion
> targeting bpf-next then we can have a merge commit that Steven can pull
> into tracing?

Would it be possible to do this by basing it off of one of Linus's tags,
and doing the merge and conflict resolution in your tree before it gets to
Linus?

That way we can pull in that clean branch without having to pull in
anything else from BPF. I believe Linus prefers this over having tracing
having extra changes from BPF that are not yet in his tree. We only need
these particular changes, we shouldn't be pulling in anything specific for
BPF, as I believe that will cause issues on Linus's side.

-- Steve


> 
> Or if we can have acks for patches 4-9 we can pull the whole set into bpf-next.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01  1:15     ` Steven Rostedt
@ 2023-08-01  2:24       ` Alexei Starovoitov
  2023-08-01 13:35         ` Steven Rostedt
  2023-08-01 15:18         ` Masami Hiramatsu
  0 siblings, 2 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-01  2:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Linus Torvalds

On Mon, Jul 31, 2023 at 6:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 31 Jul 2023 14:59:47 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > Assuming that is addressed. How do we merge the series?
> > The first 3 patches have serious conflicts with bpf trees.
> >
> > Maybe send the first 3 with extra selftest for above recursion
> > targeting bpf-next then we can have a merge commit that Steven can pull
> > into tracing?
>
> Would it be possible to do this by basing it off of one of Linus's tags,
> and doing the merge and conflict resolution in your tree before it gets to
> Linus?
>
> That way we can pull in that clean branch without having to pull in
> anything else from BPF. I believe Linus prefers this over having tracing
> having extra changes from BPF that are not yet in his tree. We only need
> these particular changes, we shouldn't be pulling in anything specific for
> BPF, as I believe that will cause issues on Linus's side.

We can try, but I suspect git tricks won't do it.
Masami's changes depend on patches for kernel/bpf/btf.c that
are already in bpf-next, so git would have to follow all commits
that touch this file. I don't think git is smart enough to
thread the needle and split the commit into files. If one commit touches
btf.c and something else that whole commit becomes a dependency
that pulls another commit with all files touched by
the previous commit and so on.
tbh for this set, the easiest for everyone, is to land the whole thing
through bpf-next, since there are no conflicts on fprobe side.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01  2:24       ` Alexei Starovoitov
@ 2023-08-01 13:35         ` Steven Rostedt
  2023-08-01 15:18         ` Masami Hiramatsu
  1 sibling, 0 replies; 47+ messages in thread
From: Steven Rostedt @ 2023-08-01 13:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Linus Torvalds

On Mon, 31 Jul 2023 19:24:25 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jul 31, 2023 at 6:15 PM Steven Rostedt <rostedt@goodmis.org>
> wrote:
> >
> > On Mon, 31 Jul 2023 14:59:47 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >  
> > > Assuming that is addressed. How do we merge the series?
> > > The first 3 patches have serious conflicts with bpf trees.
> > >
> > > Maybe send the first 3 with extra selftest for above recursion
> > > targeting bpf-next then we can have a merge commit that Steven can
> > > pull into tracing?  
> >
> > Would it be possible to do this by basing it off of one of Linus's tags,
> > and doing the merge and conflict resolution in your tree before it gets
> > to Linus?
> >
> > That way we can pull in that clean branch without having to pull in
> > anything else from BPF. I believe Linus prefers this over having tracing
> > having extra changes from BPF that are not yet in his tree. We only need
> > these particular changes, we shouldn't be pulling in anything specific
> > for BPF, as I believe that will cause issues on Linus's side.  
> 
> We can try, but I suspect git tricks won't do it.
> Masami's changes depend on patches for kernel/bpf/btf.c that
> are already in bpf-next, so git would have to follow all commits

You mean other patches that Masami has sent are in the bpf tree already and
these are on top of them?

-- Steve

> that touch this file. I don't think git is smart enough to
> thread the needle and split the commit into files. If one commit touches
> btf.c and something else that whole commit becomes a dependency
> that pulls another commit with all files touched by
> the previous commit and so on.
> tbh for this set, the easiest for everyone, is to land the whole thing
> through bpf-next, since there are no conflicts on fprobe side.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01  0:29       ` Alexei Starovoitov
@ 2023-08-01 15:02         ` Masami Hiramatsu
  2023-08-01 15:20           ` Steven Rostedt
  0 siblings, 1 reply; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-01 15:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: linux-trace-kernel, LKML, Steven Rostedt, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov

On Mon, 31 Jul 2023 17:29:49 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jul 31, 2023 at 4:57 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Mon, 31 Jul 2023 14:59:47 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Mon, Jul 31, 2023 at 12:30 AM Masami Hiramatsu (Google)
> > > <mhiramat@kernel.org> wrote:
> > > >
> > > > From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > >
> > > > Add btf_find_struct_member() API to search a member of a given data structure
> > > > or union from the member's name.
> > > >
> > > > Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> > > > Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
> > > > ---
> > > >  Changes in v3:
> > > >   - Remove simple input check.
> > > >   - Fix unneeded IS_ERR_OR_NULL() check for btf_type_by_id().
> > > >   - Move the code next to btf_get_func_param().
> > > >   - Use for_each_member() macro instead of for-loop.
> > > >   - Use btf_type_skip_modifiers() instead of btf_type_by_id().
> > > >  Changes in v4:
> > > >   - Use a stack for searching in anonymous members instead of nested call.
> > > > ---
> > > >  include/linux/btf.h |    3 +++
> > > >  kernel/bpf/btf.c    |   40 ++++++++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 43 insertions(+)
> > > >
> > > > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > > > index 20e3a07eef8f..4b10d57ceee0 100644
> > > > --- a/include/linux/btf.h
> > > > +++ b/include/linux/btf.h
> > > > @@ -226,6 +226,9 @@ const struct btf_type *btf_find_func_proto(const char *func_name,
> > > >                                            struct btf **btf_p);
> > > >  const struct btf_param *btf_get_func_param(const struct btf_type *func_proto,
> > > >                                            s32 *nr);
> > > > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > > > +                                               const struct btf_type *type,
> > > > +                                               const char *member_name);
> > > >
> > > >  #define for_each_member(i, struct_type, member)                        \
> > > >         for (i = 0, member = btf_type_member(struct_type);      \
> > > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > > index f7b25c615269..8d81a4ffa67b 100644
> > > > --- a/kernel/bpf/btf.c
> > > > +++ b/kernel/bpf/btf.c
> > > > @@ -958,6 +958,46 @@ const struct btf_param *btf_get_func_param(const struct btf_type *func_proto, s3
> > > >                 return NULL;
> > > >  }
> > > >
> > > > +#define BTF_ANON_STACK_MAX     16
> > > > +
> > > > +/*
> > > > + * Find a member of data structure/union by name and return it.
> > > > + * Return NULL if not found, or -EINVAL if parameter is invalid.
> > > > + */
> > > > +const struct btf_member *btf_find_struct_member(struct btf *btf,
> > > > +                                               const struct btf_type *type,
> > > > +                                               const char *member_name)
> > > > +{
> > > > +       const struct btf_type *anon_stack[BTF_ANON_STACK_MAX];
> > > > +       const struct btf_member *member;
> > > > +       const char *name;
> > > > +       int i, top = 0;
> > > > +
> > > > +retry:
> > > > +       if (!btf_type_is_struct(type))
> > > > +               return ERR_PTR(-EINVAL);
> > > > +
> > > > +       for_each_member(i, type, member) {
> > > > +               if (!member->name_off) {
> > > > +                       /* Anonymous union/struct: push it for later use */
> > > > +                       type = btf_type_skip_modifiers(btf, member->type, NULL);
> > > > +                       if (type && top < BTF_ANON_STACK_MAX)
> > > > +                               anon_stack[top++] = type;
> > > > +               } else {
> > > > +                       name = btf_name_by_offset(btf, member->name_off);
> > > > +                       if (name && !strcmp(member_name, name))
> > > > +                               return member;
> > > > +               }
> > > > +       }
> > > > +       if (top > 0) {
> > > > +               /* Pop from the anonymous stack and retry */
> > > > +               type = anon_stack[--top];
> > > > +               goto retry;
> > > > +       }
> > >
> > > Looks good, but I don't see a test case for this.
> > > The logic is a bit tricky. I'd like to have a selftest that covers it.
> >
> > Thanks, and I agree about selftest.
> >
> > >
> > > You probably need to drop Alan's reviewed-by, since the patch is quite
> > > different from the time he reviewed it.
> >
> > OK. BTW, I found a problem on this function. I guess the member->offset will
> > be the offset from the intermediate anonymous union, it is usually 0, but
> > I need the offset from the given structure. Thus the interface design must
> > be changed. Passing a 'u32 *offset' and set the correct offset in it. If
> > it has nested intermediate anonymous unions, that offset must also be pushed.
> 
> With all that piling up have you considering reusing btf_struct_walk() ?
> It's doing the opposite off -> btf_id while you need name -> btf_id.
> But it will give an idea of overall complexity if you want to solve it
> for nested arrays and struct/union.

No, it seems a bit different. (and it may not return the name correctly for
anonymous struct/union) Of course it seems an interesting work. What I found
is returning btf_member is not enough because btf_member in the nested union
will have the offset from the nested structure. I have to accumulate the
offset. It is easy to fix (just stacking (tid,offset) instead of type*) :)

> 
> > >
> > > Assuming that is addressed. How do we merge the series?
> > > The first 3 patches have serious conflicts with bpf trees.
> > >
> > > Maybe send the first 3 with extra selftest for above recursion
> > > targeting bpf-next then we can have a merge commit that Steven can pull
> > > into tracing?
> > >
> > > Or if we can have acks for patches 4-9 we can pull the whole set into bpf-next.
> >
> > That's a good question. I don't like splitting the whole series in 2 -next
> > branches. So I can send this to the bpf-next.
> 
> Works for me.

Or, yet another option is keeping new btf APIs in trace/trace_probe.c in this
series, and move all of them to btf.c in the next series.
This will not make any merge problem between trees, but just needs 2 series
on different releases. (since unless the first one is merged, we cannot send
the second one)

> 
> > I need to work on another series(*) on fprobes which will not have conflicts with
> > this series. (*Replacing pt_regs with ftrace_regs on fprobe, which will take longer
> > time, and need to adjust with eBPF).
> 
> ftrace_regs?
> Ouch. For bpf we rely on pt_regs being an argument.

Yeah, that's a problem.

> fprobe should be 100% compatible replacement of kprobe-at-the-func-start.

No, fprobe is not such feature. It must provide more generic interface because
it is a probe version of ftrace, not kprobe.

> If it diverges from that it's a big issue for bpf.
> We'd have to remove all of fprobe usage.
> I could be missing something, of course.

Yes, so that's the discussion point. At first, I will disable fprobe on BPF
if ftrace_regs is not compatible with pt_regs, but eventually it should be
handled to support arm64. I believe BPF can do it since ftrace can do.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01  2:24       ` Alexei Starovoitov
  2023-08-01 13:35         ` Steven Rostedt
@ 2023-08-01 15:18         ` Masami Hiramatsu
  2023-08-01 22:21           ` Alexei Starovoitov
  1 sibling, 1 reply; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-01 15:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu (Google), linux-trace-kernel,
	LKML, Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Linus Torvalds

On Mon, 31 Jul 2023 19:24:25 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Mon, Jul 31, 2023 at 6:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Mon, 31 Jul 2023 14:59:47 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > Assuming that is addressed. How do we merge the series?
> > > The first 3 patches have serious conflicts with bpf trees.
> > >
> > > Maybe send the first 3 with extra selftest for above recursion
> > > targeting bpf-next then we can have a merge commit that Steven can pull
> > > into tracing?
> >
> > Would it be possible to do this by basing it off of one of Linus's tags,
> > and doing the merge and conflict resolution in your tree before it gets to
> > Linus?
> >
> > That way we can pull in that clean branch without having to pull in
> > anything else from BPF. I believe Linus prefers this over having tracing
> > having extra changes from BPF that are not yet in his tree. We only need
> > these particular changes, we shouldn't be pulling in anything specific for
> > BPF, as I believe that will cause issues on Linus's side.
> 
> We can try, but I suspect git tricks won't do it.
> Masami's changes depend on patches for kernel/bpf/btf.c that
> are already in bpf-next, so git would have to follow all commits
> that touch this file. 

This point is strange. I'm working on probe/fixes which is based on
v6.5-rc3, so any bpf-next change should not be involved. Can you recheck
this point?

> I don't think git is smart enough to
> thread the needle and split the commit into files. If one commit touches
> btf.c and something else that whole commit becomes a dependency
> that pulls another commit with all files touched by
> the previous commit and so on.

As far as I understand Steve's method, we will have an intermediate branch
on bpf or probe tree, like 

linus(some common commit) ---- probes/btf-find-api

and merge it to both bpf-next and probes/for-next branch

          +----------------------bpf-next --- (merge bpf patches)
         /                       / merge
common -/\ probes/btf-find-api -/-\
          \                        \ merge
           +----------------------probes/for-next --- (merge probe patches)

Thus, we can merge both for-next branches at next merge window without
any issue. (But, yes, this is not simple, and needs maxium care)

Thank you,

> tbh for this set, the easiest for everyone, is to land the whole thing
> through bpf-next, since there are no conflicts on fprobe side.

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 15:02         ` Masami Hiramatsu
@ 2023-08-01 15:20           ` Steven Rostedt
  2023-08-01 15:32             ` Steven Rostedt
  0 siblings, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-01 15:20 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, linux-trace-kernel, LKML, Martin KaFai Lau,
	bpf, Sven Schnelle, Alexei Starovoitov

On Wed, 2 Aug 2023 00:02:28 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > If it diverges from that it's a big issue for bpf.
> > We'd have to remove all of fprobe usage.
> > I could be missing something, of course.  
> 
> Yes, so that's the discussion point. At first, I will disable fprobe on BPF
> if ftrace_regs is not compatible with pt_regs, but eventually it should be
> handled to support arm64. I believe BPF can do it since ftrace can do.

Note, for FYI let me give you a little history of where ftrace_regs came
from. When I realized that all function tracing had to save all the
registers that represent the arguments of a function as well as the stack
pointer, I wanted to change the non FTRACE_WITH_REGS to be able to have
access to those registers. This is where FTRACE_WITH_ARGS came from.

My first attempt was to pass a pt_regs that was partially filled, with only
the registers required for the arguments. But the x86 maintainers NACK'd
that. They refused to allow a partially filled pt_regs as that could cause
bugs in the future when a user may assume that the pt_regs is filled but is
not.

The solution was to come up with ftrace_regs, which just means it has all
the registers to extract the arguments of a function and nothing more. Most
implementations just have a partially filled pt_regs within it, but an API
needs to be used to get to the argument values.

When you say BPF uses pt_regs, is the pt_regs full or does it get passed a
partially filled structure?

For fast function entry, ftrace_regs is what should be used if the pt_regs
is not filled. As it is only for use for function entry. It supplies all
regs and stack pointer to get to all the arguments.

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 15:20           ` Steven Rostedt
@ 2023-08-01 15:32             ` Steven Rostedt
  2023-08-01 22:18               ` Alexei Starovoitov
  0 siblings, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-01 15:32 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, linux-trace-kernel, LKML, Martin KaFai Lau,
	bpf, Sven Schnelle, Alexei Starovoitov

On Tue, 1 Aug 2023 11:20:36 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> The solution was to come up with ftrace_regs, which just means it has all
> the registers to extract the arguments of a function and nothing more. Most

This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
will do:

	void callback(..., struct ftrace_regs *fregs) {
		struct pt_regs *regs = ftrace_get_regs(fregs);


Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
If it is not, then it returns NULL. This was what the x86 maintainers
agreed with.

-- Steve


> implementations just have a partially filled pt_regs within it, but an API
> needs to be used to get to the argument values.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 15:32             ` Steven Rostedt
@ 2023-08-01 22:18               ` Alexei Starovoitov
  2023-08-01 23:09                 ` Steven Rostedt
  0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-01 22:18 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire

On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 1 Aug 2023 11:20:36 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > The solution was to come up with ftrace_regs, which just means it has all
> > the registers to extract the arguments of a function and nothing more. Most
>
> This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> will do:
>
>         void callback(..., struct ftrace_regs *fregs) {
>                 struct pt_regs *regs = ftrace_get_regs(fregs);
>
>
> Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> If it is not, then it returns NULL. This was what the x86 maintainers
> agreed with.

arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL

Ouch. That's very bad.
We care a lot about bpf running well on arm64.

If you guys decide to convert fprobe to ftrace_regs please
make it depend on kconfig or something.
bpf side needs full pt_regs.
It's not about access to args.
pt_regs is passed from bpf prog further into all kinds of perf event
functions including stack walking.
I think ORC unwinder might depend on availability of all registers.
Other perf helpers might need it too. Like perf_event_output.
bpf progs need to access arguments, no doubt about that.
If ftrace_regs have them exactly in the same offsets as in pt_regs
that might work transparently for bpf progs, but, I'm afraid,
it's not the case on all archs.
So we need full pt_regs to make sure all paths are still working.

Adding Jiri and others.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 15:18         ` Masami Hiramatsu
@ 2023-08-01 22:21           ` Alexei Starovoitov
  2023-08-01 23:17             ` Masami Hiramatsu
  0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-01 22:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Linus Torvalds

On Tue, Aug 1, 2023 at 8:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Mon, 31 Jul 2023 19:24:25 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Mon, Jul 31, 2023 at 6:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Mon, 31 Jul 2023 14:59:47 -0700
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > Assuming that is addressed. How do we merge the series?
> > > > The first 3 patches have serious conflicts with bpf trees.
> > > >
> > > > Maybe send the first 3 with extra selftest for above recursion
> > > > targeting bpf-next then we can have a merge commit that Steven can pull
> > > > into tracing?
> > >
> > > Would it be possible to do this by basing it off of one of Linus's tags,
> > > and doing the merge and conflict resolution in your tree before it gets to
> > > Linus?
> > >
> > > That way we can pull in that clean branch without having to pull in
> > > anything else from BPF. I believe Linus prefers this over having tracing
> > > having extra changes from BPF that are not yet in his tree. We only need
> > > these particular changes, we shouldn't be pulling in anything specific for
> > > BPF, as I believe that will cause issues on Linus's side.
> >
> > We can try, but I suspect git tricks won't do it.
> > Masami's changes depend on patches for kernel/bpf/btf.c that
> > are already in bpf-next, so git would have to follow all commits
> > that touch this file.
>
> This point is strange. I'm working on probe/fixes which is based on
> v6.5-rc3, so any bpf-next change should not be involved. Can you recheck
> this point?
>
> > I don't think git is smart enough to
> > thread the needle and split the commit into files. If one commit touches
> > btf.c and something else that whole commit becomes a dependency
> > that pulls another commit with all files touched by
> > the previous commit and so on.
>
> As far as I understand Steve's method, we will have an intermediate branch
> on bpf or probe tree, like
>
> linus(some common commit) ---- probes/btf-find-api
>
> and merge it to both bpf-next and probes/for-next branch
>
>           +----------------------bpf-next --- (merge bpf patches)
>          /                       / merge
> common -/\ probes/btf-find-api -/-\
>           \                        \ merge
>            +----------------------probes/for-next --- (merge probe patches)
>
> Thus, we can merge both for-next branches at next merge window without
> any issue. (But, yes, this is not simple, and needs maxium care)

Sounds like the path of least resistance is to keep the changes
in kernel/trace and consolidate with kernel/bpf/btf.c after the next
merge window.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 22:18               ` Alexei Starovoitov
@ 2023-08-01 23:09                 ` Steven Rostedt
  2023-08-01 23:44                   ` Alexei Starovoitov
                                     ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Steven Rostedt @ 2023-08-01 23:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, 1 Aug 2023 15:18:56 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 1 Aug 2023 11:20:36 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >  
> > > The solution was to come up with ftrace_regs, which just means it has all
> > > the registers to extract the arguments of a function and nothing more. Most  
> >
> > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > will do:
> >
> >         void callback(..., struct ftrace_regs *fregs) {
> >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> >
> >
> > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > If it is not, then it returns NULL. This was what the x86 maintainers
> > agreed with.  
> 
> arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> 
> Ouch. That's very bad.
> We care a lot about bpf running well on arm64.

[ Adding Mark and Florent ]

That's because arm64 doesn't support FTRACE_WITH_REGS anymore. Their
function handlers only care about the arguments. If you want full regs at
function entry, then you need to take a breakpoint hit for a full kprobe.

In fact, fprobes isn't even supported on arm64 because it it doesn't have
DYNAMIC_FTRACE_WITH_REGS. I believe that was the reason Masami was trying
to get it to work with ftrace_regs. To get it to work on arm64.

Again, ftrace_get_regs(fregs) is only suppose to return something if the
pt_regs is fully supplied. If they are not, then it must not be used. Are
you not using a fully filled pt_regs? Because that's what both Thomas and
Peter (also added) told me not to do!

Otherwise, ftrace_regs() has support on arm64 for getting to the argument
registers and the stack. Even live kernel patching now uses ftrace_regs().

> 
> If you guys decide to convert fprobe to ftrace_regs please
> make it depend on kconfig or something.
> bpf side needs full pt_regs.

Then use kprobes. When I asked Masami what the difference between fprobes
and kprobes was, he told me that it would be that it would no longer rely
on the slower FTRACE_WITH_REGS. But currently, it still does.

The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
the first place, was because of the overhead you reported to me with
ftrace_regs_caller and why you wanted to go the direct trampoline approach.
That's when I realized I could use a subset because those registers were
already being saved. The only reason FTRACE_WITH_REGS was created was it
had to supply full pt_regs (including flags) and emulate a breakpoint for
the kprobes interface. But in reality, nothing really needs all that.

> It's not about access to args.
> pt_regs is passed from bpf prog further into all kinds of perf event
> functions including stack walking.

ftrace_regs gives you the stack pointer. Basically, it gives you access to
anything that is required to be saved to do a function call from fentry.

> I think ORC unwinder might depend on availability of all registers.
> Other perf helpers might need it too. Like perf_event_output.
> bpf progs need to access arguments, no doubt about that.
> If ftrace_regs have them exactly in the same offsets as in pt_regs
> that might work transparently for bpf progs, but, I'm afraid,
> it's not the case on all archs.
> So we need full pt_regs to make sure all paths are still working.
> 
> Adding Jiri and others.

Then I recommend that you give up using fprobes and just stick with kprobes
as that's guaranteed to give you full pt_regs (at the overhead of doing
things like filing in flags and such). And currently for arm64, fprobes can
only work with ftrace_regs, without the full pt_regs.

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 22:21           ` Alexei Starovoitov
@ 2023-08-01 23:17             ` Masami Hiramatsu
  0 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-01 23:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Linus Torvalds

On Tue, 1 Aug 2023 15:21:59 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Aug 1, 2023 at 8:18 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Mon, 31 Jul 2023 19:24:25 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Mon, Jul 31, 2023 at 6:15 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > On Mon, 31 Jul 2023 14:59:47 -0700
> > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > > Assuming that is addressed. How do we merge the series?
> > > > > The first 3 patches have serious conflicts with bpf trees.
> > > > >
> > > > > Maybe send the first 3 with extra selftest for above recursion
> > > > > targeting bpf-next then we can have a merge commit that Steven can pull
> > > > > into tracing?
> > > >
> > > > Would it be possible to do this by basing it off of one of Linus's tags,
> > > > and doing the merge and conflict resolution in your tree before it gets to
> > > > Linus?
> > > >
> > > > That way we can pull in that clean branch without having to pull in
> > > > anything else from BPF. I believe Linus prefers this over having tracing
> > > > having extra changes from BPF that are not yet in his tree. We only need
> > > > these particular changes, we shouldn't be pulling in anything specific for
> > > > BPF, as I believe that will cause issues on Linus's side.
> > >
> > > We can try, but I suspect git tricks won't do it.
> > > Masami's changes depend on patches for kernel/bpf/btf.c that
> > > are already in bpf-next, so git would have to follow all commits
> > > that touch this file.
> >
> > This point is strange. I'm working on probe/fixes which is based on
> > v6.5-rc3, so any bpf-next change should not be involved. Can you recheck
> > this point?
> >
> > > I don't think git is smart enough to
> > > thread the needle and split the commit into files. If one commit touches
> > > btf.c and something else that whole commit becomes a dependency
> > > that pulls another commit with all files touched by
> > > the previous commit and so on.
> >
> > As far as I understand Steve's method, we will have an intermediate branch
> > on bpf or probe tree, like
> >
> > linus(some common commit) ---- probes/btf-find-api
> >
> > and merge it to both bpf-next and probes/for-next branch
> >
> >           +----------------------bpf-next --- (merge bpf patches)
> >          /                       / merge
> > common -/\ probes/btf-find-api -/-\
> >           \                        \ merge
> >            +----------------------probes/for-next --- (merge probe patches)
> >
> > Thus, we can merge both for-next branches at next merge window without
> > any issue. (But, yes, this is not simple, and needs maxium care)
> 
> Sounds like the path of least resistance is to keep the changes
> in kernel/trace and consolidate with kernel/bpf/btf.c after the next
> merge window.

OK, sounds good to me. I will only expose the bpf_find_btf_id() then.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 23:09                 ` Steven Rostedt
@ 2023-08-01 23:44                   ` Alexei Starovoitov
  2023-08-02  0:21                   ` Masami Hiramatsu
  2023-08-02 14:44                   ` Florent Revest
  2 siblings, 0 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-01 23:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, Aug 1, 2023 at 4:09 PM Steven Rostedt <rostedt@goodmis.org> wrote>
> Then I recommend that you give up using fprobes and just stick with kprobes
> as that's guaranteed to give you full pt_regs (at the overhead of doing
> things like filing in flags and such). And currently for arm64, fprobes can
> only work with ftrace_regs, without the full pt_regs.

bpf doesn't attach to fprobes directly. That was never requested.
But Jiri's work to support multi attach
https://lore.kernel.org/bpf/20220316122419.933957-1-jolsa@kernel.org/
was a joint effort with Masami that relied on fprobe multi attach api.
register_fprobe_ips() in particular, because the promise you guys
give us that callback will get pt_regs as
described in Documentation/trace/fprobe.rst.
From bpf side we don't care that such pt_regs is 100% filled in or
only partial as long as this pt_regs pointer is valid for perf_event_output
and stack walking that consume pt_regs.
I believe that was and still is the case for both x86 and arm64.

The way I understood Masami's intent is to change that promise and
fprobe callback will receive ftrace_regs that is incompatible with
pt_regs and that's obviously bad.
What you're suggesting "give up on using fprobe" is not up to us.
We're not using them. We care about register_fprobe_ips() and what
callback receives. Whatever internal changes to fprobe you're doing
are ok as long as the callback receives valid pt_regs (even partially filled).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 23:09                 ` Steven Rostedt
  2023-08-01 23:44                   ` Alexei Starovoitov
@ 2023-08-02  0:21                   ` Masami Hiramatsu
  2023-08-02  0:40                     ` Steven Rostedt
  2023-08-02 14:44                   ` Florent Revest
  2 siblings, 1 reply; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-02  0:21 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Masami Hiramatsu (Google), linux-trace-kernel,
	LKML, Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, 1 Aug 2023 19:09:20 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 1 Aug 2023 15:18:56 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Tue, 1 Aug 2023 11:20:36 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >  
> > > > The solution was to come up with ftrace_regs, which just means it has all
> > > > the registers to extract the arguments of a function and nothing more. Most  
> > >
> > > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > > will do:
> > >
> > >         void callback(..., struct ftrace_regs *fregs) {
> > >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> > >
> > >
> > > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > > If it is not, then it returns NULL. This was what the x86 maintainers
> > > agreed with.  
> > 
> > arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> > 
> > Ouch. That's very bad.
> > We care a lot about bpf running well on arm64.
> 
> [ Adding Mark and Florent ]
> 
> That's because arm64 doesn't support FTRACE_WITH_REGS anymore. Their
> function handlers only care about the arguments. If you want full regs at
> function entry, then you need to take a breakpoint hit for a full kprobe.
> 
> In fact, fprobes isn't even supported on arm64 because it it doesn't have
> DYNAMIC_FTRACE_WITH_REGS. I believe that was the reason Masami was trying
> to get it to work with ftrace_regs. To get it to work on arm64.

That's right. And I think (agree) pt_regs is too heavy for function entry/exit
because most users needs to access the function arguments or return value.
kprobes is a bit different because it is for instruction level inspection
tool.

> 
> Again, ftrace_get_regs(fregs) is only suppose to return something if the
> pt_regs is fully supplied. If they are not, then it must not be used. Are
> you not using a fully filled pt_regs? Because that's what both Thomas and
> Peter (also added) told me not to do!

I guess that the user-land BPF tools (compliers etc.) only generates
bytecode to access registers in pt_regs for kernel probes currently.
This is why you are using "kprobes" as a naming. But I think you can be
more flexible to generate the code to access registers in ftrace_regs.
(because it's just a difference in the offset value)

> 
> Otherwise, ftrace_regs() has support on arm64 for getting to the argument
> registers and the stack. Even live kernel patching now uses ftrace_regs().
> 
> > 
> > If you guys decide to convert fprobe to ftrace_regs please
> > make it depend on kconfig or something.
> > bpf side needs full pt_regs.
> 
> Then use kprobes. When I asked Masami what the difference between fprobes
> and kprobes was, he told me that it would be that it would no longer rely
> on the slower FTRACE_WITH_REGS. But currently, it still does.

kprobes needs to keep using pt_regs because software-breakpoint exception
handler gets that. And fprobe is used for bpf multi-kprobe interface,
but I think it can be optional.

So until user-land tool supports the ftrace_regs, you can just disable
using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n

Then you can safely use 

struct pt_regs *regs = ftrace_get_regs(fregs);

I think we can just replace the CONFIG_FPROBE ifdefs with
CONFIG_DYNAMIC_FTRACE_WITH_REGS in kernel/trace/bpf_trace.c
And that will be the first version of using ftrace_regs in fprobe.

> 
> The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> the first place, was because of the overhead you reported to me with
> ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> That's when I realized I could use a subset because those registers were
> already being saved. The only reason FTRACE_WITH_REGS was created was it
> had to supply full pt_regs (including flags) and emulate a breakpoint for
> the kprobes interface. But in reality, nothing really needs all that.
> 
> > It's not about access to args.
> > pt_regs is passed from bpf prog further into all kinds of perf event
> > functions including stack walking.
> 
> ftrace_regs gives you the stack pointer. Basically, it gives you access to
> anything that is required to be saved to do a function call from fentry.

Yeah, for stack walking, we usually need stack pointer and instruction pointer
or frame pointer. But Alexei made a good point, linux/stacktrace.h provides
pt_regs interaface because pt_regs is a generic (arch-independent) data
structure. (see arch_stack_walk()) We need a new interface for it.

> 
> > I think ORC unwinder might depend on availability of all registers.

This is not correct. ORC uses limited registers (r10, r13, bp, sp, di, dx)
on x86. Anyway, since ftrace can make a stacktrace, it should be possible
to use ORC with ftrace_regs.

> > Other perf helpers might need it too. Like perf_event_output.
> > bpf progs need to access arguments, no doubt about that.
> > If ftrace_regs have them exactly in the same offsets as in pt_regs
> > that might work transparently for bpf progs, but, I'm afraid,
> > it's not the case on all archs.
> > So we need full pt_regs to make sure all paths are still working.
> > 
> > Adding Jiri and others.
> 
> Then I recommend that you give up using fprobes and just stick with kprobes
> as that's guaranteed to give you full pt_regs (at the overhead of doing
> things like filing in flags and such). And currently for arm64, fprobes can
> only work with ftrace_regs, without the full pt_regs.

I think we can continue to limit usage of fprobe(kprobe_multi) with
CONFIG_DYNAMIC_FTRACE_WITH_REGS, which can be configured on x86. That will
not change anything from the BPF point of view.

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  0:21                   ` Masami Hiramatsu
@ 2023-08-02  0:40                     ` Steven Rostedt
  2023-08-02  0:44                       ` Steven Rostedt
  2023-08-02 13:56                       ` Masami Hiramatsu
  0 siblings, 2 replies; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02  0:40 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, linux-trace-kernel, LKML, Martin KaFai Lau,
	bpf, Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, 2 Aug 2023 09:21:46 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> > Then use kprobes. When I asked Masami what the difference between fprobes
> > and kprobes was, he told me that it would be that it would no longer rely
> > on the slower FTRACE_WITH_REGS. But currently, it still does.  
> 
> kprobes needs to keep using pt_regs because software-breakpoint exception
> handler gets that. And fprobe is used for bpf multi-kprobe interface,
> but I think it can be optional.
> 
> So until user-land tool supports the ftrace_regs, you can just disable
> using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n

I'm confused. I asked about the difference between kprobes on ftrace
and fprobes, and you said it was to get rid of the requirement of
FTRACE_WITH_REGS.

 https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/

> 
> Then you can safely use 
> 
> struct pt_regs *regs = ftrace_get_regs(fregs);
> 
> I think we can just replace the CONFIG_FPROBE ifdefs with
> CONFIG_DYNAMIC_FTRACE_WITH_REGS in kernel/trace/bpf_trace.c
> And that will be the first version of using ftrace_regs in fprobe.

But it is still slow. The FTRACE_WITH_REGS gives us the full pt_regs
and saves all registers including flags, which is a very slow operation
(and noticeable in profilers).

And this still doesn't work on arm64.

Maybe we can add a ftrace_partial_regs(fregs) that returns a
partially filled pt_regs, and the caller that uses this obviously knows
its partial (as it's in the name). But this doesn't quite help out arm64
because unlike x86, struct ftrace_regs does not contain an address
compatibility with pt_regs fields. It would need to do a copy.

 ftrace_partial_regs(fregs, &regs) ?

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  0:40                     ` Steven Rostedt
@ 2023-08-02  0:44                       ` Steven Rostedt
  2023-08-02  2:22                         ` Alexei Starovoitov
  2023-08-02 13:56                       ` Masami Hiramatsu
  1 sibling, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02  0:44 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Alexei Starovoitov, linux-trace-kernel, LKML, Martin KaFai Lau,
	bpf, Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Tue, 1 Aug 2023 20:40:54 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Maybe we can add a ftrace_partial_regs(fregs) that returns a
> partially filled pt_regs, and the caller that uses this obviously knows
> its partial (as it's in the name). But this doesn't quite help out arm64
> because unlike x86, struct ftrace_regs does not contain an address
> compatibility with pt_regs fields. It would need to do a copy.
> 
>  ftrace_partial_regs(fregs, &regs) ?

Well, both would be pointers so you wouldn't need the "&", but it was
to stress that it would be copying one to the other.

  void ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs regs);

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  0:44                       ` Steven Rostedt
@ 2023-08-02  2:22                         ` Alexei Starovoitov
  2023-08-02  2:32                           ` Steven Rostedt
  2023-08-02 14:07                           ` Masami Hiramatsu
  0 siblings, 2 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-02  2:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, Aug 1, 2023 at 5:44 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 1 Aug 2023 20:40:54 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > Maybe we can add a ftrace_partial_regs(fregs) that returns a
> > partially filled pt_regs, and the caller that uses this obviously knows
> > its partial (as it's in the name). But this doesn't quite help out arm64
> > because unlike x86, struct ftrace_regs does not contain an address
> > compatibility with pt_regs fields. It would need to do a copy.
> >
> >  ftrace_partial_regs(fregs, &regs) ?
>
> Well, both would be pointers so you wouldn't need the "&", but it was
> to stress that it would be copying one to the other.
>
>   void ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs regs);

Copy works, but why did you pick a different layout?
Why not to use pt_regs ? if save of flags is slow, just skip that part
and whatever else that is slow. You don't even need to zero out
unsaved fields. Just ask the caller to zero out pt_regs before hand.
Most users have per-cpu pt_regs that is being reused.
So there will be one zero-out in the beginning and every partial
save of regs will be fast.
Then there won't be any need for copy-converter from ftrace_regs to pt_regs.
Maybe too much churn at this point. copy is fine.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  2:22                         ` Alexei Starovoitov
@ 2023-08-02  2:32                           ` Steven Rostedt
  2023-08-02 14:07                           ` Masami Hiramatsu
  1 sibling, 0 replies; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02  2:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu (Google), linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, 1 Aug 2023 19:22:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> >   void ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs regs);  
> 
> Copy works, but why did you pick a different layout?

I didn't. That code was written by the arm64 maintainers.

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  0:40                     ` Steven Rostedt
  2023-08-02  0:44                       ` Steven Rostedt
@ 2023-08-02 13:56                       ` Masami Hiramatsu
  2023-08-02 14:48                         ` Florent Revest
                                           ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-02 13:56 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, linux-trace-kernel, LKML, Martin KaFai Lau,
	bpf, Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Tue, 1 Aug 2023 20:40:54 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 2 Aug 2023 09:21:46 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > and kprobes was, he told me that it would be that it would no longer rely
> > > on the slower FTRACE_WITH_REGS. But currently, it still does.  
> > 
> > kprobes needs to keep using pt_regs because software-breakpoint exception
> > handler gets that. And fprobe is used for bpf multi-kprobe interface,
> > but I think it can be optional.
> > 
> > So until user-land tool supports the ftrace_regs, you can just disable
> > using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n
> 
> I'm confused. I asked about the difference between kprobes on ftrace
> and fprobes, and you said it was to get rid of the requirement of
> FTRACE_WITH_REGS.
> 
>  https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/

Yes, it is for enabling fprobe (and fprobe-event) on more architectures.
I don't think it's possible to change everything at once. So, it will be
changed step by step. At the first step, I will replace pt_regs with
ftrace_regs, and make bpf_trace.c and fprobe_event depends on
FTRACE_WITH_REGS.

At this point, we can split the problem into two, how to move bpf on
ftrace_regs and how to move fprobe-event on ftrace_regs. fprobe-event
change is not hard because it is closing in the kernel and I can do it.
But for BPF, I need to ask BPF user-land tools to support ftrace_regs.

> 
> > 
> > Then you can safely use 
> > 
> > struct pt_regs *regs = ftrace_get_regs(fregs);
> > 
> > I think we can just replace the CONFIG_FPROBE ifdefs with
> > CONFIG_DYNAMIC_FTRACE_WITH_REGS in kernel/trace/bpf_trace.c
> > And that will be the first version of using ftrace_regs in fprobe.
> 
> But it is still slow. The FTRACE_WITH_REGS gives us the full pt_regs
> and saves all registers including flags, which is a very slow operation
> (and noticeable in profilers).

Yes, to solve this part, we need to work with BPF user-land people.
I guess the BPF is accessing registers from pt_regs with fixed offset
which is calculated from pt_regs layout in the user-space.

> 
> And this still doesn't work on arm64.

Yes, and this makes more motivation to move on ftrace_regs.

Thank you,


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02  2:22                         ` Alexei Starovoitov
  2023-08-02  2:32                           ` Steven Rostedt
@ 2023-08-02 14:07                           ` Masami Hiramatsu
  2023-08-02 15:08                             ` Florent Revest
  1 sibling, 1 reply; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-02 14:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Masami Hiramatsu (Google), linux-trace-kernel,
	LKML, Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Florent Revest, Peter Zijlstra,
	Thomas Gleixner

On Tue, 1 Aug 2023 19:22:01 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Aug 1, 2023 at 5:44 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 1 Aug 2023 20:40:54 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > Maybe we can add a ftrace_partial_regs(fregs) that returns a
> > > partially filled pt_regs, and the caller that uses this obviously knows
> > > its partial (as it's in the name). But this doesn't quite help out arm64
> > > because unlike x86, struct ftrace_regs does not contain an address
> > > compatibility with pt_regs fields. It would need to do a copy.
> > >
> > >  ftrace_partial_regs(fregs, &regs) ?
> >
> > Well, both would be pointers so you wouldn't need the "&", but it was
> > to stress that it would be copying one to the other.
> >
> >   void ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs regs);
> 
> Copy works, but why did you pick a different layout?

I think it is for minimize the stack consumption. pt_regs on arm64 will
consume 42*u64 = 336 bytes, on the other hand ftrace_regs will use
14*unsigned long = 112 bytes. And most of the registers in pt_regs are not
accessed usually. (as you may know RISC processors usually have many
registers - and x86 will be if we use APX in kernel. So pt_regs is big.)

> Why not to use pt_regs ? if save of flags is slow, just skip that part
> and whatever else that is slow. You don't even need to zero out
> unsaved fields. Just ask the caller to zero out pt_regs before hand.
> Most users have per-cpu pt_regs that is being reused.
> So there will be one zero-out in the beginning and every partial
> save of regs will be fast.
> Then there won't be any need for copy-converter from ftrace_regs to pt_regs.
> Maybe too much churn at this point. copy is fine.

If there is no nested call, yeah, per-cpu pt_regs will work.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-01 23:09                 ` Steven Rostedt
  2023-08-01 23:44                   ` Alexei Starovoitov
  2023-08-02  0:21                   ` Masami Hiramatsu
@ 2023-08-02 14:44                   ` Florent Revest
  2023-08-02 16:11                     ` Steven Rostedt
  2023-08-03 15:42                     ` Masami Hiramatsu
  2 siblings, 2 replies; 47+ messages in thread
From: Florent Revest @ 2023-08-02 14:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, Masami Hiramatsu (Google), linux-trace-kernel,
	LKML, Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 1:09 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 1 Aug 2023 15:18:56 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Tue, 1 Aug 2023 11:20:36 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > > The solution was to come up with ftrace_regs, which just means it has all
> > > > the registers to extract the arguments of a function and nothing more. Most
> > >
> > > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > > will do:
> > >
> > >         void callback(..., struct ftrace_regs *fregs) {
> > >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> > >
> > >
> > > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > > If it is not, then it returns NULL. This was what the x86 maintainers
> > > agreed with.
> >
> > arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> >
> > Ouch. That's very bad.
> > We care a lot about bpf running well on arm64.
>
> [ Adding Mark and Florent ]

Ah, thanks Steve! That's my favorite can of worms :) I actually
consider sending a talk proposal to the tracing MC at LPC "pt_regs -
the good the bad and the ugly" on this very topic because I care about
unblocking BPF "multi_kprobe" (which really is fprobe) on arm64, maybe
it would be interesting.

> That's because arm64 doesn't support FTRACE_WITH_REGS anymore. Their
> function handlers only care about the arguments. If you want full regs at
> function entry, then you need to take a breakpoint hit for a full kprobe.

The main reason why arm64 dropped FTRACE_WITH_REGS is because some
registers (like pstate) can not be saved outside of an exception entry
(they are just wrong), so trampolines either have to make a pstate up
or not populate it.

The other reasons are: simplicity (for architectural reasons, it's a
lot easier to have only one type of ftrace trampoline on arm64, the
"with_args" one) and performance (as you said, why bother saving a
pt_regs when most ftrace users don't need it anyway). If you need an
actual full pt_regs, then your use case is debugging rather than
tracing and you should be able to deal with the slowness and go
through an exception (a kprobe).

> In fact, fprobes isn't even supported on arm64 because it it doesn't have
> DYNAMIC_FTRACE_WITH_REGS. I believe that was the reason Masami was trying
> to get it to work with ftrace_regs. To get it to work on arm64.
>
> Again, ftrace_get_regs(fregs) is only suppose to return something if the
> pt_regs is fully supplied. If they are not, then it must not be used. Are
> you not using a fully filled pt_regs? Because that's what both Thomas and
> Peter (also added) told me not to do!

Funnily enough, there's another use of sparse pt_regs in the kernel, in Perf:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/tree/arch/arm64/include/asm/perf_event.h#n20
Notice how Perf on arm64 implicitly expects the "pstate" register to
be set (the very register which we try so hard not to fake in
ftrace_regs) because Perf happens to call the "user_mode()" macro
somewhere which reads this field:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/tree/arch/arm64/include/asm/ptrace.h#n227

I pointed this out in
https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@mail.gmail.com/#t
when Masami proposed adding calls from fprobe to perf. If every
subsystem makes different assumptions about "how sparse" their pt_regs
is and they call into one another, this could lead to... interesting
bugs. (eg: currently, we don't populate a fake pstate in ftrace_regs.
so we'd need to fake it when creating a sparse pt_regs _for Perf_,
knowing that Perf specifically expects this reg to be set. this would
require a struct copy anyway and some knowledge about how the data
will be consumed, in an arch- and subsystem- specific way)

On the other hand, untangling all code paths that come from
trampolines (with a light regs structure) from those that come from an
exception (with a pt_regs) could lead to a lot of duplicated code, and
converting between each subsystem's idea of a light regs structure
(what if perf introduces a perf_regs now ?) would be tedious and slow
(lots of copies ?).

> Otherwise, ftrace_regs() has support on arm64 for getting to the argument
> registers and the stack. Even live kernel patching now uses ftrace_regs().
>
> >
> > If you guys decide to convert fprobe to ftrace_regs please
> > make it depend on kconfig or something.
> > bpf side needs full pt_regs.

Some wild ideas that I brought up once in a BPF office hour: BPF
"multi_kprobe" could provide a fake pt_regs (either by constructing a
sparse one on the stack or by JIT-ing different offset accesses and/or
by having the verifier deny access to unpopulated fields) or break the
current API (is it conceivable to phase out BPF "multi_kprobe"
programs in favor of BPF "fprobe" programs that don't lie about their
API and guarantees and just provide a ftrace_regs ?)

> Then use kprobes. When I asked Masami what the difference between fprobes
> and kprobes was, he told me that it would be that it would no longer rely
> on the slower FTRACE_WITH_REGS. But currently, it still does.

Actually... Moving fprobe to ftrace_regs should get even more spicy!
:) Fprobe also wraps "rethook" which is basically the same thing as
kretprobe: a return trampoline that saves a pt_regs, to the point that
on x86 kretprobe's trampoline got dropped in favor of rethook's
trampoline. But for the same reasons that we don't want ftrace to save
pt_regs on arm64, rethook should probably also just save a ftrace_regs
? (also, to keep the fprobe callback signatures consistent between
pre- and post- handlers). But if we want fprobe "post" callbacks to
save a ftrace_regs now, either we need to re-introduce the kretprobe
trampoline or also change the API of kretprobe (and break its symmetry
with kprobe and we'd have the same problem all over again with BPF
kretprobe program types...). All of this is "beautifully" entangled...
:)

> The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> the first place, was because of the overhead you reported to me with
> ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> That's when I realized I could use a subset because those registers were
> already being saved. The only reason FTRACE_WITH_REGS was created was it
> had to supply full pt_regs (including flags) and emulate a breakpoint for
> the kprobes interface. But in reality, nothing really needs all that.
>
> > It's not about access to args.
> > pt_regs is passed from bpf prog further into all kinds of perf event
> > functions including stack walking.

If all accesses are done in BPF bytecode, we could (theoretically)
have the verifier and JIT work together to deny accesses to
unpopulated fields, or relocate pt_regs accesses to ftrace_regs
accesses to keep backward compatibility with existing multi_kprobe BPF
programs.

Is there a risk that a "multi_kprobe" program could call into a BPF
helper or kfunc that reads this pt_regs pointer and expect certain
fields to be set ? I suppose we could also deny giving that "pt_regs"
pointer to a helper... :/

> ftrace_regs gives you the stack pointer. Basically, it gives you access to
> anything that is required to be saved to do a function call from fentry.
>
> > I think ORC unwinder might depend on availability of all registers.
> > Other perf helpers might need it too. Like perf_event_output.
> > bpf progs need to access arguments, no doubt about that.
> > If ftrace_regs have them exactly in the same offsets as in pt_regs
> > that might work transparently for bpf progs, but, I'm afraid,
> > it's not the case on all archs.
> > So we need full pt_regs to make sure all paths are still working.
> >
> > Adding Jiri and others.
>
> Then I recommend that you give up using fprobes and just stick with kprobes
> as that's guaranteed to give you full pt_regs (at the overhead of doing
> things like filing in flags and such). And currently for arm64, fprobes can
> only work with ftrace_regs, without the full pt_regs.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 13:56                       ` Masami Hiramatsu
@ 2023-08-02 14:48                         ` Florent Revest
  2023-08-02 15:47                         ` Florent Revest
  2023-08-02 18:24                         ` Alexei Starovoitov
  2 siblings, 0 replies; 47+ messages in thread
From: Florent Revest @ 2023-08-02 14:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Alexei Starovoitov, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 3:56 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 1 Aug 2023 20:40:54 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 2 Aug 2023 09:21:46 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> >
> > > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > > and kprobes was, he told me that it would be that it would no longer rely
> > > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > >
> > > kprobes needs to keep using pt_regs because software-breakpoint exception
> > > handler gets that. And fprobe is used for bpf multi-kprobe interface,
> > > but I think it can be optional.
> > >
> > > So until user-land tool supports the ftrace_regs, you can just disable
> > > using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n
> >
> > I'm confused. I asked about the difference between kprobes on ftrace
> > and fprobes, and you said it was to get rid of the requirement of
> > FTRACE_WITH_REGS.
> >
> >  https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/
>
> Yes, it is for enabling fprobe (and fprobe-event) on more architectures.
> I don't think it's possible to change everything at once. So, it will be
> changed step by step. At the first step, I will replace pt_regs with
> ftrace_regs, and make bpf_trace.c and fprobe_event depends on
> FTRACE_WITH_REGS.

Just a small note that, strictly speaking,
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y is not enough. fprobe_init() would
also need a way to set FTRACE_OPS_FL_SAVE_REGS conditionally. (you
could be on an arch that supports saving either regs or args and if
you don't set FTRACE_OPS_FL_SAVE_REGS you'd go through the args
trampoline and get a ftrace_regs that doesn't hold a pt_regs)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 14:07                           ` Masami Hiramatsu
@ 2023-08-02 15:08                             ` Florent Revest
  0 siblings, 0 replies; 47+ messages in thread
From: Florent Revest @ 2023-08-02 15:08 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Alexei Starovoitov, Steven Rostedt, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 4:07 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 1 Aug 2023 19:22:01 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Tue, Aug 1, 2023 at 5:44 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Tue, 1 Aug 2023 20:40:54 -0400
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > > Maybe we can add a ftrace_partial_regs(fregs) that returns a
> > > > partially filled pt_regs, and the caller that uses this obviously knows
> > > > its partial (as it's in the name). But this doesn't quite help out arm64
> > > > because unlike x86, struct ftrace_regs does not contain an address
> > > > compatibility with pt_regs fields. It would need to do a copy.
> > > >
> > > >  ftrace_partial_regs(fregs, &regs) ?
> > >
> > > Well, both would be pointers so you wouldn't need the "&", but it was
> > > to stress that it would be copying one to the other.
> > >
> > >   void ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs regs);
> >
> > Copy works, but why did you pick a different layout?
>
> I think it is for minimize the stack consumption. pt_regs on arm64 will
> consume 42*u64 = 336 bytes, on the other hand ftrace_regs will use
> 14*unsigned long = 112 bytes. And most of the registers in pt_regs are not
> accessed usually. (as you may know RISC processors usually have many
> registers - and x86 will be if we use APX in kernel. So pt_regs is big.)
>
> > Why not to use pt_regs ? if save of flags is slow, just skip that part
> > and whatever else that is slow. You don't even need to zero out
> > unsaved fields. Just ask the caller to zero out pt_regs before hand.
> > Most users have per-cpu pt_regs that is being reused.
> > So there will be one zero-out in the beginning and every partial
> > save of regs will be fast.
> > Then there won't be any need for copy-converter from ftrace_regs to pt_regs.
> > Maybe too much churn at this point. copy is fine.
>
> If there is no nested call, yeah, per-cpu pt_regs will work.

BPF "multi_kprobe" programs (ugh, it's pretty awkward we called them
that way given that kprobe is out of the picture and fprobe is subject
to completely different constraints than kprobe) can't be nested, as
checked here: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/kernel/trace/bpf_trace.c?id=4c9fbff54297471d4e2bbfe9c27e80067c722eae#n2642
(this is probably the place where we'd be calling a
"ftrace_partical_regs" anyway so that's cool)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 13:56                       ` Masami Hiramatsu
  2023-08-02 14:48                         ` Florent Revest
@ 2023-08-02 15:47                         ` Florent Revest
  2023-08-03  1:55                           ` Masami Hiramatsu
  2023-08-02 18:24                         ` Alexei Starovoitov
  2 siblings, 1 reply; 47+ messages in thread
From: Florent Revest @ 2023-08-02 15:47 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Alexei Starovoitov, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 3:56 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 1 Aug 2023 20:40:54 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 2 Aug 2023 09:21:46 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> >
> > > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > > and kprobes was, he told me that it would be that it would no longer rely
> > > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > >
> > > kprobes needs to keep using pt_regs because software-breakpoint exception
> > > handler gets that. And fprobe is used for bpf multi-kprobe interface,
> > > but I think it can be optional.
> > >
> > > So until user-land tool supports the ftrace_regs, you can just disable
> > > using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n
> >
> > I'm confused. I asked about the difference between kprobes on ftrace
> > and fprobes, and you said it was to get rid of the requirement of
> > FTRACE_WITH_REGS.
> >
> >  https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/
>
> Yes, it is for enabling fprobe (and fprobe-event) on more architectures.
> I don't think it's possible to change everything at once. So, it will be
> changed step by step. At the first step, I will replace pt_regs with
> ftrace_regs, and make bpf_trace.c and fprobe_event depends on
> FTRACE_WITH_REGS.
>
> At this point, we can split the problem into two, how to move bpf on
> ftrace_regs and how to move fprobe-event on ftrace_regs. fprobe-event
> change is not hard because it is closing in the kernel and I can do it.
> But for BPF, I need to ask BPF user-land tools to support ftrace_regs.

Ah! I finally found the branch where I had pushed my proof of concept
of fprobe with ftrace_regs... it's a few months old and I didn't get
it in a state such that it could be sent to the list but maybe this
can save you a little bit of lead time Masami :) (especially the bpf
and arm64 specific bits)

https://github.com/FlorentRevest/linux/commits/bpf-arm-complete

08afb628c6e1 ("ftrace: Add a macro to forge an incomplete pt_regs from
a ftrace_regs")
203e96fe1790 ("fprobe, rethook: Use struct ftrace_regs instead of
struct pt_regs")
1a9e280b9b16 ("arm64,rethook,kprobes: Replace kretprobe with rethook on arm64")
7751c6db9f9d ("bpf: Fix bpf get_func_ip() on arm64 multi-kprobe programs")
a10c49c0d717 ("selftests/bpf: Update the tests deny list on aarch64")

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 14:44                   ` Florent Revest
@ 2023-08-02 16:11                     ` Steven Rostedt
  2023-08-03 15:42                     ` Masami Hiramatsu
  1 sibling, 0 replies; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02 16:11 UTC (permalink / raw)
  To: Florent Revest
  Cc: Alexei Starovoitov, Masami Hiramatsu (Google), linux-trace-kernel,
	LKML, Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, 2 Aug 2023 16:44:09 +0200
Florent Revest <revest@chromium.org> wrote:

> > [ Adding Mark and Florent ]  
> 
> Ah, thanks Steve! That's my favorite can of worms :) I actually
> consider sending a talk proposal to the tracing MC at LPC "pt_regs -
> the good the bad and the ugly" on this very topic because I care about
> unblocking BPF "multi_kprobe" (which really is fprobe) on arm64, maybe
> it would be interesting.

You bring up some excellent points, and the CFP for the Tracing MC is still
open. Which reminds me, I need to write my blog post!

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 13:56                       ` Masami Hiramatsu
  2023-08-02 14:48                         ` Florent Revest
  2023-08-02 15:47                         ` Florent Revest
@ 2023-08-02 18:24                         ` Alexei Starovoitov
  2023-08-02 18:38                           ` Steven Rostedt
  2 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 18:24 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 6:56 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 1 Aug 2023 20:40:54 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 2 Aug 2023 09:21:46 +0900
> > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> >
> > > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > > and kprobes was, he told me that it would be that it would no longer rely
> > > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > >
> > > kprobes needs to keep using pt_regs because software-breakpoint exception
> > > handler gets that. And fprobe is used for bpf multi-kprobe interface,
> > > but I think it can be optional.
> > >
> > > So until user-land tool supports the ftrace_regs, you can just disable
> > > using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n
> >
> > I'm confused. I asked about the difference between kprobes on ftrace
> > and fprobes, and you said it was to get rid of the requirement of
> > FTRACE_WITH_REGS.
> >
> >  https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/
>
> Yes, it is for enabling fprobe (and fprobe-event) on more architectures.
> I don't think it's possible to change everything at once. So, it will be
> changed step by step. At the first step, I will replace pt_regs with
> ftrace_regs, and make bpf_trace.c and fprobe_event depends on
> FTRACE_WITH_REGS.
>
> At this point, we can split the problem into two, how to move bpf on
> ftrace_regs and how to move fprobe-event on ftrace_regs. fprobe-event
> change is not hard because it is closing in the kernel and I can do it.
> But for BPF, I need to ask BPF user-land tools to support ftrace_regs.
>
> >
> > >
> > > Then you can safely use
> > >
> > > struct pt_regs *regs = ftrace_get_regs(fregs);
> > >
> > > I think we can just replace the CONFIG_FPROBE ifdefs with
> > > CONFIG_DYNAMIC_FTRACE_WITH_REGS in kernel/trace/bpf_trace.c
> > > And that will be the first version of using ftrace_regs in fprobe.
> >
> > But it is still slow. The FTRACE_WITH_REGS gives us the full pt_regs
> > and saves all registers including flags, which is a very slow operation
> > (and noticeable in profilers).
>
> Yes, to solve this part, we need to work with BPF user-land people.
> I guess the BPF is accessing registers from pt_regs with fixed offset
> which is calculated from pt_regs layout in the user-space.

This is a non starter.
bpf progs expect arch dependent 'struct pt_regs *' and we cannot change that.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 18:24                         ` Alexei Starovoitov
@ 2023-08-02 18:38                           ` Steven Rostedt
  2023-08-02 19:48                             ` Alexei Starovoitov
  0 siblings, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02 18:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, 2 Aug 2023 11:24:12 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> This is a non starter.
> bpf progs expect arch dependent 'struct pt_regs *' and we cannot change that.

If the progs are compiled into native code, isn't there optimizations that
could be done? That is, if ftrace_regs is available, and the bpf program is
just using the subset of pt_regs, is it possible that it could be compiled
to use ftrace_regs?

Forgive my ignorance on how BPF programs turn into executables when running
in the kernel.

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 18:38                           ` Steven Rostedt
@ 2023-08-02 19:48                             ` Alexei Starovoitov
  2023-08-02 20:12                               ` Steven Rostedt
  0 siblings, 1 reply; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 19:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 11:38 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 2 Aug 2023 11:24:12 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > This is a non starter.
> > bpf progs expect arch dependent 'struct pt_regs *' and we cannot change that.
>
> If the progs are compiled into native code, isn't there optimizations that
> could be done? That is, if ftrace_regs is available, and the bpf program is
> just using the subset of pt_regs, is it possible that it could be compiled
> to use ftrace_regs?
>
> Forgive my ignorance on how BPF programs turn into executables when running
> in the kernel.

Right. It's possible for the verifier to do an offset rewrite,
forbid certain access, always return 0 on load from certain offset,
and so on.
It's all non trivial amount of work.
ftrace_partial_regs() from ftrace_regs into pt_regs is so much simpler.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 19:48                             ` Alexei Starovoitov
@ 2023-08-02 20:12                               ` Steven Rostedt
  2023-08-02 21:28                                 ` Alexei Starovoitov
  0 siblings, 1 reply; 47+ messages in thread
From: Steven Rostedt @ 2023-08-02 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Masami Hiramatsu, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, 2 Aug 2023 12:48:14 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Wed, Aug 2, 2023 at 11:38 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 2 Aug 2023 11:24:12 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >  
> > > This is a non starter.
> > > bpf progs expect arch dependent 'struct pt_regs *' and we cannot change that.  
> >
> > If the progs are compiled into native code, isn't there optimizations that
> > could be done? That is, if ftrace_regs is available, and the bpf program is
> > just using the subset of pt_regs, is it possible that it could be compiled
> > to use ftrace_regs?
> >
> > Forgive my ignorance on how BPF programs turn into executables when running
> > in the kernel.  
> 
> Right. It's possible for the verifier to do an offset rewrite,
> forbid certain access, always return 0 on load from certain offset,
> and so on.
> It's all non trivial amount of work.
> ftrace_partial_regs() from ftrace_regs into pt_regs is so much simpler.

Sure, and the copy could be the solution we have in the near future, but if
we could optimize it in the future, then perhaps it would be worth doing it.

Also, how are the bpf programs referencing the pt_regs? Could a ftrace_regs
API be added too? If the verifier sees that the program is using
ftrace_regs, it could then use the lighter weight fprobes for access,
otherwise it falls back to the kprobe version.

-- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 20:12                               ` Steven Rostedt
@ 2023-08-02 21:28                                 ` Alexei Starovoitov
  0 siblings, 0 replies; 47+ messages in thread
From: Alexei Starovoitov @ 2023-08-02 21:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Masami Hiramatsu, linux-trace-kernel, LKML, Martin KaFai Lau, bpf,
	Sven Schnelle, Alexei Starovoitov, Jiri Olsa,
	Arnaldo Carvalho de Melo, Daniel Borkmann, Alan Maguire,
	Mark Rutland, Florent Revest, Peter Zijlstra, Thomas Gleixner

On Wed, Aug 2, 2023 at 1:12 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 2 Aug 2023 12:48:14 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Wed, Aug 2, 2023 at 11:38 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Wed, 2 Aug 2023 11:24:12 -0700
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > This is a non starter.
> > > > bpf progs expect arch dependent 'struct pt_regs *' and we cannot change that.
> > >
> > > If the progs are compiled into native code, isn't there optimizations that
> > > could be done? That is, if ftrace_regs is available, and the bpf program is
> > > just using the subset of pt_regs, is it possible that it could be compiled
> > > to use ftrace_regs?
> > >
> > > Forgive my ignorance on how BPF programs turn into executables when running
> > > in the kernel.
> >
> > Right. It's possible for the verifier to do an offset rewrite,
> > forbid certain access, always return 0 on load from certain offset,
> > and so on.
> > It's all non trivial amount of work.
> > ftrace_partial_regs() from ftrace_regs into pt_regs is so much simpler.
>
> Sure, and the copy could be the solution we have in the near future, but if
> we could optimize it in the future, then perhaps it would be worth doing it.
>
> Also, how are the bpf programs referencing the pt_regs?

Typically through macros that abstract arch differences away in
tools/lib/bpf/bpf_tracing.h
PT_REGS_PARM1
PT_REGS_PARM1_CORE
PT_REGS_PARM1_SYSCALL

pt_regs at syscall entry is special, since syscall calling convention
is different from the rest of the kernel.
ftrace_regs cannot help with that either.

> Could a ftrace_regs
> API be added too?

Potentially yes, but I don't see the value.
bpf users are slowly migrating to fentry/fexit that has accurate
args and return value and much faster.
kprobes are still heavily used, of course.
multi-kprobe (with fprobe_ips underneath) is a new addition that is
also very important to some users.

> If the verifier sees that the program is using
> ftrace_regs, it could then use the lighter weight fprobes for access,
> otherwise it falls back to the kprobe version.
>
> -- Steve

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 15:47                         ` Florent Revest
@ 2023-08-03  1:55                           ` Masami Hiramatsu
  0 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-03  1:55 UTC (permalink / raw)
  To: Florent Revest
  Cc: Steven Rostedt, Alexei Starovoitov, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Wed, 2 Aug 2023 17:47:03 +0200
Florent Revest <revest@chromium.org> wrote:

> On Wed, Aug 2, 2023 at 3:56 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Tue, 1 Aug 2023 20:40:54 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > > On Wed, 2 Aug 2023 09:21:46 +0900
> > > Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> > >
> > > > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > > > and kprobes was, he told me that it would be that it would no longer rely
> > > > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > > >
> > > > kprobes needs to keep using pt_regs because software-breakpoint exception
> > > > handler gets that. And fprobe is used for bpf multi-kprobe interface,
> > > > but I think it can be optional.
> > > >
> > > > So until user-land tool supports the ftrace_regs, you can just disable
> > > > using fprobes if CONFIG_DYNAMIC_FTRACE_WITH_REGS=n
> > >
> > > I'm confused. I asked about the difference between kprobes on ftrace
> > > and fprobes, and you said it was to get rid of the requirement of
> > > FTRACE_WITH_REGS.
> > >
> > >  https://lore.kernel.org/all/20230120205535.98998636329ca4d5f8325bc3@kernel.org/
> >
> > Yes, it is for enabling fprobe (and fprobe-event) on more architectures.
> > I don't think it's possible to change everything at once. So, it will be
> > changed step by step. At the first step, I will replace pt_regs with
> > ftrace_regs, and make bpf_trace.c and fprobe_event depends on
> > FTRACE_WITH_REGS.
> >
> > At this point, we can split the problem into two, how to move bpf on
> > ftrace_regs and how to move fprobe-event on ftrace_regs. fprobe-event
> > change is not hard because it is closing in the kernel and I can do it.
> > But for BPF, I need to ask BPF user-land tools to support ftrace_regs.
> 
> Ah! I finally found the branch where I had pushed my proof of concept
> of fprobe with ftrace_regs... it's a few months old and I didn't get
> it in a state such that it could be sent to the list but maybe this
> can save you a little bit of lead time Masami :) (especially the bpf
> and arm64 specific bits)
> 
> https://github.com/FlorentRevest/linux/commits/bpf-arm-complete
> 
> 08afb628c6e1 ("ftrace: Add a macro to forge an incomplete pt_regs from
> a ftrace_regs")
> 203e96fe1790 ("fprobe, rethook: Use struct ftrace_regs instead of
> struct pt_regs")
> 1a9e280b9b16 ("arm64,rethook,kprobes: Replace kretprobe with rethook on arm64")
> 7751c6db9f9d ("bpf: Fix bpf get_func_ip() on arm64 multi-kprobe programs")
> a10c49c0d717 ("selftests/bpf: Update the tests deny list on aarch64")

Thanks for the work! I also pushed my patches on 

https://kernel.googlesource.com/pub/scm/linux/kernel/git/mhiramat/linux/+/refs/heads/topic/fprobe-ftrace-regs

628e6c19d7dc ("tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS")
311c98c29cfd ("fprobe: Use fprobe_regs in fprobe entry handler")

This doesn't cover arm64 and rethook, but provides ftrace_regs optimized
fprobe-event code, which uses a correct APIs for ftrace_regs.

For the rethook we still need to provide 2 version for kretprobe(pt_regs)
and fprobe(ftrace_regs).
I think eventually we should replace the kretprobe with fprobe, but
current rethook is tightly coupled with kretprobe and the kretprobe
needs pt_regs. So, I would like to keep arm64 kretprobe impl, and add
new rethook with ftrace_regs.

Or, maybe we need these 2 configs intermediately.
CONFIG_RETHOOK_WITH_REGS - in this case, kretprobe uses rethook
CONFIG_RETHOOK_WITH_ARGS - in this case, kretprobe uses its own stack

The problem is ftrace_regs only depends on CONFIG_DYNAMIC_FTRACE_WITH_*.

Thank you,

-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-02 14:44                   ` Florent Revest
  2023-08-02 16:11                     ` Steven Rostedt
@ 2023-08-03 15:42                     ` Masami Hiramatsu
  2023-08-03 16:37                       ` Florent Revest
  2023-08-07 20:48                       ` Jiri Olsa
  1 sibling, 2 replies; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-03 15:42 UTC (permalink / raw)
  To: Florent Revest
  Cc: Steven Rostedt, Alexei Starovoitov, Masami Hiramatsu (Google),
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Jiri Olsa, Arnaldo Carvalho de Melo,
	Daniel Borkmann, Alan Maguire, Mark Rutland, Peter Zijlstra,
	Thomas Gleixner

On Wed, 2 Aug 2023 16:44:09 +0200
Florent Revest <revest@chromium.org> wrote:

> On Wed, Aug 2, 2023 at 1:09 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Tue, 1 Aug 2023 15:18:56 -0700
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > > On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > On Tue, 1 Aug 2023 11:20:36 -0400
> > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > >
> > > > > The solution was to come up with ftrace_regs, which just means it has all
> > > > > the registers to extract the arguments of a function and nothing more. Most
> > > >
> > > > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > > > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > > > will do:
> > > >
> > > >         void callback(..., struct ftrace_regs *fregs) {
> > > >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> > > >
> > > >
> > > > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > > > If it is not, then it returns NULL. This was what the x86 maintainers
> > > > agreed with.
> > >
> > > arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> > >
> > > Ouch. That's very bad.
> > > We care a lot about bpf running well on arm64.
> >
> > [ Adding Mark and Florent ]
> 
> Ah, thanks Steve! That's my favorite can of worms :) I actually
> consider sending a talk proposal to the tracing MC at LPC "pt_regs -
> the good the bad and the ugly" on this very topic because I care about
> unblocking BPF "multi_kprobe" (which really is fprobe) on arm64, maybe
> it would be interesting.

Ah, it is almost same as my talk :)

> 
> > That's because arm64 doesn't support FTRACE_WITH_REGS anymore. Their
> > function handlers only care about the arguments. If you want full regs at
> > function entry, then you need to take a breakpoint hit for a full kprobe.
> 
> The main reason why arm64 dropped FTRACE_WITH_REGS is because some
> registers (like pstate) can not be saved outside of an exception entry
> (they are just wrong), so trampolines either have to make a pstate up
> or not populate it.
> 
> The other reasons are: simplicity (for architectural reasons, it's a
> lot easier to have only one type of ftrace trampoline on arm64, the
> "with_args" one) and performance (as you said, why bother saving a
> pt_regs when most ftrace users don't need it anyway). If you need an
> actual full pt_regs, then your use case is debugging rather than
> tracing and you should be able to deal with the slowness and go
> through an exception (a kprobe).

Agreed. Both reasons are reasonable. Especially function entry and
exit tracing API, we don't need full pt_regs because there is 
established ABI.

> 
> > In fact, fprobes isn't even supported on arm64 because it it doesn't have
> > DYNAMIC_FTRACE_WITH_REGS. I believe that was the reason Masami was trying
> > to get it to work with ftrace_regs. To get it to work on arm64.
> >
> > Again, ftrace_get_regs(fregs) is only suppose to return something if the
> > pt_regs is fully supplied. If they are not, then it must not be used. Are
> > you not using a fully filled pt_regs? Because that's what both Thomas and
> > Peter (also added) told me not to do!
> 
> Funnily enough, there's another use of sparse pt_regs in the kernel, in Perf:
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/tree/arch/arm64/include/asm/perf_event.h#n20
> Notice how Perf on arm64 implicitly expects the "pstate" register to
> be set (the very register which we try so hard not to fake in
> ftrace_regs) because Perf happens to call the "user_mode()" macro
> somewhere which reads this field:
> https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/tree/arch/arm64/include/asm/ptrace.h#n227

I think interrupt/exception based API like kprobes and perf (PMU) may need
to use pt_regs. 

> 
> I pointed this out in
> https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@mail.gmail.com/#t
> when Masami proposed adding calls from fprobe to perf. If every
> subsystem makes different assumptions about "how sparse" their pt_regs
> is and they call into one another, this could lead to... interesting
> bugs. (eg: currently, we don't populate a fake pstate in ftrace_regs.
> so we'd need to fake it when creating a sparse pt_regs _for Perf_,
> knowing that Perf specifically expects this reg to be set. this would
> require a struct copy anyway and some knowledge about how the data
> will be consumed, in an arch- and subsystem- specific way)

yeah, sorry I missed that point. I should remove it until we can fix it.

I think we can add another kernel event only perf_trace_buf_submit()
which doesn't have the user_mode() check.

> 
> On the other hand, untangling all code paths that come from
> trampolines (with a light regs structure) from those that come from an
> exception (with a pt_regs) could lead to a lot of duplicated code, and
> converting between each subsystem's idea of a light regs structure
> (what if perf introduces a perf_regs now ?) would be tedious and slow
> (lots of copies ?).

This is one discussion point I think. Actually, using pt_regs in kretprobe
(and rethook) is histrical accident. Originally, it had put a kprobe on
the function return trampoline to hook it. So keep the API compatiblity
I made the hand assembled code to save the pt_regs on the stack.

My another question is if we have the fprobe to trace (hook) the function
return, why we still need the kretprobe itself. I think we can remove
kretprobe and use fprobe exit handler, because "function" probing will
be done by fprobe, not kprobe. And then, we can simplify the kprobe
interface and clarify what it is -- "kprobe is a wrapper of software
breakpoint". And we don't need to think about duplicated code anymore :)

>  
> > Otherwise, ftrace_regs() has support on arm64 for getting to the argument
> > registers and the stack. Even live kernel patching now uses ftrace_regs().
> >
> > >
> > > If you guys decide to convert fprobe to ftrace_regs please
> > > make it depend on kconfig or something.
> > > bpf side needs full pt_regs.
> 
> Some wild ideas that I brought up once in a BPF office hour: BPF
> "multi_kprobe" could provide a fake pt_regs (either by constructing a
> sparse one on the stack or by JIT-ing different offset accesses and/or
> by having the verifier deny access to unpopulated fields) or break the
> current API (is it conceivable to phase out BPF "multi_kprobe"
> programs in favor of BPF "fprobe" programs that don't lie about their
> API and guarantees and just provide a ftrace_regs ?)

+1 :)

> 
> > Then use kprobes. When I asked Masami what the difference between fprobes
> > and kprobes was, he told me that it would be that it would no longer rely
> > on the slower FTRACE_WITH_REGS. But currently, it still does.
> 
> Actually... Moving fprobe to ftrace_regs should get even more spicy!
> :) Fprobe also wraps "rethook" which is basically the same thing as
> kretprobe: a return trampoline that saves a pt_regs, to the point that
> on x86 kretprobe's trampoline got dropped in favor of rethook's
> trampoline. But for the same reasons that we don't want ftrace to save
> pt_regs on arm64, rethook should probably also just save a ftrace_regs
> ? (also, to keep the fprobe callback signatures consistent between
> pre- and post- handlers). But if we want fprobe "post" callbacks to
> save a ftrace_regs now, either we need to re-introduce the kretprobe
> trampoline or also change the API of kretprobe (and break its symmetry
> with kprobe and we'd have the same problem all over again with BPF
> kretprobe program types...). All of this is "beautifully" entangled...
> :)

As I said, I would like to phase out the kretprobe itself because it
provides the same feature of fprobe, which is confusing. jprobe was
removed a while ago, and now kretprobe is. But we can not phase out
it at once. So I think we will keep current kretprobe trampoline on
arm64 and just add new ftrace_regs based rethook. Then remove the
API next release. (after all users including systemtap is moved) 

> 
> > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> > the first place, was because of the overhead you reported to me with
> > ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> > That's when I realized I could use a subset because those registers were
> > already being saved. The only reason FTRACE_WITH_REGS was created was it
> > had to supply full pt_regs (including flags) and emulate a breakpoint for
> > the kprobes interface. But in reality, nothing really needs all that.
> >
> > > It's not about access to args.
> > > pt_regs is passed from bpf prog further into all kinds of perf event
> > > functions including stack walking.
> 
> If all accesses are done in BPF bytecode, we could (theoretically)
> have the verifier and JIT work together to deny accesses to
> unpopulated fields, or relocate pt_regs accesses to ftrace_regs
> accesses to keep backward compatibility with existing multi_kprobe BPF
> programs.

Yeah, that is what I would like to suggest, and what my patch does.
(let me update rethook too, it'll be a bit tricky since I don't want
break anything) 

Thanks,

> 
> Is there a risk that a "multi_kprobe" program could call into a BPF
> helper or kfunc that reads this pt_regs pointer and expect certain
> fields to be set ? I suppose we could also deny giving that "pt_regs"
> pointer to a helper... :/
> 
> > ftrace_regs gives you the stack pointer. Basically, it gives you access to
> > anything that is required to be saved to do a function call from fentry.
> >
> > > I think ORC unwinder might depend on availability of all registers.
> > > Other perf helpers might need it too. Like perf_event_output.
> > > bpf progs need to access arguments, no doubt about that.
> > > If ftrace_regs have them exactly in the same offsets as in pt_regs
> > > that might work transparently for bpf progs, but, I'm afraid,
> > > it's not the case on all archs.
> > > So we need full pt_regs to make sure all paths are still working.
> > >
> > > Adding Jiri and others.
> >
> > Then I recommend that you give up using fprobes and just stick with kprobes
> > as that's guaranteed to give you full pt_regs (at the overhead of doing
> > things like filing in flags and such). And currently for arm64, fprobes can
> > only work with ftrace_regs, without the full pt_regs.


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-03 15:42                     ` Masami Hiramatsu
@ 2023-08-03 16:37                       ` Florent Revest
  2023-08-07 20:48                       ` Jiri Olsa
  1 sibling, 0 replies; 47+ messages in thread
From: Florent Revest @ 2023-08-03 16:37 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Steven Rostedt, Alexei Starovoitov, linux-trace-kernel, LKML,
	Martin KaFai Lau, bpf, Sven Schnelle, Alexei Starovoitov,
	Jiri Olsa, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Thu, Aug 3, 2023 at 5:42 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Wed, 2 Aug 2023 16:44:09 +0200
> Florent Revest <revest@chromium.org> wrote:
>
> > On Wed, Aug 2, 2023 at 1:09 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > On Tue, 1 Aug 2023 15:18:56 -0700
> > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > > On Tue, Aug 1, 2023 at 8:32 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > >
> > > > > On Tue, 1 Aug 2023 11:20:36 -0400
> > > > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > > > >
> > > > > > The solution was to come up with ftrace_regs, which just means it has all
> > > > > > the registers to extract the arguments of a function and nothing more. Most
> > > > >
> > > > > This isn't 100% true. The ftrace_regs may hold a fully filled pt_regs. As
> > > > > the FTRACE_WITH_REGS callbacks still get passed a ftrace_regs pointer. They
> > > > > will do:
> > > > >
> > > > >         void callback(..., struct ftrace_regs *fregs) {
> > > > >                 struct pt_regs *regs = ftrace_get_regs(fregs);
> > > > >
> > > > >
> > > > > Where ftrace_get_regs() will return the pt_regs only if it is fully filled.
> > > > > If it is not, then it returns NULL. This was what the x86 maintainers
> > > > > agreed with.
> > > >
> > > > arch/arm64/include/asm/ftrace.h:#define arch_ftrace_get_regs(regs) NULL
> > > >
> > > > Ouch. That's very bad.
> > > > We care a lot about bpf running well on arm64.
> > >
> > > [ Adding Mark and Florent ]
> >
> > Ah, thanks Steve! That's my favorite can of worms :) I actually
> > consider sending a talk proposal to the tracing MC at LPC "pt_regs -
> > the good the bad and the ugly" on this very topic because I care about
> > unblocking BPF "multi_kprobe" (which really is fprobe) on arm64, maybe
> > it would be interesting.
>
> Ah, it is almost same as my talk :)

Oh, I didn't know! I submitted a proposal today but if the talks have
a lot of overlap maybe it's best that only you give your talk, since
you're the actual maintainer :) or we could co-present if there's
something I could add but I think you have all the background anyway

> > I pointed this out in
> > https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@mail.gmail.com/#t
> > when Masami proposed adding calls from fprobe to perf. If every
> > subsystem makes different assumptions about "how sparse" their pt_regs
> > is and they call into one another, this could lead to... interesting
> > bugs. (eg: currently, we don't populate a fake pstate in ftrace_regs.
> > so we'd need to fake it when creating a sparse pt_regs _for Perf_,
> > knowing that Perf specifically expects this reg to be set. this would
> > require a struct copy anyway and some knowledge about how the data
> > will be consumed, in an arch- and subsystem- specific way)
>
> yeah, sorry I missed that point. I should remove it until we can fix it.

Uh, I shouldn't have buried my important comments so far down the
email :/ I wasn't sure whether you had missed the paragraph.

> > On the other hand, untangling all code paths that come from
> > trampolines (with a light regs structure) from those that come from an
> > exception (with a pt_regs) could lead to a lot of duplicated code, and
> > converting between each subsystem's idea of a light regs structure
> > (what if perf introduces a perf_regs now ?) would be tedious and slow
> > (lots of copies ?).
>
> This is one discussion point I think. Actually, using pt_regs in kretprobe
> (and rethook) is histrical accident. Originally, it had put a kprobe on
> the function return trampoline to hook it. So keep the API compatiblity
> I made the hand assembled code to save the pt_regs on the stack.
>
> My another question is if we have the fprobe to trace (hook) the function
> return, why we still need the kretprobe itself. I think we can remove
> kretprobe and use fprobe exit handler, because "function" probing will
> be done by fprobe, not kprobe. And then, we can simplify the kprobe
> interface and clarify what it is -- "kprobe is a wrapper of software
> breakpoint". And we don't need to think about duplicated code anymore :)

That sounds reasonable to me

> As I said, I would like to phase out the kretprobe itself because it
> provides the same feature of fprobe, which is confusing. jprobe was
> removed a while ago, and now kretprobe is. But we can not phase out
> it at once. So I think we will keep current kretprobe trampoline on
> arm64 and just add new ftrace_regs based rethook. Then remove the
> API next release. (after all users including systemtap is moved)

Heads up to BPF folks though since they also have BPF "kretprobe"
program types which would break in a similar fashion as multi_kprobe
(even though BPF kretprobe programs have also been discouraged for a
while in favor of BPF fexit programs)

> > > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> > > the first place, was because of the overhead you reported to me with
> > > ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> > > That's when I realized I could use a subset because those registers were
> > > already being saved. The only reason FTRACE_WITH_REGS was created was it
> > > had to supply full pt_regs (including flags) and emulate a breakpoint for
> > > the kprobes interface. But in reality, nothing really needs all that.
> > >
> > > > It's not about access to args.
> > > > pt_regs is passed from bpf prog further into all kinds of perf event
> > > > functions including stack walking.
> >
> > If all accesses are done in BPF bytecode, we could (theoretically)
> > have the verifier and JIT work together to deny accesses to
> > unpopulated fields, or relocate pt_regs accesses to ftrace_regs
> > accesses to keep backward compatibility with existing multi_kprobe BPF
> > programs.
>
> Yeah, that is what I would like to suggest, and what my patch does.
> (let me update rethook too, it'll be a bit tricky since I don't want
> break anything)

I agree with Alexei that this is an unnecessary amount of complexity
in the verifier just to avoid a struct copy though. It's good to know
that we _could_ do it if we really need to someday but then again, if
a user chooses an interface that gets a pt_regs they shouldn't expect
high performance. Therefore, I think it's ok for BPF multi_kprobe to
copy fields from a ftrace_regs to a pt_regs on stack, especially if it
avoids so much additional complexity in the verifier.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-03 15:42                     ` Masami Hiramatsu
  2023-08-03 16:37                       ` Florent Revest
@ 2023-08-07 20:48                       ` Jiri Olsa
  2023-08-08 14:32                         ` Masami Hiramatsu
  1 sibling, 1 reply; 47+ messages in thread
From: Jiri Olsa @ 2023-08-07 20:48 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Florent Revest, Steven Rostedt, Alexei Starovoitov,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Fri, Aug 04, 2023 at 12:42:06AM +0900, Masami Hiramatsu wrote:

SNIP

> > 
> > On the other hand, untangling all code paths that come from
> > trampolines (with a light regs structure) from those that come from an
> > exception (with a pt_regs) could lead to a lot of duplicated code, and
> > converting between each subsystem's idea of a light regs structure
> > (what if perf introduces a perf_regs now ?) would be tedious and slow
> > (lots of copies ?).
> 
> This is one discussion point I think. Actually, using pt_regs in kretprobe
> (and rethook) is histrical accident. Originally, it had put a kprobe on
> the function return trampoline to hook it. So keep the API compatiblity
> I made the hand assembled code to save the pt_regs on the stack.
> 
> My another question is if we have the fprobe to trace (hook) the function
> return, why we still need the kretprobe itself. I think we can remove
> kretprobe and use fprobe exit handler, because "function" probing will
> be done by fprobe, not kprobe. And then, we can simplify the kprobe
> interface and clarify what it is -- "kprobe is a wrapper of software
> breakpoint". And we don't need to think about duplicated code anymore :)

1+ sounds like good idea

> 
> >  
> > > Otherwise, ftrace_regs() has support on arm64 for getting to the argument
> > > registers and the stack. Even live kernel patching now uses ftrace_regs().
> > >
> > > >
> > > > If you guys decide to convert fprobe to ftrace_regs please
> > > > make it depend on kconfig or something.
> > > > bpf side needs full pt_regs.
> > 
> > Some wild ideas that I brought up once in a BPF office hour: BPF
> > "multi_kprobe" could provide a fake pt_regs (either by constructing a
> > sparse one on the stack or by JIT-ing different offset accesses and/or
> > by having the verifier deny access to unpopulated fields) or break the
> > current API (is it conceivable to phase out BPF "multi_kprobe"
> > programs in favor of BPF "fprobe" programs that don't lie about their
> > API and guarantees and just provide a ftrace_regs ?)
> 
> +1 :)

so multi_kprobe link was created to allow fast attach of BPF kprobe-type
programs to multiple functions.. I don't think there's need for new fprobe
program

> 
> > 
> > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > and kprobes was, he told me that it would be that it would no longer rely
> > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > 
> > Actually... Moving fprobe to ftrace_regs should get even more spicy!
> > :) Fprobe also wraps "rethook" which is basically the same thing as
> > kretprobe: a return trampoline that saves a pt_regs, to the point that
> > on x86 kretprobe's trampoline got dropped in favor of rethook's
> > trampoline. But for the same reasons that we don't want ftrace to save
> > pt_regs on arm64, rethook should probably also just save a ftrace_regs
> > ? (also, to keep the fprobe callback signatures consistent between
> > pre- and post- handlers). But if we want fprobe "post" callbacks to
> > save a ftrace_regs now, either we need to re-introduce the kretprobe
> > trampoline or also change the API of kretprobe (and break its symmetry
> > with kprobe and we'd have the same problem all over again with BPF
> > kretprobe program types...). All of this is "beautifully" entangled...
> > :)
> 
> As I said, I would like to phase out the kretprobe itself because it
> provides the same feature of fprobe, which is confusing. jprobe was
> removed a while ago, and now kretprobe is. But we can not phase out
> it at once. So I think we will keep current kretprobe trampoline on
> arm64 and just add new ftrace_regs based rethook. Then remove the
> API next release. (after all users including systemtap is moved) 
> 
> > 
> > > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> > > the first place, was because of the overhead you reported to me with
> > > ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> > > That's when I realized I could use a subset because those registers were
> > > already being saved. The only reason FTRACE_WITH_REGS was created was it
> > > had to supply full pt_regs (including flags) and emulate a breakpoint for
> > > the kprobes interface. But in reality, nothing really needs all that.
> > >
> > > > It's not about access to args.
> > > > pt_regs is passed from bpf prog further into all kinds of perf event
> > > > functions including stack walking.
> > 
> > If all accesses are done in BPF bytecode, we could (theoretically)
> > have the verifier and JIT work together to deny accesses to
> > unpopulated fields, or relocate pt_regs accesses to ftrace_regs
> > accesses to keep backward compatibility with existing multi_kprobe BPF
> > programs.
> 
> Yeah, that is what I would like to suggest, and what my patch does.
> (let me update rethook too, it'll be a bit tricky since I don't want
> break anything) 
> 
> Thanks,
> 
> > 
> > Is there a risk that a "multi_kprobe" program could call into a BPF
> > helper or kfunc that reads this pt_regs pointer and expect certain
> > fields to be set ? I suppose we could also deny giving that "pt_regs"
> > pointer to a helper... :/

I think Alexei answered this earlier in the thread:

 >From bpf side we don't care that such pt_regs is 100% filled in or
 >only partial as long as this pt_regs pointer is valid for perf_event_output
 >and stack walking that consume pt_regs.
 >I believe that was and still is the case for both x86 and arm64.


jirka

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union
  2023-08-07 20:48                       ` Jiri Olsa
@ 2023-08-08 14:32                         ` Masami Hiramatsu
  0 siblings, 0 replies; 47+ messages in thread
From: Masami Hiramatsu @ 2023-08-08 14:32 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Florent Revest, Steven Rostedt, Alexei Starovoitov,
	linux-trace-kernel, LKML, Martin KaFai Lau, bpf, Sven Schnelle,
	Alexei Starovoitov, Arnaldo Carvalho de Melo, Daniel Borkmann,
	Alan Maguire, Mark Rutland, Peter Zijlstra, Thomas Gleixner

On Mon, 7 Aug 2023 22:48:29 +0200
Jiri Olsa <olsajiri@gmail.com> wrote:

> On Fri, Aug 04, 2023 at 12:42:06AM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > 
> > > On the other hand, untangling all code paths that come from
> > > trampolines (with a light regs structure) from those that come from an
> > > exception (with a pt_regs) could lead to a lot of duplicated code, and
> > > converting between each subsystem's idea of a light regs structure
> > > (what if perf introduces a perf_regs now ?) would be tedious and slow
> > > (lots of copies ?).
> > 
> > This is one discussion point I think. Actually, using pt_regs in kretprobe
> > (and rethook) is histrical accident. Originally, it had put a kprobe on
> > the function return trampoline to hook it. So keep the API compatiblity
> > I made the hand assembled code to save the pt_regs on the stack.
> > 
> > My another question is if we have the fprobe to trace (hook) the function
> > return, why we still need the kretprobe itself. I think we can remove
> > kretprobe and use fprobe exit handler, because "function" probing will
> > be done by fprobe, not kprobe. And then, we can simplify the kprobe
> > interface and clarify what it is -- "kprobe is a wrapper of software
> > breakpoint". And we don't need to think about duplicated code anymore :)
> 
> 1+ sounds like good idea

Thanks! the downside will be that it requires to enable CONFIG_FPROBE
instead of CONFIG_KPROBES, but I think it is natural that the user, who 
wants to trace function boundary, enables CONFIG_FUNCTION_TRACER.

> > > > Otherwise, ftrace_regs() has support on arm64 for getting to the argument
> > > > registers and the stack. Even live kernel patching now uses ftrace_regs().
> > > >
> > > > >
> > > > > If you guys decide to convert fprobe to ftrace_regs please
> > > > > make it depend on kconfig or something.
> > > > > bpf side needs full pt_regs.
> > > 
> > > Some wild ideas that I brought up once in a BPF office hour: BPF
> > > "multi_kprobe" could provide a fake pt_regs (either by constructing a
> > > sparse one on the stack or by JIT-ing different offset accesses and/or
> > > by having the verifier deny access to unpopulated fields) or break the
> > > current API (is it conceivable to phase out BPF "multi_kprobe"
> > > programs in favor of BPF "fprobe" programs that don't lie about their
> > > API and guarantees and just provide a ftrace_regs ?)
> > 
> > +1 :)
> 
> so multi_kprobe link was created to allow fast attach of BPF kprobe-type
> programs to multiple functions.. I don't think there's need for new fprobe
> program

Ah, OK. So the focus point is shortening registration time.

> 
> > 
> > > 
> > > > Then use kprobes. When I asked Masami what the difference between fprobes
> > > > and kprobes was, he told me that it would be that it would no longer rely
> > > > on the slower FTRACE_WITH_REGS. But currently, it still does.
> > > 
> > > Actually... Moving fprobe to ftrace_regs should get even more spicy!
> > > :) Fprobe also wraps "rethook" which is basically the same thing as
> > > kretprobe: a return trampoline that saves a pt_regs, to the point that
> > > on x86 kretprobe's trampoline got dropped in favor of rethook's
> > > trampoline. But for the same reasons that we don't want ftrace to save
> > > pt_regs on arm64, rethook should probably also just save a ftrace_regs
> > > ? (also, to keep the fprobe callback signatures consistent between
> > > pre- and post- handlers). But if we want fprobe "post" callbacks to
> > > save a ftrace_regs now, either we need to re-introduce the kretprobe
> > > trampoline or also change the API of kretprobe (and break its symmetry
> > > with kprobe and we'd have the same problem all over again with BPF
> > > kretprobe program types...). All of this is "beautifully" entangled...
> > > :)
> > 
> > As I said, I would like to phase out the kretprobe itself because it
> > provides the same feature of fprobe, which is confusing. jprobe was
> > removed a while ago, and now kretprobe is. But we can not phase out
> > it at once. So I think we will keep current kretprobe trampoline on
> > arm64 and just add new ftrace_regs based rethook. Then remove the
> > API next release. (after all users including systemtap is moved) 
> > 
> > > 
> > > > The reason I started the FTRACE_WITH_ARGS (which gave us ftrace_regs) in
> > > > the first place, was because of the overhead you reported to me with
> > > > ftrace_regs_caller and why you wanted to go the direct trampoline approach.
> > > > That's when I realized I could use a subset because those registers were
> > > > already being saved. The only reason FTRACE_WITH_REGS was created was it
> > > > had to supply full pt_regs (including flags) and emulate a breakpoint for
> > > > the kprobes interface. But in reality, nothing really needs all that.
> > > >
> > > > > It's not about access to args.
> > > > > pt_regs is passed from bpf prog further into all kinds of perf event
> > > > > functions including stack walking.
> > > 
> > > If all accesses are done in BPF bytecode, we could (theoretically)
> > > have the verifier and JIT work together to deny accesses to
> > > unpopulated fields, or relocate pt_regs accesses to ftrace_regs
> > > accesses to keep backward compatibility with existing multi_kprobe BPF
> > > programs.
> > 
> > Yeah, that is what I would like to suggest, and what my patch does.
> > (let me update rethook too, it'll be a bit tricky since I don't want
> > break anything) 
> > 
> > Thanks,
> > 
> > > 
> > > Is there a risk that a "multi_kprobe" program could call into a BPF
> > > helper or kfunc that reads this pt_regs pointer and expect certain
> > > fields to be set ? I suppose we could also deny giving that "pt_regs"
> > > pointer to a helper... :/
> 
> I think Alexei answered this earlier in the thread:
> 
>  >From bpf side we don't care that such pt_regs is 100% filled in or
>  >only partial as long as this pt_regs pointer is valid for perf_event_output
>  >and stack walking that consume pt_regs.
>  >I believe that was and still is the case for both x86 and arm64.

OK, so I've made the ftrace_partial_regs() according to the idea now.

Thanks,

> 
> 
> jirka


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2023-08-08 16:31 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-31  7:30 [PATCH v4 0/9] tracing: Improbe BTF support on probe events Masami Hiramatsu (Google)
2023-07-31  7:30 ` [PATCH v4 1/9] tracing/probes: Support BTF argument on module functions Masami Hiramatsu (Google)
2023-07-31  7:30 ` [PATCH v4 2/9] bpf/btf: tracing: Move finding func-proto API and getting func-param API to BTF Masami Hiramatsu (Google)
2023-07-31  7:30 ` [PATCH v4 3/9] bpf/btf: Add a function to search a member of a struct/union Masami Hiramatsu (Google)
2023-07-31 21:59   ` Alexei Starovoitov
2023-07-31 23:57     ` Masami Hiramatsu
2023-08-01  0:29       ` Alexei Starovoitov
2023-08-01 15:02         ` Masami Hiramatsu
2023-08-01 15:20           ` Steven Rostedt
2023-08-01 15:32             ` Steven Rostedt
2023-08-01 22:18               ` Alexei Starovoitov
2023-08-01 23:09                 ` Steven Rostedt
2023-08-01 23:44                   ` Alexei Starovoitov
2023-08-02  0:21                   ` Masami Hiramatsu
2023-08-02  0:40                     ` Steven Rostedt
2023-08-02  0:44                       ` Steven Rostedt
2023-08-02  2:22                         ` Alexei Starovoitov
2023-08-02  2:32                           ` Steven Rostedt
2023-08-02 14:07                           ` Masami Hiramatsu
2023-08-02 15:08                             ` Florent Revest
2023-08-02 13:56                       ` Masami Hiramatsu
2023-08-02 14:48                         ` Florent Revest
2023-08-02 15:47                         ` Florent Revest
2023-08-03  1:55                           ` Masami Hiramatsu
2023-08-02 18:24                         ` Alexei Starovoitov
2023-08-02 18:38                           ` Steven Rostedt
2023-08-02 19:48                             ` Alexei Starovoitov
2023-08-02 20:12                               ` Steven Rostedt
2023-08-02 21:28                                 ` Alexei Starovoitov
2023-08-02 14:44                   ` Florent Revest
2023-08-02 16:11                     ` Steven Rostedt
2023-08-03 15:42                     ` Masami Hiramatsu
2023-08-03 16:37                       ` Florent Revest
2023-08-07 20:48                       ` Jiri Olsa
2023-08-08 14:32                         ` Masami Hiramatsu
2023-08-01  1:15     ` Steven Rostedt
2023-08-01  2:24       ` Alexei Starovoitov
2023-08-01 13:35         ` Steven Rostedt
2023-08-01 15:18         ` Masami Hiramatsu
2023-08-01 22:21           ` Alexei Starovoitov
2023-08-01 23:17             ` Masami Hiramatsu
2023-07-31  7:30 ` [PATCH v4 4/9] tracing/probes: Support BTF based data structure field access Masami Hiramatsu (Google)
2023-07-31  7:30 ` [PATCH v4 5/9] tracing/probes: Support BTF field access from $retval Masami Hiramatsu (Google)
2023-07-31  7:31 ` [PATCH v4 6/9] tracing/probes: Add string type check with BTF Masami Hiramatsu (Google)
2023-07-31  7:31 ` [PATCH v4 7/9] tracing/fprobe-event: Assume fprobe is a return event by $retval Masami Hiramatsu (Google)
2023-07-31  7:31 ` [PATCH v4 8/9] selftests/ftrace: Add BTF fields access testcases Masami Hiramatsu (Google)
2023-07-31  7:31 ` [PATCH v4 9/9] Documentation: tracing: Update fprobe event example with BTF field Masami Hiramatsu (Google)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).