BPF List
 help / color / mirror / Atom feed
* [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie
@ 2024-03-19 23:38 Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 1/5] bpf: flatten bpf_probe_register call chain Andrii Nakryiko
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

Add ability to specify and retrieve BPF cookie for raw tracepoint programs.
Both BTF-aware (SEC("tp_btf")) and non-BTF-aware (SEC("raw_tp")) are
supported, as they are exactly the same at runtime.

This issue recently came up in production use cases, where custom tried to
switch from slower classic tracepoints to raw tracepoints and ran into this
limitation. Luckily, it's not that hard to support this for raw_tp programs.

v2->v3:
  - s/bpf_raw_tp_open/bpf_raw_tracepoint_open_opts/ (Alexei, Eduard);
v1->v2:
  - fixed type definition for stubs of bpf_probe_{register,unregister};
  - added __u32 :u32 and aligned raw_tp fields (Jiri);
  - added Stanislav's ack.

Andrii Nakryiko (5):
  bpf: flatten bpf_probe_register call chain
  bpf: pass whole link instead of prog when triggering raw tracepoint
  bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  selftests/bpf: add raw_tp/tp_btf BPF cookie subtests

 include/linux/bpf.h                           |   6 +
 include/linux/trace_events.h                  |  36 +++---
 include/trace/bpf_probe.h                     |   3 +-
 include/uapi/linux/bpf.h                      |   6 +-
 kernel/bpf/syscall.c                          |  22 ++--
 kernel/trace/bpf_trace.c                      |  36 ++++--
 tools/include/uapi/linux/bpf.h                |   1 +
 tools/lib/bpf/bpf.c                           |  16 ++-
 tools/lib/bpf/bpf.h                           |   9 ++
 tools/lib/bpf/libbpf.c                        |  20 ++-
 tools/lib/bpf/libbpf.h                        |  11 ++
 tools/lib/bpf/libbpf.map                      |   2 +
 .../selftests/bpf/prog_tests/bpf_cookie.c     | 114 +++++++++++++++++-
 .../selftests/bpf/progs/test_bpf_cookie.c     |  16 +++
 14 files changed, 248 insertions(+), 50 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 bpf-next 1/5] bpf: flatten bpf_probe_register call chain
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
@ 2024-03-19 23:38 ` Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint Andrii Nakryiko
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Stanislav Fomichev

bpf_probe_register() and __bpf_probe_register() have identical
signatures and bpf_probe_register() just redirect to
__bpf_probe_register(). So get rid of this extra function call step to
simplify following the source code.

It has no difference at runtime due to inlining, of course.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/trace/bpf_trace.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 1b041911b1d8..30ecf62f8a17 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2425,7 +2425,7 @@ BPF_TRACE_DEFN_x(10);
 BPF_TRACE_DEFN_x(11);
 BPF_TRACE_DEFN_x(12);
 
-static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
 	struct tracepoint *tp = btp->tp;
 
@@ -2443,11 +2443,6 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 						   prog);
 }
 
-int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
-{
-	return __bpf_probe_register(btp, prog);
-}
-
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
 	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 1/5] bpf: flatten bpf_probe_register call chain Andrii Nakryiko
@ 2024-03-19 23:38 ` Andrii Nakryiko
  2024-04-09  8:37   ` Pengfei Xu
  2024-03-19 23:38 ` [PATCH v3 bpf-next 3/5] bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs Andrii Nakryiko
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Stanislav Fomichev

Instead of passing prog as an argument to bpf_trace_runX() helpers, that
are called from tracepoint triggering calls, store BPF link itself
(struct bpf_raw_tp_link for raw tracepoints). This will allow to pass
extra information like BPF cookie into raw tracepoint registration.

Instead of replacing `struct bpf_prog *prog = __data;` with
corresponding `struct bpf_raw_tp_link *link = __data;` assignment in
`__bpf_trace_##call` I just passed `__data` through into underlying
bpf_trace_runX() call. This works well because we implicitly cast `void *`,
and it also avoids naming clashes with arguments coming from
tracepoint's "proto" list. We could have run into the same problem with
"prog", we just happened to not have a tracepoint that has "prog" input
argument. We are less lucky with "link", as there are tracepoints using
"link" argument name already. So instead of trying to avoid naming
conflicts, let's just remove intermediate local variable. It doesn't
hurt readibility, it's either way a bit of a maze of calls and macros,
that requires careful reading.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h          |  5 +++++
 include/linux/trace_events.h | 36 ++++++++++++++++++++----------------
 include/trace/bpf_probe.h    |  3 +--
 kernel/bpf/syscall.c         |  9 ++-------
 kernel/trace/bpf_trace.c     | 18 ++++++++++--------
 5 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 17843e66a1d3..2ea8ce59f582 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1607,6 +1607,11 @@ struct bpf_tracing_link {
 	struct bpf_prog *tgt_prog;
 };
 
+struct bpf_raw_tp_link {
+	struct bpf_link link;
+	struct bpf_raw_event_map *btp;
+};
+
 struct bpf_link_primer {
 	struct bpf_link *link;
 	struct file *file;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..a7fc6fb6de3c 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -759,8 +759,11 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
-int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
-int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
+
+struct bpf_raw_tp_link;
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
+
 struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
 void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
@@ -788,11 +791,12 @@ perf_event_query_prog_array(struct perf_event *event, void __user *info)
 {
 	return -EOPNOTSUPP;
 }
-static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+struct bpf_raw_tp_link;
+static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
 {
 	return -EOPNOTSUPP;
 }
-static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *p)
+static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
 {
 	return -EOPNOTSUPP;
 }
@@ -903,31 +907,31 @@ void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_free_bpf_prog(struct perf_event *event);
 
-void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
-void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
-void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run1(struct bpf_raw_tp_link *link, u64 arg1);
+void bpf_trace_run2(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2);
+void bpf_trace_run3(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3);
-void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run4(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4);
-void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run5(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5);
-void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run6(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6);
-void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run7(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
-void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run8(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		    u64 arg8);
-void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run9(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		    u64 arg8, u64 arg9);
-void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run10(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10);
-void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run11(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
-void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
+void bpf_trace_run12(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
 		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
 		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
 void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index e609cd7da47e..a2ea11cc912e 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -46,8 +46,7 @@
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
-	struct bpf_prog *prog = __data;					\
-	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
 }
 
 #undef DECLARE_EVENT_CLASS
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..1cb4c3809af4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3469,17 +3469,12 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
 	return err;
 }
 
-struct bpf_raw_tp_link {
-	struct bpf_link link;
-	struct bpf_raw_event_map *btp;
-};
-
 static void bpf_raw_tp_link_release(struct bpf_link *link)
 {
 	struct bpf_raw_tp_link *raw_tp =
 		container_of(link, struct bpf_raw_tp_link, link);
 
-	bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
+	bpf_probe_unregister(raw_tp->btp, raw_tp);
 	bpf_put_raw_tracepoint(raw_tp->btp);
 }
 
@@ -3833,7 +3828,7 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 		goto out_put_btp;
 	}
 
-	err = bpf_probe_register(link->btp, prog);
+	err = bpf_probe_register(link->btp, link);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 30ecf62f8a17..17de91ad4a1f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2366,8 +2366,10 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 }
 
 static __always_inline
-void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
+void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
+	struct bpf_prog *prog = link->link.prog;
+
 	cant_sleep();
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
@@ -2404,12 +2406,12 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
 #define __SEQ_0_11	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
 
 #define BPF_TRACE_DEFN_x(x)						\
-	void bpf_trace_run##x(struct bpf_prog *prog,			\
+	void bpf_trace_run##x(struct bpf_raw_tp_link *link,		\
 			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
 	{								\
 		u64 args[x];						\
 		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
-		__bpf_trace_run(prog, args);				\
+		__bpf_trace_run(link, args);				\
 	}								\
 	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
 BPF_TRACE_DEFN_x(1);
@@ -2425,9 +2427,10 @@ BPF_TRACE_DEFN_x(10);
 BPF_TRACE_DEFN_x(11);
 BPF_TRACE_DEFN_x(12);
 
-int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
 {
 	struct tracepoint *tp = btp->tp;
+	struct bpf_prog *prog = link->link.prog;
 
 	/*
 	 * check that program doesn't access arguments beyond what's
@@ -2439,13 +2442,12 @@ int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 	if (prog->aux->max_tp_access > btp->writable_size)
 		return -EINVAL;
 
-	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
-						   prog);
+	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func, link);
 }
 
-int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
+int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
 {
-	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
+	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, link);
 }
 
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 bpf-next 3/5] bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 1/5] bpf: flatten bpf_probe_register call chain Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint Andrii Nakryiko
@ 2024-03-19 23:38 ` Andrii Nakryiko
  2024-03-19 23:38 ` [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs Andrii Nakryiko
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau
  Cc: andrii, kernel-team, Stanislav Fomichev, Eduard Zingerman

Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF
aware variants). This brings them up to part w.r.t. BPF cookie usage
with classic tracepoint and fentry/fexit programs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       |  6 ++++--
 kernel/bpf/syscall.c           | 13 +++++++++----
 kernel/trace/bpf_trace.c       | 13 +++++++++++++
 tools/include/uapi/linux/bpf.h |  1 +
 5 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2ea8ce59f582..62762390c93d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1610,6 +1610,7 @@ struct bpf_tracing_link {
 struct bpf_raw_tp_link {
 	struct bpf_link link;
 	struct bpf_raw_event_map *btp;
+	u64 cookie;
 };
 
 struct bpf_link_primer {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c42b9f1bada..9585f5345353 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1662,8 +1662,10 @@ union bpf_attr {
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-		__u64 name;
-		__u32 prog_fd;
+		__u64		name;
+		__u32		prog_fd;
+		__u32		:32;
+		__aligned_u64	cookie;
 	} raw_tracepoint;
 
 	struct { /* anonymous struct for BPF_BTF_LOAD */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1cb4c3809af4..e44c276e8617 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3774,7 +3774,7 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
 #endif /* CONFIG_PERF_EVENTS */
 
 static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
-				  const char __user *user_tp_name)
+				  const char __user *user_tp_name, u64 cookie)
 {
 	struct bpf_link_primer link_primer;
 	struct bpf_raw_tp_link *link;
@@ -3821,6 +3821,7 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
 		      &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
+	link->cookie = cookie;
 
 	err = bpf_link_prime(&link->link, &link_primer);
 	if (err) {
@@ -3841,11 +3842,13 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	return err;
 }
 
-#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
+#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.cookie
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
+	void __user *tp_name;
+	__u64 cookie;
 	int fd;
 
 	if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
@@ -3855,7 +3858,9 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
-	fd = bpf_raw_tp_link_attach(prog, u64_to_user_ptr(attr->raw_tracepoint.name));
+	tp_name = u64_to_user_ptr(attr->raw_tracepoint.name);
+	cookie = attr->raw_tracepoint.cookie;
+	fd = bpf_raw_tp_link_attach(prog, tp_name, cookie);
 	if (fd < 0)
 		bpf_prog_put(prog);
 	return fd;
@@ -5193,7 +5198,7 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 			goto out;
 		}
 		if (prog->expected_attach_type == BPF_TRACE_RAW_TP)
-			ret = bpf_raw_tp_link_attach(prog, NULL);
+			ret = bpf_raw_tp_link_attach(prog, NULL, attr->link_create.tracing.cookie);
 		else if (prog->expected_attach_type == BPF_TRACE_ITER)
 			ret = bpf_iter_link_attach(attr, uattr, prog);
 		else if (prog->expected_attach_type == BPF_LSM_CGROUP)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 17de91ad4a1f..434e3ece6688 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2004,6 +2004,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stackid_proto_raw_tp;
 	case BPF_FUNC_get_stack:
 		return &bpf_get_stack_proto_raw_tp;
+	case BPF_FUNC_get_attach_cookie:
+		return &bpf_get_attach_cookie_proto_tracing;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
@@ -2066,6 +2068,9 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_func_arg_cnt:
 		return bpf_prog_has_trampoline(prog) ? &bpf_get_func_arg_cnt_proto : NULL;
 	case BPF_FUNC_get_attach_cookie:
+		if (prog->type == BPF_PROG_TYPE_TRACING &&
+		    prog->expected_attach_type == BPF_TRACE_RAW_TP)
+			return &bpf_get_attach_cookie_proto_tracing;
 		return bpf_prog_has_trampoline(prog) ? &bpf_get_attach_cookie_proto_tracing : NULL;
 	default:
 		fn = raw_tp_prog_func_proto(func_id, prog);
@@ -2369,15 +2374,23 @@ static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
 	struct bpf_prog *prog = link->link.prog;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
 
 	cant_sleep();
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
 	}
+
+	run_ctx.bpf_cookie = link->cookie;
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+
 	rcu_read_lock();
 	(void) bpf_prog_run(prog, args);
 	rcu_read_unlock();
+
+	bpf_reset_run_ctx(old_run_ctx);
 out:
 	this_cpu_dec(*(prog->active));
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3c42b9f1bada..bf80b614c4db 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1664,6 +1664,7 @@ union bpf_attr {
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
 		__u64 name;
 		__u32 prog_fd;
+		__aligned_u64 cookie;
 	} raw_tracepoint;
 
 	struct { /* anonymous struct for BPF_BTF_LOAD */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2024-03-19 23:38 ` [PATCH v3 bpf-next 3/5] bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs Andrii Nakryiko
@ 2024-03-19 23:38 ` Andrii Nakryiko
  2024-03-19 23:45   ` Eduard Zingerman
  2024-03-19 23:38 ` [PATCH v3 bpf-next 5/5] selftests/bpf: add raw_tp/tp_btf BPF cookie subtests Andrii Nakryiko
  2024-03-20  6:20 ` [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie patchwork-bot+netdevbpf
  5 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau
  Cc: andrii, kernel-team, Stanislav Fomichev, Eduard Zingerman

Wire up BPF cookie passing or raw_tp and tp_btf programs, both in
low-level and high-level APIs.

Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c      | 16 ++++++++++++++--
 tools/lib/bpf/bpf.h      |  9 +++++++++
 tools/lib/bpf/libbpf.c   | 20 +++++++++++++++++---
 tools/lib/bpf/libbpf.h   | 11 +++++++++++
 tools/lib/bpf/libbpf.map |  2 ++
 5 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 97ec005c3c47..c9f4e04f38fe 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -785,6 +785,7 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, uprobe_multi))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_TRACE_RAW_TP:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
 	case BPF_MODIFY_RETURN:
@@ -1173,20 +1174,31 @@ int bpf_link_get_info_by_fd(int link_fd, struct bpf_link_info *info, __u32 *info
 	return bpf_obj_get_info_by_fd(link_fd, info, info_len);
 }
 
-int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, raw_tracepoint);
 	union bpf_attr attr;
 	int fd;
 
+	if (!OPTS_VALID(opts, bpf_raw_tp_opts))
+		return libbpf_err(-EINVAL);
+
 	memset(&attr, 0, attr_sz);
-	attr.raw_tracepoint.name = ptr_to_u64(name);
 	attr.raw_tracepoint.prog_fd = prog_fd;
+	attr.raw_tracepoint.name = ptr_to_u64(OPTS_GET(opts, tp_name, NULL));
+	attr.raw_tracepoint.cookie = OPTS_GET(opts, cookie, 0);
 
 	fd = sys_bpf_fd(BPF_RAW_TRACEPOINT_OPEN, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
 
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+	LIBBPF_OPTS(bpf_raw_tp_opts, opts, .tp_name = name);
+
+	return bpf_raw_tracepoint_open_opts(prog_fd, &opts);
+}
+
 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
 {
 	const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index df0db2f0cdb7..972e17ec0c09 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -617,6 +617,15 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
 
+struct bpf_raw_tp_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	const char *tp_name;
+	__u64 cookie;
+	size_t :0;
+};
+#define bpf_raw_tp_opts__last_field cookie
+
+LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
 				 __u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3a756d61c120..86df0d50cba7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12309,13 +12309,19 @@ static int attach_tp(const struct bpf_program *prog, long cookie, struct bpf_lin
 	return libbpf_get_error(*link);
 }
 
-struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
-						    const char *tp_name)
+struct bpf_link *
+bpf_program__attach_raw_tracepoint_opts(const struct bpf_program *prog,
+					const char *tp_name,
+					struct bpf_raw_tracepoint_opts *opts)
 {
+	LIBBPF_OPTS(bpf_raw_tp_opts, raw_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int prog_fd, pfd;
 
+	if (!OPTS_VALID(opts, bpf_raw_tracepoint_opts))
+		return libbpf_err_ptr(-EINVAL);
+
 	prog_fd = bpf_program__fd(prog);
 	if (prog_fd < 0) {
 		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
@@ -12327,7 +12333,9 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *pr
 		return libbpf_err_ptr(-ENOMEM);
 	link->detach = &bpf_link__detach_fd;
 
-	pfd = bpf_raw_tracepoint_open(tp_name, prog_fd);
+	raw_opts.tp_name = tp_name;
+	raw_opts.cookie = OPTS_GET(opts, cookie, 0);
+	pfd = bpf_raw_tracepoint_open_opts(prog_fd, &raw_opts);
 	if (pfd < 0) {
 		pfd = -errno;
 		free(link);
@@ -12339,6 +12347,12 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *pr
 	return link;
 }
 
+struct bpf_link *bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
+						    const char *tp_name)
+{
+	return bpf_program__attach_raw_tracepoint_opts(prog, tp_name, NULL);
+}
+
 static int attach_raw_tp(const struct bpf_program *prog, long cookie, struct bpf_link **link)
 {
 	static const char *const prefixes[] = {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 7b510761f545..f88ab50c0229 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -760,9 +760,20 @@ bpf_program__attach_tracepoint_opts(const struct bpf_program *prog,
 				    const char *tp_name,
 				    const struct bpf_tracepoint_opts *opts);
 
+struct bpf_raw_tracepoint_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u64 cookie;
+	size_t :0;
+};
+#define bpf_raw_tracepoint_opts__last_field cookie
+
 LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(const struct bpf_program *prog,
 				   const char *tp_name);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_raw_tracepoint_opts(const struct bpf_program *prog,
+					const char *tp_name,
+					struct bpf_raw_tracepoint_opts *opts);
 
 struct bpf_trace_opts {
 	/* size of this struct, for forward/backward compatibility */
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 86804fd90dd1..51732ecb1385 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -410,6 +410,8 @@ LIBBPF_1.3.0 {
 
 LIBBPF_1.4.0 {
 	global:
+		bpf_program__attach_raw_tracepoint_opts;
+		bpf_raw_tracepoint_open_opts;
 		bpf_token_create;
 		btf__new_split;
 		btf_ext__raw_data;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 bpf-next 5/5] selftests/bpf: add raw_tp/tp_btf BPF cookie subtests
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2024-03-19 23:38 ` [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs Andrii Nakryiko
@ 2024-03-19 23:38 ` Andrii Nakryiko
  2024-03-20  6:20 ` [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie patchwork-bot+netdevbpf
  5 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-03-19 23:38 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team, Stanislav Fomichev

Add test validating BPF cookie can be passed during raw_tp/tp_btf
attachment and can be retried at runtime with bpf_get_attach_cookie()
helper.

Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../selftests/bpf/prog_tests/bpf_cookie.c     | 114 +++++++++++++++++-
 .../selftests/bpf/progs/test_bpf_cookie.c     |  16 +++
 2 files changed, 129 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
index 1454cebc262b..4407ea428e77 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
@@ -573,6 +573,115 @@ static void lsm_subtest(struct test_bpf_cookie *skel)
 		close(lsm_fd);
 }
 
+static void tp_btf_subtest(struct test_bpf_cookie *skel)
+{
+	__u64 cookie;
+	int prog_fd, link_fd = -1;
+	struct bpf_link *link = NULL;
+	LIBBPF_OPTS(bpf_link_create_opts, link_opts);
+	LIBBPF_OPTS(bpf_raw_tp_opts, raw_tp_opts);
+	LIBBPF_OPTS(bpf_trace_opts, trace_opts);
+
+	/* There are three different ways to attach tp_btf (BTF-aware raw
+	 * tracepoint) programs. Let's test all of them.
+	 */
+	prog_fd = bpf_program__fd(skel->progs.handle_tp_btf);
+
+	/* low-level BPF_RAW_TRACEPOINT_OPEN command wrapper */
+	skel->bss->tp_btf_res = 0;
+
+	raw_tp_opts.cookie = cookie = 0x11000000000000L;
+	link_fd = bpf_raw_tracepoint_open_opts(prog_fd, &raw_tp_opts);
+	if (!ASSERT_GE(link_fd, 0, "bpf_raw_tracepoint_open_opts"))
+		goto cleanup;
+
+	usleep(1); /* trigger */
+	close(link_fd); /* detach */
+	link_fd = -1;
+
+	ASSERT_EQ(skel->bss->tp_btf_res, cookie, "raw_tp_open_res");
+
+	/* low-level generic bpf_link_create() API */
+	skel->bss->tp_btf_res = 0;
+
+	link_opts.tracing.cookie = cookie = 0x22000000000000L;
+	link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_TP, &link_opts);
+	if (!ASSERT_GE(link_fd, 0, "bpf_link_create"))
+		goto cleanup;
+
+	usleep(1); /* trigger */
+	close(link_fd); /* detach */
+	link_fd = -1;
+
+	ASSERT_EQ(skel->bss->tp_btf_res, cookie, "link_create_res");
+
+	/* high-level bpf_link-based bpf_program__attach_trace_opts() API */
+	skel->bss->tp_btf_res = 0;
+
+	trace_opts.cookie = cookie = 0x33000000000000L;
+	link = bpf_program__attach_trace_opts(skel->progs.handle_tp_btf, &trace_opts);
+	if (!ASSERT_OK_PTR(link, "attach_trace_opts"))
+		goto cleanup;
+
+	usleep(1); /* trigger */
+	bpf_link__destroy(link); /* detach */
+	link = NULL;
+
+	ASSERT_EQ(skel->bss->tp_btf_res, cookie, "attach_trace_opts_res");
+
+cleanup:
+	if (link_fd >= 0)
+		close(link_fd);
+	bpf_link__destroy(link);
+}
+
+static void raw_tp_subtest(struct test_bpf_cookie *skel)
+{
+	__u64 cookie;
+	int prog_fd, link_fd = -1;
+	struct bpf_link *link = NULL;
+	LIBBPF_OPTS(bpf_raw_tp_opts, raw_tp_opts);
+	LIBBPF_OPTS(bpf_raw_tracepoint_opts, opts);
+
+	/* There are two different ways to attach raw_tp programs */
+	prog_fd = bpf_program__fd(skel->progs.handle_raw_tp);
+
+	/* low-level BPF_RAW_TRACEPOINT_OPEN command wrapper */
+	skel->bss->raw_tp_res = 0;
+
+	raw_tp_opts.tp_name = "sys_enter";
+	raw_tp_opts.cookie = cookie = 0x55000000000000L;
+	link_fd = bpf_raw_tracepoint_open_opts(prog_fd, &raw_tp_opts);
+	if (!ASSERT_GE(link_fd, 0, "bpf_raw_tracepoint_open_opts"))
+		goto cleanup;
+
+	usleep(1); /* trigger */
+	close(link_fd); /* detach */
+	link_fd = -1;
+
+	ASSERT_EQ(skel->bss->raw_tp_res, cookie, "raw_tp_open_res");
+
+	/* high-level bpf_link-based bpf_program__attach_raw_tracepoint_opts() API */
+	skel->bss->raw_tp_res = 0;
+
+	opts.cookie = cookie = 0x66000000000000L;
+	link = bpf_program__attach_raw_tracepoint_opts(skel->progs.handle_raw_tp,
+						       "sys_enter", &opts);
+	if (!ASSERT_OK_PTR(link, "attach_raw_tp_opts"))
+		goto cleanup;
+
+	usleep(1); /* trigger */
+	bpf_link__destroy(link); /* detach */
+	link = NULL;
+
+	ASSERT_EQ(skel->bss->raw_tp_res, cookie, "attach_raw_tp_opts_res");
+
+cleanup:
+	if (link_fd >= 0)
+		close(link_fd);
+	bpf_link__destroy(link);
+}
+
 void test_bpf_cookie(void)
 {
 	struct test_bpf_cookie *skel;
@@ -601,6 +710,9 @@ void test_bpf_cookie(void)
 		tracing_subtest(skel);
 	if (test__start_subtest("lsm"))
 		lsm_subtest(skel);
-
+	if (test__start_subtest("tp_btf"))
+		tp_btf_subtest(skel);
+	if (test__start_subtest("raw_tp"))
+		raw_tp_subtest(skel);
 	test_bpf_cookie__destroy(skel);
 }
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_cookie.c b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
index 5a3a80f751c4..c83142b55f47 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
@@ -15,6 +15,8 @@ __u64 uprobe_res;
 __u64 uretprobe_res;
 __u64 tp_res;
 __u64 pe_res;
+__u64 raw_tp_res;
+__u64 tp_btf_res;
 __u64 fentry_res;
 __u64 fexit_res;
 __u64 fmod_ret_res;
@@ -87,6 +89,20 @@ int handle_pe(struct pt_regs *ctx)
 	return 0;
 }
 
+SEC("raw_tp/sys_enter")
+int handle_raw_tp(void *ctx)
+{
+	update(ctx, &raw_tp_res);
+	return 0;
+}
+
+SEC("tp_btf/sys_enter")
+int handle_tp_btf(void *ctx)
+{
+	update(ctx, &tp_btf_res);
+	return 0;
+}
+
 SEC("fentry/bpf_fentry_test1")
 int BPF_PROG(fentry_test1, int a)
 {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs
  2024-03-19 23:38 ` [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs Andrii Nakryiko
@ 2024-03-19 23:45   ` Eduard Zingerman
  0 siblings, 0 replies; 10+ messages in thread
From: Eduard Zingerman @ 2024-03-19 23:45 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, ast, daniel, martin.lau
  Cc: kernel-team, Stanislav Fomichev

On Tue, 2024-03-19 at 16:38 -0700, Andrii Nakryiko wrote:
[...]

> +LIBBPF_API int bpf_raw_tracepoint_open_opts(int prog_fd, struct bpf_raw_tp_opts *opts);

Thank you!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie
  2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2024-03-19 23:38 ` [PATCH v3 bpf-next 5/5] selftests/bpf: add raw_tp/tp_btf BPF cookie subtests Andrii Nakryiko
@ 2024-03-20  6:20 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-03-20  6:20 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, ast, daniel, martin.lau, kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue, 19 Mar 2024 16:38:47 -0700 you wrote:
> Add ability to specify and retrieve BPF cookie for raw tracepoint programs.
> Both BTF-aware (SEC("tp_btf")) and non-BTF-aware (SEC("raw_tp")) are
> supported, as they are exactly the same at runtime.
> 
> This issue recently came up in production use cases, where custom tried to
> switch from slower classic tracepoints to raw tracepoints and ran into this
> limitation. Luckily, it's not that hard to support this for raw_tp programs.
> 
> [...]

Here is the summary with links:
  - [v3,bpf-next,1/5] bpf: flatten bpf_probe_register call chain
    https://git.kernel.org/bpf/bpf-next/c/6b9c2950c912
  - [v3,bpf-next,2/5] bpf: pass whole link instead of prog when triggering raw tracepoint
    https://git.kernel.org/bpf/bpf-next/c/d4dfc5700e86
  - [v3,bpf-next,3/5] bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs
    https://git.kernel.org/bpf/bpf-next/c/68ca5d4eebb8
  - [v3,bpf-next,4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs
    https://git.kernel.org/bpf/bpf-next/c/36ffb2023e37
  - [v3,bpf-next,5/5] selftests/bpf: add raw_tp/tp_btf BPF cookie subtests
    https://git.kernel.org/bpf/bpf-next/c/51146ff0fae3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint
  2024-03-19 23:38 ` [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint Andrii Nakryiko
@ 2024-04-09  8:37   ` Pengfei Xu
  2024-04-18 20:52     ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Pengfei Xu @ 2024-04-09  8:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, ast, daniel, martin.lau, kernel-team, Stanislav Fomichev,
	syzkaller-bugs, pengfei.xu

Hi Andrii Nakryiko,

Greeting!

On 2024-03-19 at 16:38:49 -0700, Andrii Nakryiko wrote:
> Instead of passing prog as an argument to bpf_trace_runX() helpers, that
> are called from tracepoint triggering calls, store BPF link itself
> (struct bpf_raw_tp_link for raw tracepoints). This will allow to pass
> extra information like BPF cookie into raw tracepoint registration.
> 
> Instead of replacing `struct bpf_prog *prog = __data;` with
> corresponding `struct bpf_raw_tp_link *link = __data;` assignment in
> `__bpf_trace_##call` I just passed `__data` through into underlying
> bpf_trace_runX() call. This works well because we implicitly cast `void *`,
> and it also avoids naming clashes with arguments coming from
> tracepoint's "proto" list. We could have run into the same problem with
> "prog", we just happened to not have a tracepoint that has "prog" input
> argument. We are less lucky with "link", as there are tracepoints using
> "link" argument name already. So instead of trying to avoid naming
> conflicts, let's just remove intermediate local variable. It doesn't
> hurt readibility, it's either way a bit of a maze of calls and macros,
> that requires careful reading.
> 
> Acked-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h          |  5 +++++
>  include/linux/trace_events.h | 36 ++++++++++++++++++++----------------
>  include/trace/bpf_probe.h    |  3 +--
>  kernel/bpf/syscall.c         |  9 ++-------
>  kernel/trace/bpf_trace.c     | 18 ++++++++++--------
>  5 files changed, 38 insertions(+), 33 deletions(-)
> 

I used syzkaller and test intel internal kernel and found "KASAN:
slab-use-after-free Read in bpf_trace_run4" problem.

Bisected and found related commit:
d4dfc5700e86 bpf: pass whole link instead of prog when triggering raw tracepoint

Checked that the commit above is the same as this commit.

All detailed info:https://github.com/xupengfe/syzkaller_logs/tree/main/240409_092216_bpf_trace_run4
Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/repro.c
Syzkaller syscall repro steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/repro.prog
Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/bisect_info.log
issue_bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240409_092216_bpf_trace_run4/bzImage_v6.9-rc2_next.tar.gz
issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/5d8569db0cb982d3c630482c285578e98a75fc90_dmesg.log

"
[   24.977435] ==================================================================
[   24.978307] BUG: KASAN: slab-use-after-free in bpf_trace_run4+0x547/0x5e0
[   24.979138] Read of size 8 at addr ffff888015676218 by task rcu_preempt/16
[   24.979936] 
[   24.980152] CPU: 0 PID: 16 Comm: rcu_preempt Not tainted 6.9.0-rc2-5d8569db0cb9+ #1
[   24.981040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[   24.982352] Call Trace:
[   24.982672]  <TASK>
[   24.982952]  dump_stack_lvl+0xe9/0x150
[   24.983449]  print_report+0xd0/0x610
[   24.983904]  ? bpf_trace_run4+0x547/0x5e0
[   24.984393]  ? kasan_complete_mode_report_info+0x80/0x200
[   24.985039]  ? bpf_trace_run4+0x547/0x5e0
[   24.985528]  kasan_report+0x9f/0xe0
[   24.985961]  ? bpf_trace_run4+0x547/0x5e0
[   24.986457]  __asan_report_load8_noabort+0x18/0x20
[   24.987055]  bpf_trace_run4+0x547/0x5e0
[   24.987532]  ? __pfx_bpf_trace_run4+0x10/0x10
[   24.988061]  ? __this_cpu_preempt_check+0x20/0x30
[   24.988670]  ? lock_is_held_type+0xe5/0x140
[   24.989185]  ? set_next_entity+0x38c/0x630
[   24.989698]  ? put_prev_entity+0x50/0x1f0
[   24.990199]  __bpf_trace_sched_switch+0x14/0x20
[   24.990776]  __traceiter_sched_switch+0x7a/0xd0
[   24.991293]  __schedule+0xc6d/0x2840
[   24.991721]  ? __pfx___schedule+0x10/0x10
[   24.992170]  ? lock_release+0x3f6/0x790
[   24.992616]  ? __this_cpu_preempt_check+0x20/0x30
[   24.993140]  ? schedule+0x1f3/0x290
[   24.993536]  ? __pfx_lock_release+0x10/0x10
[   24.994003]  ? _raw_spin_unlock_irqrestore+0x39/0x70
[   24.994561]  ? schedule_timeout+0x559/0x940
[   24.995021]  ? __this_cpu_preempt_check+0x20/0x30
[   24.995548]  schedule+0xcf/0x290
[   24.995922]  schedule_timeout+0x55e/0x940
[   24.996369]  ? __pfx_schedule_timeout+0x10/0x10
[   24.996870]  ? prepare_to_swait_event+0xff/0x450
[   24.997401]  ? prepare_to_swait_event+0xc4/0x450
[   24.997916]  ? __this_cpu_preempt_check+0x20/0x30
[   24.998445]  ? __pfx_process_timeout+0x10/0x10
[   24.998971]  ? tcp_get_idx+0xd0/0x270
[   24.999408]  ? prepare_to_swait_event+0xff/0x450
[   24.999934]  rcu_gp_fqs_loop+0x661/0xa70
[   25.000399]  ? __pfx_rcu_gp_fqs_loop+0x10/0x10
[   25.000913]  ? __pfx_rcu_gp_init+0x10/0x10
[   25.001381]  rcu_gp_kthread+0x25e/0x360
[   25.001822]  ? __pfx_rcu_gp_kthread+0x10/0x10
[   25.002324]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
[   25.002966]  ? __kthread_parkme+0x146/0x220
[   25.003472]  ? __pfx_rcu_gp_kthread+0x10/0x10
[   25.003995]  kthread+0x354/0x470
[   25.004400]  ? __pfx_kthread+0x10/0x10
[   25.004862]  ret_from_fork+0x57/0x90
[   25.005315]  ? __pfx_kthread+0x10/0x10
[   25.005778]  ret_from_fork_asm+0x1a/0x30
[   25.006282]  </TASK>
[   25.006560] 
[   25.006773] Allocated by task 732:
[   25.007187]  kasan_save_stack+0x2a/0x50
[   25.007660]  kasan_save_track+0x18/0x40
[   25.008124]  kasan_save_alloc_info+0x3b/0x50
[   25.008649]  __kasan_kmalloc+0x86/0xa0
[   25.009107]  kmalloc_trace+0x1c5/0x3d0
[   25.009599]  bpf_raw_tp_link_attach+0x28e/0x5a0
[   25.010163]  __sys_bpf+0x452/0x5550
[   25.010599]  __x64_sys_bpf+0x7e/0xc0
[   25.011054]  do_syscall_64+0x73/0x150
[   25.011523]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[   25.012156] 
[   25.012360] Freed by task 732:
[   25.012740]  kasan_save_stack+0x2a/0x50
[   25.013211]  kasan_save_track+0x18/0x40
[   25.013689]  kasan_save_free_info+0x3e/0x60
[   25.014198]  __kasan_slab_free+0x107/0x190
[   25.014694]  kfree+0xf3/0x320
[   25.015085]  bpf_raw_tp_link_dealloc+0x1e/0x30
[   25.015632]  bpf_link_free+0x145/0x1b0
[   25.016094]  bpf_link_put_direct+0x45/0x60
[   25.016593]  bpf_link_release+0x40/0x50
[   25.017064]  __fput+0x273/0xb70
[   25.017489]  ____fput+0x1e/0x30
[   25.017890]  task_work_run+0x1a3/0x2d0
[   25.018356]  do_exit+0xad3/0x31b0
[   25.018800]  do_group_exit+0xdf/0x2b0
[   25.019256]  __x64_sys_exit_group+0x47/0x50
[   25.019763]  do_syscall_64+0x73/0x150
[   25.020215]  entry_SYSCALL_64_after_hwframe+0x71/0x79
[   25.020830] 
[   25.021036] The buggy address belongs to the object at ffff888015676200
[   25.021036]  which belongs to the cache kmalloc-128 of size 128
[   25.022465] The buggy address is located 24 bytes inside of
[   25.022465]  freed 128-byte region [ffff888015676200, ffff888015676280)
[   25.023780] 
[   25.023970] The buggy address belongs to the physical page:
[   25.024563] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15676
[   25.025401] flags: 0xfffffe0000800(slab|node=0|zone=1|lastcpupid=0x3fffff)
[   25.026140] page_type: 0xffffffff()
[   25.026535] raw: 000fffffe0000800 ffff88800a0418c0 dead000000000122 0000000000000000
[   25.027363] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[   25.028182] page dumped because: kasan: bad access detected
[   25.028773] 
[   25.028957] Memory state around the buggy address:
[   25.029476]  ffff888015676100: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.030244]  ffff888015676180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   25.031018] >ffff888015676200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.031780]                             ^
[   25.032221]  ffff888015676280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   25.032992]  ffff888015676300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   25.033769] ==================================================================
[   25.034571] Disabling lock debugging due to kernel taint
"

Could you take a look is it useful?

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one reproduced environment, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
  // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
  // You could change the bzImage_xxx as you want
  // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
You could use below command to log in, there is no password for root.
ssh -p 10023 root@localhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@localhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
yum -y install libslirp-devel.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
make
make install

Best Regards,
Thanks!



> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 17843e66a1d3..2ea8ce59f582 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1607,6 +1607,11 @@ struct bpf_tracing_link {
>  	struct bpf_prog *tgt_prog;
>  };
>  
> +struct bpf_raw_tp_link {
> +	struct bpf_link link;
> +	struct bpf_raw_event_map *btp;
> +};
> +
>  struct bpf_link_primer {
>  	struct bpf_link *link;
>  	struct file *file;
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index d68ff9b1247f..a7fc6fb6de3c 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -759,8 +759,11 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
>  int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>  void perf_event_detach_bpf_prog(struct perf_event *event);
>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> -int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> +
> +struct bpf_raw_tp_link;
> +int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
> +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link);
> +
>  struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
>  void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
>  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
> @@ -788,11 +791,12 @@ perf_event_query_prog_array(struct perf_event *event, void __user *info)
>  {
>  	return -EOPNOTSUPP;
>  }
> -static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *p)
> +struct bpf_raw_tp_link;
> +static inline int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
>  {
>  	return -EOPNOTSUPP;
>  }
> -static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *p)
> +static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
>  {
>  	return -EOPNOTSUPP;
>  }
> @@ -903,31 +907,31 @@ void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
>  int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
>  void perf_event_free_bpf_prog(struct perf_event *event);
>  
> -void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> -void bpf_trace_run2(struct bpf_prog *prog, u64 arg1, u64 arg2);
> -void bpf_trace_run3(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run1(struct bpf_raw_tp_link *link, u64 arg1);
> +void bpf_trace_run2(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2);
> +void bpf_trace_run3(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3);
> -void bpf_trace_run4(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run4(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4);
> -void bpf_trace_run5(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run5(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4, u64 arg5);
> -void bpf_trace_run6(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run6(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4, u64 arg5, u64 arg6);
> -void bpf_trace_run7(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run7(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7);
> -void bpf_trace_run8(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run8(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>  		    u64 arg8);
> -void bpf_trace_run9(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run9(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		    u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>  		    u64 arg8, u64 arg9);
> -void bpf_trace_run10(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run10(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>  		     u64 arg8, u64 arg9, u64 arg10);
> -void bpf_trace_run11(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run11(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>  		     u64 arg8, u64 arg9, u64 arg10, u64 arg11);
> -void bpf_trace_run12(struct bpf_prog *prog, u64 arg1, u64 arg2,
> +void bpf_trace_run12(struct bpf_raw_tp_link *link, u64 arg1, u64 arg2,
>  		     u64 arg3, u64 arg4, u64 arg5, u64 arg6, u64 arg7,
>  		     u64 arg8, u64 arg9, u64 arg10, u64 arg11, u64 arg12);
>  void perf_trace_run_bpf_submit(void *raw_data, int size, int rctx,
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index e609cd7da47e..a2ea11cc912e 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -46,8 +46,7 @@
>  static notrace void							\
>  __bpf_trace_##call(void *__data, proto)					\
>  {									\
> -	struct bpf_prog *prog = __data;					\
> -	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
> +	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
>  }
>  
>  #undef DECLARE_EVENT_CLASS
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ae2ff73bde7e..1cb4c3809af4 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3469,17 +3469,12 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
>  	return err;
>  }
>  
> -struct bpf_raw_tp_link {
> -	struct bpf_link link;
> -	struct bpf_raw_event_map *btp;
> -};
> -
>  static void bpf_raw_tp_link_release(struct bpf_link *link)
>  {
>  	struct bpf_raw_tp_link *raw_tp =
>  		container_of(link, struct bpf_raw_tp_link, link);
>  
> -	bpf_probe_unregister(raw_tp->btp, raw_tp->link.prog);
> +	bpf_probe_unregister(raw_tp->btp, raw_tp);
>  	bpf_put_raw_tracepoint(raw_tp->btp);
>  }
>  
> @@ -3833,7 +3828,7 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
>  		goto out_put_btp;
>  	}
>  
> -	err = bpf_probe_register(link->btp, prog);
> +	err = bpf_probe_register(link->btp, link);
>  	if (err) {
>  		bpf_link_cleanup(&link_primer);
>  		goto out_put_btp;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 30ecf62f8a17..17de91ad4a1f 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2366,8 +2366,10 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
>  }
>  
>  static __always_inline
> -void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
> +void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
>  {
> +	struct bpf_prog *prog = link->link.prog;
> +
>  	cant_sleep();
>  	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
>  		bpf_prog_inc_misses_counter(prog);
> @@ -2404,12 +2406,12 @@ void __bpf_trace_run(struct bpf_prog *prog, u64 *args)
>  #define __SEQ_0_11	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
>  
>  #define BPF_TRACE_DEFN_x(x)						\
> -	void bpf_trace_run##x(struct bpf_prog *prog,			\
> +	void bpf_trace_run##x(struct bpf_raw_tp_link *link,		\
>  			      REPEAT(x, SARG, __DL_COM, __SEQ_0_11))	\
>  	{								\
>  		u64 args[x];						\
>  		REPEAT(x, COPY, __DL_SEM, __SEQ_0_11);			\
> -		__bpf_trace_run(prog, args);				\
> +		__bpf_trace_run(link, args);				\
>  	}								\
>  	EXPORT_SYMBOL_GPL(bpf_trace_run##x)
>  BPF_TRACE_DEFN_x(1);
> @@ -2425,9 +2427,10 @@ BPF_TRACE_DEFN_x(10);
>  BPF_TRACE_DEFN_x(11);
>  BPF_TRACE_DEFN_x(12);
>  
> -int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
>  {
>  	struct tracepoint *tp = btp->tp;
> +	struct bpf_prog *prog = link->link.prog;
>  
>  	/*
>  	 * check that program doesn't access arguments beyond what's
> @@ -2439,13 +2442,12 @@ int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
>  	if (prog->aux->max_tp_access > btp->writable_size)
>  		return -EINVAL;
>  
> -	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func,
> -						   prog);
> +	return tracepoint_probe_register_may_exist(tp, (void *)btp->bpf_func, link);
>  }
>  
> -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
> +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_raw_tp_link *link)
>  {
> -	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
> +	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, link);
>  }
>  
>  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint
  2024-04-09  8:37   ` Pengfei Xu
@ 2024-04-18 20:52     ` Andrii Nakryiko
  0 siblings, 0 replies; 10+ messages in thread
From: Andrii Nakryiko @ 2024-04-18 20:52 UTC (permalink / raw)
  To: Pengfei Xu
  Cc: Andrii Nakryiko, bpf, ast, daniel, martin.lau, kernel-team,
	Stanislav Fomichev, syzkaller-bugs

On Tue, Apr 9, 2024 at 1:41 AM Pengfei Xu <pengfei.xu@intel.com> wrote:
>
> Hi Andrii Nakryiko,
>
> Greeting!
>
> On 2024-03-19 at 16:38:49 -0700, Andrii Nakryiko wrote:
> > Instead of passing prog as an argument to bpf_trace_runX() helpers, that
> > are called from tracepoint triggering calls, store BPF link itself
> > (struct bpf_raw_tp_link for raw tracepoints). This will allow to pass
> > extra information like BPF cookie into raw tracepoint registration.
> >
> > Instead of replacing `struct bpf_prog *prog = __data;` with
> > corresponding `struct bpf_raw_tp_link *link = __data;` assignment in
> > `__bpf_trace_##call` I just passed `__data` through into underlying
> > bpf_trace_runX() call. This works well because we implicitly cast `void *`,
> > and it also avoids naming clashes with arguments coming from
> > tracepoint's "proto" list. We could have run into the same problem with
> > "prog", we just happened to not have a tracepoint that has "prog" input
> > argument. We are less lucky with "link", as there are tracepoints using
> > "link" argument name already. So instead of trying to avoid naming
> > conflicts, let's just remove intermediate local variable. It doesn't
> > hurt readibility, it's either way a bit of a maze of calls and macros,
> > that requires careful reading.
> >
> > Acked-by: Stanislav Fomichev <sdf@google.com>
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h          |  5 +++++
> >  include/linux/trace_events.h | 36 ++++++++++++++++++++----------------
> >  include/trace/bpf_probe.h    |  3 +--
> >  kernel/bpf/syscall.c         |  9 ++-------
> >  kernel/trace/bpf_trace.c     | 18 ++++++++++--------
> >  5 files changed, 38 insertions(+), 33 deletions(-)
> >
>
> I used syzkaller and test intel internal kernel and found "KASAN:
> slab-use-after-free Read in bpf_trace_run4" problem.
>
> Bisected and found related commit:
> d4dfc5700e86 bpf: pass whole link instead of prog when triggering raw tracepoint

I think [0] is fixing this problem, it landed into bpf tree

  [0] https://lore.kernel.org/all/20240328052426.3042617-2-andrii@kernel.org/

>
> Checked that the commit above is the same as this commit.
>
> All detailed info:https://github.com/xupengfe/syzkaller_logs/tree/main/240409_092216_bpf_trace_run4
> Syzkaller repro code: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/repro.c
> Syzkaller syscall repro steps: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/repro.prog
> Kconfig(make olddefconfig): https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/bisect_info.log
> issue_bzImage: https://github.com/xupengfe/syzkaller_logs/raw/main/240409_092216_bpf_trace_run4/bzImage_v6.9-rc2_next.tar.gz
> issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/240409_092216_bpf_trace_run4/5d8569db0cb982d3c630482c285578e98a75fc90_dmesg.log
>
> "
> [   24.977435] ==================================================================
> [   24.978307] BUG: KASAN: slab-use-after-free in bpf_trace_run4+0x547/0x5e0
> [   24.979138] Read of size 8 at addr ffff888015676218 by task rcu_preempt/16
> [   24.979936]
> [   24.980152] CPU: 0 PID: 16 Comm: rcu_preempt Not tainted 6.9.0-rc2-5d8569db0cb9+ #1
> [   24.981040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> [   24.982352] Call Trace:
> [   24.982672]  <TASK>
> [   24.982952]  dump_stack_lvl+0xe9/0x150
> [   24.983449]  print_report+0xd0/0x610
> [   24.983904]  ? bpf_trace_run4+0x547/0x5e0
> [   24.984393]  ? kasan_complete_mode_report_info+0x80/0x200
> [   24.985039]  ? bpf_trace_run4+0x547/0x5e0
> [   24.985528]  kasan_report+0x9f/0xe0
> [   24.985961]  ? bpf_trace_run4+0x547/0x5e0
> [   24.986457]  __asan_report_load8_noabort+0x18/0x20
> [   24.987055]  bpf_trace_run4+0x547/0x5e0
> [   24.987532]  ? __pfx_bpf_trace_run4+0x10/0x10
> [   24.988061]  ? __this_cpu_preempt_check+0x20/0x30
> [   24.988670]  ? lock_is_held_type+0xe5/0x140
> [   24.989185]  ? set_next_entity+0x38c/0x630
> [   24.989698]  ? put_prev_entity+0x50/0x1f0
> [   24.990199]  __bpf_trace_sched_switch+0x14/0x20
> [   24.990776]  __traceiter_sched_switch+0x7a/0xd0
> [   24.991293]  __schedule+0xc6d/0x2840
> [   24.991721]  ? __pfx___schedule+0x10/0x10
> [   24.992170]  ? lock_release+0x3f6/0x790
> [   24.992616]  ? __this_cpu_preempt_check+0x20/0x30
> [   24.993140]  ? schedule+0x1f3/0x290
> [   24.993536]  ? __pfx_lock_release+0x10/0x10
> [   24.994003]  ? _raw_spin_unlock_irqrestore+0x39/0x70
> [   24.994561]  ? schedule_timeout+0x559/0x940
> [   24.995021]  ? __this_cpu_preempt_check+0x20/0x30
> [   24.995548]  schedule+0xcf/0x290
> [   24.995922]  schedule_timeout+0x55e/0x940
> [   24.996369]  ? __pfx_schedule_timeout+0x10/0x10
> [   24.996870]  ? prepare_to_swait_event+0xff/0x450
> [   24.997401]  ? prepare_to_swait_event+0xc4/0x450
> [   24.997916]  ? __this_cpu_preempt_check+0x20/0x30
> [   24.998445]  ? __pfx_process_timeout+0x10/0x10
> [   24.998971]  ? tcp_get_idx+0xd0/0x270
> [   24.999408]  ? prepare_to_swait_event+0xff/0x450
> [   24.999934]  rcu_gp_fqs_loop+0x661/0xa70
> [   25.000399]  ? __pfx_rcu_gp_fqs_loop+0x10/0x10
> [   25.000913]  ? __pfx_rcu_gp_init+0x10/0x10
> [   25.001381]  rcu_gp_kthread+0x25e/0x360
> [   25.001822]  ? __pfx_rcu_gp_kthread+0x10/0x10
> [   25.002324]  ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30
> [   25.002966]  ? __kthread_parkme+0x146/0x220
> [   25.003472]  ? __pfx_rcu_gp_kthread+0x10/0x10
> [   25.003995]  kthread+0x354/0x470
> [   25.004400]  ? __pfx_kthread+0x10/0x10
> [   25.004862]  ret_from_fork+0x57/0x90
> [   25.005315]  ? __pfx_kthread+0x10/0x10
> [   25.005778]  ret_from_fork_asm+0x1a/0x30
> [   25.006282]  </TASK>
> [   25.006560]
> [   25.006773] Allocated by task 732:
> [   25.007187]  kasan_save_stack+0x2a/0x50
> [   25.007660]  kasan_save_track+0x18/0x40
> [   25.008124]  kasan_save_alloc_info+0x3b/0x50
> [   25.008649]  __kasan_kmalloc+0x86/0xa0
> [   25.009107]  kmalloc_trace+0x1c5/0x3d0
> [   25.009599]  bpf_raw_tp_link_attach+0x28e/0x5a0
> [   25.010163]  __sys_bpf+0x452/0x5550
> [   25.010599]  __x64_sys_bpf+0x7e/0xc0
> [   25.011054]  do_syscall_64+0x73/0x150
> [   25.011523]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   25.012156]
> [   25.012360] Freed by task 732:
> [   25.012740]  kasan_save_stack+0x2a/0x50
> [   25.013211]  kasan_save_track+0x18/0x40
> [   25.013689]  kasan_save_free_info+0x3e/0x60
> [   25.014198]  __kasan_slab_free+0x107/0x190
> [   25.014694]  kfree+0xf3/0x320
> [   25.015085]  bpf_raw_tp_link_dealloc+0x1e/0x30
> [   25.015632]  bpf_link_free+0x145/0x1b0
> [   25.016094]  bpf_link_put_direct+0x45/0x60
> [   25.016593]  bpf_link_release+0x40/0x50
> [   25.017064]  __fput+0x273/0xb70
> [   25.017489]  ____fput+0x1e/0x30
> [   25.017890]  task_work_run+0x1a3/0x2d0
> [   25.018356]  do_exit+0xad3/0x31b0
> [   25.018800]  do_group_exit+0xdf/0x2b0
> [   25.019256]  __x64_sys_exit_group+0x47/0x50
> [   25.019763]  do_syscall_64+0x73/0x150
> [   25.020215]  entry_SYSCALL_64_after_hwframe+0x71/0x79
> [   25.020830]
> [   25.021036] The buggy address belongs to the object at ffff888015676200
> [   25.021036]  which belongs to the cache kmalloc-128 of size 128
> [   25.022465] The buggy address is located 24 bytes inside of
> [   25.022465]  freed 128-byte region [ffff888015676200, ffff888015676280)
> [   25.023780]
> [   25.023970] The buggy address belongs to the physical page:
> [   25.024563] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x15676
> [   25.025401] flags: 0xfffffe0000800(slab|node=0|zone=1|lastcpupid=0x3fffff)
> [   25.026140] page_type: 0xffffffff()
> [   25.026535] raw: 000fffffe0000800 ffff88800a0418c0 dead000000000122 0000000000000000
> [   25.027363] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> [   25.028182] page dumped because: kasan: bad access detected
> [   25.028773]
> [   25.028957] Memory state around the buggy address:
> [   25.029476]  ffff888015676100: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   25.030244]  ffff888015676180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   25.031018] >ffff888015676200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   25.031780]                             ^
> [   25.032221]  ffff888015676280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   25.032992]  ffff888015676300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> [   25.033769] ==================================================================
> [   25.034571] Disabling lock debugging due to kernel taint
> "
>
> Could you take a look is it useful?
>
> Thanks!
>
> ---
>
> If you don't need the following environment to reproduce the problem or if you
> already have one reproduced environment, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
>   // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
>   // You could change the bzImage_xxx as you want
>   // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@localhost
>
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@localhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage           //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> yum -y install libslirp-devel.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> make
> make install
>
> Best Regards,
> Thanks!
>

[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-18 20:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-19 23:38 [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie Andrii Nakryiko
2024-03-19 23:38 ` [PATCH v3 bpf-next 1/5] bpf: flatten bpf_probe_register call chain Andrii Nakryiko
2024-03-19 23:38 ` [PATCH v3 bpf-next 2/5] bpf: pass whole link instead of prog when triggering raw tracepoint Andrii Nakryiko
2024-04-09  8:37   ` Pengfei Xu
2024-04-18 20:52     ` Andrii Nakryiko
2024-03-19 23:38 ` [PATCH v3 bpf-next 3/5] bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs Andrii Nakryiko
2024-03-19 23:38 ` [PATCH v3 bpf-next 4/5] libbpf: add support for BPF cookie for raw_tp/tp_btf programs Andrii Nakryiko
2024-03-19 23:45   ` Eduard Zingerman
2024-03-19 23:38 ` [PATCH v3 bpf-next 5/5] selftests/bpf: add raw_tp/tp_btf BPF cookie subtests Andrii Nakryiko
2024-03-20  6:20 ` [PATCH v3 bpf-next 0/5] BPF raw tracepoint support for BPF cookie patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox