* [PATCH RFC bpf-next 0/5] bpf: tracing session supporting
@ 2025-10-18 14:21 Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 1/5] bpf: add tracing session support Menglong Dong
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
Sometimes, we need to hook both the entry and the exit of a function with
TRACING. Today that requires defining both a FENTRY and a FEXIT program
for the target function, which is not convenient.
Therefore, we add tracing session support to TRACING. Generally
speaking, it's similar to kprobe session: it can hook both the entry
and the exit of a function with a single BPF program. Meanwhile, it can
also control the execution of the fexit part with the return value of
the fentry part: a non-zero return at entry skips the exit invocation.
Session cookies are not supported yet, and I'm not sure if they are
necessary.
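A minimal sketch of the intended usage (the "fsession" section name
comes from the libbpf patch, bpf_tracing_is_exit() from patch 2, and
bpf_fentry_test1 is just an example target):
  SEC("fsession/bpf_fentry_test1")
  int BPF_PROG(handler, int a)
  {
          if (!bpf_tracing_is_exit(ctx)) {
                  /* entry: return non-zero to skip the exit invocation */
                  return 0;
          }
          /* exit: runs only if the entry invocation returned 0 */
          return 0;
  }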
For now, only x86_64 is supported. Other architectures will be supported
later.
Menglong Dong (5):
bpf: add tracing session support
bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
bpf,x86: add tracing session supporting for x86_64
libbpf: add support for tracing session
selftests/bpf: add testcases for tracing session
arch/arm64/net/bpf_jit_comp.c | 3 +
arch/loongarch/net/bpf_jit.c | 3 +
arch/powerpc/net/bpf_jit_comp.c | 3 +
arch/riscv/net/bpf_jit_comp64.c | 3 +
arch/s390/net/bpf_jit_comp.c | 3 +
arch/x86/net/bpf_jit_comp.c | 115 ++++++++++-
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/btf.c | 2 +
kernel/bpf/syscall.c | 2 +
kernel/bpf/trampoline.c | 5 +-
kernel/bpf/verifier.c | 17 +-
kernel/trace/bpf_trace.c | 43 ++++-
net/bpf/test_run.c | 1 +
net/core/bpf_sk_storage.c | 1 +
tools/bpf/bpftool/common.c | 1 +
tools/include/uapi/linux/bpf.h | 1 +
tools/lib/bpf/bpf.c | 2 +
tools/lib/bpf/libbpf.c | 3 +
.../selftests/bpf/prog_tests/fsession_test.c | 132 +++++++++++++
.../selftests/bpf/progs/fsession_test.c | 178 ++++++++++++++++++
21 files changed, 511 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
--
2.51.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH RFC bpf-next 1/5] bpf: add tracing session support
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
@ 2025-10-18 14:21 ` Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION Menglong Dong
` (4 subsequent siblings)
5 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
The tracing session is similar to kprobe session. It allows attaching
a single BPF program to both the entry and the exit of the target
functions.
When a non-zero value is returned by the fentry invocation, the fexit
invocation will be skipped, which matches the kprobe session behavior.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 3 +++
arch/loongarch/net/bpf_jit.c | 3 +++
arch/powerpc/net/bpf_jit_comp.c | 3 +++
arch/riscv/net/bpf_jit_comp64.c | 3 +++
arch/s390/net/bpf_jit_comp.c | 3 +++
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/btf.c | 2 ++
kernel/bpf/syscall.c | 2 ++
kernel/bpf/trampoline.c | 5 ++++-
kernel/bpf/verifier.c | 12 +++++++++---
net/bpf/test_run.c | 1 +
net/core/bpf_sk_storage.c | 1 +
tools/include/uapi/linux/bpf.h | 1 +
14 files changed, 37 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index ab83089c3d8f..06f4bd6c6755 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2788,6 +2788,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *image, *tmp;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/* image doesn't need to be in module memory range, so we can
* use kvmalloc.
*/
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index cbe53d0b7fb0..ad596341658a 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1739,6 +1739,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *image, *tmp;
struct jit_ctx ctx;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
size = ro_image_end - ro_image;
image = kvmalloc(size, GFP_KERNEL);
if (!image)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 88ad5ba7b87f..bcc0ce09f6fa 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -1017,6 +1017,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
void *rw_image, *tmp;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/*
* rw_image doesn't need to be in module memory range, so we can
* use kvmalloc.
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 45cbc7c6fe49..55b0284bf177 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1286,6 +1286,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
struct rv_jit_context ctx;
u32 size = ro_image_end - ro_image;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
image = kvmalloc(size, GFP_KERNEL);
if (!image)
return -ENOMEM;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index cf461d76e9da..3f25bf55b150 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -2924,6 +2924,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
struct bpf_tramp_jit tjit;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/* Compute offsets, check whether the code fits. */
memset(&tjit, 0, sizeof(tjit));
ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags,
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 86afd9ac6848..aa9f02b56edd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1270,6 +1270,7 @@ enum bpf_tramp_prog_type {
BPF_TRAMP_FENTRY,
BPF_TRAMP_FEXIT,
BPF_TRAMP_MODIFY_RETURN,
+ BPF_TRAMP_SESSION,
BPF_TRAMP_MAX,
BPF_TRAMP_REPLACE, /* more than MAX */
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6829936d33f5..79ba3023e8be 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1133,6 +1133,7 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0de8fc8a0e0b..2c1c3e0caff8 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6107,6 +6107,7 @@ static int btf_validate_prog_ctx_type(struct bpf_verifier_log *log, const struct
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
/* allow u64* as ctx */
if (btf_is_int(t) && t->size == 8)
return 0;
@@ -6704,6 +6705,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
fallthrough;
case BPF_LSM_CGROUP:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
/* When LSM programs are attached to void LSM hooks
* they use FEXIT trampolines and when attached to
* int LSM hooks, they use MODIFY_RETURN trampolines.
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 2a9456a3e730..15ce86b19ca4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3549,6 +3549,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
case BPF_PROG_TYPE_TRACING:
if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
prog->expected_attach_type != BPF_TRACE_FEXIT &&
+ prog->expected_attach_type != BPF_TRACE_SESSION &&
prog->expected_attach_type != BPF_MODIFY_RETURN) {
err = -EINVAL;
goto out_put_prog;
@@ -4322,6 +4323,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
case BPF_TRACE_RAW_TP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
case BPF_MODIFY_RETURN:
return BPF_PROG_TYPE_TRACING;
case BPF_LSM_MAC:
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5949095e51c3..f6d4dea3461e 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -111,7 +111,7 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
return (ptype == BPF_PROG_TYPE_TRACING &&
(eatype == BPF_TRACE_FENTRY || eatype == BPF_TRACE_FEXIT ||
- eatype == BPF_MODIFY_RETURN)) ||
+ eatype == BPF_MODIFY_RETURN || eatype == BPF_TRACE_SESSION)) ||
(ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC);
}
@@ -418,6 +418,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
+ tlinks[BPF_TRAMP_SESSION].nr_links ||
tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
* should not be set together.
@@ -515,6 +516,8 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
return BPF_TRAMP_MODIFY_RETURN;
case BPF_TRACE_FEXIT:
return BPF_TRAMP_FEXIT;
+ case BPF_TRACE_SESSION:
+ return BPF_TRAMP_SESSION;
case BPF_LSM_MAC:
if (!prog->aux->attach_func_proto->type)
/* The function returns void, we cannot modify its
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index c908015b2d34..40e3274e8bc2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17272,6 +17272,7 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
break;
case BPF_TRACE_RAW_TP:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
return 0;
case BPF_TRACE_ITER:
break;
@@ -22727,6 +22728,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
if (prog_type == BPF_PROG_TYPE_TRACING &&
insn->imm == BPF_FUNC_get_func_ret) {
if (eatype == BPF_TRACE_FEXIT ||
+ eatype == BPF_TRACE_SESSION ||
eatype == BPF_MODIFY_RETURN) {
/* Load nr_args from ctx - 8 */
insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
@@ -23668,7 +23670,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
if (tgt_prog->type == BPF_PROG_TYPE_TRACING &&
prog_extension &&
(tgt_prog->expected_attach_type == BPF_TRACE_FENTRY ||
- tgt_prog->expected_attach_type == BPF_TRACE_FEXIT)) {
+ tgt_prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ tgt_prog->expected_attach_type == BPF_TRACE_SESSION)) {
/* Program extensions can extend all program types
* except fentry/fexit. The reason is the following.
* The fentry/fexit programs are used for performance
@@ -23683,7 +23686,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
* beyond reasonable stack size. Hence extending fentry
* is not allowed.
*/
- bpf_log(log, "Cannot extend fentry/fexit\n");
+ bpf_log(log, "Cannot extend fentry/fexit/session\n");
return -EINVAL;
}
} else {
@@ -23767,6 +23770,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
case BPF_LSM_CGROUP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
if (!btf_type_is_func(t)) {
bpf_log(log, "attach_btf_id %u is not a function\n",
btf_id);
@@ -23933,6 +23937,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
case BPF_TRACE_ITER:
+ case BPF_TRACE_SESSION:
return true;
default:
return false;
@@ -24014,9 +24019,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
tgt_info.tgt_name);
return -EINVAL;
} else if ((prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ prog->expected_attach_type == BPF_TRACE_SESSION ||
prog->expected_attach_type == BPF_MODIFY_RETURN) &&
btf_id_set_contains(&noreturn_deny, btf_id)) {
- verbose(env, "Attaching fexit/fmod_ret to __noreturn function '%s' is rejected.\n",
+ verbose(env, "Attaching fexit/session/fmod_ret to __noreturn function '%s' is rejected.\n",
tgt_info.tgt_name);
return -EINVAL;
}
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 05e30ff5b6f9..aa2b5b17a7c7 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -696,6 +696,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
switch (prog->expected_attach_type) {
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
if (bpf_fentry_test1(1) != 2 ||
bpf_fentry_test2(2, 3) != 5 ||
bpf_fentry_test3(4, 5, 6) != 15 ||
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d3fbaf89a698..8da8834aa134 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -365,6 +365,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
return true;
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
return !!strncmp(prog->aux->attach_func_name, "bpf_sk_storage",
strlen("bpf_sk_storage"));
default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6829936d33f5..79ba3023e8be 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1133,6 +1133,7 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
--
2.51.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 1/5] bpf: add tracing session support Menglong Dong
@ 2025-10-18 14:21 ` Menglong Dong
2025-10-20 8:19 ` Jiri Olsa
2025-10-18 14:21 ` [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
` (3 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
If a TRACE_SESSION program exists, we use an extra 8 bytes on the stack
of the trampoline to store the flags that we need. These 8 bytes lie
before the function argument count, i.e. at ctx[-2], and we store the
"is_exit" flag in the lowest bit.
Introduce the kfunc bpf_tracing_is_exit(), which is used to tell whether
the program is currently running at fexit.
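For reference, the resulting ctx layout used by this series on x86_64
looks like this (the return value slot is only meaningful at fexit):
  ctx[-3]      traced function ip (only with BPF_TRAMP_F_IP_ARG)
  ctx[-2]      session flags (bit 0: is_exit)
  ctx[-1]      number of arguments (nr_args)
  ctx[0..n-1]  the function arguments
  ctx[n]       the function's return value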
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
kernel/bpf/verifier.c | 5 ++++-
kernel/trace/bpf_trace.c | 43 +++++++++++++++++++++++++++++++++++++---
2 files changed, 44 insertions(+), 4 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 40e3274e8bc2..a1db11818d01 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12284,6 +12284,7 @@ enum special_kfunc_type {
KF___bpf_trap,
KF_bpf_task_work_schedule_signal,
KF_bpf_task_work_schedule_resume,
+ KF_bpf_tracing_is_exit,
};
BTF_ID_LIST(special_kfunc_list)
@@ -12356,6 +12357,7 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
BTF_ID(func, __bpf_trap)
BTF_ID(func, bpf_task_work_schedule_signal)
BTF_ID(func, bpf_task_work_schedule_resume)
+BTF_ID(func, bpf_tracing_is_exit)
static bool is_task_work_add_kfunc(u32 func_id)
{
@@ -12410,7 +12412,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
struct bpf_reg_state *reg = ®s[regno];
bool arg_mem_size = false;
- if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
+ if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
+ meta->func_id == special_kfunc_list[KF_bpf_tracing_is_exit])
return KF_ARG_PTR_TO_CTX;
/* In this function, we verify the kfunc's BTF as per the argument type,
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f87c16d915a..6dde48b9d27f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3356,12 +3356,49 @@ static const struct btf_kfunc_id_set bpf_kprobe_multi_kfunc_set = {
.filter = bpf_kprobe_multi_filter,
};
-static int __init bpf_kprobe_multi_kfuncs_init(void)
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
+{
+ /* ctx[-2] holds the session flags; the lowest bit is is_exit */
+ return ((u64 *)ctx)[-2] & 1;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(tracing_kfunc_set_ids)
+BTF_ID_FLAGS(func, bpf_tracing_is_exit)
+BTF_KFUNCS_END(tracing_kfunc_set_ids)
+
+static int bpf_tracing_filter(const struct bpf_prog *prog, u32 kfunc_id)
+{
+ if (!btf_id_set8_contains(&tracing_kfunc_set_ids, kfunc_id))
+ return 0;
+
+ if (prog->type != BPF_PROG_TYPE_TRACING ||
+ prog->expected_attach_type != BPF_TRACE_SESSION)
+ return -EINVAL;
+
+ return 0;
+}
+
+static const struct btf_kfunc_id_set bpf_tracing_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &tracing_kfunc_set_ids,
+ .filter = bpf_tracing_filter,
+};
+
+static int __init bpf_trace_kfuncs_init(void)
{
- return register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
+ int err = 0;
+
+ err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
+ err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_tracing_kfunc_set);
+
+ return err;
}
-late_initcall(bpf_kprobe_multi_kfuncs_init);
+late_initcall(bpf_trace_kfuncs_init);
typedef int (*copy_fn_t)(void *dst, const void *src, u32 size, struct task_struct *tsk);
--
2.51.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 1/5] bpf: add tracing session support Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION Menglong Dong
@ 2025-10-18 14:21 ` Menglong Dong
2025-10-19 2:03 ` Menglong Dong
` (2 more replies)
2025-10-18 14:21 ` [PATCH RFC bpf-next 4/5] libbpf: add support for tracing session Menglong Dong
` (2 subsequent siblings)
5 siblings, 3 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
Add BPF_TRACE_SESSION support to x86_64. invoke_bpf_session_entry() and
invoke_bpf_session_exit() are introduced for this purpose.
In invoke_bpf_session_entry(), we check whether the return value of the
fentry invocation is 0, and set the corresponding flag if it is not. In
invoke_bpf_session_exit(), we check whether the corresponding flag is
set; if it is, the fexit invocation is skipped.
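In pseudo-C, the emitted logic is roughly (a sketch, not the generated
code itself):
  u64 flags = 0;                        /* [rbp - session_off] */
  for (i = 0; i < nr_links; i++)
          if (session_prog[i](ctx))
                  flags |= 1ULL << (i + 1); /* skip this prog at exit */
  /* ... the traced function runs here ... */
  flags |= 1;                           /* bit 0: is_exit */
  for (i = 0; i < nr_links; i++)
          if (!(flags & (1ULL << (i + 1))))
                  session_prog[i](ctx);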
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/x86/net/bpf_jit_comp.c | 115 +++++++++++++++++++++++++++++++++++-
1 file changed, 114 insertions(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index d4c93d9e73e4..0586b96ed529 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3108,6 +3108,97 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
return 0;
}
+static int invoke_bpf_session_entry(const struct btf_func_model *m, u8 **pprog,
+ struct bpf_tramp_links *tl, int stack_size,
+ int run_ctx_off, int session_off,
+ void *image, void *rw_image)
+{
+ u64 session_flags;
+ u8 *prog = *pprog;
+ u8 *jmp_insn;
+ int i;
+
+ /* clear the session flags:
+ *
+ * xor rax, rax
+ * mov QWORD PTR [rbp - session_off], rax
+ */
+ EMIT3(0x48, 0x31, 0xC0);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -session_off);
+
+ for (i = 0; i < tl->nr_links; i++) {
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true,
+ image, rw_image))
+ return -EINVAL;
+
+ /* fentry prog stored return value into [rbp - 8]. Emit:
+ * if (*(u64 *)(rbp - 8) != 0)
+ * *(u64 *)(rbp - session_off) |= (1 << (i + 1));
+ */
+ /* cmp QWORD PTR [rbp - 0x8], 0x0 */
+ EMIT4(0x48, 0x83, 0x7d, 0xf8); EMIT1(0x00);
+ /* emit 2 nops that will be replaced with JE insn */
+ jmp_insn = prog;
+ emit_nops(&prog, 2);
+
+ session_flags = (1ULL << (i + 1));
+ /* mov rax, $session_flags */
+ emit_mov_imm64(&prog, BPF_REG_0, session_flags >> 32, (u32) session_flags);
+ /* or QWORD PTR [rbp - session_off], rax */
+ EMIT2(0x48, 0x09);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -session_off);
+
+ jmp_insn[0] = X86_JE;
+ jmp_insn[1] = prog - jmp_insn - 2;
+ }
+
+ *pprog = prog;
+ return 0;
+}
+
+static int invoke_bpf_session_exit(const struct btf_func_model *m, u8 **pprog,
+ struct bpf_tramp_links *tl, int stack_size,
+ int run_ctx_off, int session_off,
+ void *image, void *rw_image)
+{
+ u64 session_flags;
+ u8 *prog = *pprog;
+ u8 *jmp_insn;
+ int i;
+
+ /* set the is_exit flag (bit 0) in the session flags */
+ /* mov rax, 1 */
+ emit_mov_imm32(&prog, false, BPF_REG_0, 1);
+ /* or QWORD PTR [rbp - session_off], rax */
+ EMIT2(0x48, 0x09);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -session_off);
+
+ for (i = 0; i < tl->nr_links; i++) {
+ /* check if (1 << (i+1)) is set in the session flags, and
+ * skip the execution of the fexit program if it is.
+ */
+ session_flags = 1ULL << (i + 1);
+ /* mov rdi, $session_flags */
+ emit_mov_imm64(&prog, BPF_REG_1, session_flags >> 32, (u32) session_flags);
+ /* test QWORD PTR [rbp - session_off], rdi */
+ EMIT2(0x48, 0x85);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_1, -session_off);
+ /* emit 2 nops that will be replaced with JE insn */
+ jmp_insn = prog;
+ emit_nops(&prog, 2);
+
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, false,
+ image, rw_image))
+ return -EINVAL;
+
+ jmp_insn[0] = X86_JNE;
+ jmp_insn[1] = prog - jmp_insn - 2;
+ }
+
+ *pprog = prog;
+ return 0;
+}
+
/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
#define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
__LOAD_TCC_PTR(-round_up(stack, 8) - 8)
@@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
void *func_addr)
{
int i, ret, nr_regs = m->nr_args, stack_size = 0;
- int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
+ int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
+ arg_stack_off, rbx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
void *orig_call = func_addr;
@@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
*
* RBP - nregs_off [ regs count ] always
*
+ * RBP - session_off [ session flags ] tracing session
+ *
* RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
*
* RBP - rbx_off [ rbx value ] always
@@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
/* regs count */
stack_size += 8;
nregs_off = stack_size;
+ stack_size += 8;
+ session_off = stack_size;
if (flags & BPF_TRAMP_F_IP_ARG)
stack_size += 8; /* room for IP address argument */
@@ -3345,6 +3442,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
return -EINVAL;
}
+ if (session->nr_links) {
+ if (invoke_bpf_session_entry(m, &prog, session, regs_off,
+ run_ctx_off, session_off,
+ image, rw_image))
+ return -EINVAL;
+ }
+
if (fmod_ret->nr_links) {
branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
GFP_KERNEL);
@@ -3409,6 +3513,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
}
+ if (session->nr_links) {
+ if (invoke_bpf_session_exit(m, &prog, session, regs_off,
+ run_ctx_off, session_off,
+ image, rw_image)) {
+ ret = -EINVAL;
+ goto cleanup;
+ }
+ }
+
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_regs(m, &prog, regs_off);
--
2.51.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC bpf-next 4/5] libbpf: add support for tracing session
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
` (2 preceding siblings ...)
2025-10-18 14:21 ` [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
@ 2025-10-18 14:21 ` Menglong Dong
2025-10-18 14:21 ` [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases " Menglong Dong
2025-10-20 8:18 ` [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Jiri Olsa
5 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
Add BPF_TRACE_SESSION support to libbpf and bpftool.
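The new section names are "fsession" and "fsession.s" (sleepable). A
program then attaches through the usual tracing path, e.g. (sketch,
target function as in the selftests):
  SEC("fsession/bpf_fentry_test1")
  int BPF_PROG(handler, int a)
  {
          return 0;
  }
On the user-space side, bpf_program__attach() (or the generated
skeleton's attach helper) handles it via attach_trace().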
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/bpf/bpftool/common.c | 1 +
tools/lib/bpf/bpf.c | 2 ++
tools/lib/bpf/libbpf.c | 3 +++
3 files changed, 6 insertions(+)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index e8daf963ecef..534be6cfa2be 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1191,6 +1191,7 @@ const char *bpf_attach_type_input_str(enum bpf_attach_type t)
case BPF_TRACE_FENTRY: return "fentry";
case BPF_TRACE_FEXIT: return "fexit";
case BPF_MODIFY_RETURN: return "mod_ret";
+ case BPF_TRACE_SESSION: return "fsession";
case BPF_SK_REUSEPORT_SELECT: return "sk_skb_reuseport_select";
case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: return "sk_skb_reuseport_select_or_migrate";
default: return libbpf_bpf_attach_type_str(t);
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 339b19797237..caed2b689068 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -794,6 +794,7 @@ int bpf_link_create(int prog_fd, int target_fd,
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
case BPF_LSM_MAC:
attr.link_create.tracing.cookie = OPTS_GET(opts, tracing.cookie, 0);
if (!OPTS_ZEROED(opts, tracing))
@@ -917,6 +918,7 @@ int bpf_link_create(int prog_fd, int target_fd,
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
return bpf_raw_tracepoint_open(NULL, prog_fd);
default:
return libbpf_err(err);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index dd3b2f57082d..e582620cd097 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -115,6 +115,7 @@ static const char * const attach_type_name[] = {
[BPF_TRACE_FENTRY] = "trace_fentry",
[BPF_TRACE_FEXIT] = "trace_fexit",
[BPF_MODIFY_RETURN] = "modify_return",
+ [BPF_TRACE_SESSION] = "trace_session",
[BPF_LSM_MAC] = "lsm_mac",
[BPF_LSM_CGROUP] = "lsm_cgroup",
[BPF_SK_LOOKUP] = "sk_lookup",
@@ -9607,6 +9608,8 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("fentry.s+", TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fmod_ret.s+", TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fexit.s+", TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
+ SEC_DEF("fsession+", TRACING, BPF_TRACE_SESSION, SEC_ATTACH_BTF, attach_trace),
+ SEC_DEF("fsession.s+", TRACING, BPF_TRACE_SESSION, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("freplace+", EXT, 0, SEC_ATTACH_BTF, attach_trace),
SEC_DEF("lsm+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
SEC_DEF("lsm.s+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
--
2.51.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases for tracing session
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
` (3 preceding siblings ...)
2025-10-18 14:21 ` [PATCH RFC bpf-next 4/5] libbpf: add support for tracing session Menglong Dong
@ 2025-10-18 14:21 ` Menglong Dong
2025-10-20 8:19 ` Jiri Olsa
2025-10-20 8:18 ` [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Jiri Olsa
5 siblings, 1 reply; 17+ messages in thread
From: Menglong Dong @ 2025-10-18 14:21 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
Add testcases for BPF_TRACE_SESSION.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
.../selftests/bpf/prog_tests/fsession_test.c | 136 +++++++++++++
.../selftests/bpf/progs/fsession_test.c | 178 ++++++++++++++++++
2 files changed, 314 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
diff --git a/tools/testing/selftests/bpf/prog_tests/fsession_test.c b/tools/testing/selftests/bpf/prog_tests/fsession_test.c
new file mode 100644
index 000000000000..e2913da57b38
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fsession_test.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <test_progs.h>
+#include "fsession_test.skel.h"
+
+static void test_fsession_basic(void)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ struct fsession_test *skel = NULL;
+ int err, prog_fd;
+
+ skel = fsession_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
+ goto cleanup;
+
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_attach"))
+ goto cleanup;
+
+ /* Trigger test function calls */
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ if (!ASSERT_OK(err, "test_run_opts err"))
+ return;
+ if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+ return;
+
+ /* Verify test1: both entry and exit are called */
+ ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_called");
+ ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_called");
+ ASSERT_EQ(skel->bss->test1_entry_result, 1, "test1_entry_result");
+ ASSERT_EQ(skel->bss->test1_exit_result, 1, "test1_exit_result");
+
+ /* Verify test2: entry is called but exit is blocked */
+ ASSERT_EQ(skel->bss->test2_entry_called, 1, "test2_entry_called");
+ ASSERT_EQ(skel->bss->test2_exit_called, 0, "test2_exit_not_called");
+ ASSERT_EQ(skel->bss->test2_entry_result, 1, "test2_entry_result");
+ ASSERT_EQ(skel->bss->test2_exit_result, 0, "test2_exit_result");
+
+ /* Verify test3: both entry and exit are called */
+ ASSERT_EQ(skel->bss->test3_entry_called, 1, "test3_entry_called");
+ ASSERT_EQ(skel->bss->test3_exit_called, 1, "test3_exit_called");
+ ASSERT_EQ(skel->bss->test3_entry_result, 1, "test3_entry_result");
+ ASSERT_EQ(skel->bss->test3_exit_result, 1, "test3_exit_result");
+
+ /* Verify test4: both entry and exit are called */
+ ASSERT_EQ(skel->bss->test4_entry_called, 1, "test4_entry_called");
+ ASSERT_EQ(skel->bss->test4_exit_called, 1, "test4_exit_called");
+ ASSERT_EQ(skel->bss->test4_entry_result, 1, "test4_entry_result");
+ ASSERT_EQ(skel->bss->test4_exit_result, 1, "test4_exit_result");
+
+ /* Verify test5: both entry and exit are called */
+ ASSERT_EQ(skel->bss->test5_entry_called, 1, "test5_entry_called");
+ ASSERT_EQ(skel->bss->test5_exit_called, 1, "test5_exit_called");
+ ASSERT_EQ(skel->bss->test5_entry_result, 1, "test5_entry_result");
+ ASSERT_EQ(skel->bss->test5_exit_result, 1, "test5_exit_result");
+
+ /* Verify test6: entry is called but exit is blocked */
+ ASSERT_EQ(skel->bss->test6_entry_called, 1, "test6_entry_called");
+ ASSERT_EQ(skel->bss->test6_exit_called, 0, "test6_exit_not_called");
+ ASSERT_EQ(skel->bss->test6_entry_result, 1, "test6_entry_result");
+ ASSERT_EQ(skel->bss->test6_exit_result, 0, "test6_exit_result");
+
+ /* Verify test7: entry is called but exit is blocked */
+ ASSERT_EQ(skel->bss->test7_entry_called, 1, "test7_entry_called");
+ ASSERT_EQ(skel->bss->test7_exit_called, 0, "test7_exit_not_called");
+ ASSERT_EQ(skel->bss->test7_entry_result, 1, "test7_entry_result");
+ ASSERT_EQ(skel->bss->test7_exit_result, 0, "test7_exit_result");
+
+cleanup:
+ fsession_test__destroy(skel);
+}
+
+static void test_fsession_reattach(void)
+{
+ struct fsession_test *skel = NULL;
+ int err, prog_fd;
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+ skel = fsession_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
+ goto cleanup;
+
+ /* First attach */
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_first_attach"))
+ goto cleanup;
+
+ /* Trigger test function calls */
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ if (!ASSERT_OK(err, "test_run_opts err"))
+ return;
+ if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+ return;
+
+ /* Verify first call */
+ ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_first");
+ ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_first");
+
+ /* Detach */
+ fsession_test__detach(skel);
+
+ /* Reset counters */
+ memset(skel->bss, 0, sizeof(*skel->bss));
+
+ /* Second attach */
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_second_attach"))
+ goto cleanup;
+
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ if (!ASSERT_OK(err, "test_run_opts err"))
+ return;
+ if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+ return;
+
+ /* Verify second call */
+ ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_second");
+ ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_second");
+
+cleanup:
+ fsession_test__destroy(skel);
+}
+
+void test_fsession_test(void)
+{
+#if !defined(__x86_64__)
+ test__skip();
+ return;
+#endif
+ if (test__start_subtest("fsession_basic"))
+ test_fsession_basic();
+ if (test__start_subtest("fsession_reattach"))
+ test_fsession_reattach();
+}
diff --git a/tools/testing/selftests/bpf/progs/fsession_test.c b/tools/testing/selftests/bpf/progs/fsession_test.c
new file mode 100644
index 000000000000..cce2b32f7c2c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fsession_test.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 test1_entry_result = 0;
+__u64 test1_exit_result = 0;
+__u64 test1_entry_called = 0;
+__u64 test1_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test1, int a)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test1_entry_called = 1;
+ test1_entry_result = a == 1;
+ return 0; /* Return 0 to allow exit to be called */
+ }
+
+ /* This is exit */
+ test1_exit_called = 1;
+ test1_exit_result = a == 1;
+ return 0;
+}
+
+__u64 test2_entry_result = 0;
+__u64 test2_exit_result = 0;
+__u64 test2_entry_called = 0;
+__u64 test2_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test2")
+int BPF_PROG(test2, int a, __u64 b)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test2_entry_called = 1;
+ test2_entry_result = a == 2 && b == 3;
+ return 1; /* Return non-zero value to block exit call */
+ }
+
+ /* This is exit - should not be called due to blocking */
+ test2_exit_called = 1;
+ test2_exit_result = a == 2 && b == 3;
+ return 0;
+}
+
+__u64 test3_entry_result = 0;
+__u64 test3_exit_result = 0;
+__u64 test3_entry_called = 0;
+__u64 test3_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test3")
+int BPF_PROG(test3, char a, int b, __u64 c)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test3_entry_called = 1;
+ test3_entry_result = a == 4 && b == 5 && c == 6;
+ return 0; /* Allow exit to be called */
+ }
+
+ /* This is exit */
+ test3_exit_called = 1;
+ test3_exit_result = a == 4 && b == 5 && c == 6;
+ return 0;
+}
+
+__u64 test4_entry_result = 0;
+__u64 test4_exit_result = 0;
+__u64 test4_entry_called = 0;
+__u64 test4_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test4")
+int BPF_PROG(test4, void *a, char b, int c, __u64 d)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test4_entry_called = 1;
+ test4_entry_result = a == (void *)7 && b == 8 && c == 9 && d == 10;
+ return 0; /* Allow exit to be called */
+ }
+
+ /* This is exit */
+ test4_exit_called = 1;
+ test4_exit_result = a == (void *)7 && b == 8 && c == 9 && d == 10;
+ return 0;
+}
+
+__u64 test5_entry_result = 0;
+__u64 test5_exit_result = 0;
+__u64 test5_entry_called = 0;
+__u64 test5_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test7")
+int BPF_PROG(test5, struct bpf_fentry_test_t *arg)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test5_entry_called = 1;
+ if (!arg)
+ test5_entry_result = 1;
+ return 0; /* Allow exit to be called */
+ }
+
+ /* This is exit */
+ test5_exit_called = 1;
+ if (!arg)
+ test5_exit_result = 1;
+ return 0;
+}
+
+__u64 test6_entry_result = 0;
+__u64 test6_exit_result = 0;
+__u64 test6_entry_called = 0;
+__u64 test6_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test5")
+int BPF_PROG(test6, __u64 a, void *b, short c, int d, __u64 e)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test6_entry_called = 1;
+ test6_entry_result = a == 11 && b == (void *)12 && c == 13 && d == 14 &&
+ e == 15;
+ /* Decide whether to block exit call based on condition */
+ if (a == 11)
+ return 1; /* Block exit call */
+ return 0;
+ }
+
+ /* This is exit - should not be called due to blocking */
+ test6_exit_called = 1;
+ test6_exit_result = a == 11 && b == (void *)12 && c == 13 && d == 14 &&
+ e == 15;
+ return 0;
+}
+
+__u64 test7_entry_result = 0;
+__u64 test7_exit_result = 0;
+__u64 test7_entry_called = 0;
+__u64 test7_exit_called = 0;
+
+SEC("fsession/bpf_fentry_test6")
+int BPF_PROG(test7, __u64 a, void *b, short c, int d, void *e, __u64 f)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test7_entry_called = 1;
+ test7_entry_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+ e == (void *)20 && f == 21;
+ /* Return non-zero to block exit call */
+ return 1;
+ }
+
+ /* This is exit - should not be called due to blocking */
+ test7_exit_called = 1;
+ test7_exit_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+ e == (void *)20 && f == 21;
+ return 0;
+}
--
2.51.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-18 14:21 ` [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
@ 2025-10-19 2:03 ` Menglong Dong
2025-10-20 8:19 ` Jiri Olsa
2025-10-21 18:16 ` Alexei Starovoitov
2 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-19 2:03 UTC (permalink / raw)
To: ast, jolsa, Menglong Dong
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/18 22:21, Menglong Dong wrote:
> Add BPF_TRACE_SESSION support to x86_64. invoke_bpf_session_entry() and
> invoke_bpf_session_exit() are introduced for this purpose.
>
> In invoke_bpf_session_entry(), we check whether the return value of the
> fentry invocation is 0, and set the corresponding flag if it is not. In
> invoke_bpf_session_exit(), we check whether the corresponding flag is
> set; if it is, the fexit invocation is skipped.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> arch/x86/net/bpf_jit_comp.c | 115 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 114 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index d4c93d9e73e4..0586b96ed529 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -3108,6 +3108,97 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
> return 0;
> }
>
> +static int invoke_bpf_session_entry(const struct btf_func_model *m, u8 **pprog,
> + struct bpf_tramp_links *tl, int stack_size,
> + int run_ctx_off, int session_off,
> + void *image, void *rw_image)
> +{
> + u64 session_flags;
> + u8 *prog = *pprog;
> + u8 *jmp_insn;
> + int i;
> +
> + /* clear the session flags:
> + *
> + * xor rax, rax
> + * mov QWORD PTR [rbp - session_off], rax
> + */
> + EMIT3(0x48, 0x31, 0xC0);
> + emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -session_off);
> +
> + for (i = 0; i < tl->nr_links; i++) {
> + if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true,
> + image, rw_image))
> + return -EINVAL;
> +
> + /* fentry prog stored return value into [rbp - 8]. Emit:
> + * if (*(u64 *)(rbp - 8) != 0)
> + * *(u64 *)(rbp - session_off) |= (1 << (i + 1));
> + */
> + /* cmp QWORD PTR [rbp - 0x8], 0x0 */
> + EMIT4(0x48, 0x83, 0x7d, 0xf8); EMIT1(0x00);
> + /* emit 2 nops that will be replaced with JE insn */
> + jmp_insn = prog;
> + emit_nops(&prog, 2);
> +
> + session_flags = (1ULL << (i + 1));
> + /* mov rax, $session_flags */
> + emit_mov_imm64(&prog, BPF_REG_0, session_flags >> 32, (u32) session_flags);
> + /* or QWORD PTR [rbp - session_off], rax */
> + EMIT2(0x48, 0x09);
> + emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -session_off);
> +
> + jmp_insn[0] = X86_JE;
> + jmp_insn[1] = prog - jmp_insn - 2;
> + }
> +
> + *pprog = prog;
> + return 0;
> +}
> +
> +static int invoke_bpf_session_exit(const struct btf_func_model *m, u8 **pprog,
> + struct bpf_tramp_links *tl, int stack_size,
> + int run_ctx_off, int session_off,
> + void *image, void *rw_image)
> +{
> + u64 session_flags;
> + u8 *prog = *pprog;
> + u8 *jmp_insn;
> + int i;
> +
> + /* set the is_exit flag (bit 0) in the session flags */
> + /* mov rax, 1 */
> + emit_mov_imm32(&prog, false, BPF_REG_0, 1);
> + /* or QWORD PTR [rbp - session_off], rax */
> + EMIT2(0x48, 0x09);
> + emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -session_off);
> +
> + for (i = 0; i < tl->nr_links; i++) {
> + /* check if (1 << (i+1)) is set in the session flags, and
> + * skip the execution of the fexit program if it is.
> + */
> + session_flags = 1ULL << (i + 1);
> + /* mov rdi, $session_flags */
> + emit_mov_imm64(&prog, BPF_REG_1, session_flags >> 32, (u32) session_flags);
> + /* test QWORD PTR [rbp - session_off], rdi */
> + EMIT2(0x48, 0x85);
> + emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_1, -session_off);
> + /* emit 2 nops that will be replaced with JE insn */
> + jmp_insn = prog;
> + emit_nops(&prog, 2);
> +
> + if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, false,
> + image, rw_image))
> + return -EINVAL;
> +
> + jmp_insn[0] = X86_JNE;
> + jmp_insn[1] = prog - jmp_insn - 2;
> + }
> +
> + *pprog = prog;
> + return 0;
> +}
> +
> /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
> __LOAD_TCC_PTR(-round_up(stack, 8) - 8)
> @@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> void *func_addr)
> {
> int i, ret, nr_regs = m->nr_args, stack_size = 0;
> - int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
> + int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
> + arg_stack_off, rbx_off;
> struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
> + struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
> struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> void *orig_call = func_addr;
> @@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> *
> * RBP - nregs_off [ regs count ] always
> *
> + * RBP - session_off [ session flags ] tracing session
> + *
> * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
> *
> * RBP - rbx_off [ rbx value ] always
> @@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> /* regs count */
> stack_size += 8;
> nregs_off = stack_size;
> + stack_size += 8;
> + session_off = stack_size;
Oops, this breaks bpf_get_func_ip(), which reads the ip from ctx[-2].
I'll introduce a "bpf_get_func_ip_proto_tracing_session" to fix it.
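Something along the lines of the existing get_func_ip fixup, but one
slot lower, should work (untested sketch; with the layout from patch 2
the ip ends up at ctx[-3], i.e. ctx - 24):
  /* hypothetical fixup for BPF_FUNC_get_func_ip on BPF_TRACE_SESSION:
   * load the ip from ctx - 24 instead of ctx - 16
   */
  insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -24);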
>
> if (flags & BPF_TRAMP_F_IP_ARG)
> stack_size += 8; /* room for IP address argument */
> @@ -3345,6 +3442,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> return -EINVAL;
> }
>
> + if (session->nr_links) {
> + if (invoke_bpf_session_entry(m, &prog, session, regs_off,
> + run_ctx_off, session_off,
> + image, rw_image))
> + return -EINVAL;
> + }
> +
> if (fmod_ret->nr_links) {
> branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
> GFP_KERNEL);
> @@ -3409,6 +3513,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> }
> }
>
> + if (session->nr_links) {
> + if (invoke_bpf_session_exit(m, &prog, session, regs_off,
> + run_ctx_off, session_off,
> + image, rw_image)) {
> + ret = -EINVAL;
> + goto cleanup;
> + }
> + }
> +
> if (flags & BPF_TRAMP_F_RESTORE_REGS)
> restore_regs(m, &prog, regs_off);
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 0/5] bpf: tracing session supporting
2025-10-18 14:21 [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Menglong Dong
` (4 preceding siblings ...)
2025-10-18 14:21 ` [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases " Menglong Dong
@ 2025-10-20 8:18 ` Jiri Olsa
2025-10-20 8:55 ` Menglong Dong
5 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2025-10-20 8:18 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On Sat, Oct 18, 2025 at 10:21:19PM +0800, Menglong Dong wrote:
> Sometimes, we need to hook both the entry and the exit of a function with
> TRACING. Today that requires defining both a FENTRY and a FEXIT program
> for the target function, which is not convenient.
>
> Therefore, we add tracing session support to TRACING. Generally
> speaking, it's similar to kprobe session: it can hook both the entry
> and the exit of a function with a single BPF program. Meanwhile, it can
> also control the execution of the fexit part with the return value of
> the fentry part: a non-zero return at entry skips the exit invocation.
> Session cookies are not supported yet, and I'm not sure if they are
> necessary.
hi,
I think it'd be useful to have support for cookies: people that use kprobe
session because of multi attach could easily migrate to trampolines once
we have fast multi attach for trampolines.
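For reference, kprobe session programs pass state from entry to exit
via the bpf_session_cookie() kfunc, roughly like this (sketch), so an
equivalent would be needed for that migration:
  SEC("kprobe.session/do_sys_openat2")
  int handler(struct pt_regs *regs)
  {
          __u64 *cookie = bpf_session_cookie();
          if (!bpf_session_is_return()) {
                  *cookie = bpf_ktime_get_ns(); /* entry: stash timestamp */
                  return 0;
          }
          /* exit: *cookie holds the entry timestamp */
          return 0;
  }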
jirka
>
> For now, only x86_64 is supported. Other architectures will be supported
> later.
>
> Menglong Dong (5):
> bpf: add tracing session support
> bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
> bpf,x86: add tracing session supporting for x86_64
> libbpf: add support for tracing session
> selftests/bpf: add testcases for tracing session
>
> arch/arm64/net/bpf_jit_comp.c | 3 +
> arch/loongarch/net/bpf_jit.c | 3 +
> arch/powerpc/net/bpf_jit_comp.c | 3 +
> arch/riscv/net/bpf_jit_comp64.c | 3 +
> arch/s390/net/bpf_jit_comp.c | 3 +
> arch/x86/net/bpf_jit_comp.c | 115 ++++++++++-
> include/linux/bpf.h | 1 +
> include/uapi/linux/bpf.h | 1 +
> kernel/bpf/btf.c | 2 +
> kernel/bpf/syscall.c | 2 +
> kernel/bpf/trampoline.c | 5 +-
> kernel/bpf/verifier.c | 17 +-
> kernel/trace/bpf_trace.c | 43 ++++-
> net/bpf/test_run.c | 1 +
> net/core/bpf_sk_storage.c | 1 +
> tools/bpf/bpftool/common.c | 1 +
> tools/include/uapi/linux/bpf.h | 1 +
> tools/lib/bpf/bpf.c | 2 +
> tools/lib/bpf/libbpf.c | 3 +
> .../selftests/bpf/prog_tests/fsession_test.c | 132 +++++++++++++
> .../selftests/bpf/progs/fsession_test.c | 178 ++++++++++++++++++
> 21 files changed, 511 insertions(+), 9 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
> create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
>
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
2025-10-18 14:21 ` [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION Menglong Dong
@ 2025-10-20 8:19 ` Jiri Olsa
2025-10-20 8:30 ` Menglong Dong
0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2025-10-20 8:19 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On Sat, Oct 18, 2025 at 10:21:21PM +0800, Menglong Dong wrote:
> If a TRACE_SESSION program exists, we use an extra 8 bytes on the stack
> of the trampoline to store the flags that we need. These 8 bytes lie
> before the function argument count, i.e. at ctx[-2], and we store the
> "is_exit" flag in the lowest bit.
>
> Introduce the kfunc bpf_tracing_is_exit(), which is used to tell whether
> the program is currently running at fexit.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> kernel/bpf/verifier.c | 5 ++++-
> kernel/trace/bpf_trace.c | 43 +++++++++++++++++++++++++++++++++++++---
> 2 files changed, 44 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 40e3274e8bc2..a1db11818d01 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12284,6 +12284,7 @@ enum special_kfunc_type {
> KF___bpf_trap,
> KF_bpf_task_work_schedule_signal,
> KF_bpf_task_work_schedule_resume,
> + KF_bpf_tracing_is_exit,
> };
>
> BTF_ID_LIST(special_kfunc_list)
> @@ -12356,6 +12357,7 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
> BTF_ID(func, __bpf_trap)
> BTF_ID(func, bpf_task_work_schedule_signal)
> BTF_ID(func, bpf_task_work_schedule_resume)
> +BTF_ID(func, bpf_tracing_is_exit)
>
> static bool is_task_work_add_kfunc(u32 func_id)
> {
> @@ -12410,7 +12412,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> struct bpf_reg_state *reg = ®s[regno];
> bool arg_mem_size = false;
>
> - if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
> + if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> + meta->func_id == special_kfunc_list[KF_bpf_tracing_is_exit])
> return KF_ARG_PTR_TO_CTX;
>
> /* In this function, we verify the kfunc's BTF as per the argument type,
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 4f87c16d915a..6dde48b9d27f 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3356,12 +3356,49 @@ static const struct btf_kfunc_id_set bpf_kprobe_multi_kfunc_set = {
> .filter = bpf_kprobe_multi_filter,
> };
>
> -static int __init bpf_kprobe_multi_kfuncs_init(void)
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
> +{
> + /* ctx[-2] holds the session flags; the lowest bit is is_exit */
> + return ((u64 *)ctx)[-2] & 1;
> +}
I think this could be inlined by the verifier
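e.g. roughly, mirroring the existing get_func_ret fixup in
do_misc_fixups() (untested sketch; -16 is the ctx[-2] slot from patch 2):
  /* inline bpf_tracing_is_exit(ctx) */
  insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);
  insn_buf[1] = BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1);
  cnt = 2;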
jirka
> +
> +__bpf_kfunc_end_defs();
SNIP
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-18 14:21 ` [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
2025-10-19 2:03 ` Menglong Dong
@ 2025-10-20 8:19 ` Jiri Olsa
2025-10-20 8:31 ` Menglong Dong
2025-10-21 18:16 ` Alexei Starovoitov
2 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2025-10-20 8:19 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On Sat, Oct 18, 2025 at 10:21:22PM +0800, Menglong Dong wrote:
SNIP
> /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
> __LOAD_TCC_PTR(-round_up(stack, 8) - 8)
> @@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> void *func_addr)
> {
> int i, ret, nr_regs = m->nr_args, stack_size = 0;
> - int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
> + int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
> + arg_stack_off, rbx_off;
> struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
> + struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
> struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> void *orig_call = func_addr;
> @@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> *
> * RBP - nregs_off [ regs count ] always
> *
> + * RBP - session_off [ session flags ] tracing session
> + *
> * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
> *
> * RBP - rbx_off [ rbx value ] always
> @@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> /* regs count */
> stack_size += 8;
> nregs_off = stack_size;
> + stack_size += 8;
> + session_off = stack_size;
should this depend on session->nr_links ?
jirka
>
> if (flags & BPF_TRAMP_F_IP_ARG)
> stack_size += 8; /* room for IP address argument */
> @@ -3345,6 +3442,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> return -EINVAL;
> }
>
> + if (session->nr_links) {
> + if (invoke_bpf_session_entry(m, &prog, session, regs_off,
> + run_ctx_off, session_off,
> + image, rw_image))
> + return -EINVAL;
> + }
> +
> if (fmod_ret->nr_links) {
> branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
> GFP_KERNEL);
> @@ -3409,6 +3513,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> }
> }
>
> + if (session->nr_links) {
> + if (invoke_bpf_session_exit(m, &prog, session, regs_off,
> + run_ctx_off, session_off,
> + image, rw_image)) {
> + ret = -EINVAL;
> + goto cleanup;
> + }
> + }
> +
> if (flags & BPF_TRAMP_F_RESTORE_REGS)
> restore_regs(m, &prog, regs_off);
>
> --
> 2.51.0
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases for tracing session
2025-10-18 14:21 ` [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases " Menglong Dong
@ 2025-10-20 8:19 ` Jiri Olsa
2025-10-20 8:40 ` Menglong Dong
0 siblings, 1 reply; 17+ messages in thread
From: Jiri Olsa @ 2025-10-20 8:19 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On Sat, Oct 18, 2025 at 10:21:24PM +0800, Menglong Dong wrote:
SNIP
> +static void test_fsession_reattach(void)
> +{
> + struct fsession_test *skel = NULL;
> + int err, prog_fd;
> + LIBBPF_OPTS(bpf_test_run_opts, topts);
> +
> + skel = fsession_test__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
> + goto cleanup;
> +
> + /* First attach */
> + err = fsession_test__attach(skel);
> + if (!ASSERT_OK(err, "fsession_first_attach"))
> + goto cleanup;
> +
> + /* Trigger test function calls */
> + prog_fd = bpf_program__fd(skel->progs.test1);
> + err = bpf_prog_test_run_opts(prog_fd, &topts);
> + if (!ASSERT_OK(err, "test_run_opts err"))
> + return;
goto cleanup
> + if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
> + return;
goto cleanup
> +
> + /* Verify first call */
> + ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_first");
> + ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_first");
> +
> + /* Detach */
> + fsession_test__detach(skel);
> +
> + /* Reset counters */
> + memset(skel->bss, 0, sizeof(*skel->bss));
> +
> + /* Second attach */
> + err = fsession_test__attach(skel);
> + if (!ASSERT_OK(err, "fsession_second_attach"))
> + goto cleanup;
> +
> + err = bpf_prog_test_run_opts(prog_fd, &topts);
> + if (!ASSERT_OK(err, "test_run_opts err"))
> + return;
goto cleanup
> + if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
> + return;
goto cleanup
> +
> + /* Verify second call */
> + ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_second");
> + ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_second");
> +
> +cleanup:
> + fsession_test__destroy(skel);
> +}
> +
> +void test_fsession_test(void)
> +{
> +#if !defined(__x86_64__)
> + test__skip();
> + return;
> +#endif
> + if (test__start_subtest("fsession_basic"))
> + test_fsession_basic();
> + if (test__start_subtest("fsession_reattach"))
> + test_fsession_reattach();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/fsession_test.c b/tools/testing/selftests/bpf/progs/fsession_test.c
> new file mode 100644
> index 000000000000..cce2b32f7c2c
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/fsession_test.c
> @@ -0,0 +1,178 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2025 ChinaTelecom */
> +#include <vmlinux.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +__u64 test1_entry_result = 0;
> +__u64 test1_exit_result = 0;
> +__u64 test1_entry_called = 0;
> +__u64 test1_exit_called = 0;
> +
> +SEC("fsession/bpf_fentry_test1")
> +int BPF_PROG(test1, int a)
> +{
I guess we can access the return argument directly, but it makes sense only
for the exit session program; or we could use bpf_get_func_ret
jirka
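For reference, a minimal sketch of the bpf_get_func_ret() variant suggested
here; the program name is hypothetical, and this assumes fsession keeps the
usual tracing ctx layout:

	SEC("fsession/bpf_fentry_test1")
	int BPF_PROG(test1_ret, int a)
	{
		__u64 ret = 0;

		if (!bpf_tracing_is_exit(ctx))
			return 0;	/* no return value yet on entry */

		/* on exit, fetch bpf_fentry_test1()'s return value */
		bpf_get_func_ret(ctx, &ret);
		return 0;
	}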
> + bool is_exit = bpf_tracing_is_exit(ctx);
> +
> + if (!is_exit) {
> + /* This is entry */
> + test1_entry_called = 1;
> + test1_entry_result = a == 1;
> + return 0; /* Return 0 to allow exit to be called */
> + }
> +
> + /* This is exit */
> + test1_exit_called = 1;
> + test1_exit_result = a == 1;
> + return 0;
> +}
> +
> +__u64 test2_entry_result = 0;
> +__u64 test2_exit_result = 0;
> +__u64 test2_entry_called = 0;
> +__u64 test2_exit_called = 0;
> +
SNIP
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 2/5] bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
2025-10-20 8:19 ` Jiri Olsa
@ 2025-10-20 8:30 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-20 8:30 UTC (permalink / raw)
To: Menglong Dong, Jiri Olsa
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/20 16:19, Jiri Olsa wrote:
> On Sat, Oct 18, 2025 at 10:21:21PM +0800, Menglong Dong wrote:
> > If TRACE_SESSION exists, we use an extra 8 bytes on the stack of the
> > trampoline to store the flags we need; these 8 bytes lie before
> > the function argument count, which means ctx[-2]. We store the
> > "is_exit" flag in the first bit of it.
> >
> > Introduce the kfunc bpf_tracing_is_exit(), which is used to tell
> > whether we are currently at fexit.
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> > Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
> > Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> > ---
> > kernel/bpf/verifier.c | 5 ++++-
> > kernel/trace/bpf_trace.c | 43 +++++++++++++++++++++++++++++++++++++---
> > 2 files changed, 44 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 40e3274e8bc2..a1db11818d01 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12284,6 +12284,7 @@ enum special_kfunc_type {
> > KF___bpf_trap,
> > KF_bpf_task_work_schedule_signal,
> > KF_bpf_task_work_schedule_resume,
> > + KF_bpf_tracing_is_exit,
> > };
> >
> > BTF_ID_LIST(special_kfunc_list)
> > @@ -12356,6 +12357,7 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
> > BTF_ID(func, __bpf_trap)
> > BTF_ID(func, bpf_task_work_schedule_signal)
> > BTF_ID(func, bpf_task_work_schedule_resume)
> > +BTF_ID(func, bpf_tracing_is_exit)
> >
> > static bool is_task_work_add_kfunc(u32 func_id)
> > {
> > @@ -12410,7 +12412,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> > struct bpf_reg_state *reg = ®s[regno];
> > bool arg_mem_size = false;
> >
> > - if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
> > + if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> > + meta->func_id == special_kfunc_list[KF_bpf_tracing_is_exit])
> > return KF_ARG_PTR_TO_CTX;
> >
> > /* In this function, we verify the kfunc's BTF as per the argument type,
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 4f87c16d915a..6dde48b9d27f 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -3356,12 +3356,49 @@ static const struct btf_kfunc_id_set bpf_kprobe_multi_kfunc_set = {
> > .filter = bpf_kprobe_multi_filter,
> > };
> >
> > -static int __init bpf_kprobe_multi_kfuncs_init(void)
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
> > +{
> > + /* ctx[-2] is the session flags, and the last bit is is_exit */
> > + return ((u64 *)ctx)[-2] & 1;
> > +}
>
> I think this could be inlined by verifier
Yeah, that makes sense. I'll inline it in the next version.
Thanks!
Menglong Dong
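Since ctx[-2] holds the session flags in this version, the inlining can be a
single load plus a mask. A sketch of what fixup_kfunc_call() might emit (an
assumption about the follow-up, not posted code):

	/* inline bpf_tracing_is_exit(ctx): r0 = ((u64 *)r1)[-2] & 1 */
	insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);
	insn_buf[1] = BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1);
	*cnt = 2;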
>
> jirka
>
>
> > +
> > +__bpf_kfunc_end_defs();
>
> SNIP
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-20 8:19 ` Jiri Olsa
@ 2025-10-20 8:31 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-20 8:31 UTC (permalink / raw)
To: Menglong Dong, Jiri Olsa
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/20 16:19, Jiri Olsa wrote:
> On Sat, Oct 18, 2025 at 10:21:22PM +0800, Menglong Dong wrote:
>
> SNIP
>
> > /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> > #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
> > __LOAD_TCC_PTR(-round_up(stack, 8) - 8)
> > @@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > void *func_addr)
> > {
> > int i, ret, nr_regs = m->nr_args, stack_size = 0;
> > - int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
> > + int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
> > + arg_stack_off, rbx_off;
> > struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
> > + struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
> > struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> > struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> > void *orig_call = func_addr;
> > @@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > *
> > * RBP - nregs_off [ regs count ] always
> > *
> > + * RBP - session_off [ session flags ] tracing session
> > + *
> > * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
> > *
> > * RBP - rbx_off [ rbx value ] always
> > @@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > /* regs count */
> > stack_size += 8;
> > nregs_off = stack_size;
> > + stack_size += 8;
> > + session_off = stack_size;
>
> should this depend on session->nr_links ?
Hmm... my mistake, it should. And this also breaks bpf_get_func_ip(),
which I'll fix in the next version.
>
> jirka
>
> >
> > if (flags & BPF_TRAMP_F_IP_ARG)
> > stack_size += 8; /* room for IP address argument */
> > @@ -3345,6 +3442,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > return -EINVAL;
> > }
> >
> > + if (session->nr_links) {
> > + if (invoke_bpf_session_entry(m, &prog, session, regs_off,
> > + run_ctx_off, session_off,
> > + image, rw_image))
> > + return -EINVAL;
> > + }
> > +
> > if (fmod_ret->nr_links) {
> > branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
> > GFP_KERNEL);
> > @@ -3409,6 +3513,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > }
> > }
> >
> > + if (session->nr_links) {
> > + if (invoke_bpf_session_exit(m, &prog, session, regs_off,
> > + run_ctx_off, session_off,
> > + image, rw_image)) {
> > + ret = -EINVAL;
> > + goto cleanup;
> > + }
> > + }
> > +
> > if (flags & BPF_TRAMP_F_RESTORE_REGS)
> > restore_regs(m, &prog, regs_off);
> >
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 5/5] selftests/bpf: add testcases for tracing session
2025-10-20 8:19 ` Jiri Olsa
@ 2025-10-20 8:40 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-20 8:40 UTC (permalink / raw)
To: Menglong Dong, Jiri Olsa
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/20 16:19, Jiri Olsa wrote:
> On Sat, Oct 18, 2025 at 10:21:24PM +0800, Menglong Dong wrote:
>
> SNIP
>
> > +static void test_fsession_reattach(void)
> > +{
> > + struct fsession_test *skel = NULL;
> > + int err, prog_fd;
> > + LIBBPF_OPTS(bpf_test_run_opts, topts);
> > +
> > + skel = fsession_test__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
> > + goto cleanup;
> > +
> > + /* First attach */
> > + err = fsession_test__attach(skel);
> > + if (!ASSERT_OK(err, "fsession_first_attach"))
> > + goto cleanup;
> > +
> > + /* Trigger test function calls */
> > + prog_fd = bpf_program__fd(skel->progs.test1);
> > + err = bpf_prog_test_run_opts(prog_fd, &topts);
> > + if (!ASSERT_OK(err, "test_run_opts err"))
> > + return;
>
> goto cleanup
ACK.
>
> > + if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
> > + return;
>
> goto cleanup
ACK.
>
> > +
> > + /* Verify first call */
> > + ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_first");
> > + ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_first");
> > +
> > + /* Detach */
> > + fsession_test__detach(skel);
> > +
> > + /* Reset counters */
> > + memset(skel->bss, 0, sizeof(*skel->bss));
> > +
> > + /* Second attach */
> > + err = fsession_test__attach(skel);
> > + if (!ASSERT_OK(err, "fsession_second_attach"))
> > + goto cleanup;
> > +
> > + err = bpf_prog_test_run_opts(prog_fd, &topts);
> > + if (!ASSERT_OK(err, "test_run_opts err"))
> > + return;
>
> goto cleanup
ACK.
>
> > + if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
> > + return;
>
> goto cleanup
ACK.
>
> > +
> > + /* Verify second call */
> > + ASSERT_EQ(skel->bss->test1_entry_called, 1, "test1_entry_second");
> > + ASSERT_EQ(skel->bss->test1_exit_called, 1, "test1_exit_second");
> > +
> > +cleanup:
> > + fsession_test__destroy(skel);
> > +}
> > +
> > +void test_fsession_test(void)
> > +{
> > +#if !defined(__x86_64__)
> > + test__skip();
> > + return;
> > +#endif
> > + if (test__start_subtest("fsession_basic"))
> > + test_fsession_basic();
> > + if (test__start_subtest("fsession_reattach"))
> > + test_fsession_reattach();
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/fsession_test.c b/tools/testing/selftests/bpf/progs/fsession_test.c
> > new file mode 100644
> > index 000000000000..cce2b32f7c2c
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/fsession_test.c
> > @@ -0,0 +1,178 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2025 ChinaTelecom */
> > +#include <vmlinux.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +char _license[] SEC("license") = "GPL";
> > +
> > +__u64 test1_entry_result = 0;
> > +__u64 test1_exit_result = 0;
> > +__u64 test1_entry_called = 0;
> > +__u64 test1_exit_called = 0;
> > +
> > +SEC("fsession/bpf_fentry_test1")
> > +int BPF_PROG(test1, int a)
> > +{
>
> I guess we can access the return argument directly, but it makes sense only
> for the exit session program; or we could use bpf_get_func_ret
Yeah, we can access the return value directly here or use
bpf_get_func_ret(). For fentry, accessing the return value is also allowed,
but it makes no sense to obtain it there: what it gets is just whatever
the previous fsession fentry returned. And it would take more effort in
the verifier to forbid such an operation.
The testcases are not complete, and I'll add more in the
next version to cover more cases.
Thanks!
Menglong Dong
>
> jirka
>
>
> > + bool is_exit = bpf_tracing_is_exit(ctx);
> > +
> > + if (!is_exit) {
> > + /* This is entry */
> > + test1_entry_called = 1;
> > + test1_entry_result = a == 1;
> > + return 0; /* Return 0 to allow exit to be called */
> > + }
> > +
> > + /* This is exit */
> > + test1_exit_called = 1;
> > + test1_exit_result = a == 1;
> > + return 0;
> > +}
> > +
> > +__u64 test2_entry_result = 0;
> > +__u64 test2_exit_result = 0;
> > +__u64 test2_entry_called = 0;
> > +__u64 test2_exit_called = 0;
> > +
>
> SNIP
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 0/5] bpf: tracing session supporting
2025-10-20 8:18 ` [PATCH RFC bpf-next 0/5] bpf: tracing session supporting Jiri Olsa
@ 2025-10-20 8:55 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-20 8:55 UTC (permalink / raw)
To: Menglong Dong, Jiri Olsa
Cc: ast, daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, mathieu.desnoyers, leon.hwang, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/20 16:18, Jiri Olsa wrote:
> On Sat, Oct 18, 2025 at 10:21:19PM +0800, Menglong Dong wrote:
> > Sometimes, we need to hook both the entry and exit of a function with
> > TRACING. Therefore, we need define a FENTRY and a FEXIT for the target
> > function, which is not convenient.
> >
> > Therefore, we add a tracing session support for TRACING. Generally
> > speaking, it's similar to kprobe session, which can hook both the entry
> > and exit of a function with a single BPF program. Meanwhile, it can also
> > control the execution of the fexit with the return value of the fentry.
> > session cookie is not supported yet, and I'm not sure if it's necessary.
>
> hi,
> I think it'd be useful to have support for cookies; people who use kprobe
> session because of multi attach could easily migrate to trampolines once
> we have fast multi attach for trampolines
OK, I'll implement it in the next version.
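By analogy with kprobe session's bpf_session_cookie(), usage could look like
the sketch below; bpf_tracing_session_cookie() is a hypothetical name for the
not-yet-written kfunc:

	SEC("fsession/bpf_fentry_test1")
	int BPF_PROG(test_cookie, int a)
	{
		/* hypothetical kfunc, mirroring bpf_session_cookie() */
		__u64 *cookie = bpf_tracing_session_cookie(ctx);

		if (!cookie)
			return 0;

		if (!bpf_tracing_is_exit(ctx)) {
			*cookie = bpf_ktime_get_ns();	/* stash entry timestamp */
			return 0;
		}

		/* on exit, the value written at entry is visible again */
		bpf_printk("latency: %llu ns", bpf_ktime_get_ns() - *cookie);
		return 0;
	}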
>
> jirka
>
>
> >
> > For now, only x86_64 is supported. Other architectures will be supported
> > later.
> >
> > Menglong Dong (5):
> > bpf: add tracing session support
> > bpf: add kfunc bpf_tracing_is_exit for TRACE_SESSION
> > bpf,x86: add tracing session supporting for x86_64
> > libbpf: add support for tracing session
> > selftests/bpf: add testcases for tracing session
> >
> > arch/arm64/net/bpf_jit_comp.c | 3 +
> > arch/loongarch/net/bpf_jit.c | 3 +
> > arch/powerpc/net/bpf_jit_comp.c | 3 +
> > arch/riscv/net/bpf_jit_comp64.c | 3 +
> > arch/s390/net/bpf_jit_comp.c | 3 +
> > arch/x86/net/bpf_jit_comp.c | 115 ++++++++++-
> > include/linux/bpf.h | 1 +
> > include/uapi/linux/bpf.h | 1 +
> > kernel/bpf/btf.c | 2 +
> > kernel/bpf/syscall.c | 2 +
> > kernel/bpf/trampoline.c | 5 +-
> > kernel/bpf/verifier.c | 17 +-
> > kernel/trace/bpf_trace.c | 43 ++++-
> > net/bpf/test_run.c | 1 +
> > net/core/bpf_sk_storage.c | 1 +
> > tools/bpf/bpftool/common.c | 1 +
> > tools/include/uapi/linux/bpf.h | 1 +
> > tools/lib/bpf/bpf.c | 2 +
> > tools/lib/bpf/libbpf.c | 3 +
> > .../selftests/bpf/prog_tests/fsession_test.c | 132 +++++++++++++
> > .../selftests/bpf/progs/fsession_test.c | 178 ++++++++++++++++++
> > 21 files changed, 511 insertions(+), 9 deletions(-)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
> > create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
> >
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-18 14:21 ` [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
2025-10-19 2:03 ` Menglong Dong
2025-10-20 8:19 ` Jiri Olsa
@ 2025-10-21 18:16 ` Alexei Starovoitov
2025-10-22 1:05 ` Menglong Dong
2 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2025-10-21 18:16 UTC (permalink / raw)
To: Menglong Dong
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Leon Hwang, bpf, LKML, linux-trace-kernel
On Sat, Oct 18, 2025 at 7:21 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
> __LOAD_TCC_PTR(-round_up(stack, 8) - 8)
> @@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> void *func_addr)
> {
> int i, ret, nr_regs = m->nr_args, stack_size = 0;
> - int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
> + int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
> + arg_stack_off, rbx_off;
> struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
> + struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
> struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> void *orig_call = func_addr;
> @@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> *
> * RBP - nregs_off [ regs count ] always
> *
> + * RBP - session_off [ session flags ] tracing session
> + *
> * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
> *
> * RBP - rbx_off [ rbx value ] always
> @@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> /* regs count */
> stack_size += 8;
> nregs_off = stack_size;
> + stack_size += 8;
> + session_off = stack_size;
Unconditional stack increase? :(
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC bpf-next 3/5] bpf,x86: add tracing session supporting for x86_64
2025-10-21 18:16 ` Alexei Starovoitov
@ 2025-10-22 1:05 ` Menglong Dong
0 siblings, 0 replies; 17+ messages in thread
From: Menglong Dong @ 2025-10-22 1:05 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu,
Mathieu Desnoyers, Leon Hwang, bpf, LKML, linux-trace-kernel
On Wed, Oct 22, 2025 at 2:17 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Oct 18, 2025 at 7:21 AM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
> > #define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
> > __LOAD_TCC_PTR(-round_up(stack, 8) - 8)
> > @@ -3179,8 +3270,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > void *func_addr)
> > {
> > int i, ret, nr_regs = m->nr_args, stack_size = 0;
> > - int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
> > + int regs_off, nregs_off, session_off, ip_off, run_ctx_off,
> > + arg_stack_off, rbx_off;
> > struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
> > + struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
> > struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> > struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> > void *orig_call = func_addr;
> > @@ -3222,6 +3315,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > *
> > * RBP - nregs_off [ regs count ] always
> > *
> > + * RBP - session_off [ session flags ] tracing session
> > + *
> > * RBP - ip_off [ traced function ] BPF_TRAMP_F_IP_ARG flag
> > *
> > * RBP - rbx_off [ rbx value ] always
> > @@ -3246,6 +3341,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > /* regs count */
> > stack_size += 8;
> > nregs_off = stack_size;
> > + stack_size += 8;
> > + session_off = stack_size;
>
> Unconditional stack increase? :(
Ah, it should be a conditional increase; I made a mistake here,
which will be fixed in V2.
In fact, we can't add the session stuff here at all. Once the
increase is conditional, bpf_get_func_ip() can't find the location
of "ip" anymore, since it reads the ip at a fixed offset from ctx
and can't tell whether the session stuff exists on the stack.
Several solutions that I came up with:
1. Reuse nregs_off. It's 8 bytes, but 1 byte is enough for the
count. Therefore, we can store some metadata flags in the high
7 bytes, such as "SESSION_EXIST" or "IP_OFFSET", and then derive
the offset of the ip in bpf_get_func_ip().
It works, but it makes the code more confusing.
2. Introduce a bpf_tramp_session_run_ctx:
struct bpf_tramp_session_run_ctx {
	struct bpf_tramp_run_ctx run_ctx;
	__u64 session_flags;
	__u64 session_cookie;
};
If the session exists, use bpf_tramp_session_run_ctx in the
trampoline.
It works and is simple.
3. Add the session stuff to the tail of the context, i.e. after
the "return value". The stack will then become:
	session cookie -> 8 bytes if session
	session flags  -> 8 bytes if session
	return value   -> 8 bytes
	argN
	.....
	arg1
Both methods 2 and 3 work and are simple, and I've decided to use
method 3 in V2 (sketched below).
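Under method 3 the flags move to the tail of the ctx, so a reader such as
bpf_tracing_is_exit() would index forward from the argument count instead of
using ctx[-2]. A minimal sketch under that layout (my reading of the
proposal, not posted code):

	__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
	{
		/* ctx[-1] is the arg count; then args, return value, flags */
		__u64 nr_args = ((__u64 *)ctx)[-1];

		return ((__u64 *)ctx)[nr_args + 1] & 1;
	}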
Thanks!
Menglong Dong
^ permalink raw reply [flat|nested] 17+ messages in thread