* [PATCH bpf-next v3 0/7] bpf: tracing session supporting
@ 2025-10-26 3:01 Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 1/7] bpf: add tracing session support Menglong Dong
` (6 more replies)
0 siblings, 7 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Sometimes we need to hook both the entry and the exit of a function with
TRACING. Currently, that requires defining both a FENTRY and a FEXIT
program for the target function, which is inconvenient.
Therefore, add tracing session support to TRACING. Generally speaking, it
is similar to the kprobe session: a single BPF program can hook both the
entry and the exit of a function, and the return value of the fentry
controls whether the fexit is executed.
Session cookies are also supported, via the kfunc bpf_fsession_cookie().
The kfuncs bpf_tracing_is_exit() and bpf_fsession_cookie() are both
inlined by the verifier.
We allow the use of bpf_get_func_ret() to read the return value in the
fentry of a tracing session: it will always return "0" there, which is
safe.
The whole fsession mechanism is arch-specific, so -EOPNOTSUPP is returned
if it is not supported yet by the arch. This series only implements
x86_64; other archs will be implemented later.
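For illustration, a single fsession program that measures a function's
latency could look roughly like this (a minimal sketch, assuming the usual
vmlinux.h/bpf_tracing.h setup plus extern declarations for the two kfuncs;
it is not part of this series):

  SEC("fsession/bpf_fentry_test1")
  int BPF_PROG(lat_probe, int a, int ret)
  {
      __u64 *cookie = bpf_fsession_cookie(ctx);

      if (!bpf_tracing_is_exit(ctx)) {
          /* entry: stash a timestamp in the session cookie */
          *cookie = bpf_ktime_get_ns();
          return 0; /* returning non-zero here would skip the fexit run */
      }

      /* exit: "ret" now holds the real return value */
      bpf_printk("lat=%llu ret=%d", bpf_ktime_get_ns() - *cookie, ret);
      return 0;
  }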
Changes since v2:
* squash some patches:
  - the 2 patches for the kfuncs bpf_tracing_is_exit() and
    bpf_fsession_cookie() are merged into the second patch.
  - the testcases for fsession are also squashed.
* fix the CI error by moving the testcase for bpf_get_func_ip to
  fsession_test.c
Changes since v1:
* session cookie support.
In this version, session cookie is implemented, and the kfunc
bpf_fsession_cookie() is added.
* restructure the layout of the stack.
In this version, the layout of the session data stored on the stack is
changed: it is now placed after the return value, so as not to break
bpf_get_func_ip().
* testcase enhancement.
Some nits in the testcases pointed out by Jiri are fixed. Meanwhile,
testcases for get_func_ip and the session cookie are added too.
Menglong Dong (7):
bpf: add tracing session support
bpf: add two kfunc for TRACE_SESSION
bpf,x86: add ret_off to invoke_bpf()
bpf,x86: add tracing session supporting for x86_64
libbpf: add support for tracing session
selftests/bpf: add testcases for tracing session
selftests/bpf: test fsession mixed with fentry and fexit
arch/arm64/net/bpf_jit_comp.c | 3 +
arch/loongarch/net/bpf_jit.c | 3 +
arch/powerpc/net/bpf_jit_comp.c | 3 +
arch/riscv/net/bpf_jit_comp64.c | 3 +
arch/s390/net/bpf_jit_comp.c | 3 +
arch/x86/net/bpf_jit_comp.c | 221 +++++++++++++--
include/linux/bpf.h | 2 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/btf.c | 2 +
kernel/bpf/syscall.c | 2 +
kernel/bpf/trampoline.c | 5 +-
kernel/bpf/verifier.c | 45 ++-
kernel/trace/bpf_trace.c | 59 +++-
net/bpf/test_run.c | 1 +
net/core/bpf_sk_storage.c | 1 +
tools/bpf/bpftool/common.c | 1 +
tools/include/uapi/linux/bpf.h | 1 +
tools/lib/bpf/bpf.c | 2 +
tools/lib/bpf/libbpf.c | 3 +
.../selftests/bpf/prog_tests/fsession_test.c | 95 +++++++
.../bpf/prog_tests/tracing_failure.c | 2 +-
.../selftests/bpf/progs/fsession_test.c | 264 ++++++++++++++++++
22 files changed, 694 insertions(+), 28 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
--
2.51.1
* [PATCH bpf-next v3 1/7] bpf: add tracing session support
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
` (5 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
The tracing session is similar to the kprobe session: it allows attaching
a single BPF program to both the entry and the exit of the target
functions.
When the fentry returns a non-zero value, the fexit is skipped, as with
the kprobe session.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
arch/arm64/net/bpf_jit_comp.c | 3 +++
arch/loongarch/net/bpf_jit.c | 3 +++
arch/powerpc/net/bpf_jit_comp.c | 3 +++
arch/riscv/net/bpf_jit_comp64.c | 3 +++
arch/s390/net/bpf_jit_comp.c | 3 +++
arch/x86/net/bpf_jit_comp.c | 3 +++
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 1 +
kernel/bpf/btf.c | 2 ++
kernel/bpf/syscall.c | 2 ++
kernel/bpf/trampoline.c | 5 ++++-
kernel/bpf/verifier.c | 12 +++++++++---
net/bpf/test_run.c | 1 +
net/core/bpf_sk_storage.c | 1 +
tools/include/uapi/linux/bpf.h | 1 +
.../selftests/bpf/prog_tests/tracing_failure.c | 2 +-
16 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index ab83089c3d8f..06f4bd6c6755 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -2788,6 +2788,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *image, *tmp;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/* image doesn't need to be in module memory range, so we can
* use kvmalloc.
*/
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index cbe53d0b7fb0..ad596341658a 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1739,6 +1739,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
void *image, *tmp;
struct jit_ctx ctx;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
size = ro_image_end - ro_image;
image = kvmalloc(size, GFP_KERNEL);
if (!image)
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 88ad5ba7b87f..bcc0ce09f6fa 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -1017,6 +1017,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
void *rw_image, *tmp;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/*
* rw_image doesn't need to be in module memory range, so we can
* use kvmalloc.
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 45cbc7c6fe49..55b0284bf177 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1286,6 +1286,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *ro_image,
struct rv_jit_context ctx;
u32 size = ro_image_end - ro_image;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
image = kvmalloc(size, GFP_KERNEL);
if (!image)
return -ENOMEM;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index cf461d76e9da..3f25bf55b150 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -2924,6 +2924,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image,
struct bpf_tramp_jit tjit;
int ret;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/* Compute offsets, check whether the code fits. */
memset(&tjit, 0, sizeof(tjit));
ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags,
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index d4c93d9e73e4..389c3a96e2b8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3478,6 +3478,9 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
int ret;
u32 size = image_end - image;
+ if (tlinks[BPF_TRAMP_SESSION].nr_links)
+ return -EOPNOTSUPP;
+
/* rw_image doesn't need to be in module memory range, so we can
* use kvmalloc.
*/
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e53cda0aabb6..6b5855c80fa6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1270,6 +1270,7 @@ enum bpf_tramp_prog_type {
BPF_TRAMP_FENTRY,
BPF_TRAMP_FEXIT,
BPF_TRAMP_MODIFY_RETURN,
+ BPF_TRAMP_SESSION,
BPF_TRAMP_MAX,
BPF_TRAMP_REPLACE, /* more than MAX */
};
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6829936d33f5..79ba3023e8be 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1133,6 +1133,7 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0de8fc8a0e0b..2c1c3e0caff8 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6107,6 +6107,7 @@ static int btf_validate_prog_ctx_type(struct bpf_verifier_log *log, const struct
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
/* allow u64* as ctx */
if (btf_is_int(t) && t->size == 8)
return 0;
@@ -6704,6 +6705,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
fallthrough;
case BPF_LSM_CGROUP:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
/* When LSM programs are attached to void LSM hooks
* they use FEXIT trampolines and when attached to
* int LSM hooks, they use MODIFY_RETURN trampolines.
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8a129746bd6c..cb483701fe39 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3564,6 +3564,7 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog,
case BPF_PROG_TYPE_TRACING:
if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
prog->expected_attach_type != BPF_TRACE_FEXIT &&
+ prog->expected_attach_type != BPF_TRACE_SESSION &&
prog->expected_attach_type != BPF_MODIFY_RETURN) {
err = -EINVAL;
goto out_put_prog;
@@ -4337,6 +4338,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
case BPF_TRACE_RAW_TP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
case BPF_MODIFY_RETURN:
return BPF_PROG_TYPE_TRACING;
case BPF_LSM_MAC:
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 5949095e51c3..f6d4dea3461e 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -111,7 +111,7 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog)
return (ptype == BPF_PROG_TYPE_TRACING &&
(eatype == BPF_TRACE_FENTRY || eatype == BPF_TRACE_FEXIT ||
- eatype == BPF_MODIFY_RETURN)) ||
+ eatype == BPF_MODIFY_RETURN || eatype == BPF_TRACE_SESSION)) ||
(ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC);
}
@@ -418,6 +418,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut
tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX);
if (tlinks[BPF_TRAMP_FEXIT].nr_links ||
+ tlinks[BPF_TRAMP_SESSION].nr_links ||
tlinks[BPF_TRAMP_MODIFY_RETURN].nr_links) {
/* NOTE: BPF_TRAMP_F_RESTORE_REGS and BPF_TRAMP_F_SKIP_FRAME
* should not be set together.
@@ -515,6 +516,8 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
return BPF_TRAMP_MODIFY_RETURN;
case BPF_TRACE_FEXIT:
return BPF_TRAMP_FEXIT;
+ case BPF_TRACE_SESSION:
+ return BPF_TRAMP_SESSION;
case BPF_LSM_MAC:
if (!prog->aux->attach_func_proto->type)
/* The function returns void, we cannot modify its
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6d175849e57a..818deb6a06e4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -17281,6 +17281,7 @@ static int check_return_code(struct bpf_verifier_env *env, int regno, const char
break;
case BPF_TRACE_RAW_TP:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
return 0;
case BPF_TRACE_ITER:
break;
@@ -22754,6 +22755,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
if (prog_type == BPF_PROG_TYPE_TRACING &&
insn->imm == BPF_FUNC_get_func_ret) {
if (eatype == BPF_TRACE_FEXIT ||
+ eatype == BPF_TRACE_SESSION ||
eatype == BPF_MODIFY_RETURN) {
/* Load nr_args from ctx - 8 */
insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
@@ -23695,7 +23697,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
if (tgt_prog->type == BPF_PROG_TYPE_TRACING &&
prog_extension &&
(tgt_prog->expected_attach_type == BPF_TRACE_FENTRY ||
- tgt_prog->expected_attach_type == BPF_TRACE_FEXIT)) {
+ tgt_prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ tgt_prog->expected_attach_type == BPF_TRACE_SESSION)) {
/* Program extensions can extend all program types
* except fentry/fexit. The reason is the following.
* The fentry/fexit programs are used for performance
@@ -23710,7 +23713,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
* beyond reasonable stack size. Hence extending fentry
* is not allowed.
*/
- bpf_log(log, "Cannot extend fentry/fexit\n");
+ bpf_log(log, "Cannot extend fentry/fexit/session\n");
return -EINVAL;
}
} else {
@@ -23794,6 +23797,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
case BPF_LSM_CGROUP:
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
if (!btf_type_is_func(t)) {
bpf_log(log, "attach_btf_id %u is not a function\n",
btf_id);
@@ -23960,6 +23964,7 @@ static bool can_be_sleepable(struct bpf_prog *prog)
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
case BPF_TRACE_ITER:
+ case BPF_TRACE_SESSION:
return true;
default:
return false;
@@ -24041,9 +24046,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
tgt_info.tgt_name);
return -EINVAL;
} else if ((prog->expected_attach_type == BPF_TRACE_FEXIT ||
+ prog->expected_attach_type == BPF_TRACE_SESSION ||
prog->expected_attach_type == BPF_MODIFY_RETURN) &&
btf_id_set_contains(&noreturn_deny, btf_id)) {
- verbose(env, "Attaching fexit/fmod_ret to __noreturn function '%s' is rejected.\n",
+ verbose(env, "Attaching fexit/session/fmod_ret to __noreturn function '%s' is rejected.\n",
tgt_info.tgt_name);
return -EINVAL;
}
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 655efac6f133..ddec08b696de 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -685,6 +685,7 @@ int bpf_prog_test_run_tracing(struct bpf_prog *prog,
switch (prog->expected_attach_type) {
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
if (bpf_fentry_test1(1) != 2 ||
bpf_fentry_test2(2, 3) != 5 ||
bpf_fentry_test3(4, 5, 6) != 15 ||
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index d3fbaf89a698..8da8834aa134 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -365,6 +365,7 @@ static bool bpf_sk_storage_tracing_allowed(const struct bpf_prog *prog)
return true;
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
+ case BPF_TRACE_SESSION:
return !!strncmp(prog->aux->attach_func_name, "bpf_sk_storage",
strlen("bpf_sk_storage"));
default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6829936d33f5..79ba3023e8be 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1133,6 +1133,7 @@ enum bpf_attach_type {
BPF_NETKIT_PEER,
BPF_TRACE_KPROBE_SESSION,
BPF_TRACE_UPROBE_SESSION,
+ BPF_TRACE_SESSION,
__MAX_BPF_ATTACH_TYPE
};
diff --git a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
index 10e231965589..58b02552507d 100644
--- a/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
+++ b/tools/testing/selftests/bpf/prog_tests/tracing_failure.c
@@ -73,7 +73,7 @@ static void test_tracing_deny(void)
static void test_fexit_noreturns(void)
{
test_tracing_fail_prog("fexit_noreturns",
- "Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected.");
+ "Attaching fexit/session/fmod_ret to __noreturn function 'do_exit' is rejected.");
}
void test_tracing_failure(void)
--
2.51.1
* [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 1/7] bpf: add tracing session support Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-26 4:42 ` kernel test robot
` (3 more replies)
2025-10-26 3:01 ` [PATCH bpf-next v3 3/7] bpf,x86: add ret_off to invoke_bpf() Menglong Dong
` (4 subsequent siblings)
6 siblings, 4 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
If a TRACE_SESSION program exists, we use an extra 8 bytes on the stack of
the trampoline to store the flags we need; they lie after the return
value, i.e. at ctx[nr_args + 1]. The "is_exit" flag is stored in bit 0.
Introduce the kfunc bpf_tracing_is_exit(), which tells whether the program
is currently running at fexit. Meanwhile, inline it in the verifier.
Add the kfunc bpf_fsession_cookie(), which is similar to
bpf_session_cookie() and returns the address of the session cookie. That
address is stored after the session flags, i.e. at ctx[nr_args + 2].
Inline this kfunc in the verifier too.
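As a concrete example, for a target with two arguments such as
bpf_fentry_test2(int a, u64 b) (so nr_args == 2), the ctx array seen by a
session program is laid out as follows (a sketch derived from the
description above):

  ctx[-1]  nr_args (here: 2)
  ctx[0]   a
  ctx[1]   b
  ctx[2]   return value                       (ctx[nr_args])
  ctx[3]   session flags, bit 0 is "is_exit"  (ctx[nr_args + 1])
  ctx[4]   session cookie address             (ctx[nr_args + 2])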
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
v3:
- merge the kfuncs bpf_tracing_is_exit() and bpf_fsession_cookie() into a
  single patch
v2:
- store the session flags after the return value, instead of before
  nr_args
- inline bpf_tracing_is_exit(), as Jiri suggested
---
include/linux/bpf.h | 1 +
kernel/bpf/verifier.c | 33 ++++++++++++++++++++--
kernel/trace/bpf_trace.c | 59 ++++++++++++++++++++++++++++++++++++++--
3 files changed, 88 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6b5855c80fa6..ce55d3881c0d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1736,6 +1736,7 @@ struct bpf_prog {
enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
call_get_func_ip:1, /* Do we call get_func_ip() */
+ call_session_cookie:1, /* Do we call bpf_fsession_cookie() */
tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
sleepable:1; /* BPF program is sleepable */
enum bpf_prog_type type; /* Type of BPF program */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 818deb6a06e4..6f8aa4718d6f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12293,6 +12293,8 @@ enum special_kfunc_type {
KF___bpf_trap,
KF_bpf_task_work_schedule_signal,
KF_bpf_task_work_schedule_resume,
+ KF_bpf_tracing_is_exit,
+ KF_bpf_fsession_cookie,
};
BTF_ID_LIST(special_kfunc_list)
@@ -12365,6 +12367,8 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
BTF_ID(func, __bpf_trap)
BTF_ID(func, bpf_task_work_schedule_signal)
BTF_ID(func, bpf_task_work_schedule_resume)
+BTF_ID(func, bpf_tracing_is_exit)
+BTF_ID(func, bpf_fsession_cookie)
static bool is_task_work_add_kfunc(u32 func_id)
{
@@ -12419,7 +12423,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
struct bpf_reg_state *reg = ®s[regno];
bool arg_mem_size = false;
- if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
+ if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
+ meta->func_id == special_kfunc_list[KF_bpf_tracing_is_exit] ||
+ meta->func_id == special_kfunc_list[KF_bpf_fsession_cookie])
return KF_ARG_PTR_TO_CTX;
/* In this function, we verify the kfunc's BTF as per the argument type,
@@ -13912,7 +13918,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
}
}
- if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie]) {
+ if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie] ||
+ meta.func_id == special_kfunc_list[KF_bpf_fsession_cookie]) {
meta.r0_size = sizeof(u64);
meta.r0_rdonly = false;
}
@@ -14193,6 +14200,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
return err;
}
+ if (meta.func_id == special_kfunc_list[KF_bpf_fsession_cookie])
+ env->prog->call_session_cookie = true;
+
return 0;
}
@@ -22012,6 +22022,25 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
*cnt = 1;
+ } else if (desc->func_id == special_kfunc_list[KF_bpf_tracing_is_exit]) {
+ /* Load nr_args from ctx - 8 */
+ insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
+ /* add rax, 1 */
+ insn_buf[1] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
+ insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
+ insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
+ insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
+ insn_buf[5] = BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1);
+ *cnt = 6;
+ } else if (desc->func_id == special_kfunc_list[KF_bpf_fsession_cookie]) {
+ /* Load nr_args from ctx - 8 */
+ insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
+ /* add rax, 2 */
+ insn_buf[1] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 2);
+ insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
+ insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
+ insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
+ *cnt = 5;
}
if (env->insn_aux_data[insn_idx].arg_prog) {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f87c16d915a..4a8568bd654d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3356,12 +3356,65 @@ static const struct btf_kfunc_id_set bpf_kprobe_multi_kfunc_set = {
.filter = bpf_kprobe_multi_filter,
};
-static int __init bpf_kprobe_multi_kfuncs_init(void)
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
{
- return register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
+ /* This helper call is inlined by verifier. */
+ u64 nr_args = ((u64 *)ctx)[-1];
+
+ /*
+ * ctx[nr_args + 1] is the session flags, and the last bit is
+ * is_exit.
+ */
+ return ((u64 *)ctx)[nr_args + 1] & 1;
+}
+
+__bpf_kfunc u64 *bpf_fsession_cookie(void *ctx)
+{
+ /* This helper call is inlined by verifier. */
+ u64 nr_args = ((u64 *)ctx)[-1];
+
+ /* ctx[nr_args + 2] is the session cookie address */
+ return (u64 *)((u64 *)ctx)[nr_args + 2];
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(tracing_kfunc_set_ids)
+BTF_ID_FLAGS(func, bpf_tracing_is_exit, KF_FASTCALL)
+BTF_ID_FLAGS(func, bpf_fsession_cookie, KF_FASTCALL)
+BTF_KFUNCS_END(tracing_kfunc_set_ids)
+
+static int bpf_tracing_filter(const struct bpf_prog *prog, u32 kfunc_id)
+{
+ if (!btf_id_set8_contains(&tracing_kfunc_set_ids, kfunc_id))
+ return 0;
+
+ if (prog->type != BPF_PROG_TYPE_TRACING ||
+ prog->expected_attach_type != BPF_TRACE_SESSION)
+ return -EINVAL;
+
+ return 0;
+}
+
+static const struct btf_kfunc_id_set bpf_tracing_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &tracing_kfunc_set_ids,
+ .filter = bpf_tracing_filter,
+};
+
+static int __init bpf_trace_kfuncs_init(void)
+{
+ int err = 0;
+
+ err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
+ err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_tracing_kfunc_set);
+
+ return err;
}
-late_initcall(bpf_kprobe_multi_kfuncs_init);
+late_initcall(bpf_trace_kfuncs_init);
typedef int (*copy_fn_t)(void *dst, const void *src, u32 size, struct task_struct *tsk);
--
2.51.1
* [PATCH bpf-next v3 3/7] bpf,x86: add ret_off to invoke_bpf()
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 1/7] bpf: add tracing session support Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
` (3 subsequent siblings)
6 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
For now, the offset of the return value on the trampoline stack is fixed
at 8 bytes. This commit introduces the variable "ret_off" to represent
that offset. For now, ret_off is still just 8; the following patch will
change it to make use of the room after the return value.
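The accounting itself is unchanged here; roughly (a simplified sketch of
the relevant lines in __arch_prepare_bpf_trampoline(), not literal kernel
code):

  stack_size = 0;
  save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
  if (save_ret)
      stack_size += 8;
  ret_off = stack_size; /* the return value lives at RBP - ret_off */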
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
arch/x86/net/bpf_jit_comp.c | 41 +++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 389c3a96e2b8..7a604ee9713f 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2940,7 +2940,7 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog,
static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
struct bpf_tramp_link *l, int stack_size,
- int run_ctx_off, bool save_ret,
+ int run_ctx_off, bool save_ret, int ret_off,
void *image, void *rw_image)
{
u8 *prog = *pprog;
@@ -3005,7 +3005,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
* value of BPF_PROG_TYPE_STRUCT_OPS prog.
*/
if (save_ret)
- emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ret_off);
/* replace 2 nops with JE insn, since jmp target is known */
jmp_insn[0] = X86_JE;
@@ -3055,7 +3055,7 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
struct bpf_tramp_links *tl, int stack_size,
- int run_ctx_off, bool save_ret,
+ int run_ctx_off, bool save_ret, int ret_off,
void *image, void *rw_image)
{
int i;
@@ -3063,7 +3063,8 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
for (i = 0; i < tl->nr_links; i++) {
if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size,
- run_ctx_off, save_ret, image, rw_image))
+ run_ctx_off, save_ret, ret_off, image,
+ rw_image))
return -EINVAL;
}
*pprog = prog;
@@ -3072,7 +3073,7 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
struct bpf_tramp_links *tl, int stack_size,
- int run_ctx_off, u8 **branches,
+ int run_ctx_off, int ret_off, u8 **branches,
void *image, void *rw_image)
{
u8 *prog = *pprog;
@@ -3082,18 +3083,18 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
* Set this to 0 to avoid confusing the program.
*/
emit_mov_imm32(&prog, false, BPF_REG_0, 0);
- emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ret_off);
for (i = 0; i < tl->nr_links; i++) {
if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true,
- image, rw_image))
+ ret_off, image, rw_image))
return -EINVAL;
- /* mod_ret prog stored return value into [rbp - 8]. Emit:
- * if (*(u64 *)(rbp - 8) != 0)
+ /* mod_ret prog stored return value into [rbp - ret_off]. Emit:
+ * if (*(u64 *)(rbp - ret_off) != 0)
* goto do_fexit;
*/
- /* cmp QWORD PTR [rbp - 0x8], 0x0 */
- EMIT4(0x48, 0x83, 0x7d, 0xf8); EMIT1(0x00);
+ /* cmp QWORD PTR [rbp - ret_off], 0x0 */
+ EMIT4(0x48, 0x83, 0x7d, -ret_off); EMIT1(0x00);
/* Save the location of the branch and Generate 6 nops
* (4 bytes for an offset and 2 bytes for the jump) These nops
@@ -3179,7 +3180,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
void *func_addr)
{
int i, ret, nr_regs = m->nr_args, stack_size = 0;
- int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off;
+ int ret_off, regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off,
+ rbx_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
@@ -3213,7 +3215,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
* RBP + 8 [ return address ]
* RBP + 0 [ RBP ]
*
- * RBP - 8 [ return value ] BPF_TRAMP_F_CALL_ORIG or
+ * RBP - ret_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
* BPF_TRAMP_F_RET_FENTRY_RET flags
*
* [ reg_argN ] always
@@ -3239,6 +3241,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
if (save_ret)
stack_size += 8;
+ ret_off = stack_size;
stack_size += nr_regs * 8;
regs_off = stack_size;
@@ -3341,7 +3344,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
if (fentry->nr_links) {
if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off,
- flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image))
+ flags & BPF_TRAMP_F_RET_FENTRY_RET, ret_off,
+ image, rw_image))
return -EINVAL;
}
@@ -3352,7 +3356,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
return -ENOMEM;
if (invoke_bpf_mod_ret(m, &prog, fmod_ret, regs_off,
- run_ctx_off, branches, image, rw_image)) {
+ run_ctx_off, ret_off, branches,
+ image, rw_image)) {
ret = -EINVAL;
goto cleanup;
}
@@ -3380,7 +3385,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
}
/* remember return value in a stack for bpf prog to access */
- emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ret_off);
im->ip_after_call = image + (prog - (u8 *)rw_image);
emit_nops(&prog, X86_PATCH_SIZE);
}
@@ -3403,7 +3408,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
if (fexit->nr_links) {
if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off,
- false, image, rw_image)) {
+ false, ret_off, image, rw_image)) {
ret = -EINVAL;
goto cleanup;
}
@@ -3433,7 +3438,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
/* restore return value of orig_call or fentry prog back into RAX */
if (save_ret)
- emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
+ emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -ret_off);
emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
EMIT1(0xC9); /* leave */
--
2.51.1
* [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
` (2 preceding siblings ...)
2025-10-26 3:01 ` [PATCH bpf-next v3 3/7] bpf,x86: add ret_off to invoke_bpf() Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-31 1:42 ` Alexei Starovoitov
2025-10-26 3:01 ` [PATCH bpf-next v3 5/7] libbpf: add support for tracing session Menglong Dong
` (2 subsequent siblings)
6 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Add BPF_TRACE_SESSION support to x86_64. invoke_bpf_session_entry() and
invoke_bpf_session_exit() are introduced for this purpose.
In invoke_bpf_session_entry(), we check whether the return value of the
fentry is 0 and set the corresponding session flag if it is not. In
invoke_bpf_session_exit(), we check whether that flag is set; if it is,
the fexit is skipped.
As designed, the session flags and the session cookie address are stored
after the return value, and the stack looks like this:
cookie ptr -> 8 bytes
session flags -> 8 bytes
return value -> 8 bytes
argN -> 8 bytes
...
arg1 -> 8 bytes
nr_args -> 8 bytes
...
cookieN -> 8 bytes
cookie1 -> 8 bytes
On entry to the session, we clear the return value, so the fentry always
reads 0 via ctx[nr_args] or bpf_get_func_ret().
Before each BPF prog executes, the "cookie ptr" slot is filled with that
prog's cookie address; this is done in both invoke_bpf_session_entry()
and invoke_bpf_session_exit().
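In C-like pseudocode, the logic emitted by the two new helpers is roughly
the following (a sketch of what the generated instructions do, not literal
kernel code):

  /* invoke_bpf_session_entry() */
  sflags = 0;
  retval = 0;                        /* fentry always sees ret == 0 */
  for (i = 0; i < nr_session; i++)
      cookies[i] = 0;                /* clear the cookie array */
  for (i = 0; i < nr_session; i++) {
      cookie_ptr = &cookies[i];      /* only if the prog uses the cookie */
      retval = run_prog(i);
      if (retval != 0) {
          sflags |= 1ULL << (i + 1); /* remember to skip this prog's fexit */
          retval = 0;
      }
  }

  /* ... the traced function runs here ... */

  /* invoke_bpf_session_exit() */
  sflags |= 1;                       /* bit 0: is_exit */
  for (i = 0; i < nr_session; i++) {
      cookie_ptr = &cookies[i];
      if (!(sflags & (1ULL << (i + 1))))
          run_prog(i);
  }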
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
v2:
- add session cookie support
- place the session data after the return value, instead of before nr_args
---
arch/x86/net/bpf_jit_comp.c | 185 +++++++++++++++++++++++++++++++++++-
1 file changed, 181 insertions(+), 4 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 7a604ee9713f..2fffc530c88c 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -3109,6 +3109,148 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog,
return 0;
}
+static int invoke_bpf_session_entry(const struct btf_func_model *m, u8 **pprog,
+ struct bpf_tramp_links *tl, int stack_size,
+ int run_ctx_off, int ret_off, int sflags_off,
+ int cookies_off, void *image, void *rw_image)
+{
+ int i, j = 0, cur_cookie_off;
+ u64 session_flags;
+ u8 *prog = *pprog;
+ u8 *jmp_insn;
+
+ /* clear the session flags:
+ * xor rax, rax
+ * mov QWORD PTR [rbp - sflags_off], rax
+ */
+ EMIT3(0x48, 0x31, 0xC0);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -sflags_off);
+ /*
+ * clear the return value to make sure bpf_get_func_ret() always
+ * gets 0 in fentry:
+ * mov QWORD PTR [rbp - ret_off], rax
+ */
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -ret_off);
+ /* clear all the cookies in the cookie array */
+ for (i = 0; i < tl->nr_links; i++) {
+ if (tl->links[i]->link.prog->call_session_cookie) {
+ cur_cookie_off = -cookies_off + j * 8;
+ /* mov QWORD PTR [rbp + cur_cookie_off], rax */
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0,
+ cur_cookie_off);
+ j++;
+ }
+ }
+
+ j = 0;
+ for (i = 0; i < tl->nr_links; i++) {
+ if (tl->links[i]->link.prog->call_session_cookie) {
+ cur_cookie_off = -cookies_off + j * 8;
+ /*
+ * save the cookie address to rbp - sflags_off + 8:
+ * lea rax, [rbp + cur_cookie_off]
+ * mov QWORD PTR [rbp - sflags_off + 8], rax
+ */
+ if (!is_imm8(cur_cookie_off))
+ EMIT3_off32(0x48, 0x8D, 0x85, cur_cookie_off);
+ else
+ EMIT4(0x48, 0x8D, 0x45, cur_cookie_off);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -sflags_off + 8);
+ j++;
+ }
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true,
+ ret_off, image, rw_image))
+ return -EINVAL;
+
+ /* fentry prog stored return value into [rbp - ret_off]. Emit:
+ * if (*(u64 *)(rbp - ret_off) != 0) {
+ * *(u64 *)(rbp - sflags_off) |= (1 << (i + 1));
+ * *(u64 *)(rbp - ret_off) = 0;
+ * }
+ */
+ /* cmp QWORD PTR [rbp - ret_off], 0x0 */
+ EMIT4(0x48, 0x83, 0x7d, -ret_off); EMIT1(0x00);
+ /* emit 2 nops that will be replaced with JE insn */
+ jmp_insn = prog;
+ emit_nops(&prog, 2);
+
+ session_flags = (1ULL << (i + 1));
+ /* mov rax, $session_flags */
+ emit_mov_imm64(&prog, BPF_REG_0, session_flags >> 32, (u32) session_flags);
+ /* or QWORD PTR [rbp - sflags_off], rax */
+ EMIT2(0x48, 0x09);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -sflags_off);
+
+ /* mov QWORD PTR [rbp - ret_off], 0x0 */
+ EMIT4(0x48, 0xC7, 0x45, -ret_off); EMIT4(0x00, 0x00, 0x00, 0x00);
+
+ jmp_insn[0] = X86_JE;
+ jmp_insn[1] = prog - jmp_insn - 2;
+ }
+
+ *pprog = prog;
+ return 0;
+}
+
+static int invoke_bpf_session_exit(const struct btf_func_model *m, u8 **pprog,
+ struct bpf_tramp_links *tl, int stack_size,
+ int run_ctx_off, int ret_off, int sflags_off,
+ int cookies_off, void *image, void *rw_image)
+{
+ int i, j = 0, cur_cookie_off;
+ u64 session_flags;
+ u8 *prog = *pprog;
+ u8 *jmp_insn;
+
+ /*
+ * set the is_exit bit in the session flags:
+ * mov rax, 1
+ * or QWORD PTR [rbp - sflags_off], rax
+ */
+ emit_mov_imm32(&prog, false, BPF_REG_0, 1);
+ EMIT2(0x48, 0x09);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -sflags_off);
+
+ for (i = 0; i < tl->nr_links; i++) {
+ if (tl->links[i]->link.prog->call_session_cookie) {
+ cur_cookie_off = -cookies_off + j * 8;
+ /*
+ * save the cookie address to rbp - sflags_off + 8:
+ * lea rax, [rbp + cur_cookie_off]
+ * mov QWORD PTR [rbp - sflags_off + 8], rax
+ */
+ if (!is_imm8(cur_cookie_off))
+ EMIT3_off32(0x48, 0x8D, 0x85, cur_cookie_off);
+ else
+ EMIT4(0x48, 0x8D, 0x45, cur_cookie_off);
+ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -sflags_off + 8);
+ j++;
+ }
+ /* check if (1 << (i+1)) is set in the session flags, and
+ * skip the execution of the fexit program if it is.
+ */
+ session_flags = 1ULL << (i + 1);
+ /* mov rax, $session_flags */
+ emit_mov_imm64(&prog, BPF_REG_0, session_flags >> 32, (u32) session_flags);
+ /* test QWORD PTR [rbp - sflags_off], rax */
+ EMIT2(0x48, 0x85);
+ emit_insn_suffix(&prog, BPF_REG_FP, BPF_REG_0, -sflags_off);
+ /* emit 2 nops that will be replaced with JE insn */
+ jmp_insn = prog;
+ emit_nops(&prog, 2);
+
+ if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, false,
+ ret_off, image, rw_image))
+ return -EINVAL;
+
+ jmp_insn[0] = X86_JNE;
+ jmp_insn[1] = prog - jmp_insn - 2;
+ }
+
+ *pprog = prog;
+ return 0;
+}
+
/* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
#define LOAD_TRAMP_TAIL_CALL_CNT_PTR(stack) \
__LOAD_TCC_PTR(-round_up(stack, 8) - 8)
@@ -3181,8 +3323,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
{
int i, ret, nr_regs = m->nr_args, stack_size = 0;
int ret_off, regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off,
- rbx_off;
+ rbx_off, sflags_off = 0, cookies_off;
struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
+ struct bpf_tramp_links *session = &tlinks[BPF_TRAMP_SESSION];
struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
void *orig_call = func_addr;
@@ -3215,6 +3358,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
* RBP + 8 [ return address ]
* RBP + 0 [ RBP ]
*
+ * [ cookie ptr ] tracing session
+ * RBP - sflags_off [ session flags ] tracing session
+ *
* RBP - ret_off [ return value ] BPF_TRAMP_F_CALL_ORIG or
* BPF_TRAMP_F_RET_FENTRY_RET flags
*
@@ -3230,6 +3376,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
*
* RBP - run_ctx_off [ bpf_tramp_run_ctx ]
*
+ * [ session cookieN ]
+ * [ ... ]
+ * RBP - cookies_off [ session cookie1 ] tracing session
+ *
* [ stack_argN ] BPF_TRAMP_F_CALL_ORIG
* [ ... ]
* [ stack_arg2 ]
@@ -3237,6 +3387,12 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
* RSP [ tail_call_cnt_ptr ] BPF_TRAMP_F_TAIL_CALL_CTX
*/
+ /* room for session flags and cookie ptr */
+ if (session->nr_links) {
+ stack_size += 8 + 8;
+ sflags_off = stack_size;
+ }
+
/* room for return value of orig_call or fentry prog */
save_ret = flags & (BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_RET_FENTRY_RET);
if (save_ret)
@@ -3261,6 +3417,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
stack_size += (sizeof(struct bpf_tramp_run_ctx) + 7) & ~0x7;
run_ctx_off = stack_size;
+ if (session->nr_links) {
+ for (i = 0; i < session->nr_links; i++) {
+ if (session->links[i]->link.prog->call_session_cookie)
+ stack_size += 8;
+ }
+ }
+ cookies_off = stack_size;
+
if (nr_regs > 6 && (flags & BPF_TRAMP_F_CALL_ORIG)) {
/* the space that used to pass arguments on-stack */
stack_size += (nr_regs - get_nr_used_regs(m)) * 8;
@@ -3349,6 +3513,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
return -EINVAL;
}
+ if (session->nr_links) {
+ if (invoke_bpf_session_entry(m, &prog, session, regs_off,
+ run_ctx_off, ret_off, sflags_off,
+ cookies_off, image, rw_image))
+ return -EINVAL;
+ }
+
if (fmod_ret->nr_links) {
branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *),
GFP_KERNEL);
@@ -3414,6 +3585,15 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
}
}
+ if (session->nr_links) {
+ if (invoke_bpf_session_exit(m, &prog, session, regs_off,
+ run_ctx_off, ret_off, sflags_off,
+ cookies_off, image, rw_image)) {
+ ret = -EINVAL;
+ goto cleanup;
+ }
+ }
+
if (flags & BPF_TRAMP_F_RESTORE_REGS)
restore_regs(m, &prog, regs_off);
@@ -3483,9 +3663,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
int ret;
u32 size = image_end - image;
- if (tlinks[BPF_TRAMP_SESSION].nr_links)
- return -EOPNOTSUPP;
-
/* rw_image doesn't need to be in module memory range, so we can
* use kvmalloc.
*/
--
2.51.1
* [PATCH bpf-next v3 5/7] libbpf: add support for tracing session
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
` (3 preceding siblings ...)
2025-10-26 3:01 ` [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 6/7] selftests/bpf: add testcases " Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 7/7] selftests/bpf: test fsession mixed with fentry and fexit Menglong Dong
6 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Add BPF_TRACE_SESSION to libbpf and bpftool.
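With this in place, a session program can be declared with the new section
prefix and attached through the usual auto-attach path (a sketch; the
skeleton name "fsession_demo" is hypothetical):

  SEC("fsession/bpf_fentry_test1")
  int BPF_PROG(demo, int a, int ret)
  {
      return 0;
  }

and from userspace:

  struct fsession_demo *skel = fsession_demo__open_and_load();
  int err = fsession_demo__attach(skel);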
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
tools/bpf/bpftool/common.c | 1 +
tools/lib/bpf/bpf.c | 2 ++
tools/lib/bpf/libbpf.c | 3 +++
3 files changed, 6 insertions(+)
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index e8daf963ecef..534be6cfa2be 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1191,6 +1191,7 @@ const char *bpf_attach_type_input_str(enum bpf_attach_type t)
case BPF_TRACE_FENTRY: return "fentry";
case BPF_TRACE_FEXIT: return "fexit";
case BPF_MODIFY_RETURN: return "mod_ret";
+ case BPF_TRACE_SESSION: return "fsession";
case BPF_SK_REUSEPORT_SELECT: return "sk_skb_reuseport_select";
case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE: return "sk_skb_reuseport_select_or_migrate";
default: return libbpf_bpf_attach_type_str(t);
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 339b19797237..caed2b689068 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -794,6 +794,7 @@ int bpf_link_create(int prog_fd, int target_fd,
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
case BPF_LSM_MAC:
attr.link_create.tracing.cookie = OPTS_GET(opts, tracing.cookie, 0);
if (!OPTS_ZEROED(opts, tracing))
@@ -917,6 +918,7 @@ int bpf_link_create(int prog_fd, int target_fd,
case BPF_TRACE_FENTRY:
case BPF_TRACE_FEXIT:
case BPF_MODIFY_RETURN:
+ case BPF_TRACE_SESSION:
return bpf_raw_tracepoint_open(NULL, prog_fd);
default:
return libbpf_err(err);
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b90574f39d1c..50d10d01315a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -115,6 +115,7 @@ static const char * const attach_type_name[] = {
[BPF_TRACE_FENTRY] = "trace_fentry",
[BPF_TRACE_FEXIT] = "trace_fexit",
[BPF_MODIFY_RETURN] = "modify_return",
+ [BPF_TRACE_SESSION] = "trace_session",
[BPF_LSM_MAC] = "lsm_mac",
[BPF_LSM_CGROUP] = "lsm_cgroup",
[BPF_SK_LOOKUP] = "sk_lookup",
@@ -9607,6 +9608,8 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("fentry.s+", TRACING, BPF_TRACE_FENTRY, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fmod_ret.s+", TRACING, BPF_MODIFY_RETURN, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("fexit.s+", TRACING, BPF_TRACE_FEXIT, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
+ SEC_DEF("fsession+", TRACING, BPF_TRACE_SESSION, SEC_ATTACH_BTF, attach_trace),
+ SEC_DEF("fsession.s+", TRACING, BPF_TRACE_SESSION, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_trace),
SEC_DEF("freplace+", EXT, 0, SEC_ATTACH_BTF, attach_trace),
SEC_DEF("lsm+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
SEC_DEF("lsm.s+", LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
--
2.51.1
* [PATCH bpf-next v3 6/7] selftests/bpf: add testcases for tracing session
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
` (4 preceding siblings ...)
2025-10-26 3:01 ` [PATCH bpf-next v3 5/7] libbpf: add support for tracing session Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 7/7] selftests/bpf: test fsession mixed with fentry and fexit Menglong Dong
6 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Add testcases for BPF_TRACE_SESSION. The function arguments and the return
value are tested in both the entry and the exit, and the kfunc
bpf_tracing_is_exit() is tested as well.
As the stack layout changed for fsession, we also test bpf_get_func_ip()
for it.
The session cookie for fsession is also tested: multiple fsession BPF
progs are attached to bpf_fentry_test1(), and the session cookie is read
and written in the testcase.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
v3:
- restructure the testcases by combining the tests for the session cookie
  and get_func_ip into one patch
---
.../selftests/bpf/prog_tests/fsession_test.c | 95 ++++++++
.../selftests/bpf/progs/fsession_test.c | 230 ++++++++++++++++++
2 files changed, 325 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/fsession_test.c
create mode 100644 tools/testing/selftests/bpf/progs/fsession_test.c
diff --git a/tools/testing/selftests/bpf/prog_tests/fsession_test.c b/tools/testing/selftests/bpf/prog_tests/fsession_test.c
new file mode 100644
index 000000000000..d70bdb683691
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fsession_test.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <test_progs.h>
+#include "fsession_test.skel.h"
+
+static int check_result(struct fsession_test *skel)
+{
+ LIBBPF_OPTS(bpf_test_run_opts, topts);
+ int err, prog_fd;
+
+ /* Trigger test function calls */
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run_opts(prog_fd, &topts);
+ if (!ASSERT_OK(err, "test_run_opts err"))
+ return err;
+ if (!ASSERT_OK(topts.retval, "test_run_opts retval"))
+ return topts.retval;
+
+ for (int i = 0; i < sizeof(*skel->bss) / sizeof(__u64); i++) {
+ if (!ASSERT_EQ(((__u64 *)skel->bss)[i], 1, "test_result"))
+ return -EINVAL;
+ }
+
+ /* some fields go to the "data" sections, not "bss" */
+ for (int i = 0; i < sizeof(*skel->data) / sizeof(__u64); i++) {
+ if (!ASSERT_EQ(((__u64 *)skel->data)[i], 1, "test_result"))
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static void test_fsession_basic(void)
+{
+ struct fsession_test *skel = NULL;
+ int err;
+
+ skel = fsession_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
+ goto cleanup;
+
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_attach"))
+ goto cleanup;
+
+ check_result(skel);
+cleanup:
+ fsession_test__destroy(skel);
+}
+
+static void test_fsession_reattach(void)
+{
+ struct fsession_test *skel = NULL;
+ int err;
+
+ skel = fsession_test__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fsession_test__open_and_load"))
+ goto cleanup;
+
+ /* First attach */
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_first_attach"))
+ goto cleanup;
+
+ if (check_result(skel))
+ goto cleanup;
+
+ /* Detach */
+ fsession_test__detach(skel);
+
+ /* Reset counters */
+ memset(skel->bss, 0, sizeof(*skel->bss));
+
+ /* Second attach */
+ err = fsession_test__attach(skel);
+ if (!ASSERT_OK(err, "fsession_second_attach"))
+ goto cleanup;
+
+ if (check_result(skel))
+ goto cleanup;
+
+cleanup:
+ fsession_test__destroy(skel);
+}
+
+void test_fsession_test(void)
+{
+#if !defined(__x86_64__)
+ test__skip();
+ return;
+#endif
+ if (test__start_subtest("fsession_basic"))
+ test_fsession_basic();
+ if (test__start_subtest("fsession_reattach"))
+ test_fsession_reattach();
+}
diff --git a/tools/testing/selftests/bpf/progs/fsession_test.c b/tools/testing/selftests/bpf/progs/fsession_test.c
new file mode 100644
index 000000000000..8f266d8e4b55
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fsession_test.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2025 ChinaTelecom */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+__u64 test1_entry_result = 0;
+__u64 test1_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test1, int a, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test1_entry_result = a == 1 && ret == 0;
+ /* Return 0 to allow exit to be called */
+ return 0;
+ }
+
+ /* This is exit */
+ test1_exit_result = a == 1 && ret == 2;
+ return 0;
+}
+
+__u64 test2_entry_result = 0;
+__u64 test2_exit_result = 1;
+
+SEC("fsession/bpf_fentry_test2")
+int BPF_PROG(test2, int a, __u64 b, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ /* This is entry */
+ test2_entry_result = a == 2 && b == 3 && ret == 0;
+ /* Return non-zero value to block exit call */
+ return 1;
+ }
+
+ /* This is exit - should not be called due to blocking */
+ test2_exit_result = 0;
+ return 0;
+}
+
+__u64 test3_entry_result = 0;
+__u64 test3_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test3")
+int BPF_PROG(test3, char a, int b, __u64 c, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test3_entry_result = a == 4 && b == 5 && c == 6 && ret == 0;
+ return 0;
+ }
+
+ test3_exit_result = a == 4 && b == 5 && c == 6 && ret == 15;
+ return 0;
+}
+
+__u64 test4_entry_result = 0;
+__u64 test4_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test4")
+int BPF_PROG(test4, void *a, char b, int c, __u64 d, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test4_entry_result = a == (void *)7 && b == 8 && c == 9 && d == 10 && ret == 0;
+ return 0;
+ }
+
+ test4_exit_result = a == (void *)7 && b == 8 && c == 9 && d == 10 && ret == 34;
+ return 0;
+}
+
+__u64 test5_entry_result = 0;
+__u64 test5_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test5")
+int BPF_PROG(test5, __u64 a, void *b, short c, int d, __u64 e, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test5_entry_result = a == 11 && b == (void *)12 && c == 13 && d == 14 &&
+ e == 15 && ret == 0;
+ return 0;
+ }
+
+ test5_exit_result = a == 11 && b == (void *)12 && c == 13 && d == 14 &&
+ e == 15 && ret == 65;
+ return 0;
+}
+
+__u64 test6_entry_result = 0;
+__u64 test6_exit_result = 1;
+
+SEC("fsession/bpf_fentry_test6")
+int BPF_PROG(test6, __u64 a, void *b, short c, int d, void *e, __u64 f, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test6_entry_result = a == 16 && b == (void *)17 && c == 18 && d == 19 &&
+ e == (void *)20 && f == 21 && ret == 0;
+ return 1;
+ }
+
+ test6_exit_result = 0;
+ return 0;
+}
+
+__u64 test7_entry_result = 0;
+__u64 test7_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test7")
+int BPF_PROG(test7, struct bpf_fentry_test_t *arg, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ if (!arg)
+ test7_entry_result = ret == 0;
+ return 0;
+ }
+
+ if (!arg)
+ test7_exit_result = 1;
+ return 0;
+}
+
+__u64 test8_entry_result = 0;
+__u64 test8_exit_result = 1;
+/*
+ * test1, test8 and test9 hook the same target to verify the "ret" is always
+ * 0 in the entry.
+ */
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test8, int a, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test8_entry_result = a == 1 && ret == 0;
+ return -21;
+ }
+
+ /* This is exit */
+ test8_exit_result = 0;
+ return 0;
+}
+
+__u64 test9_entry_result = 0;
+__u64 test9_exit_result = 1;
+
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test9, int a, int ret)
+{
+ bool is_exit = bpf_tracing_is_exit(ctx);
+
+ if (!is_exit) {
+ test9_entry_result = a == 1 && ret == 0;
+ return -22;
+ }
+
+ test9_exit_result = 0;
+ return 0;
+}
+
+__u64 test10_entry_result = 0;
+__u64 test10_exit_result = 0;
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test10, int a)
+{
+ __u64 addr = bpf_get_func_ip(ctx);
+
+ if (bpf_tracing_is_exit(ctx))
+ test10_exit_result = (const void *) addr == &bpf_fentry_test1;
+ else
+ test10_entry_result = (const void *) addr == &bpf_fentry_test1;
+ return 0;
+}
+
+__u64 test11_entry_ok = 0;
+__u64 test11_exit_ok = 0;
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test11, int a)
+{
+ __u64 *cookie = bpf_fsession_cookie(ctx);
+
+ if (!bpf_tracing_is_exit(ctx)) {
+ if (cookie) {
+ *cookie = 0xAAAABBBBCCCCDDDDull;
+ test11_entry_ok = *cookie == 0xAAAABBBBCCCCDDDDull;
+ }
+ return 0;
+ }
+
+ if (cookie)
+ test11_exit_ok = *cookie == 0xAAAABBBBCCCCDDDDull;
+ return 0;
+}
+
+__u64 test12_entry_ok = 0;
+__u64 test12_exit_ok = 0;
+
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test12, int a)
+{
+ __u64 *cookie = bpf_fsession_cookie(ctx);
+
+ if (!bpf_tracing_is_exit(ctx)) {
+ if (cookie) {
+ *cookie = 0x1111222233334444ull;
+ test12_entry_ok = *cookie == 0x1111222233334444ull;
+ }
+ return 0;
+ }
+
+ if (cookie)
+ test12_exit_ok = *cookie == 0x1111222233334444ull;
+ return 0;
+}
--
2.51.1
* [PATCH bpf-next v3 7/7] selftests/bpf: test fsession mixed with fentry and fexit
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
` (5 preceding siblings ...)
2025-10-26 3:01 ` [PATCH bpf-next v3 6/7] selftests/bpf: add testcases " Menglong Dong
@ 2025-10-26 3:01 ` Menglong Dong
6 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-26 3:01 UTC (permalink / raw)
To: ast, jolsa
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Test fsession when it is used together with fentry and fexit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
.../selftests/bpf/progs/fsession_test.c | 34 +++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/fsession_test.c b/tools/testing/selftests/bpf/progs/fsession_test.c
index 8f266d8e4b55..f78348e541a4 100644
--- a/tools/testing/selftests/bpf/progs/fsession_test.c
+++ b/tools/testing/selftests/bpf/progs/fsession_test.c
@@ -228,3 +228,37 @@ int BPF_PROG(test12, int a)
test12_exit_ok = *cookie == 0x1111222233334444ull;
return 0;
}
+
+__u64 test13_entry_result = 0;
+__u64 test13_exit_result = 0;
+
+SEC("fsession/bpf_fentry_test1")
+int BPF_PROG(test13, int a, int ret)
+{
+ __u64 *cookie = bpf_fsession_cookie(ctx);
+
+ if (!bpf_tracing_is_exit(ctx)) {
+ test13_entry_result = a == 1 && ret == 0;
+ *cookie = 0x123456ULL;
+ return 0;
+ }
+
+ test13_exit_result = a == 1 && ret == 2 && *cookie == 0x123456ULL;
+ return 0;
+}
+
+__u64 test14_result = 0;
+SEC("fexit/bpf_fentry_test1")
+int BPF_PROG(test14, int a, int ret)
+{
+ test14_result = a == 1 && ret == 2;
+ return 0;
+}
+
+__u64 test15_result = 0;
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(test15, int a)
+{
+ test15_result = a == 1;
+ return 0;
+}
--
2.51.1
* Re: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
@ 2025-10-26 4:42 ` kernel test robot
2025-10-27 2:44 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2025-10-26 4:42 UTC (permalink / raw)
To: Menglong Dong, ast, jolsa
Cc: oe-kbuild-all, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, mattbobrowski,
rostedt, mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Hi Menglong,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/bpf-add-tracing-session-support/20251026-110720
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20251026030143.23807-3-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
config: i386-buildonly-randconfig-003-20251026 (https://download.01.org/0day-ci/archive/20251026/202510261253.qRd57kJv-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251026/202510261253.qRd57kJv-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510261253.qRd57kJv-lkp@intel.com/
All warnings (new ones prefixed by >>):
kernel/trace/bpf_trace.c: In function '____bpf_trace_printk':
kernel/trace/bpf_trace.c:377:9: warning: function '____bpf_trace_printk' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
377 | ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt, data.bin_args);
| ^~~
kernel/trace/bpf_trace.c: In function '____bpf_trace_vprintk':
kernel/trace/bpf_trace.c:433:9: warning: function '____bpf_trace_vprintk' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
433 | ret = bstr_printf(data.buf, MAX_BPRINTF_BUF, fmt, data.bin_args);
| ^~~
kernel/trace/bpf_trace.c: In function '____bpf_seq_printf':
kernel/trace/bpf_trace.c:475:9: warning: function '____bpf_seq_printf' might be a candidate for 'gnu_printf' format attribute [-Wsuggest-attribute=format]
475 | seq_bprintf(m, fmt, data.bin_args);
| ^~~~~~~~~~~
kernel/trace/bpf_trace.c: In function 'bpf_fsession_cookie':
>> kernel/trace/bpf_trace.c:3379:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
3379 | return (u64 *)((u64 *)ctx)[nr_args + 2];
| ^
Kconfig warnings: (for reference only)
WARNING: unmet direct dependencies detected for I2C_K1
Depends on [n]: I2C [=y] && HAS_IOMEM [=y] && (ARCH_SPACEMIT || COMPILE_TEST [=y]) && OF [=n]
Selected by [y]:
- MFD_SPACEMIT_P1 [=y] && HAS_IOMEM [=y] && (ARCH_SPACEMIT || COMPILE_TEST [=y]) && I2C [=y]
vim +3379 kernel/trace/bpf_trace.c
3372
3373 __bpf_kfunc u64 *bpf_fsession_cookie(void *ctx)
3374 {
3375 /* This helper call is inlined by verifier. */
3376 u64 nr_args = ((u64 *)ctx)[-1];
3377
3378 /* ctx[nr_args + 2] is the session cookie address */
> 3379 return (u64 *)((u64 *)ctx)[nr_args + 2];
3380 }
3381
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
2025-10-26 4:42 ` kernel test robot
@ 2025-10-27 2:44 ` kernel test robot
2025-10-29 2:54 ` Menglong Dong
2025-11-05 0:40 ` Andrii Nakryiko
3 siblings, 0 replies; 24+ messages in thread
From: kernel test robot @ 2025-10-27 2:44 UTC (permalink / raw)
To: Menglong Dong, ast, jolsa
Cc: oe-kbuild-all, daniel, john.fastabend, andrii, martin.lau,
eddyz87, song, yonghong.song, kpsingh, sdf, haoluo, mattbobrowski,
rostedt, mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
Hi Menglong,
kernel test robot noticed the following build warnings:
[auto build test WARNING on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Menglong-Dong/bpf-add-tracing-session-support/20251026-110720
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20251026030143.23807-3-dongml2%40chinatelecom.cn
patch subject: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
config: i386-randconfig-061-20251027 (https://download.01.org/0day-ci/archive/20251027/202510270955.mbxODFvZ-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251027/202510270955.mbxODFvZ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510270955.mbxODFvZ-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
kernel/trace/bpf_trace.c:833:41: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr @@ got void * @@
kernel/trace/bpf_trace.c:833:41: sparse: expected void [noderef] __user *[addressable] [assigned] [usertype] sival_ptr
kernel/trace/bpf_trace.c:833:41: sparse: got void *
kernel/trace/bpf_trace.c:3573:52: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c:3587:56: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c:3601:52: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c:3608:56: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c:3616:52: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c:3624:56: sparse: sparse: cast removes address space '__user' of expression
kernel/trace/bpf_trace.c: note: in included file (through include/linux/rbtree.h, include/linux/mm_types.h, include/linux/mmzone.h, ...):
include/linux/rcupdate.h:895:9: sparse: sparse: context imbalance in 'uprobe_prog_run' - unexpected unlock
>> kernel/trace/bpf_trace.c:3379:35: sparse: sparse: non size-preserving integer to pointer cast
vim +3379 kernel/trace/bpf_trace.c
3372
3373 __bpf_kfunc u64 *bpf_fsession_cookie(void *ctx)
3374 {
3375 /* This helper call is inlined by verifier. */
3376 u64 nr_args = ((u64 *)ctx)[-1];
3377
3378 /* ctx[nr_args + 2] is the session cookie address */
> 3379 return (u64 *)((u64 *)ctx)[nr_args + 2];
3380 }
3381
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
2025-10-26 4:42 ` kernel test robot
2025-10-27 2:44 ` kernel test robot
@ 2025-10-29 2:54 ` Menglong Dong
2025-11-05 0:40 ` Andrii Nakryiko
3 siblings, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-10-29 2:54 UTC (permalink / raw)
To: ast, jolsa, Menglong Dong
Cc: daniel, john.fastabend, andrii, martin.lau, eddyz87, song,
yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
On 2025/10/26 11:01, Menglong Dong wrote:
> If TRACE_SESSION exists, we will use extra 8-bytes in the stack of the
> trampoline to store the flags that we needed, and the 8-bytes lie after
> the return value, which means ctx[nr_args + 1]. And we will store the
> flag "is_exit" to the first bit of it.
>
> Introduce the kfunc bpf_tracing_is_exit(), which is used to tell if it
> is fexit currently. Meanwhile, inline it in the verifier.
>
> Add the kfunc bpf_fsession_cookie(), which is similar to
> bpf_session_cookie() and return the address of the session cookie. The
> address of the session cookie is stored after session flags, which means
> ctx[nr_args + 2]. Inline this kfunc in the verifier too.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> v3:
> - merge the bpf_tracing_is_exit and bpf_fsession_cookie into a single
> patch
>
> v2:
> - store the session flags after return value, instead of before nr_args
> - inline the bpf_tracing_is_exit, as Jiri suggested
> ---
> include/linux/bpf.h | 1 +
> kernel/bpf/verifier.c | 33 ++++++++++++++++++++--
> kernel/trace/bpf_trace.c | 59 ++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 88 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6b5855c80fa6..ce55d3881c0d 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1736,6 +1736,7 @@ struct bpf_prog {
> enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> call_get_func_ip:1, /* Do we call get_func_ip() */
> + call_session_cookie:1, /* Do we call bpf_fsession_cookie() */
> tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
> sleepable:1; /* BPF program is sleepable */
> enum bpf_prog_type type; /* Type of BPF program */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 818deb6a06e4..6f8aa4718d6f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12293,6 +12293,8 @@ enum special_kfunc_type {
> KF___bpf_trap,
> KF_bpf_task_work_schedule_signal,
> KF_bpf_task_work_schedule_resume,
> + KF_bpf_tracing_is_exit,
> + KF_bpf_fsession_cookie,
> };
>
> BTF_ID_LIST(special_kfunc_list)
> @@ -12365,6 +12367,8 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
> BTF_ID(func, __bpf_trap)
> BTF_ID(func, bpf_task_work_schedule_signal)
> BTF_ID(func, bpf_task_work_schedule_resume)
> +BTF_ID(func, bpf_tracing_is_exit)
> +BTF_ID(func, bpf_fsession_cookie)
>
> static bool is_task_work_add_kfunc(u32 func_id)
> {
> @@ -12419,7 +12423,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
> struct bpf_reg_state *reg = ®s[regno];
> bool arg_mem_size = false;
>
> - if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
> + if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
> + meta->func_id == special_kfunc_list[KF_bpf_tracing_is_exit] ||
> + meta->func_id == special_kfunc_list[KF_bpf_fsession_cookie])
> return KF_ARG_PTR_TO_CTX;
>
> /* In this function, we verify the kfunc's BTF as per the argument type,
> @@ -13912,7 +13918,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> }
> }
>
> - if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie]) {
> + if (meta.func_id == special_kfunc_list[KF_bpf_session_cookie] ||
> + meta.func_id == special_kfunc_list[KF_bpf_fsession_cookie]) {
> meta.r0_size = sizeof(u64);
> meta.r0_rdonly = false;
> }
> @@ -14193,6 +14200,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> return err;
> }
>
> + if (meta.func_id == special_kfunc_list[KF_bpf_fsession_cookie])
> + env->prog->call_session_cookie = true;
> +
> return 0;
> }
>
> @@ -22012,6 +22022,25 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
> insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
> *cnt = 1;
> + } else if (desc->func_id == special_kfunc_list[KF_bpf_tracing_is_exit]) {
> + /* Load nr_args from ctx - 8 */
> + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
> + /* add rax, 1 */
> + insn_buf[1] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 1);
> + insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
> + insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
> + insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
> + insn_buf[5] = BPF_ALU64_IMM(BPF_AND, BPF_REG_0, 1);
> + *cnt = 6;
> + } else if (desc->func_id == special_kfunc_list[KF_bpf_fsession_cookie]) {
> + /* Load nr_args from ctx - 8 */
> + insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
> + /* add rax, 2 */
> + insn_buf[1] = BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 2);
> + insn_buf[2] = BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 3);
> + insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1);
> + insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0);
> + *cnt = 5;
> }
>
> if (env->insn_aux_data[insn_idx].arg_prog) {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 4f87c16d915a..4a8568bd654d 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3356,12 +3356,65 @@ static const struct btf_kfunc_id_set bpf_kprobe_multi_kfunc_set = {
> .filter = bpf_kprobe_multi_filter,
> };
>
> -static int __init bpf_kprobe_multi_kfuncs_init(void)
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc bool bpf_tracing_is_exit(void *ctx)
> {
> - return register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
> + /* This helper call is inlined by verifier. */
> + u64 nr_args = ((u64 *)ctx)[-1];
> +
> + /*
> + * ctx[nr_args + 1] is the session flags, and the last bit is
> + * is_exit.
> + */
> + return ((u64 *)ctx)[nr_args + 1] & 1;
> +}
> +
> +__bpf_kfunc u64 *bpf_fsession_cookie(void *ctx)
> +{
> + /* This helper call is inlined by verifier. */
> + u64 nr_args = ((u64 *)ctx)[-1];
> +
> + /* ctx[nr_args + 2] is the session cookie address */
> + return (u64 *)((u64 *)ctx)[nr_args + 2];
This cast causes a build warning on 32-bit arches. I'll change it
to the following in the next version, after more human comments
on this series:
return (u64 *)(long)((u64 *)ctx)[nr_args + 2];
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_KFUNCS_START(tracing_kfunc_set_ids)
> +BTF_ID_FLAGS(func, bpf_tracing_is_exit, KF_FASTCALL)
> +BTF_ID_FLAGS(func, bpf_fsession_cookie, KF_FASTCALL)
> +BTF_KFUNCS_END(tracing_kfunc_set_ids)
> +
> +static int bpf_tracing_filter(const struct bpf_prog *prog, u32 kfunc_id)
> +{
> + if (!btf_id_set8_contains(&tracing_kfunc_set_ids, kfunc_id))
> + return 0;
> +
> + if (prog->type != BPF_PROG_TYPE_TRACING ||
> + prog->expected_attach_type != BPF_TRACE_SESSION)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static const struct btf_kfunc_id_set bpf_tracing_kfunc_set = {
> + .owner = THIS_MODULE,
> + .set = &tracing_kfunc_set_ids,
> + .filter = bpf_tracing_filter,
> +};
> +
> +static int __init bpf_trace_kfuncs_init(void)
> +{
> + int err = 0;
> +
> + err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_KPROBE, &bpf_kprobe_multi_kfunc_set);
> + err = err ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_tracing_kfunc_set);
> +
> + return err;
> }
>
> -late_initcall(bpf_kprobe_multi_kfuncs_init);
> +late_initcall(bpf_trace_kfuncs_init);
>
> typedef int (*copy_fn_t)(void *dst, const void *src, u32 size, struct task_struct *tsk);
>
>
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-26 3:01 ` [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
@ 2025-10-31 1:42 ` Alexei Starovoitov
2025-10-31 3:36 ` Menglong Dong
0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-10-31 1:42 UTC (permalink / raw)
To: Menglong Dong
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> invoke_bpf_session_exit is introduced for this purpose.
>
> In invoke_bpf_session_entry(), we will check if the return value of the
> fentry is 0, and set the corresponding session flag if not. And in
> invoke_bpf_session_exit(), we will check if the corresponding flag is
> set. If set, the fexit will be skipped.
>
> As designed, the session flags and session cookie address is stored after
> the return value, and the stack look like this:
>
> cookie ptr -> 8 bytes
> session flags -> 8 bytes
> return value -> 8 bytes
> argN -> 8 bytes
> ...
> arg1 -> 8 bytes
> nr_args -> 8 bytes
> ...
> cookieN -> 8 bytes
> cookie1 -> 8 bytes
>
> In the entry of the session, we will clear the return value, so the fentry
> will always get 0 with ctx[nr_args] or bpf_get_func_ret().
>
> Before the execution of the BPF prog, the "cookie ptr" will be filled with
> the corresponding cookie address, which is done in
> invoke_bpf_session_entry() and invoke_bpf_session_exit().
...
> + if (session->nr_links) {
> + for (i = 0; i < session->nr_links; i++) {
> + if (session->links[i]->link.prog->call_session_cookie)
> + stack_size += 8;
> + }
> + }
> + cookies_off = stack_size;
This is not great. It's all root and such,
but if somebody attaches 64 progs that use session cookies
then the trampoline will consume 64*8 of stack space just for
these cookies. Plus more for args, cookie ptr, session_flag, etc.
Sigh.
I understand that cookie from one session shouldn't interfere
with another, but it's all getting quite complex
especially when everything is in assembly.
And this is just x86 JIT. Other JITs would need to copy
this complex logic :(
At this point I'm not sure that "symmetry with kprobe_multi_session"
is justified as a reason to add all that.
We don't have a kprobe_session for individual kprobes after all.
I think we better spend the energy designing a mechanism to
connect existing fentry prog with fexit prog without hacking
it through a stack in the bpf trampoline.
Sorry.
pw-bot: cr
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-31 1:42 ` Alexei Starovoitov
@ 2025-10-31 3:36 ` Menglong Dong
2025-10-31 17:57 ` Alexei Starovoitov
0 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-10-31 3:36 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > invoke_bpf_session_exit is introduced for this purpose.
> >
> > In invoke_bpf_session_entry(), we will check if the return value of the
> > fentry is 0, and set the corresponding session flag if not. And in
> > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > set. If set, the fexit will be skipped.
> >
> > As designed, the session flags and session cookie address is stored after
> > the return value, and the stack look like this:
> >
> > cookie ptr -> 8 bytes
> > session flags -> 8 bytes
> > return value -> 8 bytes
> > argN -> 8 bytes
> > ...
> > arg1 -> 8 bytes
> > nr_args -> 8 bytes
> > ...
> > cookieN -> 8 bytes
> > cookie1 -> 8 bytes
> >
> > In the entry of the session, we will clear the return value, so the fentry
> > will always get 0 with ctx[nr_args] or bpf_get_func_ret().
> >
> > Before the execution of the BPF prog, the "cookie ptr" will be filled with
> > the corresponding cookie address, which is done in
> > invoke_bpf_session_entry() and invoke_bpf_session_exit().
>
> ...
>
> > + if (session->nr_links) {
> > + for (i = 0; i < session->nr_links; i++) {
> > + if (session->links[i]->link.prog->call_session_cookie)
> > + stack_size += 8;
> > + }
> > + }
> > + cookies_off = stack_size;
>
> This is not great. It's all root and such,
> but if somebody attaches 64 progs that use session cookies
> then the trampoline will consume 64*8 of stack space just for
> these cookies. Plus more for args, cookie ptr, session_flag, etc.
The session cookie stuff does take a lot of stack memory.
For fprobe, it will store the cookie into its shadow stack, which
frees up the stack.
How about we remove the session cookie stuff? Then only
8 bytes (session flags) are used on the stack. And if we reuse
the nr_regs slot, no stack will be consumed at all. However, it will
make things complex, which I don't think we should do.
> Sigh.
> I understand that cookie from one session shouldn't interfere
> with another, but it's all getting quite complex
> especially when everything is in assembly.
> And this is just x86 JIT. Other JITs would need to copy
> this complex logic :(
Without the session cookie, it will be much easier to implement
on other arches. And with the help of AI (such as cursor), it can
be done easily ;)
> At this point I'm not sure that "symmetry with kprobe_multi_session"
> is justified as a reason to add all that.
> We don't have a kprobe_session for individual kprobes after all.
As for my case, the tracing session can make my code much
simpler, as I always use the fentry+fexit to hook a function. And
the fexit skipping according to the return value of fentry can also
achieve better performance.
AFAICT, the main usage of the session cookie in kprobe is passing the
function arguments to the exit. For tracing, we can get the args
in the fexit. So the session cookie in tracing is not as important as
in kprobe.
What do you think?
Thanks!
Menglong Dong
> I think we better spend the energy designing a mechanism to
> connect existing fentry prog with fexit prog without hacking
> it through a stack in the bpf trampoline.
>
> Sorry.
>
> pw-bot: cr
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-31 3:36 ` Menglong Dong
@ 2025-10-31 17:57 ` Alexei Starovoitov
2025-11-03 6:00 ` Leon Hwang
2025-11-03 11:24 ` Menglong Dong
0 siblings, 2 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2025-10-31 17:57 UTC (permalink / raw)
To: Menglong Dong
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > invoke_bpf_session_exit is introduced for this purpose.
> > >
> > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > fentry is 0, and set the corresponding session flag if not. And in
> > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > set. If set, the fexit will be skipped.
> > >
> > > As designed, the session flags and session cookie address is stored after
> > > the return value, and the stack look like this:
> > >
> > > cookie ptr -> 8 bytes
> > > session flags -> 8 bytes
> > > return value -> 8 bytes
> > > argN -> 8 bytes
> > > ...
> > > arg1 -> 8 bytes
> > > nr_args -> 8 bytes
> > > ...
> > > cookieN -> 8 bytes
> > > cookie1 -> 8 bytes
> > >
> > > In the entry of the session, we will clear the return value, so the fentry
> > > will always get 0 with ctx[nr_args] or bpf_get_func_ret().
> > >
> > > Before the execution of the BPF prog, the "cookie ptr" will be filled with
> > > the corresponding cookie address, which is done in
> > > invoke_bpf_session_entry() and invoke_bpf_session_exit().
> >
> > ...
> >
> > > + if (session->nr_links) {
> > > + for (i = 0; i < session->nr_links; i++) {
> > > + if (session->links[i]->link.prog->call_session_cookie)
> > > + stack_size += 8;
> > > + }
> > > + }
> > > + cookies_off = stack_size;
> >
> > This is not great. It's all root and such,
> > but if somebody attaches 64 progs that use session cookies
> > then the trampoline will consume 64*8 of stack space just for
> > these cookies. Plus more for args, cookie ptr, session_flag, etc.
>
> The session cookie stuff does take a lot of stack memory.
> For fprobe, it will store the cookie into its shadow stack, which
> frees up the stack.
>
> How about we remove the session cookie stuff? Then only
> 8 bytes (session flags) are used on the stack. And if we reuse
> the nr_regs slot, no stack will be consumed at all. However, it will
> make things complex, which I don't think we should do.
>
> > Sigh.
> > I understand that cookie from one session shouldn't interfere
> > with another, but it's all getting quite complex
> > especially when everything is in assembly.
> > And this is just x86 JIT. Other JITs would need to copy
> > this complex logic :(
>
> Without the session cookie, it will be much easier to implement
> on other arches. And with the help of AI (such as cursor), it can
> be done easily ;)
The reality is the opposite. We see plenty of AI generated garbage.
Please stay human.
>
> > At this point I'm not sure that "symmetry with kprobe_multi_session"
> > is justified as a reason to add all that.
> > We don't have a kprobe_session for individual kprobes after all.
>
> As for my case, the tracing session can make my code much
> simpler, as I always use the fentry+fexit to hook a function. And
> the fexit skipping according to the return value of fentry can also
> achieve better performance.
I don't buy the argument that 'if (cond) goto skip_fexit_prog'
in the generated trampoline is measurably faster than
'if (cond) return' inside the fexit program.
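I.e. roughly this (a sketch; foo and the condition are made up):
SEC("fexit/foo")
int BPF_PROG(fexit_foo, int arg, int ret)
{
	if (arg != 1)	/* whatever condition the fentry prog checked */
		return 0;	/* effectively "skips" this fexit */
	/* ... actual fexit work ... */
	return 0;
}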
> AFAICT, the main usage of the session cookie in kprobe is passing the
> function arguments to the exit. For tracing, we can get the args
> in the fexit. So the session cookie in tracing is not as important as
> in kprobe.
Since kprobe_multi was introduced, retsnoop and tetragon adopted
it to do mass attach, and both use bpf_get_attach_cookie().
While both don't use bpf_session_cookie().
Searching things around I also didn't find a single real user
of bpf_session_cookie() other than selftests/bpf and Jiri's slides :)
So, doing all this work in trampoline for bpf_session_cookie()
doesn't seem warranted, but with that doing session in trampoline
also doesn't look useful, since the only benefit vs a pair
of fentry/fexit is skip of fexit, which can be done already.
Plus complexity in all JITs... so, I say, let's shelve the whole thing for now.
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-31 17:57 ` Alexei Starovoitov
@ 2025-11-03 6:00 ` Leon Hwang
2025-11-03 11:24 ` Menglong Dong
1 sibling, 0 replies; 24+ messages in thread
From: Leon Hwang @ 2025-11-03 6:00 UTC (permalink / raw)
To: Alexei Starovoitov, Menglong Dong
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On 1/11/25 01:57, Alexei Starovoitov wrote:
> On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>>
>> On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
>> <alexei.starovoitov@gmail.com> wrote:
>>>
>>> On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
[...]
>>
>> Without the session cookie, it will be much easier to implement
>> on other arches. And with the help of AI (such as cursor), it can
>> be done easily ;)
>
> The reality is the opposite. We see plenty of AI generated garbage.
> Please stay human.
>
>>
>>> At this point I'm not sure that "symmetry with kprobe_multi_session"
>>> is justified as a reason to add all that.
>>> We don't have a kprobe_session for individual kprobes after all.
>>
>> As for my case, the tracing session can make my code much
>> simpler, as I always use the fentry+fexit to hook a function. And
>> the fexit skipping according to the return value of fentry can also
>> achieve better performance.
>
> I don't buy the argument that 'if (cond) goto skip_fexit_prog'
> in the generated trampoline is measurably faster than
> 'if (cond) return' inside the fexit program.
>
>> AFAICT, the main usage of the session cookie in kprobe is passing the
>> function arguments to the exit. For tracing, we can get the args
>> in the fexit. So the session cookie in tracing is not as important as
>> in kprobe.
>
> Since kprobe_multi was introduced, retsnoop and tetragon adopted
> it to do mass attach, and both use bpf_get_attach_cookie().
> While both don't use bpf_session_cookie().
> Searching things around I also didn't find a single real user
> of bpf_session_cookie() other than selftests/bpf and Jiri's slides :)
>
> So, doing all this work in trampoline for bpf_session_cookie()
> doesn't seem warranted, but with that doing session in trampoline
> also doesn't look useful, since the only benefit vs a pair
> of fentry/fexit is skip of fexit, which can be done already.
> Plus complexity in all JITs... so, I say, let's shelve the whole thing for now.
>
As for bpfsnoop[1], the new attach type is indeed helpful.
For example, when tracing hundreds of kernel functions with both fentry
and fexit programs on a 48-core bare-metal server, the following results
were observed:
bpfsnoop -k 'tcp_connect' --output-fgraph --limit-events 1 -D
2025/11/03 13:32:11 Tracing 586 tracees costs 10.190874525s
2025/11/03 13:32:14 bpfsnoop is exiting..
2025/11/03 13:32:45 Untracing 586 tracees costs 30.667462289s
With the new attach type, about half of the time can be saved.
For bpfsnoop, the return-value control could help avoid redundant
filtering in fexit programs, though it's not strictly necessary.
For instance, when tracing udp_rcv with both packet and argument filters:
bpfsnoop -k '(b)udp_rcv' --filter-pkt 'host 1.1.1.1 and udp src port 53'
--filter-arg 'skb->dev->ifindex == 6' -v
2025/11/03 13:52:55 Tracing(fentry) kernel function udp_rcv
2025/11/03 13:52:55 Tracing(fexit) kernel function udp_rcv
With return-value control, the filtering in fexit could be skipped.
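With fsession, that would look roughly like this (a sketch;
pkt_match() is a stand-in for bpfsnoop's real packet/arg filters):
static __always_inline bool pkt_match(struct sk_buff *skb)
{
	return true;	/* stand-in for the real filters */
}

SEC("fsession/udp_rcv")
int BPF_PROG(snoop_udp_rcv, struct sk_buff *skb)
{
	if (!bpf_tracing_is_exit(ctx))
		/* a non-zero return from the entry skips the fexit leg */
		return !pkt_match(skb);
	/* fexit leg: only runs when the entry matched, no re-filtering */
	return 0;
}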
Links:
[1] https://github.com/bpfsnoop/bpfsnoop
Thanks,
Leon
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-10-31 17:57 ` Alexei Starovoitov
2025-11-03 6:00 ` Leon Hwang
@ 2025-11-03 11:24 ` Menglong Dong
2025-11-05 0:40 ` Andrii Nakryiko
1 sibling, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-03 11:24 UTC (permalink / raw)
To: Menglong Dong, Alexei Starovoitov
Cc: Alexei Starovoitov, Jiri Olsa, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Eduard, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On 2025/11/1 01:57, Alexei Starovoitov wrote:
> On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > invoke_bpf_session_exit is introduced for this purpose.
> > > >
> > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > set. If set, the fexit will be skipped.
> > > >
> > > > As designed, the session flags and session cookie address is stored after
> > > > the return value, and the stack look like this:
> > > >
> > > > cookie ptr -> 8 bytes
> > > > session flags -> 8 bytes
> > > > return value -> 8 bytes
> > > > argN -> 8 bytes
> > > > ...
> > > > arg1 -> 8 bytes
> > > > nr_args -> 8 bytes
> > > > ...
> > > > cookieN -> 8 bytes
> > > > cookie1 -> 8 bytes
> > > >
> > > > In the entry of the session, we will clear the return value, so the fentry
> > > > will always get 0 with ctx[nr_args] or bpf_get_func_ret().
> > > >
> > > > Before the execution of the BPF prog, the "cookie ptr" will be filled with
> > > > the corresponding cookie address, which is done in
> > > > invoke_bpf_session_entry() and invoke_bpf_session_exit().
> > >
> > > ...
> > >
> > > > + if (session->nr_links) {
> > > > + for (i = 0; i < session->nr_links; i++) {
> > > > + if (session->links[i]->link.prog->call_session_cookie)
> > > > + stack_size += 8;
> > > > + }
> > > > + }
> > > > + cookies_off = stack_size;
> > >
> > > This is not great. It's all root and such,
> > > but if somebody attaches 64 progs that use session cookies
> > > then the trampoline will consume 64*8 of stack space just for
> > > these cookies. Plus more for args, cookie ptr, session_flag, etc.
> >
> > The session cookie stuff does take a lot of stack memory.
> > For fprobe, it will store the cookie into its shadow stack, which
> > frees up the stack.
> >
> > How about we remove the session cookie stuff? Then only
> > 8 bytes (session flags) are used on the stack. And if we reuse
> > the nr_regs slot, no stack will be consumed at all. However, it will
> > make things complex, which I don't think we should do.
> >
> > > Sigh.
> > > I understand that cookie from one session shouldn't interfere
> > > with another, but it's all getting quite complex
> > > especially when everything is in assembly.
> > > And this is just x86 JIT. Other JITs would need to copy
> > > this complex logic :(
> >
> > Without the session cookie, it will be much easier to implement
> > on other arches. And with the help of AI (such as cursor), it can
> > be done easily ;)
>
> The reality is the opposite. We see plenty of AI generated garbage.
> Please stay human.
It's not wise to make the AI generate all the things for us, and
I find the AI is not good at planning and designing, but good at
implementing a single thing, such as generating an instruction or
machine code from the C code. Of course I can generate it by
myself with clang, but it still saves a lot of effort.
>
> >
> > > At this point I'm not sure that "symmetry with kprobe_multi_session"
> > > is justified as a reason to add all that.
> > > We don't have a kprobe_session for individual kprobes after all.
> >
> > As for my case, the tracing session can make my code much
> > simpler, as I always use the fentry+fexit to hook a function. And
> > the fexit skipping according to the return value of fentry can also
> > achieve better performance.
>
> I don't buy the argument that 'if (cond) goto skip_fexit_prog'
> in the generated trampoline is measurably faster than
> 'if (cond) return' inside the fexit program.
For now, there may be no performance improvement. But I
was planning to implement the following logic as a next step:
if there are no fexit or modify_return progs in the trampoline,
I'll check whether "session_flags == ALL_SKIP" after the entry
part of the fsession runs, and in that case skip the origin call
and return directly. The performance of fsession would then be
almost the same as fentry. According to our testing, the
performance of fexit is half that of fentry, so this case would
see a 100% improvement.
This is just a rough thought, not sure if it works.
In fact, more of the performance improvement can be had in the
bpf prog itself. For example, if I want to trace the return value of
skb_clone() for TCP port 8080, I have to write the following code
(sketched with a hypothetical filter helper skb_match() and a map):
SEC("fentry/skb_clone")
int clone_entry(struct sk_buff *skb)
{
/* parse the skb and do some filter
* |
* return 0 if not TCP + 80 port
* |
* save the address of "skb" to the hash table "m_context"
* |
* output the skb + timestamp
*/
return 0;
}
SEC("fexit/skb_clone")
int clone_exit(struct sk_buff *skb, u64 ret)
{
/* lookup if skb in the "m_context", return 0 if not
* |
* output the skb + return value + timestamp
* |
* delete the "skb" from the m_context
*/
return 0;
}
I have to maintain a hash table "m_context" to check whether
the exit of skb_clone() is the one I want. It works, but it has
extra overhead in the hash table lookup and deletion. What's
more, it's not reliable in some complex cases.
The problem is that we don't have a way to pair the
fentry/fexit on the stack (do we?).
>
> > AFAICT, the main usage of the session cookie in kprobe is passing the
> > function arguments to the exit. For tracing, we can get the args
> > in the fexit. So the session cookie in tracing is not as important as
> > in kprobe.
>
> Since kprobe_multi was introduced, retsnoop and tetragon adopted
> it to do mass attach, and both use bpf_get_attach_cookie().
> While both don't use bpf_session_cookie().
> Searching things around I also didn't find a single real user
> of bpf_session_cookie() other than selftests/bpf and Jiri's slides :)
>
> So, doing all this work in trampoline for bpf_session_cookie()
> doesn't seem warranted, but with that doing session in trampoline
> also doesn't look useful, since the only benefit vs a pair
> of fentry/fexit is skip of fexit, which can be done already.
> Plus complexity in all JITs... so, I say, let's shelve the whole thing for now.
Yeah, the session cookie is not useful in my case either; I'm OK
with skipping it.
The pairing of fentry/fexit has some use cases (my nettrace and
Leon's bpfsnoop, at least). Not sure if the reasons above are
sufficient; please ignore this message if they are not :)
The way we implement the trampoline makes it arch-specific and
complex. I tried to rewrite it in C, but failed. It's too difficult
and I suspect it's impossible :/
Thanks!
Menglong Dong
>
>
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-03 11:24 ` Menglong Dong
@ 2025-11-05 0:40 ` Andrii Nakryiko
2025-11-05 2:43 ` Alexei Starovoitov
0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-11-05 0:40 UTC (permalink / raw)
To: Menglong Dong
Cc: Menglong Dong, Alexei Starovoitov, Alexei Starovoitov, Jiri Olsa,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Leon Hwang, jiang.biao, bpf, LKML,
linux-trace-kernel
On Mon, Nov 3, 2025 at 3:29 AM Menglong Dong <menglong.dong@linux.dev> wrote:
>
> On 2025/11/1 01:57, Alexei Starovoitov wrote:
> > On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > >
> > > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > >
> > > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > > invoke_bpf_session_exit is introduced for this purpose.
> > > > >
> > > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > > set. If set, the fexit will be skipped.
> > > > >
> > > > > As designed, the session flags and session cookie address is stored after
> > > > > the return value, and the stack look like this:
> > > > >
> > > > > cookie ptr -> 8 bytes
> > > > > session flags -> 8 bytes
> > > > > return value -> 8 bytes
> > > > > argN -> 8 bytes
> > > > > ...
> > > > > arg1 -> 8 bytes
> > > > > nr_args -> 8 bytes
Let's look at "cookie ptr", "session flags", and "nr_args". We can
combine all of them into a single 8 byte slot: assign each session
program index 0, 1, ..., Nsession. 1 bit for entry/exit flag, few bits
for session prog index, and few more bits for nr_args, and we still
will have tons of space for some other additions in the future. From
that session program index you can calculate cookieN address to return
to user.
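Just to illustrate (field widths and names here are made up, not
from this series):
#define FS_EXIT_BIT	0	/* 1 bit: fentry(0) / fexit(1) */
#define FS_IDX_SHIFT	1	/* 6 bits: session prog index */
#define FS_IDX_MASK	0x3fULL
#define FS_NARGS_SHIFT	7	/* 5 bits: nr_args */
#define FS_NARGS_MASK	0x1fULL

static u64 fs_slot_pack(bool is_exit, u32 prog_idx, u32 nr_args)
{
	return (u64)is_exit << FS_EXIT_BIT |
	       ((u64)prog_idx & FS_IDX_MASK) << FS_IDX_SHIFT |
	       ((u64)nr_args & FS_NARGS_MASK) << FS_NARGS_SHIFT;
}

static bool fs_slot_is_exit(u64 slot)
{
	return slot & (1ULL << FS_EXIT_BIT);
}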
And we should look whether moving nr_args into bpf_run_ctx would
actually minimize amount of trampoline assembly code, as we can
implement a bunch of stuff in pure C. (well, BPF verifier inlining is
a separate thing, but it can be mostly arch-independent, right?)
> > > > > ...
> > > > > cookieN -> 8 bytes
> > > > > cookie1 -> 8 bytes
> > > > >
> > > > > In the entry of the session, we will clear the return value, so the fentry
> > > > > will always get 0 with ctx[nr_args] or bpf_get_func_ret().
> > > > >
> > > > > Before the execution of the BPF prog, the "cookie ptr" will be filled with
> > > > > the corresponding cookie address, which is done in
> > > > > invoke_bpf_session_entry() and invoke_bpf_session_exit().
> > > >
[...]
> I have to maintain a hash table "m_context" to check whether
> the exit of skb_clone() is the one I want. It works, but it has
> extra overhead in the hash table lookup and deletion. What's
> more, it's not reliable in some complex cases.
>
> The problem is that we don't have a way to pair the
> fentry/fexit on the stack (do we?).
>
> >
> > > AFAICT, the main usage of the session cookie in kprobe is passing the
> > > function arguments to the exit. For tracing, we can get the args
> > > in the fexit. So the session cookie in tracing is not as important as
> > > in kprobe.
> >
> > Since kprobe_multi was introduced, retsnoop and tetragon adopted
> > it to do mass attach, and both use bpf_get_attach_cookie().
> > While both don't use bpf_session_cookie().
> > Searching things around I also didn't find a single real user
> > of bpf_session_cookie() other than selftests/bpf and Jiri's slides :)
> >
> > So, doing all this work in trampoline for bpf_session_cookie()
> > doesn't seem warranted, but with that doing session in trampoline
> > also doesn't look useful, since the only benefit vs a pair
> > of fentry/fexit is skip of fexit, which can be done already.
> > Plus complexity in all JITs... so, I say, let's shelve the whole thing for now.
>
> Yeah, the session cookie is not useful in my case either; I'm OK
> with skipping it.
hmm, in my view session cookie is one of the main reasons why
"session" concept is useful :) that's certainly the case for
multi-kprobe session, IMO (and yes, I haven't yet used it in retsnoop,
but it's just because I'm lazy, not because it's not useful... )
>
> The pairing of fentry/fexit has some use cases (my nettrace and
> Leon's bpfsnoop, at least). Not sure if the reasons above are
> sufficient; please ignore this message if they are not :)
>
> The way we implement the trampoline makes it arch-specific and
> complex. I tried to rewrite it in C, but failed. It's too difficult
> and I suspect it's impossible :/
>
> Thanks!
> Menglong Dong
>
> >
> >
>
>
>
>
* Re: [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
` (2 preceding siblings ...)
2025-10-29 2:54 ` Menglong Dong
@ 2025-11-05 0:40 ` Andrii Nakryiko
3 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2025-11-05 0:40 UTC (permalink / raw)
To: Menglong Dong
Cc: ast, jolsa, daniel, john.fastabend, andrii, martin.lau, eddyz87,
song, yonghong.song, kpsingh, sdf, haoluo, mattbobrowski, rostedt,
mhiramat, leon.hwang, jiang.biao, bpf, linux-kernel,
linux-trace-kernel
On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> If TRACE_SESSION exists, we will use extra 8-bytes in the stack of the
> trampoline to store the flags that we needed, and the 8-bytes lie after
> the return value, which means ctx[nr_args + 1]. And we will store the
> flag "is_exit" to the first bit of it.
>
> Introduce the kfunc bpf_tracing_is_exit(), which is used to tell if it
> is fexit currently. Meanwhile, inline it in the verifier.
>
> Add the kfunc bpf_fsession_cookie(), which is similar to
> bpf_session_cookie() and return the address of the session cookie. The
> address of the session cookie is stored after session flags, which means
> ctx[nr_args + 2]. Inline this kfunc in the verifier too.
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> Co-developed-by: Leon Hwang <leon.hwang@linux.dev>
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
> v3:
> - merge the bpf_tracing_is_exit and bpf_fsession_cookie into a single
> patch
>
> v2:
> - store the session flags after return value, instead of before nr_args
> - inline the bpf_tracing_is_exit, as Jiri suggested
> ---
> include/linux/bpf.h | 1 +
> kernel/bpf/verifier.c | 33 ++++++++++++++++++++--
> kernel/trace/bpf_trace.c | 59 ++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 88 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6b5855c80fa6..ce55d3881c0d 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1736,6 +1736,7 @@ struct bpf_prog {
> enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> call_get_func_ip:1, /* Do we call get_func_ip() */
> + call_session_cookie:1, /* Do we call bpf_fsession_cookie() */
> tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */
> sleepable:1; /* BPF program is sleepable */
> enum bpf_prog_type type; /* Type of BPF program */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 818deb6a06e4..6f8aa4718d6f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12293,6 +12293,8 @@ enum special_kfunc_type {
> KF___bpf_trap,
> KF_bpf_task_work_schedule_signal,
> KF_bpf_task_work_schedule_resume,
> + KF_bpf_tracing_is_exit,
we have bpf_session_is_return(), can't we just implement it for
fsession program type? Is that because we need ctx access? But we can
get bpf_run_ctx without that, can't we store this flag in run_ctx?
> + KF_bpf_fsession_cookie,
same, we have bpf_session_cookie, can we support that? And again, we
can just make sure that session cookie is put into run_ctx.
And if not, let's at least use consistent naming then?
bpf_fsession_is_return() and bpf_fsession_cookie() as one more
consistent example?
> };
>
> BTF_ID_LIST(special_kfunc_list)
> @@ -12365,6 +12367,8 @@ BTF_ID(func, bpf_res_spin_unlock_irqrestore)
> BTF_ID(func, __bpf_trap)
> BTF_ID(func, bpf_task_work_schedule_signal)
> BTF_ID(func, bpf_task_work_schedule_resume)
> +BTF_ID(func, bpf_tracing_is_exit)
> +BTF_ID(func, bpf_fsession_cookie)
>
> static bool is_task_work_add_kfunc(u32 func_id)
> {
[...]
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-05 0:40 ` Andrii Nakryiko
@ 2025-11-05 2:43 ` Alexei Starovoitov
2025-11-05 3:48 ` Menglong Dong
2025-11-05 17:29 ` Andrii Nakryiko
0 siblings, 2 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-05 2:43 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Jiri Olsa,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Leon Hwang, jiang.biao, bpf, LKML,
linux-trace-kernel
On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Nov 3, 2025 at 3:29 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> >
> > On 2025/11/1 01:57, Alexei Starovoitov wrote:
> > > On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > >
> > > > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > > > invoke_bpf_session_exit is introduced for this purpose.
> > > > > >
> > > > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > > > set. If set, the fexit will be skipped.
> > > > > >
> > > > > > As designed, the session flags and session cookie address is stored after
> > > > > > the return value, and the stack look like this:
> > > > > >
> > > > > > cookie ptr -> 8 bytes
> > > > > > session flags -> 8 bytes
> > > > > > return value -> 8 bytes
> > > > > > argN -> 8 bytes
> > > > > > ...
> > > > > > arg1 -> 8 bytes
> > > > > > nr_args -> 8 bytes
>
> Let's look at "cookie ptr", "session flags", and "nr_args". We can
> combine all of them into a single 8 byte slot: assign each session
> program index 0, 1, ..., Nsession. 1 bit for entry/exit flag, few bits
> for session prog index, and few more bits for nr_args, and we still
> will have tons of space for some other additions in the future. From
> that session program index you can calculate cookieN address to return
> to user.
>
> And we should look whether moving nr_args into bpf_run_ctx would
> actually minimize amount of trampoline assembly code, as we can
> implement a bunch of stuff in pure C. (well, BPF verifier inlining is
> a separate thing, but it can be mostly arch-independent, right?)
Instead of all that I have a different suggestion...
how about we introduce this "session" attach type,
but won't mess with trampoline and whole new session->nr_links.
Instead the same prog can be added to 'fentry' list
and 'fexit' list.
We lose the ability to skip fexit, but I'm still not convinced
it's necessary.
The biggest benefit is that it will work for existing JITs and trampolines.
No new complex asm will be necessary.
As far as writable session_cookie ...
let's add another 8 byte space to bpf_tramp_run_ctx
and only allow single 'fsession' prog for a given kernel function.
Again to avoid changing all trampolines.
This way the feature can be implemented purely in C and no arch
specific changes.
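Roughly (a sketch on top of the current struct, untested):
struct bpf_tramp_run_ctx {
	struct bpf_run_ctx run_ctx;
	u64 bpf_cookie;
	u64 session_cookie;	/* new: read/write for the single fsession prog */
	struct bpf_run_ctx *saved_run_ctx;
};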
It's more limited, but doesn't sound that the use case for multiple
fsession-s exist. All this is on/off tracing. Not something
that will be attached 24/7.
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-05 2:43 ` Alexei Starovoitov
@ 2025-11-05 3:48 ` Menglong Dong
2025-11-05 17:29 ` Andrii Nakryiko
1 sibling, 0 replies; 24+ messages in thread
From: Menglong Dong @ 2025-11-05 3:48 UTC (permalink / raw)
To: Andrii Nakryiko, Alexei Starovoitov
Cc: Menglong Dong, Alexei Starovoitov, Jiri Olsa, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On 2025/11/5 10:43, Alexei Starovoitov wrote:
> On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 3:29 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> > >
> > > On 2025/11/1 01:57, Alexei Starovoitov wrote:
> > > > On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > >
> > > > > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > > > > invoke_bpf_session_exit is introduced for this purpose.
> > > > > > >
> > > > > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > > > > set. If set, the fexit will be skipped.
> > > > > > >
> > > > > > > As designed, the session flags and session cookie address is stored after
> > > > > > > the return value, and the stack look like this:
> > > > > > >
> > > > > > > cookie ptr -> 8 bytes
> > > > > > > session flags -> 8 bytes
> > > > > > > return value -> 8 bytes
> > > > > > > argN -> 8 bytes
> > > > > > > ...
> > > > > > > arg1 -> 8 bytes
> > > > > > > nr_args -> 8 bytes
> >
> > Let's look at "cookie ptr", "session flags", and "nr_args". We can
> > combine all of them into a single 8 byte slot: assign each session
> > program index 0, 1, ..., Nsession. 1 bit for entry/exit flag, few bits
> > for session prog index, and few more bits for nr_args, and we still
> > will have tons of space for some other additions in the future. From
> > that session program index you can calculate cookieN address to return
> > to user.
Yeah, it should work, and I thought about it this way too. I didn't
do it this way, because I would have to adapt the usage of "nr_args"
in all the arches, which makes this series look complicated. We can
do it this way if we stick with this solution.
> >
> > And we should look whether moving nr_args into bpf_run_ctx would
> > actually minimize amount of trampoline assembly code, as we can
> > implement a bunch of stuff in pure C. (well, BPF verifier inlining is
> > a separate thing, but it can be mostly arch-independent, right?)
If we stay on this path, it's worth a try.
>
> Instead of all that I have a different suggestion...
>
> how about we introduce this "session" attach type,
> but won't mess with trampoline and whole new session->nr_links.
> Instead the same prog can be added to 'fentry' list
> and 'fexit' list.
> We lose the ability to skip fexit, but I'm still not convinced
> it's necessary.
> The biggest benefit is that it will work for existing JITs and trampolines.
> No new complex asm will be necessary.
> As far as writable session_cookie ...
> let's add another 8 byte space to bpf_tramp_run_ctx
I'm OK with this approach too. In fact, the session cookie can
also be used to store the information about whether to skip the
fexit, and the user can do the skipping themselves.
However, we need to implement bpf_session_is_return() to tell
whether the current prog is running as fexit, as we attach the prog
to both the entry and exit of the target function. And that still
needs to be done in the trampoline; maybe we can reuse the nr_args
slot for this purpose :/
If we don't want to modify the trampoline, I think one way
is to not introduce tracing session at all, and simply add the
"session cookie" that all the fentry and fexit progs are able to
read/write.
Wait a minute... we can directly make the attach cookie writable?
Things are getting weird now...
> and only allow single 'fsession' prog for a given kernel function.
Maybe we don't need to limit it, to give the user more choice?
For example, progA uses the first 4 bytes and progB uses the
last 4 bytes. Hmm... it's a little dangerous; things may get messed
up if they don't talk it over beforehand.
Thanks!
Menglong Dong
> Again to avoid changing all trampolines.
> This way the feature can be implemented purely in C and no arch
> specific changes.
> It's more limited, but doesn't sound that the use case for multiple
> fsession-s exist. All this is on/off tracing. Not something
> that will be attached 24/7.
>
>
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-05 2:43 ` Alexei Starovoitov
2025-11-05 3:48 ` Menglong Dong
@ 2025-11-05 17:29 ` Andrii Nakryiko
2025-11-05 22:00 ` Alexei Starovoitov
1 sibling, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2025-11-05 17:29 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Jiri Olsa,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Leon Hwang, jiang.biao, bpf, LKML,
linux-trace-kernel
On Tue, Nov 4, 2025 at 6:43 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Nov 3, 2025 at 3:29 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> > >
> > > On 2025/11/1 01:57, Alexei Starovoitov wrote:
> > > > On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > >
> > > > > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > >
> > > > > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > > > > invoke_bpf_session_exit is introduced for this purpose.
> > > > > > >
> > > > > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > > > > set. If set, the fexit will be skipped.
> > > > > > >
> > > > > > > As designed, the session flags and session cookie address are stored after
> > > > > > > the return value, and the stack looks like this:
> > > > > > >
> > > > > > > cookie ptr -> 8 bytes
> > > > > > > session flags -> 8 bytes
> > > > > > > return value -> 8 bytes
> > > > > > > argN -> 8 bytes
> > > > > > > ...
> > > > > > > arg1 -> 8 bytes
> > > > > > > nr_args -> 8 bytes
> >
> > Let's look at "cookie ptr", "session flags", and "nr_args". We can
> > combine all of them into a single 8 byte slot: assign each session
> > program index 0, 1, ..., Nsession. 1 bit for entry/exit flag, few bits
> > for session prog index, and few more bits for nr_args, and we still
> > will have tons of space for some other additions in the future. From
> > that session program index you can calculate cookieN address to return
> > to user.
> >
> > And we should look at whether moving nr_args into bpf_run_ctx would
> > actually minimize the amount of trampoline assembly code, as we can
> > implement a bunch of stuff in pure C. (well, BPF verifier inlining is
> > a separate thing, but it can be mostly arch-independent, right?)
>
> Instead of all that I have a different suggestion...
>
> how about we introduce this "session" attach type,
> but won't mess with trampoline and whole new session->nr_links.
> Instead the same prog can be added to 'fentry' list
> and 'fexit' list.
> We lose the ability to skip fexit, but I'm still not convinced
> it's necessary.
> The biggest benefit is that it will work for existing JITs and trampolines.
> No new complex asm will be necessary.
> As far as writable session_cookie ...
> let's add another 8 byte space to bpf_tramp_run_ctx
> and only allow single 'fsession' prog for a given kernel function.
> Again to avoid changing all trampolines.
> This way the feature can be implemented purely in C and no arch
> specific changes.
> It's more limited, but it doesn't sound like the use case for multiple
> fsession-s exists. All this is on/off tracing. Not something
> that will be attached 24/7.
I'd rather not have a feature at all than have a feature that might
or might not work depending on circumstances I don't control. If
someone happens to be using an fsession program on the same kernel
function I happen to be tracing (e.g., with retsnoop), a random failure
to attach would be maddening to debug.
If we can't implement this properly, let's just put it on hold instead
of adding a crippled implementation.
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-05 17:29 ` Andrii Nakryiko
@ 2025-11-05 22:00 ` Alexei Starovoitov
2025-11-06 12:14 ` Menglong Dong
0 siblings, 1 reply; 24+ messages in thread
From: Alexei Starovoitov @ 2025-11-05 22:00 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Menglong Dong, Menglong Dong, Alexei Starovoitov, Jiri Olsa,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Leon Hwang, jiang.biao, bpf, LKML,
linux-trace-kernel
On Wed, Nov 5, 2025 at 9:30 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Nov 4, 2025 at 6:43 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Nov 3, 2025 at 3:29 AM Menglong Dong <menglong.dong@linux.dev> wrote:
> > > >
> > > > On 2025/11/1 01:57, Alexei Starovoitov wrote:
> > > > > On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > >
> > > > > > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Add BPF_TRACE_SESSION support to x86_64. invoke_bpf_session_entry and
> > > > > > > > invoke_bpf_session_exit are introduced for this purpose.
> > > > > > > >
> > > > > > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > > > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > > > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > > > > > set. If set, the fexit will be skipped.
> > > > > > > >
> > > > > > > > As designed, the session flags and session cookie address are stored after
> > > > > > > > the return value, and the stack looks like this:
> > > > > > > >
> > > > > > > > cookie ptr -> 8 bytes
> > > > > > > > session flags -> 8 bytes
> > > > > > > > return value -> 8 bytes
> > > > > > > > argN -> 8 bytes
> > > > > > > > ...
> > > > > > > > arg1 -> 8 bytes
> > > > > > > > nr_args -> 8 bytes
> > >
> > > Let's look at "cookie ptr", "session flags", and "nr_args". We can
> > > combine all of them into a single 8 byte slot: assign each session
> > > program index 0, 1, ..., Nsession. 1 bit for entry/exit flag, few bits
> > > for session prog index, and few more bits for nr_args, and we still
> > > will have tons of space for some other additions in the future. From
> > > that session program index you can calculate cookieN address to return
> > > to user.
> > >
> > > And we should look at whether moving nr_args into bpf_run_ctx would
> > > actually minimize the amount of trampoline assembly code, as we can
> > > implement a bunch of stuff in pure C. (well, BPF verifier inlining is
> > > a separate thing, but it can be mostly arch-independent, right?)
> >
> > Instead of all that I have a different suggestion...
> >
> > how about we introduce this "session" attach type,
> > but won't mess with trampoline and whole new session->nr_links.
> > Instead the same prog can be added to 'fentry' list
> > and 'fexit' list.
> > We lose the ability to skip fexit, but I'm still not convinced
> > it's necessary.
> > The biggest benefit is that it will work for existing JITs and trampolines.
> > No new complex asm will be necessary.
> > As far as writable session_cookie ...
> > let's add another 8 byte space to bpf_tramp_run_ctx
> > and only allow single 'fsession' prog for a given kernel function.
> > Again to avoid changing all trampolines.
> > This way the feature can be implemented purely in C and no arch
> > specific changes.
> > It's more limited, but it doesn't sound like the use case for multiple
> > fsession-s exists. All this is on/off tracing. Not something
> > that will be attached 24/7.
>
> I'd rather not have a feature at all than have a feature that might
> or might not work depending on circumstances I don't control. If
> someone happens to be using an fsession program on the same kernel
> function I happen to be tracing (e.g., with retsnoop), a random failure
> to attach would be maddening to debug.
fentry won't conflict with fsession. I'm proposing
to limit fsession-s to 1. Due to stack usage there's gotta be
a limit anyway. I say, 32 is really the max, which is 256 bytes
for cookies plus all the stack usage for args, nr_args, run_ctx, etc.
Total of under 512 is ok.
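(Rough math for that worst case, assuming e.g. 6 saved args:
32 * 8 = 256 bytes of cookies + 6 * 8 = 48 bytes of args + 8 for
nr_args + 8 for the return value, plus a few dozen bytes of run_ctx --
comfortably under 512.)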
So tooling would have to deal with the limit regardless.
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-05 22:00 ` Alexei Starovoitov
@ 2025-11-06 12:14 ` Menglong Dong
2025-11-06 17:03 ` Andrii Nakryiko
0 siblings, 1 reply; 24+ messages in thread
From: Menglong Dong @ 2025-11-06 12:14 UTC (permalink / raw)
To: Andrii Nakryiko, Alexei Starovoitov
Cc: Menglong Dong, Alexei Starovoitov, Jiri Olsa, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Eduard,
Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
Matt Bobrowski, Steven Rostedt, Masami Hiramatsu, Leon Hwang,
jiang.biao, bpf, LKML, linux-trace-kernel
On 2025/11/6 06:00, Alexei Starovoitov wrote:
> On Wed, Nov 5, 2025 at 9:30 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Nov 4, 2025 at 6:43 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
[......]
> > >
> > > Instead of all that I have a different suggestion...
> > >
> > > how about we introduce this "session" attach type,
> > > but won't mess with trampoline and whole new session->nr_links.
> > > Instead the same prog can be added to 'fentry' list
> > > and 'fexit' list.
> > > We lose the ability to skip fexit, but I'm still not convinced
> > > it's necessary.
> > > The biggest benefit is that it will work for existing JITs and trampolines.
> > > No new complex asm will be necessary.
> > > As far as writable session_cookie ...
> > > let's add another 8 byte space to bpf_tramp_run_ctx
> > > and only allow single 'fsession' prog for a given kernel function.
> > > Again to avoid changing all trampolines.
> > > This way the feature can be implemented purely in C and no arch
> > > specific changes.
> > > It's more limited, but it doesn't sound like the use case for multiple
> > > fsession-s exists. All this is on/off tracing. Not something
> > > that will be attached 24/7.
> >
> > I'd rather not have a feature at all than have a feature that might
> > or might not work depending on circumstances I don't control. If
> > someone happens to be using an fsession program on the same kernel
> > function I happen to be tracing (e.g., with retsnoop), a random failure
> > to attach would be maddening to debug.
>
> fentry won't conflict with fsession. I'm proposing
> to limit fsession-s to 1. Due to stack usage there's gotta be
I think Andrii means that the problem is limiting the fsession
count to 1, which can make our attach fail if someone else has
already attached one.
If we want to limit the stack usage, I think what we should limit is
the count of the fsession progs that use a session cookie, rather
than the count of the fsessions.
I understand your idea of adding the prog to both the fentry and
fexit lists instead of introducing a BPF_TRAMP_SESSION in the
progs_hlist. However, we still have to modify the arch stuff, as we
need to store the "bpf_fsession_return". What's more, it's a little
complex to add a prog to both the fentry and fexit lists, as
bpf_tramp_link doesn't support such an operation.
So I think it's clearer to keep the current logic. As Andrii
suggested, we can reuse the nr_args, and no more room will be used
on the stack except for the session cookies.
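For example, the combined slot could be encoded roughly like this (a
sketch of one possible layout, not necessarily what the next version
will use):

  /* one 8-byte slot replacing cookie ptr + session flags + nr_args */
  #define FSESSION_IS_EXIT(slot)   ((slot) & 0x1)          /* entry/exit bit   */
  #define FSESSION_PROG_IDX(slot)  (((slot) >> 1) & 0x3f)  /* session prog idx */
  #define FSESSION_NR_ARGS(slot)   (((slot) >> 7) & 0x3f)  /* arg count        */
  /*
   * the cookie address can then be derived from the prog index,
   * e.g. cookie = &cookie_area[FSESSION_PROG_IDX(slot)]
   */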
As for the session cookies, how about we limit their number? For
example, allow only 4 session cookies on a target, which I think
should be enough.
I can remove the "fexit skipping" part to make the trampoline
simpler if you prefer, which I'm OK with, considering the
optimization I'm making to the origin call in x86_64.
Therefore, the logic won't be complex: just reserve room for the
session cookies and the call to invoke_bpf().
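In pseudo-code, the generation side would be roughly (not the actual
JIT code; session_links is a placeholder name):

  /* while sizing the trampoline frame */
  if (session_links->nr_links)
          stack_size += session_links->nr_links * 8;  /* cookie slots */

  /* entry path: run session progs like ordinary fentry progs */
  for (i = 0; i < session_links->nr_links; i++)
          if (invoke_bpf(m, &prog, session_links->links[i],
                         stack_size, ret_off, ...))
                  return -EINVAL;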
Thanks!
Menglong Dong
> a limit anyway. I say, 32 is really the max, which is 256 bytes
> for cookies plus all the stack usage for args, nr_args, run_ctx, etc.
> Total of under 512 is ok.
> So tooling would have to deal with the limit regardless.
>
* Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64
2025-11-06 12:14 ` Menglong Dong
@ 2025-11-06 17:03 ` Andrii Nakryiko
0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2025-11-06 17:03 UTC (permalink / raw)
To: Menglong Dong
Cc: Alexei Starovoitov, Menglong Dong, Alexei Starovoitov, Jiri Olsa,
Daniel Borkmann, John Fastabend, Andrii Nakryiko,
Martin KaFai Lau, Eduard, Song Liu, Yonghong Song, KP Singh,
Stanislav Fomichev, Hao Luo, Matt Bobrowski, Steven Rostedt,
Masami Hiramatsu, Leon Hwang, jiang.biao, bpf, LKML,
linux-trace-kernel
On Thu, Nov 6, 2025 at 4:15 AM Menglong Dong <menglong.dong@linux.dev> wrote:
>
> On 2025/11/6 06:00, Alexei Starovoitov wrote:
> > On Wed, Nov 5, 2025 at 9:30 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Tue, Nov 4, 2025 at 6:43 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 4, 2025 at 4:40 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> [......]
> > > >
> > > > Instead of all that I have a different suggestion...
> > > >
> > > > how about we introduce this "session" attach type,
> > > > but won't mess with trampoline and whole new session->nr_links.
> > > > Instead the same prog can be added to 'fentry' list
> > > > and 'fexit' list.
> > > > We lose the ability to skip fexit, but I'm still not convinced
> > > > it's necessary.
> > > > The biggest benefit is that it will work for existing JITs and trampolines.
> > > > No new complex asm will be necessary.
> > > > As far as writable session_cookie ...
> > > > let's add another 8 byte space to bpf_tramp_run_ctx
> > > > and only allow single 'fsession' prog for a given kernel function.
> > > > Again to avoid changing all trampolines.
> > > > This way the feature can be implemented purely in C and no arch
> > > > specific changes.
> > > > It's more limited, but it doesn't sound like the use case for multiple
> > > > fsession-s exists. All this is on/off tracing. Not something
> > > > that will be attached 24/7.
> > >
> > > I'd rather not have a feature at all than have a feature that might
> > > or might not work depending on circumstances I don't control. If
> > > someone happens to be using an fsession program on the same kernel
> > > function I happen to be tracing (e.g., with retsnoop), a random failure
> > > to attach would be maddening to debug.
> >
> > fentry won't conflict with fsession. I'm proposing
> > to limit fsession-s to 1. Due to stack usage there's gotta be
>
> I think Andrii means that the problem is limiting the fsession
> count to 1, which can make our attach fail if someone else has
> already attached one.
>
Yeah, I wasn't worried about fsession vs fentry interactions. I was
(and still am) worried that whenever I decide to use fsession I'd
have to think about (and implement) a fallback plan in case fsession
attachment fails because (presumably) someone else already used
fsession on the function. Just too much hassle to bother with fsession
at that point.
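i.e. every fsession user would be forced into something like this
(a sketch; the prog names are made up):

  struct bpf_link *link, *link_entry, *link_exit;

  link = bpf_program__attach(skel->progs.my_fsession);
  if (!link) {
          /* someone else took the single fsession slot for this
           * function; fall back to a separate fentry+fexit pair */
          link_entry = bpf_program__attach(skel->progs.my_fentry);
          link_exit = bpf_program__attach(skel->progs.my_fexit);
  }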
> If we want to limit the stack usage, I think what we should limit is
> the count of the fsession progs that use a session cookie, rather
> than the count of the fsessions.
>
> I understand your idea of adding the prog to both the fentry and
> fexit lists instead of introducing a BPF_TRAMP_SESSION in the
> progs_hlist. However, we still have to modify the arch stuff, as we
> need to store the "bpf_fsession_return". What's more, it's a little
> complex to add a prog to both the fentry and fexit lists, as
> bpf_tramp_link doesn't support such an operation.
>
> So I think it's clearer to keep the current logic. As Andrii
> suggested, we can reuse the nr_args, and no more room will be used
> on the stack except for the session cookies.
>
> As for the session cookies, how about we limit their number? For
> example, allow only 4 session cookies on a target, which I think
> should be enough.
I think Alexei's concern is not so much stack space usage
(realistically you will almost never have more than 1-2-3
fsession+fentry+fexit programs per trampoline); his concern is code
complexity in arch-specific trampoline generation code. And I share
the concern. But if we do support 4 session cookies, we might as well
not put any artificial limits, because code complexity-wise this
won't change anything.
So I think we should decide whether we add fsession with session
cookie as a concept, and if yes, then not add unnecessary
restrictions, as at that point the code complexity won't really change
much.
>
> I can remove the "fexit skipping" part to make the trampoline
> simpler if you prefer, which I'm OK with, considering the
> optimization I'm making to the origin call in x86_64.
>
> Therefore, the logic won't be complex: just reserve room for the
> session cookies and the call to invoke_bpf().
>
> Thanks!
> Menglong Dong
>
> > a limit anyway. I say, 32 is really the max, which is 256 bytes
> > for cookies plus all the stack usage for args, nr_args, run_ctx, etc.
> > Total of under 512 is ok.
> > So tooling would have to deal with the limit regardless.
> >
>
>
>
>
Thread overview: 24+ messages
2025-10-26 3:01 [PATCH bpf-next v3 0/7] bpf: tracing session supporting Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 1/7] bpf: add tracing session support Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 2/7] bpf: add two kfunc for TRACE_SESSION Menglong Dong
2025-10-26 4:42 ` kernel test robot
2025-10-27 2:44 ` kernel test robot
2025-10-29 2:54 ` Menglong Dong
2025-11-05 0:40 ` Andrii Nakryiko
2025-10-26 3:01 ` [PATCH bpf-next v3 3/7] bpf,x86: add ret_off to invoke_bpf() Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for x86_64 Menglong Dong
2025-10-31 1:42 ` Alexei Starovoitov
2025-10-31 3:36 ` Menglong Dong
2025-10-31 17:57 ` Alexei Starovoitov
2025-11-03 6:00 ` Leon Hwang
2025-11-03 11:24 ` Menglong Dong
2025-11-05 0:40 ` Andrii Nakryiko
2025-11-05 2:43 ` Alexei Starovoitov
2025-11-05 3:48 ` Menglong Dong
2025-11-05 17:29 ` Andrii Nakryiko
2025-11-05 22:00 ` Alexei Starovoitov
2025-11-06 12:14 ` Menglong Dong
2025-11-06 17:03 ` Andrii Nakryiko
2025-10-26 3:01 ` [PATCH bpf-next v3 5/7] libbpf: add support for tracing session Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 6/7] selftests/bpf: add testcases " Menglong Dong
2025-10-26 3:01 ` [PATCH bpf-next v3 7/7] selftests/bpf: test fsession mixed with fentry and fexit Menglong Dong