[PATCH bpf-next 0/2] bpf: enable x86 fentry on tail-called programs

* [PATCH bpf-next 0/2] bpf: enable x86 fentry on tail-called programs
@ 2026-03-27 14:16 Takeru Hayasaka
  2026-03-27 14:16 ` [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs Takeru Hayasaka
  2026-03-27 14:16 ` [PATCH bpf-next 2/2] selftests/bpf: cover fentry on tailcalled programs Takeru Hayasaka
  0 siblings, 2 replies; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 14:16 UTC (permalink / raw)
  To: ast, daniel, andrii; +Cc: bpf, x86, linux-kselftest, linux-kernel

This series enables fentry on x86 BPF programs reached via tail calls
and adds a focused selftest for the expected behavior.

Patch 1 fixes the x86 mirrored text-poke lookup so the tail-call landing
slot is patched on both IBT and non-IBT JITs.

Patch 2 adds a selftest that checks both direct entry and tail-called
entry for a tailcall callee with fentry attached.

Takeru Hayasaka (2):
  bpf, x86: patch tail-call fentry slot on non-IBT JITs
  selftests/bpf: cover fentry on tailcalled programs

 arch/x86/net/bpf_jit_comp.c                   |  47 +++++++-
 .../selftests/bpf/prog_tests/tailcalls.c      | 110 ++++++++++++++++++
 .../bpf/progs/tailcall_fentry_probe.c         |  16 +++
 .../bpf/progs/tailcall_fentry_target.c        |  27 +++++
 4 files changed, 197 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_fentry_probe.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_fentry_target.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 14:16 [PATCH bpf-next 0/2] bpf: enable x86 fentry on tail-called programs Takeru Hayasaka
@ 2026-03-27 14:16 ` Takeru Hayasaka
  2026-03-27 14:24   ` Alexei Starovoitov
  2026-03-27 14:16 ` [PATCH bpf-next 2/2] selftests/bpf: cover fentry on tailcalled programs Takeru Hayasaka
  1 sibling, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 14:16 UTC (permalink / raw)
  To: ast, daniel, andrii; +Cc: bpf, x86, linux-kselftest, linux-kernel

x86 tail-call fentry patching mirrors CALL text pokes to the tail-call
landing slot.

The helper that locates that mirrored slot assumes an ENDBR-prefixed
landing, which works on IBT JITs but fails on non-IBT JITs where the
landing starts directly with the 5-byte patch slot.

As a result, the regular entry gets patched but the tail-call landing
remains NOP5, so fentry never fires for tail-called programs on non-IBT
kernels.

Anchor the lookup on the landing address, verify the short-jump layout
first, and only check ENDBR when one is actually emitted.

Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 47 ++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e9b78040d703..fe5fd37f65d8 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -325,8 +325,10 @@ struct jit_context {
 
 /* Number of bytes emit_patch() needs to generate instructions */
 #define X86_PATCH_SIZE		5
+/* Number of bytes used by the short jump that skips the tail-call hook. */
+#define X86_TAIL_CALL_SKIP_JMP_SIZE	2
 /* Number of bytes that will be skipped on tailcall */
-#define X86_TAIL_CALL_OFFSET	(12 + ENDBR_INSN_SIZE)
+#define X86_TAIL_CALL_OFFSET	(12 + X86_TAIL_CALL_SKIP_JMP_SIZE + ENDBR_INSN_SIZE)
 
 static void push_r9(u8 **pprog)
 {
@@ -545,8 +547,15 @@ static void emit_prologue(u8 **pprog, u8 *ip, u32 stack_depth, bool ebpf_from_cb
 		EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
 	}
 
+	if (!is_subprog) {
+		/* Normal entry skips the tail-call-only trampoline hook. */
+		EMIT2(0xEB, ENDBR_INSN_SIZE + X86_PATCH_SIZE);
+	}
+
 	/* X86_TAIL_CALL_OFFSET is here */
 	EMIT_ENDBR();
+	if (!is_subprog)
+		emit_nops(&prog, X86_PATCH_SIZE);
 
 	/* sub rsp, rounded_stack_depth */
 	if (stack_depth)
@@ -632,12 +641,33 @@ static int __bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
 	return ret;
 }
 
+static void *bpf_tail_call_fentry_ip(void *ip)
+{
+	u8 *tail_ip = ip + X86_TAIL_CALL_OFFSET;
+	u8 *landing = tail_ip - ENDBR_INSN_SIZE;
+
+	/* ip points at the regular fentry slot after the entry ENDBR. */
+	if (landing[-X86_TAIL_CALL_SKIP_JMP_SIZE] != 0xEB ||
+	    landing[-X86_TAIL_CALL_SKIP_JMP_SIZE + 1] !=
+		    ENDBR_INSN_SIZE + X86_PATCH_SIZE)
+		return NULL;
+
+	if (ENDBR_INSN_SIZE && !is_endbr((u32 *)landing))
+		return NULL;
+
+	return tail_ip;
+}
+
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
 		       enum bpf_text_poke_type new_t, void *old_addr,
 		       void *new_addr)
 {
+	void *tail_ip = NULL;
+	bool is_bpf_text = is_bpf_text_address((long)ip);
+	int ret, tail_ret;
+
 	if (!is_kernel_text((long)ip) &&
-	    !is_bpf_text_address((long)ip))
+	    !is_bpf_text)
 		/* BPF poking in modules is not supported */
 		return -EINVAL;
 
@@ -648,7 +678,18 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type old_t,
 	if (is_endbr(ip))
 		ip += ENDBR_INSN_SIZE;
 
-	return __bpf_arch_text_poke(ip, old_t, new_t, old_addr, new_addr);
+	if (is_bpf_text && (old_t == BPF_MOD_CALL || new_t == BPF_MOD_CALL))
+		tail_ip = bpf_tail_call_fentry_ip(ip);
+
+	ret = __bpf_arch_text_poke(ip, old_t, new_t, old_addr, new_addr);
+	if (ret < 0 || !tail_ip)
+		return ret;
+
+	tail_ret = __bpf_arch_text_poke(tail_ip, old_t, new_t, old_addr, new_addr);
+	if (tail_ret < 0)
+		return tail_ret;
+
+	return ret && tail_ret;
 }
 
 #define EMIT_LFENCE()	EMIT3(0x0F, 0xAE, 0xE8)
-- 
2.43.0
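
The layout check in bpf_tail_call_fentry_ip() can be followed more easily
in a small user-space model. The sketch below is not kernel code: the
MODEL_* names and both functions are hypothetical, the 12 bytes before the
short jump are filled with placeholder 0x90 bytes rather than the real
prologue instructions, and only the plain ENDBR64 encoding (f3 0f 1e fa)
is recognized. It only mirrors the offset arithmetic from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the patch; the MODEL_ names are hypothetical. */
#define MODEL_PATCH_SIZE	5	/* X86_PATCH_SIZE */
#define MODEL_SKIP_JMP_SIZE	2	/* X86_TAIL_CALL_SKIP_JMP_SIZE */
#define MODEL_PROLOGUE_BYTES	12	/* the literal 12 in X86_TAIL_CALL_OFFSET */

static const uint8_t model_endbr64[4] = { 0xF3, 0x0F, 0x1E, 0xFA };

/*
 * Write a model prologue into buf, starting at the regular fentry slot
 * (that is, just after the entry ENDBR). endbr_size must be 0 (non-IBT)
 * or 4 (IBT). Returns the number of bytes written.
 */
static size_t model_emit_prologue(uint8_t *buf, size_t endbr_size)
{
	size_t n = 0;

	assert(endbr_size == 0 || endbr_size == sizeof(model_endbr64));

	/* Placeholder for the regular patch slot plus frame setup. */
	memset(buf + n, 0x90, MODEL_PROLOGUE_BYTES);
	n += MODEL_PROLOGUE_BYTES;

	/* Short jump taken on normal entry: skips ENDBR + tail-call slot. */
	buf[n++] = 0xEB;
	buf[n++] = (uint8_t)(endbr_size + MODEL_PATCH_SIZE);

	/* Tail-call landing: optional ENDBR, then the 5-byte patch slot. */
	memcpy(buf + n, model_endbr64, endbr_size);
	n += endbr_size;
	memset(buf + n, 0x90, MODEL_PATCH_SIZE);
	n += MODEL_PATCH_SIZE;

	return n;
}

/*
 * Same checks as bpf_tail_call_fentry_ip() in the patch: verify the short
 * jump first, then ENDBR only when one is emitted. Returns the offset of
 * the tail-call patch slot from ip, or -1 if the layout does not match.
 */
static long model_tail_slot_offset(const uint8_t *ip, size_t endbr_size)
{
	size_t tail = MODEL_PROLOGUE_BYTES + MODEL_SKIP_JMP_SIZE + endbr_size;
	const uint8_t *landing = ip + tail - endbr_size;

	if (landing[-2] != 0xEB ||
	    landing[-1] != endbr_size + MODEL_PATCH_SIZE)
		return -1;

	if (endbr_size && memcmp(landing, model_endbr64, endbr_size) != 0)
		return -1;

	return (long)tail;
}
```

With endbr_size 4 the model places the tail-call slot 18 bytes after the
regular slot, and with endbr_size 0 it places it 14 bytes after; both
follow from 12 + 2 + ENDBR_INSN_SIZE.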



* [PATCH bpf-next 2/2] selftests/bpf: cover fentry on tailcalled programs
  2026-03-27 14:16 [PATCH bpf-next 0/2] bpf: enable x86 fentry on tail-called programs Takeru Hayasaka
  2026-03-27 14:16 ` [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs Takeru Hayasaka
@ 2026-03-27 14:16 ` Takeru Hayasaka
  1 sibling, 0 replies; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 14:16 UTC (permalink / raw)
  To: ast, daniel, andrii; +Cc: bpf, x86, linux-kselftest, linux-kernel

Add a small tailcall target/probe pair and a tailcalls subtest that
attaches fentry to the callee program.

The test first runs the callee directly and checks that fentry fires
exactly once, then runs the entry program, tail-calls into the callee
and checks that the same fentry also fires exactly once on the tail-call
path.

This covers both sides of the x86 change: direct entry must not
double-fire, while tail-called entry must now be observable.

Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com>
---
 .../selftests/bpf/prog_tests/tailcalls.c      | 110 ++++++++++++++++++
 .../bpf/progs/tailcall_fentry_probe.c         |  16 +++
 .../bpf/progs/tailcall_fentry_target.c        |  27 +++++
 3 files changed, 153 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_fentry_probe.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_fentry_target.c

diff --git a/tools/testing/selftests/bpf/prog_tests/tailcalls.c b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
index 7d534fde0af9..ac05df65b666 100644
--- a/tools/testing/selftests/bpf/prog_tests/tailcalls.c
+++ b/tools/testing/selftests/bpf/prog_tests/tailcalls.c
@@ -1113,6 +1113,114 @@ static void test_tailcall_bpf2bpf_fentry_entry(void)
 	bpf_object__close(tgt_obj);
 }
 
+static void test_tailcall_fentry_tailcallee(void)
+{
+	struct bpf_object *tgt_obj = NULL, *fentry_obj = NULL;
+	struct bpf_map *prog_array, *data_map;
+	struct bpf_link *fentry_link = NULL;
+	struct bpf_program *prog;
+	int err, map_fd, callee_fd, main_fd, data_fd, i, val;
+	char buff[128] = {};
+
+	LIBBPF_OPTS(bpf_test_run_opts, topts,
+		    .data_in = buff,
+		    .data_size_in = sizeof(buff),
+		    .repeat = 1,
+	);
+
+	err = bpf_prog_test_load("tailcall_fentry_target.bpf.o",
+				 BPF_PROG_TYPE_SCHED_CLS,
+				 &tgt_obj, &main_fd);
+	if (!ASSERT_OK(err, "load tgt_obj"))
+		return;
+
+	prog_array = bpf_object__find_map_by_name(tgt_obj, "jmp_table");
+	if (!ASSERT_OK_PTR(prog_array, "find jmp_table map"))
+		goto out;
+
+	map_fd = bpf_map__fd(prog_array);
+	if (!ASSERT_FALSE(map_fd < 0, "find jmp_table map fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "entry");
+	if (!ASSERT_OK_PTR(prog, "find entry prog"))
+		goto out;
+
+	main_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(main_fd < 0, "find entry prog fd"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(tgt_obj, "callee");
+	if (!ASSERT_OK_PTR(prog, "find callee prog"))
+		goto out;
+
+	callee_fd = bpf_program__fd(prog);
+	if (!ASSERT_FALSE(callee_fd < 0, "find callee prog fd"))
+		goto out;
+
+	i = 0;
+	err = bpf_map_update_elem(map_fd, &i, &callee_fd, BPF_ANY);
+	if (!ASSERT_OK(err, "update jmp_table"))
+		goto out;
+
+	fentry_obj = bpf_object__open_file("tailcall_fentry_probe.bpf.o", NULL);
+	if (!ASSERT_OK_PTR(fentry_obj, "open fentry_obj file"))
+		goto out;
+
+	prog = bpf_object__find_program_by_name(fentry_obj, "fentry_callee");
+	if (!ASSERT_OK_PTR(prog, "find fentry prog"))
+		goto out;
+
+	err = bpf_program__set_attach_target(prog, callee_fd, "callee");
+	if (!ASSERT_OK(err, "set_attach_target callee"))
+		goto out;
+
+	err = bpf_object__load(fentry_obj);
+	if (!ASSERT_OK(err, "load fentry_obj"))
+		goto out;
+
+	fentry_link = bpf_program__attach_trace(prog);
+	if (!ASSERT_OK_PTR(fentry_link, "attach_trace"))
+		goto out;
+
+	data_map = bpf_object__find_map_by_name(fentry_obj, ".bss");
+	if (!ASSERT_FALSE(!data_map || !bpf_map__is_internal(data_map),
+			  "find tailcall_fentry_probe.bss map"))
+		goto out;
+
+	data_fd = bpf_map__fd(data_map);
+	if (!ASSERT_FALSE(data_fd < 0,
+			  "find tailcall_fentry_probe.bss map fd"))
+		goto out;
+
+	err = bpf_prog_test_run_opts(callee_fd, &topts);
+	ASSERT_OK(err, "direct callee");
+	ASSERT_EQ(topts.retval, 7, "direct callee retval");
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "direct fentry count");
+	ASSERT_EQ(val, 1, "direct fentry count");
+
+	val = 0;
+	err = bpf_map_update_elem(data_fd, &i, &val, BPF_ANY);
+	ASSERT_OK(err, "reset fentry count");
+
+	err = bpf_prog_test_run_opts(main_fd, &topts);
+	ASSERT_OK(err, "tailcall");
+	ASSERT_EQ(topts.retval, 7, "tailcall retval");
+
+	i = 0;
+	err = bpf_map_lookup_elem(data_fd, &i, &val);
+	ASSERT_OK(err, "fentry count");
+	ASSERT_EQ(val, 1, "fentry count");
+
+out:
+	bpf_link__destroy(fentry_link);
+	bpf_object__close(fentry_obj);
+	bpf_object__close(tgt_obj);
+}
+
 #define JMP_TABLE "/sys/fs/bpf/jmp_table"
 
 static int poke_thread_exit;
@@ -1759,6 +1867,8 @@ void test_tailcalls(void)
 		test_tailcall_bpf2bpf_fentry_fexit();
 	if (test__start_subtest("tailcall_bpf2bpf_fentry_entry"))
 		test_tailcall_bpf2bpf_fentry_entry();
+	if (test__start_subtest("tailcall_fentry_tailcallee"))
+		test_tailcall_fentry_tailcallee();
 	if (test__start_subtest("tailcall_poke"))
 		test_tailcall_poke();
 	if (test__start_subtest("tailcall_bpf2bpf_hierarchy_1"))
diff --git a/tools/testing/selftests/bpf/progs/tailcall_fentry_probe.c b/tools/testing/selftests/bpf/progs/tailcall_fentry_probe.c
new file mode 100644
index 000000000000..b784aeffb316
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_fentry_probe.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+int count = 0;
+
+SEC("fentry/callee")
+int BPF_PROG(fentry_callee, struct sk_buff *skb)
+{
+	count++;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/tailcall_fentry_target.c b/tools/testing/selftests/bpf/progs/tailcall_fentry_target.c
new file mode 100644
index 000000000000..46da06ce323c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/tailcall_fentry_target.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include "bpf_legacy.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
+	__uint(max_entries, 1);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+} jmp_table SEC(".maps");
+
+SEC("tc")
+int callee(struct __sk_buff *skb)
+{
+	return 7;
+}
+
+SEC("tc")
+int entry(struct __sk_buff *skb)
+{
+	bpf_tail_call_static(skb, &jmp_table, 0);
+	return 0;
+}
+
+char __license[] SEC("license") = "GPL";
-- 
2.43.0
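
The two properties this selftest checks (no double fire on direct entry,
exactly one fire on tail-called entry) can be illustrated with a small
user-space model of the two patch slots. This is only a sketch under
stated assumptions: struct model_prog, model_attach_fentry() and
model_run() are hypothetical names, and the model counts hook fires
instead of executing JITed code.

```c
#include <stdbool.h>

/* Hypothetical model of one JITed program with two fentry patch slots. */
struct model_prog {
	bool has_skip_jmp;		/* short jump over the tail-call slot */
	bool entry_slot_patched;	/* regular fentry slot holds a CALL */
	bool tail_slot_patched;		/* tail-call landing slot holds a CALL */
};

enum model_entry { MODEL_DIRECT, MODEL_TAIL_CALL };

/*
 * Attach fentry. With mirror == false only the regular slot is patched,
 * which is the behavior described for current kernels in this thread.
 * With mirror == true the tail-call slot is patched as well, as the
 * series proposes.
 */
static void model_attach_fentry(struct model_prog *p, bool mirror)
{
	p->entry_slot_patched = true;
	p->tail_slot_patched = mirror;
}

/* Return how many times the fentry hook fires for one run. */
static int model_run(const struct model_prog *p, enum model_entry how)
{
	int fires = 0;

	if (how == MODEL_DIRECT) {
		if (p->entry_slot_patched)
			fires++;
		/* Without the skip jump, execution falls into the tail slot. */
		if (!p->has_skip_jmp && p->tail_slot_patched)
			fires++;
	} else {
		/* A tail call lands past the regular slot, at the landing. */
		if (p->tail_slot_patched)
			fires++;
	}

	return fires;
}
```

In this model, a program with the skip jump and mirrored patching fires
once on either path, while dropping the skip jump would make direct entry
fire twice. That is the case the "direct callee" assertions guard against.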



* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 14:16 ` [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs Takeru Hayasaka
@ 2026-03-27 14:24   ` Alexei Starovoitov
  2026-03-27 15:12     ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2026-03-27 14:24 UTC (permalink / raw)
  To: Takeru Hayasaka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

On Fri, Mar 27, 2026 at 7:16 AM Takeru Hayasaka <hayatake396@gmail.com> wrote:
>
> x86 tail-call fentry patching mirrors CALL text pokes to the tail-call
> landing slot.
>
> The helper that locates that mirrored slot assumes an ENDBR-prefixed
> landing, which works on IBT JITs but fails on non-IBT JITs where the
> landing starts directly with the 5-byte patch slot.

tailcalls are deprecated. We should go the other way and
disable them ibt jit instead.
The less interaction between fentry and tailcall the better.

pw-bot: cr


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 14:24   ` Alexei Starovoitov
@ 2026-03-27 15:12     ` Takeru Hayasaka
  2026-03-27 15:21       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 15:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

Hi Alexei

Thanks, and sorry: I sent an older changelog, written while I was still
iterating on this, and it described the issue incorrectly.

My changelog made this sound like an IBT/non-IBT-specific issue, but
that was wrong. On current kernels, fentry on tail-called programs is
not supported in either case. Only the regular fentry patch site is
patched; there is no tail-call landing patching in either case, so
disabling IBT does not make it work.

What this series was trying to do was add support for fentry on
tail-called x86 programs. The non-IBT part was only about a bug in my
initial implementation of that support, not the underlying motivation.

The motivation is observability of existing tailcall-heavy BPF/XDP
programs, where tail-called leaf programs are currently a blind spot for
fentry-based debugging.

If supporting fentry on tail-called programs is still not something
you'd want upstream, I understand. If I resend this, I'll fix the
changelog/cover letter to describe it correctly.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 15:12     ` Takeru Hayasaka
@ 2026-03-27 15:21       ` Alexei Starovoitov
  2026-03-27 15:44         ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2026-03-27 15:21 UTC (permalink / raw)
  To: Takeru Hayasaka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

On Fri, Mar 27, 2026 at 8:12 AM Takeru Hayasaka <hayatake396@gmail.com> wrote:
>
> Hi Alexei
>
> Thanks, and Sorry, I sent an older changelog from while I was still
> iterating on this, and it described the issue incorrectly.
>
> My changelog made this sound like an IBT/non-IBT-specific issue, but
> that was wrong. On current kernels, fentry on tail-called programs is
> not supported in either case. Only the regular fentry patch site is
> patched; there is no tail-call landing patching in either case, so
> disabling IBT does not make it work.
>
> What this series was trying to do was add support for fentry on
> tail-called x86 programs. The non-IBT part was only about a bug in my
> initial implementation of that support, not the underlying motivation.
>
> The motivation is observability of existing tailcall-heavy BPF/XDP
> programs, where tail-called leaf programs are currently a blind spot for
> fentry-based debugging.

I get that, but I'd rather not open this can of worms.
We had enough headaches when tailcalls, fentry, subprogs are combined.
Like this set:
https://lore.kernel.org/all/20230912150442.2009-1-hffilwlqm@gmail.com/
and the followups.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 15:21       ` Alexei Starovoitov
@ 2026-03-27 15:44         ` Takeru Hayasaka
  2026-03-27 15:58           ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 15:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

Understood. I was a bit surprised to read that this area ended up taking
months of follow-up work....

One thing I am still trying to understand is what the preferred
debuggability/observability direction would be for existing
tailcall-heavy BPF/XDP deployments.

Tail calls are already used in practice as a program decomposition
mechanism, especially in XDP pipelines, and that leaves tail-called leaf
programs harder to observe today.

If fentry on tail-called programs is not something you'd want upstream,
is there another direction you would recommend for improving
observability/debuggability of such existing deployments?

2026年3月28日(土) 0:21 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>
> On Fri, Mar 27, 2026 at 8:12 AM Takeru Hayasaka <hayatake396@gmail.com> wrote:
> >
> > Hi Alexei
> >
> > Thanks, and Sorry, I sent an older changelog from while I was still
> > iterating on this, and it described the issue incorrectly.
> >
> > My changelog made this sound like an IBT/non-IBT-specific issue, but
> > that was wrong. On current kernels, fentry on tail-called programs is
> > not supported in either case. Only the regular fentry patch site is
> > patched; there is no tail-call landing patching in either case, so
> > disabling IBT does not make it work.
> >
> > What this series was trying to do was add support for fentry on
> > tail-called x86 programs. The non-IBT part was only about a bug in my
> > initial implementation of that support, not the underlying motivation.
> >
> > The motivation is observability of existing tailcall-heavy BPF/XDP
> > programs, where tail-called leaf programs are currently a blind spot for
> > fentry-based debugging.
>
> I get that, but I'd rather not open this can of worms.
> We had enough headaches when tailcalls, fentry, subprogs are combined.
> Like this set:
> https://lore.kernel.org/all/20230912150442.2009-1-hffilwlqm@gmail.com/
> and the followups.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 15:44         ` Takeru Hayasaka
@ 2026-03-27 15:58           ` Alexei Starovoitov
  2026-03-27 16:06             ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2026-03-27 15:58 UTC (permalink / raw)
  To: Takeru Hayasaka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

On Fri, Mar 27, 2026 at 8:45 AM Takeru Hayasaka <hayatake396@gmail.com> wrote:
>
> Understood. I was a bit surprised to read that this area ended up taking
> months of follow-up work....
>
> One thing I am still trying to understand is what the preferred
> debuggability/observability direction would be for existing
> tailcall-heavy BPF/XDP deployments.
>
> Tail calls are already used in practice as a program decomposition
> mechanism, especially in XDP pipelines, and that leaves tail-called leaf
> programs harder to observe today.
>
> If fentry on tail-called programs is not something you'd want upstream,
> is there another direction you would recommend for improving
> observability/debuggability of such existing deployments?

You don't need fentry to debug.
perf works just fine on all bpf progs whether tailcall or not.

Also pls don't top post.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 15:58           ` Alexei Starovoitov
@ 2026-03-27 16:06             ` Takeru Hayasaka
  2026-03-27 16:09               ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 16:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

Sorry about the top-posting.

That makes sense, thanks. I agree perf can provide visibility into which
BPF programs are running, including tail-called ones.

What I am still unsure about is packet-level / structured-data
observability. My use case is closer to xdpdump-style debugging, where I
want to inspect packet-related context from specific XDP leaf programs
in a live pipeline.

That feels harder to express with perf alone, so I am trying to
understand what the preferred direction would be for that kind of use
case in tailcall-heavy XDP deployments.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 16:06             ` Takeru Hayasaka
@ 2026-03-27 16:09               ` Alexei Starovoitov
  2026-03-27 16:30                 ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2026-03-27 16:09 UTC (permalink / raw)
  To: Takeru Hayasaka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

On Fri, Mar 27, 2026 at 9:06 AM Takeru Hayasaka <hayatake396@gmail.com> wrote:
>
> Sorry about the top-posting.

yet you're still top posting :(

> That makes sense, thanks. I agree perf can provide visibility into which
> BPF programs are running, including tail-called ones.
>
> What I am still unsure about is packet-level / structured-data
> observability. My use case is closer to xdpdump-style debugging, where I
> want to inspect packet-related context from specific XDP leaf programs
> in a live pipeline.

see how cilium did it. with pwru tool, etc.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 16:09               ` Alexei Starovoitov
@ 2026-03-27 16:30                 ` Takeru Hayasaka
  2026-03-30  9:07                   ` Leon Hwang
  0 siblings, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-27 16:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

> yet you're still top posting :(

Sorry about that. I misunderstood what top posting meant and ended up
replying in the wrong style.
I had not understood that it referred to quoting in that way, and I am
embarrassed that I got it wrong....

> see how cilium did it. with pwru tool, etc.

Thank you for the suggestion.
As for pwru, I had thought it was not able to capture packet data such as pcap,
and understood it more as a tool to trace where a specific packet
enters the processing path and how it is handled.

For example, in an environment where systems are already
interconnected and running, I sometimes want to capture the actual
packets being sent for real processing.
On the other hand, if the goal is simply to observe processing safely
in a development environment, I think tools such as ipftrace2 or pwru
can be very useful.


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-27 16:30                 ` Takeru Hayasaka
@ 2026-03-30  9:07                   ` Leon Hwang
  2026-03-30 16:46                     ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Leon Hwang @ 2026-03-30  9:07 UTC (permalink / raw)
  To: Takeru Hayasaka, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf, X86 ML,
	open list:KERNEL SELFTEST FRAMEWORK, LKML

On 28/3/26 00:30, Takeru Hayasaka wrote:
>> see how cilium did it. with pwru tool, etc.
> 
> Thank you for the suggestion.
> As for pwru, I had thought it was not able to capture packet data such as pcap,
> and understood it more as a tool to trace where a specific packet
> enters the processing path and how it is handled.
> 
> For example, in an environment where systems are already
> interconnected and running, I sometimes want to capture the actual
> packets being sent for real processing.
> On the other hand, if the goal is simply to observe processing safely
> in a development environment, I think tools such as ipftrace2 or pwru
> can be very useful.
> 

Sounds like you are developing/maintaining an XDP project.

If so, and if the kernel carries the patches in
https://lore.kernel.org/all/20230912150442.2009-1-hffilwlqm@gmail.com/,
I recommend modifying the XDP project to use a dispatcher like libxdp [1].
Then you can trace the subprogs that run the tail calls;
meanwhile, you can filter packets using pcap-filter and
output packets using the bpf_xdp_output() helper.

[1]
https://github.com/xdp-project/xdp-tools/blob/main/lib/libxdp/xdp-dispatcher.c.in


Thanks,
Leon



* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-30  9:07                   ` Leon Hwang
@ 2026-03-30 16:46                     ` Takeru Hayasaka
  2026-03-31  2:24                       ` Leon Hwang
  0 siblings, 1 reply; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-30 16:46 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, X86 ML, open list:KERNEL SELFTEST FRAMEWORK,
	LKML

> Sounds like you are developing/maintaining an XDP project.
>
> If so, and the kernel carries the patches in
> https://lore.kernel.org/all/20230912150442.2009-1-hffilwlqm@gmail.com/,
> recommend modifying the XDP project using dispatcher like libxdp [1].
> Then, you are able to trace the subprogs which aim to run tail calls;
> meanwhile, you are able to filter packets using pcap-filter, and to
> output packets using bpf_xdp_output() helper.
>
> [1]
> https://github.com/xdp-project/xdp-tools/blob/main/lib/libxdp/xdp-dispatcher.c.in

Thank you very much for your wonderful comment, Leon.
This was the first time I learned that such a mechanism exists.

It is a very interesting ecosystem.
If I understand correctly, the idea is to invoke a component that
dumps pcap data as one of the tail-called components, right?
Thank you very much for sharing this idea with me.
If I have a chance to write a new XDP program in the future, I would
definitely like to try it.

On the other hand, I feel that it is somewhat difficult to apply this
idea directly to existing codebases, or to cases where the code is
written in Go using something like cilium/ebpf.
Also, when it comes to code running in production environments, making
changes itself can be difficult.

For that reason, I prototyped a tool like this.
It is something like a middle ground between xdpdump and xdpcap.
I built it so that only packets matched by cbpf are sent up through
perf, and while testing it, I noticed that it does not work well for
targets invoked via tail call.
This is what motivated me to send the patch.

https://github.com/takehaya/xdp-ninja

Once again, thank you for sharing the idea.
Takeru


* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-30 16:46                     ` Takeru Hayasaka
@ 2026-03-31  2:24                       ` Leon Hwang
  2026-03-31  4:53                         ` Takeru Hayasaka
  0 siblings, 1 reply; 15+ messages in thread
From: Leon Hwang @ 2026-03-31  2:24 UTC (permalink / raw)
  To: Takeru Hayasaka
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, X86 ML, open list:KERNEL SELFTEST FRAMEWORK,
	LKML

On 31/3/26 00:46, Takeru Hayasaka wrote:
>> Sounds like you are developing/maintaining an XDP project.
>>
>> If so, and the kernel carries the patches in
>> https://lore.kernel.org/all/20230912150442.2009-1-hffilwlqm@gmail.com/,
>> recommend modifying the XDP project using dispatcher like libxdp [1].
>> Then, you are able to trace the subprogs which aim to run tail calls;
>> meanwhile, you are able to filter packets using pcap-filter, and to
>> output packets using bpf_xdp_output() helper.
>>
>> [1]
>> https://github.com/xdp-project/xdp-tools/blob/main/lib/libxdp/xdp-dispatcher.c.in
> 
> Thank you very much for your wonderful comment, Leon.
> This was the first time I learned that such a mechanism exists.
> 
> It is a very interesting ecosystem.
> If I understand correctly, the idea is to invoke a component that
> dumps pcap data as one of the tail-called components, right?

It is similar to xdp-ninja/xdp-dump.

However, this idea goes one step further: it traces the
subprogs instead of only the main prog.

For example,

__noinline int subprog0(struct xdp_md *xdp) { bpf_tail_call_static(xdp, &m, 0); }
__noinline int subprog1(struct xdp_md *xdp) { bpf_tail_call_static(xdp, &m, 1); }
__noinline int subprog2(struct xdp_md *xdp) { bpf_tail_call_static(xdp, &m, 2); }
SEC("xdp") int main(struct xdp_md *xdp)
{
	subprog0(xdp);
	subprog1(xdp);
	subprog2(xdp);
	return XDP_PASS;
}

All of them, subprog{0,1,2} and main, will be traced.

The idea is to inject a pcap-filter expression (the cbpf) using
elibpcap [1], and to output packets like your xdp-ninja does.

It worked well during the time I maintained an XDP project.

[1] https://github.com/jschwinger233/elibpcap

> Thank you very much for sharing this idea with me.
> If I have a chance to write a new XDP program in the future, I would
> definitely like to try it.
> 
> On the other hand, I feel that it is somewhat difficult to apply this
> idea directly to existing codebases, or to cases where the code is
> written in Go using something like cilium/ebpf.
> Also, when it comes to code running in production environments, making
> changes itself can be difficult.

Correct. If you cannot modify the code, and the tail calls are not made
from inside subprogs, the aforementioned idea cannot help trace the tail
callees.

> 
> For that reason, I prototyped a tool like this.
> It is something like a middle ground between xdpdump and xdpcap.
> I built it so that only packets matched by cbpf are sent up through
> perf, and while testing it, I noticed that it does not work well for
> targets invoked via tail call.
> This is what motivated me to send the patch.
> 

I had a similar idea years ago: a more generic tracer for tail calls.
However, given Alexei's concern, I won't post it.

> https://github.com/takehaya/xdp-ninja
> 

It looks wonderful.

I developed a similar tool, bpfsnoop [1], to trace BPF progs/subprogs
and kernel functions, with filtering on packets/arguments and output of
packet/argument info. However, it lacks the ability to output
packets to a pcap file.

[1] https://github.com/bpfsnoop/bpfsnoop

Thanks,
Leon

> Once again, thank you for sharing the idea.
> Takeru



* Re: [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs
  2026-03-31  2:24                       ` Leon Hwang
@ 2026-03-31  4:53                         ` Takeru Hayasaka
  0 siblings, 0 replies; 15+ messages in thread
From: Takeru Hayasaka @ 2026-03-31  4:53 UTC (permalink / raw)
  To: Leon Hwang
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, X86 ML, open list:KERNEL SELFTEST FRAMEWORK,
	LKML

> It is similar to xdp-ninja/xdp-dump.
>
> However, this idea has one more step forward: it is to trace the
> subprogs instead of only the main prog.
>
> For example,
>
> __noinline int subprog0(struct xdp_md *xdp)
> {
>         bpf_tail_call_static(xdp, &m, 0);
>         return 0; /* reached only if the tail call fails */
> }
> __noinline int subprog1(struct xdp_md *xdp)
> {
>         bpf_tail_call_static(xdp, &m, 1);
>         return 0;
> }
> __noinline int subprog2(struct xdp_md *xdp)
> {
>         bpf_tail_call_static(xdp, &m, 2);
>         return 0;
> }
> SEC("xdp") int main(struct xdp_md *xdp)
> {
>         subprog0(xdp);
>         subprog1(xdp);
>         subprog2(xdp);
>         return XDP_PASS;
> }
>
> All of them, subprog{0,1,2} and main, will be traced.
>
> The idea is to inject a pcap-filter expression (as cBPF) using
> elibpcap [1], and to output packets like your xdp-ninja does.
>
> It worked well while I was maintaining an XDP project.
>
> [1] https://github.com/jschwinger233/elibpcap

Thank you very much for your kind reply.
elibpcap is a very interesting idea as well.

I had also considered whether making it __noinline might leave the
function prologue intact, so that it could be hooked via trampoline. I
actually tried that idea myself, but I had not yet been able to get it
working. I did not investigate it in enough detail, but I suspected
that the subprog might already have been expanded by the JIT, which
could make that approach difficult.

That is why I thought it was excellent that you realized it by taking
the approach of inserting the logic there in the first place, rather
than trying to hook it afterward.

I had also been thinking about supporting use cases where packets need
to be captured only after some packet-related decision has already
been made in the program. For example, there are cases where the value
to match depends on runtime state, such as a Session ID that changes
whenever a tunnel is established. I think this kind of approach is
very helpful for such cases.

So, I was very happy to learn that someone else had been thinking
about a similar technique.

> It looks wonderful.
>
> I developed a similar tool, bpfsnoop [1], to trace BPF progs/subprogs
> and kernel functions, with filtering on packets/arguments and output of
> packet/argument info. However, it lacks the ability to output packets
> to a pcap file.
>
> [1] https://github.com/bpfsnoop/bpfsnoop

Also, bpfsnoop looks like a very nice tool. I starred it on GitHub :)

Thank you very much again. I am very grateful that you shared such an
excellent idea with me.

Thanks,
Takeru


end of thread, other threads:[~2026-03-31  5:05 UTC | newest]

Thread overview: 15+ messages
2026-03-27 14:16 [PATCH bpf-next 0/2] bpf: enable x86 fentry on tail-called programs Takeru Hayasaka
2026-03-27 14:16 ` [PATCH bpf-next 1/2] bpf, x86: patch tail-call fentry slot on non-IBT JITs Takeru Hayasaka
2026-03-27 14:24   ` Alexei Starovoitov
2026-03-27 15:12     ` Takeru Hayasaka
2026-03-27 15:21       ` Alexei Starovoitov
2026-03-27 15:44         ` Takeru Hayasaka
2026-03-27 15:58           ` Alexei Starovoitov
2026-03-27 16:06             ` Takeru Hayasaka
2026-03-27 16:09               ` Alexei Starovoitov
2026-03-27 16:30                 ` Takeru Hayasaka
2026-03-30  9:07                   ` Leon Hwang
2026-03-30 16:46                     ` Takeru Hayasaka
2026-03-31  2:24                       ` Leon Hwang
2026-03-31  4:53                         ` Takeru Hayasaka
2026-03-27 14:16 ` [PATCH bpf-next 2/2] selftests/bpf: cover fentry on tailcalled programs Takeru Hayasaka
