From: Leon Hwang <leon.hwang@linux.dev>
To: Xu Kuohai <xukuohai@huaweicloud.com>,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Eduard Zingerman <eddyz87@gmail.com>,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Yonghong Song <yonghong.song@linux.dev>,
Jiri Olsa <jolsa@kernel.org>, KP Singh <kpsingh@kernel.org>,
Anton Protopopov <a.s.protopopov@gmail.com>,
Amery Hung <ameryhung@gmail.com>,
Eyal Birger <eyal.birger@gmail.com>, Rong Tao <rongtao@cestc.cn>
Subject: Re: [RFC PATCH bpf-next 00/12] bpf: Introduce static-defined tracing probe for BPF
Date: Mon, 29 Jun 2026 13:26:43 +0800
Message-ID: <b8d8ab5e-bef1-4c95-8e5b-ea9768041b2e@linux.dev>
In-Reply-To: <ec28f0f9-0693-4c9b-88fa-bffca446a63c@huaweicloud.com>
On 29/6/26 11:27, Xu Kuohai wrote:
> On 6/29/2026 10:14 AM, Leon Hwang wrote:
>> Hi Kuohai,
>>
>> On 28/6/26 06:51, Xu Kuohai wrote:
>>> From: Xu Kuohai <xukuohai@huawei.com>
>>>
>>> This series introduces static-defined tracing probes for BPF programs.
>>> BPF SDT (static-defined tracing) works similarly to USDT. User defines
>>
>>
>> At first glance, the SDT idea looks cool to me.
>>
>> However, what's your purpose of introducing SDT?
>>
>
> Well, the purpose is to add a dynamic, zero-overhead tracing mechanism for
> bpf, not just at function entry, but anywhere inside the prog source code.
>
It would be better to state the purpose in the cover letter in future
revisions.
>> If to provide points in bpf progs to be traced, like tracepoints in
>> kernel functions, I think subprog+fentry is an alternative approach.
>> Comparing with SDT, subprog+fentry requires a function call at run time,
>> instead of a NOP like SDT.
>>
[...]
>>
>> Furthermore, if users don't want a function call at run time, e.g. they
>> don't want to call 'my_trace' at run time in production, they can patch
>> the callsite of 'my_trace' with NOP before loading 'xdp_prog', and drop
>> the subprog 'my_trace' in their user space application. This elimination
>> is approachable, since it is used heavily in bpfsnoop [1].
>
> Sounds like the subprog+fentry you described gives good evidence of real
> demand for dynamic tracing inside a function body.
Correct.
A subprog in an existing bpf prog can be used to inspect the prog's
runtime details, including the 'tail_call_cnt' on the stack.
For dynamic tracing, adding an extra subprog as a stub is better.
See my blog post:
https://blog.leonhw.com/post/ebpf-talk-138-debug-tailcall-bug-with-fentry/.
>
> IIUC, even though the CALL instruction at the callsite is patched to NOP at
> runtime, the argument preparation instructions - r1 = len, r2 = ctx - remain
Correct.
> in the callsite. For SDT, the argument preparation is recorded as metadata
> out of line, and is never executed.
So, does recording the argument preparation require the verifier to
analyze the registers to identify the argument registers when an SDT is
defined in a prog? What if the verifier cannot identify them?
>
> And I think SDT is cleaner and easier to use. The user just declares the
> prototype and inserts the probe, no need to hack with subprog+fentry.
>
>> However, this elimination is not easy to understand. Want me to show
>> more details about this elimination?
>>
>
> That would be appreciated, thanks.
>
static __noinline void
subprog(int len, int ret)
{
	__sink(len);
	__sink(ret);
}

SEC("xdp")
int xdp_fn(struct xdp_md *ctx)
{
	struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
	struct iphdr *iph = (struct iphdr *)(eth + 1);
	int len = ctx->data_end - ctx->data;
	int ret = XDP_PASS;

	if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
		return ret;

	if (iph->protocol != IPPROTO_ICMP)
		return ret;

	barrier_var(ret);
	subprog(len, ret);
	return ret;
}
After attaching 'xdp_fn' to 'lo',
===
without the elimination:
bpftool p d x n xdp_fn
int xdp_fn(struct xdp_md * ctx):
; int xdp_fn(struct xdp_md *ctx)
0: (b7) r0 = 2
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
1: (79) r2 = *(u64 *)(r1 +0)
; int len = ctx->data_end - ctx->data;
2: (79) r1 = *(u64 *)(r1 +8)
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
3: (bf) r3 = r2
4: (07) r3 += 34
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
5: (2d) if r3 > r1 goto pc+10
6: (b7) r3 = 9
; struct iphdr *iph = (struct iphdr *)(eth + 1);
7: (bf) r4 = r2
8: (0f) r4 += r3
; if (iph->protocol != IPPROTO_ICMP)
9: (71) r3 = *(u8 *)(r4 +14)
; if (iph->protocol != IPPROTO_ICMP)
10: (55) if r3 != 0x1 goto pc+5
11: (1f) r1 -= r2
12: (b7) r6 = 2
; subprog(len, ret);
13: (bf) r2 = r6
14: (85) call pc+2#bpf_prog_6a2f766e16102c10_subprog
15: (bf) r0 = r6
; }
16: (95) exit
void subprog(int len, int ret):
; subprog(int len, int ret)
17: (63) *(u32 *)(r10 -8) = r2
18: (63) *(u32 *)(r10 -4) = r1
; __sink(len);
19: (63) *(u32 *)(r10 -4) = r1
; __sink(ret);
20: (63) *(u32 *)(r10 -8) = r2
; }
21: (95) exit
bpftool p d j n xdp_fn
int xdp_fn(struct xdp_md * ctx):
bpf_prog_6480db4581c3a618_xdp_fn:
; int xdp_fn(struct xdp_md *ctx)
0: nopl (%rax,%rax)
5: nop
7: pushq %rbp
8: movq %rsp, %rbp
b: pushq %rbx
c: movl $2, %eax
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
11: movq (%rdi), %rsi
; int len = ctx->data_end - ctx->data;
15: movq 8(%rdi), %rdi
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
19: movq %rsi, %rdx
1c: addq $34, %rdx
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
20: cmpq %rdi, %rdx
23: ja 0xffffffffc0000926
25: movl $9, %edx
; struct iphdr *iph = (struct iphdr *)(eth + 1);
2a: movq %rsi, %rcx
2d: addq %rdx, %rcx
; if (iph->protocol != IPPROTO_ICMP)
30: movzbq 14(%rcx), %rdx
; if (iph->protocol != IPPROTO_ICMP)
35: cmpq $1, %rdx
39: jne 0xffffffffc0000926
3b: subq %rsi, %rdi
3e: movl $2, %ebx
; subprog(len, ret);
43: movq %rbx, %rsi
46: callq 0xffffffffc00009d0
4b: movq %rbx, %rax
; }
4e: popq %rbx
4f: leave
50: retq
51: int3
void subprog(int len, int ret):
bpf_prog_6a2f766e16102c10_subprog:
; subprog(int len, int ret)
0: nopl (%rax,%rax)
5: nop
7: pushq %rbp
8: movq %rsp, %rbp
b: subq $8, %rsp
12: movl %esi, -8(%rbp)
15: movl %edi, -4(%rbp)
; __sink(len);
18: movl %edi, -4(%rbp)
; __sink(ret);
1b: movl %esi, -8(%rbp)
; }
1e: leave
1f: retq
20: int3
===
with the elimination:
bpftool p d x n xdp_fn
int xdp_fn(struct xdp_md * ctx):
; int xdp_fn(struct xdp_md *ctx)
0: (b7) r0 = 2
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
1: (79) r2 = *(u64 *)(r1 +0)
; int len = ctx->data_end - ctx->data;
2: (79) r1 = *(u64 *)(r1 +8)
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
3: (bf) r3 = r2
4: (07) r3 += 34
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
5: (2d) if r3 > r1 goto pc+9
6: (b7) r3 = 9
; struct iphdr *iph = (struct iphdr *)(eth + 1);
7: (bf) r4 = r2
8: (0f) r4 += r3
; if (iph->protocol != IPPROTO_ICMP)
9: (71) r3 = *(u8 *)(r4 +14)
; if (iph->protocol != IPPROTO_ICMP)
10: (55) if r3 != 0x1 goto pc+4
11: (1f) r1 -= r2
12: (b7) r6 = 2
; subprog(len, ret);
13: (bf) r2 = r6
14: (bf) r0 = r6
; }
15: (95) exit
bpftool p d j n xdp_fn
int xdp_fn(struct xdp_md * ctx):
bpf_prog_861d0ecc72ad8d9e_xdp_fn:
; int xdp_fn(struct xdp_md *ctx)
0: nopl (%rax,%rax)
5: nop
7: pushq %rbp
8: movq %rsp, %rbp
b: pushq %rbx
c: movl $2, %eax
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
11: movq (%rdi), %rsi
; int len = ctx->data_end - ctx->data;
15: movq 8(%rdi), %rdi
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
19: movq %rsi, %rdx
1c: addq $34, %rdx
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
20: cmpq %rdi, %rdx
23: ja 0xffffffffc0000929
25: movl $9, %edx
; struct iphdr *iph = (struct iphdr *)(eth + 1);
2a: movq %rsi, %rcx
2d: addq %rdx, %rcx
; if (iph->protocol != IPPROTO_ICMP)
30: movzbq 14(%rcx), %rdx
; if (iph->protocol != IPPROTO_ICMP)
35: cmpq $1, %rdx
39: jne 0xffffffffc0000929
3b: subq %rsi, %rdi
3e: movl $2, %ebx
; subprog(len, ret);
43: movq %rbx, %rsi
46: movq %rbx, %rax
; }
49: popq %rbx
4a: leave
4b: retq
4c: int3
Demo source code:
https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/eliminate-subprog
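With the elimination, the call to 'subprog' is gone: insn 14 in the
xlated dump and the callq at 0x46 in the JITed dump no longer exist, and
'subprog' itself is dropped. However, as you mentioned above, the
argument preparation insns are kept, e.g. 'r2 = r6' at insn 13.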
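
For reference, here is a minimal user-space sketch of the patching step,
not the demo's actual code. It assumes the call is replaced with a
'goto +0' (opcode 0x05, offset 0) before the prog is loaded. The struct
is a local mirror of the kernel's 'struct bpf_insn', and the helper name
'nop_out_calls' and its parameters are made up for illustration. The
shorter xlated dump above would be consistent with the verifier dropping
such a NOP afterwards, but that is an assumption about how the demo works.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the kernel's struct bpf_insn (8 bytes per insn). */
struct insn {
	uint8_t code;
	uint8_t dst_reg : 4;
	uint8_t src_reg : 4;
	int16_t off;
	int32_t imm;
};

#define OP_CALL     0x85 /* BPF_JMP | BPF_CALL, shown as "(85) call" in the dump */
#define OP_JA       0x05 /* BPF_JMP | BPF_JA; with off == 0 it acts as a NOP */
#define PSEUDO_CALL 1    /* src_reg value that marks a bpf-to-bpf call */

/*
 * Replace every bpf-to-bpf call that targets the subprog starting at insn
 * index 'subprog_start' with "goto +0". Returns the number of patched insns.
 * Argument preparation insns before the call are left untouched.
 */
static size_t nop_out_calls(struct insn *insns, size_t cnt, size_t subprog_start)
{
	size_t patched = 0;

	for (size_t i = 0; i < cnt; i++) {
		struct insn *in = &insns[i];

		if (in->code != OP_CALL || in->src_reg != PSEUDO_CALL)
			continue;
		/* A bpf-to-bpf call target is relative to the next insn. */
		if ((int64_t)i + 1 + in->imm != (int64_t)subprog_start)
			continue;

		*in = (struct insn){ .code = OP_JA };
		patched++;
	}
	return patched;
}
```

For the dump above, the call sits at insn 14 with imm 2, so the target is
14 + 1 + 2 = 17, the first insn of 'subprog'. Once every call is patched,
the insns from index 17 onward are unreachable and can be cut off before
loading.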
Thanks,
Leon