The Linux Kernel Mailing List
From: Leon Hwang <leon.hwang@linux.dev>
To: Xu Kuohai <xukuohai@huaweicloud.com>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Yonghong Song <yonghong.song@linux.dev>,
	Jiri Olsa <jolsa@kernel.org>, KP Singh <kpsingh@kernel.org>,
	Anton Protopopov <a.s.protopopov@gmail.com>,
	Amery Hung <ameryhung@gmail.com>,
	Eyal Birger <eyal.birger@gmail.com>, Rong Tao <rongtao@cestc.cn>
Subject: Re: [RFC PATCH bpf-next 00/12] bpf: Introduce static-defined tracing probe for BPF
Date: Mon, 29 Jun 2026 13:26:43 +0800
Message-ID: <b8d8ab5e-bef1-4c95-8e5b-ea9768041b2e@linux.dev>
In-Reply-To: <ec28f0f9-0693-4c9b-88fa-bffca446a63c@huaweicloud.com>

On 29/6/26 11:27, Xu Kuohai wrote:
> On 6/29/2026 10:14 AM, Leon Hwang wrote:
>> Hi Kuohai,
>>
>> On 28/6/26 06:51, Xu Kuohai wrote:
>>> From: Xu Kuohai <xukuohai@huawei.com>
>>>
>>> This series introduces static-defined tracing probes for BPF programs.
>>> BPF SDT (static-defined tracing) works similarly to USDT. User defines
>>
>>
>> At first glance, the SDT idea looks cool to me.
>>
>> However, what's your purpose of introducing SDT?
>>
> 
> Well, the purpose is to add a dynamic, zero-overhead tracing mechanism for
> bpf, not just at function entry, but anywhere inside the prog source code.
> 

It would be better to state the purpose in the cover letter in the future.

>> If the goal is to provide points in bpf progs that can be traced, like
>> tracepoints in kernel functions, I think subprog+fentry is an alternative
>> approach. Compared with SDT, subprog+fentry requires a function call at
>> run time, instead of a NOP like SDT.
>>
[...]
>>
>> Furthermore, if users don't want a function call at run time, e.g. they
>> don't want to call 'my_trace' at run time in production, they can patch
>> the callsite of 'my_trace' with NOP before loading 'xdp_prog', and drop
>> the subprog 'my_trace' in their user space application. This elimination
>> is feasible, since it is used heavily in bpfsnoop [1].
> 
> Sounds like the subprog+fentry you described gives good evidence of real
> demand for dynamic tracing inside the function body.

Correct.

A subprog in an existing bpf prog can be used to inspect the prog's
runtime details, including the 'tail_call_cnt' on the stack.

An extra subprog used as a stub is better for dynamic tracing.

See my blog post:
https://blog.leonhw.com/post/ebpf-talk-138-debug-tailcall-bug-with-fentry/.

> 
> IIUC, even though the CALL instruction at the callsite is patched to NOP at
> runtime, the argument preparation instructions - r1 = len, r2 = ctx - remain

Correct.

> in the callsite. For SDT, the argument preparation is recorded as metadata
> out of line, and is never executed.

So, does argument preparation require the verifier to analyze the
registers to identify the argument registers, when an SDT is defined in
a prog? What if the verifier cannot identify them?

> 
> And I think SDT is cleaner and easier to use. User just declares the
> prototype and inserts the probe, no need to hack with subprog+fentry.
> 
>> However, this elimination is not easy to understand. Want me to show
>> more details about this elimination?
>>
> 
> That would be appreciated, thanks.
> 

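/*
 * ctx_ptr() and __sink() are helper macros defined outside this snippet;
 * see the demo source linked below.
 */

/* __noinline keeps 'subprog' as a real bpf-to-bpf call. */
static __noinline void
subprog(int len, int ret)
{
    __sink(len);
    __sink(ret);
}

SEC("xdp")
int xdp_fn(struct xdp_md *ctx)
{
    struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
    struct iphdr *iph = (struct iphdr *)(eth + 1);
    int len = ctx->data_end - ctx->data;
    int ret = XDP_PASS;

    /* Bounds check required before reading the IP header. */
    if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
        return ret;

    /* Reach the callsite only when the protocol field says ICMP. */
    if (iph->protocol != IPPROTO_ICMP)
        return ret;

    /* Make 'ret' opaque to the compiler before passing it on. */
    barrier_var(ret);
    subprog(len, ret);

    return ret;
}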

After attaching 'xdp_fn' to 'lo',

===
without the elimination:

bpftool p d x n xdp_fn
int xdp_fn(struct xdp_md * ctx):
; int xdp_fn(struct xdp_md *ctx)
   0: (b7) r0 = 2
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
   1: (79) r2 = *(u64 *)(r1 +0)
; int len = ctx->data_end - ctx->data;
   2: (79) r1 = *(u64 *)(r1 +8)
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
   3: (bf) r3 = r2
   4: (07) r3 += 34
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
   5: (2d) if r3 > r1 goto pc+10
   6: (b7) r3 = 9
; struct iphdr *iph = (struct iphdr *)(eth + 1);
   7: (bf) r4 = r2
   8: (0f) r4 += r3
; if (iph->protocol != IPPROTO_ICMP)
   9: (71) r3 = *(u8 *)(r4 +14)
; if (iph->protocol != IPPROTO_ICMP)
  10: (55) if r3 != 0x1 goto pc+5
  11: (1f) r1 -= r2
  12: (b7) r6 = 2
; subprog(len, ret);
  13: (bf) r2 = r6
  14: (85) call pc+2#bpf_prog_6a2f766e16102c10_subprog
  15: (bf) r0 = r6
; }
  16: (95) exit
void subprog(int len, int ret):
; subprog(int len, int ret)
  17: (63) *(u32 *)(r10 -8) = r2
  18: (63) *(u32 *)(r10 -4) = r1
; __sink(len);
  19: (63) *(u32 *)(r10 -4) = r1
; __sink(ret);
  20: (63) *(u32 *)(r10 -8) = r2
; }
  21: (95) exit


bpftool p d j n xdp_fn
int xdp_fn(struct xdp_md * ctx):
bpf_prog_6480db4581c3a618_xdp_fn:
; int xdp_fn(struct xdp_md *ctx)
   0:	nopl	(%rax,%rax)
   5:	nop
   7:	pushq	%rbp
   8:	movq	%rsp, %rbp
   b:	pushq	%rbx
   c:	movl	$2, %eax
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
  11:	movq	(%rdi), %rsi
; int len = ctx->data_end - ctx->data;
  15:	movq	8(%rdi), %rdi
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
  19:	movq	%rsi, %rdx
  1c:	addq	$34, %rdx
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
  20:	cmpq	%rdi, %rdx
  23:	ja	0xffffffffc0000926
  25:	movl	$9, %edx
; struct iphdr *iph = (struct iphdr *)(eth + 1);
  2a:	movq	%rsi, %rcx
  2d:	addq	%rdx, %rcx
; if (iph->protocol != IPPROTO_ICMP)
  30:	movzbq	14(%rcx), %rdx
; if (iph->protocol != IPPROTO_ICMP)
  35:	cmpq	$1, %rdx
  39:	jne	0xffffffffc0000926
  3b:	subq	%rsi, %rdi
  3e:	movl	$2, %ebx
; subprog(len, ret);
  43:	movq	%rbx, %rsi
  46:	callq	0xffffffffc00009d0
  4b:	movq	%rbx, %rax
; }
  4e:	popq	%rbx
  4f:	leave
  50:	retq
  51:	int3

void subprog(int len, int ret):
bpf_prog_6a2f766e16102c10_subprog:
; subprog(int len, int ret)
   0:	nopl	(%rax,%rax)
   5:	nop
   7:	pushq	%rbp
   8:	movq	%rsp, %rbp
   b:	subq	$8, %rsp
  12:	movl	%esi, -8(%rbp)
  15:	movl	%edi, -4(%rbp)
; __sink(len);
  18:	movl	%edi, -4(%rbp)
; __sink(ret);
  1b:	movl	%esi, -8(%rbp)
; }
  1e:	leave
  1f:	retq
  20:	int3


===
with the elimination:

bpftool p d x n xdp_fn
int xdp_fn(struct xdp_md * ctx):
; int xdp_fn(struct xdp_md *ctx)
   0: (b7) r0 = 2
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
   1: (79) r2 = *(u64 *)(r1 +0)
; int len = ctx->data_end - ctx->data;
   2: (79) r1 = *(u64 *)(r1 +8)
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
   3: (bf) r3 = r2
   4: (07) r3 += 34
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
   5: (2d) if r3 > r1 goto pc+9
   6: (b7) r3 = 9
; struct iphdr *iph = (struct iphdr *)(eth + 1);
   7: (bf) r4 = r2
   8: (0f) r4 += r3
; if (iph->protocol != IPPROTO_ICMP)
   9: (71) r3 = *(u8 *)(r4 +14)
; if (iph->protocol != IPPROTO_ICMP)
  10: (55) if r3 != 0x1 goto pc+4
  11: (1f) r1 -= r2
  12: (b7) r6 = 2
; subprog(len, ret);
  13: (bf) r2 = r6
  14: (bf) r0 = r6
; }
  15: (95) exit


bpftool p d j n xdp_fn
int xdp_fn(struct xdp_md * ctx):
bpf_prog_861d0ecc72ad8d9e_xdp_fn:
; int xdp_fn(struct xdp_md *ctx)
   0:	nopl	(%rax,%rax)
   5:	nop
   7:	pushq	%rbp
   8:	movq	%rsp, %rbp
   b:	pushq	%rbx
   c:	movl	$2, %eax
; struct ethhdr *eth = (struct ethhdr *)(ctx_ptr(ctx, data));
  11:	movq	(%rdi), %rsi
; int len = ctx->data_end - ctx->data;
  15:	movq	8(%rdi), %rdi
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
  19:	movq	%rsi, %rdx
  1c:	addq	$34, %rdx
; if ((void *)(iph + 1) > ctx_ptr(ctx, data_end))
  20:	cmpq	%rdi, %rdx
  23:	ja	0xffffffffc0000929
  25:	movl	$9, %edx
; struct iphdr *iph = (struct iphdr *)(eth + 1);
  2a:	movq	%rsi, %rcx
  2d:	addq	%rdx, %rcx
; if (iph->protocol != IPPROTO_ICMP)
  30:	movzbq	14(%rcx), %rdx
; if (iph->protocol != IPPROTO_ICMP)
  35:	cmpq	$1, %rdx
  39:	jne	0xffffffffc0000929
  3b:	subq	%rsi, %rdi
  3e:	movl	$2, %ebx
; subprog(len, ret);
  43:	movq	%rbx, %rsi
  46:	movq	%rbx, %rax
; }
  49:	popq	%rbx
  4a:	leave
  4b:	retq
  4c:	int3


Demo source code:
https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/eliminate-subprog


With the elimination, the callsite of 'subprog' has been eliminated.

However, as you mentioned above, the argument preparation insns are kept.

Thanks,
Leon



Thread overview: 23+ messages
2026-06-27 22:51 [RFC PATCH bpf-next 00/12] bpf: Introduce static-defined tracing probe for BPF Xu Kuohai
2026-06-27 20:51 ` [syzbot ci] " syzbot ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 01/12] libbpf: Prepare bpf SDT probe section for the linker Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 02/12] libbpf: Introduce bpf SDT probe macros Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 03/12] libbpf: Add bpf_sdt_notes section parser Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 04/12] bpf: Create insn_array map for bpf SDT probe Xu Kuohai
2026-06-27 15:34   ` bot+bpf-ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 05/12] bpf: Collect SDT probe BTF IDs from BTF decl tags Xu Kuohai
2026-06-27 15:34   ` bot+bpf-ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 06/12] bpf: Add type check for SDT probe site Xu Kuohai
2026-06-27 15:22   ` bot+bpf-ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 07/12] bpf: Record probe name in SDT map Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 08/12] libbpf: Add libbpf support to load SDT observer program Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 09/12] bpf: Add kernel " Xu Kuohai
2026-06-27 15:22   ` bot+bpf-ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 10/12] bpf: Support attach and detach for " Xu Kuohai
2026-06-27 22:51 ` [RFC PATCH bpf-next 11/12] bpf, x86: Add JIT support SDT for probe Xu Kuohai
2026-06-27 15:22   ` bot+bpf-ci
2026-06-27 22:51 ` [RFC PATCH bpf-next 12/12] selftests/bpf: Add tests for bpf SDT probe Xu Kuohai
2026-06-29  2:14 ` [RFC PATCH bpf-next 00/12] bpf: Introduce static-defined tracing probe for BPF Leon Hwang
2026-06-29  3:27   ` Xu Kuohai
2026-06-29  5:26     ` Leon Hwang [this message]
2026-06-29  7:51       ` Xu Kuohai
