Linux Trace Kernel

Linux Trace Kernel
 help / color / mirror / Atom feed

* Re: [PATCH v21 8/9] ring-buffer: Show persistent buffer dropped events in trace file
From: Masami Hiramatsu @ 2026-05-27  3:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, linux-trace-kernel, Mark Rutland, Mathieu Desnoyers,
	Andrew Morton, Ian Rogers
In-Reply-To: <20260526134116.11e1db99@gandalf.local.home>

On Tue, 26 May 2026 13:41:16 -0400
Steven Rostedt <rostedt@kernel.org> wrote:

> On Tue, 26 May 2026 14:06:09 +0900
> Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:
> 
> > > @@ -7204,10 +7209,12 @@ int ring_buffer_read_page(struct trace_buffer *buffer,
> > >  	 * Set a flag in the commit field if we lost events
> > >  	 */
> > >  	if (missed_events) {
> > > -		/* If there is room at the end of the page to save the
> > > +		/*
> > > +		 * If there is room at the end of the page to save the
> > >  		 * missed events, then record it there.
> > >  		 */
> > > -		if (buffer->subbuf_size - commit >= sizeof(missed_events)) {
> > > +		if (missed_events > 0 &&
> > > +		    buffer->subbuf_size - commit >= sizeof(missed_events)) {
> > >  			memcpy(&dpage->data[commit], &missed_events,
> > >  			       sizeof(missed_events));
> > >  			local_add(RB_MISSED_STORED, &dpage->commit);  
> > 
> > After this line, we "add" RB_MISSED_EVENTS instead of set.
> > In this case, does it clear the RB_MISSED_EVENTS bit because
> > it already sets RB_MISSED_EVENTS.
> > 
> > 			commit += sizeof(missed_events);
> > 		}
> > 		local_add(RB_MISSED_EVENTS, &bpage->commit);
> >                       ^^^ here.
> 
> Perhaps this needs to be commented better.
> 
> The answer to your question is "No". The reason is that this is a *copy* of
> the page we are reading. As persistent pages are always assigned to
> specific memory, it can never leave the buffer even for the splice system
> call. It is always copied to a new page.
> 
> The new page doesn't have these bits set and needs to set them depending on
> what was found when reading the page from the buffer.
> 
> Now if this was a normal ring buffer where it did a zero copy from the
> buffer itself by swapping pages with the passed in page, if the bit was set
> before, then adding would cause a problem. But normal ring buffer pages
> never set these bits while in the buffer. They are only set by this function.

Yeah, for the persistent ring buffer, it does not happen.
But there seems RB_MISSED_EVENTS bit can be cleared in
"else" path (after applying 1-8 patches)?

----------
	if (read || (len < (commit - read)) ||
	    cpu_buffer->reader_page == cpu_buffer->commit_page ||
	    force_memcpy) { // <-- persistent ring buffer sets force_memcpy = true.
[...]
	} else {
		/* update the entry counter */
[...]
		if (!missed_events && rb_data_page_commit(dpage) & RB_MISSED_EVENTS)
			missed_events = -1;	
			//^-- we check RB_MISSED_EVENTS bit on @dpage->commit and set missed_events = -1.

		/*
		 * Use the real_end for the data size,
		 * This gives us a chance to store the lost events
		 * on the page.
		 */
		if (reader->real_end)
			local_set(&dpage->commit, reader->real_end);
                        // ^- only if @reader->real_end, RB_MISSED_EVENTS bit is dropped.
	}

	cpu_buffer->lost_events = 0;

	commit = rb_data_page_commit(dpage);
	/*
	 * Set a flag in the commit field if we lost events
	 */
	if (missed_events) {
		/*
		 * If there is room at the end of the page to save the
		 * missed events, then record it there.
		 */
		if (missed_events > 0 &&
		    buffer->subbuf_size - commit >= sizeof(missed_events)) {
			memcpy(&dpage->data[commit], &missed_events,
			       sizeof(missed_events));
			local_add(RB_MISSED_STORED, &dpage->commit);
			commit += sizeof(missed_events);
		}
		local_add(RB_MISSED_EVENTS, &dpage->commit); // <-- @dpage->commit is updated.
	}
----------

Thanks,


> 
> -- Steve


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH v2] tracing: Point constant hist field type to string literal
From: Masami Hiramatsu @ 2026-05-27  2:41 UTC (permalink / raw)
  To: Yu Peng
  Cc: rostedt, linux-kernel, linux-trace-kernel, mathieu.desnoyers,
	mhiramat
In-Reply-To: <20260527023450.2137639-1-pengyu@kylinos.cn>

On Wed, 27 May 2026 10:34:50 +0800
Yu Peng <pengyu@kylinos.cn> wrote:

> The HIST_FIELD_FL_CONST path uses the fixed "u64" type string.
> 
> Point hist_field->type directly to the string literal, matching the
> HIST_FIELD_FL_HITCOUNT path. The release path already uses kfree_const(),
> so no duplication is needed.
> 
> Signed-off-by: Yu Peng <pengyu@kylinos.cn>

This looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> ---
> Changes in v2:
> - Point hist_field->type directly to "u64" as suggested.
> 
>  kernel/trace/trace_events_hist.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index eb2c2bc8bc3d5..b50f2bd5ff771 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1992,9 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  	if (flags & HIST_FIELD_FL_CONST) {
>  		hist_field->fn_num = HIST_FIELD_FN_CONST;
>  		hist_field->size = sizeof(u64);
> -		hist_field->type = kstrdup("u64", GFP_KERNEL);
> -		if (!hist_field->type)
> -			goto free;
> +		hist_field->type = "u64";
>  		goto out;
>  	}
>  
> -- 
> 2.43.0


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* [PATCH v2] tracing: Point constant hist field type to string literal
From: Yu Peng @ 2026-05-27  2:34 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, linux-trace-kernel, mathieu.desnoyers, mhiramat,
	pengyu

The HIST_FIELD_FL_CONST path uses the fixed "u64" type string.

Point hist_field->type directly to the string literal, matching the
HIST_FIELD_FL_HITCOUNT path. The release path already uses kfree_const(),
so no duplication is needed.

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
Changes in v2:
- Point hist_field->type directly to "u64" as suggested.

 kernel/trace/trace_events_hist.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb2c2bc8bc3d5..b50f2bd5ff771 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1992,9 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 	if (flags & HIST_FIELD_FL_CONST) {
 		hist_field->fn_num = HIST_FIELD_FN_CONST;
 		hist_field->size = sizeof(u64);
-		hist_field->type = kstrdup("u64", GFP_KERNEL);
-		if (!hist_field->type)
-			goto free;
+		hist_field->type = "u64";
 		goto out;
 	}
 
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH v2] tracing: Point constant hist field type to string literal
From: Steven Rostedt @ 2026-05-27  2:26 UTC (permalink / raw)
  To: Yu Peng; +Cc: linux-kernel, linux-trace-kernel, mathieu.desnoyers, mhiramat
In-Reply-To: <20260527021827.2123529-1-pengyu@kylinos.cn>

On Wed, 27 May 2026 10:18:27 +0800
Yu Peng <pengyu@kylinos.cn> wrote:

> The HIST_FIELD_FL_CONST path uses the fixed "u64" type string.
> 
> Point hist_field->type directly to the string literal, matching the
> HIST_FIELD_FL_HITCOUNT path. The release path already uses kfree_const(),
> so no duplication is needed.
> 
> Signed-off-by: Yu Peng <pengyu@kylinos.cn>
> ---
> Changes in v2:
> - Point hist_field->type directly to "u64" as suggested

All new versions of a patch need to start a new thread. Please resend
this as a new thread and not a reply.

-- Steve


> 
>  kernel/trace/trace_events_hist.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index eb2c2bc8bc3d5..b50f2bd5ff771 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1992,9 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  	if (flags & HIST_FIELD_FL_CONST) {
>  		hist_field->fn_num = HIST_FIELD_FN_CONST;
>  		hist_field->size = sizeof(u64);
> -		hist_field->type = kstrdup("u64", GFP_KERNEL);
> -		if (!hist_field->type)
> -			goto free;
> +		hist_field->type = "u64";
>  		goto out;
>  	}
>  


^ permalink raw reply

* Re: [RFC PATCH v3 0/3] trace: stack trace deduplication for ftrace ring buffer
From: Li Pengfei @ 2026-05-27  2:23 UTC (permalink / raw)
  To: rostedt
  Cc: mhiramat, linux-trace-kernel, linux-kernel, cmllamas, zhangbo56,
	Pengfei Li
In-Reply-To: <20260526153905.093aff7c@gandalf.local.home>

From: Pengfei Li <lipengfei28@xiaomi.com>

On Mon, 26 May 2026 15:39:05 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Please DO NOT SEND new versions of a patch or patch series as a reply to
> the old one. It makes it extremely difficult for maintainers to manage the
> replies and patches.
> 
> A new version should ALWAYS start a new email thread!

Understood. Will start a fresh thread from v4 onwards and use a
lore link to reference the previous version in the cover letter.

Sorry for the noise.

Pengfei

^ permalink raw reply

* [PATCH v2] tracing: Point constant hist field type to string literal
From: Yu Peng @ 2026-05-27  2:18 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, linux-trace-kernel, mathieu.desnoyers, mhiramat,
	pengyu
In-Reply-To: <20260526215033.78fdac7f@fedora>

The HIST_FIELD_FL_CONST path uses the fixed "u64" type string.

Point hist_field->type directly to the string literal, matching the
HIST_FIELD_FL_HITCOUNT path. The release path already uses kfree_const(),
so no duplication is needed.

Signed-off-by: Yu Peng <pengyu@kylinos.cn>
---
Changes in v2:
- Point hist_field->type directly to "u64" as suggested.

 kernel/trace/trace_events_hist.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index eb2c2bc8bc3d5..b50f2bd5ff771 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1992,9 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
 	if (flags & HIST_FIELD_FL_CONST) {
 		hist_field->fn_num = HIST_FIELD_FN_CONST;
 		hist_field->size = sizeof(u64);
-		hist_field->type = kstrdup("u64", GFP_KERNEL);
-		if (!hist_field->type)
-			goto free;
+		hist_field->type = "u64";
 		goto out;
 	}
 
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH bpf-next v2 2/3] tracing: Expose tracepoint BTF ids via tracefs
From: Steven Rostedt @ 2026-05-27  1:57 UTC (permalink / raw)
  To: Mykyta Yatsenko
  Cc: bpf, Mykyta Yatsenko, linux-trace-kernel, Andrii Nakryiko,
	Alexei Starovoitov
In-Reply-To: <eed59d12-969f-4135-a84b-4965743692cd@gmail.com>

On Tue, 26 May 2026 11:07:56 +0100
Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> wrote:

> Hi Steven,
> 
> Gentle ping on this patch from the series.
> 
> Since this part touches tracing, I’d appreciate your thoughts on the
> tracing changes whenever you have a chance.
> 

Hi,

I've been looking at this and was wondering if there are ways to not
extend the trace_event_class structure. It's added for most trace
events (actually each DECLARE_EVENT_CLASS). Although when things like
BTF is enabled, this is a very small amount of extra memory.

I haven't been ignoring this. I've just been thinking about other
approaches, but haven't come up with anything. Of course, I haven't
been spending that much time on it, as I've been focused on other
things.

-- Steve

^ permalink raw reply

* Re: [PATCH] tracing: Use kstrdup_const() for constant hist field type
From: Steven Rostedt @ 2026-05-27  1:50 UTC (permalink / raw)
  To: Masami Hiramatsu (Google)
  Cc: Yu Peng, Mathieu Desnoyers, linux-trace-kernel, linux-kernel
In-Reply-To: <20260527101015.74fc2d54860f16e4aba83cd7@kernel.org>

On Wed, 27 May 2026 10:10:15 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Tue, 26 May 2026 17:51:05 +0800
> Yu Peng <pengyu@kylinos.cn> wrote:
> 
> > The HIST_FIELD_FL_CONST path duplicates the literal "u64" type with
> > kstrdup(), then releases it through kfree_const().
> > 
> > Use kstrdup_const() instead, avoiding the allocation for a .rodata string
> > while keeping the matching free helper.
> >   
> 
> Since we are using kfree_const() to free hist_field->type,
> we can just point it as same as HIST_FIELD_FL_HITCOUNT case.
> 

Agreed.

> Thanks,
> 
> > Signed-off-by: Yu Peng <pengyu@kylinos.cn>
> > ---
> >  kernel/trace/trace_events_hist.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index eb2c2bc8bc3d..6ffe9f4720a0 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -1992,7 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
> >  	if (flags & HIST_FIELD_FL_CONST) {
> >  		hist_field->fn_num = HIST_FIELD_FN_CONST;
> >  		hist_field->size = sizeof(u64);
> > -		hist_field->type = kstrdup("u64", GFP_KERNEL);
> > +		hist_field->type = kstrdup_const("u64", GFP_KERNEL);

This can simply be made to point to "u64".

Thanks,

-- Steve

> >  		if (!hist_field->type)
> >  			goto free;
> >  		goto out;
> > -- 
> > 2.43.0  
> 
> 


^ permalink raw reply

* Re: [PATCH] tracing: Use kstrdup_const() for constant hist field type
From: Masami Hiramatsu @ 2026-05-27  1:10 UTC (permalink / raw)
  To: Yu Peng
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
	linux-trace-kernel, linux-kernel
In-Reply-To: <20260526095105.1330810-1-pengyu@kylinos.cn>

On Tue, 26 May 2026 17:51:05 +0800
Yu Peng <pengyu@kylinos.cn> wrote:

> The HIST_FIELD_FL_CONST path duplicates the literal "u64" type with
> kstrdup(), then releases it through kfree_const().
> 
> Use kstrdup_const() instead, avoiding the allocation for a .rodata string
> while keeping the matching free helper.
> 

Since we are using kfree_const() to free hist_field->type,
we can just point it as same as HIST_FIELD_FL_HITCOUNT case.

Thanks,

> Signed-off-by: Yu Peng <pengyu@kylinos.cn>
> ---
>  kernel/trace/trace_events_hist.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index eb2c2bc8bc3d..6ffe9f4720a0 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1992,7 +1992,7 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
>  	if (flags & HIST_FIELD_FL_CONST) {
>  		hist_field->fn_num = HIST_FIELD_FN_CONST;
>  		hist_field->size = sizeof(u64);
> -		hist_field->type = kstrdup("u64", GFP_KERNEL);
> +		hist_field->type = kstrdup_const("u64", GFP_KERNEL);
>  		if (!hist_field->type)
>  			goto free;
>  		goto out;
> -- 
> 2.43.0


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCH] tracing: Fix field_var_str allocation errno
From: Masami Hiramatsu @ 2026-05-27  1:04 UTC (permalink / raw)
  To: Yu Peng
  Cc: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Tom Zanussi,
	linux-trace-kernel, linux-kernel
In-Reply-To: <20260526095022.1330107-1-pengyu@kylinos.cn>

On Tue, 26 May 2026 17:50:22 +0800
Yu Peng <pengyu@kylinos.cn> wrote:

> hist_trigger_elt_data_alloc() returns -EINVAL when the field_var_str
> kcalloc() fails. Return -ENOMEM instead, matching the other allocation
> failures in the function.
> 

Looks good to me.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thanks,

> Fixes: c910db943d35 ("tracing: Dynamically allocate the per-elt hist_elt_data array")
> Signed-off-by: Yu Peng <pengyu@kylinos.cn>
> ---
>  kernel/trace/trace_events_hist.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index eb2c2bc8bc3d..17fe13e12a4f 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1680,7 +1680,7 @@ static int hist_trigger_elt_data_alloc(struct tracing_map_elt *elt)
>  	elt_data->field_var_str = kcalloc(n_str, sizeof(char *), GFP_KERNEL);
>  	if (!elt_data->field_var_str) {
>  		hist_elt_data_free(elt_data);
> -		return -EINVAL;
> +		return -ENOMEM;
>  	}
>  	elt_data->n_field_var_str = n_str;
>  
> -- 
> 2.43.0


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply

* Re: [PATCHv4 04/13] uprobes/x86: Unmap trampoline vma object in case it's unused
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-5-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3456 bytes --]

> commit 68883203f0583fb1a8d79536f5c8945542974abd
> Author: Jiri Olsa <jolsa@kernel.org>
> Date:   Tue May 26 22:58:31 2026 +0200
>
>     uprobes/x86: Unmap trampoline vma object in case it's unused
>
>     In case the optimization fails, we leak new-ly created trampoline
>     vma mapping (in case we just created it), let's unmap it.
>
>     Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Should this Fixes: tag point to a9fe6f54968c instead? The vma leak bug
was actually introduced by commit a9fe6f54968c ("uprobes/x86: Remove
struct uprobe_trampoline object"), not ba2bfc97b462. Commit a9fe6f54968c
removed the destroy_uprobe_trampoline() cleanup call, and its commit
message explicitly states: "The new vma leak is fixed in following
change."

Suggested: Fixes: a9fe6f54968c ("uprobes/x86: Remove struct
uprobe_trampoline object")

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index c8af41ed681a..af5af7d67999 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -677,11 +677,14 @@ static unsigned long find_nearest_trampoline(unsigned long vaddr)
>  	return high_tramp;
>  }
>
> -static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr)
> +static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsigned long vaddr,
> +						    bool *new_mapping)
>  {
>  	VMA_ITERATOR(vmi, mm, 0);
>  	struct vm_area_struct *vma;
>
> +	*new_mapping = false;
> +
>  	if (vaddr > TASK_SIZE || vaddr < PAGE_SIZE)
>  		return ERR_PTR(-EINVAL);
>

Does this fix address the VMA leak scenario across fork chains? In v2,
sashiko-bot raised a concern about inherited-but-not-tracked VMAs:

https://lore.kernel.org/bpf/20260518114208.CE128C2BCF6@smtp.kernel.org/

When VM_DONTCOPY is removed, child processes inherit trampoline VMAs
but their tracking list (mm->uprobes_state.head_tramps) is empty. When
the child executes a uprobe, get_uprobe_trampoline() fails to find the
inherited VMA and creates a duplicate trampoline.

Child fork:
  dup_mmap()
    copies trampoline VMA to child
  mm_init_uprobes_state()
    initializes head_tramps as empty (parent's state not copied)

Child executes uprobe:
  arch_uprobe_optimize()
    get_uprobe_trampoline()
      head_tramps is empty, fails to find inherited VMA
      create_uprobe_trampoline()
        find_nearest_trampoline() finds new unmapped area
        installs second duplicate trampoline VMA

In a deep fork chain, each generation inherits all previous VMAs, has an
empty head_tramps list, and creates a new VMA. Can this cause linear
accumulation of redundant VMAs, eventually exhausting vm.max_map_count?

While this patch addresses unmapping failed optimization attempts, it
does not prevent the inherited-but-not-tracked VMA leak scenario in fork
chains.

> @@ -696,6 +699,7 @@ static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsign
>  	if (IS_ERR_VALUE(vaddr))
>  		return ERR_PTR(vaddr);
>
> +	*new_mapping = true;
>  	return _install_special_mapping(mm, vaddr, PAGE_SIZE,
>  				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
>  				&tramp_mapping);

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 08/13] selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: jakub, bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-9-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1909 bytes --]

> diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
> index c71e21df38b3..75687f50f4e2 100644
> --- a/tools/testing/selftests/bpf/usdt.h
> +++ b/tools/testing/selftests/bpf/usdt.h
> @@ -313,7 +313,7 @@ struct usdt_sema { volatile unsigned short active; };
>  #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
>  #define USDT_NOP			nop 0
>  #elif defined(__x86_64__)
> -#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
> +#define USDT_NOP                       .byte 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 /* nop, nop10 */
>  #else
>  #define USDT_NOP			nop
>  #endif

Does this change break the USDT selftest?  Multiple reviewers raised
concerns across v1 through v4 that subtest_optimized_attach() in
prog_tests/usdt.c searches for the old 6-byte nop1_nop5_combo sequence.
When usdt_2.c emits the new 11-byte nop,nop10 instead, find_instr() will
return NULL and fail the ASSERT_OK_PTR check.

You mentioned in v1/v2 that the test would be updated in a following commit,
preferring to keep changes separated for easier review.  However, this creates
a git bisect regression where the test breaks at this commit. Should the test
update be squashed in to avoid breaking bisectability?

The commit message contains a placeholder reference:

>     Syncing latest usdt.h change [1].
>
>     Now that we have nop10 optimization support in kernel, let's emit
>     nop,nop10 for usdt probe. We leave it up to the library to use
>     desirable nop instruction.
>
>     [1] TBD

This isn't a bug, but the "[1] TBD" should either be filled in or removed.

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 07/13] libbpf: Detect uprobe syscall with new error
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-8-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1261 bytes --]

> libbpf: Detect uprobe syscall with new error
>
> In the previous optimized uprobe fix we changed the syscall
> error used for its detection from ENXIO to EPROTO.
>
> Changing related probe_uprobe_syscall detection check.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> Fixes: 05738da0efa1 ("libbpf: Add uprobe syscall feature detection")
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

The commit message states "In the previous optimized uprobe fix we
changed the syscall error used for its detection from ENXIO to EPROTO."

The Fixes tag points to commit 05738da0efa1 ("libbpf: Add uprobe syscall
feature detection"), but that commit originally added the ENXIO check
correctly. The bug was actually introduced by commit af734a7ddb4e
("uprobes/x86: Move optimized uprobe from nop5 to nop10"), which changed
the kernel to return EPROTO instead of ENXIO but did not update the
libbpf detection code to match.

Should the Fixes tag be:

  Fixes: af734a7ddb4e ("uprobes/x86: Move optimized uprobe from nop5 to nop10")


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 02/13] uprobes/x86: Remove struct uprobe_trampoline object
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-3-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 4383 bytes --]

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 2be6707e3320..733655bc610e 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c

[ ... ]

> @@ -1111,21 +1051,18 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct *mm,
>  				  unsigned long vaddr)
>  {
> -	struct uprobe_trampoline *tramp;
> -	struct vm_area_struct *vma;
> -	bool new = false;
> -	int err = 0;
> +	struct pt_regs *regs = task_pt_regs(current);
> +	struct vm_area_struct *vma, *tramp;
>
> +	if (!user_64bit_mode(regs))
> +		return -EINVAL;
>  	vma = find_vma(mm, vaddr);
>  	if (!vma)
>  		return -EINVAL;
> -	tramp = get_uprobe_trampoline(vaddr, &new);
> -	if (!tramp)
> -		return -EINVAL;
> -	err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
> -	if (WARN_ON_ONCE(err) && new)
> -		destroy_uprobe_trampoline(tramp);
> -	return err;
> +	tramp = get_uprobe_trampoline(mm, vaddr);
> +	if (IS_ERR(tramp))
> +		return PTR_ERR(tramp);
> +	return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));
>  }

When swbp_optimize() fails in __arch_uprobe_optimize() and the preceding
get_uprobe_trampoline() call created a new trampoline VMA via
_install_special_mapping(), does the new VMA leak in the process address
space?

The pre-patch code tracked whether the trampoline was newly created and
called destroy_uprobe_trampoline() on the failure path:

  err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
  if (WARN_ON_ONCE(err) && new)
      destroy_uprobe_trampoline(tramp);
  return err;

The new code drops that cleanup entirely:

  return WARN_ON_ONCE(swbp_optimize(auprobe, vma, vaddr, tramp->vm_start));

Although the old destroy_uprobe_trampoline only freed the struct (not the
underlying VMA), the new code appears to introduce a VMA leak: the freshly
mapped PAGE_SIZE special mapping in the user's address space stays mapped
even though optimization failed. arch_uprobe_optimize() then sets
ARCH_UPROBE_FLAG_OPTIMIZE_FAIL so subsequent calls won't retry, leaving the
orphan trampoline mapping in the address space until exit_mmap() reaps it at
process teardown.

The commit message mentions: "Note the original code called
destroy_uprobe_trampoline if the optimiation failed, but it only freed the
struct uprobe_trampoline object, not the vma. The new vma leak is fixed in
following change."

Is the VMA leak addressed in the subsequent commit in this series?

A secondary behaviour change is that 'return WARN_ON_ONCE(swbp_optimize(...))'
now returns the boolean truth value of the error (0 or 1) instead of the
original errno. While the current caller (arch_uprobe_optimize) only treats
the value as boolean, could this surprise a future caller that propagates the
return code?

Commit message:

>     uprobes/x86: Remove struct uprobe_trampoline object
>
>     Removing struct uprobe_trampoline object and it's tracking code,
>     because it's not needed. We can do same thing directly on top of
>     struct vm_area_struct objects.
>
>     This makes the code simpler and allows easy propagation of the
>     trampoline vma object into child process in following change.
>
>     Note the original code called destroy_uprobe_trampoline if the
>     optimiation failed, but it only freed the struct uprobe_trampoline
>     object, not the vma. The new vma leak is fixed in following change.
>
>     Acked-by: Andrii Nakryiko <andrii@kernel.org>
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

CI bot suggested adding a Fixes tag in v3: 'Fixes: ba2bfc97b462
("uprobes/x86: Add support to optimize uprobes")' because the commit message
notes a VMA resource leak in the original code.

Reference: https://lore.kernel.org/bpf/8861cc7af8e0dfff2674ca4ae6381ea2d82c242dec115774e8e11048bde358c4@mail.kernel.org/

The author responded that 'the logic stays' and a separate fix is needed, but
did not explain why the Fixes tag should not be included. If this patch is
documenting a pre-existing bug (even if the full fix comes later), should it
include a Fixes tag for backporting purposes?

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 12/13] selftests/bpf: Add tests for uprobe nop10 red zone clobbering
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-13-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3775 bytes --]

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 969f4deba9fd..efff0c515184 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c

[ ... ]

> @@ -855,6 +897,37 @@ static void test_uprobe_race(void)
>  #define __NR_uprobe 336
>  #endif
>
> +static void test_uprobe_red_zone(void)
> +{
> +	struct uprobe_syscall_executed *skel;
> +	struct bpf_link *link;
> +	void *nop10_addr;
> +	size_t offset;
> +	int i;
> +
> +	nop10_addr = find_nop10(uprobe_red_zone_test);
> +	if (!ASSERT_NEQ(nop10_addr, NULL, "find_nop10"))
> +		return;
> +
> +	skel = uprobe_syscall_executed__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "open_and_load"))
> +		return;
> +
> +	offset = get_uprobe_offset(nop10_addr);
> +	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> +			0, "/proc/self/exe", offset, NULL);
> +	if (!ASSERT_OK_PTR(link, "attach_uprobe"))
> +		goto cleanup;
> +
> +	for (i = 0; i < 10; i++)
> +		ASSERT_EQ(uprobe_red_zone_test(), 0, "red_zone_intact");

Does this loop execute too quickly to guarantee the uprobe optimization has
completed?

Uprobe optimization is performed asynchronously by a worker thread. Since this
is a tight loop with no delays or blocking, the 10 iterations might finish
before the background worker patches the code.

If all hits execute on the unoptimized INT3 path, the test could falsely pass
on kernels where the CALL-based optimization bug is present.

(This concern was raised by sashiko-bot@kernel.org in v1 and remains
unaddressed:
https://lore.kernel.org/bpf/20260514212219.BF1F3C2BCB3@smtp.kernel.org/)

> +
> +	bpf_link__destroy(link);
> +
> +cleanup:
> +	uprobe_syscall_executed__destroy(skel);
> +}

[ ... ]

> diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> index b359b389f6c0..5e38f8605b02 100644
> --- a/tools/testing/selftests/bpf/usdt_2.c
> +++ b/tools/testing/selftests/bpf/usdt_2.c
> @@ -13,4 +13,17 @@ void usdt_2(void)
>  	USDT(optimized_attach, usdt_2);
>  }
>
> +static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
> +static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
> +static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
> +
> +void __attribute__((noinline)) usdt_red_zone_trigger(void)
> +{
> +	unsigned long a1 = usdt_red_zone_arg1;
> +	unsigned long a2 = usdt_red_zone_arg2;
> +	unsigned long a3 = usdt_red_zone_arg3;
> +
> +	USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
> +}

Is this test reliable across different compiler versions and optimizations?

The USDT macro uses the "nor" inline assembly constraint, which allows the
compiler to allocate these operands to CPU registers.

Because there is no register pressure in usdt_red_zone_trigger(), modern
compilers might allocate a1, a2, and a3 directly to registers rather than the
stack's red zone.

If the operands are placed in registers, the uprobe CALL optimization
clobbering [rsp-8] will not corrupt them. This could cause the test to pass
even on buggy kernels, creating a false positive.

Would it be safer to explicitly force these operands into the red zone using
inline assembly constraints, rather than depending on the compiler's register
allocator?

(This concern was raised by bot+bpf-ci@kernel.org in v1 and v3 and remains
unaddressed:
https://lore.kernel.org/bpf/e91d31e0c1378fe970db6385c7af419e97af0c2acd37b3b8775a2fac02ea2361@mail.kernel.org/)


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 03/13] uprobes/x86: Allow to copy uprobe trampolines on fork
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-4-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]

> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 733655bc610e..c8af41ed681a 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -697,7 +697,7 @@ static struct vm_area_struct *get_uprobe_trampoline(struct mm_struct *mm, unsign
>  		return ERR_PTR(vaddr);
>
>  	return _install_special_mapping(mm, vaddr, PAGE_SIZE,
> -				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
> +				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
>  				&tramp_mapping);
>  }

When a process with optimized uprobes is forked, the child inherits the
optimized code and trampoline. This means sys_uprobe invocations in the
child will go into the uprobe trampoline, and the syscall will keep
returning -EPROTO since no handler is registered.

Is there a way to restore the original memory pages for clones?

This behavior seems suboptimal from a performance perspective. For
standard int3-based uprobes, breakpoints are removed via
UPROBE_HANDLER_REMOVE. Could something similar be implemented for
optimized uprobes, perhaps with proper filtering in uprobe_multi?

Reference: https://lore.kernel.org/bpf/CAEf4BzYo-8PAXFJt9MHoUn9ux1O2YVxJADC0tGSsacVu_R8Stw@mail.gmail.com/

---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 09/13] selftests/bpf: Change uprobe syscall tests to use nop10
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau, eddyz87,
	yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-10-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 3728 bytes --]

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index c944136252c6..0868fb9793e0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c

[ ... ]

> @@ -388,14 +388,15 @@ static int find_uprobes_trampoline(void *tramp_addr)
>  	return ret;
>  }
>
> -static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char nop10[10]  = { 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
> +static unsigned char lea_rsp[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
>
> -static void *find_nop5(void *fn)
> +static void *find_nop10(void *fn)
>  {
>  	int i;
>
> -	for (i = 0; i < 10; i++) {
> -		if (!memcmp(nop5, fn + i, 5))
> +	for (i = 0; i < 128; i++) {
> +		if (!memcmp(nop10, fn + i, 10))
>  			return fn + i;
>  	}
>  	return NULL;

[ ... ]

> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index 69759b27794d..fda3a298ccfc 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -252,7 +252,7 @@ extern void usdt_1(void);
>  extern void usdt_2(void);
>
>  static unsigned char nop1[1] = { 0x90 };
> -static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
> +static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
>
>  static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
>  {

Is the loop in find_instr() adequate to find the updated instruction
sequence? In uprobe_syscall.c, find_nop10() was updated to search up
to 128 bytes to account for compiler-generated prologues:

tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c:find_nop10() {
	for (i = 0; i < 128; i++) {
		if (!memcmp(nop10, fn + i, 10))
			return fn + i;
	}
}

But find_instr() in usdt.c only searches the first 10 bytes:

tools/testing/selftests/bpf/prog_tests/usdt.c:find_instr() {
	for (i = 0; i < 10; i++) {
		if (!memcmp(instr, fn + i, cnt))
			return fn + i;
	}
}

If a modern compiler generates a prologue longer than 9 bytes for
usdt_2(), find_instr() will prematurely terminate and cause
subtest_optimized_attach() to fail. This concern was raised by
reviewers in v2 and v3 of the patch series:

  https://lore.kernel.org/bpf/20260518111604.CC731C2BCB7@smtp.kernel.org/

The author acknowledged this with "yea find_instr needs same update,
will fix" but the loop bound remains 10 in the current v4 patch.

> @@ -271,17 +271,17 @@ static void subtest_optimized_attach(void)
>  	__u8 *addr_1, *addr_2;
>
>  	/* usdt_1 USDT probe has single nop instruction */
> -	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
> -	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
> +	addr_1 = find_instr(usdt_1, nop1_nop10_combo, 11);
> +	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop10_combo"))
>  		return;
>
>  	addr_1 = find_instr(usdt_1, nop1, 1);
>  	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
>  		return;
>
> -	/* usdt_2 USDT probe has nop,nop5 instructions combo */
> -	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
> -	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
> +	/* usdt_2 USDT probe has nop,nop10 instructions combo */
> +	addr_2 = find_instr(usdt_2, nop1_nop10_combo, 11);
> +	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop10_combo"))
>  		return;

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCHv4 06/13] libbpf: Change has_nop_combo to work on top of nop10
From: bot+bpf-ci @ 2026-05-26 21:46 UTC (permalink / raw)
  To: jolsa, oleg, peterz, mingo, mhiramat, andrii
  Cc: jakub, bpf, linux-trace-kernel, ast, andrii, daniel, martin.lau,
	eddyz87, yonghong.song, clm, ihor.solodrai
In-Reply-To: <20260526205840.173790-7-jolsa@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2986 bytes --]

> commit 5ad96df841a0877eeca7d2cbe4b3f726b7fd0b72
> Author: Jiri Olsa <jolsa@kernel.org>
> Date:   Tue May 26 22:58:33 2026 +0200
>
>     libbpf: Change has_nop_combo to work on top of nop10
>
>     We now expect nop combo with 10 bytes nop instead of 5 bytes nop,
>     fixing has_nop_combo to reflect that.
>
>     Fixes: 41a5c7df4466 ("libbpf: Add support to detect nop,nop5 instructions combo for usdt probe")
>     Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
>     Acked-by: Andrii Nakryiko <andrii@kernel.org>
>     Signed-off-by: Jiri Olsa <jolsa@kernel.org>

The Fixes: tag references commit 41a5c7df4466 which originally added
has_nop_combo() with correct nop5 support. However, the actual bug was
introduced by commit af734a7ddb4e which changed the kernel's uprobe
implementation from nop5 to nop10. Should the tag be:

  Fixes: af734a7ddb4e ("uprobes/x86: Move optimized uprobe from nop5 to nop10")

> diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
> index e3710933fd52..484a4354e82b 100644
> --- a/tools/lib/bpf/usdt.c
> +++ b/tools/lib/bpf/usdt.c
> @@ -305,7 +305,7 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
>
>  	/*
>  	 * Detect kernel support for uprobe() syscall, it's presence means we can
> -	 * take advantage of faster nop5 uprobe handling.
> +	 * take advantage of faster nop10 uprobe handling.
>  	 * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
>  	 */

The comment at the end of this block references commit 56101b69c919 which
added the uprobe syscall. Andrii Nakryiko noted in v3 review that it would
be good to also reference the commit that switches nop5 to nop10, though he
acknowledged the hash isn't stable until it lands. Now that af734a7ddb4e is
available, could this be updated to reference both commits?

> @@ -596,14 +596,14 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
>  #if defined(__x86_64__)
>  static bool has_nop_combo(int fd, long off)
>  {
> -	unsigned char nop_combo[6] = {
> -		0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
> +	unsigned char nop_combo[11] = {
> +		0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00,
>  	};
> -	unsigned char buf[6];
> +	unsigned char buf[11];
>
> -	if (pread(fd, buf, 6, off) != 6)
> +	if (pread(fd, buf, 11, off) != 11)
>  		return false;
> -	return memcmp(buf, nop_combo, 6) == 0;
> +	return memcmp(buf, nop_combo, 11) == 0;
>  }

This isn't a bug, but Jakub Sitnicki suggested in v1 review using
ARRAY_SIZE(buf) instead of repeating the scalar value 11. The author
preferred keeping the literal number for simplicity, though the value
now appears in 5 locations (array declarations, pread size, pread check,
memcmp size).

[ ... ]


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/26475828601

^ permalink raw reply

* Re: [PATCH v2 06/17] landlock: Add create_ruleset and free_ruleset tracepoints
From: Justin Suess @ 2026-05-26 21:34 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Christian Brauner, Günther Noack, Steven Rostedt, Jann Horn,
	Jeff Xu, Kees Cook, Masami Hiramatsu, Mathieu Desnoyers,
	Matthieu Buffet, Mikhail Ivanov, Tingmao Wang, kernel-team,
	linux-fsdevel, linux-security-module, linux-trace-kernel
In-Reply-To: <20260406143717.1815792-7-mic@digikod.net>

On Mon, Apr 06, 2026 at 04:37:04PM +0200, Mickaël Salaün wrote:
> Add tracepoints for ruleset lifecycle events: landlock_create_ruleset
> fires from the landlock_create_ruleset() syscall handler, logging the
> ruleset Landlock ID and handled access masks; landlock_free_ruleset
> fires in free_ruleset() before the ruleset is freed, so eBPF programs
> can access the full ruleset state via BTF.
> 
> The create_ruleset TP_PROTO takes only the ruleset pointer.  The handled
> access masks are read from the ruleset in TP_fast_assign rather than
> passed as scalar arguments, so eBPF programs can access the full ruleset
> state (rules, access masks) via BTF on a single pointer.  No lock is
> needed because the ruleset is not yet shared (the file descriptor has
> not been installed).
> 
> Create the trace header with a DOC comment documenting the consistency
> guarantees, locking conventions, TP_PROTO safety, and security
> considerations shared by all Landlock tracepoints.  Add
> CREATE_TRACE_POINTS in log.c to generate the tracepoint implementations.
> 
> Add an id field to struct landlock_ruleset, assigned from
> landlock_get_id_range() at creation time.  Extend the CONFIG guard on
> landlock_get_id_range() from CONFIG_AUDIT to
> CONFIG_SECURITY_LANDLOCK_LOG so that IDs are available for tracing even
> without audit support.
> 
> The deallocation events use the "free_" prefix (rather than "drop_")
> because they fire when the object is actually freed.  There is no need
> for allocated/deallocated symmetry because ruleset creation happens with
> the landlock_create_ruleset tracepoint.
> 
> landlock_create_ruleset tracepoint.
> 
> Unlike audit records which share a record type and need a "status="
> field to distinguish allocation from deallocation, tracepoints provide
> one event type per lifecycle transition, each with a type-safe TP_PROTO
> matching the specific transition.  This enables type-safe eBPF BTF
> access and precise ftrace filtering by event name.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Justin Suess <utilityemal77@gmail.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tingmao Wang <m@maowtm.org>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> - New patch (split from the v1 add_rule_fs tracepoint patch).
> ---
>  MAINTAINERS                     |  1 +
>  include/trace/events/landlock.h | 94 +++++++++++++++++++++++++++++++++
>  security/landlock/id.h          |  6 +--
>  security/landlock/log.c         |  5 ++
>  security/landlock/ruleset.c     |  8 +++
>  security/landlock/ruleset.h     |  9 ++++
>  security/landlock/syscalls.c    |  5 ++
>  7 files changed, 125 insertions(+), 3 deletions(-)
>  create mode 100644 include/trace/events/landlock.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c3fe46d7c4bc..51104faa3951 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14389,6 +14389,7 @@ F:	Documentation/admin-guide/LSM/landlock.rst
>  F:	Documentation/security/landlock.rst
>  F:	Documentation/userspace-api/landlock.rst
>  F:	fs/ioctl.c
> +F:	include/trace/events/landlock.h
>  F:	include/uapi/linux/landlock.h
>  F:	samples/landlock/
>  F:	security/landlock/
> diff --git a/include/trace/events/landlock.h b/include/trace/events/landlock.h
> new file mode 100644
> index 000000000000..5e847844fbf7
> --- /dev/null
> +++ b/include/trace/events/landlock.h
> @@ -0,0 +1,94 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright © 2025 Microsoft Corporation
> + * Copyright © 2026 Cloudflare
> + */
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM landlock
> +
> +#if !defined(_TRACE_LANDLOCK_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_LANDLOCK_H
> +
> +#include <linux/tracepoint.h>
> +
> +struct landlock_ruleset;
> +
> +/**
> + * DOC: Landlock trace events
> + *
> + * Consistency guarantee: every trace event corresponds to an operation
> + * that has irrevocably succeeded.  Lifecycle events fire only after
> + * the point of no return; denial events fire only for denials that
> + * actually happen.  This guarantees that eBPF programs observing the
> + * trace stream can build a faithful model of Landlock state without
> + * reconciliation logic.
> + *
> + * Mutable object pointers in TP_PROTO (e.g., struct landlock_ruleset
> + * for add_rule events) are passed while the caller holds the object's
> + * lock, so that TP_fast_assign and eBPF programs reading via BTF see a
> + * consistent snapshot.  For objects that are immutable at the emission
> + * site (e.g., a domain after creation), no lock is needed.
> + *
> + * All pointer arguments in TP_PROTO are guaranteed non-NULL by the
> + * caller.  eBPF programs can access these pointers via BTF for richer
> + * introspection than the TP_STRUCT__entry fields provide.
> + *
> + * TP_STRUCT__entry fields serve TP_printk display only.  eBPF programs
> + * access the raw TP_PROTO arguments directly.
> + *
> + * Security: as for audit, Landlock trace events may expose sensitive
> + * information about all sandboxed processes on the system.  See
> + * Documentation/admin-guide/LSM/landlock.rst for security considerations
> + * and privilege requirements.
> + */
> +
> +/**
> + * landlock_create_ruleset - new ruleset created
> + * @ruleset: Newly created ruleset (never NULL); not yet shared via an fd,
> + *           so no lock is needed.  eBPF programs can read the full ruleset
> + *           state via BTF.
> + */
> +TRACE_EVENT(
> +	landlock_create_ruleset,
> +
> +	TP_PROTO(const struct landlock_ruleset *ruleset),
> +
> +	TP_ARGS(ruleset),
> +
> +	TP_STRUCT__entry(__field(__u64, ruleset_id) __field(access_mask_t,
> +							    handled_fs)
> +				 __field(access_mask_t, handled_net)
> +					 __field(access_mask_t, scoped)),
> +
> +	TP_fast_assign(__entry->ruleset_id = ruleset->id;
> +		       __entry->handled_fs = ruleset->layer.fs;
> +		       __entry->handled_net = ruleset->layer.net;
> +		       __entry->scoped = ruleset->layer.scope;),
> +
> +	TP_printk("ruleset=%llx handled_fs=0x%x handled_net=0x%x scoped=0x%x",
> +		  __entry->ruleset_id, __entry->handled_fs,
> +		  __entry->handled_net, __entry->scoped));
> +
> +/**
> + * landlock_free_ruleset - Ruleset freed
> + *
> + * Emitted when a ruleset's last reference is dropped (typically when
> + * the creating process closes the ruleset file descriptor).
> + */
> +TRACE_EVENT(landlock_free_ruleset,
> +
> +	    TP_PROTO(const struct landlock_ruleset *ruleset),
> +
> +	    TP_ARGS(ruleset),
> +
> +	    TP_STRUCT__entry(__field(__u64, ruleset_id)),
> +
> +	    TP_fast_assign(__entry->ruleset_id = ruleset->id;),
> +
> +	    TP_printk("ruleset=%llx", __entry->ruleset_id));
> +
> +#endif /* _TRACE_LANDLOCK_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/security/landlock/id.h b/security/landlock/id.h
> index 45dcfb9e9a8b..2a43c2b523a8 100644
> --- a/security/landlock/id.h
> +++ b/security/landlock/id.h
> @@ -8,18 +8,18 @@
>  #ifndef _SECURITY_LANDLOCK_ID_H
>  #define _SECURITY_LANDLOCK_ID_H
>  
> -#ifdef CONFIG_AUDIT
> +#ifdef CONFIG_SECURITY_LANDLOCK_LOG
>  
>  void __init landlock_init_id(void);
>  
>  u64 landlock_get_id_range(size_t number_of_ids);
>  
> -#else /* CONFIG_AUDIT */
> +#else /* CONFIG_SECURITY_LANDLOCK_LOG */
>  
>  static inline void __init landlock_init_id(void)
>  {
>  }
>  
> -#endif /* CONFIG_AUDIT */
> +#endif /* CONFIG_SECURITY_LANDLOCK_LOG */
>  
>  #endif /* _SECURITY_LANDLOCK_ID_H */
> diff --git a/security/landlock/log.c b/security/landlock/log.c
> index c9b506707af0..ef79e4ed0037 100644
> --- a/security/landlock/log.c
> +++ b/security/landlock/log.c
> @@ -174,6 +174,11 @@ static void audit_denial(const struct landlock_cred_security *const subject,
>  
>  #endif /* CONFIG_AUDIT */
>  
> +#ifdef CONFIG_TRACEPOINTS
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/landlock.h>
> +#endif /* CONFIG_TRACEPOINTS */
> +
>  static struct landlock_hierarchy *
>  get_hierarchy(const struct landlock_domain *const domain, const size_t layer)
>  {
> diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
> index c220e0f9cf5f..0d1e3dadb318 100644
> --- a/security/landlock/ruleset.c
> +++ b/security/landlock/ruleset.c
> @@ -22,10 +22,13 @@
>  #include <linux/spinlock.h>
>  
>  #include "access.h"
> +#include "id.h"
>  #include "limits.h"
>  #include "object.h"
>  #include "ruleset.h"
>  
> +#include <trace/events/landlock.h>
> +
>  struct landlock_ruleset *
>  landlock_create_ruleset(const access_mask_t fs_access_mask,
>  			const access_mask_t net_access_mask,
> @@ -49,6 +52,10 @@ landlock_create_ruleset(const access_mask_t fs_access_mask,
>  	new_ruleset->rules.root_net_port = RB_ROOT;
>  #endif /* IS_ENABLED(CONFIG_INET) */
>  
> +#ifdef CONFIG_SECURITY_LANDLOCK_LOG
> +	new_ruleset->id = landlock_get_id_range(1);
> +#endif /* CONFIG_SECURITY_LANDLOCK_LOG */
The addition of IDs to rulesets for logging makes sense.

But it is limited in usefulness without some form of introspection to be
able to correlate it to a specific userspace ruleset.

If a program creates multiple Landlock rulesets, and wishes to trace and
correlate which ruleset FD corresponds to the log/tracepoint, it is
difficult when no form of introspection exists.

Maybe a syscall flag or ioctl to retrieve the identifier for correlation
would be useful?

Justin
> +
>  	/* Should already be checked in sys_landlock_create_ruleset(). */
>  	if (fs_access_mask) {
>  		WARN_ON_ONCE(fs_access_mask !=
> @@ -312,6 +319,7 @@ void landlock_free_rules(struct landlock_rules *const rules)
>  static void free_ruleset(struct landlock_ruleset *const ruleset)
> [...]

^ permalink raw reply

* [PATCHv4 13/13] selftests/bpf: Add tests for forked/cloned optimized uprobes
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

Adding tests for forked/cloned optimized uprobes and make
sure the child can properly execute optimized probe for
both fork (dups mm) and clone with CLONE_VM.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 88 +++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index efff0c515184..033d32b4cc27 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -4,6 +4,8 @@
 
 #ifdef __x86_64__
 
+#define _GNU_SOURCE
+#include <sched.h>
 #include <unistd.h>
 #include <asm/ptrace.h>
 #include <linux/compiler.h>
@@ -936,6 +938,88 @@ static void test_uprobe_error(void)
 	ASSERT_EQ(errno, EPROTO, "errno");
 }
 
+__attribute__((aligned(16)))
+__nocf_check __weak __naked void uprobe_fork_test(void)
+{
+	asm volatile (
+		".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10 */
+		"ret\n"
+	);
+}
+
+static int child_func(void *arg)
+{
+	struct uprobe_syscall_executed *skel = arg;
+
+	/* Make sure the child's probe is still there and optimized.. */
+	if (memcmp(uprobe_fork_test, lea_rsp, sizeof(lea_rsp)))
+		_exit(1);
+
+	skel->bss->pid = getpid();
+
+	/* .. and it executes properly. */
+	uprobe_fork_test();
+
+	if (skel->bss->executed != 3)
+		_exit(2);
+
+	_exit(0);
+}
+
+static void test_uprobe_fork_optimized(bool clone_vm)
+{
+	struct uprobe_syscall_executed *skel = NULL;
+	struct bpf_link *link = NULL;
+	unsigned long offset;
+	int pid, status, err;
+	char stack[65535];
+
+	offset = get_uprobe_offset(&uprobe_fork_test);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		return;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		goto cleanup;
+
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+				-1, "/proc/self/exe", offset, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_uprobe"))
+		goto cleanup;
+
+	skel->bss->pid = getpid();
+
+	/* Trigger optimization of uprobe in uprobe_fork_test.  */
+	uprobe_fork_test();
+	uprobe_fork_test();
+
+	/* Make sure it got optimied. */
+	if (!ASSERT_OK(memcmp(uprobe_fork_test, lea_rsp, sizeof(lea_rsp)), "optimized"))
+		goto cleanup;
+
+	if (clone_vm) {
+		pid = clone(child_func, stack + sizeof(stack), CLONE_VM|SIGCHLD, skel);
+		if (!ASSERT_GT(pid, 0, "clone"))
+			goto cleanup;
+	} else {
+		pid = fork();
+		if (!ASSERT_GE(pid, 0, "fork"))
+			goto cleanup;
+		if (pid == 0)
+			child_func(skel);
+	}
+
+	/* Wait for the child and verify it exited properly with 0. */
+	err = waitpid(pid, &status, 0);
+	if (ASSERT_EQ(err, pid, "waitpid")) {
+		ASSERT_EQ(WIFEXITED(status), 1, "child_exited");
+		ASSERT_EQ(WEXITSTATUS(status), 0, "child_exit_code");
+	}
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
 static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
@@ -956,6 +1040,10 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_race();
 	if (test__start_subtest("uprobe_red_zone"))
 		test_uprobe_red_zone();
+	if (test__start_subtest("uprobe_optimized_fork"))
+		test_uprobe_fork_optimized(false);
+	if (test__start_subtest("uprobe_optimized_clone_vm"))
+		test_uprobe_fork_optimized(true);
 	if (test__start_subtest("uprobe_error"))
 		test_uprobe_error();
 	if (test__start_subtest("uprobe_regs_equal"))
-- 
2.54.0


^ permalink raw reply related

* [PATCHv4 12/13] selftests/bpf: Add tests for uprobe nop10 red zone clobbering
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

From: Andrii Nakryiko <andrii@kernel.org>

The uprobe nop5 optimization used to replace a 5-byte NOP with a 5-byte
CALL to a trampoline. The CALL pushes a return address onto the stack at
[rsp-8], clobbering whatever was stored there.

On x86-64, the red zone is the 128 bytes below rsp that user code may use
for temporary storage without adjusting rsp. Compilers can place USDT
argument operands there, generating specs like "8@-8(%rbp)" when rbp ==
rsp. With the CALL-based optimization, the return address overwrites that
argument before the BPF-side USDT argument fetch runs.

Add two tests for this case. The uprobe_syscall subtest stores known values
at -8(%rsp), -16(%rsp), and -24(%rsp), executes an optimized nop10 uprobe,
and verifies the red-zone data is still intact. The USDT subtest triggers a
probe in a function where the compiler places three USDT operands in the
red zone and verifies that all 10 optimized invocations deliver the expected
argument values to BPF.

On an unfixed kernel, the first hit goes through the INT3 path and later
hits use the optimized CALL path, so the red-zone checks fail after
optimization.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
[ updates to use nop10 ]
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 75 +++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/usdt.c | 49 ++++++++++++
 tools/testing/selftests/bpf/progs/test_usdt.c | 25 +++++++
 tools/testing/selftests/bpf/usdt_2.c          | 13 ++++
 4 files changed, 162 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 969f4deba9fd..efff0c515184 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -357,6 +357,48 @@ __nocf_check __weak void usdt_test(void)
 	USDT(optimized_uprobe, usdt);
 }
 
+/*
+ * Assembly-level red zone clobbering test. Stores known values in the
+ * red zone (below RSP), executes a nop10 (uprobe site), and checks that
+ * the values survived. Returns 0 if intact, 1 if clobbered.
+ *
+ * The nop5 optimization used CALL (which pushes a return address to
+ * [rsp-8]), the value at -8(%rsp) was overwritten. The nop10 optimization
+ * should escape that by moving stackpointer below the redzone before
+ * doing the CALL.
+ */
+__attribute__((aligned(16)))
+__nocf_check __weak __naked unsigned long uprobe_red_zone_test(void)
+{
+	asm volatile (
+		"movabs $0x1111111111111111, %%rax\n"
+		"movq   %%rax, -8(%%rsp)\n"
+		"movabs $0x2222222222222222, %%rax\n"
+		"movq   %%rax, -16(%%rsp)\n"
+		"movabs $0x3333333333333333, %%rax\n"
+		"movq   %%rax, -24(%%rsp)\n"
+
+		".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10: uprobe site */
+
+		"movabs $0x1111111111111111, %%rax\n"
+		"cmpq   %%rax, -8(%%rsp)\n"
+		"jne    1f\n"
+		"movabs $0x2222222222222222, %%rax\n"
+		"cmpq   %%rax, -16(%%rsp)\n"
+		"jne    1f\n"
+		"movabs $0x3333333333333333, %%rax\n"
+		"cmpq   %%rax, -24(%%rsp)\n"
+		"jne    1f\n"
+
+		"xorl   %%eax, %%eax\n"
+		"retq\n"
+		"1:\n"
+		"movl   $1, %%eax\n"
+		"retq\n"
+		::: "rax", "memory"
+	);
+}
+
 static int find_uprobes_trampoline(void *tramp_addr)
 {
 	void *start, *end;
@@ -855,6 +897,37 @@ static void test_uprobe_race(void)
 #define __NR_uprobe 336
 #endif
 
+static void test_uprobe_red_zone(void)
+{
+	struct uprobe_syscall_executed *skel;
+	struct bpf_link *link;
+	void *nop10_addr;
+	size_t offset;
+	int i;
+
+	nop10_addr = find_nop10(uprobe_red_zone_test);
+	if (!ASSERT_NEQ(nop10_addr, NULL, "find_nop10"))
+		return;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	offset = get_uprobe_offset(nop10_addr);
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+			0, "/proc/self/exe", offset, NULL);
+	if (!ASSERT_OK_PTR(link, "attach_uprobe"))
+		goto cleanup;
+
+	for (i = 0; i < 10; i++)
+		ASSERT_EQ(uprobe_red_zone_test(), 0, "red_zone_intact");
+
+	bpf_link__destroy(link);
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
 static void test_uprobe_error(void)
 {
 	long err = syscall(__NR_uprobe);
@@ -881,6 +954,8 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_usdt();
 	if (test__start_subtest("uprobe_race"))
 		test_uprobe_race();
+	if (test__start_subtest("uprobe_red_zone"))
+		test_uprobe_red_zone();
 	if (test__start_subtest("uprobe_error"))
 		test_uprobe_error();
 	if (test__start_subtest("uprobe_regs_equal"))
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index fda3a298ccfc..8004c9568ffa 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -250,6 +250,7 @@ static void subtest_basic_usdt(bool optimized)
 #ifdef __x86_64__
 extern void usdt_1(void);
 extern void usdt_2(void);
+extern void usdt_red_zone_trigger(void);
 
 static unsigned char nop1[1] = { 0x90 };
 static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
@@ -340,6 +341,52 @@ static void subtest_optimized_attach(void)
 cleanup:
 	test_usdt__destroy(skel);
 }
+
+/*
+ * Test that USDT arguments survive nop10 optimization in a function where
+ * the compiler places operands in the red zone.
+ *
+ * Signal handlers are prone to having the compiler place USDT argument
+ * operands in the red zone (below rsp).
+ *
+ * The nop5 optimization used CALL (which pushes a return address to
+ * [rsp-8]), the value at -8(%rsp) was overwritten. The nop10 optimization
+ * should escape that by moving stackpointer below the redzone before
+ * doing the CALL.
+ */
+static void subtest_optimized_red_zone(void)
+{
+	struct test_usdt *skel;
+	int i;
+
+	skel = test_usdt__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	skel->bss->expected_arg[0] = 0xDEADBEEF;
+	skel->bss->expected_arg[1] = 0xCAFEBABE;
+	skel->bss->expected_arg[2] = 0xFEEDFACE;
+	skel->bss->expected_pid = getpid();
+
+	skel->links.usdt_check_arg = bpf_program__attach_usdt(
+		skel->progs.usdt_check_arg, 0, "/proc/self/exe",
+		"optimized_attach", "usdt_red_zone", NULL);
+	if (!ASSERT_OK_PTR(skel->links.usdt_check_arg, "attach_usdt_red_zone"))
+		goto cleanup;
+
+	for (i = 0; i < 10; i++)
+		usdt_red_zone_trigger();
+
+	ASSERT_EQ(skel->bss->arg_total, 10, "arg_total");
+	ASSERT_EQ(skel->bss->arg_bad, 0, "arg_bad");
+	ASSERT_EQ(skel->bss->arg_last[0], 0xDEADBEEF, "arg_last_1");
+	ASSERT_EQ(skel->bss->arg_last[1], 0xCAFEBABE, "arg_last_2");
+	ASSERT_EQ(skel->bss->arg_last[2], 0xFEEDFACE, "arg_last_3");
+
+cleanup:
+	test_usdt__destroy(skel);
+}
+
 #endif
 
 unsigned short test_usdt_100_semaphore SEC(".probes");
@@ -613,6 +660,8 @@ void test_usdt(void)
 		subtest_basic_usdt(true);
 	if (test__start_subtest("optimized_attach"))
 		subtest_optimized_attach();
+	if (test__start_subtest("optimized_red_zone"))
+		subtest_optimized_red_zone();
 #endif
 	if (test__start_subtest("multispec"))
 		subtest_multispec_usdt();
diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
index f00cb52874e0..0ee78fb050a1 100644
--- a/tools/testing/selftests/bpf/progs/test_usdt.c
+++ b/tools/testing/selftests/bpf/progs/test_usdt.c
@@ -149,5 +149,30 @@ int usdt_executed(struct pt_regs *ctx)
 		executed++;
 	return 0;
 }
+
+int arg_total;
+int arg_bad;
+long arg_last[3];
+long expected_arg[3];
+int expected_pid;
+
+SEC("usdt")
+int BPF_USDT(usdt_check_arg, long arg1, long arg2, long arg3)
+{
+	if (expected_pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	__sync_fetch_and_add(&arg_total, 1);
+	arg_last[0] = arg1;
+	arg_last[1] = arg2;
+	arg_last[2] = arg3;
+
+	if (arg1 != expected_arg[0] ||
+	    arg2 != expected_arg[1] ||
+	    arg3 != expected_arg[2])
+		__sync_fetch_and_add(&arg_bad, 1);
+
+	return 0;
+}
 #endif
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
index b359b389f6c0..5e38f8605b02 100644
--- a/tools/testing/selftests/bpf/usdt_2.c
+++ b/tools/testing/selftests/bpf/usdt_2.c
@@ -13,4 +13,17 @@ void usdt_2(void)
 	USDT(optimized_attach, usdt_2);
 }
 
+static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
+static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
+static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
+
+void __attribute__((noinline)) usdt_red_zone_trigger(void)
+{
+	unsigned long a1 = usdt_red_zone_arg1;
+	unsigned long a2 = usdt_red_zone_arg2;
+	unsigned long a3 = usdt_red_zone_arg3;
+
+	USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
+}
+
 #endif
-- 
2.54.0


^ permalink raw reply related

* [PATCHv4 11/13] selftests/bpf: Add reattach tests for uprobe syscall
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

Adding reattach tests for uprobe syscall tests to make sure
we can re-attach and optimize same uprobe multiple times.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 114 ++++++++++++++++--
 1 file changed, 104 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 0868fb9793e0..969f4deba9fd 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -430,23 +430,28 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
 	return tramp;
 }
 
-static void check_detach(void *addr, void *tramp)
+static bool check_detach(void *addr, void *tramp)
 {
 	static const char nop10_prefix[] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };
+	bool ok = true;
 
 	/* [uprobes_trampoline] stays after detach */
-	ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
-	ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix");
+	if (!ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline"))
+		ok = false;
+	if (!ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix"))
+		ok = false;
+	return ok;
 }
 
-static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
-		  trigger_t trigger, void *addr, int executed)
+static void *check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
+		   trigger_t trigger, void *addr, int executed)
 {
 	void *tramp;
 
 	tramp = check_attach(skel, trigger, addr, executed);
 	bpf_link__destroy(link);
 	check_detach(addr, tramp);
+	return tramp;
 }
 
 static void test_uprobe_legacy(void)
@@ -457,6 +462,7 @@ static void test_uprobe_legacy(void)
 	);
 	struct bpf_link *link;
 	unsigned long offset;
+	void *tramp;
 
 	offset = get_uprobe_offset(&uprobe_test);
 	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
@@ -474,7 +480,28 @@ static void test_uprobe_legacy(void)
 	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
 		goto cleanup;
 
-	check(skel, link, uprobe_test, uprobe_test, 2);
+	tramp = check(skel, link, uprobe_test, uprobe_test, 2);
+
+	/* reattach and detach without triggering optimization */
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+					       0, "/proc/self/exe", offset, NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+		goto cleanup;
+
+	bpf_link__destroy(link);
+	if (!check_detach(uprobe_test, tramp))
+		goto cleanup;
+
+	uprobe_test();
+	ASSERT_EQ(skel->bss->executed, 2, "executed_no_probe");
+
+	/* reattach with triggering optimization */
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+				0, "/proc/self/exe", offset, NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 4);
 
 	/* uretprobe */
 	skel->bss->executed = 0;
@@ -496,6 +523,7 @@ static void test_uprobe_multi(void)
 	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts);
 	struct bpf_link *link;
 	unsigned long offset;
+	void *tramp;
 
 	offset = get_uprobe_offset(&uprobe_test);
 	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
@@ -516,7 +544,28 @@ static void test_uprobe_multi(void)
 	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
 		goto cleanup;
 
-	check(skel, link, uprobe_test, uprobe_test, 2);
+	tramp = check(skel, link, uprobe_test, uprobe_test, 2);
+
+	/* reattach and detach without triggering optimization */
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_multi,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	bpf_link__destroy(link);
+	if (!check_detach(uprobe_test, tramp))
+		goto cleanup;
+
+	uprobe_test();
+	ASSERT_EQ(skel->bss->executed, 2, "executed_no_probe");
+
+	/* reattach with triggering optimization */
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_multi,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 4);
 
 	/* uretprobe.multi */
 	skel->bss->executed = 0;
@@ -540,6 +589,7 @@ static void test_uprobe_session(void)
 	);
 	struct bpf_link *link;
 	unsigned long offset;
+	void *tramp;
 
 	offset = get_uprobe_offset(&uprobe_test);
 	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
@@ -559,7 +609,28 @@ static void test_uprobe_session(void)
 	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
 		goto cleanup;
 
-	check(skel, link, uprobe_test, uprobe_test, 4);
+	tramp = check(skel, link, uprobe_test, uprobe_test, 4);
+
+	/* reattach and detach without triggering optimization */
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_session,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	bpf_link__destroy(link);
+	if (!check_detach(uprobe_test, tramp))
+		goto cleanup;
+
+	uprobe_test();
+	ASSERT_EQ(skel->bss->executed, 4, "executed_no_probe");
+
+	/* reattach with triggering optimization */
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_session,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 8);
 
 cleanup:
 	uprobe_syscall_executed__destroy(skel);
@@ -569,7 +640,7 @@ static void test_uprobe_usdt(void)
 {
 	struct uprobe_syscall_executed *skel;
 	struct bpf_link *link;
-	void *addr;
+	void *addr, *tramp;
 
 	errno = 0;
 	addr = find_nop10(usdt_test);
@@ -588,7 +659,30 @@ static void test_uprobe_usdt(void)
 	if (!ASSERT_OK_PTR(link, "bpf_program__attach_usdt"))
 		goto cleanup;
 
-	check(skel, link, usdt_test, addr, 2);
+	tramp = check(skel, link, usdt_test, addr, 2);
+
+	/* reattach and detach without triggering optimization */
+	link = bpf_program__attach_usdt(skel->progs.test_usdt,
+				-1 /* all PIDs */, "/proc/self/exe",
+				"optimized_uprobe", "usdt", NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_usdt"))
+		goto cleanup;
+
+	bpf_link__destroy(link);
+	if (!check_detach(addr, tramp))
+		goto cleanup;
+
+	usdt_test();
+	ASSERT_EQ(skel->bss->executed, 2, "executed_no_probe");
+
+	/* reattach with triggering optimization */
+	link = bpf_program__attach_usdt(skel->progs.test_usdt,
+				-1 /* all PIDs */, "/proc/self/exe",
+				"optimized_uprobe", "usdt", NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_usdt"))
+		goto cleanup;
+
+	check(skel, link, usdt_test, addr, 4);
 
 cleanup:
 	uprobe_syscall_executed__destroy(skel);
-- 
2.54.0


^ permalink raw reply related

* [PATCHv4 10/13] selftests/bpf: Change uprobe/usdt trigger bench code to use nop10
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

Changing uprobe/usdt trigger bench code to use nop10 instead
of nop5. Also changing run_bench_uprobes.sh to use nop10 triggers.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/bench.c           | 20 +++++------
 .../selftests/bpf/benchs/bench_trigger.c      | 36 +++++++++----------
 .../selftests/bpf/benchs/run_bench_uprobes.sh |  2 +-
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6155ce455c27..1252a1af2e84 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -539,12 +539,12 @@ extern const struct bench bench_trig_uretprobe_multi_push;
 extern const struct bench bench_trig_uprobe_multi_ret;
 extern const struct bench bench_trig_uretprobe_multi_ret;
 #ifdef __x86_64__
-extern const struct bench bench_trig_uprobe_nop5;
-extern const struct bench bench_trig_uretprobe_nop5;
-extern const struct bench bench_trig_uprobe_multi_nop5;
-extern const struct bench bench_trig_uretprobe_multi_nop5;
+extern const struct bench bench_trig_uprobe_nop10;
+extern const struct bench bench_trig_uretprobe_nop10;
+extern const struct bench bench_trig_uprobe_multi_nop10;
+extern const struct bench bench_trig_uretprobe_multi_nop10;
 extern const struct bench bench_trig_usdt_nop;
-extern const struct bench bench_trig_usdt_nop5;
+extern const struct bench bench_trig_usdt_nop10;
 #endif
 
 extern const struct bench bench_rb_libbpf;
@@ -619,12 +619,12 @@ static const struct bench *benchs[] = {
 	&bench_trig_uprobe_multi_ret,
 	&bench_trig_uretprobe_multi_ret,
 #ifdef __x86_64__
-	&bench_trig_uprobe_nop5,
-	&bench_trig_uretprobe_nop5,
-	&bench_trig_uprobe_multi_nop5,
-	&bench_trig_uretprobe_multi_nop5,
+	&bench_trig_uprobe_nop10,
+	&bench_trig_uretprobe_nop10,
+	&bench_trig_uprobe_multi_nop10,
+	&bench_trig_uretprobe_multi_nop10,
 	&bench_trig_usdt_nop,
-	&bench_trig_usdt_nop5,
+	&bench_trig_usdt_nop10,
 #endif
 	/* ringbuf/perfbuf benchmarks */
 	&bench_rb_libbpf,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index a60b8173cdc4..61513efc167a 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -396,15 +396,15 @@ static void *uprobe_producer_ret(void *input)
 }
 
 #ifdef __x86_64__
-__nocf_check __weak void uprobe_target_nop5(void)
+__nocf_check __weak void uprobe_target_nop10(void)
 {
 	asm volatile (".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
 }
 
-static void *uprobe_producer_nop5(void *input)
+static void *uprobe_producer_nop10(void *input)
 {
 	while (true)
-		uprobe_target_nop5();
+		uprobe_target_nop10();
 	return NULL;
 }
 
@@ -418,7 +418,7 @@ static void *uprobe_producer_usdt_nop(void *input)
 	return NULL;
 }
 
-static void *uprobe_producer_usdt_nop5(void *input)
+static void *uprobe_producer_usdt_nop10(void *input)
 {
 	while (true)
 		usdt_2();
@@ -542,24 +542,24 @@ static void uretprobe_multi_ret_setup(void)
 }
 
 #ifdef __x86_64__
-static void uprobe_nop5_setup(void)
+static void uprobe_nop10_setup(void)
 {
-	usetup(false, false /* !use_multi */, &uprobe_target_nop5);
+	usetup(false, false /* !use_multi */, &uprobe_target_nop10);
 }
 
-static void uretprobe_nop5_setup(void)
+static void uretprobe_nop10_setup(void)
 {
-	usetup(true, false /* !use_multi */, &uprobe_target_nop5);
+	usetup(true, false /* !use_multi */, &uprobe_target_nop10);
 }
 
-static void uprobe_multi_nop5_setup(void)
+static void uprobe_multi_nop10_setup(void)
 {
-	usetup(false, true /* use_multi */, &uprobe_target_nop5);
+	usetup(false, true /* use_multi */, &uprobe_target_nop10);
 }
 
-static void uretprobe_multi_nop5_setup(void)
+static void uretprobe_multi_nop10_setup(void)
 {
-	usetup(true, true /* use_multi */, &uprobe_target_nop5);
+	usetup(true, true /* use_multi */, &uprobe_target_nop10);
 }
 
 static void usdt_setup(const char *name)
@@ -598,7 +598,7 @@ static void usdt_nop_setup(void)
 	usdt_setup("usdt_1");
 }
 
-static void usdt_nop5_setup(void)
+static void usdt_nop10_setup(void)
 {
 	usdt_setup("usdt_2");
 }
@@ -665,10 +665,10 @@ BENCH_TRIG_USERMODE(uretprobe_multi_nop, nop, "uretprobe-multi-nop");
 BENCH_TRIG_USERMODE(uretprobe_multi_push, push, "uretprobe-multi-push");
 BENCH_TRIG_USERMODE(uretprobe_multi_ret, ret, "uretprobe-multi-ret");
 #ifdef __x86_64__
-BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
-BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
-BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
-BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+BENCH_TRIG_USERMODE(uprobe_nop10, nop10, "uprobe-nop10");
+BENCH_TRIG_USERMODE(uretprobe_nop10, nop10, "uretprobe-nop10");
+BENCH_TRIG_USERMODE(uprobe_multi_nop10, nop10, "uprobe-multi-nop10");
+BENCH_TRIG_USERMODE(uretprobe_multi_nop10, nop10, "uretprobe-multi-nop10");
 BENCH_TRIG_USERMODE(usdt_nop, usdt_nop, "usdt-nop");
-BENCH_TRIG_USERMODE(usdt_nop5, usdt_nop5, "usdt-nop5");
+BENCH_TRIG_USERMODE(usdt_nop10, usdt_nop10, "usdt-nop10");
 #endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index 9ec59423b949..e490b337e960 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
 
 set -eufo pipefail
 
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5} usdt-nop usdt-nop5
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop10} usdt-nop usdt-nop10
 do
 	summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
 	printf "%-15s: %s\n" $i "$summary"
-- 
2.54.0


^ permalink raw reply related

* [PATCHv4 09/13] selftests/bpf: Change uprobe syscall tests to use nop10
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

Optimized uprobes are now on top of 10-bytes nop instructions,
reflect that in existing tests.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/benchs/bench_trigger.c      |  2 +-
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 30 +++++++++++--------
 tools/testing/selftests/bpf/prog_tests/usdt.c | 25 +++++++++-------
 tools/testing/selftests/bpf/usdt_2.c          |  2 +-
 4 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 2f22ec61667b..a60b8173cdc4 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -398,7 +398,7 @@ static void *uprobe_producer_ret(void *input)
 #ifdef __x86_64__
 __nocf_check __weak void uprobe_target_nop5(void)
 {
-	asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
+	asm volatile (".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
 }
 
 static void *uprobe_producer_nop5(void *input)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index c944136252c6..0868fb9793e0 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -17,7 +17,7 @@
 #include "uprobe_syscall_executed.skel.h"
 #include "bpf/libbpf_internal.h"
 
-#define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
+#define USDT_NOP .byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
 #include "usdt.h"
 
 #pragma GCC diagnostic ignored "-Wattributes"
@@ -26,7 +26,7 @@ __attribute__((aligned(16)))
 __nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
 {
 	asm volatile (
-		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n" /* nop5 */
+		".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10 */
 		"movq $0xdeadbeef, %rax\n"
 		"ret\n"
 	);
@@ -345,9 +345,9 @@ static void test_uretprobe_syscall_call(void)
 __attribute__((aligned(16)))
 __nocf_check __weak __naked void uprobe_test(void)
 {
-	asm volatile ("					\n"
-		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00	\n"
-		"ret					\n"
+	asm volatile (
+		".byte 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10 */
+		"ret\n"
 	);
 }
 
@@ -388,14 +388,15 @@ static int find_uprobes_trampoline(void *tramp_addr)
 	return ret;
 }
 
-static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char nop10[10]  = { 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static unsigned char lea_rsp[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
 
-static void *find_nop5(void *fn)
+static void *find_nop10(void *fn)
 {
 	int i;
 
-	for (i = 0; i < 10; i++) {
-		if (!memcmp(nop5, fn + i, 5))
+	for (i = 0; i < 128; i++) {
+		if (!memcmp(nop10, fn + i, 10))
 			return fn + i;
 	}
 	return NULL;
@@ -420,7 +421,8 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
 	ASSERT_EQ(skel->bss->executed, executed, "executed");
 
 	/* .. and check the trampoline is as expected. */
-	call = (struct __arch_relative_insn *) addr;
+	ASSERT_OK(memcmp(addr, lea_rsp, 5), "lea_rsp");
+	call = (struct __arch_relative_insn *)(addr + 5);
 	tramp = (void *) (call + 1) + call->raddr;
 	ASSERT_EQ(call->op, 0xe8, "call");
 	ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
@@ -430,9 +432,11 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
 
 static void check_detach(void *addr, void *tramp)
 {
+	static const char nop10_prefix[] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };
+
 	/* [uprobes_trampoline] stays after detach */
 	ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
-	ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
+	ASSERT_OK(memcmp(addr, nop10_prefix, 5), "nop10_prefix");
 }
 
 static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
@@ -568,8 +572,8 @@ static void test_uprobe_usdt(void)
 	void *addr;
 
 	errno = 0;
-	addr = find_nop5(usdt_test);
-	if (!ASSERT_OK_PTR(addr, "find_nop5"))
+	addr = find_nop10(usdt_test);
+	if (!ASSERT_OK_PTR(addr, "find_nop10"))
 		return;
 
 	skel = uprobe_syscall_executed__open_and_load();
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index 69759b27794d..fda3a298ccfc 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -252,7 +252,7 @@ extern void usdt_1(void);
 extern void usdt_2(void);
 
 static unsigned char nop1[1] = { 0x90 };
-static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
 
 static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
 {
@@ -271,17 +271,17 @@ static void subtest_optimized_attach(void)
 	__u8 *addr_1, *addr_2;
 
 	/* usdt_1 USDT probe has single nop instruction */
-	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
-	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
+	addr_1 = find_instr(usdt_1, nop1_nop10_combo, 11);
+	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop10_combo"))
 		return;
 
 	addr_1 = find_instr(usdt_1, nop1, 1);
 	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
 		return;
 
-	/* usdt_2 USDT probe has nop,nop5 instructions combo */
-	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
-	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
+	/* usdt_2 USDT probe has nop,nop10 instructions combo */
+	addr_2 = find_instr(usdt_2, nop1_nop10_combo, 11);
+	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop10_combo"))
 		return;
 
 	skel = test_usdt__open_and_load();
@@ -309,12 +309,12 @@ static void subtest_optimized_attach(void)
 
 	bpf_link__destroy(skel->links.usdt_executed);
 
-	/* we expect the nop5 ip */
+	/* we expect the nop10 ip */
 	skel->bss->expected_ip = (unsigned long) addr_2 + 1;
 
 	/*
 	 * Attach program on top of usdt_2 which is probe defined on top
-	 * of nop1,nop5 combo, so the probe gets optimized on top of nop5.
+	 * of nop1,nop10 combo, so the probe gets optimized on top of nop10.
 	 */
 	skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
 						     0 /*self*/, "/proc/self/exe",
@@ -328,8 +328,13 @@ static void subtest_optimized_attach(void)
 	/* nop stays on addr_2 address */
 	ASSERT_EQ(*addr_2, 0x90, "nop");
 
-	/* call is on addr_2 + 1 address */
-	ASSERT_EQ(*(addr_2 + 1), 0xe8, "call");
+	/*
+	 * lea -0x80(%rsp), %rsp
+	 * call ...
+	 */
+	static unsigned char expected[] = { 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8 };
+
+	ASSERT_MEMEQ(addr_2 + 1, expected, sizeof(expected), "lea_and_call");
 	ASSERT_EQ(skel->bss->executed, 4, "executed");
 
 cleanup:
diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
index 789883aaca4c..b359b389f6c0 100644
--- a/tools/testing/selftests/bpf/usdt_2.c
+++ b/tools/testing/selftests/bpf/usdt_2.c
@@ -3,7 +3,7 @@
 #if defined(__x86_64__)
 
 /*
- * Include usdt.h with default nop,nop5 instructions combo.
+ * Include usdt.h with default nop,nop10 instructions combo.
  */
 #include "usdt.h"
 
-- 
2.54.0


^ permalink raw reply related

* [PATCHv4 08/13] selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch
From: Jiri Olsa @ 2026-05-26 20:58 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: Jakub Sitnicki, bpf, linux-trace-kernel
In-Reply-To: <20260526205840.173790-1-jolsa@kernel.org>

Syncing latest usdt.h change [1].

Now that we have nop10 optimization support in kernel, let's emit
nop,nop10 for usdt probe. We leave it up to the library to use
desirable nop instruction.

[1] TBD
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/usdt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
index c71e21df38b3..75687f50f4e2 100644
--- a/tools/testing/selftests/bpf/usdt.h
+++ b/tools/testing/selftests/bpf/usdt.h
@@ -313,7 +313,7 @@ struct usdt_sema { volatile unsigned short active; };
 #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
 #define USDT_NOP			nop 0
 #elif defined(__x86_64__)
-#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
+#define USDT_NOP                       .byte 0x90, 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 /* nop, nop10 */
 #else
 #define USDT_NOP			nop
 #endif
-- 
2.54.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox