Linux Trace Kernel


* Re: [PATCH 7/7] selftests/bpf: Add tests for uprobe nop10 red zone clobbering
From: Jiri Olsa @ 2026-05-18  7:30 UTC (permalink / raw)
  To: bot+bpf-ci
  Cc: oleg, peterz, mingo, mhiramat, andrii, bpf, linux-trace-kernel,
	ast, daniel, martin.lau, eddyz87, yonghong.song, clm,
	ihor.solodrai
In-Reply-To: <e91d31e0c1378fe970db6385c7af419e97af0c2acd37b3b8775a2fac02ea2361@mail.kernel.org>

On Thu, May 14, 2026 at 02:55:27PM +0000, bot+bpf-ci@kernel.org wrote:
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > index 82b3c0ce9253..d553485e7db5 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> > index be34c4087ff5..606601ccdc42 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/progs/test_usdt.c b/tools/testing/selftests/bpf/progs/test_usdt.c
> > index f00cb52874e0..0ee78fb050a1 100644
> > --- a/tools/testing/selftests/bpf/progs/test_usdt.c
> > +++ b/tools/testing/selftests/bpf/progs/test_usdt.c
> 
> [ ... ]
> 
> > diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
> > index b359b389f6c0..5e38f8605b02 100644
> > --- a/tools/testing/selftests/bpf/usdt_2.c
> > +++ b/tools/testing/selftests/bpf/usdt_2.c
> > @@ -13,4 +13,17 @@ void usdt_2(void)
> >  	USDT(optimized_attach, usdt_2);
> >  }
> >
> > +static volatile unsigned long usdt_red_zone_arg1 = 0xDEADBEEF;
> > +static volatile unsigned long usdt_red_zone_arg2 = 0xCAFEBABE;
> > +static volatile unsigned long usdt_red_zone_arg3 = 0xFEEDFACE;
> > +
> > +void __attribute__((noinline)) usdt_red_zone_trigger(void)
> > +{
> > +	unsigned long a1 = usdt_red_zone_arg1;
> > +	unsigned long a2 = usdt_red_zone_arg2;
> > +	unsigned long a3 = usdt_red_zone_arg3;
> > +
> > +	USDT(optimized_attach, usdt_red_zone, a1, a2, a3);
> > +}
> > +
> >  #endif
> 
> Is this test reliable across different compiler versions and optimizations?
> 
> The USDT macro uses the "nor" inline assembly constraint, which allows the
> compiler to allocate these operands to CPU registers. Because there is no
> register pressure in usdt_red_zone_trigger(), modern compilers might allocate
> a1, a2, and a3 directly to registers rather than the stack's red zone.
> 
> If the operands are placed in registers, the uprobe CALL optimization
> clobbering [rsp-8] will not corrupt them. This could cause the test to pass
> even on buggy kernels, creating a false positive.
> 
> Would it be safer to explicitly force these operands into the red zone using
> inline assembly constraints, rather than depending on the compiler's register
> allocator?

answered by Andrii in:
  https://lore.kernel.org/bpf/CAEf4BzZ-nASpoF+-vLYZ1D8eWRVd5JWAuhrwx8XZNrX7ZPY5_Q@mail.gmail.com/

jirka

^ permalink raw reply

* Re: [PATCH 07/13] rv: Simply hybrid automata monitors's clock variables
From: Nam Cao @ 2026-05-18  7:44 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: Steven Rostedt, Wander Lairson Costa, linux-trace-kernel,
	linux-kernel
In-Reply-To: <ad9ca4916604d3f5ffe7a6683f9b82008784fa0e.camel@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> On Mon, 2026-05-11 at 13:55 +0200, Nam Cao wrote:
>> That can work, but not ideal, because hrtimer will not be usable.
>
> Why not? If we have HA_TIMER_WHEEL, we'd use timer and expire; if we have
> HA_TIMER_HRTIMER, we'd only need hrtimer with its hrtimer_get_expires():
>
>  union {
>          struct hrtimer hrtimer;
>          struct {
>                  struct timer_list timer;
>                  u64 expire; /* Explicitly store the armed budget */
>          };
>  };
>
> we already can't use timer and hrtimer interchangeably.
> What am I missing here?

Ah, now I understand the trick, thanks.

We already have an "expires" field in struct timer_list. But I am not
sure if we are supposed to touch that field. Your proposal looks safer.
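For readers following along, the layout being discussed can be sketched in
user space roughly as below. This is only an illustration: hrtimer_stub,
timer_list_stub, struct ha_clock, enum ha_timer_type and ha_clock_expiry()
are hypothetical stand-ins, and treating HA_TIMER_WHEEL / HA_TIMER_HRTIMER as
enum values is an assumption (in the real series they may well be config
macros).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's struct hrtimer / struct timer_list. */
struct hrtimer_stub { uint64_t expires_ns; };
struct timer_list_stub { unsigned long expires_jiffies; };

/* Assumption: the timer flavour is modelled as a runtime enum for this sketch. */
enum ha_timer_type { HA_TIMER_WHEEL, HA_TIMER_HRTIMER };

struct ha_clock {
	union {
		/* hrtimer flavour: the expiry is read back from the timer itself */
		struct hrtimer_stub hrtimer;
		/* timer wheel flavour: the armed budget is stored next to the timer */
		struct {
			struct timer_list_stub timer;
			uint64_t expire;
		};
	};
};

/* Return the armed expiry for whichever flavour is in use. */
static uint64_t ha_clock_expiry(const struct ha_clock *c, enum ha_timer_type type)
{
	if (type == HA_TIMER_HRTIMER)
		return c->hrtimer.expires_ns;
	return c->expire;
}
```

Since only one flavour is ever active for a given monitor, the extra field
costs nothing in the hrtimer case and avoids touching the timer wheel's own
"expires" field in the other.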

>> Looking at the throttle monitor again, is it possible to rewrite
>> runtime_left_ns() to read .dl_runtime instead of .runtime? I don't know
>> the deadline scheduler very well, but I think .dl_runtime is not changing
>> like .runtime?
>
> In theory yes, but since the runtime is consumed only when running, we cannot
> just set the timeout once. We either save how much was consumed somewhere or do
> some start/pause mechanism.
> Neither looks simpler to me.

Understood.

Nam

^ permalink raw reply

* Re: [PATCH v2 01/14] tools/rv: Fix substring match bug in monitor name search
From: Nam Cao @ 2026-05-18  8:13 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-2-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
>  static int __ikm_find_monitor_name(char *monitor_name, char *out_name)
>  {
> -	char *available_monitors, container[MAX_DA_NAME_LEN+1], *cursor, *end;
> -	int retval = 1;
> +	char *available_monitors, *cursor, *line;
> +	int len = strlen(monitor_name);
> +	int found = 0;
>  
>  	available_monitors = tracefs_instance_file_read(NULL, "rv/available_monitors", NULL);
>  	if (!available_monitors)
>  		return -1;
>  
> -	cursor = strstr(available_monitors, monitor_name);
> -	if (!cursor) {
> -		retval = 0;
> -		goto out_free;
> -	}
> +	config_is_container = 0;

Isn't config_is_container unused?

Perhaps it is used in a follow-up patch? Let me keep reading..

Nam

^ permalink raw reply

* Re: [PATCH v2 01/14] tools/rv: Fix substring match bug in monitor name search
From: Nam Cao @ 2026-05-18  8:15 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <87ecj9879f.fsf@yellow.woof>

Nam Cao <namcao@linutronix.de> writes:

> Gabriele Monaco <gmonaco@redhat.com> writes:
>>  static int __ikm_find_monitor_name(char *monitor_name, char *out_name)
>>  {
>> -	char *available_monitors, container[MAX_DA_NAME_LEN+1], *cursor, *end;
>> -	int retval = 1;
>> +	char *available_monitors, *cursor, *line;
>> +	int len = strlen(monitor_name);
>> +	int found = 0;
>>  
>>  	available_monitors = tracefs_instance_file_read(NULL, "rv/available_monitors", NULL);
>>  	if (!available_monitors)
>>  		return -1;
>>  
>> -	cursor = strstr(available_monitors, monitor_name);
>> -	if (!cursor) {
>> -		retval = 0;
>> -		goto out_free;
>> -	}
>> +	config_is_container = 0;
>
> Isn't config_is_container unused?
>
> Perhaps it is used in a follow-up patch? Let me keep reading..

Never mind, I'm stupid.

Reviewed-by: Nam Cao <namcao@linutronix.de>
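
The class of bug being fixed (strstr() matching a monitor name in the middle
of another entry) can be illustrated with a small user-space sketch. This is
not the code from the patch: monitor_listed() is a hypothetical helper, and it
assumes the list has one entry per line, each either "monitor" or
"container:monitor".

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true only if `name` equals the monitor part
 * of a whole line in the newline-separated `list`. For "container:monitor"
 * entries only the text after ':' is compared.
 */
static bool monitor_listed(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *line = list;

	while (*line) {
		size_t line_len = strcspn(line, "\n");
		const char *colon = memchr(line, ':', line_len);
		const char *mon = colon ? colon + 1 : line;
		size_t mon_len = line_len - (size_t)(mon - line);

		/* Compare lengths first so a prefix or substring never matches. */
		if (mon_len == len && memcmp(mon, name, len) == 0)
			return true;

		line += line_len;
		if (*line == '\n')
			line++;
	}
	return false;
}
```

With a list such as "wip\nsched:sco\n", a lookup for "sc" or "wi" fails here,
whereas a plain strstr() on the whole buffer would have found both.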

^ permalink raw reply

* Re: [PATCH v6 2/2] blk-mq: expose tag starvation counts via debugfs
From: John Garry @ 2026-05-18  8:14 UTC (permalink / raw)
  To: Aaron Tomlin, axboe, rostedt, mhiramat, mathieu.desnoyers
  Cc: bvanassche, johannes.thumshirn, kch, dlemoal, ritesh.list,
	loberman, neelx, sean, mproche, chjohnst, linux-block,
	linux-kernel, linux-trace-kernel
In-Reply-To: <20260517213614.350367-3-atomlin@atomlin.com>

On 17/05/2026 22:36, Aaron Tomlin wrote:
> In high-performance storage environments, particularly when utilising
> RAID controllers with shared tag sets (BLK_MQ_F_TAG_HCTX_SHARED), severe
> latency spikes can occur when fast devices are starved of available
> tags.
> 
> This patch introduces two new debugfs attributes for each block
> hardware queue:
>    - /sys/kernel/debug/block/[device]/hctxN/wait_on_hw_tag
>    - /sys/kernel/debug/block/[device]/hctxN/wait_on_sched_tag

How would these counters be used? You are just saying that we may have 
performance latency spikes and so here are two new counters.

> 
> These files expose atomic counters that increment each time a submitting
> context is forced into an uninterruptible sleep via io_schedule() due to
> the complete exhaustion of physical driver tags or software scheduler
> tags, respectively.
> 
> To ensure negligible performance overhead even in production
> environments where CONFIG_BLK_DEBUG_FS is actively enabled, this
> tracking logic utilises dynamically allocated per-CPU counters. When
> this configuration is disabled, the tracking logic compiles down to a
> safe no-op.

How does one normalise the values which are measured? I mean, during a 
period of high contention, we may get a bunch of threads waiting for a 
driver tag and the value in wait_on_hw_tag may jump considerably - how 
do you normalize this value in wait_on_hw_tag for meaningful analysis?

> 
> Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
> ---
>   block/blk-mq-debugfs.c | 109 +++++++++++++++++++++++++++++++++++++++++
>   block/blk-mq-debugfs.h |  19 +++++++
>   block/blk-mq-tag.c     |   4 ++
>   block/blk-mq.c         |   5 ++
>   include/linux/blk-mq.h |  12 +++++
>   5 files changed, 149 insertions(+)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 047ec887456b..a94ffc2eacdf 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -7,6 +7,7 @@
>   #include <linux/blkdev.h>
>   #include <linux/build_bug.h>
>   #include <linux/debugfs.h>
> +#include <linux/percpu.h>
>   
>   #include "blk.h"
>   #include "blk-mq.h"
> @@ -484,6 +485,54 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
>   	return 0;
>   }
>   
> +/**
> + * hctx_wait_on_hw_tag_show - display hardware tag starvation count
> + * @data: generic pointer to the associated hardware context (hctx)
> + * @m: seq_file pointer for debugfs output formatting
> + *
> + * Prints the cumulative number of times a submitting context was forced
> + * to block due to the exhaustion of physical hardware driver tags.
> + *
> + * Return: 0 on success.
> + */
> +static int hctx_wait_on_hw_tag_show(void *data, struct seq_file *m)
> +{
> +	struct blk_mq_hw_ctx *hctx = data;
> +	unsigned long count = 0;
> +	int cpu;
> +
> +	if (hctx->wait_on_hw_tag) {
> +		for_each_possible_cpu(cpu)
> +			count += *per_cpu_ptr(hctx->wait_on_hw_tag, cpu);
> +	}
> +	seq_printf(m, "%lu\n", count);
> +	return 0;
> +}
> +
> +/**
> + * hctx_wait_on_sched_tag_show - display scheduler tag starvation count
> + * @data: generic pointer to the associated hardware context (hctx)
> + * @m: seq_file pointer for debugfs output formatting
> + *
> + * Prints the cumulative number of times a submitting context was forced
> + * to block due to the exhaustion of software scheduler tags.
> + *
> + * Return: 0 on success.
> + */
> +static int hctx_wait_on_sched_tag_show(void *data, struct seq_file *m)
> +{
> +	struct blk_mq_hw_ctx *hctx = data;
> +	unsigned long count = 0;
> +	int cpu;
> +
> +	if (hctx->wait_on_sched_tag) {
> +		for_each_possible_cpu(cpu)
> +			count += *per_cpu_ptr(hctx->wait_on_sched_tag, cpu);
> +	}
> +	seq_printf(m, "%lu\n", count);
> +	return 0;
> +}
> +
>   #define CTX_RQ_SEQ_OPS(name, type)					\
>   static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
>   	__acquires(&ctx->lock)						\
> @@ -599,6 +648,8 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
>   	{"active", 0400, hctx_active_show},
>   	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
>   	{"type", 0400, hctx_type_show},
> +	{"wait_on_hw_tag", 0400, hctx_wait_on_hw_tag_show},
> +	{"wait_on_sched_tag", 0400, hctx_wait_on_sched_tag_show},
>   	{},
>   };
>   
> @@ -815,3 +866,61 @@ void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx)
>   	debugfs_remove_recursive(hctx->sched_debugfs_dir);
>   	hctx->sched_debugfs_dir = NULL;
>   }
> +
> +/**
> + * blk_mq_debugfs_alloc_hctx_stats - Allocate per-cpu starvation statistics
> + * @hctx: hardware context associated with the tag allocation
> + * @gfp: memory allocation flags
> + *
> + * Allocates the per-cpu memory for tracking hardware and scheduler tag
> + * starvation.
> + */
> +void blk_mq_debugfs_alloc_hctx_stats(struct blk_mq_hw_ctx *hctx, gfp_t gfp)
> +{
> +	if (!hctx->wait_on_hw_tag)
> +		hctx->wait_on_hw_tag = alloc_percpu_gfp(unsigned long,
> +							gfp);
> +	if (!hctx->wait_on_sched_tag)
> +		hctx->wait_on_sched_tag = alloc_percpu_gfp(unsigned long,
> +							   gfp);
> +}
> +
> +/**
> + * blk_mq_debugfs_free_hctx_stats - Free per-cpu starvation statistics
> + * @hctx: hardware context associated with the tag allocation
> + *
> + * Frees the per-cpu memory used for tracking hardware and scheduler tag
> + * starvation. This must only be called during hardware queue teardown when
> + * the queue is safely frozen and no active I/O submissions can race to
> + * increment the statistics.
> + */
> +void blk_mq_debugfs_free_hctx_stats(struct blk_mq_hw_ctx *hctx)
> +{
> +	free_percpu(hctx->wait_on_hw_tag);
> +	hctx->wait_on_hw_tag = NULL;
> +	free_percpu(hctx->wait_on_sched_tag);
> +	hctx->wait_on_sched_tag = NULL;
> +}
> +
> +/**
> + * blk_mq_debugfs_inc_wait_tags - increment the tag starvation counters
> + * @hctx: hardware context associated with the tag allocation
> + * @is_sched: true if the starved pool is the software scheduler
> + *
> + * Evaluates the exhausted tag pool and safely increments the appropriate
> + * per-cpu debugfs starvation counter.
> + *
> + * Note: The per-cpu pointers are explicitly checked to prevent a NULL
> + * pointer dereference in the event that the system was under heavy memory
> + * pressure and the initial per-cpu allocation failed.
> + */
> +void blk_mq_debugfs_inc_wait_tags(struct blk_mq_hw_ctx *hctx,
> +				  bool is_sched)
> +{
> +	unsigned long __percpu *tags = is_sched ?
> +			READ_ONCE(hctx->wait_on_sched_tag) :
> +			READ_ONCE(hctx->wait_on_hw_tag);
> +
> +	if (likely(tags))
> +		raw_cpu_inc(*tags);
> +}
> diff --git a/block/blk-mq-debugfs.h b/block/blk-mq-debugfs.h
> index 49bb1aaa83dc..7a7c0f376a2b 100644
> --- a/block/blk-mq-debugfs.h
> +++ b/block/blk-mq-debugfs.h
> @@ -17,6 +17,8 @@ struct blk_mq_debugfs_attr {
>   	const struct seq_operations *seq_ops;
>   };
>   
> +void blk_mq_debugfs_inc_wait_tags(struct blk_mq_hw_ctx *hctx,
> +				  bool is_sched);
>   int __blk_mq_debugfs_rq_show(struct seq_file *m, struct request *rq);
>   int blk_mq_debugfs_rq_show(struct seq_file *m, void *v);
>   
> @@ -26,6 +28,9 @@ void blk_mq_debugfs_register_hctx(struct request_queue *q,
>   void blk_mq_debugfs_unregister_hctx(struct blk_mq_hw_ctx *hctx);
>   void blk_mq_debugfs_register_hctxs(struct request_queue *q);
>   void blk_mq_debugfs_unregister_hctxs(struct request_queue *q);
> +void blk_mq_debugfs_alloc_hctx_stats(struct blk_mq_hw_ctx *hctx,
> +				     gfp_t gfp);
> +void blk_mq_debugfs_free_hctx_stats(struct blk_mq_hw_ctx *hctx);
>   
>   void blk_mq_debugfs_register_sched(struct request_queue *q);
>   void blk_mq_debugfs_unregister_sched(struct request_queue *q);
> @@ -35,6 +40,11 @@ void blk_mq_debugfs_unregister_sched_hctx(struct blk_mq_hw_ctx *hctx);
>   
>   void blk_mq_debugfs_register_rq_qos(struct request_queue *q);
>   #else
> +static inline void blk_mq_debugfs_inc_wait_tags(struct blk_mq_hw_ctx *hctx,
> +						bool is_sched)
> +{
> +}
> +
>   static inline void blk_mq_debugfs_register(struct request_queue *q)
>   {
>   }
> @@ -56,6 +66,15 @@ static inline void blk_mq_debugfs_unregister_hctxs(struct request_queue *q)
>   {
>   }
>   
> +static inline void blk_mq_debugfs_alloc_hctx_stats(struct blk_mq_hw_ctx *hctx,
> +						   gfp_t gfp)
> +{
> +}
> +
> +static inline void blk_mq_debugfs_free_hctx_stats(struct blk_mq_hw_ctx *hctx)
> +{
> +}
> +
>   static inline void blk_mq_debugfs_register_sched(struct request_queue *q)
>   {
>   }
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 66138dd043d4..3cc6a97a87a0 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -17,6 +17,7 @@
>   #include "blk.h"
>   #include "blk-mq.h"
>   #include "blk-mq-sched.h"
> +#include "blk-mq-debugfs.h"
>   
>   /*
>    * Recalculate wakeup batch when tag is shared by hctx.
> @@ -191,6 +192,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
>   		trace_block_rq_tag_wait(data->q, data->hctx,
>   					data->rq_flags & RQF_SCHED_TAGS);
>   
> +		blk_mq_debugfs_inc_wait_tags(data->hctx,
> +					     data->rq_flags & RQF_SCHED_TAGS);
> +
>   		bt_prev = bt;
>   		io_schedule();
>   
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4c5c16cce4f8..cd52bf6f82ce 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3991,6 +3991,8 @@ static void blk_mq_exit_hctx(struct request_queue *q,
>   			blk_free_flush_queue_callback);
>   	hctx->fq = NULL;
>   
> +	blk_mq_debugfs_free_hctx_stats(hctx);
> +
>   	spin_lock(&q->unused_hctx_lock);
>   	list_add(&hctx->hctx_list, &q->unused_hctx_list);
>   	spin_unlock(&q->unused_hctx_lock);
> @@ -4016,6 +4018,8 @@ static int blk_mq_init_hctx(struct request_queue *q,
>   {
>   	gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY;
>   
> +	blk_mq_debugfs_alloc_hctx_stats(hctx, gfp);
> +
>   	hctx->fq = blk_alloc_flush_queue(hctx->numa_node, set->cmd_size, gfp);
>   	if (!hctx->fq)
>   		goto fail;
> @@ -4041,6 +4045,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
>   	blk_free_flush_queue(hctx->fq);
>   	hctx->fq = NULL;
>    fail:
> +	blk_mq_debugfs_free_hctx_stats(hctx);
>   	return -1;
>   }
>   
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 18a2388ba581..41d61488d683 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -453,6 +453,18 @@ struct blk_mq_hw_ctx {
>   	struct dentry		*debugfs_dir;
>   	/** @sched_debugfs_dir:	debugfs directory for the scheduler. */
>   	struct dentry		*sched_debugfs_dir;
> +	/**
> +	 * @wait_on_hw_tag: Cumulative per-cpu counter incremented each
> +	 * time a submitting context is forced to block due to physical
> +	 * hardware tag exhaustion.
> +	 */
> +	unsigned long __percpu	*wait_on_hw_tag;
> +	/**
> +	 * @wait_on_sched_tag: Cumulative per-cpu counter incremented each
> +	 * time a submitting context is forced to block due to software
> +	 * scheduler tag exhaustion.
> +	 */
> +	unsigned long __percpu	*wait_on_sched_tag;
>   #endif
>   
>   	/**


^ permalink raw reply

* Re: [PATCH v2 02/14] tools/rv: Fix substring match when listing container monitors
From: Nam Cao @ 2026-05-18  8:21 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-3-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> When listing monitors within a specific container (rv list <container>),
> the tool incorrectly matched monitors if the requested container name
> was only a prefix of the actual container (e.g., 'rv list sche' would
> incorrectly list monitors from 'sched:').
>
> Fix this by ensuring the container name is an exact match and is
> immediately followed by the ':' separator.
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Reviewed-by: Nam Cao <namcao@linutronix.de>
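
The check described in the commit message (exact container name, immediately
followed by ':') can be sketched as below. in_container() is a hypothetical
name used for illustration only; the actual patch body is not quoted here.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: true if `entry` has the form "<container>:<monitor>"
 * with exactly the given container name. strncmp() stops at the first
 * mismatch or NUL, so entry[len] is only read when the first `len` bytes
 * matched and is therefore within the string (possibly its terminator).
 */
static bool in_container(const char *entry, const char *container)
{
	size_t len = strlen(container);

	return strncmp(entry, container, len) == 0 && entry[len] == ':';
}
```

With this, "sche" no longer selects entries from "sched:", because the
character after the matched prefix is 'd' rather than ':'.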

^ permalink raw reply

* Re: [PATCH v2 03/14] tools/rv: Fix exit status when monitor execution fails
From: Nam Cao @ 2026-05-18  8:32 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-4-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> +	exit(run <= 0);

Probably better to stick to the C standard:

    exit(run > 0 ? EXIT_SUCCESS : EXIT_FAILURE)

but whatever.

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2 04/14] tools/rv: Fix cleanup after failed trace setup
From: Nam Cao @ 2026-05-18  8:42 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-5-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:

> Currently if ikm_setup_trace_instance() fails, the tool returns without
> any cleanup. If rv was called with both -t and -r, this means the
> reactor is not going to be cleared.
>
> Jump to the cleanup label to restore the reactor if necessary.
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2 05/14] tools/rv: Add selftests
From: Nam Cao @ 2026-05-18  8:46 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-6-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> The rv tool needs automated testing to catch regressions and verify
> correct functionality across different usage scenarios.
>
> Add selftests that validate monitor listing (including containers and
> nested monitors), monitor execution with different configurations
> (reactors, verbose output, tracing), and trace output format for both
> per-task and per-cpu monitors. Error handling paths are also tested.
> Tests use a shared engine for common patterns.
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

I am not good enough at bash scripting to review this. But test scripts can
never hurt, so:

Acked-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2] tools/rtla: Fix --dump-tasks usage in timerlat
From: Tomas Glozar @ 2026-05-18  8:49 UTC (permalink / raw)
  To: Costa Shulyupin
  Cc: Steven Rostedt, Crystal Wood, Wander Lairson Costa, Ivan Pravdin,
	linux-trace-kernel, linux-kernel
In-Reply-To: <20260414185223.65353-1-costa.shul@redhat.com>

On Tue, Apr 14, 2026 at 20:52, Costa Shulyupin
<costa.shul@redhat.com> wrote:
>
> Fix --dump-task to --dump-tasks in timerlat_hist usage string
> and getopt_long table for consistency with timerlat_top.
>
> Add missing --dump-tasks to timerlat_top usage synopsis.
>
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
> ---
> v2:
> - Address comments of Crystal Wood.

Please also add the link to v1 next time when sending a v2, it makes
it easier to check what is being addressed in the v2.

> ---
>  tools/tracing/rtla/src/timerlat_hist.c | 4 ++--
>  tools/tracing/rtla/src/timerlat_top.c  | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>

The runtime test expanded to hist in [1] now passes, thanks! Adding:

Fixes: 2091336b9a8b ("rtla/timerlat_hist: Add auto-analysis support")

[1] https://lore.kernel.org/linux-trace-kernel/20260423130558.882022-2-tglozar@redhat.com/

Tomas


^ permalink raw reply

* Re: [PATCH v2 06/14] verification/rvgen: Fix options shared among commands
From: Nam Cao @ 2026-05-18  8:49 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-7-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> After rvgen was refactored to use subparsers, the common options (-a and
> -D) were left in the main parser. This meant that they needed to be
> called /before/ the subcommand and using them without subcommand was
> allowed. This is not the original intent.
>
>   rvgen -D "some description" container -n name
>
> Define the options as parent in the subparsers to allow them to be used
> from both subcommands together with other options.
>
>   rvgen container -n name -D "some description"
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

I didn't know we can do this.

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2 07/14] verification/rvgen: Fix ltl2k writing True as a literal
From: Nam Cao @ 2026-05-18  8:52 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-8-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> The rvgen parser for LTL stores literal true values in the Python
> representation (capitalised True), which doesn't build in C.
> The Literal class should already handle this case, but ASTNode skips its
> stringification method and converts the value (true/false) directly.
>
> Fix by delegating ASTNode stringification to the Literal and Variable
> classes instead of bypassing them.
>
> Fixes: 97ffa4ce6ab32 ("verification/rvgen: Add support for linear temporal logic")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2 08/14] verification/rvgen: Add golden and spec folders for tests
From: Nam Cao @ 2026-05-18  8:57 UTC (permalink / raw)
  To: Gabriele Monaco, linux-kernel, linux-trace-kernel, Steven Rostedt,
	Gabriele Monaco
  Cc: Thomas Weissschuh, Tomas Glozar, John Kacur, Wen Yang
In-Reply-To: <20260514152055.229162-9-gmonaco@redhat.com>

Gabriele Monaco <gmonaco@redhat.com> writes:
> Create reference model specifications and generated files in the golden
> folder. Those can be used as a reference to validate that rvgen still
> generates files as expected in automated tests.
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Didn't look at the "golden" files, I presume those are generated.

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply

* Re: [PATCH v2 05/17] tracing: Add __print_untrusted_str()
From: Mickaël Salaün @ 2026-05-18 10:26 UTC (permalink / raw)
  To: Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers
  Cc: Christian Brauner, Günther Noack, Jann Horn, Jeff Xu,
	Justin Suess, Kees Cook, Mathieu Desnoyers, Matthieu Buffet,
	Mikhail Ivanov, Tingmao Wang, kernel-team, linux-fsdevel,
	linux-security-module, linux-trace-kernel, Andrii Nakryiko
In-Reply-To: <20260406143717.1815792-6-mic@digikod.net>

Steve, Masami, Mathieu, are you ok with this new helper?

On Mon, Apr 06, 2026 at 04:37:03PM +0200, Mickaël Salaün wrote:
> Landlock tracepoints expose filesystem paths and process names
> that may contain spaces, equal signs, or other characters that
> break ftrace field parsing.
> 
> Add a new __print_untrusted_str() helper to safely print strings after
> escaping all special characters, including common separators (space,
> equal sign), quotes, and backslashes.  This transforms a string from an
> untrusted source (e.g. user space) to make it:
> - safe to parse,
> - easy to read (for simple strings),
> - easy to get back the original.
> 
> Cc: Günther Noack <gnoack@google.com>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tingmao Wang <m@maowtm.org>
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> ---
> 
> Changes since v1:
> https://lore.kernel.org/r/20250523165741.693976-4-mic@digikod.net
> - Remove WARN_ON() (pointed out by Steven Rostedt).
> ---
>  include/linux/trace_events.h               |  2 ++
>  include/trace/stages/stage3_trace_output.h |  4 +++
>  include/trace/stages/stage7_class_define.h |  1 +
>  kernel/trace/trace_output.c                | 41 ++++++++++++++++++++++
>  4 files changed, 48 insertions(+)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 37eb2f0f3dd8..7f4325d327ee 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -57,6 +57,8 @@ trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str,
>  			 int prefix_type, int rowsize, int groupsize,
>  			 const void *buf, size_t len, bool ascii);
>  
> +const char *trace_print_untrusted_str_seq(struct trace_seq *s, const char *str);
> +
>  int trace_raw_output_prep(struct trace_iterator *iter,
>  			  struct trace_event *event);
>  extern __printf(2, 3)
> diff --git a/include/trace/stages/stage3_trace_output.h b/include/trace/stages/stage3_trace_output.h
> index fce85ea2df1c..62e98babb969 100644
> --- a/include/trace/stages/stage3_trace_output.h
> +++ b/include/trace/stages/stage3_trace_output.h
> @@ -133,6 +133,10 @@
>  	trace_print_hex_dump_seq(p, prefix_str, prefix_type,		\
>  				 rowsize, groupsize, buf, len, ascii)
>  
> +#undef __print_untrusted_str
> +#define __print_untrusted_str(str)							\
> +		trace_print_untrusted_str_seq(p, __get_str(str))
> +
>  #undef __print_ns_to_secs
>  #define __print_ns_to_secs(value)			\
>  	({						\
> diff --git a/include/trace/stages/stage7_class_define.h b/include/trace/stages/stage7_class_define.h
> index fcd564a590f4..1164aacd550f 100644
> --- a/include/trace/stages/stage7_class_define.h
> +++ b/include/trace/stages/stage7_class_define.h
> @@ -24,6 +24,7 @@
>  #undef __print_array
>  #undef __print_dynamic_array
>  #undef __print_hex_dump
> +#undef __print_untrusted_str
>  #undef __get_buf
>  
>  /*
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 1996d7aba038..9d14c7cc654d 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -16,6 +16,7 @@
>  #include <linux/btf.h>
>  #include <linux/bpf.h>
>  #include <linux/hashtable.h>
> +#include <linux/string_helpers.h>
>  
>  #include "trace_output.h"
>  #include "trace_btf.h"
> @@ -321,6 +322,46 @@ trace_print_hex_dump_seq(struct trace_seq *p, const char *prefix_str,
>  }
>  EXPORT_SYMBOL(trace_print_hex_dump_seq);
>  
> +/**
> + * trace_print_untrusted_str_seq - print a string after escaping characters
> + * @s: trace seq struct to write to
> + * @src: The string to print
> + *
> + * Prints a string to a trace seq after escaping all special characters,
> + * including common separators (space, equal sign), quotes, and backslashes.
> + * This transforms a string from an untrusted source (e.g. user space) to make
> + * it:
> + * - safe to parse,
> + * - easy to read (for simple strings),
> + * - easy to get back the original.
> + */
> +const char *trace_print_untrusted_str_seq(struct trace_seq *s,
> +					   const char *src)
> +{
> +	int escaped_size;
> +	char *buf;
> +	size_t buf_size = seq_buf_get_buf(&s->seq, &buf);
> +	const char *ret = trace_seq_buffer_ptr(s);
> +
> +	/* Buffer exhaustion is normal when the trace buffer is full. */
> +	if (!src || buf_size == 0)
> +		return NULL;
> +
> +	escaped_size = string_escape_mem(src, strlen(src), buf, buf_size,
> +		ESCAPE_SPACE | ESCAPE_SPECIAL | ESCAPE_NAP | ESCAPE_APPEND |
> +		ESCAPE_OCTAL, " ='\"\\");
> +	if (unlikely(escaped_size >= buf_size)) {
> +		/* We need some room for the final '\0'. */
> +		seq_buf_set_overflow(&s->seq);
> +		s->full = 1;
> +		return NULL;
> +	}
> +	seq_buf_commit(&s->seq, escaped_size);
> +	trace_seq_putc(s, 0);
> +	return ret;
> +}
> +EXPORT_SYMBOL(trace_print_untrusted_str_seq);
> +
>  int trace_raw_output_prep(struct trace_iterator *iter,
>  			  struct trace_event *trace_event)
>  {
> -- 
> 2.53.0
> 
> 
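
To make the intended output shape concrete, here is a simplified user-space
approximation of the escaping. It is an assumption-laden sketch, not
string_escape_mem(): escape_untrusted() is a hypothetical name, and it writes
every separator or non-printable byte as a three-digit octal escape, whereas
the kernel helper with the flags above may emit shorter forms for some
characters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` into `dst`, replacing any byte that is
 * not printable ASCII, or that is one of space, '=', quotes or backslash,
 * with "\ooo" (octal). Returns the escaped length, or -1 if `dst` cannot
 * hold the result plus the terminating NUL.
 */
static int escape_untrusted(const char *src, char *dst, size_t dst_size)
{
	size_t out = 0;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;
		int special = c < 0x21 || c > 0x7e ||
			      strchr("='\"\\", c) != NULL;
		size_t need = special ? 4 : 1;

		/* Always keep one byte free for the final NUL. */
		if (out + need >= dst_size)
			return -1;

		if (special) {
			dst[out++] = '\\';
			dst[out++] = (char)('0' + ((c >> 6) & 3));
			dst[out++] = (char)('0' + ((c >> 3) & 7));
			dst[out++] = (char)('0' + (c & 7));
		} else {
			dst[out++] = (char)c;
		}
	}
	if (out >= dst_size)
		return -1;
	dst[out] = '\0';
	return (int)out;
}
```

Because every escaped byte is unambiguous, a consumer can split fields on
spaces and '=' first and only then reverse the escapes to recover the
original string.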

^ permalink raw reply

* Re: [PATCH 1/7] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Peter Zijlstra @ 2026-05-18 10:43 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
	bpf, linux-trace-kernel, x86, linux-kernel
In-Reply-To: <20260514135342.22130-2-jolsa@kernel.org>


You seem to have forgotten to Cc LKML and x86 :-(

On Thu, May 14, 2026 at 03:53:36PM +0200, Jiri Olsa wrote:

> @@ -1017,17 +1030,32 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  static int swbp_optimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			 unsigned long vaddr, unsigned long tramp)
>  {
> -	u8 call[5];
> +	u8 insn[OPT_INSN_SIZE], *call = &insn[LEA_INSN_SIZE];
>  
> -	__text_gen_insn(call, CALL_INSN_OPCODE, (const void *) vaddr,
> +	/*
> +	 * We have nop10 instruction (with first byte overwritten to int3),
> +	 * changing it to:
> +	 *   lea -0x80(%rsp), %rsp
> +	 *   call tramp
> +	 */
> +	memcpy(insn, lea_rsp, LEA_INSN_SIZE);
> +	__text_gen_insn(call, CALL_INSN_OPCODE,
> +			(const void *) (vaddr + LEA_INSN_SIZE),
>  			(const void *) tramp, CALL_INSN_SIZE);
> -	return int3_update(auprobe, vma, vaddr, call, true /* optimize */);
> +	return int3_update(auprobe, vma, vaddr, insn, OPT_INSN_SIZE, true /* optimize */);
>  }
>  
>  static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			   unsigned long vaddr)
>  {
> -	return int3_update(auprobe, vma, vaddr, auprobe->insn, false /* optimize */);
> +	/*
> +	 * We have optimized nop10 (lea, call), changing it to 'jmp rel8' to
> +	 * end of the 10-byte slot instead of restoring the original nop10,
> +	 * because we could have thread already inside lea instruction.

Inaccurate, RIP could be on CALL, not inside LEA. Writing NOP10 would
make it inside NOP10 though, and that would cause havoc IF you use the
normal NOP10.

Thing is, the encoding of NOP{8,9,10} would actually allow you to
preserve the CALL instruction :-)

That is, observe:

       PF1   PF2   ESC   NOPL  MOD   SIB   DISP32

NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)
NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x78, 0x56, 0x34, 0x12 -- cs nopw 0x12345678(%rax,%rbp,8)

Specifically the CALL opcode sits in the SIB byte and decodes like:

  e8 := 11 101 000

  scale = 11  (2^3 = 8)
  index = 101 BP
  base  = 000 AX

And the displacement is just that, a displacement.

So you *could* in fact, write back _A_ NOP10, just not the standard
NOP10.
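
The SIB breakdown above can be checked mechanically. What follows is a
minimal sketch in plain C, not kernel code; struct sib_fields and
decode_sib() are names made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of an x86 SIB byte. */
struct sib_fields {
	unsigned int scale;	/* multiplier: 1, 2, 4 or 8 */
	unsigned int index;	/* index register number (3 bits) */
	unsigned int base;	/* base register number (3 bits) */
};

/* Split a SIB byte into scale (bits 7:6), index (5:3) and base (2:0). */
static struct sib_fields decode_sib(uint8_t sib)
{
	struct sib_fields f = {
		.scale = 1u << (sib >> 6),
		.index = (sib >> 3) & 0x7,
		.base  = sib & 0x7,
	};

	return f;
}
```

For 0xe8 this yields scale 8, index 5 (BP) and base 0 (AX), which matches
the decoding shown above.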

> +	 */
> +	u8 jmp[OPT_INSN_SIZE] = { JMP8_INSN_OPCODE, OPT_JMP8_OFFSET };
> +
> +	return int3_update(auprobe, vma, vaddr, jmp, JMP8_INSN_SIZE, false /* optimize */);
>  }

Changelog wants significant update to explain this scheme.

So we have:

  NOP10 -+-> LEA -0x80(%rsp), %rsp, CALL foo -> JMP.d8 +8
         |                                          |
         `------------------------------------------'

And you want to belabour the point of how you ensure re-writing the CALL
instruction isn't a problem (because I'm not convinced).

Note that the above results in:

initial:
0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

optimize-int3:
1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- int3
optimize-tail:
2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-finish:
3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

unoptimize-int3:
4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-tail:
5: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-finish:
6: 0xeb, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- jmp.d8 +8; call 0x78563412

optimize-int3:
7: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-tail:
8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678
optimize-finish:
9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- lea -0x80(%rsp),%rsp; call 0x12345678

Note that from step 7 to step 8, you re-write the CALL instruction
without going through INT3. This means it is entirely possible for a
concurrent execution to observe a composite instruction.

This is NOT sound!
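
To make the composite-instruction concern concrete, here is a rough sketch
that models the 4-byte CALL displacement being overwritten while the 0xe8
opcode stays in place. The byte-at-a-time write order and the helper name
observed_disp() are assumptions made purely for illustration; the real
write granularity depends on how the bytes are copied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: return the little-endian disp32 a concurrent
 * thread would decode if only the first 'landed' bytes of the new
 * displacement have been written over the old one.
 */
static uint32_t observed_disp(const uint8_t old_disp[4],
			      const uint8_t new_disp[4],
			      unsigned int landed)
{
	uint32_t val = 0;
	unsigned int i;

	assert(landed <= 4);

	for (i = 0; i < 4; i++) {
		/* Bytes below 'landed' are already new, the rest still old. */
		uint8_t b = i < landed ? new_disp[i] : old_disp[i];

		val |= (uint32_t)b << (8 * i);
	}
	return val;
}
```

With the old bytes 12 34 56 78 and the new bytes 78 56 34 12 from steps 7
and 8, two landed bytes decode as a displacement of 0x78565678, which is
neither the old target nor the new one.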

However, I think it can be salvaged, if instead of only writing INT3 at
+0, you also write INT3 at +5. The sequence then becomes:

initial:
0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

optimize-int3:
1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x00, 0x00, 0x00, 0x00 -- int3; int3
optimize-tail(s):
2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3
optimize-finish-1:
3: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-finish-2:
3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

unoptimize-int3:
4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-tail:
5: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-finish:
6: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- cs nopw 0x78563412(%rax,%rbp,8); call 0x78563412

optimize-int3:
7: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3
optimize-tail(s):
8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x78, 0x56, 0x34, 0x12 -- int3; int3
optimize-finish-1:
9: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678
optimize-finish-2:
9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- lea -0x80(%rsp),%rsp; call 0x12345678
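
The optimize half of the sequence above can be modelled on a plain byte
buffer. This is only a sketch of the ordering, under the assumption that
each numbered step is a separate write followed by whatever
synchronization the real code needs; optimize_upto() and the macros are
hypothetical names, not kernel symbols.

```c
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE	10
#define INT3		0xcc
#define CALL_OPCODE	0xe8

/* lea -0x80(%rsp), %rsp */
static const uint8_t lea_bytes[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };

/*
 * Hypothetical model of the optimize sequence on a 10-byte slot.
 * 'step' selects how far the sequence has progressed:
 *   1: INT3 at +0 and +5
 *   2: LEA tail (+1..+4) and CALL displacement (+6..+9) written
 *   3: CALL opcode written at +5
 *   4: first LEA byte written at +0
 */
static void optimize_upto(uint8_t slot[SLOT_SIZE], const uint8_t disp[4],
			  int step)
{
	if (step >= 1) {
		slot[0] = INT3;
		slot[5] = INT3;
	}
	if (step >= 2) {
		memcpy(&slot[1], &lea_bytes[1], 4);
		memcpy(&slot[6], disp, 4);
	}
	if (step >= 3)
		slot[5] = CALL_OPCODE;
	if (step >= 4)
		slot[0] = lea_bytes[0];
}
```

In this model the displacement bytes only change while +5 holds INT3, so
a thread arriving at +5 sees either INT3 or a CALL with a complete
displacement.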

> @@ -1095,14 +1125,25 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  		  unsigned long vaddr)
>  {
>  	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> -		int ret = is_optimized(vma->vm_mm, vaddr);
> -		if (ret < 0)
> +		uprobe_opcode_t insn[OPT_INSN_SIZE];
> +		int ret;
> +
> +		ret = copy_from_vaddr(vma->vm_mm, vaddr, &insn, OPT_INSN_SIZE);
> +		if (ret)
>  			return ret;
> -		if (ret) {
> +		if (__is_optimized((uprobe_opcode_t *)&insn, vaddr)) {
>  			ret = swbp_unoptimize(auprobe, vma, vaddr);
>  			WARN_ON_ONCE(ret);
>  			return ret;
>  		}
> +		/*
> +		 * We can have re-attached probe on top of jmp8 instruction,
> +		 * which did not get optimized. We need to restore the jmp8
> +		 * instruction, instead of the original instruction (nop10).
> +		 */
> +		if (is_swbp_insn(&insn[0]) && insn[1] == OPT_JMP8_OFFSET)
> +			return uprobe_write_opcode(auprobe, vma, vaddr, JMP8_INSN_OPCODE,
> +						   false /* is_register */);

Coding style wants { } on any multi-line statement, even if it's only one
statement.

>  	}
>  	return uprobe_write_opcode(auprobe, vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn,
>  				   false /* is_register */);

^ permalink raw reply

* Re: [PATCH v2] tracing/probes: Allow use of BTF names to dereference pointers
From: Leon Hwang @ 2026-05-18 10:45 UTC (permalink / raw)
  To: Steven Rostedt, LKML, Linux trace kernel, bpf
  Cc: Masami Hiramatsu, Mathieu Desnoyers, Mark Rutland, Peter Zijlstra,
	Namhyung Kim, Takaya Saeki, Douglas Raillard, Tom Zanussi,
	Andrew Morton, Thomas Gleixner, Ian Rogers, Jiri Olsa
In-Reply-To: <20260516173310.1dbad146@fedora>

On 17/5/26 05:33, Steven Rostedt wrote:
> From: Steven Rostedt <rostedt@goodmis.org>
> 
> Add syntax to the FETCHARGS parsing of probes to allow the use of
> structure and member names to get the offsets to dereference pointers.
> 
> Currently, a dereference must be a number, where the user has to figure
> out manually the offset of a member of a structure that they want to
> reference. For example, to get the size of a kmem_cache that was passed to
> the function kmem_cache_alloc_noprof, one would need to do:
> 
>  # cd /sys/kernel/tracing
>  # echo 'f:cache kmem_cache_alloc_noprof size=+0x18($arg1):u32' >> dynamic_events
> 
> This requires knowing that the offset of size is 0x18, which can be found
> with gdb:
> 
>   (gdb) p &((struct kmem_cache *)0)->size
>   $1 = (unsigned int *) 0x18
> 
> If BTF is in the kernel, it can be used to find this with names, where the
> user doesn't need to find the actual offset:
> 
>  # echo 'f:cache kmem_cache_alloc_noprof size=+kmem_cache.size($arg1):u32' >> dynamic_events
> 
> Instead of the "+0x18", it would have "+kmem_cache.size" where the format is:
> 
>   +STRUCT.MEMBER[.MEMBER[..]]
> 
> The delimiter is '.' and the first item is the structure name. Then the
> member of the structure to get the offset of. If that member is an
> embedded structure, another '.MEMBER' may be added to get the offset of
> its members with respect to the original value.
> 
>   "+kmem_cache.size($arg1)" is equivalent to:
> 
>   (*(struct kmem_cache *)$arg1).size
> 
> Anonymous structures are also handled:
> 
>   # echo 'e:xmit net.net_dev_xmit +net_device.name(+sk_buff.dev($skbaddr)):string' >> dynamic_events
> 
> Where "+net_device.name(+sk_buff.dev($skbaddr))" is equivalent to:
> 
>   (*(struct net_device *)((*(struct sk_buff *)($skbaddr)).dev)->name)
> 
> Note that "dev" of struct sk_buff is inside an anonymous structure:
> 
> struct sk_buff {
> 	union {
> 		struct {
> 			/* These two members must be first to match sk_buff_head. */
> 			struct sk_buff		*next;
> 			struct sk_buff		*prev;
> 
> 			union {
> 				struct net_device	*dev;
> 				[..]
> 			};
> 		};
> 		[..]
> 	};
> 
> This will allow up to three deep of anonymous structures before it will
> fail to find a member.
> 
> The above produces:
> 
>     sshd-session-1080    [000] b..5.  1526.337161: xmit: (net.net_dev_xmit) arg1="enp7s0"
> 
> And nested structures can be found by adding more members to the arg:
> 
>   # echo 'f:read filemap_readahead.isra.0 file=+0(+dentry.d_name.name(+file.f_path.dentry($arg2))):string' >> dynamic_events
> 
> The above is equivalent to:
> 
>   *((*(struct dentry *)(*(struct file *)$arg2).f_path.dentry)->d_name.name)
> 
> And produces:
> 
>        trace-cmd-1381    [002] ...1.  2082.676268: read: (filemap_readahead.isra.0+0x0/0x150) file="trace.dat"
> 
Hi Steve,

Great to see BTF being integrated into tracing.

I'd like to share my BPF tool, bpfsnoop [1], which uses a similar
approach to inspect argument data.

Read device name:
bpfsnoop -t net_dev_xmit --output-arg 'str(skb->dev->name)'
--limit-events 20
- net_dev_xmit[tp] args=((struct sk_buff *)skb=0xffff88818821d4e8,
(int)rc=0, (struct net_device *)dev=0xffff88984ba64000, (unsigned
int)skb_len=0x1f2/498) cpu=2 process=(0:swapper/2)
timestamp=18:06:17.309492697
Arg attrs: (array(char[16]))'str(skb->dev->name)'="eth0"

Read dentry name:
bpfsnoop -k 'vfs_read' --output-arg
'str((file->f_path.dentry)->d_name.name)' --limit-events 20
← vfs_read args=((struct file *)file=0xffff888175e08400, (char
*)buf=0x55c7a1168400(0x0/0), (size_t)count=0x10000/65536, (loff_t
*)pos=0xffffc9000f707bb0(0)) retval=(long int)510 cpu=3
process=(339834:sudo) timestamp=18:24:16.22021166
Arg attrs: (unsigned char *)'str((file->f_path.dentry)->d_name.name)'="ptmx"

bpfsnoop provides a friendly way to inspect argument data using C
expressions. Under the hood, it compiles the C expressions given via
--filter-arg/--output-arg into BPF bytecode, resolving struct/union
member accesses with BTF. (I've been too lazy to document its internals,
but you can study them with AI assistance.)
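
Both the +STRUCT.MEMBER syntax quoted above and a C-expression front end
end up resolving a member name to a byte offset and reading at base +
offset. Here is a minimal sketch of that arithmetic in plain C. The
struct demo_cache is made up and merely mimics the 0x18 offset from the
example (it is not the real kmem_cache), and fetch_u32() is a
hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for a kernel structure; only the offset of 'size' matters. */
struct demo_cache {
	char	 pad[0x18];
	uint32_t size;
};

/* Read a u32 at 'base + off', which is what "+OFFSET(ptr):u32" boils down to. */
static uint32_t fetch_u32(const void *base, size_t off)
{
	uint32_t val;

	memcpy(&val, (const char *)base + off, sizeof(val));
	return val;
}
```

Here offsetof(struct demo_cache, size) plays the role of the BTF lookup,
so fetch_u32(ptr, offsetof(struct demo_cache, size)) corresponds to
"+0x18(ptr):u32".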

After developing this feature for bpfsnoop, I even wondered whether a
light-weight C compiler could be embedded into the trace tooling, so that
it compiles C expressions into BPF bytecode and then loads the BPF
program to filter/output arguments. Users would then be able to
filter/output arguments using C expressions. The idea seemed too crazy
to post to the trace mailing list at the time, as I wasn't familiar with
the trace infrastructure.

[1] https://github.com/bpfsnoop/bpfsnoop/

Thanks,
Leon


^ permalink raw reply

* [PATCHv2 00/11] uprobes/x86: Fix red zone issue for optimized uprobes
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel

hi,
Andrii reported an issue with optimized uprobes [1]: the call instruction
stores its return address on the stack and can clobber the red zone, where
user code may keep temporary data without adjusting rsp.

Fix this by moving optimized uprobes on top of a 10-byte nop instruction,
so we can squeeze in another instruction that escapes the red zone before
doing the call.
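
As a rough illustration of the problem, the following sketch models only
the address arithmetic, assuming the x86-64 SysV ABI red zone of 128
bytes below rsp. The helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define RED_ZONE_SIZE	128	/* x86-64 SysV ABI red zone below rsp */

/* Address of the 8-byte slot a CALL writes its return address to. */
static uint64_t call_push_slot(uint64_t rsp)
{
	return rsp - 8;
}

/* True if [addr, addr + 8) overlaps the red zone [user_rsp - 128, user_rsp). */
static bool clobbers_red_zone(uint64_t user_rsp, uint64_t addr)
{
	return addr + 8 > user_rsp - RED_ZONE_SIZE && addr < user_rsp;
}
```

A call issued directly at the user's rsp writes to rsp - 8, inside the red
zone. After moving rsp down by 0x80 first, the same call writes to
rsp - 0x88, just below it.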

Note we need upstream update first for patch 3 (github.com/libbpf/usdt),
if we decide to take this change.

thanks,
jirka

v1: https://lore.kernel.org/bpf/20260514135342.22130-1-jolsa@kernel.org/

v2 changes:
- several selftest fixes [sashiko]
- consolidate is_lea_insn and is_call_insn into a single check [Jakub Sitnicki]
- use proper mm_struct object in __in_uprobe_trampoline check [sashiko]
- allow to copy uprobe trampolines vma objects on fork [sashiko]
- change uprobe syscall detection error from -ENXIO to -EPROTO [Andrii]
- added fork/clone tests
- I kept the selftest changes and nop5->nop10 changes in separate
  commits for easier review, we can squash them later if we want to keep
  bisect working properly


[1] https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
---
Andrii Nakryiko (1):
      selftests/bpf: Add tests for uprobe nop10 red zone clobbering

Jiri Olsa (10):
      uprobes/x86: Use proper mm_struct in __in_uprobe_trampoline
      uprobes/x86: Allow to copy uprobe trampolines on fork
      uprobes/x86: Move optimized uprobe from nop5 to nop10
      libbpf: Change has_nop_combo to work on top of nop10
      libbpf: Detect uprobe syscall with new error
      selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch
      selftests/bpf: Change uprobe syscall tests to use nop10
      selftests/bpf: Change uprobe/usdt trigger bench code to use nop10
      selftests/bpf: Add reattach tests for uprobe syscall
      selftests/bpf: Add tests for forked/cloned optimized uprobes

 arch/x86/kernel/uprobes.c                               | 144 ++++++++++++++++++++++------------
 tools/lib/bpf/features.c                                |   4 +-
 tools/lib/bpf/usdt.c                                    |  16 ++--
 tools/testing/selftests/bpf/bench.c                     |  20 ++---
 tools/testing/selftests/bpf/benchs/bench_trigger.c      |  38 ++++-----
 tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh |   2 +-
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 307 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 tools/testing/selftests/bpf/prog_tests/usdt.c           |  74 +++++++++++++++---
 tools/testing/selftests/bpf/progs/test_usdt.c           |  25 ++++++
 tools/testing/selftests/bpf/usdt.h                      |   2 +-
 tools/testing/selftests/bpf/usdt_2.c                    |  15 +++-
 11 files changed, 524 insertions(+), 123 deletions(-)

^ permalink raw reply

* [PATCHv2 01/11] uprobes/x86: Use proper mm_struct in __in_uprobe_trampoline
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

In the unregister path we use __in_uprobe_trampoline check with
current->mm for the VMA lookup, which is wrong, because we are
in the tracer context, not the traced process.

Add an mm_struct pointer argument to __in_uprobe_trampoline and change
the related callers to pass the proper mm_struct pointer.

Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index ebb1baf1eb1d..2be6707e3320 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -761,9 +761,9 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
 		destroy_uprobe_trampoline(tramp);
 }
 
-static bool __in_uprobe_trampoline(unsigned long ip)
+static bool __in_uprobe_trampoline(struct mm_struct *mm, unsigned long ip)
 {
-	struct vm_area_struct *vma = vma_lookup(current->mm, ip);
+	struct vm_area_struct *vma = vma_lookup(mm, ip);
 
 	return vma && vma_is_special_mapping(vma, &tramp_mapping);
 }
@@ -776,14 +776,14 @@ static bool in_uprobe_trampoline(unsigned long ip)
 
 	rcu_read_lock();
 	if (mmap_lock_speculate_try_begin(mm, &seq)) {
-		found = __in_uprobe_trampoline(ip);
+		found = __in_uprobe_trampoline(mm, ip);
 		retry = mmap_lock_speculate_retry(mm, seq);
 	}
 	rcu_read_unlock();
 
 	if (retry) {
 		mmap_read_lock(mm);
-		found = __in_uprobe_trampoline(ip);
+		found = __in_uprobe_trampoline(mm, ip);
 		mmap_read_unlock(mm);
 	}
 	return found;
@@ -1044,7 +1044,7 @@ static int copy_from_vaddr(struct mm_struct *mm, unsigned long vaddr, void *dst,
 	return 0;
 }
 
-static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
+static bool __is_optimized(struct mm_struct *mm, uprobe_opcode_t *insn, unsigned long vaddr)
 {
 	struct __packed __arch_relative_insn {
 		u8 op;
@@ -1053,7 +1053,7 @@ static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
 
 	if (!is_call_insn(insn))
 		return false;
-	return __in_uprobe_trampoline(vaddr + 5 + call->raddr);
+	return __in_uprobe_trampoline(mm, vaddr + 5 + call->raddr);
 }
 
 static int is_optimized(struct mm_struct *mm, unsigned long vaddr)
@@ -1064,7 +1064,7 @@ static int is_optimized(struct mm_struct *mm, unsigned long vaddr)
 	err = copy_from_vaddr(mm, vaddr, &insn, 5);
 	if (err)
 		return err;
-	return __is_optimized((uprobe_opcode_t *)&insn, vaddr);
+	return __is_optimized(mm, (uprobe_opcode_t *)&insn, vaddr);
 }
 
 static bool should_optimize(struct arch_uprobe *auprobe)
-- 
2.53.0


^ permalink raw reply related

* [PATCHv2 02/11] uprobes/x86: Allow to copy uprobe trampolines on fork
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

When we do fork or clone without CLONE_VM the new process won't
have uprobe trampoline vma objects and at the same time it will
have optimized code calling the trampolines.

Fix this by allowing uprobe trampoline vma objects to be copied to the
new process on fork.

Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 2be6707e3320..37faf038be33 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -702,7 +702,7 @@ static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
 
 	tramp->vaddr = vaddr;
 	vma = _install_special_mapping(mm, tramp->vaddr, PAGE_SIZE,
-				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
+				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_IO,
 				&tramp_mapping);
 	if (IS_ERR(vma)) {
 		kfree(tramp);
-- 
2.53.0


^ permalink raw reply related

* [PATCHv2 03/11] uprobes/x86: Move optimized uprobe from nop5 to nop10
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

Andrii reported an issue with optimized uprobes [1]: the call instruction
stores its return address on the stack and can clobber the red zone, where
user code may keep temporary data without adjusting rsp.

Fix this by moving optimized uprobes on top of a 10-byte nop instruction,
so we can squeeze in another instruction that escapes the red zone before
doing the call, like:

  lea -0x80(%rsp), %rsp
  call tramp

Note the lea instruction is used to adjust the rsp register without
changing the flags.

The unoptimize path is a bit tricky: we can't change back to the nop10
instruction, because some thread could already be inside the lea
instruction. Instead we change it to a 'jmp rel8' instruction that jumps
to the end of the 10-byte slot. The 'jmp rel8' is also added to the
can_optimize function as another instruction that allows an optimized
uprobe.
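
The offset used by that jump follows from the slot and instruction sizes.
A small sketch of the arithmetic, with macro and function names that are
made up here and only mirror the sizes used in the patch:

```c
#include <stdint.h>

#define SLOT_SIZE	10	/* nop10, or lea + call */
#define JMP8_SIZE	2	/* opcode 0xeb + rel8 */

/* rel8 needed for a 2-byte jmp at the slot start to land at the slot end. */
static int8_t jmp8_offset_to_slot_end(void)
{
	return SLOT_SIZE - JMP8_SIZE;
}

/* Target of a 'jmp rel8' at vaddr: address of the next instruction plus rel8. */
static uint64_t jmp8_target(uint64_t vaddr, int8_t rel8)
{
	return vaddr + JMP8_SIZE + (uint64_t)(int64_t)rel8;
}
```

With a rel8 of 8, a jump placed at vaddr lands at vaddr + 10, the first
byte after the slot, so the leftover lea/call bytes are skipped.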

The optimized uprobe performance stays the same:

        uprobe-nop     :    3.129 ± 0.013M/s
        uprobe-push    :    3.045 ± 0.006M/s
        uprobe-ret     :    1.095 ± 0.004M/s
  -->   uprobe-nop10   :    7.170 ± 0.020M/s
        uretprobe-nop  :    2.143 ± 0.021M/s
        uretprobe-push :    2.090 ± 0.000M/s
        uretprobe-ret  :    0.942 ± 0.000M/s
  -->   uretprobe-nop10:    3.381 ± 0.003M/s
        usdt-nop       :    3.245 ± 0.004M/s
  -->   usdt-nop10     :    7.256 ± 0.023M/s

[1] https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Closes: https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 130 ++++++++++++++++++++++++++------------
 1 file changed, 89 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 37faf038be33..e0067d1b6242 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -636,9 +636,26 @@ struct uprobe_trampoline {
 	unsigned long		vaddr;
 };
 
+#define LEA_INSN_SIZE		5
+#define OPT_INSN_SIZE		(LEA_INSN_SIZE + CALL_INSN_SIZE)
+#define OPT_JMP8_OFFSET		(OPT_INSN_SIZE - JMP8_INSN_SIZE)
+#define REDZONE_SIZE		0x80
+
+static const u8 lea_rsp[] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
+
+static bool is_opt_insns(const uprobe_opcode_t *insn)
+{
+	static const u8 opt_insns[] = {
+		0x48, 0x8d, 0x64, 0x24, REDZONE_SIZE, /* lea -0x80(%rsp), %rsp */
+		CALL_INSN_OPCODE
+	};
+
+	return !memcmp(insn, opt_insns, ARRAY_SIZE(opt_insns));
+}
+
 static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
 {
-	long delta = (long)(vaddr + 5 - vtramp);
+	long delta = (long)(vaddr + OPT_INSN_SIZE - vtramp);
 
 	return delta >= INT_MIN && delta <= INT_MAX;
 }
@@ -651,7 +668,7 @@ static unsigned long find_nearest_trampoline(unsigned long vaddr)
 	};
 	unsigned long low_limit, high_limit;
 	unsigned long low_tramp, high_tramp;
-	unsigned long call_end = vaddr + 5;
+	unsigned long call_end = vaddr + OPT_INSN_SIZE;
 
 	if (check_add_overflow(call_end, INT_MIN, &low_limit))
 		low_limit = PAGE_SIZE;
@@ -810,7 +827,7 @@ SYSCALL_DEFINE0(uprobe)
 
 	/* Allow execution only from uprobe trampolines. */
 	if (!in_uprobe_trampoline(regs->ip))
-		return -ENXIO;
+		return -EPROTO;
 
 	err = copy_from_user(&args, (void __user *)regs->sp, sizeof(args));
 	if (err)
@@ -826,8 +843,8 @@ SYSCALL_DEFINE0(uprobe)
 	regs->ax  = args.ax;
 	regs->r11 = args.r11;
 	regs->cx  = args.cx;
-	regs->ip  = args.retaddr - 5;
-	regs->sp += sizeof(args);
+	regs->ip  = args.retaddr - OPT_INSN_SIZE;
+	regs->sp += sizeof(args) + REDZONE_SIZE;
 	regs->orig_ax = -1;
 
 	sp = regs->sp;
@@ -844,12 +861,12 @@ SYSCALL_DEFINE0(uprobe)
 	 */
 	if (regs->sp != sp) {
 		/* skip the trampoline call */
-		if (args.retaddr - 5 == regs->ip)
-			regs->ip += 5;
+		if (args.retaddr - OPT_INSN_SIZE == regs->ip)
+			regs->ip += OPT_INSN_SIZE;
 		return regs->ax;
 	}
 
-	regs->sp -= sizeof(args);
+	regs->sp -= sizeof(args) + REDZONE_SIZE;
 
 	/* for the case uprobe_consumer has changed ax/r11/cx */
 	args.ax  = regs->ax;
@@ -857,7 +874,7 @@ SYSCALL_DEFINE0(uprobe)
 	args.cx  = regs->cx;
 
 	/* keep return address unless we are instructed otherwise */
-	if (args.retaddr - 5 != regs->ip)
+	if (args.retaddr - OPT_INSN_SIZE != regs->ip)
 		args.retaddr = regs->ip;
 
 	if (shstk_push(args.retaddr) == -EFAULT)
@@ -891,7 +908,7 @@ asm (
 	"pop %rax\n"
 	"pop %r11\n"
 	"pop %rcx\n"
-	"ret\n"
+	"ret $" __stringify(REDZONE_SIZE) "\n"
 	"int3\n"
 	".balign " __stringify(PAGE_SIZE) "\n"
 	".popsection\n"
@@ -909,7 +926,7 @@ late_initcall(arch_uprobes_init);
 
 enum {
 	EXPECT_SWBP,
-	EXPECT_CALL,
+	EXPECT_OPTIMIZED,
 };
 
 struct write_opcode_ctx {
@@ -917,11 +934,6 @@ struct write_opcode_ctx {
 	int expect;
 };
 
-static int is_call_insn(uprobe_opcode_t *insn)
-{
-	return *insn == CALL_INSN_OPCODE;
-}
-
 /*
  * Verification callback used by int3_update uprobe_write calls to make sure
  * the underlying instruction is as expected - either int3 or call.
@@ -930,17 +942,17 @@ static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *
 		       int nbytes, void *data)
 {
 	struct write_opcode_ctx *ctx = data;
-	uprobe_opcode_t old_opcode[5];
+	uprobe_opcode_t old_opcode[OPT_INSN_SIZE];
 
-	uprobe_copy_from_page(page, ctx->base, (uprobe_opcode_t *) &old_opcode, 5);
+	uprobe_copy_from_page(page, ctx->base, old_opcode, OPT_INSN_SIZE);
 
 	switch (ctx->expect) {
 	case EXPECT_SWBP:
 		if (is_swbp_insn(&old_opcode[0]))
 			return 1;
 		break;
-	case EXPECT_CALL:
-		if (is_call_insn(&old_opcode[0]))
+	case EXPECT_OPTIMIZED:
+		if (is_opt_insns(&old_opcode[0]))
 			return 1;
 		break;
 	}
@@ -963,7 +975,7 @@ static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *
  *   - SMP sync all CPUs
  */
 static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
-		       unsigned long vaddr, char *insn, bool optimize)
+		       unsigned long vaddr, char *insn, int size, bool optimize)
 {
 	uprobe_opcode_t int3 = UPROBE_SWBP_INSN;
 	struct write_opcode_ctx ctx = {
@@ -978,7 +990,7 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 	 * so we can skip this step for optimize == true.
 	 */
 	if (!optimize) {
-		ctx.expect = EXPECT_CALL;
+		ctx.expect = EXPECT_OPTIMIZED;
 		err = uprobe_write(auprobe, vma, vaddr, &int3, 1, verify_insn,
 				   true /* is_register */, false /* do_update_ref_ctr */,
 				   &ctx);
@@ -990,7 +1002,7 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 
 	/* Write all but the first byte of the patched range. */
 	ctx.expect = EXPECT_SWBP;
-	err = uprobe_write(auprobe, vma, vaddr + 1, insn + 1, 4, verify_insn,
+	err = uprobe_write(auprobe, vma, vaddr + 1, insn + 1, size - 1, verify_insn,
 			   true /* is_register */, false /* do_update_ref_ctr */,
 			   &ctx);
 	if (err)
@@ -1017,17 +1029,32 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 static int swbp_optimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 			 unsigned long vaddr, unsigned long tramp)
 {
-	u8 call[5];
+	u8 insn[OPT_INSN_SIZE], *call = &insn[LEA_INSN_SIZE];
 
-	__text_gen_insn(call, CALL_INSN_OPCODE, (const void *) vaddr,
+	/*
+	 * We have nop10 instruction (with first byte overwritten to int3),
+	 * changing it to:
+	 *   lea -0x80(%rsp), %rsp
+	 *   call tramp
+	 */
+	memcpy(insn, lea_rsp, LEA_INSN_SIZE);
+	__text_gen_insn(call, CALL_INSN_OPCODE,
+			(const void *) (vaddr + LEA_INSN_SIZE),
 			(const void *) tramp, CALL_INSN_SIZE);
-	return int3_update(auprobe, vma, vaddr, call, true /* optimize */);
+	return int3_update(auprobe, vma, vaddr, insn, OPT_INSN_SIZE, true /* optimize */);
 }
 
 static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 			   unsigned long vaddr)
 {
-	return int3_update(auprobe, vma, vaddr, auprobe->insn, false /* optimize */);
+	/*
+	 * We have optimized nop10 (lea, call), changing it to 'jmp rel8' to
+	 * end of the 10-byte slot instead of restoring the original nop10,
+	 * because we could have thread already inside lea instruction.
+	 */
+	u8 jmp[OPT_INSN_SIZE] = { JMP8_INSN_OPCODE, OPT_JMP8_OFFSET };
+
+	return int3_update(auprobe, vma, vaddr, jmp, JMP8_INSN_SIZE, false /* optimize */);
 }
 
 static int copy_from_vaddr(struct mm_struct *mm, unsigned long vaddr, void *dst, int len)
@@ -1049,19 +1076,19 @@ static bool __is_optimized(struct mm_struct *mm, uprobe_opcode_t *insn, unsigned
 	struct __packed __arch_relative_insn {
 		u8 op;
 		s32 raddr;
-	} *call = (struct __arch_relative_insn *) insn;
+	} *call = (struct __arch_relative_insn *)(insn + LEA_INSN_SIZE);
 
-	if (!is_call_insn(insn))
+	if (!is_opt_insns(insn))
 		return false;
-	return __in_uprobe_trampoline(mm, vaddr + 5 + call->raddr);
+	return __in_uprobe_trampoline(mm, vaddr + OPT_INSN_SIZE + call->raddr);
 }
 
 static int is_optimized(struct mm_struct *mm, unsigned long vaddr)
 {
-	uprobe_opcode_t insn[5];
+	uprobe_opcode_t insn[OPT_INSN_SIZE];
 	int err;
 
-	err = copy_from_vaddr(mm, vaddr, &insn, 5);
+	err = copy_from_vaddr(mm, vaddr, &insn, OPT_INSN_SIZE);
 	if (err)
 		return err;
 	return __is_optimized(mm, (uprobe_opcode_t *)&insn, vaddr);
@@ -1095,14 +1122,25 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 		  unsigned long vaddr)
 {
 	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
-		int ret = is_optimized(vma->vm_mm, vaddr);
-		if (ret < 0)
+		uprobe_opcode_t insn[OPT_INSN_SIZE];
+		int ret;
+
+		ret = copy_from_vaddr(vma->vm_mm, vaddr, &insn, OPT_INSN_SIZE);
+		if (ret)
 			return ret;
-		if (ret) {
+		if (__is_optimized(vma->vm_mm, (uprobe_opcode_t *)&insn, vaddr)) {
 			ret = swbp_unoptimize(auprobe, vma, vaddr);
 			WARN_ON_ONCE(ret);
 			return ret;
 		}
+		/*
+		 * We can have re-attached probe on top of jmp8 instruction,
+		 * which did not get optimized. We need to restore the jmp8
+		 * instruction, instead of the original instruction (nop10).
+		 */
+		if (is_swbp_insn(&insn[0]) && insn[1] == OPT_JMP8_OFFSET)
+			return uprobe_write_opcode(auprobe, vma, vaddr, JMP8_INSN_OPCODE,
+						   false /* is_register */);
 	}
 	return uprobe_write_opcode(auprobe, vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn,
 				   false /* is_register */);
@@ -1131,7 +1169,7 @@ static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct
 void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
 {
 	struct mm_struct *mm = current->mm;
-	uprobe_opcode_t insn[5];
+	uprobe_opcode_t insn[OPT_INSN_SIZE];
 
 	if (!should_optimize(auprobe))
 		return;
@@ -1142,7 +1180,7 @@ void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
 	 * Check if some other thread already optimized the uprobe for us,
 	 * if it's the case just go away silently.
 	 */
-	if (copy_from_vaddr(mm, vaddr, &insn, 5))
+	if (copy_from_vaddr(mm, vaddr, &insn, OPT_INSN_SIZE))
 		goto unlock;
 	if (!is_swbp_insn((uprobe_opcode_t*) &insn))
 		goto unlock;
@@ -1160,14 +1198,24 @@ void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
 
 static bool can_optimize(struct insn *insn, unsigned long vaddr)
 {
-	if (!insn->x86_64 || insn->length != 5)
+	if (!insn->x86_64)
 		return false;
 
-	if (!insn_is_nop(insn))
+	/* We can't do cross page atomic writes yet. */
+	if (PAGE_SIZE - (vaddr & ~PAGE_MASK) < OPT_INSN_SIZE)
 		return false;
 
-	/* We can't do cross page atomic writes yet. */
-	return PAGE_SIZE - (vaddr & ~PAGE_MASK) >= 5;
+	/* We can optimize on top of nop10.. */
+	if (insn->length == OPT_INSN_SIZE && insn_is_nop(insn))
+		return true;
+
+	/* .. and JMP rel8 to end of slot — check swbp_unoptimize. */
+	if (insn->length == 2 &&
+	    insn->opcode.bytes[0] == JMP8_INSN_OPCODE &&
+	    insn->immediate.value == OPT_JMP8_OFFSET)
+		return true;
+
+	return false;
 }
 #else /* 32-bit: */
 /*
-- 
2.53.0


^ permalink raw reply related

* [PATCHv2 04/11] libbpf: Change has_nop_combo to work on top of nop10
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: Jakub Sitnicki, bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

We now expect the nop combo to use a 10-byte nop instead of a 5-byte nop,
so fix has_nop_combo to reflect that.

Fixes: 41a5c7df4466 ("libbpf: Add support to detect nop,nop5 instructions combo for usdt probe")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/usdt.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/usdt.c b/tools/lib/bpf/usdt.c
index e3710933fd52..7e62e4d5bedd 100644
--- a/tools/lib/bpf/usdt.c
+++ b/tools/lib/bpf/usdt.c
@@ -305,7 +305,7 @@ struct usdt_manager *usdt_manager_new(struct bpf_object *obj)
 
 	/*
 	 * Detect kernel support for uprobe() syscall, it's presence means we can
-	 * take advantage of faster nop5 uprobe handling.
+	 * take advantage of faster nop10 uprobe handling.
 	 * Added in: 56101b69c919 ("uprobes/x86: Add uprobe syscall to speed up uprobe")
 	 */
 	man->has_uprobe_syscall = kernel_supports(obj, FEAT_UPROBE_SYSCALL);
@@ -596,14 +596,14 @@ static int parse_usdt_spec(struct usdt_spec *spec, const struct usdt_note *note,
 #if defined(__x86_64__)
 static bool has_nop_combo(int fd, long off)
 {
-	unsigned char nop_combo[6] = {
-		0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 /* nop,nop5 */
+	unsigned char nop_combo[11] = {
+		0x90, 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00,
 	};
-	unsigned char buf[6];
+	unsigned char buf[11];
 
-	if (pread(fd, buf, 6, off) != 6)
+	if (pread(fd, buf, 11, off) != 11)
 		return false;
-	return memcmp(buf, nop_combo, 6) == 0;
+	return memcmp(buf, nop_combo, 11) == 0;
 }
 #else
 static bool has_nop_combo(int fd, long off)
@@ -814,8 +814,8 @@ static int collect_usdt_targets(struct usdt_manager *man, struct elf_fd *elf_fd,
 		memset(target, 0, sizeof(*target));
 
 		/*
-		 * We have uprobe syscall and usdt with nop,nop5 instructions combo,
-		 * so we can place the uprobe directly on nop5 (+1) and get this probe
+		 * We have uprobe syscall and usdt with nop,nop10 instructions combo,
+		 * so we can place the uprobe directly on nop10 (+1) and get this probe
 		 * optimized.
 		 */
 		if (man->has_uprobe_syscall && has_nop_combo(elf_fd->fd, usdt_rel_ip)) {
-- 
2.53.0



* [PATCHv2 05/11] libbpf: Detect uprobe syscall with new error
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

The previous optimized uprobe fix changed the error that the uprobe
syscall returns, and that is used for its detection, from ENXIO to EPROTO.

Update the probe_uprobe_syscall detection check accordingly.
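
A minimal sketch of the detection rule, kept free of the actual syscall so
it can be read in isolation. The helper uprobe_syscall_detected() is
hypothetical; it takes the return value and errno that a call such as
syscall(__NR_uprobe) would have produced.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: the syscall counts as supported only when it fails
 * with EPROTO, which the kernel is expected to return when the syscall is
 * made from outside a kernel-generated uprobe trampoline.
 */
static bool uprobe_syscall_detected(long ret, int err)
{
	return ret < 0 && err == EPROTO;
}
```

With this rule the former ENXIO result, or ENOSYS from a kernel without the
syscall, is treated as "not supported".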

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Fixes: 05738da0efa1 ("libbpf: Add uprobe syscall feature detection")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/lib/bpf/features.c                                | 4 ++--
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
index b7e388f99d0b..e5641fa60163 100644
--- a/tools/lib/bpf/features.c
+++ b/tools/lib/bpf/features.c
@@ -577,10 +577,10 @@ static int probe_ldimm64_full_range_off(int token_fd)
 static int probe_uprobe_syscall(int token_fd)
 {
 	/*
-	 * If kernel supports uprobe() syscall, it will return -ENXIO when called
+	 * If kernel supports uprobe() syscall, it will return -EPROTO when called
 	 * from the outside of a kernel-generated uprobe trampoline.
 	 */
-	return syscall(__NR_uprobe) < 0 && errno == ENXIO;
+	return syscall(__NR_uprobe) < 0 && errno == EPROTO;
 }
 #else
 static int probe_uprobe_syscall(int token_fd)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 955a37751b52..c944136252c6 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -762,7 +762,7 @@ static void test_uprobe_error(void)
 	long err = syscall(__NR_uprobe);
 
 	ASSERT_EQ(err, -1, "error");
-	ASSERT_EQ(errno, ENXIO, "errno");
+	ASSERT_EQ(errno, EPROTO, "errno");
 }
 
 static void __test_uprobe_syscall(void)
-- 
2.53.0



* [PATCHv2 06/11] selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

Sync the latest usdt.h change [1].

Now that the kernel supports the nop10 optimization, emit nop,nop10
for usdt probes. It is left up to the library to use the desired
nop instruction.

[1] TBD
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/usdt.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
index c71e21df38b3..d359663b9c32 100644
--- a/tools/testing/selftests/bpf/usdt.h
+++ b/tools/testing/selftests/bpf/usdt.h
@@ -313,7 +313,7 @@ struct usdt_sema { volatile unsigned short active; };
 #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
 #define USDT_NOP			nop 0
 #elif defined(__x86_64__)
-#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
+#define USDT_NOP                       .byte 0x90, 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 /* nop, nop10 */
 #else
 #define USDT_NOP			nop
 #endif
-- 
2.53.0



* [PATCHv2 07/11] selftests/bpf: Change uprobe syscall tests to use nop10
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

Optimized uprobes are now placed on top of 10-byte nop instructions;
reflect that in the existing tests.
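
To summarize the byte patterns the updated tests look for, here is a small
sketch that classifies a 10-byte probe slot. The byte values come from the
test changes below; the enum and the classify_slot() helper are
hypothetical names used only for illustration.

```c
#include <string.h>

enum slot_state { SLOT_NOP10, SLOT_OPTIMIZED, SLOT_UNOPTIMIZED, SLOT_UNKNOWN };

static const unsigned char nop10[10]  = { 0x66, 0x66, 0x0f, 0x1f, 0x84,
					  0x00, 0x00, 0x00, 0x00, 0x00 };
/* lea -0x80(%rsp), %rsp */
static const unsigned char lea_rsp[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
/* jmp rel8 with offset 8: 2 + 8 bytes lands at the end of the slot */
static const unsigned char jmp2B[2]   = { 0xeb, 8 };

/* Hypothetical helper: slot must point at 10 readable bytes or more. */
static enum slot_state classify_slot(const unsigned char *slot)
{
	if (!memcmp(slot, nop10, sizeof(nop10)))
		return SLOT_NOP10;		/* never attached */
	if (!memcmp(slot, lea_rsp, sizeof(lea_rsp)) && slot[5] == 0xe8)
		return SLOT_OPTIMIZED;		/* lea followed by call */
	if (!memcmp(slot, jmp2B, sizeof(jmp2B)))
		return SLOT_UNOPTIMIZED;	/* left behind after detach */
	return SLOT_UNKNOWN;
}
```

This mirrors what check_attach() and check_detach() assert: lea plus call
after attach, and the 2-byte jmp rather than the original nop10 after detach.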

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/benchs/bench_trigger.c      |  2 +-
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 29 ++++++++++---------
 tools/testing/selftests/bpf/prog_tests/usdt.c | 25 +++++++++-------
 tools/testing/selftests/bpf/usdt_2.c          |  2 +-
 4 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 2f22ec61667b..bcc4820c802e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -398,7 +398,7 @@ static void *uprobe_producer_ret(void *input)
 #ifdef __x86_64__
 __nocf_check __weak void uprobe_target_nop5(void)
 {
-	asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
+	asm volatile (".byte 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
 }
 
 static void *uprobe_producer_nop5(void *input)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index c944136252c6..e4a19dc9df69 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -17,7 +17,7 @@
 #include "uprobe_syscall_executed.skel.h"
 #include "bpf/libbpf_internal.h"
 
-#define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
+#define USDT_NOP .byte 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
 #include "usdt.h"
 
 #pragma GCC diagnostic ignored "-Wattributes"
@@ -26,7 +26,7 @@ __attribute__((aligned(16)))
 __nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
 {
 	asm volatile (
-		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00\n" /* nop5 */
+		".byte 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10 */
 		"movq $0xdeadbeef, %rax\n"
 		"ret\n"
 	);
@@ -345,9 +345,9 @@ static void test_uretprobe_syscall_call(void)
 __attribute__((aligned(16)))
 __nocf_check __weak __naked void uprobe_test(void)
 {
-	asm volatile ("					\n"
-		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00	\n"
-		"ret					\n"
+	asm volatile (
+		".byte 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00\n" /* nop10 */
+		"ret\n"
 	);
 }
 
@@ -388,14 +388,16 @@ static int find_uprobes_trampoline(void *tramp_addr)
 	return ret;
 }
 
-static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char jmp2B[2]   = { 0xeb, 8 };
+static unsigned char nop10[10]  = { 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
+static unsigned char lea_rsp[5] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };
 
-static void *find_nop5(void *fn)
+static void *find_nop10(void *fn)
 {
 	int i;
 
-	for (i = 0; i < 10; i++) {
-		if (!memcmp(nop5, fn + i, 5))
+	for (i = 0; i < 128; i++) {
+		if (!memcmp(nop10, fn + i, 10))
 			return fn + i;
 	}
 	return NULL;
@@ -420,7 +422,8 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
 	ASSERT_EQ(skel->bss->executed, executed, "executed");
 
 	/* .. and check the trampoline is as expected. */
-	call = (struct __arch_relative_insn *) addr;
+	ASSERT_OK(memcmp(addr, lea_rsp, 5), "lea_rsp");
+	call = (struct __arch_relative_insn *)(addr + 5);
 	tramp = (void *) (call + 1) + call->raddr;
 	ASSERT_EQ(call->op, 0xe8, "call");
 	ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
@@ -432,7 +435,7 @@ static void check_detach(void *addr, void *tramp)
 {
 	/* [uprobes_trampoline] stays after detach */
 	ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
-	ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
+	ASSERT_OK(memcmp(addr, jmp2B, 2), "jmp2B");
 }
 
 static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
@@ -568,8 +571,8 @@ static void test_uprobe_usdt(void)
 	void *addr;
 
 	errno = 0;
-	addr = find_nop5(usdt_test);
-	if (!ASSERT_OK_PTR(addr, "find_nop5"))
+	addr = find_nop10(usdt_test);
+	if (!ASSERT_OK_PTR(addr, "find_nop10"))
 		return;
 
 	skel = uprobe_syscall_executed__open_and_load();
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index 69759b27794d..a160d7c4fa0d 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -252,7 +252,7 @@ extern void usdt_1(void);
 extern void usdt_2(void);
 
 static unsigned char nop1[1] = { 0x90 };
-static unsigned char nop1_nop5_combo[6] = { 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+static unsigned char nop1_nop10_combo[11] = { 0x90, 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 };
 
 static void *find_instr(void *fn, unsigned char *instr, size_t cnt)
 {
@@ -271,17 +271,17 @@ static void subtest_optimized_attach(void)
 	__u8 *addr_1, *addr_2;
 
 	/* usdt_1 USDT probe has single nop instruction */
-	addr_1 = find_instr(usdt_1, nop1_nop5_combo, 6);
-	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop5_combo"))
+	addr_1 = find_instr(usdt_1, nop1_nop10_combo, 11);
+	if (!ASSERT_NULL(addr_1, "usdt_1_find_nop1_nop10_combo"))
 		return;
 
 	addr_1 = find_instr(usdt_1, nop1, 1);
 	if (!ASSERT_OK_PTR(addr_1, "usdt_1_find_nop1"))
 		return;
 
-	/* usdt_2 USDT probe has nop,nop5 instructions combo */
-	addr_2 = find_instr(usdt_2, nop1_nop5_combo, 6);
-	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop5_combo"))
+	/* usdt_2 USDT probe has nop,nop10 instructions combo */
+	addr_2 = find_instr(usdt_2, nop1_nop10_combo, 11);
+	if (!ASSERT_OK_PTR(addr_2, "usdt_2_find_nop1_nop10_combo"))
 		return;
 
 	skel = test_usdt__open_and_load();
@@ -309,12 +309,12 @@ static void subtest_optimized_attach(void)
 
 	bpf_link__destroy(skel->links.usdt_executed);
 
-	/* we expect the nop5 ip */
+	/* we expect the nop10 ip */
 	skel->bss->expected_ip = (unsigned long) addr_2 + 1;
 
 	/*
 	 * Attach program on top of usdt_2 which is probe defined on top
-	 * of nop1,nop5 combo, so the probe gets optimized on top of nop5.
+	 * of nop1,nop10 combo, so the probe gets optimized on top of nop10.
 	 */
 	skel->links.usdt_executed = bpf_program__attach_usdt(skel->progs.usdt_executed,
 						     0 /*self*/, "/proc/self/exe",
@@ -328,8 +328,13 @@ static void subtest_optimized_attach(void)
 	/* nop stays on addr_2 address */
 	ASSERT_EQ(*addr_2, 0x90, "nop");
 
-	/* call is on addr_2 + 1 address */
-	ASSERT_EQ(*(addr_2 + 1), 0xe8, "call");
+	/*
+	 * lea -0x80(%rsp), %rsp
+	 * call ...
+	 */
+	static unsigned char expected[] = { 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8 };
+
+	ASSERT_MEMEQ(addr_2 + 1, expected, sizeof(expected), "lea_and_call");
 	ASSERT_EQ(skel->bss->executed, 4, "executed");
 
 cleanup:
diff --git a/tools/testing/selftests/bpf/usdt_2.c b/tools/testing/selftests/bpf/usdt_2.c
index 789883aaca4c..b359b389f6c0 100644
--- a/tools/testing/selftests/bpf/usdt_2.c
+++ b/tools/testing/selftests/bpf/usdt_2.c
@@ -3,7 +3,7 @@
 #if defined(__x86_64__)
 
 /*
- * Include usdt.h with default nop,nop5 instructions combo.
+ * Include usdt.h with default nop,nop10 instructions combo.
  */
 #include "usdt.h"
 
-- 
2.53.0



* [PATCHv2 08/11] selftests/bpf: Change uprobe/usdt trigger bench code to use nop10
From: Jiri Olsa @ 2026-05-18 10:59 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Ingo Molnar, Masami Hiramatsu,
	Andrii Nakryiko
  Cc: bpf, linux-trace-kernel
In-Reply-To: <20260518105957.123445-1-jolsa@kernel.org>

Change the uprobe/usdt trigger bench code to use nop10 instead
of nop5. Also change run_bench_uprobes.sh to use the nop10 triggers.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/bench.c           | 20 +++++------
 .../selftests/bpf/benchs/bench_trigger.c      | 36 +++++++++----------
 .../selftests/bpf/benchs/run_bench_uprobes.sh |  2 +-
 3 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 6155ce455c27..1252a1af2e84 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -539,12 +539,12 @@ extern const struct bench bench_trig_uretprobe_multi_push;
 extern const struct bench bench_trig_uprobe_multi_ret;
 extern const struct bench bench_trig_uretprobe_multi_ret;
 #ifdef __x86_64__
-extern const struct bench bench_trig_uprobe_nop5;
-extern const struct bench bench_trig_uretprobe_nop5;
-extern const struct bench bench_trig_uprobe_multi_nop5;
-extern const struct bench bench_trig_uretprobe_multi_nop5;
+extern const struct bench bench_trig_uprobe_nop10;
+extern const struct bench bench_trig_uretprobe_nop10;
+extern const struct bench bench_trig_uprobe_multi_nop10;
+extern const struct bench bench_trig_uretprobe_multi_nop10;
 extern const struct bench bench_trig_usdt_nop;
-extern const struct bench bench_trig_usdt_nop5;
+extern const struct bench bench_trig_usdt_nop10;
 #endif
 
 extern const struct bench bench_rb_libbpf;
@@ -619,12 +619,12 @@ static const struct bench *benchs[] = {
 	&bench_trig_uprobe_multi_ret,
 	&bench_trig_uretprobe_multi_ret,
 #ifdef __x86_64__
-	&bench_trig_uprobe_nop5,
-	&bench_trig_uretprobe_nop5,
-	&bench_trig_uprobe_multi_nop5,
-	&bench_trig_uretprobe_multi_nop5,
+	&bench_trig_uprobe_nop10,
+	&bench_trig_uretprobe_nop10,
+	&bench_trig_uprobe_multi_nop10,
+	&bench_trig_uretprobe_multi_nop10,
 	&bench_trig_usdt_nop,
-	&bench_trig_usdt_nop5,
+	&bench_trig_usdt_nop10,
 #endif
 	/* ringbuf/perfbuf benchmarks */
 	&bench_rb_libbpf,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index bcc4820c802e..3998ea8ff9aa 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -396,15 +396,15 @@ static void *uprobe_producer_ret(void *input)
 }
 
 #ifdef __x86_64__
-__nocf_check __weak void uprobe_target_nop5(void)
+__nocf_check __weak void uprobe_target_nop10(void)
 {
 	asm volatile (".byte 0x66, 0x66, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
 }
 
-static void *uprobe_producer_nop5(void *input)
+static void *uprobe_producer_nop10(void *input)
 {
 	while (true)
-		uprobe_target_nop5();
+		uprobe_target_nop10();
 	return NULL;
 }
 
@@ -418,7 +418,7 @@ static void *uprobe_producer_usdt_nop(void *input)
 	return NULL;
 }
 
-static void *uprobe_producer_usdt_nop5(void *input)
+static void *uprobe_producer_usdt_nop10(void *input)
 {
 	while (true)
 		usdt_2();
@@ -542,24 +542,24 @@ static void uretprobe_multi_ret_setup(void)
 }
 
 #ifdef __x86_64__
-static void uprobe_nop5_setup(void)
+static void uprobe_nop10_setup(void)
 {
-	usetup(false, false /* !use_multi */, &uprobe_target_nop5);
+	usetup(false, false /* !use_multi */, &uprobe_target_nop10);
 }
 
-static void uretprobe_nop5_setup(void)
+static void uretprobe_nop10_setup(void)
 {
-	usetup(true, false /* !use_multi */, &uprobe_target_nop5);
+	usetup(true, false /* !use_multi */, &uprobe_target_nop10);
 }
 
-static void uprobe_multi_nop5_setup(void)
+static void uprobe_multi_nop10_setup(void)
 {
-	usetup(false, true /* use_multi */, &uprobe_target_nop5);
+	usetup(false, true /* use_multi */, &uprobe_target_nop10);
 }
 
-static void uretprobe_multi_nop5_setup(void)
+static void uretprobe_multi_nop10_setup(void)
 {
-	usetup(true, true /* use_multi */, &uprobe_target_nop5);
+	usetup(true, true /* use_multi */, &uprobe_target_nop10);
 }
 
 static void usdt_setup(const char *name)
@@ -598,7 +598,7 @@ static void usdt_nop_setup(void)
 	usdt_setup("usdt_1");
 }
 
-static void usdt_nop5_setup(void)
+static void usdt_nop10_setup(void)
 {
 	usdt_setup("usdt_2");
 }
@@ -665,10 +665,10 @@ BENCH_TRIG_USERMODE(uretprobe_multi_nop, nop, "uretprobe-multi-nop");
 BENCH_TRIG_USERMODE(uretprobe_multi_push, push, "uretprobe-multi-push");
 BENCH_TRIG_USERMODE(uretprobe_multi_ret, ret, "uretprobe-multi-ret");
 #ifdef __x86_64__
-BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
-BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
-BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
-BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+BENCH_TRIG_USERMODE(uprobe_nop10, nop10, "uprobe-nop10");
+BENCH_TRIG_USERMODE(uretprobe_nop10, nop10, "uretprobe-nop10");
+BENCH_TRIG_USERMODE(uprobe_multi_nop10, nop10, "uprobe-multi-nop10");
+BENCH_TRIG_USERMODE(uretprobe_multi_nop10, nop10, "uretprobe-multi-nop10");
 BENCH_TRIG_USERMODE(usdt_nop, usdt_nop, "usdt-nop");
-BENCH_TRIG_USERMODE(usdt_nop5, usdt_nop5, "usdt-nop5");
+BENCH_TRIG_USERMODE(usdt_nop10, usdt_nop10, "usdt-nop10");
 #endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index 9ec59423b949..e490b337e960 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
 
 set -eufo pipefail
 
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5} usdt-nop usdt-nop5
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop10} usdt-nop usdt-nop10
 do
 	summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
 	printf "%-15s: %s\n" $i "$summary"
-- 
2.53.0


