From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii@kernel.org>
Cc: bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
oleg@redhat.com, peterz@infradead.org, mingo@kernel.org,
mhiramat@kernel.org
Subject: Re: [PATCH bpf 1/2] uprobes/x86: Fix red zone clobbering in nop5 optimization
Date: Sun, 10 May 2026 23:25:26 +0200
Message-ID: <agD3xq2MUyskd7X-@krava>
In-Reply-To: <20260509003146.976844-1-andrii@kernel.org>
On Fri, May 08, 2026 at 05:30:56PM -0700, Andrii Nakryiko wrote:
> The x86 uprobe nop5 optimization currently replaces a 5-byte NOP at the
> probe site with a CALL into a uprobe trampoline. CALL pushes a return
> address to [rsp-8]. On x86-64 this is inside the 128-byte red zone, where
> user code may keep temporary data without adjusting rsp.
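
[The failure mode can be sketched outside the kernel: a CALL's implicit push lands at rsp-8, inside the 128-byte region below rsp that the x86-64 SysV ABI lets leaf code use without moving rsp, while a JMP writes nothing to the stack. A minimal illustration, with a hypothetical helper:]

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only (not kernel code): on x86-64, the 128 bytes below
 * rsp are the red zone, which leaf code may use without adjusting rsp.
 * A CALL's implicit push writes its return address at rsp-8, which is
 * inside that range; a JMP writes nothing to the stack. */
static int clobbers_red_zone(uint64_t rsp, uint64_t write_addr)
{
	return write_addr >= rsp - 128 && write_addr < rsp;
}
```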
>
> Use a 5-byte JMP instead. JMP does not write to the user stack, but it
> also does not provide a return address. Replace the single trampoline
> entry with a page of 16-byte slots. Each optimized probe jumps to its
> assigned slot; the slot moves rsp below the red zone, saves the registers
> clobbered by syscall, and invokes the uprobe syscall:
>
> Probe site: jmp slot_N (5B, replaces nop5)
>
> Slot N: lea -128(%rsp), %rsp (5B) skip red zone
> push %rcx (1B) save (syscall clobbers)
> push %r11 (2B) save (syscall clobbers)
> push %rax (1B) save (syscall uses for nr)
> mov $336, %eax (5B) uprobe syscall number
> syscall (2B)
>
> All slots contain identical code at different offsets, so the trampoline
> page is generated once at boot and mapped read-execute into each process.
> The syscall handler identifies the slot from regs->ip, which points just
> after the syscall instruction, and uses a per-mm slot table to recover the
> original probe address.
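
[The slot-to-probe arithmetic can be sketched in user space; this mirrors the range check and index computation in resolve_uprobe_addr() in the patch below, with the slot size and page size assumed from the patch:]

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE  16
#define TRAMP_PAGE 4096

/* Illustrative sketch of the slot lookup: ip points right after the
 * syscall instruction, i.e. at the *end* of a 16-byte slot, so valid
 * values fall in (vaddr, vaddr + PAGE_SIZE] and the -1 maps each slot
 * end back to its slot index. Returns -1 when ip is outside the page. */
static int slot_from_ip(uint64_t tramp_vaddr, uint64_t ip)
{
	if (ip <= tramp_vaddr || ip > tramp_vaddr + TRAMP_PAGE)
		return -1;
	return (int)((ip - tramp_vaddr - 1) / SLOT_SIZE);
}
```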
>
> The uprobe syscall does not return to the trampoline slot. The handler
> restores the probe-site register state, runs the uprobe consumers, sets
> pt_regs to continue at probe_addr + 5 unless a consumer redirected
> execution, and returns directly through the IRET path. This preserves
> general purpose registers, including rcx and r11, without requiring any
> post-syscall cleanup code in the trampoline and avoids call/ret, RSB, and
> shadow stack concerns.
>
> Protect the per-mm trampoline list with RCU and free trampoline metadata
> with kfree_rcu(). This lets the syscall path resolve trampoline slots
> without taking mmap_lock. The optimized-instruction detection path also
> walks the trampoline list under an RCU read-side lock. Since that path
> starts from the JMP target, it translates the slot start to the post-syscall
> IP expected by the shared resolver before checking the trampoline mapping.
>
> Each trampoline page provides 256 slots. Slots stay permanently assigned
> to their first probe address and are reused only when the same address is
> probed again. Reassigning detached slots is deliberately avoided because a
> thread can remain in a trampoline for an unbounded time due to ptrace,
> interrupts, or scheduling delays. If a reachable trampoline page runs out
> of slots, probes that cannot allocate a slot fall back to the slower INT3
> path.
>
> Require the entire trampoline page to be reachable by a rel32 JMP before
> reusing it for a probe. This keeps every slot in the page within the range
> that can be encoded at the probe site.
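
[The reachability rule can be sketched as follows; reachable_by_jmp32() mirrors is_reachable_by_jmp() from the patch, and the page-wide check covers the first and last byte of the trampoline page (helper names here are illustrative):]

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define JMP32_SIZE 5
#define TRAMP_PAGE 4096

/* rel32 displacement is computed from the end of the 5-byte jmp */
static int reachable_by_jmp32(uint64_t dst, uint64_t src)
{
	int64_t delta = (int64_t)(dst - (src + JMP32_SIZE));

	return delta >= INT_MIN && delta <= INT_MAX;
}

/* require the whole trampoline page, first to last byte, to be in
 * rel32 range so any slot in it can be encoded at the probe site */
static int page_reachable(uint64_t tramp, uint64_t probe)
{
	return reachable_by_jmp32(tramp, probe) &&
	       reachable_by_jmp32(tramp + TRAMP_PAGE - 1, probe);
}
```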
>
> Change the error code returned when the uprobe syscall is invoked outside
> a kernel-generated trampoline from -ENXIO to -EPROTO. This lets libbpf and
> similar libraries distinguish fixed kernels from kernels with the
> red-zone-clobbering implementation and enable nop5 optimization only on
> fixed kernels.
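
[A user-space feature probe along these lines would mirror the libbpf change further below; the __NR_uprobe value of 336 is taken from the patch and is an assumption here, not a released ABI number:]

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical feature probe mirroring the libbpf change. Fixed kernels
 * return -EPROTO when the uprobe syscall is invoked outside a
 * kernel-generated trampoline, buggy ones return -ENXIO, and kernels
 * without the syscall fail with -ENOSYS. */
#ifndef __NR_uprobe
#define __NR_uprobe 336
#endif

static int uprobe_nop5_fixed(void)
{
	return syscall(__NR_uprobe) < 0 && errno == EPROTO;
}
```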
>
> Performance (usdt single-thread, M/s):
>
> usdt-nop usdt-nop5-base usdt-nop5-fix nop5-change iret%
> Skylake 3.149 6.422 4.865 -24.3% 39.1%
> Milan 2.910 3.443 3.820 +11.0% 24.3%
> Sapphire Rapids 1.896 4.023 3.693 -8.2% 24.9%
> Bergamo 3.393 3.895 3.849 -1.2% 24.5%
>
> The fixed nop5 path remains faster than the non-optimized INT3 path on all
> measured systems. The regression relative to the old CALL-based trampoline
> comes from IRET being more expensive than SYSRET, most noticeably on older
> Intel Skylake. Newer Intel CPUs and tested AMD CPUs have lower IRET cost,
> and AMD Milan improves because removing mmap_lock from the hot path more
> than offsets the IRET cost.
>
> Multi-threaded throughput still scales nearly linearly with the number of
> CPUs, as before, thanks to the lockless RCU-protected uprobe trampoline
> lookup.
hi,
thanks a lot for the fix

FWIW we also discussed an option to use a 10-byte nop and do:

  [rsp+0x80, call trampoline]

so we would not need the slot-reuse logic, but I'm not sure what
other surprises there are with a 10-byte nop.

I tried that change [1], it seems to work, but it has other
difficulties; for one, I think the unoptimization path needs to do:

  [rsp+0x80, call trampoline] -> [jmp to end of 10-byte nop]

instead of patching back the 10-byte nop, because some thread
could be inside the nop area already.

jirka
[1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=redzone_fix&id=74b09240289dba8368c2783b771e678b2cc31574
>
> Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
> arch/x86/include/asm/uprobes.h | 18 ++
> arch/x86/kernel/uprobes.c | 262 ++++++++++--------
> tools/lib/bpf/features.c | 8 +-
> .../selftests/bpf/prog_tests/uprobe_syscall.c | 5 +-
> tools/testing/selftests/bpf/prog_tests/usdt.c | 2 +-
> 5 files changed, 181 insertions(+), 114 deletions(-)
>
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 362210c79998..a7cf5c92d95a 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -25,6 +25,24 @@ enum {
> ARCH_UPROBE_FLAG_OPTIMIZE_FAIL = 1,
> };
>
> +/*
> + * Trampoline page layout: identical 16-byte slots, each containing:
> + * lea -128(%rsp), %rsp (5B) skip red zone
> + * push %rcx (1B) save (syscall clobbers)
> + * push %r11 (2B) save (syscall clobbers)
> + * push %rax (1B) save (syscall uses for nr)
> + * mov $336, %eax (5B) uprobe syscall number
> + * syscall (2B)
> + * = 16B, no padding needed
> + *
> + * The handler identifies which probe fired from regs->ip (each
> + * slot is at a unique offset), looks up the probe address from a
> + * per-process table, and returns directly to probe_addr+5 via iret
> + * with all registers restored.
> + */
> +#define UPROBE_TRAMP_SLOT_SIZE 16
> +#define UPROBE_TRAMP_MAX_SLOTS (PAGE_SIZE / UPROBE_TRAMP_SLOT_SIZE)
> +
> struct uprobe_xol_ops;
>
> struct arch_uprobe {
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index ebb1baf1eb1d..7e1f14200bbb 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -633,16 +633,25 @@ static struct vm_special_mapping tramp_mapping = {
>
> struct uprobe_trampoline {
> struct hlist_node node;
> + struct rcu_head rcu;
> unsigned long vaddr;
> + unsigned long probe_addrs[UPROBE_TRAMP_MAX_SLOTS];
> };
>
> -static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
> +
> +static bool is_reachable_by_jmp(unsigned long dst, unsigned long src)
> {
> - long delta = (long)(vaddr + 5 - vtramp);
> + long delta = (long)(dst - (src + JMP32_INSN_SIZE));
>
> return delta >= INT_MIN && delta <= INT_MAX;
> }
>
> +static bool is_reachable_by_trampoline(unsigned long vtramp, unsigned long vaddr)
> +{
> + return is_reachable_by_jmp(vtramp, vaddr) &&
> + is_reachable_by_jmp(vtramp + PAGE_SIZE - 1, vaddr);
> +}
> +
> static unsigned long find_nearest_trampoline(unsigned long vaddr)
> {
> struct vm_unmapped_area_info info = {
> @@ -711,6 +720,21 @@ static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
> return tramp;
> }
>
> +static int tramp_alloc_slot(struct uprobe_trampoline *tramp, unsigned long probe_addr)
> +{
> + int i;
> +
> + for (i = 0; i < UPROBE_TRAMP_MAX_SLOTS; i++) {
> + if (tramp->probe_addrs[i] == probe_addr)
> + return i;
> + if (tramp->probe_addrs[i] == 0) {
> + tramp->probe_addrs[i] = probe_addr;
> + return i;
> + }
> + }
> + return -ENOSPC;
> +}
> +
> static struct uprobe_trampoline *get_uprobe_trampoline(unsigned long vaddr, bool *new)
> {
> struct uprobes_state *state = &current->mm->uprobes_state;
> @@ -720,7 +744,7 @@ static struct uprobe_trampoline *get_uprobe_trampoline(unsigned long vaddr, bool
> return NULL;
>
> hlist_for_each_entry(tramp, &state->head_tramps, node) {
> - if (is_reachable_by_call(tramp->vaddr, vaddr)) {
> + if (is_reachable_by_trampoline(tramp->vaddr, vaddr)) {
> *new = false;
> return tramp;
> }
> @@ -731,7 +755,7 @@ static struct uprobe_trampoline *get_uprobe_trampoline(unsigned long vaddr, bool
> return NULL;
>
> *new = true;
> - hlist_add_head(&tramp->node, &state->head_tramps);
> + hlist_add_head_rcu(&tramp->node, &state->head_tramps);
> return tramp;
> }
>
> @@ -742,8 +766,8 @@ static void destroy_uprobe_trampoline(struct uprobe_trampoline *tramp)
> * because there's no easy way to make sure none of the threads
> * is still inside the trampoline.
> */
> - hlist_del(&tramp->node);
> - kfree(tramp);
> + hlist_del_rcu(&tramp->node);
> + kfree_rcu(tramp, rcu);
> }
>
> void arch_uprobe_init_state(struct mm_struct *mm)
> @@ -761,147 +785,153 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
> destroy_uprobe_trampoline(tramp);
> }
>
> -static bool __in_uprobe_trampoline(unsigned long ip)
> +/*
> + * Find the trampoline containing @ip. If @probe_addr is non-NULL, also
> + * resolve the slot index from @ip and return the probe address.
> + *
> + * @ip is expected to point right after the syscall instruction, i.e.,
> + * at the end of the slot (slot_start + UPROBE_TRAMP_SLOT_SIZE).
> + */
> +static bool resolve_uprobe_addr(unsigned long ip, unsigned long *probe_addr)
> {
> - struct vm_area_struct *vma = vma_lookup(current->mm, ip);
> + struct uprobes_state *state = &current->mm->uprobes_state;
> + struct uprobe_trampoline *tramp;
>
> - return vma && vma_is_special_mapping(vma, &tramp_mapping);
> -}
> + hlist_for_each_entry_rcu(tramp, &state->head_tramps, node) {
> + /*
> + * ip points to after syscall, so it's on 16 byte boundary,
> + * which means that valid ip can point right after the page
> + * and should never be at zero offset within the page
> + */
> + if (ip <= tramp->vaddr || ip > tramp->vaddr + PAGE_SIZE)
> + continue;
>
> -static bool in_uprobe_trampoline(unsigned long ip)
> -{
> - struct mm_struct *mm = current->mm;
> - bool found, retry = true;
> - unsigned int seq;
> + if (probe_addr) {
> + /* we already validated ip is within expected range */
> + unsigned int slot = (ip - tramp->vaddr - 1) / UPROBE_TRAMP_SLOT_SIZE;
> + unsigned long addr = tramp->probe_addrs[slot];
>
> - rcu_read_lock();
> - if (mmap_lock_speculate_try_begin(mm, &seq)) {
> - found = __in_uprobe_trampoline(ip);
> - retry = mmap_lock_speculate_retry(mm, seq);
> - }
> - rcu_read_unlock();
> + *probe_addr = addr;
> + if (addr == 0)
> + return false;
> + }
>
> - if (retry) {
> - mmap_read_lock(mm);
> - found = __in_uprobe_trampoline(ip);
> - mmap_read_unlock(mm);
> + return true;
> }
> - return found;
> + return false;
> +}
> +
> +static bool in_uprobe_trampoline(unsigned long ip, unsigned long *probe_addr)
> +{
> + guard(rcu)();
> + return resolve_uprobe_addr(ip, probe_addr);
> }
>
> /*
> - * See uprobe syscall trampoline; the call to the trampoline will push
> - * the return address on the stack, the trampoline itself then pushes
> - * cx, r11 and ax.
> + * The trampoline slot pushes cx, r11, ax (the registers syscall clobbers)
> + * before doing the uprobe syscall. No return address is pushed — the
> + * probe site uses jmp, not call.
> */
> struct uprobe_syscall_args {
> unsigned long ax;
> unsigned long r11;
> unsigned long cx;
> - unsigned long retaddr;
> };
>
> +#define UPROBE_TRAMP_REDZONE 128
> +
> SYSCALL_DEFINE0(uprobe)
> {
> struct pt_regs *regs = task_pt_regs(current);
> struct uprobe_syscall_args args;
> - unsigned long ip, sp, sret;
> + unsigned long probe_addr;
> int err;
>
> /* Allow execution only from uprobe trampolines. */
> - if (!in_uprobe_trampoline(regs->ip))
> - return -ENXIO;
> + if (!in_uprobe_trampoline(regs->ip, &probe_addr))
> + return -EPROTO;
>
> err = copy_from_user(&args, (void __user *)regs->sp, sizeof(args));
> if (err)
> goto sigill;
>
> - ip = regs->ip;
> -
> /*
> - * expose the "right" values of ax/r11/cx/ip/sp to uprobe_consumer/s, plus:
> - * - adjust ip to the probe address, call saved next instruction address
> - * - adjust sp to the probe's stack frame (check trampoline code)
> + * Restore the register state as it was at the probe site:
> + * - ax/r11/cx from the trampoline-saved copies on user stack
> + * - adjust ip to the probe address based on matching slot
> + * - adjust sp to skip red zone and pushed args
> */
> regs->ax = args.ax;
> regs->r11 = args.r11;
> regs->cx = args.cx;
> - regs->ip = args.retaddr - 5;
> - regs->sp += sizeof(args);
> + regs->ip = probe_addr;
> + regs->sp += sizeof(args) + UPROBE_TRAMP_REDZONE;
> regs->orig_ax = -1;
>
> - sp = regs->sp;
> -
> - err = shstk_pop((u64 *)&sret);
> - if (err == -EFAULT || (!err && sret != args.retaddr))
> - goto sigill;
> -
> - handle_syscall_uprobe(regs, regs->ip);
> + handle_syscall_uprobe(regs, probe_addr);
>
> /*
> - * Some of the uprobe consumers has changed sp, we can do nothing,
> - * just return via iret.
> + * Skip the jmp instruction at the probe site (5 bytes) unless
> + * a consumer redirected execution elsewhere.
> */
> - if (regs->sp != sp) {
> - /* skip the trampoline call */
> - if (args.retaddr - 5 == regs->ip)
> - regs->ip += 5;
> - return regs->ax;
> - }
> + if (regs->ip == probe_addr)
> + regs->ip = probe_addr + 5;
>
> - regs->sp -= sizeof(args);
> -
> - /* for the case uprobe_consumer has changed ax/r11/cx */
> - args.ax = regs->ax;
> - args.r11 = regs->r11;
> - args.cx = regs->cx;
> -
> - /* keep return address unless we are instructed otherwise */
> - if (args.retaddr - 5 != regs->ip)
> - args.retaddr = regs->ip;
> -
> - if (shstk_push(args.retaddr) == -EFAULT)
> - goto sigill;
> -
> - regs->ip = ip;
> -
> - err = copy_to_user((void __user *)regs->sp, &args, sizeof(args));
> - if (err)
> - goto sigill;
> -
> - /* ensure sysret, see do_syscall_64() */
> - regs->r11 = regs->flags;
> - regs->cx = regs->ip;
> - return 0;
> + /*
> + * Return via iret by returning regs->ax. This preserves all
> + * GP registers (including cx and r11) without needing any
> + * user-space cleanup code. The iret path is used because we
> + * don't set up cx/r11 for sysret.
> + */
> + return regs->ax;
>
> sigill:
> force_sig(SIGILL);
> return -1;
> }
>
> +/*
> + * All uprobe trampoline slots are identical: skip the red zone,
> + * save the three registers that syscall clobbers, then invoke
> + * the uprobe syscall. The handler returns directly to the probe
> + * caller via iret. Execution never returns to the trampoline.
> + */
> asm (
> ".pushsection .rodata\n"
> - ".balign " __stringify(PAGE_SIZE) "\n"
> - "uprobe_trampoline_entry:\n"
> + ".balign " __stringify(UPROBE_TRAMP_SLOT_SIZE) "\n"
> + "uprobe_trampoline_slot:\n"
> + "lea -128(%rsp), %rsp\n"
> "push %rcx\n"
> "push %r11\n"
> "push %rax\n"
> - "mov $" __stringify(__NR_uprobe) ", %rax\n"
> + "mov $" __stringify(__NR_uprobe) ", %eax\n"
> "syscall\n"
> - "pop %rax\n"
> - "pop %r11\n"
> - "pop %rcx\n"
> - "ret\n"
> - "int3\n"
> - ".balign " __stringify(PAGE_SIZE) "\n"
> + "uprobe_trampoline_slot_end:\n"
> ".popsection\n"
> );
>
> -extern u8 uprobe_trampoline_entry[];
> +extern u8 uprobe_trampoline_slot[];
> +extern u8 uprobe_trampoline_slot_end[];
>
> static int __init arch_uprobes_init(void)
> {
> - tramp_mapping_pages[0] = virt_to_page(uprobe_trampoline_entry);
> + unsigned int slot_size = uprobe_trampoline_slot_end - uprobe_trampoline_slot;
> + struct page *page;
> + u8 *page_addr;
> + int i;
> +
> + BUILD_BUG_ON(UPROBE_TRAMP_SLOT_SIZE != 16);
> + WARN_ON_ONCE(slot_size != UPROBE_TRAMP_SLOT_SIZE);
> +
> + page = alloc_page(GFP_KERNEL);
> + if (!page)
> + return -ENOMEM;
> +
> + page_addr = page_address(page);
> + for (i = 0; i < UPROBE_TRAMP_MAX_SLOTS; i++)
> + memcpy(page_addr + i * UPROBE_TRAMP_SLOT_SIZE, uprobe_trampoline_slot, slot_size);
> +
> + tramp_mapping_pages[0] = page;
> return 0;
> }
>
> @@ -909,7 +939,7 @@ late_initcall(arch_uprobes_init);
>
> enum {
> EXPECT_SWBP,
> - EXPECT_CALL,
> + EXPECT_JMP,
> };
>
> struct write_opcode_ctx {
> @@ -917,14 +947,14 @@ struct write_opcode_ctx {
> int expect;
> };
>
> -static int is_call_insn(uprobe_opcode_t *insn)
> +static int is_jmp_insn(uprobe_opcode_t *insn)
> {
> - return *insn == CALL_INSN_OPCODE;
> + return *insn == JMP32_INSN_OPCODE;
> }
>
> /*
> * Verification callback used by int3_update uprobe_write calls to make sure
> - * the underlying instruction is as expected - either int3 or call.
> + * the underlying instruction is as expected - either int3 or jmp.
> */
> static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode,
> int nbytes, void *data)
> @@ -939,8 +969,8 @@ static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *
> if (is_swbp_insn(&old_opcode[0]))
> return 1;
> break;
> - case EXPECT_CALL:
> - if (is_call_insn(&old_opcode[0]))
> + case EXPECT_JMP:
> + if (is_jmp_insn(&old_opcode[0]))
> return 1;
> break;
> }
> @@ -978,7 +1008,7 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> * so we can skip this step for optimize == true.
> */
> if (!optimize) {
> - ctx.expect = EXPECT_CALL;
> + ctx.expect = EXPECT_JMP;
> err = uprobe_write(auprobe, vma, vaddr, &int3, 1, verify_insn,
> true /* is_register */, false /* do_update_ref_ctr */,
> &ctx);
> @@ -1015,13 +1045,13 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> }
>
> static int swbp_optimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> - unsigned long vaddr, unsigned long tramp)
> + unsigned long vaddr, unsigned long slot_vaddr)
> {
> - u8 call[5];
> + u8 jmp[5];
>
> - __text_gen_insn(call, CALL_INSN_OPCODE, (const void *) vaddr,
> - (const void *) tramp, CALL_INSN_SIZE);
> - return int3_update(auprobe, vma, vaddr, call, true /* optimize */);
> + __text_gen_insn(jmp, JMP32_INSN_OPCODE, (const void *) vaddr,
> + (const void *) slot_vaddr, JMP32_INSN_SIZE);
> + return int3_update(auprobe, vma, vaddr, jmp, true /* optimize */);
> }
>
> static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> @@ -1049,11 +1079,17 @@ static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
> struct __packed __arch_relative_insn {
> u8 op;
> s32 raddr;
> - } *call = (struct __arch_relative_insn *) insn;
> + } *jmp = (struct __arch_relative_insn *) insn;
>
> - if (!is_call_insn(insn))
> + if (!is_jmp_insn(&jmp->op))
> return false;
> - return __in_uprobe_trampoline(vaddr + 5 + call->raddr);
> +
> + guard(rcu)();
> + /*
> + * resolve_uprobe_addr() expects IP pointing after syscall instruction
> + * (after the slot, basically), so adjust jump target address accordingly
> + */
> + return resolve_uprobe_addr(vaddr + 5 + jmp->raddr + UPROBE_TRAMP_SLOT_SIZE, NULL);
> }
>
> static int is_optimized(struct mm_struct *mm, unsigned long vaddr)
> @@ -1113,8 +1149,9 @@ static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct
> {
> struct uprobe_trampoline *tramp;
> struct vm_area_struct *vma;
> + unsigned long slot_vaddr;
> bool new = false;
> - int err = 0;
> + int slot, err;
>
> vma = find_vma(mm, vaddr);
> if (!vma)
> @@ -1122,8 +1159,17 @@ static int __arch_uprobe_optimize(struct arch_uprobe *auprobe, struct mm_struct
> tramp = get_uprobe_trampoline(vaddr, &new);
> if (!tramp)
> return -EINVAL;
> - err = swbp_optimize(auprobe, vma, vaddr, tramp->vaddr);
> - if (WARN_ON_ONCE(err) && new)
> +
> + slot = tramp_alloc_slot(tramp, vaddr);
> + if (slot < 0) {
> + if (new)
> + destroy_uprobe_trampoline(tramp);
> + return slot;
> + }
> +
> + slot_vaddr = tramp->vaddr + slot * UPROBE_TRAMP_SLOT_SIZE;
> + err = swbp_optimize(auprobe, vma, vaddr, slot_vaddr);
> + if (err && new)
> destroy_uprobe_trampoline(tramp);
> return err;
> }
> diff --git a/tools/lib/bpf/features.c b/tools/lib/bpf/features.c
> index 4f19a0d79b0c..1b6c113357b2 100644
> --- a/tools/lib/bpf/features.c
> +++ b/tools/lib/bpf/features.c
> @@ -577,10 +577,12 @@ static int probe_ldimm64_full_range_off(int token_fd)
> static int probe_uprobe_syscall(int token_fd)
> {
> /*
> - * If kernel supports uprobe() syscall, it will return -ENXIO when called
> - * from the outside of a kernel-generated uprobe trampoline.
> + * If kernel supports uprobe() syscall, it will return -EPROTO when
> + * called from outside a kernel-generated uprobe trampoline.
> + * Older kernels with the red-zone-clobbering bug return -ENXIO;
> + * we only enable the nop5 optimization on fixed kernels.
> */
> - return syscall(__NR_uprobe) < 0 && errno == ENXIO;
> + return syscall(__NR_uprobe) < 0 && errno == EPROTO;
> }
> #else
> static int probe_uprobe_syscall(int token_fd)
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 955a37751b52..0d5eb4cd1ddf 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -422,7 +422,8 @@ static void *check_attach(struct uprobe_syscall_executed *skel, trigger_t trigge
> /* .. and check the trampoline is as expected. */
> call = (struct __arch_relative_insn *) addr;
> tramp = (void *) (call + 1) + call->raddr;
> - ASSERT_EQ(call->op, 0xe8, "call");
> + tramp = (void *)((unsigned long)tramp & ~(getpagesize() - 1UL));
> + ASSERT_EQ(call->op, 0xe9, "jmp");
> ASSERT_OK(find_uprobes_trampoline(tramp), "uprobes_trampoline");
>
> return tramp;
> @@ -762,7 +763,7 @@ static void test_uprobe_error(void)
> long err = syscall(__NR_uprobe);
>
> ASSERT_EQ(err, -1, "error");
> - ASSERT_EQ(errno, ENXIO, "errno");
> + ASSERT_EQ(errno, EPROTO, "errno");
> }
>
> static void __test_uprobe_syscall(void)
> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index 69759b27794d..9d3744d4e936 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -329,7 +329,7 @@ static void subtest_optimized_attach(void)
> ASSERT_EQ(*addr_2, 0x90, "nop");
>
> /* call is on addr_2 + 1 address */
> - ASSERT_EQ(*(addr_2 + 1), 0xe8, "call");
> + ASSERT_EQ(*(addr_2 + 1), 0xe9, "jmp");
> ASSERT_EQ(skel->bss->executed, 4, "executed");
>
> cleanup:
> --
> 2.53.0-Meta
>
Thread overview: 13+ messages
2026-05-09 0:30 [PATCH bpf 1/2] uprobes/x86: Fix red zone clobbering in nop5 optimization Andrii Nakryiko
2026-05-09 0:30 ` [PATCH bpf 2/2] selftests/bpf: Add tests for uprobe nop5 red zone clobbering Andrii Nakryiko
2026-05-09 2:12 ` sashiko-bot
2026-05-11 16:58 ` Andrii Nakryiko
2026-05-09 2:02 ` [PATCH bpf 1/2] uprobes/x86: Fix red zone clobbering in nop5 optimization sashiko-bot
2026-05-11 16:38 ` Andrii Nakryiko
2026-05-11 16:53 ` Andrii Nakryiko
2026-05-10 21:25 ` Jiri Olsa [this message]
2026-05-11 16:41 ` Andrii Nakryiko
2026-05-12 5:14 ` Masami Hiramatsu
2026-05-11 14:45 ` Oleg Nesterov
2026-05-11 16:56 ` Andrii Nakryiko
2026-05-11 17:24 ` Oleg Nesterov