From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCHv4 05/13] uprobes/x86: Move optimized uprobe from nop5 to nop10
Date: Tue, 9 Jun 2026 13:44:14 +0200 [thread overview]
Message-ID: <aif8jkUPst-tTskE@krava> (raw)
In-Reply-To: <CAEf4BzZiFUeFa4fdghuSGpqiEzyvJVxs6TDgJd5U4hSj--imiQ@mail.gmail.com>
On Mon, Jun 08, 2026 at 01:46:39PM -0700, Andrii Nakryiko wrote:
> On Tue, May 26, 2026 at 1:59 PM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Andrii reported an issue with optimized uprobes [1] that can clobber
> > redzone area with call instruction storing return address on stack
> > where user code may keep temporary data without adjusting rsp.
> >
> > Fixing this by moving the optimized uprobes on top of 10-bytes nop
> > instruction, so we can squeeze another instruction to escape the
> > redzone area before doing the call, like:
> >
> > lea -0x80(%rsp), %rsp
> > call tramp
> >
> > Note the lea instruction is used to adjust the rsp register without
> > changing the flags.
> >
> > We use nop10 and following transformation to optimized instructions
> > above and back as suggested by Peterz [2].
> >
> > Optimize path (int3_update_optimize):
> >
> > 1) Initial state after set_swbp() installed the uprobe:
> > cc 2e 0f 1f 84 00 00 00 00 00
> >
> > From offset 0 this is INT3 followed by the tail of the original
> > 10-byte NOP.
> >
> > After a previous unoptimization bytes 5..9 may still contain the
> > old call instruction, which remains valid for threads already there.
> >
> > 2) Rewrite the LEA tail and call displacement:
> > cc [8d 64 24 80 e8 d0 d1 d2 d3]
> >
> > From offset 0 this traps on the uprobe INT3. Bytes 1..9 are not
> > executable entry points while byte 0 is trapped.
> >
> > 3) Publish the first LEA byte:
> > [48] 8d 64 24 80 e8 d0 d1 d2 d3
> >
> > From offset 0 this is:
> > lea -0x80(%rsp), %rsp
> > call <uprobe-trampoline>
> >
> > Unoptimize path (int3_update_unoptimize):
> >
> > 1) Initial optimized state:
> > 48 8d 64 24 80 e8 d0 d1 d2 d3
> > Same as 3) above.
> >
> > 2) Trap new entries before restoring the NOP bytes:
> > [cc] 8d 64 24 80 e8 d0 d1 d2 d3
> >
> > From offset 0 this traps. A thread that had already executed the
> > LEA can still reach the intact CALL at offset 5.
> >
> > 3) Restore bytes 1..4 of the original NOP while keeping byte 0 trapped
> > and byte 5 as CALL.
> > cc [2e 0f 1f 84] e8 d0 d1 d2 d3
> >
> > From offset 0 this still traps. Offset 5 is still the CALL for any
> > thread that was already past the first LEA byte.
> >
> > 4) Publish the first byte of the original NOP:
> > [66] 2e 0f 1f 84 e8 d0 d1 d2 d3
> >
> > From offset 0 this is the restored 10-byte NOP; the CALL opcode and
> > displacement are now only NOP operands. Offset 5 still decodes as
> > CALL for a thread that was already there.
> >
> > Tthere is only a single target uprobe-trampoline for the given nop10
> > instruction address, so the CALL instruction will not be changed across
> > unoptimization/optimization cycles.
> > Therefore, any task that is preempted at the CALL instruction is guaranteed
> > to observe that CALL and not anything else.
> >
> > Note as explained in [2] we need to use following nop10:
> > PF1 PF2 ESC NOPL MOD SIB DISP32
> > NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)
> >
> > which means we need to allow 0x2e prefix which maps to INAT_PFX_CS
> > attribute in is_prefix_bad function.
> >
> > Also changing the uprobe syscall error when called out of uprobe
> > trampoline to -EPROTO, so we are able to detect the fixed kernel.
> >
> > The optimized uprobe performance stays the same:
> >
> > uprobe-nop : 3.129 ± 0.013M/s
> > uprobe-push : 3.045 ± 0.006M/s
> > uprobe-ret : 1.095 ± 0.004M/s
> > --> uprobe-nop10 : 7.170 ± 0.020M/s
> > uretprobe-nop : 2.143 ± 0.021M/s
> > uretprobe-push : 2.090 ± 0.000M/s
> > uretprobe-ret : 0.942 ± 0.000M/s
> > --> uretprobe-nop10: 3.381 ± 0.003M/s
> > usdt-nop : 3.245 ± 0.004M/s
> > --> usdt-nop10 : 7.256 ± 0.023M/s
> >
> > [1] https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
> > [2] https://lore.kernel.org/bpf/20260518104306.GU3102624@noisy.programming.kicks-ass.net/#t
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > Closes: https://lore.kernel.org/bpf/20260509003146.976844-1-andrii@kernel.org/
> > Fixes: ba2bfc97b462 ("uprobes/x86: Add support to optimize uprobes")
> > Assisted-by: Codex:GPT-5.5
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > arch/x86/kernel/uprobes.c | 255 ++++++++++++++++++++++++++++----------
> > 1 file changed, 190 insertions(+), 65 deletions(-)
> >
>
> [...]
>
> > @@ -943,13 +1026,31 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > smp_text_poke_sync_each_cpu();
> >
> > /*
> > - * Write first byte.
> > + * 3) Restore bytes 1..4 of the original NOP while keeping byte 0 trapped
> > + * and byte 5 as CALL:
> > + * cc [2e 0f 1f 84] e8 d0 d1 d2 d3
> > + */
> > + ctx.expect = EXPECT_SWBP_OPTIMIZED;
> > + err = uprobe_write(auprobe, vma, vaddr + 1, insn + 1,
> > + LEA_INSN_SIZE - 1, verify_insn,
> > + true /* is_register */, false /* do_update_ref_ctr */,
>
> tbh, it's quite subtle and non-obvious why is_register should be set
> to true first two times (and especially that is_register and
> do_update_ref_ctr are implicitly connected), not sure how to make it
> cleaner, but maybe leave a short comment explaining this twice
> register, once unregister sequence?
ok, I came up with comment below
thanks,
jirka
---
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index de544516ea70..92449f34c005 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -1011,6 +1011,12 @@ static int int3_update_unoptimize(struct arch_uprobe *auprobe, struct vm_area_st
int err;
/*
+ * Note the first two uprobe_write calls use is_register=true, because they
+ * are intermediate patching states while the probe is still active.
+ *
+ * The last uprobe_write to nop10 instruction is called with is_register=false
+ * and do_update_ref_ctr=true to trigger the refctr update.
+ *
* 1) Initial optimized state:
* 48 8d 64 24 80 e8 d0 d1 d2 d3
*
next prev parent reply other threads:[~2026-06-09 11:44 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-26 20:58 [PATCHv4 00/13] uprobes/x86: Fix red zone issue for optimized uprobes Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 01/13] uprobes/x86: Use proper mm_struct in __in_uprobe_trampoline Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 02/13] uprobes/x86: Remove struct uprobe_trampoline object Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 9:58 ` Jiri Olsa
2026-06-01 8:31 ` Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 03/13] uprobes/x86: Allow to copy uprobe trampolines on fork Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 9:58 ` Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 04/13] uprobes/x86: Unmap trampoline vma object in case it's unused Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 9:57 ` Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 05/13] uprobes/x86: Move optimized uprobe from nop5 to nop10 Jiri Olsa
2026-06-08 20:46 ` Andrii Nakryiko
2026-06-09 11:44 ` Jiri Olsa [this message]
2026-06-09 16:43 ` Andrii Nakryiko
2026-06-10 8:18 ` Jiri Olsa
2026-06-10 18:02 ` Andrii Nakryiko
2026-05-26 20:58 ` [PATCHv4 06/13] libbpf: Change has_nop_combo to work on top of nop10 Jiri Olsa
2026-05-26 21:28 ` sashiko-bot
2026-05-27 9:57 ` Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 9:57 ` Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 07/13] libbpf: Detect uprobe syscall with new error Jiri Olsa
2026-05-26 21:36 ` sashiko-bot
2026-05-26 21:46 ` bot+bpf-ci
2026-05-26 20:58 ` [PATCHv4 08/13] selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch Jiri Olsa
2026-05-26 21:19 ` sashiko-bot
2026-05-26 21:46 ` bot+bpf-ci
2026-05-26 20:58 ` [PATCHv4 09/13] selftests/bpf: Change uprobe syscall tests to use nop10 Jiri Olsa
2026-05-26 21:15 ` sashiko-bot
2026-05-27 9:58 ` Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 9:58 ` Jiri Olsa
2026-05-27 10:30 ` Jakub Sitnicki
2026-05-26 20:58 ` [PATCHv4 10/13] selftests/bpf: Change uprobe/usdt trigger bench code " Jiri Olsa
2026-05-27 10:46 ` Jakub Sitnicki
2026-05-26 20:58 ` [PATCHv4 11/13] selftests/bpf: Add reattach tests for uprobe syscall Jiri Olsa
2026-05-27 11:32 ` Jakub Sitnicki
2026-05-28 11:10 ` Jiri Olsa
2026-05-26 20:58 ` [PATCHv4 12/13] selftests/bpf: Add tests for uprobe nop10 red zone clobbering Jiri Olsa
2026-05-26 21:46 ` bot+bpf-ci
2026-05-27 10:26 ` Jiri Olsa
2026-05-28 12:46 ` Jakub Sitnicki
2026-05-26 20:58 ` [PATCHv4 13/13] selftests/bpf: Add tests for forked/cloned optimized uprobes Jiri Olsa
2026-05-28 13:00 ` Jakub Sitnicki
2026-06-01 8:31 ` Jiri Olsa
2026-06-04 6:59 ` [PATCHv4 00/13] uprobes/x86: Fix red zone issue for " Jiri Olsa
2026-06-08 20:48 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aif8jkUPst-tTskE@krava \
--to=olsajiri@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.