From: Peter Zijlstra <peterz@infradead.org>
To: Jiri Olsa <jolsa@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [PATCHv3 04/12] uprobes/x86: Move optimized uprobe from nop5 to nop10
Date: Thu, 21 May 2026 15:35:48 +0200
Message-ID: <20260521133548.GK3126523@noisy.programming.kicks-ass.net>
In-Reply-To: <20260521124411.31133-5-jolsa@kernel.org>
On Thu, May 21, 2026 at 02:44:03PM +0200, Jiri Olsa wrote:
> Andrii reported an issue with optimized uprobes [1] that can clobber
> redzone area with call instruction storing return address on stack
> where user code may keep temporary data without adjusting rsp.
>
> Fixing this by moving the optimized uprobes on top of 10-bytes nop
> instruction, so we can squeeze another instruction to escape the
> redzone area before doing the call, like:
>
> lea -0x80(%rsp), %rsp
> call tramp
>
> Note the lea instruction is used to adjust the rsp register without
> changing the flags.
>
> We use nop10 and the following transformation to the optimized
> instructions above and back, as suggested by Peterz [2].
>
> Optimize path (int3_update_optimize):
>
> 1) Initial state after set_swbp() installed the uprobe:
> cc 2e 0f 1f 84 00 00 00 00 00
>
> From offset 0 this is INT3 followed by the tail of the original
> 10-byte NOP.
>
> 2) Trap the call slot before rewriting the NOP tail:
> cc 2e 0f 1f 84 [cc] 00 00 00 00
>
> From offset 0 this traps on the uprobe INT3. A thread reaching
> offset 5 traps on the temporary INT3 instead of seeing a partially
> patched call.
>
> 3) Rewrite the LEA tail and call displacement, keeping both INT3 bytes:
> cc [8d 64 24 80] cc [d0 d1 d2 d3]
>
> From offset 0 and offset 5 this still traps. The bytes between
> them are not executable entry points while both traps are in place.
>
> 4) Restore the call opcode at offset 5:
> cc 8d 64 24 80 [e8] d0 d1 d2 d3
>
> From offset 0 this still traps. From offset 5 the instruction is
> the final CALL to the uprobe trampoline.
>
> 5) Publish the first LEA byte:
> [48] 8d 64 24 80 e8 d0 d1 d2 d3
>
> From offset 0 this is:
> lea -0x80(%rsp), %rsp
> call <uprobe-trampoline>
>
> Unoptimize path (int3_update_unoptimize):
>
> 1) Initial optimized state:
> 48 8d 64 24 80 e8 d0 d1 d2 d3
> Same as 5) above.
>
> 2) Trap new entries before restoring the NOP bytes:
> [cc] 8d 64 24 80 e8 d0 d1 d2 d3
>
> From offset 0 this traps. A thread that had already executed the
> LEA can still reach the intact CALL at offset 5.
>
> 3) Restore bytes 1..4 of the original NOP while keeping byte 0 trapped
> and byte 5 as CALL.
> cc [2e 0f 1f 84] e8 d0 d1 d2 d3
>
> From offset 0 this still traps. Offset 5 is still the CALL for any
> thread that was already past the first LEA byte.
>
> 4) Publish the first byte of the original NOP:
> [66] 2e 0f 1f 84 e8 d0 d1 d2 d3
>
> From offset 0 this is the restored 10-byte NOP; the CALL opcode and
> displacement are now only NOP operands. Offset 5 still decodes as
> CALL for a thread that was already there.
>
> Note as explained in [2] we need to use following nop10:
> PF1 PF2 ESC NOPL MOD SIB DISP32
> NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)
>
> which means we need to allow 0x2e prefix which maps to INAT_PFX_CS
> attribute in is_prefix_bad function.
>
> The optimized uprobe performance stays the same:
>
> uprobe-nop : 3.129 ± 0.013M/s
> uprobe-push : 3.045 ± 0.006M/s
> uprobe-ret : 1.095 ± 0.004M/s
> --> uprobe-nop10 : 7.170 ± 0.020M/s
> uretprobe-nop : 2.143 ± 0.021M/s
> uretprobe-push : 2.090 ± 0.000M/s
> uretprobe-ret : 0.942 ± 0.000M/s
> --> uretprobe-nop10: 3.381 ± 0.003M/s
> usdt-nop : 3.245 ± 0.004M/s
> --> usdt-nop10 : 7.256 ± 0.023M/s
>
> @@ -893,48 +918,134 @@ static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *
> }
>
> /*
> + * Modify the optimized instruction by using INT3 breakpoints on SMP.
> * We completely avoid using stop_machine() here, and achieve the
> * synchronization using INT3 breakpoints and SMP cross-calls.
> * (borrowed comment from smp_text_poke_batch_finish)
> *
> + * The way it is done for optimization (int3_update_optimize):
> + * 1) Start with the uprobe INT3 trap already installed
> + * 2) Add an INT3 trap to the call slot
> + * 3) Update everything but the first byte and the call opcode
> + * 4) Replace the call slot INT3 by the call opcode
> + * 5) Replace the first INT3 by the first byte of the LEA instruction
> + *
> + * The way it is done for unoptimization (int3_update_unoptimize):
> + * 1) Start with the optimized uprobe lea/call instructions
> + * 2) Add an INT3 trap to the address that will be patched
> + * 3) Restore the NOP bytes before the call opcode
> + * 4) Replace the first INT3 by the first byte of the NOP instruction
> + *
> + * Note that unoptimization deliberately keeps the call opcode and displacement
> + * in bytes 5..9. Those bytes become operands of the restored 10-byte NOP.
> */
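
The optimize steps quoted above can be replayed on a plain 10-byte buffer to check that each intermediate state matches the byte dumps in the changelog. This is only a user-space sketch: model_optimize() and its states[] array are hypothetical names, the d0..d3 displacement is the placeholder from the changelog, and the SMP cross-calls the comment mentions between steps are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { INSN_LEN = 10, CALL_OFF = 5, INT3 = 0xcc, CALL_OP = 0xe8 };

/* The nop10 from the changelog: cs nopw 0x0(%rax,%rax,1). */
static const uint8_t nop10[INSN_LEN] = {
	0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* lea -0x80(%rsp), %rsp; its first byte (0x48) is written last. */
static const uint8_t lea[CALL_OFF] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };

/*
 * Hypothetical model of the optimize path: apply the five steps to a
 * local buffer and record the buffer after each step in states[0..4].
 * Synchronization between steps is assumed and not modelled here.
 */
static void model_optimize(uint8_t states[5][INSN_LEN], const uint8_t disp[4])
{
	uint8_t b[INSN_LEN];

	memcpy(b, nop10, INSN_LEN);
	b[0] = INT3;                            /* 1) uprobe INT3 installed */
	memcpy(states[0], b, INSN_LEN);

	b[CALL_OFF] = INT3;                     /* 2) trap the call slot */
	memcpy(states[1], b, INSN_LEN);

	memcpy(b + 1, lea + 1, CALL_OFF - 1);   /* 3) LEA tail ... */
	memcpy(b + CALL_OFF + 1, disp, 4);      /*    ... and call displacement */
	memcpy(states[2], b, INSN_LEN);

	b[CALL_OFF] = CALL_OP;                  /* 4) restore the call opcode */
	memcpy(states[3], b, INSN_LEN);

	b[0] = lea[0];                          /* 5) publish first LEA byte */
	memcpy(states[4], b, INSN_LEN);
}
```

In this model byte 0 stays INT3 through steps 1..4, and offset 5 only ever goes from the NOP tail byte to INT3 to the CALL opcode; the displacement bytes are written while both traps are in place.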

One important thing to note is that (as earlier noted by Andrii) the
CALL address is never changed. A new optimization pass will not change
the CALL instruction again.

If you noted this anywhere, I failed to find it. This is crucially
important for the correctness of the scheme and should not be omitted.

That is, please add something like:

  "Since there is only a single uprobe-trampoline, the CALL instruction
   will not be changed across unoptimization/optimization cycles.
   Therefore, any task that is preempted at the CALL instruction is
   guaranteed to observe that CALL and not anything else."
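
To make the invariant concrete, here is a rough user-space sketch that runs one unoptimize/optimize cycle on a 10-byte buffer and checks bytes 5..9 after every single write. The helper names (call_intact, swap_head, cycle_keeps_call) are made up for illustration, and the sketch assumes that a re-optimization pass which finds the CALL already in place only rewrites bytes 0..4; whether the patch implements it exactly that way is not something this model shows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LEN = 10, CALL_AT = 5, TRAP = 0xcc };

/* First five bytes of the nop10 and of "lea -0x80(%rsp), %rsp". */
static const uint8_t nop_head[CALL_AT] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };
static const uint8_t lea_insn[CALL_AT] = { 0x48, 0x8d, 0x64, 0x24, 0x80 };

/* Returns 1 if bytes 5..9 of b still hold the same CALL as in ref. */
static int call_intact(const uint8_t *b, const uint8_t *ref)
{
	return memcmp(b + CALL_AT, ref + CALL_AT, LEN - CALL_AT) == 0;
}

/*
 * Swap the 5-byte head via INT3: trap byte 0, write bytes 1..4, then
 * publish byte 0. Returns how many of the three resulting states still
 * had the CALL intact (3 means it was never disturbed).
 */
static int swap_head(uint8_t *b, const uint8_t head[CALL_AT], const uint8_t *ref)
{
	int ok = 0;

	b[0] = TRAP;
	ok += call_intact(b, ref);
	memcpy(b + 1, head + 1, CALL_AT - 1);
	ok += call_intact(b, ref);
	b[0] = head[0];
	ok += call_intact(b, ref);
	return ok;
}

/* One unoptimize/optimize cycle; returns 1 if the CALL never changed. */
static int cycle_keeps_call(uint8_t *b)
{
	uint8_t ref[LEN];

	memcpy(ref, b, LEN);
	return swap_head(b, nop_head, ref) == 3 &&
	       swap_head(b, lea_insn, ref) == 3;
}
```

Under that assumption a task parked at offset 5 sees the same five CALL bytes in every intermediate state, which is the property the suggested comment text spells out.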