Re: [PATCH 1/7] uprobes/x86: Move optimized uprobe from nop5 to nop10

Linux Trace Kernel
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Jiri Olsa <jolsa@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] uprobes/x86: Move optimized uprobe from nop5 to nop10
Date: Mon, 18 May 2026 12:43:06 +0200	[thread overview]
Message-ID: <20260518104306.GU3102624@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20260514135342.22130-2-jolsa@kernel.org>


You seem to have forgotten to Cc LKML and x86 :-(

On Thu, May 14, 2026 at 03:53:36PM +0200, Jiri Olsa wrote:

> @@ -1017,17 +1030,32 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  static int swbp_optimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			 unsigned long vaddr, unsigned long tramp)
>  {
> -	u8 call[5];
> +	u8 insn[OPT_INSN_SIZE], *call = &insn[LEA_INSN_SIZE];
>  
> -	__text_gen_insn(call, CALL_INSN_OPCODE, (const void *) vaddr,
> +	/*
> +	 * We have nop10 instruction (with first byte overwritten to int3),
> +	 * changing it to:
> +	 *   lea -0x80(%rsp), %rsp
> +	 *   call tramp
> +	 */
> +	memcpy(insn, lea_rsp, LEA_INSN_SIZE);
> +	__text_gen_insn(call, CALL_INSN_OPCODE,
> +			(const void *) (vaddr + LEA_INSN_SIZE),
>  			(const void *) tramp, CALL_INSN_SIZE);
> -	return int3_update(auprobe, vma, vaddr, call, true /* optimize */);
> +	return int3_update(auprobe, vma, vaddr, insn, OPT_INSN_SIZE, true /* optimize */);
>  }
>  
>  static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			   unsigned long vaddr)
>  {
> -	return int3_update(auprobe, vma, vaddr, auprobe->insn, false /* optimize */);
> +	/*
> +	 * We have optimized nop10 (lea, call), changing it to 'jmp rel8' to
> +	 * end of the 10-byte slot instead of restoring the original nop10,
> +	 * because we could have thread already inside lea instruction.

Inaccurate, RIP could be on CALL, not inside LEA. Writing NOP10 would
make it inside NOP10 though, and that would cause havoc IF you use the
normal NOP10.

Thing is, the encoding of NOP{8,9,10} would actually allow you to
preserve the CALL instruction :-)

That is, observe:

       PF1   PF2   ESC   NOPL  MOD   SIB   DISP32

NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)
NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x78, 0x56, 0x34, 0x12 -- cs nopw 0x12345678(%rax,%rbp,8)

Specifically the CALL opcode sits in the SIB byte and decodes like:

  e8 := 11 101 000

  scale = 11  (2^3 = 8)
  index = 101 BP
  base  = 000 AX

And the displacement is just that, a displacement.

So you *could* in fact, write back _A_ NOP10, just not the standard
NOP10.

> +	 */
> +	u8 jmp[OPT_INSN_SIZE] = { JMP8_INSN_OPCODE, OPT_JMP8_OFFSET };
> +
> +	return int3_update(auprobe, vma, vaddr, jmp, JMP8_INSN_SIZE, false /* optimize */);
>  }

Changelog wants significant update to explain this scheme.

So we have:

  NOP10 -+-> LEA -0x80(%rsp), %rsp, CALL foo -> JMP.d8 +8
         |                                          |
         `------------------------------------------'

And you want to belabour the point of how you ensure re-writing the CALL
instruction isn't a problem (because I'm not convinced).

Note that the above results in:

initial:
0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

optimize-int3:
1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- int3
optimize-tail:
2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-finish:
3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

unoptimize-int3:
4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-tail:
5: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-finish:
6: 0xeb, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- jmp.d8 +8; call 0x78563412

optimize-int3:
7: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-tail:
8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678
optimize-finish:
9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678

Note that from step 7 to step 8, you re-write the CALL instruction
without going through INT3. This means it is entirely possible for a
concurrent execution to observe a composite instruction.

This is NOT sound!

However, I think it can be salvaged, if instead of only writing INT3 at
+0, you also write INT3 at +5. The sequence then becomes:

initial:
0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

optimize-int3:
1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x00, 0x00, 0x00, 0x00 -- int3; int3
optimize-tail(s):
2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3
optimize-finish-1:
3: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
optimize-finish-2:
3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

unoptimize-int3:
4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-tail:
5: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412
unoptimize-finish:
6: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- cs nopw 0x78563412(%rax,%rbp,8); call 0x78563412

optimize-int3:
7: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3
optimize-tail(s):
8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x78, 0x56, 0x34, 0x12 -- int3; int3
optimize-finish-1:
9: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678
optimize-finish-2:
9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- lea -0x80(%rsp),%rsp; call 0x12345678

> @@ -1095,14 +1125,25 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  		  unsigned long vaddr)
>  {
>  	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> -		int ret = is_optimized(vma->vm_mm, vaddr);
> -		if (ret < 0)
> +		uprobe_opcode_t insn[OPT_INSN_SIZE];
> +		int ret;
> +
> +		ret = copy_from_vaddr(vma->vm_mm, vaddr, &insn, OPT_INSN_SIZE);
> +		if (ret)
>  			return ret;
> -		if (ret) {
> +		if (__is_optimized((uprobe_opcode_t *)&insn, vaddr)) {
>  			ret = swbp_unoptimize(auprobe, vma, vaddr);
>  			WARN_ON_ONCE(ret);
>  			return ret;
>  		}
> +		/*
> +		 * We can have re-attached probe on top of jmp8 instruction,
> +		 * which did not get optimized. We need to restore the jmp8
> +		 * instruction, instead of the original instruction (nop10).
> +		 */
> +		if (is_swbp_insn(&insn[0]) && insn[1] == OPT_JMP8_OFFSET)
> +			return uprobe_write_opcode(auprobe, vma, vaddr, JMP8_INSN_OPCODE,
> +						   false /* is_register */);

Coding style wants { } on any multi-line statement, even if its only one
statement.

>  	}
>  	return uprobe_write_opcode(auprobe, vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn,
>  				   false /* is_register */);

next prev parent reply	other threads:[~2026-05-18 10:43 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14 13:53 [PATCH 0/7] uprobes/x86: Fix red zone issue for optimized uprobes Jiri Olsa
2026-05-14 13:53 ` [PATCH 1/7] uprobes/x86: Move optimized uprobe from nop5 to nop10 Jiri Olsa
2026-05-14 16:54   ` Jakub Sitnicki
2026-05-15 12:31     ` Jiri Olsa
2026-05-15 20:31   ` Andrii Nakryiko
2026-05-17 11:45     ` Jiri Olsa
2026-05-18 10:43   ` Peter Zijlstra [this message]
2026-05-18 16:14     ` Andrii Nakryiko
2026-05-18 16:39     ` Jiri Olsa
2026-05-14 13:53 ` [PATCH 2/7] libbpf: Change has_nop_combo to work on top of nop10 Jiri Olsa
2026-05-14 14:55   ` bot+bpf-ci
2026-05-15 12:32     ` Jiri Olsa
2026-05-15 11:12   ` Jakub Sitnicki
2026-05-14 13:53 ` [PATCH 3/7] selftests/bpf: Emit nop,nop10 instructions combo for x86_64 arch Jiri Olsa
2026-05-14 13:53 ` [PATCH 4/7] selftests/bpf: Change uprobe syscall tests to use nop10 Jiri Olsa
2026-05-14 13:53 ` [PATCH 5/7] selftests/bpf: Change uprobe/usdt trigger bench code " Jiri Olsa
2026-05-14 13:53 ` [PATCH 6/7] selftests/bpf: Add reattach tests for uprobe syscall Jiri Olsa
2026-05-14 13:53 ` [PATCH 7/7] selftests/bpf: Add tests for uprobe nop10 red zone clobbering Jiri Olsa
2026-05-14 14:55   ` bot+bpf-ci
2026-05-18  7:30     ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260518104306.GU3102624@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox