Date: Mon, 18 May 2026 12:43:06 +0200
From: Peter Zijlstra
To: Jiri Olsa
Cc: Oleg Nesterov, Ingo Molnar, Masami Hiramatsu, Andrii Nakryiko,
    bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] uprobes/x86: Move optimized uprobe from nop5 to nop10
Message-ID: <20260518104306.GU3102624@noisy.programming.kicks-ass.net>
References: <20260514135342.22130-1-jolsa@kernel.org> <20260514135342.22130-2-jolsa@kernel.org>
In-Reply-To: <20260514135342.22130-2-jolsa@kernel.org>

You seem to have forgotten to Cc LKML and x86 :-(

On Thu, May 14, 2026 at 03:53:36PM +0200, Jiri Olsa wrote:

> @@ -1017,17 +1030,32 @@ static int int3_update(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  static int swbp_optimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			 unsigned long vaddr, unsigned long tramp)
>  {
> -	u8 call[5];
> +	u8 insn[OPT_INSN_SIZE], *call = &insn[LEA_INSN_SIZE];
>  
> -	__text_gen_insn(call, CALL_INSN_OPCODE, (const void *) vaddr,
> +	/*
> +	 * We have nop10 instruction (with first byte overwritten to int3),
> +	 * changing it to:
> +	 *   lea -0x80(%rsp), %rsp
> +	 *   call tramp
> +	 */
> +	memcpy(insn, lea_rsp, LEA_INSN_SIZE);
> +	__text_gen_insn(call, CALL_INSN_OPCODE,
> +			(const void *) (vaddr + LEA_INSN_SIZE),
>  			(const void *) tramp, CALL_INSN_SIZE);
> -	return int3_update(auprobe, vma, vaddr, call, true /* optimize */);
> +	return int3_update(auprobe, vma, vaddr, insn, OPT_INSN_SIZE, true /* optimize */);
>  }
>  
>  static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  			   unsigned long vaddr)
>  {
> -	return int3_update(auprobe, vma, vaddr, auprobe->insn, false /* optimize */);
> +	/*
> +	 * We have optimized nop10 (lea, call), changing it to 'jmp rel8' to
> +	 * end of the 10-byte slot instead of restoring the original nop10,
> +	 * because we could have thread already inside lea instruction.

Inaccurate, RIP could be on CALL, not inside LEA. Writing NOP10 would
make it inside NOP10 though, and that would cause havoc IF you use the
normal NOP10.

Thing is, the encoding of NOP{8,9,10} would actually allow you to
preserve the CALL instruction :-)

That is, observe:

         PF1   PF2   ESC   NOPL  MOD   SIB   DISP32
  NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)
  NOP10: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x78, 0x56, 0x34, 0x12 -- cs nopw 0x12345678(%rax,%rbp,8)

Specifically the CALL opcode sits in the SIB byte and decodes like:

  e8 := 11 101 000

  scale = 11  (2^3 = 8)
  index = 101 BP
  base  = 000 AX

And the displacement is just that, a displacement.

So you *could* in fact, write back _A_ NOP10, just not the standard
NOP10.

> +	 */
> +	u8 jmp[OPT_INSN_SIZE] = { JMP8_INSN_OPCODE, OPT_JMP8_OFFSET };
> +
> +	return int3_update(auprobe, vma, vaddr, jmp, JMP8_INSN_SIZE, false /* optimize */);
>  }

Changelog wants significant update to explain this scheme.

So we have:

  NOP10 -+-> LEA -0x80(%rsp), %rsp, CALL foo -> JMP.d8 +8
         |                                          |
         `------------------------------------------'

And you want to belabour the point of how you ensure re-writing the CALL
instruction isn't a problem (because I'm not convinced).

Note that the above results in:

  initial:
  0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

  optimize-int3:
  1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- int3

  optimize-tail:
  2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  optimize-finish:
  3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

  unoptimize-int3:
  4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  unoptimize-tail:
  5: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  unoptimize-finish:
  6: 0xeb, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- jmp.d8 +8; call 0x78563412

  optimize-int3:
  7: 0xcc, 0x08, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  optimize-tail:
  8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678

  optimize-finish:
  9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- lea -0x80(%rsp),%rsp; call 0x12345678

Note that from step 7 to step 8, you re-write the CALL instruction
without going through INT3. This means it is entirely possible for a
concurrent execution to observe a composite instruction. This is NOT
sound!

However, I think it can be salvaged, if instead of only writing INT3 at
+0, you also write INT3 at +5.

The sequence then becomes:

  initial:
  0: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 -- cs nopw 0x00000000(%rax,%rax,1)

  optimize-int3:
  1: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x00, 0x00, 0x00, 0x00 -- int3; int3

  optimize-tail(s):
  2: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3

  optimize-finish-1:
  3: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  optimize-finish-2:
  3: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- lea -0x80(%rsp),%rsp; call 0x78563412

  unoptimize-int3:
  4: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  unoptimize-tail:
  5: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- int3; call 0x78563412

  unoptimize-finish:
  6: 0x66, 0x2e, 0x0f, 0x1f, 0x84, 0xe8, 0x12, 0x34, 0x56, 0x78 -- cs nopw 0x78563412(%rax,%rbp,8); call 0x78563412

  optimize-int3:
  7: 0xcc, 0x2e, 0x0f, 0x1f, 0x84, 0xcc, 0x12, 0x34, 0x56, 0x78 -- int3; int3

  optimize-tail(s):
  8: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xcc, 0x78, 0x56, 0x34, 0x12 -- int3; int3

  optimize-finish-1:
  9: 0xcc, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- int3; call 0x12345678

  optimize-finish-2:
  9: 0x48, 0x8d, 0x64, 0x24, 0x80, 0xe8, 0x78, 0x56, 0x34, 0x12 -- lea -0x80(%rsp),%rsp; call 0x12345678

> @@ -1095,14 +1125,25 @@ int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>  		  unsigned long vaddr)
>  {
>  	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> -		int ret = is_optimized(vma->vm_mm, vaddr);
> -		if (ret < 0)
> +		uprobe_opcode_t insn[OPT_INSN_SIZE];
> +		int ret;
> +
> +		ret = copy_from_vaddr(vma->vm_mm, vaddr, &insn, OPT_INSN_SIZE);
> +		if (ret)
>  			return ret;
> -		if (ret) {
> +		if (__is_optimized((uprobe_opcode_t *)&insn, vaddr)) {
>  			ret = swbp_unoptimize(auprobe, vma, vaddr);
>  			WARN_ON_ONCE(ret);
>  			return ret;
>  		}
> +		/*
> +		 * We can have re-attached probe on top of jmp8 instruction,
> +		 * which did not get optimized. We need to restore the jmp8
> +		 * instruction, instead of the original instruction (nop10).
> +		 */
> +		if (is_swbp_insn(&insn[0]) && insn[1] == OPT_JMP8_OFFSET)
> +			return uprobe_write_opcode(auprobe, vma, vaddr, JMP8_INSN_OPCODE,
> +						   false /* is_register */);

Coding style wants { } on any multi-line statement, even if it's only
one statement.

>  	}
>  	return uprobe_write_opcode(auprobe, vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn,
>  				   false /* is_register */);
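
The SIB observation earlier in this mail (CALL opcode 0xe8 landing in
the SIB byte of NOP10, rel32 landing in DISP32) can be checked in
userspace with something like the sketch below. It is untested against
the patch and not kernel code; sib_decode() and nop10_keep_call() are
hypothetical names made up for illustration, they do not exist in the
series or in the tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of an x86 SIB byte: scale (bits 7:6), index (5:3), base (2:0). */
struct sib {
	unsigned int scale;	/* multiplier: 1, 2, 4 or 8 */
	unsigned int index;	/* register number, 5 == BP */
	unsigned int base;	/* register number, 0 == AX */
};

/* Hypothetical helper: split a SIB byte into its three fields. */
static struct sib sib_decode(uint8_t byte)
{
	struct sib s = {
		.scale = 1u << (byte >> 6),
		.index = (byte >> 3) & 7,
		.base  = byte & 7,
	};

	return s;
}

/*
 * Hypothetical helper: build a 10-byte NOP whose SIB + DISP32 bytes are
 * the 5 bytes of an existing CALL rel32, so bytes +5..+9 stay unchanged.
 */
static void nop10_keep_call(uint8_t out[10], const uint8_t call[5])
{
	/* PF1, PF2, ESC, NOPL, MOD: the first five bytes of NOP10. */
	static const uint8_t head[5] = { 0x66, 0x2e, 0x0f, 0x1f, 0x84 };

	assert(call[0] == 0xe8);	/* only makes sense for CALL rel32 */
	memcpy(out, head, sizeof(head));
	memcpy(out + 5, call, 5);
}
```

Feeding it the CALL bytes e8 78 56 34 12 gives back the second NOP10
encoding from the table, and decoding 0xe8 gives scale 8, index BP,
base AX, matching the breakdown above.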