linux-trace-kernel.vger.kernel.org archive mirror
* [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64
@ 2025-04-21 21:44 Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 01/22] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
                   ` (21 more replies)
  0 siblings, 22 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: Alejandro Colomar, Eyal Birger, kees, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

hi,
this patchset adds support to optimize usdt probes on top of the
5-byte nop instruction.

The generic approach (optimizing all uprobes) is hard, because it would
require emulating many possible original instructions, with all the
related issues. The usdt case, which stores a 5-byte nop, seems much
easier, so we are starting with that.

The basic idea is to replace the breakpoint exception with a syscall,
which is faster on x86_64. For more details please see the changelog
of patch 8.

The run_bench_uprobes.sh benchmark triggers a uprobe (on top of
different original instructions) in a loop and counts how many
iterations happened per second (the unit below is millions of loops
per second).

There's a big speedup when you compare the current usdt implementation
(uprobe-nop) with the proposed one (uprobe-nop5):

current:
        usermode-count :  152.501 ± 0.012M/s
        syscall-count  :   14.463 ± 0.062M/s
-->     uprobe-nop     :    3.160 ± 0.005M/s
        uprobe-push    :    3.003 ± 0.003M/s
        uprobe-ret     :    1.100 ± 0.003M/s
        uprobe-nop5    :    3.132 ± 0.012M/s
        uretprobe-nop  :    2.103 ± 0.002M/s
        uretprobe-push :    2.027 ± 0.004M/s
        uretprobe-ret  :    0.914 ± 0.002M/s
        uretprobe-nop5 :    2.115 ± 0.002M/s

after the change:
        usermode-count :  152.343 ± 0.400M/s
        syscall-count  :   14.851 ± 0.033M/s
        uprobe-nop     :    3.204 ± 0.005M/s
        uprobe-push    :    3.040 ± 0.005M/s
        uprobe-ret     :    1.098 ± 0.003M/s
-->     uprobe-nop5    :    7.286 ± 0.017M/s
        uretprobe-nop  :    2.144 ± 0.001M/s
        uretprobe-push :    2.069 ± 0.002M/s
        uretprobe-ret  :    0.922 ± 0.000M/s
        uretprobe-nop5 :    3.487 ± 0.001M/s

I see a bit more speedup on Intel (above) compared to AMD. The big
nop5 speedup is partly due to emulating nop5 and partly due to the
optimization.

The key speedup we are doing this for is the USDT switch from nop to nop5:
	uprobe-nop     :    3.160 ± 0.005M/s
	uprobe-nop5    :    7.286 ± 0.017M/s


Changes from the last RFC:
- the change to emulate all nops got merged
- rebased on top of tip/perf/core, mm/unstable and bpf-next/master to
  get the latest uprobe and bpf changes
- used guard(rcu_tasks_trace) in handle_syscall_uprobe [Andrii]
- patch #6 changes the orig argument to is_register, which turned
  out to require fewer changes


This patchset adds a new syscall; here are notes on the checklist
items in Documentation/process/adding-syscalls.rst:

- System Call Alternatives
  A new syscall seems like the best way here, because we just need
  to enter the kernel quickly, with no extra argument processing,
  which we would need if we decided to reuse another syscall.

- Designing the API: Planning for Extension
  The uprobe syscall is very specific and most likely won't be
  extended in the future.

- Designing the API: Other Considerations
  N/A; the uprobe syscall does not return a reference to a kernel
  object.

- Proposing the API
  Wiring up of the uprobe system call is in a separate change;
  selftests and man page changes are part of the patchset.

- Generic System Call Implementation
  There's no CONFIG option for the new functionality because it
  keeps the same behaviour from the user's point of view.

- x86 System Call Implementation
  It's a 64-bit-only syscall.

- Compatibility System Calls (Generic)
  N/A; the uprobe syscall has no arguments and is not supported
  for compat processes.

- Compatibility System Calls (x86)
  N/A; the uprobe syscall is not supported for compat processes.

- System Calls Returning Elsewhere
  N/A.

- Other Details
  N/A.

- Testing
  Adding new bpf selftests.

- Man Page
  Attached.

- Do not call System Calls in the Kernel
  N/A

pending todo (or follow ups):
- use PROCMAP_QUERY in tests
- alloc 'struct uprobes_state' for mm_struct only when needed [Andrii]


thanks,
jirka


Cc: Alejandro Colomar <alx@kernel.org>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: kees@kernel.org
---
Jiri Olsa (21):
      uprobes: Rename arch_uretprobe_trampoline function
      uprobes: Make copy_from_page global
      uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
      uprobes: Add uprobe_write function
      uprobes: Add nbytes argument to uprobe_write
      uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode
      uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
      uprobes/x86: Add mapping for optimized uprobe trampolines
      uprobes/x86: Add uprobe syscall to speed up uprobe
      uprobes/x86: Add support to optimize uprobes
      selftests/bpf: Use 5-byte nop for x86 usdt probes
      selftests/bpf: Reorg the uprobe_syscall test function
      selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
      selftests/bpf: Add uprobe/usdt syscall tests
      selftests/bpf: Add hit/attach/detach race optimized uprobe test
      selftests/bpf: Add uprobe syscall sigill signal test
      selftests/bpf: Add optimized usdt variant for basic usdt test
      selftests/bpf: Add uprobe_regs_equal test
      selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
      seccomp: passthrough uprobe systemcall without filtering
      selftests/seccomp: validate uprobe syscall passes through seccomp

 arch/arm/probes/uprobes/core.c                              |   2 +-
 arch/x86/entry/syscalls/syscall_64.tbl                      |   1 +
 arch/x86/include/asm/uprobes.h                              |   7 ++
 arch/x86/kernel/uprobes.c                                   | 532 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/syscalls.h                                    |   2 +
 include/linux/uprobes.h                                     |  20 +++-
 kernel/events/uprobes.c                                     | 147 +++++++++++++++++--------
 kernel/fork.c                                               |   1 +
 kernel/seccomp.c                                            |  32 ++++--
 kernel/sys_ni.c                                             |   1 +
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c     | 478 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 tools/testing/selftests/bpf/prog_tests/usdt.c               |  38 ++++---
 tools/testing/selftests/bpf/progs/uprobe_syscall.c          |   4 +-
 tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c |  41 ++++++-
 tools/testing/selftests/bpf/sdt.h                           |   9 +-
 tools/testing/selftests/bpf/test_kmods/bpf_testmod.c        |  11 +-
 tools/testing/selftests/seccomp/seccomp_bpf.c               | 107 +++++++++++++++----
 17 files changed, 1299 insertions(+), 134 deletions(-)


Jiri Olsa (1):
      man2: Add uprobe syscall page

 man/man2/uprobe.2    | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 man/man2/uretprobe.2 |  2 ++
 2 files changed, 51 insertions(+)
 create mode 100644 man/man2/uprobe.2

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH perf/core 01/22] uprobes: Rename arch_uretprobe_trampoline function
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 02/22] uprobes: Make copy_from_page global Jiri Olsa
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

We are about to add a uprobe trampoline, so clean up the namespace by
renaming arch_uprobe_trampoline() to arch_uretprobe_trampoline().

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 2 +-
 include/linux/uprobes.h   | 2 +-
 kernel/events/uprobes.c   | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 6d383839e839..77050e5a4680 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -338,7 +338,7 @@ extern u8 uretprobe_trampoline_entry[];
 extern u8 uretprobe_trampoline_end[];
 extern u8 uretprobe_syscall_check[];
 
-void *arch_uprobe_trampoline(unsigned long *psize)
+void *arch_uretprobe_trampoline(unsigned long *psize)
 {
 	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 	struct pt_regs *regs = task_pt_regs(current);
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 516217c39094..01112f27cd21 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -224,7 +224,7 @@ extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
 extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 					 void *src, unsigned long len);
 extern void uprobe_handle_trampoline(struct pt_regs *regs);
-extern void *arch_uprobe_trampoline(unsigned long *psize);
+extern void *arch_uretprobe_trampoline(unsigned long *psize);
 extern unsigned long uprobe_get_trampoline_vaddr(void);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4c965ba77f9f..8415c087a71f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1727,7 +1727,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	return ret;
 }
 
-void * __weak arch_uprobe_trampoline(unsigned long *psize)
+void * __weak arch_uretprobe_trampoline(unsigned long *psize)
 {
 	static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
 
@@ -1759,7 +1759,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
 	init_waitqueue_head(&area->wq);
 	/* Reserve the 1st slot for get_trampoline_vaddr() */
 	set_bit(0, area->bitmap);
-	insns = arch_uprobe_trampoline(&insns_size);
+	insns = arch_uretprobe_trampoline(&insns_size);
 	arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
 
 	if (!xol_add_vma(mm, area))
-- 
2.49.0



* [PATCH perf/core 02/22] uprobes: Make copy_from_page global
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 01/22] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Make copy_from_page global and add the uprobe prefix.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h |  1 +
 kernel/events/uprobes.c | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 01112f27cd21..7447e15559b8 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -226,6 +226,7 @@ extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
 extern void uprobe_handle_trampoline(struct pt_regs *regs);
 extern void *arch_uretprobe_trampoline(unsigned long *psize);
 extern unsigned long uprobe_get_trampoline_vaddr(void);
+extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8415c087a71f..87bca004ee6a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -177,7 +177,7 @@ bool __weak is_trap_insn(uprobe_opcode_t *insn)
 	return is_swbp_insn(insn);
 }
 
-static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
+void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
 {
 	void *kaddr = kmap_atomic(page);
 	memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
@@ -205,7 +205,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	 * is a trap variant; uprobes always wins over any other (gdb)
 	 * breakpoint.
 	 */
-	copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
+	uprobe_copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
 	is_swbp = is_swbp_insn(&old_opcode);
 
 	if (is_swbp_insn(new_opcode)) {
@@ -1052,7 +1052,7 @@ static int __copy_insn(struct address_space *mapping, struct file *filp,
 	if (IS_ERR(page))
 		return PTR_ERR(page);
 
-	copy_from_page(page, offset, insn, nbytes);
+	uprobe_copy_from_page(page, offset, insn, nbytes);
 	put_page(page);
 
 	return 0;
@@ -1398,7 +1398,7 @@ struct uprobe *uprobe_register(struct inode *inode,
 		return ERR_PTR(-EINVAL);
 
 	/*
-	 * This ensures that copy_from_page(), copy_to_page() and
+	 * This ensures that uprobe_copy_from_page(), copy_to_page() and
 	 * __update_ref_ctr() can't cross page boundary.
 	 */
 	if (!IS_ALIGNED(offset, UPROBE_SWBP_INSN_SIZE))
@@ -2394,7 +2394,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	if (result < 0)
 		return result;
 
-	copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+	uprobe_copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
 	put_page(page);
  out:
 	/* This needs to return true for any variant of the trap insn */
-- 
2.49.0



* [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 01/22] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 02/22] uprobes: Make copy_from_page global Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 14:13   ` Oleg Nesterov
  2025-04-21 21:44 ` [PATCH perf/core 04/22] uprobes: Add uprobe_write function Jiri Olsa
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

The uprobe_write_opcode function currently also updates the ref_ctr
offset if one is defined for the uprobe.

This is not handy for the following changes, which need to make
several updates (writes) to install or remove a uprobe, but update
the ref_ctr offset just once.

Add set_swbp_refctr/set_orig_refctr functions which make sure the
ref_ctr offset is updated.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h |  2 +-
 kernel/events/uprobes.c | 62 ++++++++++++++++++++++++-----------------
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7447e15559b8..d3496f7bc583 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -194,7 +194,7 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr, uprobe_opcode_t opcode);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 87bca004ee6a..8b31340ed1c3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -486,13 +486,12 @@ static int __uprobe_write_opcode(struct vm_area_struct *vma,
  * Called with mm->mmap_lock held for read or write.
  * Return 0 (success) or a negative errno.
  */
-int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
-		const unsigned long opcode_vaddr, uprobe_opcode_t opcode)
+int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
+			uprobe_opcode_t opcode)
 {
 	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
 	struct mm_struct *mm = vma->vm_mm;
-	struct uprobe *uprobe;
-	int ret, is_register, ref_ctr_updated = 0;
+	int ret, is_register;
 	unsigned int gup_flags = FOLL_FORCE;
 	struct mmu_notifier_range range;
 	struct folio_walk fw;
@@ -500,7 +499,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 	struct page *page;
 
 	is_register = is_swbp_insn(&opcode);
-	uprobe = container_of(auprobe, struct uprobe, arch);
 
 	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
 		return -EINVAL;
@@ -528,17 +526,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 		goto out;
 	}
 
-	/* We are going to replace instruction, update ref_ctr. */
-	if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
-		ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
-		if (ret) {
-			folio_put(folio);
-			goto out;
-		}
-
-		ref_ctr_updated = 1;
-	}
-
 	ret = 0;
 	if (unlikely(!folio_test_anon(folio))) {
 		VM_WARN_ON_ONCE(is_register);
@@ -580,10 +567,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 	}
 
 out:
-	/* Revert back reference counter if instruction update failed. */
-	if (ret < 0 && is_register && ref_ctr_updated)
-		update_ref_ctr(uprobe, mm, -1);
-
 	/* try collapse pmd for compound page */
 	if (ret > 0)
 		collapse_pte_mapped_thp(mm, vaddr, false);
@@ -603,7 +586,27 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 int __weak set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 		unsigned long vaddr)
 {
-	return uprobe_write_opcode(auprobe, vma, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN);
+}
+
+static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	int err;
+
+	/* We are going to replace instruction, update ref_ctr. */
+	if (uprobe->ref_ctr_offset) {
+		err = update_ref_ctr(uprobe, mm, 1);
+		if (err)
+			return err;
+	}
+
+	err = set_swbp(&uprobe->arch, vma, vaddr);
+
+	/* Revert back reference counter if instruction update failed. */
+	if (err && uprobe->ref_ctr_offset)
+		update_ref_ctr(uprobe, mm, -1);
+	return err;
 }
 
 /**
@@ -618,8 +621,17 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 int __weak set_orig_insn(struct arch_uprobe *auprobe,
 		struct vm_area_struct *vma, unsigned long vaddr)
 {
-	return uprobe_write_opcode(auprobe, vma, vaddr,
-			*(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+}
+
+static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
+{
+	int err = set_orig_insn(&uprobe->arch, vma, vaddr);
+
+	/* Revert back reference counter even if instruction update failed. */
+	if (uprobe->ref_ctr_offset)
+		update_ref_ctr(uprobe, vma->vm_mm, -1);
+	return err;
 }
 
 /* uprobe should have guaranteed positive refcount */
@@ -1158,7 +1170,7 @@ static int install_breakpoint(struct uprobe *uprobe, struct vm_area_struct *vma,
 	if (first_uprobe)
 		set_bit(MMF_HAS_UPROBES, &mm->flags);
 
-	ret = set_swbp(&uprobe->arch, vma, vaddr);
+	ret = set_swbp_refctr(uprobe, vma, vaddr);
 	if (!ret)
 		clear_bit(MMF_RECALC_UPROBES, &mm->flags);
 	else if (first_uprobe)
@@ -1173,7 +1185,7 @@ static int remove_breakpoint(struct uprobe *uprobe, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	set_bit(MMF_RECALC_UPROBES, &mm->flags);
-	return set_orig_insn(&uprobe->arch, vma, vaddr);
+	return set_orig_refctr(uprobe, vma, vaddr);
 }
 
 struct map_info {
-- 
2.49.0



* [PATCH perf/core 04/22] uprobes: Add uprobe_write function
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (2 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write Jiri Olsa
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Add a uprobe_write function that does what uprobe_write_opcode did
so far, but allows passing a verify callback function that checks
the memory location before writing the opcode.

It will be used in the following changes to implement specific
checking logic for instruction updates.

uprobe_write_opcode now calls uprobe_write with verify_opcode as
the verify callback.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h |  5 +++++
 kernel/events/uprobes.c | 14 ++++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index d3496f7bc583..09fe93816173 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -187,6 +187,9 @@ struct uprobes_state {
 	struct xol_area		*xol_area;
 };
 
+typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr,
+				     uprobe_opcode_t *opcode);
+
 extern void __init uprobes_init(void);
 extern int set_swbp(struct arch_uprobe *aup, struct vm_area_struct *vma, unsigned long vaddr);
 extern int set_orig_insn(struct arch_uprobe *aup, struct vm_area_struct *vma, unsigned long vaddr);
@@ -195,6 +198,8 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr, uprobe_opcode_t opcode);
+extern int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
+			uprobe_opcode_t opcode, uprobe_write_verify_t verify);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8b31340ed1c3..3c5dc86bfe65 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -399,7 +399,7 @@ static bool orig_page_is_identical(struct vm_area_struct *vma,
 	return identical;
 }
 
-static int __uprobe_write_opcode(struct vm_area_struct *vma,
+static int __uprobe_write(struct vm_area_struct *vma,
 		struct folio_walk *fw, struct folio *folio,
 		unsigned long opcode_vaddr, uprobe_opcode_t opcode)
 {
@@ -488,6 +488,12 @@ static int __uprobe_write_opcode(struct vm_area_struct *vma,
  */
 int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
 			uprobe_opcode_t opcode)
+{
+	return uprobe_write(vma, opcode_vaddr, opcode, verify_opcode);
+}
+
+int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
+		 uprobe_opcode_t opcode, uprobe_write_verify_t verify)
 {
 	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
 	struct mm_struct *mm = vma->vm_mm;
@@ -508,7 +514,7 @@ int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_v
 	 * page that we can safely modify. Use FOLL_WRITE to trigger a write
 	 * fault if required. When unregistering, we might be lucky and the
 	 * anon page is already gone. So defer write faults until really
-	 * required. Use FOLL_SPLIT_PMD, because __uprobe_write_opcode()
+	 * required. Use FOLL_SPLIT_PMD, because __uprobe_write()
 	 * cannot deal with PMDs yet.
 	 */
 	if (is_register)
@@ -520,7 +526,7 @@ int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_v
 		goto out;
 	folio = page_folio(page);
 
-	ret = verify_opcode(page, opcode_vaddr, &opcode);
+	ret = verify(page, opcode_vaddr, &opcode);
 	if (ret <= 0) {
 		folio_put(folio);
 		goto out;
@@ -548,7 +554,7 @@ int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_v
 	/* Walk the page tables again, to perform the actual update. */
 	if (folio_walk_start(&fw, vma, vaddr, 0)) {
 		if (fw.page == page)
-			ret = __uprobe_write_opcode(vma, &fw, folio, opcode_vaddr, opcode);
+			ret = __uprobe_write(vma, &fw, folio, opcode_vaddr, opcode);
 		folio_walk_end(&fw, vma);
 	}
 
-- 
2.49.0



* [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (3 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 04/22] uprobes: Add uprobe_write function Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode Jiri Olsa
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Add an nbytes argument to uprobe_write and related functions as
preparation for writing whole instructions in the following changes.

Also rename the opcode arguments to insn, which seems to fit better.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h |  6 +++---
 kernel/events/uprobes.c | 27 ++++++++++++++-------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 09fe93816173..b86a2f0475a4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -188,7 +188,7 @@ struct uprobes_state {
 };
 
 typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr,
-				     uprobe_opcode_t *opcode);
+				     uprobe_opcode_t *insn, int nbytes);
 
 extern void __init uprobes_init(void);
 extern int set_swbp(struct arch_uprobe *aup, struct vm_area_struct *vma, unsigned long vaddr);
@@ -198,8 +198,8 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr, uprobe_opcode_t opcode);
-extern int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
-			uprobe_opcode_t opcode, uprobe_write_verify_t verify);
+extern int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
+			uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3c5dc86bfe65..6dc7f0b2756d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -191,7 +191,8 @@ static void copy_to_page(struct page *page, unsigned long vaddr, const void *src
 	kunmap_atomic(kaddr);
 }
 
-static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode)
+static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *insn,
+			 int nbytes)
 {
 	uprobe_opcode_t old_opcode;
 	bool is_swbp;
@@ -208,7 +209,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
 	uprobe_copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
 	is_swbp = is_swbp_insn(&old_opcode);
 
-	if (is_swbp_insn(new_opcode)) {
+	if (is_swbp_insn(insn)) {
 		if (is_swbp)		/* register: already installed? */
 			return 0;
 	} else {
@@ -401,10 +402,10 @@ static bool orig_page_is_identical(struct vm_area_struct *vma,
 
 static int __uprobe_write(struct vm_area_struct *vma,
 		struct folio_walk *fw, struct folio *folio,
-		unsigned long opcode_vaddr, uprobe_opcode_t opcode)
+		unsigned long insn_vaddr, uprobe_opcode_t *insn, int nbytes)
 {
-	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
-	const bool is_register = !!is_swbp_insn(&opcode);
+	const unsigned long vaddr = insn_vaddr & PAGE_MASK;
+	const bool is_register = !!is_swbp_insn(insn);
 	bool pmd_mappable;
 
 	/* For now, we'll only handle PTE-mapped folios. */
@@ -429,7 +430,7 @@ static int __uprobe_write(struct vm_area_struct *vma,
 	 */
 	flush_cache_page(vma, vaddr, pte_pfn(fw->pte));
 	fw->pte = ptep_clear_flush(vma, vaddr, fw->ptep);
-	copy_to_page(fw->page, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+	copy_to_page(fw->page, insn_vaddr, insn, nbytes);
 
 	/*
 	 * When unregistering, we may only zap a PTE if uffd is disabled and
@@ -489,13 +490,13 @@ static int __uprobe_write(struct vm_area_struct *vma,
 int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
 			uprobe_opcode_t opcode)
 {
-	return uprobe_write(vma, opcode_vaddr, opcode, verify_opcode);
+	return uprobe_write(vma, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode);
 }
 
-int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
-		 uprobe_opcode_t opcode, uprobe_write_verify_t verify)
+int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
+		 uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify)
 {
-	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
+	const unsigned long vaddr = insn_vaddr & PAGE_MASK;
 	struct mm_struct *mm = vma->vm_mm;
 	int ret, is_register;
 	unsigned int gup_flags = FOLL_FORCE;
@@ -504,7 +505,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
 	struct folio *folio;
 	struct page *page;
 
-	is_register = is_swbp_insn(&opcode);
+	is_register = is_swbp_insn(insn);
 
 	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
 		return -EINVAL;
@@ -526,7 +527,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
 		goto out;
 	folio = page_folio(page);
 
-	ret = verify(page, opcode_vaddr, &opcode);
+	ret = verify(page, insn_vaddr, insn, nbytes);
 	if (ret <= 0) {
 		folio_put(folio);
 		goto out;
@@ -554,7 +555,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
 	/* Walk the page tables again, to perform the actual update. */
 	if (folio_walk_start(&fw, vma, vaddr, 0)) {
 		if (fw.page == page)
-			ret = __uprobe_write(vma, &fw, folio, opcode_vaddr, opcode);
+			ret = __uprobe_write(vma, &fw, folio, insn_vaddr, insn, nbytes);
 		folio_walk_end(&fw, vma);
 	}
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (4 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

uprobe_write has a special path to restore the original page when we
write the original instruction back. This happens when uprobe_write
detects that we want to write anything other than the breakpoint
instruction.

Move the detection out and pass the result to uprobe_write as an
argument, so it becomes possible to write instructions other than just
the breakpoint and the original one.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/arm/probes/uprobes/core.c |  2 +-
 include/linux/uprobes.h        |  5 +++--
 kernel/events/uprobes.c        | 22 +++++++++++-----------
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index 885e0c5e8c20..3d96fb41d624 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -30,7 +30,7 @@ int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 	     unsigned long vaddr)
 {
 	return uprobe_write_opcode(auprobe, vma, vaddr,
-		   __opcode_to_mem_arm(auprobe->bpinsn));
+		   __opcode_to_mem_arm(auprobe->bpinsn), true);
 }
 
 bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index b86a2f0475a4..6af61e977bfb 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -197,9 +197,10 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
 extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr, uprobe_opcode_t opcode);
+extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr,
+			       uprobe_opcode_t opcode, bool is_register);
 extern int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
-			uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
+			uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool is_register);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6dc7f0b2756d..c8d88060dfbf 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -402,10 +402,10 @@ static bool orig_page_is_identical(struct vm_area_struct *vma,
 
 static int __uprobe_write(struct vm_area_struct *vma,
 		struct folio_walk *fw, struct folio *folio,
-		unsigned long insn_vaddr, uprobe_opcode_t *insn, int nbytes)
+		unsigned long insn_vaddr, uprobe_opcode_t *insn, int nbytes,
+		bool is_register)
 {
 	const unsigned long vaddr = insn_vaddr & PAGE_MASK;
-	const bool is_register = !!is_swbp_insn(insn);
 	bool pmd_mappable;
 
 	/* For now, we'll only handle PTE-mapped folios. */
@@ -488,25 +488,25 @@ static int __uprobe_write(struct vm_area_struct *vma,
  * Return 0 (success) or a negative errno.
  */
 int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_vaddr,
-			uprobe_opcode_t opcode)
+			uprobe_opcode_t opcode, bool is_register)
 {
-	return uprobe_write(vma, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode);
+	return uprobe_write(vma, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE,
+			    verify_opcode, is_register);
 }
 
 int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
-		 uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify)
+		 uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify,
+		 bool is_register)
 {
 	const unsigned long vaddr = insn_vaddr & PAGE_MASK;
 	struct mm_struct *mm = vma->vm_mm;
-	int ret, is_register;
+	int ret;
 	unsigned int gup_flags = FOLL_FORCE;
 	struct mmu_notifier_range range;
 	struct folio_walk fw;
 	struct folio *folio;
 	struct page *page;
 
-	is_register = is_swbp_insn(insn);
-
 	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
 		return -EINVAL;
 
@@ -555,7 +555,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
 	/* Walk the page tables again, to perform the actual update. */
 	if (folio_walk_start(&fw, vma, vaddr, 0)) {
 		if (fw.page == page)
-			ret = __uprobe_write(vma, &fw, folio, insn_vaddr, insn, nbytes);
+			ret = __uprobe_write(vma, &fw, folio, insn_vaddr, insn, nbytes, is_register);
 		folio_walk_end(&fw, vma);
 	}
 
@@ -593,7 +593,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
 int __weak set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 		unsigned long vaddr)
 {
-	return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN);
+	return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN, true);
 }
 
 static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
@@ -628,7 +628,7 @@ static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, un
 int __weak set_orig_insn(struct arch_uprobe *auprobe,
 		struct vm_area_struct *vma, unsigned long vaddr)
 {
-	return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn);
+	return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn, false);
 }
 
 static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
-- 
2.49.0



* [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (5 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 14:24   ` Oleg Nesterov
  2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Currently unapply_uprobe takes mmap_read_lock, but it might call
remove_breakpoint, which eventually modifies user pages.

The current code writes either the breakpoint or the original
instruction, so it can probably get away with the read lock, but with
the upcoming change that writes multiple instructions at the probed
address we need to ensure that any update to the mm's pages is
exclusive.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c8d88060dfbf..d256c695d7ff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1483,7 +1483,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	int err = 0;
 
-	mmap_read_lock(mm);
+	mmap_write_lock(mm);
 	for_each_vma(vmi, vma) {
 		unsigned long vaddr;
 		loff_t offset;
@@ -1500,7 +1500,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 		vaddr = offset_to_vaddr(vma, uprobe->offset);
 		err |= remove_breakpoint(uprobe, vma, vaddr);
 	}
-	mmap_read_unlock(mm);
+	mmap_write_unlock(mm);
 
 	return err;
 }
-- 
2.49.0



* [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (6 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:51   ` Andrii Nakryiko
                     ` (2 more replies)
  2025-04-21 21:44 ` [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
                   ` (13 subsequent siblings)
  21 siblings, 3 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Add support for a special mapping for the user space trampoline, with
the following functions:

  uprobe_trampoline_get - find or add uprobe_trampoline
  uprobe_trampoline_put - remove or destroy uprobe_trampoline

The user space trampoline is exported as an arch specific user space
special mapping through tramp_mapping, which is initialized in the
following changes with the new uprobe syscall.

The uprobe trampoline needs to be callable/reachable from the probed
address, so while searching for an available address we use the
is_reachable_by_call function to decide whether the uprobe trampoline
is callable from the probed address.

All uprobe_trampoline objects are stored in the uprobes_state object
and are cleaned up when the process mm_struct goes down. Add new arch
hooks for that, because this change is x86_64 specific.

Locking is provided by callers in the following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/kernel/uprobes.c | 131 ++++++++++++++++++++++++++++++++++++++
 include/linux/uprobes.h   |   6 ++
 kernel/events/uprobes.c   |  10 +++
 kernel/fork.c             |   1 +
 4 files changed, 148 insertions(+)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 77050e5a4680..023c55d52138 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -608,6 +608,137 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 		*sr = utask->autask.saved_scratch_register;
 	}
 }
+
+static int tramp_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma)
+{
+	return -EPERM;
+}
+
+static struct page *tramp_mapping_pages[2] __ro_after_init;
+
+static struct vm_special_mapping tramp_mapping = {
+	.name   = "[uprobes-trampoline]",
+	.mremap = tramp_mremap,
+	.pages  = tramp_mapping_pages,
+};
+
+struct uprobe_trampoline {
+	struct hlist_node	node;
+	unsigned long		vaddr;
+	atomic64_t		ref;
+};
+
+static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
+{
+	long delta = (long)(vaddr + 5 - vtramp);
+
+	return delta >= INT_MIN && delta <= INT_MAX;
+}
+
+static unsigned long find_nearest_page(unsigned long vaddr)
+{
+	struct vm_area_struct *vma, *prev = NULL;
+	unsigned long prev_vm_end = PAGE_SIZE;
+	VMA_ITERATOR(vmi, current->mm, 0);
+
+	vma = vma_next(&vmi);
+	while (vma) {
+		if (prev)
+			prev_vm_end = prev->vm_end;
+		if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
+			if (is_reachable_by_call(prev_vm_end, vaddr))
+				return prev_vm_end;
+			if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
+				return vma->vm_start - PAGE_SIZE;
+		}
+		prev = vma;
+		vma = vma_next(&vmi);
+	}
+
+	return 0;
+}
+
+static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	struct mm_struct *mm = current->mm;
+	struct uprobe_trampoline *tramp;
+	struct vm_area_struct *vma;
+
+	if (!user_64bit_mode(regs))
+		return NULL;
+
+	vaddr = find_nearest_page(vaddr);
+	if (!vaddr)
+		return NULL;
+
+	tramp = kzalloc(sizeof(*tramp), GFP_KERNEL);
+	if (unlikely(!tramp))
+		return NULL;
+
+	atomic64_set(&tramp->ref, 1);
+	tramp->vaddr = vaddr;
+
+	vma = _install_special_mapping(mm, tramp->vaddr, PAGE_SIZE,
+				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
+				&tramp_mapping);
+	if (IS_ERR(vma))
+		goto free_area;
+	return tramp;
+
+free_area:
+	kfree(tramp);
+	return NULL;
+}
+
+__maybe_unused
+static struct uprobe_trampoline *uprobe_trampoline_get(unsigned long vaddr)
+{
+	struct uprobes_state *state = &current->mm->uprobes_state;
+	struct uprobe_trampoline *tramp = NULL;
+
+	hlist_for_each_entry(tramp, &state->head_tramps, node) {
+		if (is_reachable_by_call(tramp->vaddr, vaddr)) {
+			atomic64_inc(&tramp->ref);
+			return tramp;
+		}
+	}
+
+	tramp = create_uprobe_trampoline(vaddr);
+	if (!tramp)
+		return NULL;
+
+	hlist_add_head(&tramp->node, &state->head_tramps);
+	return tramp;
+}
+
+static void destroy_uprobe_trampoline(struct uprobe_trampoline *tramp)
+{
+	hlist_del(&tramp->node);
+	kfree(tramp);
+}
+
+__maybe_unused
+static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
+{
+	if (tramp && atomic64_dec_and_test(&tramp->ref))
+		destroy_uprobe_trampoline(tramp);
+}
+
+void arch_uprobe_init_state(struct mm_struct *mm)
+{
+	INIT_HLIST_HEAD(&mm->uprobes_state.head_tramps);
+}
+
+void arch_uprobe_clear_state(struct mm_struct *mm)
+{
+	struct uprobes_state *state = &mm->uprobes_state;
+	struct uprobe_trampoline *tramp;
+	struct hlist_node *n;
+
+	hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
+		destroy_uprobe_trampoline(tramp);
+}
 #else /* 32-bit: */
 /*
  * No RIP-relative addressing on 32-bit
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 6af61e977bfb..bc532d086813 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -17,6 +17,7 @@
 #include <linux/wait.h>
 #include <linux/timer.h>
 #include <linux/seqlock.h>
+#include <linux/mutex.h>
 
 struct uprobe;
 struct vm_area_struct;
@@ -185,6 +186,9 @@ struct xol_area;
 
 struct uprobes_state {
 	struct xol_area		*xol_area;
+#ifdef CONFIG_X86_64
+	struct hlist_head	head_tramps;
+#endif
 };
 
 typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr,
@@ -233,6 +237,8 @@ extern void uprobe_handle_trampoline(struct pt_regs *regs);
 extern void *arch_uretprobe_trampoline(unsigned long *psize);
 extern unsigned long uprobe_get_trampoline_vaddr(void);
 extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
+extern void arch_uprobe_clear_state(struct mm_struct *mm);
+extern void arch_uprobe_init_state(struct mm_struct *mm);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d256c695d7ff..a3107f63f295 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1812,6 +1812,14 @@ static struct xol_area *get_xol_area(void)
 	return area;
 }
 
+void __weak arch_uprobe_clear_state(struct mm_struct *mm)
+{
+}
+
+void __weak arch_uprobe_init_state(struct mm_struct *mm)
+{
+}
+
 /*
  * uprobe_clear_state - Free the area allocated for slots.
  */
@@ -1823,6 +1831,8 @@ void uprobe_clear_state(struct mm_struct *mm)
 	delayed_uprobe_remove(NULL, mm);
 	mutex_unlock(&delayed_uprobe_lock);
 
+	arch_uprobe_clear_state(mm);
+
 	if (!area)
 		return;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..4c2df3816728 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1269,6 +1269,7 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
 	mm->uprobes_state.xol_area = NULL;
+	arch_uprobe_init_state(mm);
 #endif
 }
 
-- 
2.49.0



* [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (7 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 15:51   ` Oleg Nesterov
  2025-04-21 21:44 ` [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
                   ` (12 subsequent siblings)
  21 siblings, 2 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Add a new uprobe syscall that calls the uprobe handlers for a given
'breakpoint' address.

The idea is that the 'breakpoint' address calls the user space
trampoline, which executes the uprobe syscall.

The syscall handler reads the return address of the initial call to
retrieve the original 'breakpoint' address. With this address we find
the related uprobe object and call its consumers.

Add the arch_uprobe_trampoline_mapping function that provides the
uprobe trampoline mapping. This mapping is backed by one global page
initialized at __init time and shared by all the mapping instances.

We do not allow the uprobe syscall to be executed if the caller is not
from the uprobe trampoline mapping.

The uprobe syscall ensures the consumers (bpf programs) see the
register values in the state they were in before the trampoline was
called.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 arch/x86/kernel/uprobes.c              | 122 +++++++++++++++++++++++++
 include/linux/syscalls.h               |   2 +
 include/linux/uprobes.h                |   1 +
 kernel/events/uprobes.c                |  17 ++++
 kernel/sys_ni.c                        |   1 +
 6 files changed, 144 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index cfb5ca41e30d..9fd1291e7bdf 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -345,6 +345,7 @@
 333	common	io_pgetevents		sys_io_pgetevents
 334	common	rseq			sys_rseq
 335	common	uretprobe		sys_uretprobe
+336	common	uprobe			sys_uprobe
 # don't use numbers 387 through 423, add new calls after the last
 # 'common' entry
 424	common	pidfd_send_signal	sys_pidfd_send_signal
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 023c55d52138..01b3035e01ea 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -739,6 +739,128 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
 	hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
 		destroy_uprobe_trampoline(tramp);
 }
+
+static bool __in_uprobe_trampoline(unsigned long ip)
+{
+	struct vm_area_struct *vma = vma_lookup(current->mm, ip);
+
+	return vma && vma_is_special_mapping(vma, &tramp_mapping);
+}
+
+static bool in_uprobe_trampoline(unsigned long ip)
+{
+	struct mm_struct *mm = current->mm;
+	bool found, retry = true;
+	unsigned int seq;
+
+	rcu_read_lock();
+	if (mmap_lock_speculate_try_begin(mm, &seq)) {
+		found = __in_uprobe_trampoline(ip);
+		retry = mmap_lock_speculate_retry(mm, seq);
+	}
+	rcu_read_unlock();
+
+	if (retry) {
+		mmap_read_lock(mm);
+		found = __in_uprobe_trampoline(ip);
+		mmap_read_unlock(mm);
+	}
+	return found;
+}
+
+SYSCALL_DEFINE0(uprobe)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	unsigned long ip, sp, ax_r11_cx_ip[4];
+	int err;
+
+	/* Allow execution only from uprobe trampolines. */
+	if (!in_uprobe_trampoline(regs->ip))
+		goto sigill;
+
+	err = copy_from_user(ax_r11_cx_ip, (void __user *)regs->sp, sizeof(ax_r11_cx_ip));
+	if (err)
+		goto sigill;
+
+	ip = regs->ip;
+
+	/*
+	 * expose the "right" values of ax/r11/cx/ip/sp to uprobe_consumer/s, plus:
+	 * - adjust ip to the probe address (the call saved the next instruction's address)
+	 * - adjust sp to the probe's stack frame (check trampoline code)
+	 */
+	regs->ax  = ax_r11_cx_ip[0];
+	regs->r11 = ax_r11_cx_ip[1];
+	regs->cx  = ax_r11_cx_ip[2];
+	regs->ip  = ax_r11_cx_ip[3] - 5;
+	regs->sp += sizeof(ax_r11_cx_ip);
+	regs->orig_ax = -1;
+
+	sp = regs->sp;
+
+	handle_syscall_uprobe(regs, regs->ip);
+
+	/*
+	 * Some of the uprobe consumers have changed sp, there is nothing
+	 * we can do, just return via iret.
+	 */
+	if (regs->sp != sp)
+		return regs->ax;
+
+	regs->sp -= sizeof(ax_r11_cx_ip);
+
+	/* for the case uprobe_consumer has changed ax/r11/cx */
+	ax_r11_cx_ip[0] = regs->ax;
+	ax_r11_cx_ip[1] = regs->r11;
+	ax_r11_cx_ip[2] = regs->cx;
+
+	/* keep return address unless we are instructed otherwise */
+	if (ax_r11_cx_ip[3] - 5 != regs->ip)
+		ax_r11_cx_ip[3] = regs->ip;
+
+	regs->ip = ip;
+
+	err = copy_to_user((void __user *)regs->sp, ax_r11_cx_ip, sizeof(ax_r11_cx_ip));
+	if (err)
+		goto sigill;
+
+	/* ensure sysret, see do_syscall_64() */
+	regs->r11 = regs->flags;
+	regs->cx  = regs->ip;
+	return 0;
+
+sigill:
+	force_sig(SIGILL);
+	return -1;
+}
+
+asm (
+	".pushsection .rodata\n"
+	".balign " __stringify(PAGE_SIZE) "\n"
+	"uprobe_trampoline_entry:\n"
+	"push %rcx\n"
+	"push %r11\n"
+	"push %rax\n"
+	"movq $" __stringify(__NR_uprobe) ", %rax\n"
+	"syscall\n"
+	"pop %rax\n"
+	"pop %r11\n"
+	"pop %rcx\n"
+	"ret\n"
+	".balign " __stringify(PAGE_SIZE) "\n"
+	".popsection\n"
+);
+
+extern u8 uprobe_trampoline_entry[];
+
+static int __init arch_uprobes_init(void)
+{
+	tramp_mapping_pages[0] = virt_to_page(uprobe_trampoline_entry);
+	return 0;
+}
+
+late_initcall(arch_uprobes_init);
+
 #else /* 32-bit: */
 /*
  * No RIP-relative addressing on 32-bit
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e5603cc91963..b0cc60f1c458 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -998,6 +998,8 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
 
 asmlinkage long sys_uretprobe(void);
 
+asmlinkage long sys_uprobe(void);
+
 /* pciconfig: alpha, arm, arm64, ia64, sparc */
 asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
 				unsigned long off, unsigned long len,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bc532d086813..bbe218ff16cc 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -239,6 +239,7 @@ extern unsigned long uprobe_get_trampoline_vaddr(void);
 extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
 extern void arch_uprobe_clear_state(struct mm_struct *mm);
 extern void arch_uprobe_init_state(struct mm_struct *mm);
+extern void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index a3107f63f295..97a7b9f0c7ca 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2782,6 +2782,23 @@ static void handle_swbp(struct pt_regs *regs)
 	rcu_read_unlock_trace();
 }
 
+void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr)
+{
+	struct uprobe *uprobe;
+	int is_swbp;
+
+	guard(rcu_tasks_trace)();
+
+	uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
+	if (!uprobe)
+		return;
+	if (!get_utask())
+		return;
+	if (arch_uprobe_ignore(&uprobe->arch, regs))
+		return;
+	handler_chain(uprobe, regs);
+}
+
 /*
  * Perform required fix-ups and disable singlestep.
  * Allow pending signals to take effect.
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c..bf5d05c635ff 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -392,3 +392,4 @@ COND_SYSCALL(setuid16);
 COND_SYSCALL(rseq);
 
 COND_SYSCALL(uretprobe);
+COND_SYSCALL(uprobe);
-- 
2.49.0



* [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (8 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23  0:04   ` Andrii Nakryiko
  2025-04-27 17:11   ` Oleg Nesterov
  2025-04-21 21:44 ` [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
                   ` (11 subsequent siblings)
  21 siblings, 2 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Put together all the previously added pieces to support optimized
uprobes on top of the 5-byte nop instruction.

The current uprobe execution goes through the following steps:

  - installs the breakpoint instruction over the original instruction
  - the exception handler is hit and calls the related uprobe consumers
  - either simulates the original instruction or does out-of-line
    single-step execution of it
  - returns to user space

The optimized uprobe path does the following:

  - checks that the original instruction is a 5-byte nop (plus other checks)
  - adds (or reuses an existing) user space trampoline with the uprobe
    syscall
  - overwrites the original instruction (5-byte nop) with a call to the
    user space trampoline
  - the user space trampoline executes the uprobe syscall, which calls
    the related uprobe consumers
  - the trampoline returns to the next instruction

This approach won't speed up all uprobes, as it's limited to nop5 as
the original instruction, but we plan to use nop5 as the USDT probe
instruction (USDT currently uses a single-byte nop) and thus speed up
USDT probes.

The arch_uprobe_optimize function triggers the uprobe optimization and
is called after the first uprobe hit. I originally had it called on
uprobe installation, but that clashed with the elf loader, because the
user space trampoline was added in a place where the loader might need
to put elf segments, so I decided to do it after the first uprobe hit,
when loading is done.

The uprobe is un-optimized in the arch specific set_orig_insn call.

The instruction overwrite is x86 specific and needs to go through 3
updates (on top of the nop5 instruction):

  - write int3 into the 1st byte
  - write the last 4 bytes of the call instruction
  - update the call instruction opcode

And the cleanup goes through similar stages in reverse:

  - overwrite the call opcode with the breakpoint (int3)
  - write the last 4 bytes of the nop5 instruction
  - write the nop5 first instruction byte

We do not unmap and release the uprobe trampoline when it's no longer
needed, because there's no easy way to make sure none of the threads
is still inside the trampoline. But we do not waste memory, because
there's just a single physical page backing all the uprobe trampoline
mappings.

We do waste a page-table frame for every 4GB range by keeping the
uprobe trampoline page mapped, but that seems ok.

We benefit from the fact that set_swbp and set_orig_insn are called
under mmap_write_lock(mm), so we can use the current instruction as
the state the uprobe is in - nop5/breakpoint/call trampoline -
and decide the needed action (optimize/un-optimize) based on that.

Attaching the speedup from the benchs/run_bench_uprobes.sh script:

current:
        usermode-count :  152.604 ± 0.044M/s
        syscall-count  :   13.359 ± 0.042M/s
-->     uprobe-nop     :    3.229 ± 0.002M/s
        uprobe-push    :    3.086 ± 0.004M/s
        uprobe-ret     :    1.114 ± 0.004M/s
        uprobe-nop5    :    1.121 ± 0.005M/s
        uretprobe-nop  :    2.145 ± 0.002M/s
        uretprobe-push :    2.070 ± 0.001M/s
        uretprobe-ret  :    0.931 ± 0.001M/s
        uretprobe-nop5 :    0.957 ± 0.001M/s

after the change:
        usermode-count :  152.448 ± 0.244M/s
        syscall-count  :   14.321 ± 0.059M/s
        uprobe-nop     :    3.148 ± 0.007M/s
        uprobe-push    :    2.976 ± 0.004M/s
        uprobe-ret     :    1.068 ± 0.003M/s
-->     uprobe-nop5    :    7.038 ± 0.007M/s
        uretprobe-nop  :    2.109 ± 0.004M/s
        uretprobe-push :    2.035 ± 0.001M/s
        uretprobe-ret  :    0.908 ± 0.001M/s
        uretprobe-nop5 :    3.377 ± 0.009M/s

I see a bit more speedup on Intel (above) compared to AMD. The big
nop5 speedup is partly due to emulating nop5 and partly due to the
optimization.

The key speedup we are after is the USDT switch from nop to nop5:
        uprobe-nop     :    3.148 ± 0.007M/s
        uprobe-nop5    :    7.038 ± 0.007M/s

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 arch/x86/include/asm/uprobes.h |   7 +
 arch/x86/kernel/uprobes.c      | 281 ++++++++++++++++++++++++++++++++-
 include/linux/uprobes.h        |   6 +-
 kernel/events/uprobes.c        |  15 +-
 4 files changed, 301 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 678fb546f0a7..1ee2e5115955 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -20,6 +20,11 @@ typedef u8 uprobe_opcode_t;
 #define UPROBE_SWBP_INSN		0xcc
 #define UPROBE_SWBP_INSN_SIZE		   1
 
+enum {
+	ARCH_UPROBE_FLAG_CAN_OPTIMIZE   = 0,
+	ARCH_UPROBE_FLAG_OPTIMIZE_FAIL  = 1,
+};
+
 struct uprobe_xol_ops;
 
 struct arch_uprobe {
@@ -45,6 +50,8 @@ struct arch_uprobe {
 			u8	ilen;
 		}			push;
 	};
+
+	unsigned long flags;
 };
 
 struct arch_uprobe_task {
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 01b3035e01ea..d5ef04a1626d 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -18,6 +18,7 @@
 #include <asm/processor.h>
 #include <asm/insn.h>
 #include <asm/mmu_context.h>
+#include <asm/nops.h>
 
 /* Post-execution fixups. */
 
@@ -691,7 +692,6 @@ static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
 	return NULL;
 }
 
-__maybe_unused
 static struct uprobe_trampoline *uprobe_trampoline_get(unsigned long vaddr)
 {
 	struct uprobes_state *state = &current->mm->uprobes_state;
@@ -718,7 +718,6 @@ static void destroy_uprobe_trampoline(struct uprobe_trampoline *tramp)
 	kfree(tramp);
 }
 
-__maybe_unused
 static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
 {
 	if (tramp && atomic64_dec_and_test(&tramp->ref))
@@ -861,6 +860,277 @@ static int __init arch_uprobes_init(void)
 
 late_initcall(arch_uprobes_init);
 
+enum {
+	OPT_PART,
+	OPT_INSN,
+	UNOPT_INT3,
+	UNOPT_PART,
+};
+
+struct write_opcode_ctx {
+	unsigned long base;
+	int update;
+};
+
+static int is_call_insn(uprobe_opcode_t *insn)
+{
+	return *insn == CALL_INSN_OPCODE;
+}
+
+static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode,
+		       int nbytes, void *data)
+{
+	struct write_opcode_ctx *ctx = data;
+	uprobe_opcode_t old_opcode[5];
+
+	uprobe_copy_from_page(page, ctx->base, (uprobe_opcode_t *) &old_opcode, 5);
+
+	switch (ctx->update) {
+	case OPT_PART:
+	case OPT_INSN:
+		if (is_swbp_insn(&old_opcode[0]))
+			return 1;
+		break;
+	case UNOPT_INT3:
+		if (is_call_insn(&old_opcode[0]))
+			return 1;
+		break;
+	case UNOPT_PART:
+		if (is_swbp_insn(&old_opcode[0]))
+			return 1;
+		break;
+	}
+
+	return -1;
+}
+
+static int write_insn(struct vm_area_struct *vma, unsigned long vaddr,
+		      uprobe_opcode_t *insn, int nbytes, void *ctx)
+{
+	return uprobe_write(vma, vaddr, insn, nbytes, verify_insn, true, ctx);
+}
+
+static void relative_call(void *dest, long from, long to)
+{
+	struct __packed __arch_relative_insn {
+		u8 op;
+		s32 raddr;
+	} *insn;
+
+	insn = (struct __arch_relative_insn *)dest;
+	insn->raddr = (s32)(to - (from + 5));
+	insn->op = CALL_INSN_OPCODE;
+}
+
+static int swbp_optimize(struct vm_area_struct *vma, unsigned long vaddr, unsigned long tramp)
+{
+	struct write_opcode_ctx ctx = {
+		.base = vaddr,
+	};
+	char call[5];
+	int err;
+
+	relative_call(call, vaddr, tramp);
+
+	/*
+	 * We are in state where breakpoint (int3) is installed on top of first
+	 * byte of the nop5 instruction. We will do following steps to overwrite
+	 * this to call instruction:
+	 *
+	 * - sync cores
+	 * - write last 4 bytes of the call instruction
+	 * - sync cores
+	 * - update the call instruction opcode
+	 */
+
+	text_poke_sync();
+
+	ctx.update = OPT_PART;
+	err = write_insn(vma, vaddr + 1, call + 1, 4, &ctx);
+	if (err)
+		return err;
+
+	text_poke_sync();
+
+	ctx.update = OPT_INSN;
+	return write_insn(vma, vaddr, call, 1, &ctx);
+}
+
+static int swbp_unoptimize(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
+			   unsigned long vaddr)
+{
+	uprobe_opcode_t int3 = UPROBE_SWBP_INSN;
+	struct write_opcode_ctx ctx = {
+		.base = vaddr,
+	};
+	int err;
+
+	/*
+	 * We need to overwrite call instruction into nop5 instruction with
+	 * breakpoint (int3) installed on top of its first byte. We will:
+	 *
+	 * - overwrite call opcode with breakpoint (int3)
+	 * - sync cores
+	 * - write last 4 bytes of the nop5 instruction
+	 * - sync cores
+	 */
+
+	ctx.update = UNOPT_INT3;
+	err = write_insn(vma, vaddr, &int3, 1, &ctx);
+	if (err)
+		return err;
+
+	text_poke_sync();
+
+	ctx.update = UNOPT_PART;
+	err = write_insn(vma, vaddr + 1, (uprobe_opcode_t *) auprobe->insn + 1, 4, &ctx);
+
+	text_poke_sync();
+	return err;
+}
+
+static int copy_from_vaddr(struct mm_struct *mm, unsigned long vaddr, void *dst, int len)
+{
+	unsigned int gup_flags = FOLL_FORCE|FOLL_SPLIT_PMD;
+	struct vm_area_struct *vma;
+	struct page *page;
+
+	page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
+	if (IS_ERR(page))
+		return PTR_ERR(page);
+	uprobe_copy_from_page(page, vaddr, dst, len);
+	put_page(page);
+	return 0;
+}
+
+static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
+{
+	struct __packed __arch_relative_insn {
+		u8 op;
+		s32 raddr;
+	} *call = (struct __arch_relative_insn *) insn;
+
+	if (!is_call_insn(insn))
+		return false;
+	return __in_uprobe_trampoline(vaddr + 5 + call->raddr);
+}
+
+static int is_optimized(struct mm_struct *mm, unsigned long vaddr, bool *optimized)
+{
+	uprobe_opcode_t insn[5];
+	int err;
+
+	err = copy_from_vaddr(mm, vaddr, &insn, 5);
+	if (err)
+		return err;
+	*optimized = __is_optimized((uprobe_opcode_t *)&insn, vaddr);
+	return 0;
+}
+
+static bool should_optimize(struct arch_uprobe *auprobe)
+{
+	return !test_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags) &&
+		test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags);
+}
+
+int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
+	     unsigned long vaddr)
+{
+	if (should_optimize(auprobe)) {
+		bool optimized = false;
+		int err;
+
+		/*
+		 * We could race with another thread that already optimized the probe,
+		 * so let's not overwrite it with int3 again in this case.
+		 */
+		err = is_optimized(vma->vm_mm, vaddr, &optimized);
+		if (err || optimized)
+			return err;
+	}
+	return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN, true);
+}
+
+int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
+		  unsigned long vaddr)
+{
+	if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
+		struct mm_struct *mm = vma->vm_mm;
+		bool optimized = false;
+		int err;
+
+		err = is_optimized(mm, vaddr, &optimized);
+		if (err)
+			return err;
+		if (optimized)
+			WARN_ON_ONCE(swbp_unoptimize(auprobe, vma, vaddr));
+	}
+	return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn, false);
+}
+
+static int __arch_uprobe_optimize(struct mm_struct *mm, unsigned long vaddr)
+{
+	struct uprobe_trampoline *tramp;
+	struct vm_area_struct *vma;
+	int err = 0;
+
+	vma = find_vma(mm, vaddr);
+	if (!vma)
+		return -1;
+	tramp = uprobe_trampoline_get(vaddr);
+	if (!tramp)
+		return -1;
+	err = swbp_optimize(vma, vaddr, tramp->vaddr);
+	if (WARN_ON_ONCE(err))
+		uprobe_trampoline_put(tramp);
+	return err;
+}
+
+void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+	struct mm_struct *mm = current->mm;
+	uprobe_opcode_t insn[5];
+
+	/*
+	 * Do not optimize if shadow stack is enabled, the return address hijack
+	 * code in arch_uretprobe_hijack_return_addr updates wrong frame when
+	 * the entry uprobe is optimized and the shadow stack crashes the app.
+	 */
+	if (shstk_is_enabled())
+		return;
+
+	if (!should_optimize(auprobe))
+		return;
+
+	mmap_write_lock(mm);
+
+	/*
+	 * Check if some other thread already optimized the uprobe for us,
+	 * if it's the case just go away silently.
+	 */
+	if (copy_from_vaddr(mm, vaddr, &insn, 5))
+		goto unlock;
+	if (!is_swbp_insn((uprobe_opcode_t*) &insn))
+		goto unlock;
+
+	/*
+	 * If we fail to optimize the uprobe we set the fail bit so the
+	 * above should_optimize will fail from now on.
+	 */
+	if (__arch_uprobe_optimize(mm, vaddr))
+		set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags);
+
+unlock:
+	mmap_write_unlock(mm);
+}
+
+static bool can_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+	if (memcmp(&auprobe->insn, x86_nops[5], 5))
+		return false;
+	/* We can't do cross page atomic writes yet. */
+	return PAGE_SIZE - (vaddr & ~PAGE_MASK) >= 5;
+}
 #else /* 32-bit: */
 /*
  * No RIP-relative addressing on 32-bit
@@ -874,6 +1144,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
 {
 }
+static bool can_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+	return false;
+}
 #endif /* CONFIG_X86_64 */
 
 struct uprobe_xol_ops {
@@ -1240,6 +1514,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 	if (ret)
 		return ret;
 
+	if (can_optimize(auprobe, addr))
+		set_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags);
+
 	ret = branch_setup_xol_ops(auprobe, &insn);
 	if (ret != -ENOSYS)
 		return ret;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index bbe218ff16cc..d4c1fed9a9e4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -192,7 +192,7 @@ struct uprobes_state {
 };
 
 typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr,
-				     uprobe_opcode_t *insn, int nbytes);
+				     uprobe_opcode_t *insn, int nbytes, void *data);
 
 extern void __init uprobes_init(void);
 extern int set_swbp(struct arch_uprobe *aup, struct vm_area_struct *vma, unsigned long vaddr);
@@ -204,7 +204,8 @@ extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct vm_area_struct *vma, unsigned long vaddr,
 			       uprobe_opcode_t opcode, bool is_register);
 extern int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
-			uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool is_register);
+			uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool is_register,
+			void *data);
 extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
@@ -240,6 +241,7 @@ extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *
 extern void arch_uprobe_clear_state(struct mm_struct *mm);
 extern void arch_uprobe_init_state(struct mm_struct *mm);
 extern void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr);
+extern void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr);
 #else /* !CONFIG_UPROBES */
 struct uprobes_state {
 };
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 97a7b9f0c7ca..408a134c1a7b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -192,7 +192,7 @@ static void copy_to_page(struct page *page, unsigned long vaddr, const void *src
 }
 
 static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *insn,
-			 int nbytes)
+			 int nbytes, void *data)
 {
 	uprobe_opcode_t old_opcode;
 	bool is_swbp;
@@ -491,12 +491,12 @@ int uprobe_write_opcode(struct vm_area_struct *vma, const unsigned long opcode_v
 			uprobe_opcode_t opcode, bool is_register)
 {
 	return uprobe_write(vma, opcode_vaddr, &opcode, UPROBE_SWBP_INSN_SIZE,
-			    verify_opcode, is_register);
+			    verify_opcode, is_register, NULL);
 }
 
 int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
 		 uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify,
-		 bool is_register)
+		 bool is_register, void *data)
 {
 	const unsigned long vaddr = insn_vaddr & PAGE_MASK;
 	struct mm_struct *mm = vma->vm_mm;
@@ -527,7 +527,7 @@ int uprobe_write(struct vm_area_struct *vma, const unsigned long insn_vaddr,
 		goto out;
 	folio = page_folio(page);
 
-	ret = verify(page, insn_vaddr, insn, nbytes);
+	ret = verify(page, insn_vaddr, insn, nbytes, data);
 	if (ret <= 0) {
 		folio_put(folio);
 		goto out;
@@ -2707,6 +2707,10 @@ bool __weak arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
 	return true;
 }
 
+void __weak arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+}
+
 /*
  * Run handler and ask thread to singlestep.
  * Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -2771,6 +2775,9 @@ static void handle_swbp(struct pt_regs *regs)
 
 	handler_chain(uprobe, regs);
 
+	/* Try to optimize after first hit. */
+	arch_uprobe_optimize(&uprobe->arch, bp_vaddr);
+
 	if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
 		goto out;
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (9 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:33   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Using a 5-byte nop for x86_64 usdt probes so the kernel can switch
them to optimized uprobes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
index 1fcfa5160231..1d62c06f5ddc 100644
--- a/tools/testing/selftests/bpf/sdt.h
+++ b/tools/testing/selftests/bpf/sdt.h
@@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
 #define _SDT_NOP	nop
 #endif
 
+/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
+#if defined(__x86_64__)
+# define _SDT_DEF_NOP _SDT_ASM_5(990:	.byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
+#else
+# define _SDT_DEF_NOP _SDT_ASM_1(990:	_SDT_NOP)
+#endif
+
 #define _SDT_NOTE_NAME	"stapsdt"
 #define _SDT_NOTE_TYPE	3
 
@@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
 
 #define _SDT_ASM_BODY(provider, name, pack_args, args, ...)		      \
   _SDT_DEF_MACROS							      \
-  _SDT_ASM_1(990:	_SDT_NOP)					      \
+  _SDT_DEF_NOP								      \
   _SDT_ASM_3(		.pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
   _SDT_ASM_1(		.balign 4)					      \
   _SDT_ASM_3(		.4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)	      \
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (10 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:34   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding __test_uprobe_syscall with a single non-x86_64 stub to execute all
the tests, so we don't need to keep adding separate non-x86_64 stub
functions for new tests.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 34 +++++++------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index c397336fe1ed..2b00f16406c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -350,29 +350,8 @@ static void test_uretprobe_shadow_stack(void)
 
 	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
 }
-#else
-static void test_uretprobe_regs_equal(void)
-{
-	test__skip();
-}
-
-static void test_uretprobe_regs_change(void)
-{
-	test__skip();
-}
-
-static void test_uretprobe_syscall_call(void)
-{
-	test__skip();
-}
 
-static void test_uretprobe_shadow_stack(void)
-{
-	test__skip();
-}
-#endif
-
-void test_uprobe_syscall(void)
+static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
 		test_uretprobe_regs_equal();
@@ -383,3 +362,14 @@ void test_uprobe_syscall(void)
 	if (test__start_subtest("uretprobe_shadow_stack"))
 		test_uretprobe_shadow_stack();
 }
+#else
+static void __test_uprobe_syscall(void)
+{
+	test__skip();
+}
+#endif
+
+void test_uprobe_syscall(void)
+{
+	__test_uprobe_syscall();
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (11 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:36   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Renaming the uprobe_syscall_executed prog to test_uretprobe_multi
so it fits the naming scheme of the following changes that add more programs.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c   | 8 ++++----
 .../testing/selftests/bpf/progs/uprobe_syscall_executed.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 2b00f16406c8..3c74a079e6d9 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -277,10 +277,10 @@ static void test_uretprobe_syscall_call(void)
 		_exit(0);
 	}
 
-	skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
-							    "/proc/self/exe",
-							    "uretprobe_syscall_call", &opts);
-	if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
+	skel->links.test_uretprobe_multi = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,
+							pid, "/proc/self/exe",
+							"uretprobe_syscall_call", &opts);
+	if (!ASSERT_OK_PTR(skel->links.test_uretprobe_multi, "bpf_program__attach_uprobe_multi"))
 		goto cleanup;
 
 	/* kick the child */
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
index 0d7f1a7db2e2..2e1b689ed4fb 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -10,8 +10,8 @@ char _license[] SEC("license") = "GPL";
 int executed = 0;
 
 SEC("uretprobe.multi")
-int test(struct pt_regs *regs)
+int test_uretprobe_multi(struct pt_regs *ctx)
 {
-	executed = 1;
+	executed++;
 	return 0;
 }
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (12 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:40   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding tests for optimized uprobe/usdt probes.

Checking that we get the expected trampoline and that the attached bpf
programs get executed properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 278 +++++++++++++++++-
 .../bpf/progs/uprobe_syscall_executed.c       |  37 +++
 2 files changed, 314 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 3c74a079e6d9..16effe0bca1d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -14,6 +14,9 @@
 #include <asm/prctl.h>
 #include "uprobe_syscall.skel.h"
 #include "uprobe_syscall_executed.skel.h"
+#include "sdt.h"
+
+#pragma GCC diagnostic ignored "-Wattributes"
 
 __naked unsigned long uretprobe_regs_trigger(void)
 {
@@ -301,6 +304,262 @@ static void test_uretprobe_syscall_call(void)
 	close(go[0]);
 }
 
+#define TRAMP "[uprobes-trampoline]"
+
+__attribute__((aligned(16)))
+__nocf_check __weak __naked void uprobe_test(void)
+{
+	asm volatile ("					\n"
+		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00	\n"
+		"ret					\n"
+	);
+}
+
+__attribute__((aligned(16)))
+__nocf_check __weak void usdt_test(void)
+{
+	STAP_PROBE(optimized_uprobe, usdt);
+}
+
+static int find_uprobes_trampoline(void **start, void **end)
+{
+	char line[128];
+	int ret = -1;
+	FILE *maps;
+
+	maps = fopen("/proc/self/maps", "r");
+	if (!maps) {
+		fprintf(stderr, "cannot open maps\n");
+		return -1;
+	}
+
+	while (fgets(line, sizeof(line), maps)) {
+		int m = -1;
+
+		/* We care only about private r-x mappings. */
+		if (sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n", start, end, &m) != 2)
+			continue;
+		if (m < 0)
+			continue;
+		if (!strncmp(&line[m], TRAMP, sizeof(TRAMP)-1)) {
+			ret = 0;
+			break;
+		}
+	}
+
+	fclose(maps);
+	return ret;
+}
+
+static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+
+static void *find_nop5(void *fn)
+{
+	int i;
+
+	for (i = 0; i < 10; i++) {
+		if (!memcmp(nop5, fn + i, 5))
+			return fn + i;
+	}
+	return NULL;
+}
+
+typedef void (__attribute__((nocf_check)) *trigger_t)(void);
+
+static bool shstk_is_enabled;
+
+static void check_attach(struct uprobe_syscall_executed *skel, trigger_t trigger,
+			 void *addr, int executed)
+{
+	void *tramp_start, *tramp_end;
+	struct __arch_relative_insn {
+		u8 op;
+		s32 raddr;
+	} __packed *call;
+	s32 delta;
+	u8 *bp;
+
+	/* Uprobe gets optimized after first trigger, so let's press twice. */
+	trigger();
+	trigger();
+
+	if (!shstk_is_enabled &&
+	    !ASSERT_OK(find_uprobes_trampoline(&tramp_start, &tramp_end), "uprobes_trampoline"))
+		return;
+
+	/* Make sure bpf program got executed.. */
+	ASSERT_EQ(skel->bss->executed, executed, "executed");
+
+	if (shstk_is_enabled) {
+		/* .. and check optimization is disabled under shadow stack. */
+		bp = (u8 *) addr;
+		ASSERT_EQ(*bp, 0xcc, "int3");
+	} else {
+		/* .. and check the trampoline is as expected. */
+		call = (struct __arch_relative_insn *) addr;
+		delta = (unsigned long) tramp_start - ((unsigned long) addr + 5);
+
+		ASSERT_EQ(call->op, 0xe8, "call");
+		ASSERT_EQ(call->raddr, delta, "delta");
+		ASSERT_EQ(tramp_end - tramp_start, 4096, "size");
+	}
+}
+
+static void check_detach(struct uprobe_syscall_executed *skel, trigger_t trigger, void *addr)
+{
+	void *tramp_start, *tramp_end;
+
+	/* [uprobes_trampoline] stays after detach */
+	ASSERT_OK(!shstk_is_enabled &&
+		  find_uprobes_trampoline(&tramp_start, &tramp_end), "uprobes_trampoline");
+	ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
+}
+
+static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
+		  trigger_t trigger, void *addr, int executed)
+{
+	check_attach(skel, trigger, addr, executed);
+	bpf_link__destroy(link);
+	check_detach(skel, trigger, addr);
+}
+
+static void test_uprobe_legacy(void)
+{
+	struct uprobe_syscall_executed *skel = NULL;
+	LIBBPF_OPTS(bpf_uprobe_opts, opts,
+		.retprobe = true,
+	);
+	struct bpf_link *link;
+	unsigned long offset;
+
+	offset = get_uprobe_offset(&uprobe_test);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		goto cleanup;
+
+	/* uprobe */
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		return;
+
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+				0, "/proc/self/exe", offset, NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 2);
+
+	/* uretprobe */
+	skel->bss->executed = 0;
+
+	link = bpf_program__attach_uprobe_opts(skel->progs.test_uretprobe,
+				0, "/proc/self/exe", offset, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 2);
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_multi(void)
+{
+	struct uprobe_syscall_executed *skel = NULL;
+	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts);
+	struct bpf_link *link;
+	unsigned long offset;
+
+	offset = get_uprobe_offset(&uprobe_test);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		goto cleanup;
+
+	opts.offsets = &offset;
+	opts.cnt = 1;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		return;
+
+	/* uprobe.multi */
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_multi,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 2);
+
+	/* uretprobe.multi */
+	skel->bss->executed = 0;
+	opts.retprobe = true;
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 2);
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_session(void)
+{
+	struct uprobe_syscall_executed *skel = NULL;
+	LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+		.session = true,
+	);
+	struct bpf_link *link;
+	unsigned long offset;
+
+	offset = get_uprobe_offset(&uprobe_test);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		goto cleanup;
+
+	opts.offsets = &offset;
+	opts.cnt = 1;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		return;
+
+	link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_session,
+				0, "/proc/self/exe", NULL, &opts);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+		goto cleanup;
+
+	check(skel, link, uprobe_test, uprobe_test, 4);
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_usdt(void)
+{
+	struct uprobe_syscall_executed *skel;
+	struct bpf_link *link;
+	void *addr;
+
+	errno = 0;
+	addr = find_nop5(usdt_test);
+	if (!ASSERT_OK_PTR(addr, "find_nop5"))
+		return;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		return;
+
+	link = bpf_program__attach_usdt(skel->progs.test_usdt,
+				-1 /* all PIDs */, "/proc/self/exe",
+				"optimized_uprobe", "usdt", NULL);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_usdt"))
+		goto cleanup;
+
+	check(skel, link, usdt_test, addr, 2);
+
+cleanup:
+	uprobe_syscall_executed__destroy(skel);
+}
+
 /*
  * Borrowed from tools/testing/selftests/x86/test_shadow_stack.c.
  *
@@ -343,11 +602,20 @@ static void test_uretprobe_shadow_stack(void)
 		return;
 	}
 
-	/* Run all of the uretprobe tests. */
+	/* Run all the tests with shadow stack in place. */
+	shstk_is_enabled = true;
+
 	test_uretprobe_regs_equal();
 	test_uretprobe_regs_change();
 	test_uretprobe_syscall_call();
 
+	test_uprobe_legacy();
+	test_uprobe_multi();
+	test_uprobe_session();
+	test_uprobe_usdt();
+
+	shstk_is_enabled = false;
+
 	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
 }
 
@@ -361,6 +629,14 @@ static void __test_uprobe_syscall(void)
 		test_uretprobe_syscall_call();
 	if (test__start_subtest("uretprobe_shadow_stack"))
 		test_uretprobe_shadow_stack();
+	if (test__start_subtest("uprobe_legacy"))
+		test_uprobe_legacy();
+	if (test__start_subtest("uprobe_multi"))
+		test_uprobe_multi();
+	if (test__start_subtest("uprobe_session"))
+		test_uprobe_session();
+	if (test__start_subtest("uprobe_usdt"))
+		test_uprobe_usdt();
 }
 #else
 static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
index 2e1b689ed4fb..7bb4338c3ee2 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "vmlinux.h"
 #include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/usdt.bpf.h>
 #include <string.h>
 
 struct pt_regs regs;
@@ -9,9 +11,44 @@ char _license[] SEC("license") = "GPL";
 
 int executed = 0;
 
+SEC("uprobe")
+int BPF_UPROBE(test_uprobe)
+{
+	executed++;
+	return 0;
+}
+
+SEC("uretprobe")
+int BPF_URETPROBE(test_uretprobe)
+{
+	executed++;
+	return 0;
+}
+
+SEC("uprobe.multi")
+int test_uprobe_multi(struct pt_regs *ctx)
+{
+	executed++;
+	return 0;
+}
+
 SEC("uretprobe.multi")
 int test_uretprobe_multi(struct pt_regs *ctx)
 {
 	executed++;
 	return 0;
 }
+
+SEC("uprobe.session")
+int test_uprobe_session(struct pt_regs *ctx)
+{
+	executed++;
+	return 0;
+}
+
+SEC("usdt")
+int test_usdt(struct pt_regs *ctx)
+{
+	executed++;
+	return 0;
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (13 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:42   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 16/22] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding a test that makes sure parallel execution of the uprobe and
attach/detach of the optimized uprobe on it work properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 16effe0bca1d..57ef1207c3f5 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -619,6 +619,78 @@ static void test_uretprobe_shadow_stack(void)
 	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
 }
 
+static volatile bool race_stop;
+
+static void *worker_trigger(void *arg)
+{
+	unsigned long rounds = 0;
+
+	while (!race_stop) {
+		uprobe_test();
+		rounds++;
+	}
+
+	printf("tid %d trigger rounds: %lu\n", gettid(), rounds);
+	return NULL;
+}
+
+static void *worker_attach(void *arg)
+{
+	struct uprobe_syscall_executed *skel;
+	unsigned long rounds = 0, offset;
+
+	offset = get_uprobe_offset(&uprobe_test);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		return NULL;
+
+	skel = uprobe_syscall_executed__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+		return NULL;
+
+	while (!race_stop) {
+		skel->links.test_uprobe = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+					0, "/proc/self/exe", offset, NULL);
+		if (!ASSERT_OK_PTR(skel->links.test_uprobe, "bpf_program__attach_uprobe_opts"))
+			break;
+
+		bpf_link__destroy(skel->links.test_uprobe);
+		skel->links.test_uprobe = NULL;
+		rounds++;
+	}
+
+	printf("tid %d attach rounds: %lu hits: %d\n", gettid(), rounds, skel->bss->executed);
+	uprobe_syscall_executed__destroy(skel);
+	return NULL;
+}
+
+static void test_uprobe_race(void)
+{
+	int err, i, nr_threads;
+	pthread_t *threads;
+
+	nr_threads = libbpf_num_possible_cpus();
+	if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
+		return;
+
+	threads = malloc(sizeof(*threads) * nr_threads);
+	if (!ASSERT_OK_PTR(threads, "malloc"))
+		return;
+
+	for (i = 0; i < nr_threads; i++) {
+		err = pthread_create(&threads[i], NULL, i % 2 ? worker_trigger : worker_attach,
+				     NULL);
+		if (!ASSERT_OK(err, "pthread_create"))
+			goto cleanup;
+	}
+
+	sleep(4);
+
+cleanup:
+	race_stop = true;
+	for (nr_threads = i, i = 0; i < nr_threads; i++)
+		pthread_join(threads[i], NULL);
+}
+
 static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
@@ -637,6 +709,8 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_session();
 	if (test__start_subtest("uprobe_usdt"))
 		test_uprobe_usdt();
+	if (test__start_subtest("uprobe_race"))
+		test_uprobe_race();
 }
 #else
 static void __test_uprobe_syscall(void)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 16/22] selftests/bpf: Add uprobe syscall sigill signal test
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (14 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Make sure that calling the uprobe syscall from outside the uprobe
trampoline results in a SIGILL signal.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 57ef1207c3f5..f001986981ab 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -691,6 +691,40 @@ static void test_uprobe_race(void)
 		pthread_join(threads[i], NULL);
 }
 
+#ifndef __NR_uprobe
+#define __NR_uprobe 336
+#endif
+
+static void test_uprobe_sigill(void)
+{
+	int status, err, pid;
+
+	pid = fork();
+	if (!ASSERT_GE(pid, 0, "fork"))
+		return;
+	/* child */
+	if (pid == 0) {
+		asm volatile (
+			"pushq %rax\n"
+			"pushq %rcx\n"
+			"pushq %r11\n"
+			"movq $" __stringify(__NR_uprobe) ", %rax\n"
+			"syscall\n"
+			"popq %r11\n"
+			"popq %rcx\n"
+			"retq\n"
+		);
+		exit(0);
+	}
+
+	err = waitpid(pid, &status, 0);
+	ASSERT_EQ(err, pid, "waitpid");
+
+	/* verify the child got killed with SIGILL */
+	ASSERT_EQ(WIFSIGNALED(status), 1, "WIFSIGNALED");
+	ASSERT_EQ(WTERMSIG(status), SIGILL, "WTERMSIG");
+}
+
 static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
@@ -711,6 +745,8 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_usdt();
 	if (test__start_subtest("uprobe_race"))
 		test_uprobe_race();
+	if (test__start_subtest("uprobe_sigill"))
+		test_uprobe_sigill();
 }
 #else
 static void __test_uprobe_syscall(void)
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (15 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 16/22] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:44   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding an optimized usdt variant of the basic usdt test to check that
usdt arguments are properly passed in the optimized code path.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/bpf/prog_tests/usdt.c | 38 ++++++++++++-------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index 495d66414b57..3a5b5230bfa0 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -40,12 +40,19 @@ static void __always_inline trigger_func(int x) {
 	}
 }
 
-static void subtest_basic_usdt(void)
+static void subtest_basic_usdt(bool optimized)
 {
 	LIBBPF_OPTS(bpf_usdt_opts, opts);
 	struct test_usdt *skel;
 	struct test_usdt__bss *bss;
-	int err, i;
+	int err, i, called;
+
+#define TRIGGER(x) ({			\
+	trigger_func(x);		\
+	if (optimized)			\
+		trigger_func(x);	\
+	optimized ? 2 : 1;		\
+	})
 
 	skel = test_usdt__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "skel_open"))
@@ -66,11 +73,11 @@ static void subtest_basic_usdt(void)
 	if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link"))
 		goto cleanup;
 
-	trigger_func(1);
+	called = TRIGGER(1);
 
-	ASSERT_EQ(bss->usdt0_called, 1, "usdt0_called");
-	ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called");
-	ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called");
+	ASSERT_EQ(bss->usdt0_called, called, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, called, "usdt12_called");
 
 	ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie");
 	ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt");
@@ -119,11 +126,11 @@ static void subtest_basic_usdt(void)
 	 * bpf_program__attach_usdt() handles this properly and attaches to
 	 * all possible places of USDT invocation.
 	 */
-	trigger_func(2);
+	called += TRIGGER(2);
 
-	ASSERT_EQ(bss->usdt0_called, 2, "usdt0_called");
-	ASSERT_EQ(bss->usdt3_called, 2, "usdt3_called");
-	ASSERT_EQ(bss->usdt12_called, 2, "usdt12_called");
+	ASSERT_EQ(bss->usdt0_called, called, "usdt0_called");
+	ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
+	ASSERT_EQ(bss->usdt12_called, called, "usdt12_called");
 
 	/* only check values that depend on trigger_func()'s input value */
 	ASSERT_EQ(bss->usdt3_args[0], 2, "usdt3_arg1");
@@ -142,9 +149,9 @@ static void subtest_basic_usdt(void)
 	if (!ASSERT_OK_PTR(skel->links.usdt3, "usdt3_reattach"))
 		goto cleanup;
 
-	trigger_func(3);
+	called += TRIGGER(3);
 
-	ASSERT_EQ(bss->usdt3_called, 3, "usdt3_called");
+	ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
 	/* this time usdt3 has custom cookie */
 	ASSERT_EQ(bss->usdt3_cookie, 0xBADC00C51E, "usdt3_cookie");
 	ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
@@ -158,6 +165,7 @@ static void subtest_basic_usdt(void)
 
 cleanup:
 	test_usdt__destroy(skel);
+#undef TRIGGER
 }
 
 unsigned short test_usdt_100_semaphore SEC(".probes");
@@ -419,7 +427,11 @@ static void subtest_urandom_usdt(bool auto_attach)
 void test_usdt(void)
 {
 	if (test__start_subtest("basic"))
-		subtest_basic_usdt();
+		subtest_basic_usdt(false);
+#ifdef __x86_64__
+	if (test__start_subtest("basic_optimized"))
+		subtest_basic_usdt(true);
+#endif
 	if (test__start_subtest("multispec"))
 		subtest_multispec_usdt();
 	if (test__start_subtest("urand_auto_attach"))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (16 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-23 17:46   ` Andrii Nakryiko
  2025-04-21 21:44 ` [PATCH perf/core 19/22] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Changing the uretprobe_regs_equal test to cover both uprobe and
uretprobe and renaming it to uprobe_regs_equal.

We check that both uprobe and uretprobe probes (bpf programs)
see the expected registers, with a few exceptions.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c | 58 ++++++++++++++-----
 .../selftests/bpf/progs/uprobe_syscall.c      |  4 +-
 2 files changed, 45 insertions(+), 17 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index f001986981ab..6d88c5b0f6aa 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -18,15 +18,17 @@
 
 #pragma GCC diagnostic ignored "-Wattributes"
 
-__naked unsigned long uretprobe_regs_trigger(void)
+__attribute__((aligned(16)))
+__nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
 {
 	asm volatile (
-		"movq $0xdeadbeef, %rax\n"
+		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00	\n"
+		"movq $0xdeadbeef, %rax			\n"
 		"ret\n"
 	);
 }
 
-__naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
+__naked void uprobe_regs(struct pt_regs *before, struct pt_regs *after)
 {
 	asm volatile (
 		"movq %r15,   0(%rdi)\n"
@@ -47,15 +49,17 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
 		"movq   $0, 120(%rdi)\n" /* orig_rax */
 		"movq   $0, 128(%rdi)\n" /* rip      */
 		"movq   $0, 136(%rdi)\n" /* cs       */
+		"pushq %rax\n"
 		"pushf\n"
 		"pop %rax\n"
 		"movq %rax, 144(%rdi)\n" /* eflags   */
+		"pop %rax\n"
 		"movq %rsp, 152(%rdi)\n" /* rsp      */
 		"movq   $0, 160(%rdi)\n" /* ss       */
 
 		/* save 2nd argument */
 		"pushq %rsi\n"
-		"call uretprobe_regs_trigger\n"
+		"call uprobe_regs_trigger\n"
 
 		/* save  return value and load 2nd argument pointer to rax */
 		"pushq %rax\n"
@@ -95,25 +99,37 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
 );
 }
 
-static void test_uretprobe_regs_equal(void)
+static void test_uprobe_regs_equal(bool retprobe)
 {
+	LIBBPF_OPTS(bpf_uprobe_opts, opts,
+		.retprobe = retprobe,
+	);
 	struct uprobe_syscall *skel = NULL;
 	struct pt_regs before = {}, after = {};
 	unsigned long *pb = (unsigned long *) &before;
 	unsigned long *pa = (unsigned long *) &after;
 	unsigned long *pp;
+	unsigned long offset;
 	unsigned int i, cnt;
-	int err;
+
+	offset = get_uprobe_offset(&uprobe_regs_trigger);
+	if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+		return;
 
 	skel = uprobe_syscall__open_and_load();
 	if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
 		goto cleanup;
 
-	err = uprobe_syscall__attach(skel);
-	if (!ASSERT_OK(err, "uprobe_syscall__attach"))
+	skel->links.probe = bpf_program__attach_uprobe_opts(skel->progs.probe,
+				0, "/proc/self/exe", offset, &opts);
+	if (!ASSERT_OK_PTR(skel->links.probe, "bpf_program__attach_uprobe_opts"))
 		goto cleanup;
 
-	uretprobe_regs(&before, &after);
+	/* make sure uprobe gets optimized */
+	if (!retprobe)
+		uprobe_regs_trigger();
+
+	uprobe_regs(&before, &after);
 
 	pp = (unsigned long *) &skel->bss->regs;
 	cnt = sizeof(before)/sizeof(*pb);
@@ -122,7 +138,7 @@ static void test_uretprobe_regs_equal(void)
 		unsigned int offset = i * sizeof(unsigned long);
 
 		/*
-		 * Check register before and after uretprobe_regs_trigger call
+		 * Check register before and after uprobe_regs_trigger call
 		 * that triggers the uretprobe.
 		 */
 		switch (offset) {
@@ -136,7 +152,7 @@ static void test_uretprobe_regs_equal(void)
 
 		/*
 		 * Check register seen from bpf program and register after
-		 * uretprobe_regs_trigger call
+		 * uprobe_regs_trigger call (with rax exception, check below).
 		 */
 		switch (offset) {
 		/*
@@ -149,6 +165,15 @@ static void test_uretprobe_regs_equal(void)
 		case offsetof(struct pt_regs, rsp):
 		case offsetof(struct pt_regs, ss):
 			break;
+		/*
+		 * uprobe does not see return value in rax, it needs to see the
+		 * original (before) rax value
+		 */
+		case offsetof(struct pt_regs, rax):
+			if (!retprobe) {
+				ASSERT_EQ(pp[i], pb[i], "uprobe rax prog-before value check");
+				break;
+			}
 		default:
 			if (!ASSERT_EQ(pp[i], pa[i], "register prog-after value check"))
 				fprintf(stdout, "failed register offset %u\n", offset);
@@ -186,13 +211,13 @@ static void test_uretprobe_regs_change(void)
 	unsigned long cnt = sizeof(before)/sizeof(*pb);
 	unsigned int i, err, offset;
 
-	offset = get_uprobe_offset(uretprobe_regs_trigger);
+	offset = get_uprobe_offset(uprobe_regs_trigger);
 
 	err = write_bpf_testmod_uprobe(offset);
 	if (!ASSERT_OK(err, "register_uprobe"))
 		return;
 
-	uretprobe_regs(&before, &after);
+	uprobe_regs(&before, &after);
 
 	err = write_bpf_testmod_uprobe(0);
 	if (!ASSERT_OK(err, "unregister_uprobe"))
@@ -605,7 +630,8 @@ static void test_uretprobe_shadow_stack(void)
 	/* Run all the tests with shadow stack in place. */
 	shstk_is_enabled = true;
 
-	test_uretprobe_regs_equal();
+	test_uprobe_regs_equal(false);
+	test_uprobe_regs_equal(true);
 	test_uretprobe_regs_change();
 	test_uretprobe_syscall_call();
 
@@ -728,7 +754,7 @@ static void test_uprobe_sigill(void)
 static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
-		test_uretprobe_regs_equal();
+		test_uprobe_regs_equal(true);
 	if (test__start_subtest("uretprobe_regs_change"))
 		test_uretprobe_regs_change();
 	if (test__start_subtest("uretprobe_syscall_call"))
@@ -747,6 +773,8 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_race();
 	if (test__start_subtest("uprobe_sigill"))
 		test_uprobe_sigill();
+	if (test__start_subtest("uprobe_regs_equal"))
+		test_uprobe_regs_equal(false);
 }
 #else
 static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
index 8a4fa6c7ef59..e08c31669e5a 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
@@ -7,8 +7,8 @@ struct pt_regs regs;
 
 char _license[] SEC("license") = "GPL";
 
-SEC("uretprobe//proc/self/exe:uretprobe_regs_trigger")
-int uretprobe(struct pt_regs *ctx)
+SEC("uprobe")
+int probe(struct pt_regs *ctx)
 {
 	__builtin_memcpy(&regs, ctx, sizeof(regs));
 	return 0;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 19/22] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (17 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 21:44 ` [PATCH perf/core 20/22] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
	Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Changing the test_uretprobe_regs_change test to cover both uprobe
and uretprobe by adding an entry consumer handler to the testmod
and making it change one of the registers.

Making sure that values changed by both uprobe and uretprobe
handlers propagate to user space.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 .../selftests/bpf/prog_tests/uprobe_syscall.c        | 12 ++++++++----
 tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 11 +++++++++--
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 6d88c5b0f6aa..684f8ab2e7f8 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -203,7 +203,7 @@ static int write_bpf_testmod_uprobe(unsigned long offset)
 	return ret != n ? (int) ret : 0;
 }
 
-static void test_uretprobe_regs_change(void)
+static void test_regs_change(void)
 {
 	struct pt_regs before = {}, after = {};
 	unsigned long *pb = (unsigned long *) &before;
@@ -217,6 +217,9 @@ static void test_uretprobe_regs_change(void)
 	if (!ASSERT_OK(err, "register_uprobe"))
 		return;
 
+	/* make sure uprobe gets optimized */
+	uprobe_regs_trigger();
+
 	uprobe_regs(&before, &after);
 
 	err = write_bpf_testmod_uprobe(0);
@@ -632,7 +635,6 @@ static void test_uretprobe_shadow_stack(void)
 
 	test_uprobe_regs_equal(false);
 	test_uprobe_regs_equal(true);
-	test_uretprobe_regs_change();
 	test_uretprobe_syscall_call();
 
 	test_uprobe_legacy();
@@ -640,6 +642,8 @@ static void test_uretprobe_shadow_stack(void)
 	test_uprobe_session();
 	test_uprobe_usdt();
 
+	test_regs_change();
+
 	shstk_is_enabled = false;
 
 	ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
@@ -755,8 +759,6 @@ static void __test_uprobe_syscall(void)
 {
 	if (test__start_subtest("uretprobe_regs_equal"))
 		test_uprobe_regs_equal(true);
-	if (test__start_subtest("uretprobe_regs_change"))
-		test_uretprobe_regs_change();
 	if (test__start_subtest("uretprobe_syscall_call"))
 		test_uretprobe_syscall_call();
 	if (test__start_subtest("uretprobe_shadow_stack"))
@@ -775,6 +777,8 @@ static void __test_uprobe_syscall(void)
 		test_uprobe_sigill();
 	if (test__start_subtest("uprobe_regs_equal"))
 		test_uprobe_regs_equal(false);
+	if (test__start_subtest("regs_change"))
+		test_regs_change();
 }
 #else
 static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index f38eaf0d35ef..5a3dc463ace5 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -496,15 +496,21 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
  */
 #ifdef __x86_64__
 
+static int
+uprobe_handler(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data)
+{
+	regs->cx = 0x87654321feebdaed;
+	return 0;
+}
+
 static int
 uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
 		   struct pt_regs *regs, __u64 *data)
 
 {
 	regs->ax  = 0x12345678deadbeef;
-	regs->cx  = 0x87654321feebdaed;
 	regs->r11 = (u64) -1;
-	return true;
+	return 0;
 }
 
 struct testmod_uprobe {
@@ -516,6 +522,7 @@ struct testmod_uprobe {
 static DEFINE_MUTEX(testmod_uprobe_mutex);
 
 static struct testmod_uprobe uprobe = {
+	.consumer.handler = uprobe_handler,
 	.consumer.ret_handler = uprobe_ret_handler,
 };
 
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 20/22] seccomp: passthrough uprobe systemcall without filtering
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (18 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 19/22] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 23:04   ` Kees Cook
  2025-04-21 21:44 ` [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
  2025-04-21 21:44 ` [PATCH 22/22] man2: Add uprobe syscall page Jiri Olsa
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: Kees Cook, Eyal Birger, bpf, linux-kernel, linux-trace-kernel,
	x86, Song Liu, Yonghong Song, John Fastabend, Hao Luo,
	Steven Rostedt, Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding the uprobe syscall as another exception to the seccomp filter,
alongside the uretprobe syscall.

Like uretprobe, the uprobe syscall is installed by the kernel as a
replacement for the breakpoint exception; it is limited to the x86_64
arch and isn't expected to ever be supported on i386.

Cc: Kees Cook <keescook@chromium.org>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/seccomp.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 41aa761c7738..7daf2da09e8e 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -741,6 +741,26 @@ seccomp_prepare_user_filter(const char __user *user_filter)
 }
 
 #ifdef SECCOMP_ARCH_NATIVE
+static bool seccomp_uprobe_exception(struct seccomp_data *sd)
+{
+#if defined __NR_uretprobe || defined __NR_uprobe
+#ifdef SECCOMP_ARCH_COMPAT
+	if (sd->arch == SECCOMP_ARCH_NATIVE)
+#endif
+	{
+#ifdef __NR_uretprobe
+		if (sd->nr == __NR_uretprobe)
+			return true;
+#endif
+#ifdef __NR_uprobe
+		if (sd->nr == __NR_uprobe)
+			return true;
+#endif
+	}
+#endif
+	return false;
+}
+
 /**
  * seccomp_is_const_allow - check if filter is constant allow with given data
  * @fprog: The BPF programs
@@ -758,13 +778,8 @@ static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
 		return false;
 
 	/* Our single exception to filtering. */
-#ifdef __NR_uretprobe
-#ifdef SECCOMP_ARCH_COMPAT
-	if (sd->arch == SECCOMP_ARCH_NATIVE)
-#endif
-		if (sd->nr == __NR_uretprobe)
-			return true;
-#endif
+	if (seccomp_uprobe_exception(sd))
+		return true;
 
 	for (pc = 0; pc < fprog->len; pc++) {
 		struct sock_filter *insn = &fprog->filter[pc];
@@ -1042,6 +1057,9 @@ static const int mode1_syscalls[] = {
 	__NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
 #ifdef __NR_uretprobe
 	__NR_uretprobe,
+#endif
+#ifdef __NR_uprobe
+	__NR_uprobe,
 #endif
 	-1, /* negative terminated */
 };
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (19 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 20/22] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-21 23:04   ` Kees Cook
  2025-04-21 21:44 ` [PATCH 22/22] man2: Add uprobe syscall page Jiri Olsa
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: Kees Cook, Eyal Birger, bpf, linux-kernel, linux-trace-kernel,
	x86, Song Liu, Yonghong Song, John Fastabend, Hao Luo,
	Steven Rostedt, Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding uprobe checks into the current uretprobe tests.

All the related tests are now executed with a uprobe attached,
with a uretprobe attached, or without any probe.

Renaming the test fixture to UPROBE, because it seems a better fit.

Cc: Kees Cook <keescook@chromium.org>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++----
 1 file changed, 86 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b2f76a52215a..d566e40a6028 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -73,6 +73,14 @@
 #define noinline __attribute__((noinline))
 #endif
 
+#ifndef __nocf_check
+#define __nocf_check __attribute__((nocf_check))
+#endif
+
+#ifndef __naked
+#define __naked __attribute__((__naked__))
+#endif
+
 #ifndef PR_SET_NO_NEW_PRIVS
 #define PR_SET_NO_NEW_PRIVS 38
 #define PR_GET_NO_NEW_PRIVS 39
@@ -4899,7 +4907,36 @@ TEST(tsync_vs_dead_thread_leader)
 	EXPECT_EQ(0, status);
 }
 
-noinline int probed(void)
+#ifdef __x86_64__
+
+/*
+ * We need a naked probed_uprobe function. Using __nocf_check
+ * to skip a possible endbr64 instruction, and ignoring
+ * -Wattributes, otherwise the compilation might fail.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+
+__naked __nocf_check noinline int probed_uprobe(void)
+{
+	/*
+	 * Optimized uprobe is possible only on top of nop5 instruction.
+	 */
+	asm volatile ("                                 \n"
+		".byte 0x0f, 0x1f, 0x44, 0x00, 0x00     \n"
+		"ret                                    \n"
+	);
+}
+#pragma GCC diagnostic pop
+
+#else
+noinline int probed_uprobe(void)
+{
+	return 1;
+}
+#endif
+
+noinline int probed_uretprobe(void)
 {
 	return 1;
 }
@@ -4952,35 +4989,46 @@ static ssize_t get_uprobe_offset(const void *addr)
 	return found ? (uintptr_t)addr - start + base : -1;
 }
 
-FIXTURE(URETPROBE) {
+FIXTURE(UPROBE) {
 	int fd;
 };
 
-FIXTURE_VARIANT(URETPROBE) {
+FIXTURE_VARIANT(UPROBE) {
 	/*
-	 * All of the URETPROBE behaviors can be tested with either
-	 * uretprobe attached or not
+	 * All of the U(RET)PROBE behaviors can be tested with either
+	 * u(ret)probe attached or not
 	 */
 	bool attach;
+	/*
+	 * Test both uprobe and uretprobe.
+	 */
+	bool uretprobe;
 };
 
-FIXTURE_VARIANT_ADD(URETPROBE, attached) {
+FIXTURE_VARIANT_ADD(UPROBE, not_attached) {
+	.attach = false,
+	.uretprobe = false,
+};
+
+FIXTURE_VARIANT_ADD(UPROBE, uprobe_attached) {
 	.attach = true,
+	.uretprobe = false,
 };
 
-FIXTURE_VARIANT_ADD(URETPROBE, not_attached) {
-	.attach = false,
+FIXTURE_VARIANT_ADD(UPROBE, uretprobe_attached) {
+	.attach = true,
+	.uretprobe = true,
 };
 
-FIXTURE_SETUP(URETPROBE)
+FIXTURE_SETUP(UPROBE)
 {
 	const size_t attr_sz = sizeof(struct perf_event_attr);
 	struct perf_event_attr attr;
 	ssize_t offset;
 	int type, bit;
 
-#ifndef __NR_uretprobe
-	SKIP(return, "__NR_uretprobe syscall not defined");
+#if !defined(__NR_uprobe) || !defined(__NR_uretprobe)
+	SKIP(return, "__NR_uprobe or __NR_uretprobe syscalls not defined");
 #endif
 
 	if (!variant->attach)
@@ -4990,12 +5038,17 @@ FIXTURE_SETUP(URETPROBE)
 
 	type = determine_uprobe_perf_type();
 	ASSERT_GE(type, 0);
-	bit = determine_uprobe_retprobe_bit();
-	ASSERT_GE(bit, 0);
-	offset = get_uprobe_offset(probed);
+
+	if (variant->uretprobe) {
+		bit = determine_uprobe_retprobe_bit();
+		ASSERT_GE(bit, 0);
+	}
+
+	offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
 	ASSERT_GE(offset, 0);
 
-	attr.config |= 1 << bit;
+	if (variant->uretprobe)
+		attr.config |= 1 << bit;
 	attr.size = attr_sz;
 	attr.type = type;
 	attr.config1 = ptr_to_u64("/proc/self/exe");
@@ -5006,7 +5059,7 @@ FIXTURE_SETUP(URETPROBE)
 			   PERF_FLAG_FD_CLOEXEC);
 }
 
-FIXTURE_TEARDOWN(URETPROBE)
+FIXTURE_TEARDOWN(UPROBE)
 {
 	/* we could call close(self->fd), but we'd need extra filter for
 	 * that and since we are calling _exit right away..
@@ -5020,11 +5073,17 @@ static int run_probed_with_filter(struct sock_fprog *prog)
 		return -1;
 	}
 
-	probed();
+	/*
+	 * Uprobe is optimized after the first hit, so let's hit it twice.
+	 */
+	probed_uprobe();
+	probed_uprobe();
+
+	probed_uretprobe();
 	return 0;
 }
 
-TEST_F(URETPROBE, uretprobe_default_allow)
+TEST_F(UPROBE, uprobe_default_allow)
 {
 	struct sock_filter filter[] = {
 		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
@@ -5037,7 +5096,7 @@ TEST_F(URETPROBE, uretprobe_default_allow)
 	ASSERT_EQ(0, run_probed_with_filter(&prog));
 }
 
-TEST_F(URETPROBE, uretprobe_default_block)
+TEST_F(UPROBE, uprobe_default_block)
 {
 	struct sock_filter filter[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
@@ -5054,11 +5113,14 @@ TEST_F(URETPROBE, uretprobe_default_block)
 	ASSERT_EQ(0, run_probed_with_filter(&prog));
 }
 
-TEST_F(URETPROBE, uretprobe_block_uretprobe_syscall)
+TEST_F(UPROBE, uprobe_block_syscall)
 {
 	struct sock_filter filter[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
 			offsetof(struct seccomp_data, nr)),
+#ifdef __NR_uprobe
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uprobe, 1, 2),
+#endif
 #ifdef __NR_uretprobe
 		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uretprobe, 0, 1),
 #endif
@@ -5073,11 +5135,14 @@ TEST_F(URETPROBE, uretprobe_block_uretprobe_syscall)
 	ASSERT_EQ(0, run_probed_with_filter(&prog));
 }
 
-TEST_F(URETPROBE, uretprobe_default_block_with_uretprobe_syscall)
+TEST_F(UPROBE, uprobe_default_block_with_syscall)
 {
 	struct sock_filter filter[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
 			offsetof(struct seccomp_data, nr)),
+#ifdef __NR_uprobe
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uprobe, 3, 0),
+#endif
 #ifdef __NR_uretprobe
 		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uretprobe, 2, 0),
 #endif
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 22/22] man2: Add uprobe syscall page
  2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
                   ` (20 preceding siblings ...)
  2025-04-21 21:44 ` [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
@ 2025-04-21 21:44 ` Jiri Olsa
  2025-04-22  7:00   ` Alejandro Colomar
  21 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-21 21:44 UTC (permalink / raw)
  To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
  Cc: Alejandro Colomar, bpf, linux-kernel, linux-trace-kernel, x86,
	Song Liu, Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
	Masami Hiramatsu, Alan Maguire, David Laight,
	Thomas Weißschuh, Ingo Molnar

Adding a man page for the new uprobe syscall.

Cc: Alejandro Colomar <alx@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 man/man2/uprobe.2    | 49 ++++++++++++++++++++++++++++++++++++++++++++
 man/man2/uretprobe.2 |  2 ++
 2 files changed, 51 insertions(+)
 create mode 100644 man/man2/uprobe.2

diff --git a/man/man2/uprobe.2 b/man/man2/uprobe.2
new file mode 100644
index 000000000000..2b01a5ab5f3e
--- /dev/null
+++ b/man/man2/uprobe.2
@@ -0,0 +1,49 @@
+.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH uprobe 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+uprobe
+\-
+execute pending entry uprobes
+.SH SYNOPSIS
+.nf
+.B int uprobe(void);
+.fi
+.SH DESCRIPTION
+.BR uprobe ()
+is an alternative to breakpoint instructions
+for triggering entry uprobe consumers.
+.P
+Calls to
+.BR uprobe ()
+are only made from the user-space trampoline provided by the kernel.
+Calls from any other place result in a
+.BR SIGILL .
+.SH RETURN VALUE
+The return value is architecture-specific.
+.SH ERRORS
+.TP
+.B SIGILL
+.BR uprobe ()
+was called by a user-space program.
+.SH VERSIONS
+The behavior varies across systems.
+.SH STANDARDS
+None.
+.SH HISTORY
+TBD
+.P
+.BR uprobe ()
+was initially introduced for the x86_64 architecture
+where it was shown to be faster than breakpoint traps.
+It might be extended to other architectures.
+.SH CAVEATS
+.BR uprobe ()
+exists only to allow the invocation of entry uprobe consumers.
+It should
+.B never
+be called directly.
+.SH SEE ALSO
+.BR uretprobe (2)
diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
index bbbfb0c59335..bb8bf4e32e5d 100644
--- a/man/man2/uretprobe.2
+++ b/man/man2/uretprobe.2
@@ -45,3 +45,5 @@ exists only to allow the invocation of return uprobe consumers.
 It should
 .B never
 be called directly.
+.SH SEE ALSO
+.BR uprobe (2)
-- 
2.49.0



* Re: [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp
  2025-04-21 21:44 ` [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
@ 2025-04-21 23:04   ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2025-04-21 23:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, Eyal Birger, bpf,
	linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
	John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
	Alan Maguire, David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 11:44:21PM +0200, Jiri Olsa wrote:
> Adding uprobe checks into the current uretprobe tests.
> 
> All the related tests are now executed with attached uprobe
> or uretprobe or without any probe.
> 
> Renaming the test fixture to uprobe, because it seems better.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Eyal Birger <eyal.birger@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

Thanks for updating the tests!

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook


* Re: [PATCH perf/core 20/22] seccomp: passthrough uprobe syscall without filtering
  2025-04-21 21:44 ` [PATCH perf/core 20/22] seccomp: passthrough uprobe syscall without filtering Jiri Olsa
@ 2025-04-21 23:04   ` Kees Cook
  0 siblings, 0 replies; 74+ messages in thread
From: Kees Cook @ 2025-04-21 23:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, Eyal Birger, bpf,
	linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
	John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
	Alan Maguire, David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 11:44:20PM +0200, Jiri Olsa wrote:
> Adding uprobe as another exception to the seccomp filter, alongside
> the uretprobe syscall.
> 
> Same as uretprobe, the uprobe syscall is installed by the kernel as a
> replacement for the breakpoint exception. It is limited to the x86_64
> arch and isn't expected to ever be supported on i386.
> 
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Eyal Birger <eyal.birger@gmail.com>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>

<insert standard grumbling>

Going forward, how can we avoid this kind of thing?

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook


* Re: [PATCH 22/22] man2: Add uprobe syscall page
  2025-04-21 21:44 ` [PATCH 22/22] man2: Add uprobe syscall page Jiri Olsa
@ 2025-04-22  7:00   ` Alejandro Colomar
  2025-04-22 14:01     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Alejandro Colomar @ 2025-04-22  7:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar


Hi Jiri,

On Mon, Apr 21, 2025 at 11:44:22PM +0200, Jiri Olsa wrote:
> Adding a man page for the new uprobe syscall.
> 
> Cc: Alejandro Colomar <alx@kernel.org>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  man/man2/uprobe.2    | 49 ++++++++++++++++++++++++++++++++++++++++++++
>  man/man2/uretprobe.2 |  2 ++
>  2 files changed, 51 insertions(+)
>  create mode 100644 man/man2/uprobe.2
> 
> diff --git a/man/man2/uprobe.2 b/man/man2/uprobe.2
> new file mode 100644
> index 000000000000..2b01a5ab5f3e
> --- /dev/null
> +++ b/man/man2/uprobe.2
> @@ -0,0 +1,49 @@
> +.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH uprobe 2 (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +uprobe
> +\-
> +execute pending entry uprobes
> +.SH SYNOPSIS
> +.nf
> +.B int uprobe(void);
> +.fi
> +.SH DESCRIPTION
> +.BR uprobe ()
> +is an alternative to breakpoint instructions
> +for triggering entry uprobe consumers.

What are breakpoint instructions?

> +.P
> +Calls to
> +.BR uprobe ()
> +are only made from the user-space trampoline provided by the kernel.
> +Calls from any other place result in a
> +.BR SIGILL .
> +.SH RETURN VALUE
> +The return value is architecture-specific.
> +.SH ERRORS
> +.TP
> +.B SIGILL
> +.BR uprobe ()
> +was called by a user-space program.
> +.SH VERSIONS
> +The behavior varies across systems.
> +.SH STANDARDS
> +None.
> +.SH HISTORY
> +TBD
> +.P
> +.BR uprobe ()
> +was initially introduced for the x86_64 architecture
> +where it was shown to be faster than breakpoint traps.
> +It might be extended to other architectures.
> +.SH CAVEATS
> +.BR uprobe ()
> +exists only to allow the invocation of entry uprobe consumers.
> +It should
> +.B never
> +be called directly.
> +.SH SEE ALSO
> +.BR uretprobe (2)

The pages are almost identical.  Should we document both pages in the
same page?

> diff --git a/man/man2/uretprobe.2 b/man/man2/uretprobe.2
> index bbbfb0c59335..bb8bf4e32e5d 100644
> --- a/man/man2/uretprobe.2
> +++ b/man/man2/uretprobe.2
> @@ -45,3 +45,5 @@ exists only to allow the invocation of return uprobe consumers.
>  It should
>  .B never
>  be called directly.
> +.SH SEE ALSO
> +.BR uprobe (2)
> -- 
> 2.49.0


How about something like the diff below?


Have a lovely day!
Alex

---
diff --git i/man/man2/uretprobe.2 w/man/man2/uretprobe.2
index bbbfb0c59..df0e5d92e 100644
--- i/man/man2/uretprobe.2
+++ w/man/man2/uretprobe.2
@@ -2,22 +2,28 @@
 .\"
 .\" SPDX-License-Identifier: Linux-man-pages-copyleft
 .\"
-.TH uretprobe 2 (date) "Linux man-pages (unreleased)"
+.TH uprobe 2 (date) "Linux man-pages (unreleased)"
 .SH NAME
+uprobe,
 uretprobe
 \-
-execute pending return uprobes
+execute pending entry or return uprobes
 .SH SYNOPSIS
 .nf
+.B int uprobe(void);
 .B int uretprobe(void);
 .fi
 .SH DESCRIPTION
+.BR uprobe ()
+is an alternative to breakpoint instructions
+for triggering entry uprobe consumers.
+.P
 .BR uretprobe ()
 is an alternative to breakpoint instructions
 for triggering return uprobe consumers.
 .P
 Calls to
-.BR uretprobe ()
+these system calls
 are only made from the user-space trampoline provided by the kernel.
 Calls from any other place result in a
 .BR SIGILL .
@@ -26,22 +32,28 @@ .SH RETURN VALUE
 .SH ERRORS
 .TP
 .B SIGILL
-.BR uretprobe ()
-was called by a user-space program.
+These system calls
+were called by a user-space program.
 .SH VERSIONS
 The behavior varies across systems.
 .SH STANDARDS
 None.
 .SH HISTORY
+.TP
+.BR uprobe ()
+TBD
+.TP
+.BR uretprobe ()
 Linux 6.11.
 .P
-.BR uretprobe ()
-was initially introduced for the x86_64 architecture
-where it was shown to be faster than breakpoint traps.
-It might be extended to other architectures.
+These system calls
+were initially introduced for the x86_64 architecture
+where they were shown to be faster than breakpoint traps.
+They might be extended to other architectures.
 .SH CAVEATS
-.BR uretprobe ()
-exists only to allow the invocation of return uprobe consumers.
-It should
+These system calls
+exist only to allow the invocation of
+entry or return uprobe consumers.
+They should
 .B never
 be called directly.


$ MANWIDTH=64 diffman-git
--- HEAD:man/man2/uretprobe.2
+++ ./man/man2/uretprobe.2
@@ -1,24 +1,30 @@
-uretprobe(2)          System Calls Manual          uretprobe(2)
+uprobe(2)             System Calls Manual             uprobe(2)
 
 NAME
-       uretprobe - execute pending return uprobes
+       uprobe,  uretprobe - execute pending entry or return up‐
+       robes
 
 SYNOPSIS
+       int uprobe(void);
        int uretprobe(void);
 
 DESCRIPTION
+       uprobe() is an alternative  to  breakpoint  instructions
+       for triggering entry uprobe consumers.
+
        uretprobe() is an alternative to breakpoint instructions
        for triggering return uprobe consumers.
 
-       Calls  to  uretprobe() are only made from the user‐space
-       trampoline provided by the kernel.  Calls from any other
-       place result in a SIGILL.
+       Calls to these system calls are only made from the user‐
+       space trampoline provided by the kernel.  Calls from any
+       other place result in a SIGILL.
 
 RETURN VALUE
        The return value is architecture‐specific.
 
 ERRORS
-       SIGILL uretprobe() was called by a user‐space program.
+       SIGILL These  system  calls  were called by a user‐space
+              program.
 
 VERSIONS
        The behavior varies across systems.
@@ -27,16 +33,20 @@
        None.
 
 HISTORY
-       Linux 6.11.
+       uprobe()
+              TBD
+
+       uretprobe()
+              Linux 6.11.
 
-       uretprobe() was initially introduced for the x86_64  ar‐
-       chitecture  where  it was shown to be faster than break‐
-       point traps.  It might be extended  to  other  architec‐
-       tures.
+       These system calls were  initially  introduced  for  the
+       x86_64  architecture  where they were shown to be faster
+       than breakpoint traps.  They might be extended to  other
+       architectures.
 
 CAVEATS
-       uretprobe()  exists  only to allow the invocation of re‐
-       turn uprobe consumers.  It should never  be  called  di‐
-       rectly.
+       These system calls exist only to allow the invocation of
+       entry  or return uprobe consumers.  They should never be
+       called directly.
 
-Linux man‐pages (unreleased) (date)                uretprobe(2)
+Linux man‐pages (unreleased) (date)                   uprobe(2)

-- 
<https://www.alejandro-colomar.es/>



* Re: [PATCH 22/22] man2: Add uprobe syscall page
  2025-04-22  7:00   ` Alejandro Colomar
@ 2025-04-22 14:01     ` Jiri Olsa
  2025-04-22 20:45       ` Alejandro Colomar
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-22 14:01 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Tue, Apr 22, 2025 at 09:00:17AM +0200, Alejandro Colomar wrote:
> Hi Jiri,
> 
> On Mon, Apr 21, 2025 at 11:44:22PM +0200, Jiri Olsa wrote:
> > Adding a man page for the new uprobe syscall.
> > 
> > Cc: Alejandro Colomar <alx@kernel.org>
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  man/man2/uprobe.2    | 49 ++++++++++++++++++++++++++++++++++++++++++++
> >  man/man2/uretprobe.2 |  2 ++
> >  2 files changed, 51 insertions(+)
> >  create mode 100644 man/man2/uprobe.2
> > 
> > diff --git a/man/man2/uprobe.2 b/man/man2/uprobe.2
> > new file mode 100644
> > index 000000000000..2b01a5ab5f3e
> > --- /dev/null
> > +++ b/man/man2/uprobe.2
> > @@ -0,0 +1,49 @@
> > +.\" Copyright (C) 2024, Jiri Olsa <jolsa@kernel.org>
> > +.\"
> > +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> > +.\"
> > +.TH uprobe 2 (date) "Linux man-pages (unreleased)"
> > +.SH NAME
> > +uprobe
> > +\-
> > +execute pending entry uprobes
> > +.SH SYNOPSIS
> > +.nf
> > +.B int uprobe(void);
> > +.fi
> > +.SH DESCRIPTION
> > +.BR uprobe ()
> > +is an alternative to breakpoint instructions
> > +for triggering entry uprobe consumers.
> 
> What are breakpoint instructions?

it's the int3 instruction that triggers the breakpoint exception (on x86_64)

> 
> > +.P
> > +Calls to
> > +.BR uprobe ()
> > +are only made from the user-space trampoline provided by the kernel.
> > +Calls from any other place result in a
> > +.BR SIGILL .
> > +.SH RETURN VALUE
> > +The return value is architecture-specific.
> > +.SH ERRORS
> > +.TP
> > +.B SIGILL
> > +.BR uprobe ()
> > +was called by a user-space program.
> > +.SH VERSIONS
> > +The behavior varies across systems.
> > +.SH STANDARDS
> > +None.
> > +.SH HISTORY
> > +TBD
> > +.P
> > +.BR uprobe ()
> > +was initially introduced for the x86_64 architecture
> > +where it was shown to be faster than breakpoint traps.
> > +It might be extended to other architectures.
> > +.SH CAVEATS
> > +.BR uprobe ()
> > +exists only to allow the invocation of entry uprobe consumers.
> > +It should
> > +.B never
> > +be called directly.
> > +.SH SEE ALSO
> > +.BR uretprobe (2)
> 
> The pages are almost identical.  Should we document both pages in the
> same page?

great, I was wondering whether this was an option; it looks much better.
should we also add a uprobe link, like below?

thanks,
jirka


---
diff --git a/man/man2/uprobe.2 b/man/man2/uprobe.2
new file mode 100644
index 000000000000..ea5ccf901591
--- /dev/null
+++ b/man/man2/uprobe.2
@@ -0,0 +1 @@
+.so man2/uretprobe.2


* Re: [PATCH 22/22] man2: Add uprobe syscall page
  2025-04-22 14:01     ` Jiri Olsa
@ 2025-04-22 20:45       ` Alejandro Colomar
  2025-05-01 21:26         ` Alejandro Colomar
  0 siblings, 1 reply; 74+ messages in thread
From: Alejandro Colomar @ 2025-04-22 20:45 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar


Hi Jiri,

On Tue, Apr 22, 2025 at 04:01:56PM +0200, Jiri Olsa wrote:
> > > +is an alternative to breakpoint instructions
> > > +for triggering entry uprobe consumers.
> > 
> > What are breakpoint instructions?
> 
> it's the int3 instruction that triggers the breakpoint exception (on x86_64)

I guess it's something that people who do that stuff understand.
I don't, but I guess your intended audience will be okay with it.  :)

> > The pages are almost identical.  Should we document both pages in the
> > same page?
> 
> great, I was wondering whether this was an option; it looks much better.
> should we also add a uprobe link, like below?

Yep, sure.  Thanks for the reminder!


Have a lovely night!
Alex

-- 
<https://www.alejandro-colomar.es/>



* Re: [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe
  2025-04-21 21:44 ` [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
@ 2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 15:51   ` Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding a new uprobe syscall that calls uprobe handlers for a given
> 'breakpoint' address.
>
> The idea is that the 'breakpoint' address calls the user space
> trampoline which executes the uprobe syscall.
>
> The syscall handler reads the return address of the initial call
> to retrieve the original 'breakpoint' address. With this address
> we find the related uprobe object and call its consumers.
>
> Adding the arch_uprobe_trampoline_mapping function that provides
> the uprobe trampoline mapping. This mapping is backed by one global
> page initialized at __init time and shared by all the mapping
> instances.
>
> We do not allow the uprobe syscall to execute if the caller is not
> from the uprobe trampoline mapping.
>
> The uprobe syscall ensures the consumer (bpf program) sees register
> values in the state before the trampoline was called.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/x86/entry/syscalls/syscall_64.tbl |   1 +
>  arch/x86/kernel/uprobes.c              | 122 +++++++++++++++++++++++++
>  include/linux/syscalls.h               |   2 +
>  include/linux/uprobes.h                |   1 +
>  kernel/events/uprobes.c                |  17 ++++
>  kernel/sys_ni.c                        |   1 +
>  6 files changed, 144 insertions(+)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> index cfb5ca41e30d..9fd1291e7bdf 100644
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -345,6 +345,7 @@
>  333    common  io_pgetevents           sys_io_pgetevents
>  334    common  rseq                    sys_rseq
>  335    common  uretprobe               sys_uretprobe
> +336    common  uprobe                  sys_uprobe
>  # don't use numbers 387 through 423, add new calls after the last
>  # 'common' entry
>  424    common  pidfd_send_signal       sys_pidfd_send_signal

[...]


* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-21 21:44 ` [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
@ 2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 14:13   ` Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:45 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> The uprobe_write_opcode function currently also updates the refctr
> offset if there's one defined for the uprobe.
> 
> This is not handy for the following changes, which need to make
> several updates (writes) to install or remove a uprobe, but update
> the refctr offset just once.
> 
> Adding set_swbp_refctr/set_orig_refctr, which make sure the refctr
> offset is updated.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  include/linux/uprobes.h |  2 +-
>  kernel/events/uprobes.c | 62 ++++++++++++++++++++++++-----------------
>  2 files changed, 38 insertions(+), 26 deletions(-)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]


* Re: [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write
  2025-04-21 21:44 ` [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write Jiri Olsa
@ 2025-04-22 23:48   ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:45 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding an nbytes argument to uprobe_write and related functions as
> preparation for writing whole instructions in the following changes.
> 
> Also renaming the opcode arguments to insn, which seems to fit better.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  include/linux/uprobes.h |  6 +++---
>  kernel/events/uprobes.c | 27 ++++++++++++++-------------
>  2 files changed, 17 insertions(+), 16 deletions(-)
>

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]


* Re: [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode
  2025-04-21 21:44 ` [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode Jiri Olsa
@ 2025-04-22 23:48   ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:45 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> uprobe_write has a special path to restore the original page when we
> write the original instruction back. This happens when uprobe_write
> detects that we want to write anything other than the breakpoint
> instruction.
> 
> Moving the detection away and passing it to uprobe_write as an
> argument, so it's possible to write instructions other than just the
> breakpoint.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/arm/probes/uprobes/core.c |  2 +-
>  include/linux/uprobes.h        |  5 +++--
>  kernel/events/uprobes.c        | 22 +++++++++++-----------
>  3 files changed, 15 insertions(+), 14 deletions(-)
>

Acked-by: Andrii Nakryiko <andrii@kernel.org>


[...]


* Re: [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
  2025-04-21 21:44 ` [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
@ 2025-04-22 23:48   ` Andrii Nakryiko
  2025-04-27 14:24   ` Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:45 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Currently unapply_uprobe takes mmap_read_lock, but it might call
> remove_breakpoint, which eventually changes user pages.
> 
> The current code writes either the breakpoint or the original
> instruction, so it can probably get away with that, but with the
> upcoming change that writes multiple instructions on the probed
> address we need to ensure that any update to mm's pages is exclusive.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/events/uprobes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>

Makes sense.

Acked-by: Andrii Nakryiko <andrii@kernel.org>


> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c8d88060dfbf..d256c695d7ff 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1483,7 +1483,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>         struct vm_area_struct *vma;
>         int err = 0;
>
> -       mmap_read_lock(mm);
> +       mmap_write_lock(mm);
>         for_each_vma(vmi, vma) {
>                 unsigned long vaddr;
>                 loff_t offset;
> @@ -1500,7 +1500,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>                 vaddr = offset_to_vaddr(vma, uprobe->offset);
>                 err |= remove_breakpoint(uprobe, vma, vaddr);
>         }
> -       mmap_read_unlock(mm);
> +       mmap_write_unlock(mm);
>
>         return err;
>  }
> --
> 2.49.0
>


* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
@ 2025-04-22 23:51   ` Andrii Nakryiko
  2025-04-27 14:56   ` Oleg Nesterov
  2025-04-27 18:04   ` Oleg Nesterov
  2 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-22 23:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to add special mapping for for user space trampoline

for for

> with following functions:
>
>   uprobe_trampoline_get - find or add uprobe_trampoline
>   uprobe_trampoline_put - remove or destroy uprobe_trampoline
>
> The user space trampoline is exported as arch specific user space special
> mapping through tramp_mapping, which is initialized in the following
> changes with the new uprobe syscall.
>
> The uprobe trampoline needs to be callable/reachable from the probed address,
> so while searching for available address we use is_reachable_by_call function
> to decide if the uprobe trampoline is callable from the probe address.
>
> All uprobe_trampoline objects are stored in uprobes_state object and are
> cleaned up when the process mm_struct goes down. Adding new arch hooks
> for that, because this change is x86_64 specific.
>
> Locking is provided by callers in following changes.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/x86/kernel/uprobes.c | 131 ++++++++++++++++++++++++++++++++++++++
>  include/linux/uprobes.h   |   6 ++
>  kernel/events/uprobes.c   |  10 +++
>  kernel/fork.c             |   1 +
>  4 files changed, 148 insertions(+)
>

Acked-by: Andrii Nakryiko <andrii@kernel.org>

[...]


* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-21 21:44 ` [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
@ 2025-04-23  0:04   ` Andrii Nakryiko
  2025-04-24 12:49     ` Jiri Olsa
  2025-04-27 17:11   ` Oleg Nesterov
  1 sibling, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23  0:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Putting together all the previously added pieces to support optimized
> uprobes on top of 5-byte nop instruction.
>
> The current uprobe execution goes through the following:
>
>   - installs breakpoint instruction over original instruction
>   - exception handler hit and calls related uprobe consumers
>   - and either simulates original instruction or does out of line single step
>     execution of it
>   - returns to user space
>
> The optimized uprobe path does the following:
>
>   - checks the original instruction is 5-byte nop (plus other checks)
>   - adds (or uses existing) user space trampoline with uprobe syscall
>   - overwrites original instruction (5-byte nop) with call to user space
>     trampoline
>   - the user space trampoline executes uprobe syscall that calls related uprobe
>     consumers
>   - trampoline returns back to next instruction
>
> This approach won't speed up all uprobes, as it's limited to nop5 as
> the original instruction, but we plan to use nop5 as the USDT probe
> instruction (which currently uses a single-byte nop) and speed up the
> USDT probes.
>
> The arch_uprobe_optimize function triggers the uprobe optimization and
> is called after the first uprobe hit. I originally had it called on
> uprobe installation, but then it clashed with the ELF loader, because
> the user space trampoline was added in a place where the loader might
> need to put ELF segments, so I decided to do it after the first uprobe
> hit, when loading is done.
>
> The uprobe is un-optimized in arch specific set_orig_insn call.
>
> The instruction overwrite is x86 arch specific and needs to go through 3 updates:
> (on top of nop5 instruction)
>
>   - write int3 into 1st byte
>   - write last 4 bytes of the call instruction
>   - update the call instruction opcode
>
> And cleanup goes through similar reverse stages:
>
>   - overwrite call opcode with breakpoint (int3)
>   - write last 4 bytes of the nop5 instruction
>   - write the nop5 first instruction byte
>
> We do not unmap and release uprobe trampoline when it's no longer needed,
> because there's no easy way to make sure none of the threads is still
> inside the trampoline. But we do not waste memory, because there's just
> single page for all the uprobe trampoline mappings.
>
> We do waste a frame on the page mapping for every 4GB by keeping the
> uprobe trampoline page mapped, but that seems ok.
>
> We take the benefit from the fact that set_swbp and set_orig_insn are
> called under mmap_write_lock(mm), so we can use the current instruction
> as the state the uprobe is in - nop5/breakpoint/call trampoline -
> and decide the needed action (optimize/un-optimize) based on that.
>
> Attaching the speed up from benchs/run_bench_uprobes.sh script:
>
> current:
>         usermode-count :  152.604 ± 0.044M/s
>         syscall-count  :   13.359 ± 0.042M/s
> -->     uprobe-nop     :    3.229 ± 0.002M/s
>         uprobe-push    :    3.086 ± 0.004M/s
>         uprobe-ret     :    1.114 ± 0.004M/s
>         uprobe-nop5    :    1.121 ± 0.005M/s
>         uretprobe-nop  :    2.145 ± 0.002M/s
>         uretprobe-push :    2.070 ± 0.001M/s
>         uretprobe-ret  :    0.931 ± 0.001M/s
>         uretprobe-nop5 :    0.957 ± 0.001M/s
>
> after the change:
>         usermode-count :  152.448 ± 0.244M/s
>         syscall-count  :   14.321 ± 0.059M/s
>         uprobe-nop     :    3.148 ± 0.007M/s
>         uprobe-push    :    2.976 ± 0.004M/s
>         uprobe-ret     :    1.068 ± 0.003M/s
> -->     uprobe-nop5    :    7.038 ± 0.007M/s
>         uretprobe-nop  :    2.109 ± 0.004M/s
>         uretprobe-push :    2.035 ± 0.001M/s
>         uretprobe-ret  :    0.908 ± 0.001M/s
>         uretprobe-nop5 :    3.377 ± 0.009M/s
>
> I see a bit more speedup on Intel (above) compared to AMD. The big nop5
> speedup is partly due to emulating nop5 and partly due to optimization.
> 
> The key speedup we do this for is the USDT switch from nop to nop5:
>         uprobe-nop     :    3.148 ± 0.007M/s
>         uprobe-nop5    :    7.038 ± 0.007M/s
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  arch/x86/include/asm/uprobes.h |   7 +
>  arch/x86/kernel/uprobes.c      | 281 ++++++++++++++++++++++++++++++++-
>  include/linux/uprobes.h        |   6 +-
>  kernel/events/uprobes.c        |  15 +-
>  4 files changed, 301 insertions(+), 8 deletions(-)
>

just minor nits, LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> +int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> +            unsigned long vaddr)
> +{
> +       if (should_optimize(auprobe)) {
> +               bool optimized = false;
> +               int err;
> +
> +               /*
> +                * We could race with another thread that already optimized the probe,
> +                * so let's not overwrite it with int3 again in this case.
> +                */
> +               err = is_optimized(vma->vm_mm, vaddr, &optimized);
> +               if (err || optimized)
> +                       return err;

IMO, this is a bit too clever, I'd go with plain

if (err)
    return err;
if (optimized)
    return 0; /* we are done */

(and mirror set_orig_insn() structure, consistently)


> +       }
> +       return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN, true);
> +}
> +
> +int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> +                 unsigned long vaddr)
> +{
> +       if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> +               struct mm_struct *mm = vma->vm_mm;
> +               bool optimized = false;
> +               int err;
> +
> +               err = is_optimized(mm, vaddr, &optimized);
> +               if (err)
> +                       return err;
> +               if (optimized)
> +                       WARN_ON_ONCE(swbp_unoptimize(auprobe, vma, vaddr));
> +       }
> +       return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn, false);
> +}
> +
> +static int __arch_uprobe_optimize(struct mm_struct *mm, unsigned long vaddr)
> +{
> +       struct uprobe_trampoline *tramp;
> +       struct vm_area_struct *vma;
> +       int err = 0;
> +
> +       vma = find_vma(mm, vaddr);
> +       if (!vma)
> +               return -1;

this is EPERM, will be confusing to debug... why not -EINVAL?

> +       tramp = uprobe_trampoline_get(vaddr);
> +       if (!tramp)
> +               return -1;

ditto

> +       err = swbp_optimize(vma, vaddr, tramp->vaddr);
> +       if (WARN_ON_ONCE(err))
> +               uprobe_trampoline_put(tramp);
> +       return err;
> +}
> +

[...]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-21 21:44 ` [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
@ 2025-04-23 17:33   ` Andrii Nakryiko
  2025-04-24 12:49     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Using 5-byte nop for x86 usdt probes so we can switch
> them to optimized uprobes.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>

So sdt.h is an exact copy/paste from systemtap-sdt sources. I'd prefer
to not modify it unnecessarily.

How about we copy/paste usdt.h ([0]) and use *that* for your
benchmarks? I've already anticipated the need to change nop
instruction, so you won't even need to modify the usdt.h file itself,
just

#define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00

before #include "usdt.h"


  [0] https://github.com/libbpf/usdt/blob/main/usdt.h

> diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
> index 1fcfa5160231..1d62c06f5ddc 100644
> --- a/tools/testing/selftests/bpf/sdt.h
> +++ b/tools/testing/selftests/bpf/sdt.h
> @@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
>  #define _SDT_NOP       nop
>  #endif
>
> +/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
> +#if defined(__x86_64__)
> +# define _SDT_DEF_NOP _SDT_ASM_5(990:  .byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
> +#else
> +# define _SDT_DEF_NOP _SDT_ASM_1(990:  _SDT_NOP)
> +#endif
> +
>  #define _SDT_NOTE_NAME "stapsdt"
>  #define _SDT_NOTE_TYPE 3
>
> @@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
>
>  #define _SDT_ASM_BODY(provider, name, pack_args, args, ...)                  \
>    _SDT_DEF_MACROS                                                            \
> -  _SDT_ASM_1(990:      _SDT_NOP)                                             \
> +  _SDT_DEF_NOP                                                               \
>    _SDT_ASM_3(          .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
>    _SDT_ASM_1(          .balign 4)                                            \
>    _SDT_ASM_3(          .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)          \
> --
> 2.49.0
>


* Re: [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function
  2025-04-21 21:44 ` [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
@ 2025-04-23 17:34   ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding __test_uprobe_syscall with a non-x86_64 stub to execute all the tests,
> so we don't need to keep adding non-x86_64 stub functions for new tests.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 34 +++++++------------
>  1 file changed, 12 insertions(+), 22 deletions(-)
>

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index c397336fe1ed..2b00f16406c8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -350,29 +350,8 @@ static void test_uretprobe_shadow_stack(void)
>
>         ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
>  }
> -#else
> -static void test_uretprobe_regs_equal(void)
> -{
> -       test__skip();
> -}
> -
> -static void test_uretprobe_regs_change(void)
> -{
> -       test__skip();
> -}
> -
> -static void test_uretprobe_syscall_call(void)
> -{
> -       test__skip();
> -}
>
> -static void test_uretprobe_shadow_stack(void)
> -{
> -       test__skip();
> -}
> -#endif
> -
> -void test_uprobe_syscall(void)
> +static void __test_uprobe_syscall(void)
>  {
>         if (test__start_subtest("uretprobe_regs_equal"))
>                 test_uretprobe_regs_equal();
> @@ -383,3 +362,14 @@ void test_uprobe_syscall(void)
>         if (test__start_subtest("uretprobe_shadow_stack"))
>                 test_uretprobe_shadow_stack();
>  }
> +#else
> +static void __test_uprobe_syscall(void)
> +{
> +       test__skip();
> +}
> +#endif
> +
> +void test_uprobe_syscall(void)
> +{
> +       __test_uprobe_syscall();
> +}
> --
> 2.49.0
>


* Re: [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
  2025-04-21 21:44 ` [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
@ 2025-04-23 17:36   ` Andrii Nakryiko
  2025-04-24 12:49     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Renaming uprobe_syscall_executed prog to test_uretprobe_multi
> to fit properly with the following changes that add more programs.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c   | 8 ++++----
>  .../testing/selftests/bpf/progs/uprobe_syscall_executed.c | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 2b00f16406c8..3c74a079e6d9 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -277,10 +277,10 @@ static void test_uretprobe_syscall_call(void)
>                 _exit(0);
>         }
>
> -       skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
> -                                                           "/proc/self/exe",
> -                                                           "uretprobe_syscall_call", &opts);
> -       if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
> +       skel->links.test_uretprobe_multi = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,

this is a bit long, maybe

struct bpf_link *link;

link = bpf_program__attach...
skel->links.test_uretprobe_multi = link;

?

But other than that

Acked-by: Andrii Nakryiko <andrii@kernel.org>


> +                                                       pid, "/proc/self/exe",
> +                                                       "uretprobe_syscall_call", &opts);
> +       if (!ASSERT_OK_PTR(skel->links.test_uretprobe_multi, "bpf_program__attach_uprobe_multi"))
>                 goto cleanup;
>
>         /* kick the child */
> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> index 0d7f1a7db2e2..2e1b689ed4fb 100644
> --- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> @@ -10,8 +10,8 @@ char _license[] SEC("license") = "GPL";
>  int executed = 0;
>
>  SEC("uretprobe.multi")
> -int test(struct pt_regs *regs)
> +int test_uretprobe_multi(struct pt_regs *ctx)
>  {
> -       executed = 1;
> +       executed++;
>         return 0;
>  }
> --
> 2.49.0
>


* Re: [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests
  2025-04-21 21:44 ` [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
@ 2025-04-23 17:40   ` Andrii Nakryiko
  2025-04-24 12:49     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding tests for optimized uprobe/usdt probes.
>
> Checking that we get the expected trampoline and that attached bpf
> programs get executed properly.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 278 +++++++++++++++++-
>  .../bpf/progs/uprobe_syscall_executed.c       |  37 +++
>  2 files changed, 314 insertions(+), 1 deletion(-)
>

[...]

>  static void __test_uprobe_syscall(void)
> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> index 2e1b689ed4fb..7bb4338c3ee2 100644
> --- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> @@ -1,6 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include "vmlinux.h"
>  #include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/usdt.bpf.h>
>  #include <string.h>
>
>  struct pt_regs regs;
> @@ -9,9 +11,44 @@ char _license[] SEC("license") = "GPL";
>
>  int executed = 0;
>
> +SEC("uprobe")
> +int BPF_UPROBE(test_uprobe)
> +{

I'd add a PID filter to all of these to guard against potential
unrelated triggerings if in the future there is some parallel test
that attaches to all uprobes or something like that. Better safe than
sorry.

> +       executed++;
> +       return 0;
> +}
> +
> +SEC("uretprobe")
> +int BPF_URETPROBE(test_uretprobe)
> +{
> +       executed++;
> +       return 0;
> +}
> +
> +SEC("uprobe.multi")
> +int test_uprobe_multi(struct pt_regs *ctx)
> +{
> +       executed++;
> +       return 0;
> +}
> +
>  SEC("uretprobe.multi")
>  int test_uretprobe_multi(struct pt_regs *ctx)
>  {
>         executed++;
>         return 0;
>  }
> +
> +SEC("uprobe.session")
> +int test_uprobe_session(struct pt_regs *ctx)
> +{
> +       executed++;
> +       return 0;
> +}
> +
> +SEC("usdt")
> +int test_usdt(struct pt_regs *ctx)
> +{
> +       executed++;
> +       return 0;
> +}
> --
> 2.49.0
>


* Re: [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test
  2025-04-21 21:44 ` [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
@ 2025-04-23 17:42   ` Andrii Nakryiko
  2025-04-24 12:51     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:42 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding a test that makes sure parallel execution of the uprobe and
> attach/detach of an optimized uprobe on it works properly.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 74 +++++++++++++++++++
>  1 file changed, 74 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index 16effe0bca1d..57ef1207c3f5 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -619,6 +619,78 @@ static void test_uretprobe_shadow_stack(void)
>         ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
>  }
>
> +static volatile bool race_stop;
> +
> +static void *worker_trigger(void *arg)
> +{
> +       unsigned long rounds = 0;
> +
> +       while (!race_stop) {
> +               uprobe_test();
> +               rounds++;
> +       }
> +
> +       printf("tid %d trigger rounds: %lu\n", gettid(), rounds);
> +       return NULL;
> +}
> +
> +static void *worker_attach(void *arg)
> +{
> +       struct uprobe_syscall_executed *skel;
> +       unsigned long rounds = 0, offset;
> +
> +       offset = get_uprobe_offset(&uprobe_test);
> +       if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
> +               return NULL;
> +
> +       skel = uprobe_syscall_executed__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
> +               return NULL;
> +
> +       while (!race_stop) {
> +               skel->links.test_uprobe = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
> +                                       0, "/proc/self/exe", offset, NULL);
> +               if (!ASSERT_OK_PTR(skel->links.test_uprobe, "bpf_program__attach_uprobe_opts"))
> +                       break;
> +
> +               bpf_link__destroy(skel->links.test_uprobe);
> +               skel->links.test_uprobe = NULL;
> +               rounds++;
> +       }
> +
> +       printf("tid %d attach rounds: %lu hits: %d\n", gettid(), rounds, skel->bss->executed);
> +       uprobe_syscall_executed__destroy(skel);
> +       return NULL;
> +}
> +
> +static void test_uprobe_race(void)
> +{
> +       int err, i, nr_threads;
> +       pthread_t *threads;
> +
> +       nr_threads = libbpf_num_possible_cpus();
> +       if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))

I hope there are strictly more than zero CPUs... ;)

> +               return;
> +
> +       threads = malloc(sizeof(*threads) * nr_threads);
> +       if (!ASSERT_OK_PTR(threads, "malloc"))
> +               return;
> +
> +       for (i = 0; i < nr_threads; i++) {
> +               err = pthread_create(&threads[i], NULL, i % 2 ? worker_trigger : worker_attach,
> +                                    NULL);

What happens when there is just one CPU?

> +               if (!ASSERT_OK(err, "pthread_create"))
> +                       goto cleanup;
> +       }
> +
> +       sleep(4);
> +
> +cleanup:
> +       race_stop = true;
> +       for (nr_threads = i, i = 0; i < nr_threads; i++)
> +               pthread_join(threads[i], NULL);
> +}
> +
>  static void __test_uprobe_syscall(void)
>  {
>         if (test__start_subtest("uretprobe_regs_equal"))
> @@ -637,6 +709,8 @@ static void __test_uprobe_syscall(void)
>                 test_uprobe_session();
>         if (test__start_subtest("uprobe_usdt"))
>                 test_uprobe_usdt();
> +       if (test__start_subtest("uprobe_race"))
> +               test_uprobe_race();
>  }
>  #else
>  static void __test_uprobe_syscall(void)
> --
> 2.49.0
>


* Re: [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test
  2025-04-21 21:44 ` [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
@ 2025-04-23 17:44   ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:44 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding an optimized usdt variant for the basic usdt test to check that
> usdt arguments are properly passed in the optimized code path.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  tools/testing/selftests/bpf/prog_tests/usdt.c | 38 ++++++++++++-------
>  1 file changed, 25 insertions(+), 13 deletions(-)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
> index 495d66414b57..3a5b5230bfa0 100644
> --- a/tools/testing/selftests/bpf/prog_tests/usdt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
> @@ -40,12 +40,19 @@ static void __always_inline trigger_func(int x) {
>         }
>  }
>

[...]


* Re: [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test
  2025-04-21 21:44 ` [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
@ 2025-04-23 17:46   ` Andrii Nakryiko
  2025-04-24 12:51     ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-23 17:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 21, 2025 at 2:48 PM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Changing uretprobe_regs_trigger to allow the test for both
> uprobe and uretprobe and renaming it to uprobe_regs_equal.
>
> We check that both uprobe and uretprobe probes (bpf programs)
> see the expected registers, with a few exceptions.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  .../selftests/bpf/prog_tests/uprobe_syscall.c | 58 ++++++++++++++-----
>  .../selftests/bpf/progs/uprobe_syscall.c      |  4 +-
>  2 files changed, 45 insertions(+), 17 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index f001986981ab..6d88c5b0f6aa 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -18,15 +18,17 @@
>
>  #pragma GCC diagnostic ignored "-Wattributes"
>
> -__naked unsigned long uretprobe_regs_trigger(void)
> +__attribute__((aligned(16)))
> +__nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
>  {
>         asm volatile (
> -               "movq $0xdeadbeef, %rax\n"
> +               ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00     \n"

Is it me not being hardcore enough... But is anyone supposed to know
that this is nop5? ;) maybe add /* nop5 */ comment on the side?

> +               "movq $0xdeadbeef, %rax                 \n"

ret\n doesn't align newline, and uprobe_regs below don't either. So
maybe don't align them at all here?

>                 "ret\n"
>         );
>  }
>
> -__naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
> +__naked void uprobe_regs(struct pt_regs *before, struct pt_regs *after)
>  {
>         asm volatile (
>                 "movq %r15,   0(%rdi)\n"
> @@ -47,15 +49,17 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
>                 "movq   $0, 120(%rdi)\n" /* orig_rax */
>                 "movq   $0, 128(%rdi)\n" /* rip      */
>                 "movq   $0, 136(%rdi)\n" /* cs       */
> +               "pushq %rax\n"
>                 "pushf\n"
>                 "pop %rax\n"
>                 "movq %rax, 144(%rdi)\n" /* eflags   */
> +               "pop %rax\n"
>                 "movq %rsp, 152(%rdi)\n" /* rsp      */
>                 "movq   $0, 160(%rdi)\n" /* ss       */
>
>                 /* save 2nd argument */
>                 "pushq %rsi\n"
> -               "call uretprobe_regs_trigger\n"
> +               "call uprobe_regs_trigger\n"
>
>                 /* save  return value and load 2nd argument pointer to rax */
>                 "pushq %rax\n"
> @@ -95,25 +99,37 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
>  );
>  }
>
> -static void test_uretprobe_regs_equal(void)
> +static void test_uprobe_regs_equal(bool retprobe)
>  {
> +       LIBBPF_OPTS(bpf_uprobe_opts, opts,
> +               .retprobe = retprobe,
> +       );
>         struct uprobe_syscall *skel = NULL;
>         struct pt_regs before = {}, after = {};
>         unsigned long *pb = (unsigned long *) &before;
>         unsigned long *pa = (unsigned long *) &after;
>         unsigned long *pp;
> +       unsigned long offset;
>         unsigned int i, cnt;
> -       int err;
> +
> +       offset = get_uprobe_offset(&uprobe_regs_trigger);
> +       if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
> +               return;
>
>         skel = uprobe_syscall__open_and_load();
>         if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
>                 goto cleanup;
>
> -       err = uprobe_syscall__attach(skel);
> -       if (!ASSERT_OK(err, "uprobe_syscall__attach"))
> +       skel->links.probe = bpf_program__attach_uprobe_opts(skel->progs.probe,
> +                               0, "/proc/self/exe", offset, &opts);
> +       if (!ASSERT_OK_PTR(skel->links.probe, "bpf_program__attach_uprobe_opts"))
>                 goto cleanup;
>
> -       uretprobe_regs(&before, &after);
> +       /* make sure uprobe gets optimized */
> +       if (!retprobe)
> +               uprobe_regs_trigger();
> +
> +       uprobe_regs(&before, &after);
>
>         pp = (unsigned long *) &skel->bss->regs;
>         cnt = sizeof(before)/sizeof(*pb);
> @@ -122,7 +138,7 @@ static void test_uretprobe_regs_equal(void)
>                 unsigned int offset = i * sizeof(unsigned long);
>
>                 /*
> -                * Check register before and after uretprobe_regs_trigger call
> +                * Check register before and after uprobe_regs_trigger call
>                  * that triggers the uretprobe.
>                  */
>                 switch (offset) {
> @@ -136,7 +152,7 @@ static void test_uretprobe_regs_equal(void)
>
>                 /*
>                  * Check register seen from bpf program and register after
> -                * uretprobe_regs_trigger call
> +                * uprobe_regs_trigger call (with rax exception, check below).
>                  */
>                 switch (offset) {
>                 /*
> @@ -149,6 +165,15 @@ static void test_uretprobe_regs_equal(void)
>                 case offsetof(struct pt_regs, rsp):
>                 case offsetof(struct pt_regs, ss):
>                         break;
> +               /*
> +                * uprobe does not see return value in rax, it needs to see the
> +                * original (before) rax value
> +                */
> +               case offsetof(struct pt_regs, rax):
> +                       if (!retprobe) {
> +                               ASSERT_EQ(pp[i], pb[i], "uprobe rax prog-before value check");
> +                               break;
> +                       }
>                 default:
>                         if (!ASSERT_EQ(pp[i], pa[i], "register prog-after value check"))
>                                 fprintf(stdout, "failed register offset %u\n", offset);
> @@ -186,13 +211,13 @@ static void test_uretprobe_regs_change(void)
>         unsigned long cnt = sizeof(before)/sizeof(*pb);
>         unsigned int i, err, offset;
>
> -       offset = get_uprobe_offset(uretprobe_regs_trigger);
> +       offset = get_uprobe_offset(uprobe_regs_trigger);
>
>         err = write_bpf_testmod_uprobe(offset);
>         if (!ASSERT_OK(err, "register_uprobe"))
>                 return;
>
> -       uretprobe_regs(&before, &after);
> +       uprobe_regs(&before, &after);
>
>         err = write_bpf_testmod_uprobe(0);
>         if (!ASSERT_OK(err, "unregister_uprobe"))
> @@ -605,7 +630,8 @@ static void test_uretprobe_shadow_stack(void)
>         /* Run all the tests with shadow stack in place. */
>         shstk_is_enabled = true;
>
> -       test_uretprobe_regs_equal();
> +       test_uprobe_regs_equal(false);
> +       test_uprobe_regs_equal(true);
>         test_uretprobe_regs_change();
>         test_uretprobe_syscall_call();
>
> @@ -728,7 +754,7 @@ static void test_uprobe_sigill(void)
>  static void __test_uprobe_syscall(void)
>  {
>         if (test__start_subtest("uretprobe_regs_equal"))
> -               test_uretprobe_regs_equal();
> +               test_uprobe_regs_equal(true);
>         if (test__start_subtest("uretprobe_regs_change"))
>                 test_uretprobe_regs_change();
>         if (test__start_subtest("uretprobe_syscall_call"))
> @@ -747,6 +773,8 @@ static void __test_uprobe_syscall(void)
>                 test_uprobe_race();
>         if (test__start_subtest("uprobe_sigill"))
>                 test_uprobe_sigill();
> +       if (test__start_subtest("uprobe_regs_equal"))
> +               test_uprobe_regs_equal(false);
>  }
>  #else
>  static void __test_uprobe_syscall(void)
> diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> index 8a4fa6c7ef59..e08c31669e5a 100644
> --- a/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
> @@ -7,8 +7,8 @@ struct pt_regs regs;
>
>  char _license[] SEC("license") = "GPL";
>
> -SEC("uretprobe//proc/self/exe:uretprobe_regs_trigger")
> -int uretprobe(struct pt_regs *ctx)
> +SEC("uprobe")
> +int probe(struct pt_regs *ctx)
>  {
>         __builtin_memcpy(&regs, ctx, sizeof(regs));
>         return 0;
> --
> 2.49.0
>


* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-23  0:04   ` Andrii Nakryiko
@ 2025-04-24 12:49     ` Jiri Olsa
  2025-04-24 16:06       ` Andrii Nakryiko
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Tue, Apr 22, 2025 at 05:04:03PM -0700, Andrii Nakryiko wrote:

SNIP

> >  arch/x86/include/asm/uprobes.h |   7 +
> >  arch/x86/kernel/uprobes.c      | 281 ++++++++++++++++++++++++++++++++-
> >  include/linux/uprobes.h        |   6 +-
> >  kernel/events/uprobes.c        |  15 +-
> >  4 files changed, 301 insertions(+), 8 deletions(-)
> >
> 
> just minor nits, LGTM
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
> > +int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > +            unsigned long vaddr)
> > +{
> > +       if (should_optimize(auprobe)) {
> > +               bool optimized = false;
> > +               int err;
> > +
> > +               /*
> > +                * We could race with another thread that already optimized the probe,
> > +                * so let's not overwrite it with int3 again in this case.
> > +                */
> > +               err = is_optimized(vma->vm_mm, vaddr, &optimized);
> > +               if (err || optimized)
> > +                       return err;
> 
> IMO, this is a bit too clever, I'd go with plain
> 
> if (err)
>     return err;
> if (optimized)
>     return 0; /* we are done */
> 

ok

> (and mirror set_orig_insn() structure, consistently)

set_orig_insn does that already, right?

> 
> 
> > +       }
> > +       return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN, true);
> > +}
> > +
> > +int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > +                 unsigned long vaddr)
> > +{
> > +       if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> > +               struct mm_struct *mm = vma->vm_mm;
> > +               bool optimized = false;
> > +               int err;
> > +
> > +               err = is_optimized(mm, vaddr, &optimized);
> > +               if (err)
> > +                       return err;
> > +               if (optimized)
> > +                       WARN_ON_ONCE(swbp_unoptimize(auprobe, vma, vaddr));
> > +       }
> > +       return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn, false);
> > +}
> > +
> > +static int __arch_uprobe_optimize(struct mm_struct *mm, unsigned long vaddr)
> > +{
> > +       struct uprobe_trampoline *tramp;
> > +       struct vm_area_struct *vma;
> > +       int err = 0;
> > +
> > +       vma = find_vma(mm, vaddr);
> > +       if (!vma)
> > +               return -1;
> 
> this is EPERM, will be confusing to debug... why not -EINVAL?
> 
> > +       tramp = uprobe_trampoline_get(vaddr);
> > +       if (!tramp)
> > +               return -1;
> 
> ditto

so the error value is not exposed to user space in this case;
we try to optimize on the first hit with:

	handle_swbp()
	{
		arch_uprobe_optimize()
		{

			if (__arch_uprobe_optimize(mm, vaddr))
				set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags);

		}
	}

and set the ARCH_UPROBE_FLAG_OPTIMIZE_FAIL flag bit in case of error;
plus there's a WARN for swbp_optimize, which should pass in case we
get that far

thanks,
jirka

> 
> > +       err = swbp_optimize(vma, vaddr, tramp->vaddr);
> > +       if (WARN_ON_ONCE(err))
> > +               uprobe_trampoline_put(tramp);
> > +       return err;
> > +}
> > +
> 
> [...]


* Re: [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-23 17:33   ` Andrii Nakryiko
@ 2025-04-24 12:49     ` Jiri Olsa
  2025-04-24 16:29       ` Andrii Nakryiko
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Wed, Apr 23, 2025 at 10:33:18AM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Using 5-byte nop for x86 usdt probes so we can switch
> > to optimized uprobe them.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> 
> So sdt.h is an exact copy/paste from systemtap-sdt sources. I'd prefer
> to not modify it unnecessarily.
> 
> How about we copy/paste usdt.h ([0]) and use *that* for your
> benchmarks? I've already anticipated the need to change nop
> instruction, so you won't even need to modify the usdt.h file itself,
> just
> 
> #define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
> 
> before #include "usdt.h"


sounds good, but it seems we need a few more changes for that;
so far I ended up with:

-       __usdt_asm1(990:        USDT_NOP)                                                       \
+       __usdt_asm5(990:        USDT_NOP)                                                       \

but it still won't compile; I will need to spend more time on that,
unless you have a better solution

thanks,
jirka

> 
> 
>   [0] https://github.com/libbpf/usdt/blob/main/usdt.h
> 
> > diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
> > index 1fcfa5160231..1d62c06f5ddc 100644
> > --- a/tools/testing/selftests/bpf/sdt.h
> > +++ b/tools/testing/selftests/bpf/sdt.h
> > @@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
> >  #define _SDT_NOP       nop
> >  #endif
> >
> > +/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
> > +#if defined(__x86_64__)
> > +# define _SDT_DEF_NOP _SDT_ASM_5(990:  .byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
> > +#else
> > +# define _SDT_DEF_NOP _SDT_ASM_1(990:  _SDT_NOP)
> > +#endif
> > +
> >  #define _SDT_NOTE_NAME "stapsdt"
> >  #define _SDT_NOTE_TYPE 3
> >
> > @@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
> >
> >  #define _SDT_ASM_BODY(provider, name, pack_args, args, ...)                  \
> >    _SDT_DEF_MACROS                                                            \
> > -  _SDT_ASM_1(990:      _SDT_NOP)                                             \
> > +  _SDT_DEF_NOP                                                               \
> >    _SDT_ASM_3(          .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
> >    _SDT_ASM_1(          .balign 4)                                            \
> >    _SDT_ASM_3(          .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)          \
> > --
> > 2.49.0
> >


* Re: [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
  2025-04-23 17:36   ` Andrii Nakryiko
@ 2025-04-24 12:49     ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Wed, Apr 23, 2025 at 10:36:22AM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Renaming uprobe_syscall_executed prog to test_uretprobe_multi
> > to fit properly in the following changes that add more programs.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c   | 8 ++++----
> >  .../testing/selftests/bpf/progs/uprobe_syscall_executed.c | 4 ++--
> >  2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > index 2b00f16406c8..3c74a079e6d9 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > @@ -277,10 +277,10 @@ static void test_uretprobe_syscall_call(void)
> >                 _exit(0);
> >         }
> >
> > -       skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
> > -                                                           "/proc/self/exe",
> > -                                                           "uretprobe_syscall_call", &opts);
> > -       if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
> > +       skel->links.test_uretprobe_multi = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,
> 
> this is a bit long, maybe
> 
> struct bpf_link *link;
> 
> link = bpf_program__attach...
> skel->links.test_uretprobe_multi = link;

ok, thanks

jirka

> 
> ?
> 
> But other than that
> 
> Acked-by: Andrii Nakryiko <andrii@kernel.org>
> 
> 
> > +                                                       pid, "/proc/self/exe",
> > +                                                       "uretprobe_syscall_call", &opts);
> > +       if (!ASSERT_OK_PTR(skel->links.test_uretprobe_multi, "bpf_program__attach_uprobe_multi"))
> >                 goto cleanup;
> >
> >         /* kick the child */
> > diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > index 0d7f1a7db2e2..2e1b689ed4fb 100644
> > --- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > @@ -10,8 +10,8 @@ char _license[] SEC("license") = "GPL";
> >  int executed = 0;
> >
> >  SEC("uretprobe.multi")
> > -int test(struct pt_regs *regs)
> > +int test_uretprobe_multi(struct pt_regs *ctx)
> >  {
> > -       executed = 1;
> > +       executed++;
> >         return 0;
> >  }
> > --
> > 2.49.0
> >


* Re: [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests
  2025-04-23 17:40   ` Andrii Nakryiko
@ 2025-04-24 12:49     ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Wed, Apr 23, 2025 at 10:40:58AM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 21, 2025 at 2:47 PM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding tests for optimized uprobe/usdt probes.
> >
> > Checking that we get expected trampoline and attached bpf programs
> > get executed properly.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  .../selftests/bpf/prog_tests/uprobe_syscall.c | 278 +++++++++++++++++-
> >  .../bpf/progs/uprobe_syscall_executed.c       |  37 +++
> >  2 files changed, 314 insertions(+), 1 deletion(-)
> >
> 
> [...]
> 
> >  static void __test_uprobe_syscall(void)
> > diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > index 2e1b689ed4fb..7bb4338c3ee2 100644
> > --- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > +++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
> > @@ -1,6 +1,8 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include "vmlinux.h"
> >  #include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/usdt.bpf.h>
> >  #include <string.h>
> >
> >  struct pt_regs regs;
> > @@ -9,9 +11,44 @@ char _license[] SEC("license") = "GPL";
> >
> >  int executed = 0;
> >
> > +SEC("uprobe")
> > +int BPF_UPROBE(test_uprobe)
> > +{
> 
> I'd add a PID filter to all of these to guard against potential
> unrelated triggerings if in the future there is some parallel test
> that attaches to all uprobes or something like that. Better safe than
> sorry.

ok, makes sense, will add

thanks,
jirka

> 
> > +       executed++;
> > +       return 0;
> > +}
> > +
> > +SEC("uretprobe")
> > +int BPF_URETPROBE(test_uretprobe)
> > +{
> > +       executed++;
> > +       return 0;
> > +}
> > +
> > +SEC("uprobe.multi")
> > +int test_uprobe_multi(struct pt_regs *ctx)
> > +{
> > +       executed++;
> > +       return 0;
> > +}
> > +
> >  SEC("uretprobe.multi")
> >  int test_uretprobe_multi(struct pt_regs *ctx)
> >  {
> >         executed++;
> >         return 0;
> >  }
> > +
> > +SEC("uprobe.session")
> > +int test_uprobe_session(struct pt_regs *ctx)
> > +{
> > +       executed++;
> > +       return 0;
> > +}
> > +
> > +SEC("usdt")
> > +int test_usdt(struct pt_regs *ctx)
> > +{
> > +       executed++;
> > +       return 0;
> > +}
> > --
> > 2.49.0
> >


* Re: [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test
  2025-04-23 17:42   ` Andrii Nakryiko
@ 2025-04-24 12:51     ` Jiri Olsa
  2025-04-24 16:30       ` Andrii Nakryiko
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Wed, Apr 23, 2025 at 10:42:43AM -0700, Andrii Nakryiko wrote:

SNIP

> > +
> > +static void test_uprobe_race(void)
> > +{
> > +       int err, i, nr_threads;
> > +       pthread_t *threads;
> > +
> > +       nr_threads = libbpf_num_possible_cpus();
> > +       if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
> 
> I hope there are strictly more than zero CPUs... ;)
> 
> > +               return;
> > +
> > +       threads = malloc(sizeof(*threads) * nr_threads);
> > +       if (!ASSERT_OK_PTR(threads, "malloc"))
> > +               return;
> > +
> > +       for (i = 0; i < nr_threads; i++) {
> > +               err = pthread_create(&threads[i], NULL, i % 2 ? worker_trigger : worker_attach,
> > +                                    NULL);
> 
> > What happens when there is just one CPU?
> 

right, we need at least 2 threads; how about the change below?

thanks,
jirka


---
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index d55c3579cebe..c885f097eed4 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -701,8 +701,9 @@ static void test_uprobe_race(void)
 	pthread_t *threads;
 
 	nr_threads = libbpf_num_possible_cpus();
-	if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
+	if (!ASSERT_GT(nr_threads, 0, "libbpf_num_possible_cpus"))
 		return;
+	nr_threads = max(2, nr_threads);
 
 	threads = malloc(sizeof(*threads) * nr_threads);
 	if (!ASSERT_OK_PTR(threads, "malloc"))


* Re: [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test
  2025-04-23 17:46   ` Andrii Nakryiko
@ 2025-04-24 12:51     ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-24 12:51 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Wed, Apr 23, 2025 at 10:46:24AM -0700, Andrii Nakryiko wrote:
> On Mon, Apr 21, 2025 at 2:48 PM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Changing uretprobe_regs_trigger to allow the test for both
> > uprobe and uretprobe and renaming it to uprobe_regs_equal.
> >
> > We check that both uprobe and uretprobe probes (bpf programs)
> > see expected registers with few exceptions.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  .../selftests/bpf/prog_tests/uprobe_syscall.c | 58 ++++++++++++++-----
> >  .../selftests/bpf/progs/uprobe_syscall.c      |  4 +-
> >  2 files changed, 45 insertions(+), 17 deletions(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > index f001986981ab..6d88c5b0f6aa 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> > @@ -18,15 +18,17 @@
> >
> >  #pragma GCC diagnostic ignored "-Wattributes"
> >
> > -__naked unsigned long uretprobe_regs_trigger(void)
> > +__attribute__((aligned(16)))
> > +__nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
> >  {
> >         asm volatile (
> > -               "movq $0xdeadbeef, %rax\n"
> > +               ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00     \n"
> 
> Is it me not being hardcore enough... But is anyone supposed to know
> that this is nop5? ;) maybe add /* nop5 */ comment on the side?

ok, will add the comment :)

> 
> > +               "movq $0xdeadbeef, %rax                 \n"
> 
> ret\n doesn't align newline, and uprobe_regs below don't either. So
> maybe don't align them at all here?

ok

thanks,
jirka


* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-24 12:49     ` Jiri Olsa
@ 2025-04-24 16:06       ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-24 16:06 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, Apr 24, 2025 at 5:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Apr 22, 2025 at 05:04:03PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > >  arch/x86/include/asm/uprobes.h |   7 +
> > >  arch/x86/kernel/uprobes.c      | 281 ++++++++++++++++++++++++++++++++-
> > >  include/linux/uprobes.h        |   6 +-
> > >  kernel/events/uprobes.c        |  15 +-
> > >  4 files changed, 301 insertions(+), 8 deletions(-)
> > >
> >
> > just minor nits, LGTM
> >
> > Acked-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > > +int set_swbp(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > > +            unsigned long vaddr)
> > > +{
> > > +       if (should_optimize(auprobe)) {
> > > +               bool optimized = false;
> > > +               int err;
> > > +
> > > +               /*
> > > +                * We could race with another thread that already optimized the probe,
> > > +                * so let's not overwrite it with int3 again in this case.
> > > +                */
> > > +               err = is_optimized(vma->vm_mm, vaddr, &optimized);
> > > +               if (err || optimized)
> > > +                       return err;
> >
> > IMO, this is a bit too clever, I'd go with plain
> >
> > if (err)
> >     return err;
> > if (optimized)
> >     return 0; /* we are done */
> >
>
> ok
>
> > (and mirror set_orig_insn() structure, consistently)
>
> set_orig_insn does that already, right?
>

right, and that was my point

> >
> >
> > > +       }
> > > +       return uprobe_write_opcode(vma, vaddr, UPROBE_SWBP_INSN, true);
> > > +}
> > > +
> > > +int set_orig_insn(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > > +                 unsigned long vaddr)
> > > +{
> > > +       if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
> > > +               struct mm_struct *mm = vma->vm_mm;
> > > +               bool optimized = false;
> > > +               int err;
> > > +
> > > +               err = is_optimized(mm, vaddr, &optimized);
> > > +               if (err)
> > > +                       return err;
> > > +               if (optimized)
> > > +                       WARN_ON_ONCE(swbp_unoptimize(auprobe, vma, vaddr));
> > > +       }
> > > +       return uprobe_write_opcode(vma, vaddr, *(uprobe_opcode_t *)&auprobe->insn, false);
> > > +}
> > > +
> > > +static int __arch_uprobe_optimize(struct mm_struct *mm, unsigned long vaddr)
> > > +{
> > > +       struct uprobe_trampoline *tramp;
> > > +       struct vm_area_struct *vma;
> > > +       int err = 0;
> > > +
> > > +       vma = find_vma(mm, vaddr);
> > > +       if (!vma)
> > > +               return -1;
> >
> > this is EPERM, will be confusing to debug... why not -EINVAL?
> >
> > > +       tramp = uprobe_trampoline_get(vaddr);
> > > +       if (!tramp)
> > > +               return -1;
> >
> > ditto
>
> so the error value is not exposed to user space in this case,
> we try to optimize in the first hit with:
>
>         handle_swbp()
>         {
>                 arch_uprobe_optimize()
>                 {
>
>                         if (__arch_uprobe_optimize(mm, vaddr))
>                                 set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags);
>
>                 }
>         }
>
> and set ARCH_UPROBE_FLAG_OPTIMIZE_FAIL flags bit in case of error,
> plus there's WARN for swbp_optimize which should pass in case we
> get that far

yeah, I know, but I don't think we should deviate from the kernel-wide
-Exxx convention for returning errors from functions just because this
error doesn't make it all the way to user space

>
> thanks,
> jirka
>
> >
> > > +       err = swbp_optimize(vma, vaddr, tramp->vaddr);
> > > +       if (WARN_ON_ONCE(err))
> > > +               uprobe_trampoline_put(tramp);
> > > +       return err;
> > > +}
> > > +
> >
> > [...]


* Re: [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-24 12:49     ` Jiri Olsa
@ 2025-04-24 16:29       ` Andrii Nakryiko
  2025-04-24 18:20         ` Andrii Nakryiko
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-24 16:29 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, Apr 24, 2025 at 5:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Apr 23, 2025 at 10:33:18AM -0700, Andrii Nakryiko wrote:
> > On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Using 5-byte nop for x86 usdt probes so we can switch
> > > to optimized uprobe them.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> >
> > So sdt.h is an exact copy/paste from systemtap-sdt sources. I'd prefer
> > to not modify it unnecessarily.
> >
> > How about we copy/paste usdt.h ([0]) and use *that* for your
> > benchmarks? I've already anticipated the need to change nop
> > instruction, so you won't even need to modify the usdt.h file itself,
> > just
> >
> > #define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
> >
> > before #include "usdt.h"
>
>
> sounds good, but it seems we need bit more changes for that,
> so far I ended up with:
>
> -       __usdt_asm1(990:        USDT_NOP)                                                       \
> +       __usdt_asm5(990:        USDT_NOP)                                                       \
>
> but it still won't compile, will need to spend more time on that,
> unless you have better solution
>

Use

#define USDT_NOP .ascii "\x0F\x1F\x44\x00\x00"

for now; I'll need to improve the macro magic to handle instructions with
commas in them...

> thanks,
> jirka
>
> >
> >
> >   [0] https://github.com/libbpf/usdt/blob/main/usdt.h
> >
> > > diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
> > > index 1fcfa5160231..1d62c06f5ddc 100644
> > > --- a/tools/testing/selftests/bpf/sdt.h
> > > +++ b/tools/testing/selftests/bpf/sdt.h
> > > @@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
> > >  #define _SDT_NOP       nop
> > >  #endif
> > >
> > > +/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
> > > +#if defined(__x86_64__)
> > > +# define _SDT_DEF_NOP _SDT_ASM_5(990:  .byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
> > > +#else
> > > +# define _SDT_DEF_NOP _SDT_ASM_1(990:  _SDT_NOP)
> > > +#endif
> > > +
> > >  #define _SDT_NOTE_NAME "stapsdt"
> > >  #define _SDT_NOTE_TYPE 3
> > >
> > > @@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
> > >
> > >  #define _SDT_ASM_BODY(provider, name, pack_args, args, ...)                  \
> > >    _SDT_DEF_MACROS                                                            \
> > > -  _SDT_ASM_1(990:      _SDT_NOP)                                             \
> > > +  _SDT_DEF_NOP                                                               \
> > >    _SDT_ASM_3(          .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
> > >    _SDT_ASM_1(          .balign 4)                                            \
> > >    _SDT_ASM_3(          .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)          \
> > > --
> > > 2.49.0
> > >


* Re: [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test
  2025-04-24 12:51     ` Jiri Olsa
@ 2025-04-24 16:30       ` Andrii Nakryiko
  0 siblings, 0 replies; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-24 16:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, Apr 24, 2025 at 5:51 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Apr 23, 2025 at 10:42:43AM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > > +
> > > +static void test_uprobe_race(void)
> > > +{
> > > +       int err, i, nr_threads;
> > > +       pthread_t *threads;
> > > +
> > > +       nr_threads = libbpf_num_possible_cpus();
> > > +       if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
> >
> > I hope there are strictly more than zero CPUs... ;)
> >
> > > +               return;
> > > +
> > > +       threads = malloc(sizeof(*threads) * nr_threads);
> > > +       if (!ASSERT_OK_PTR(threads, "malloc"))
> > > +               return;
> > > +
> > > +       for (i = 0; i < nr_threads; i++) {
> > > +               err = pthread_create(&threads[i], NULL, i % 2 ? worker_trigger : worker_attach,
> > > +                                    NULL);
> >
> > What happens when there is just one CPU?
> >
>
> right, we need at least 2 threads, how about the change below
>
> thanks,
> jirka
>
>
> ---
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> index d55c3579cebe..c885f097eed4 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
> @@ -701,8 +701,9 @@ static void test_uprobe_race(void)
>         pthread_t *threads;
>
>         nr_threads = libbpf_num_possible_cpus();
> -       if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
> +       if (!ASSERT_GT(nr_threads, 0, "libbpf_num_possible_cpus"))
>                 return;
> +       nr_threads = max(2, nr_threads);

yep, ack

>
>         threads = malloc(sizeof(*threads) * nr_threads);
>         if (!ASSERT_OK_PTR(threads, "malloc"))


* Re: [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-24 16:29       ` Andrii Nakryiko
@ 2025-04-24 18:20         ` Andrii Nakryiko
  2025-04-25 13:20           ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Andrii Nakryiko @ 2025-04-24 18:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, Apr 24, 2025 at 9:29 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Apr 24, 2025 at 5:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 10:33:18AM -0700, Andrii Nakryiko wrote:
> > > On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > Using 5-byte nop for x86 usdt probes so we can switch
> > > > to optimized uprobe them.
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >  tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
> > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > >
> > >
> > > So sdt.h is an exact copy/paste from systemtap-sdt sources. I'd prefer
> > > to not modify it unnecessarily.
> > >
> > > How about we copy/paste usdt.h ([0]) and use *that* for your
> > > benchmarks? I've already anticipated the need to change nop
> > > instruction, so you won't even need to modify the usdt.h file itself,
> > > just
> > >
> > > #define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
> > >
> > > before #include "usdt.h"
> >
> >
> > sounds good, but it seems we need bit more changes for that,
> > so far I ended up with:
> >
> > -       __usdt_asm1(990:        USDT_NOP)                                                       \
> > +       __usdt_asm5(990:        USDT_NOP)                                                       \
> >
> > but it still won't compile, will need to spend more time on that,
> > unless you have better solution
> >
>
> Use
>
> #define USDT_NOP .ascii "\x0F\x1F\x44\x00\x00"
>
> for now, I'll need to improve macro magic to handle instructions with
> commas in them...

Ok, fixed in [0]. If you get the latest version, the .byte approach
will work (I have tests in CI now to validate this).

  [0] https://github.com/libbpf/usdt/pull/12

>
> > thanks,
> > jirka
> >
> > >
> > >
> > >   [0] https://github.com/libbpf/usdt/blob/main/usdt.h
> > >
> > > > diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
> > > > index 1fcfa5160231..1d62c06f5ddc 100644
> > > > --- a/tools/testing/selftests/bpf/sdt.h
> > > > +++ b/tools/testing/selftests/bpf/sdt.h
> > > > @@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
> > > >  #define _SDT_NOP       nop
> > > >  #endif
> > > >
> > > > +/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
> > > > +#if defined(__x86_64__)
> > > > +# define _SDT_DEF_NOP _SDT_ASM_5(990:  .byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
> > > > +#else
> > > > +# define _SDT_DEF_NOP _SDT_ASM_1(990:  _SDT_NOP)
> > > > +#endif
> > > > +
> > > >  #define _SDT_NOTE_NAME "stapsdt"
> > > >  #define _SDT_NOTE_TYPE 3
> > > >
> > > > @@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
> > > >
> > > >  #define _SDT_ASM_BODY(provider, name, pack_args, args, ...)                  \
> > > >    _SDT_DEF_MACROS                                                            \
> > > > -  _SDT_ASM_1(990:      _SDT_NOP)                                             \
> > > > +  _SDT_DEF_NOP                                                               \
> > > >    _SDT_ASM_3(          .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
> > > >    _SDT_ASM_1(          .balign 4)                                            \
> > > >    _SDT_ASM_3(          .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE)          \
> > > > --
> > > > 2.49.0
> > > >


* Re: [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes
  2025-04-24 18:20         ` Andrii Nakryiko
@ 2025-04-25 13:20           ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-25 13:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf,
	linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
	John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
	Alan Maguire, David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, Apr 24, 2025 at 11:20:11AM -0700, Andrii Nakryiko wrote:
> On Thu, Apr 24, 2025 at 9:29 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Apr 24, 2025 at 5:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Wed, Apr 23, 2025 at 10:33:18AM -0700, Andrii Nakryiko wrote:
> > > > On Mon, Apr 21, 2025 at 2:46 PM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > >
> > > > > Using 5-byte nop for x86 usdt probes so we can switch
> > > > > to optimized uprobe them.
> > > > >
> > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > ---
> > > > >  tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
> > > > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > >
> > > > So sdt.h is an exact copy/paste from systemtap-sdt sources. I'd prefer
> > > > to not modify it unnecessarily.
> > > >
> > > > How about we copy/paste usdt.h ([0]) and use *that* for your
> > > > benchmarks? I've already anticipated the need to change nop
> > > > instruction, so you won't even need to modify the usdt.h file itself,
> > > > just
> > > >
> > > > #define USDT_NOP .byte 0x0f, 0x1f, 0x44, 0x00, 0x00
> > > >
> > > > before #include "usdt.h"
> > >
> > >
> > > sounds good, but it seems we need bit more changes for that,
> > > so far I ended up with:
> > >
> > > -       __usdt_asm1(990:        USDT_NOP)                                                       \
> > > +       __usdt_asm5(990:        USDT_NOP)                                                       \
> > >
> > > but it still won't compile, will need to spend more time on that,
> > > unless you have better solution
> > >
> >
> > Use
> >
> > #define USDT_NOP .ascii "\x0F\x1F\x44\x00\x00"
> >
> > for now, I'll need to improve macro magic to handle instructions with
> > commas in them...
> 
> Ok, fixed in [0]. If you get the latest version, the .byte approach
> will work (I have tests in CI now to validate this).
> 
>   [0] https://github.com/libbpf/usdt/pull/12

yep, works nicely, thanks

jirka


* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-21 21:44 ` [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
@ 2025-04-27 14:13   ` Oleg Nesterov
  2025-04-28 10:51     ` Jiri Olsa
  1 sibling, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 14:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/21, Jiri Olsa wrote:
>
> +static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> +{
> +	struct mm_struct *mm = vma->vm_mm;
> +	int err;
> +
> +	/* We are going to replace instruction, update ref_ctr. */
> +	if (uprobe->ref_ctr_offset) {
> +		err = update_ref_ctr(uprobe, mm, 1);
> +		if (err)
> +			return err;
> +	}
> +
> +	err = set_swbp(&uprobe->arch, vma, vaddr);
> +
> +	/* Revert back reference counter if instruction update failed. */
> +	if (err && uprobe->ref_ctr_offset)
> +		update_ref_ctr(uprobe, mm, -1);
> +	return err;
>  }
...
> +static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> +{
> +	int err = set_orig_insn(&uprobe->arch, vma, vaddr);
> +
> +	/* Revert back reference counter even if instruction update failed. */
> +	if (uprobe->ref_ctr_offset)
> +		update_ref_ctr(uprobe, vma->vm_mm, -1);
> +	return err;
>  }

This doesn't look right even in the simplest case...

To simplify, suppose that uprobe_register() needs to change a single mm/vma
and set_swbp() fails. In this case uprobe_register() calls uprobe_unregister()
which will find the same vma and call set_orig_refctr(). set_orig_insn() will
do nothing. But update_ref_ctr(uprobe, vma->vm_mm, -1) is wrong/unbalanced.

The current code updates ref_ctr after the verify_opcode() check, so it doesn't
have this problem.

-------------------------------------------------------------------------------
OTOH, I think that the current logic is not really correct either,

	/* Revert back reference counter if instruction update failed. */
	if (ret < 0 && is_register && ref_ctr_updated)
		update_ref_ctr(uprobe, mm, -1);

I think that "Revert back reference counter" logic should not depend on
is_register. Otherwise we can have the unbalanced update_ref_ctr(-1) if
uprobe_unregister() fails, then another uprobe_register() comes at the
same address, and after that uprobe_unregister() succeeds.

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
  2025-04-21 21:44 ` [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
@ 2025-04-27 14:24   ` Oleg Nesterov
  2025-04-28 11:11     ` Jiri Olsa
  1 sibling, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 14:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/21, Jiri Olsa wrote:
>
> @@ -1483,7 +1483,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
>  	struct vm_area_struct *vma;
>  	int err = 0;
>
> -	mmap_read_lock(mm);
> +	mmap_write_lock(mm);

So uprobe_write_opcode() is always called under down_write(), right?
Then this

	* Called with mm->mmap_lock held for read or write.

comment should be probably updated.

And perhaps the comment above mmap_write_lock() in register_for_each_vma()
should be updated too... or even removed.

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
  2025-04-22 23:51   ` Andrii Nakryiko
@ 2025-04-27 14:56   ` Oleg Nesterov
  2025-04-27 17:34     ` Oleg Nesterov
  2025-04-27 18:04   ` Oleg Nesterov
  2 siblings, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 14:56 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/21, Jiri Olsa wrote:
>
> +static unsigned long find_nearest_page(unsigned long vaddr)
> +{
> +	struct vm_area_struct *vma, *prev = NULL;
> +	unsigned long prev_vm_end = PAGE_SIZE;
> +	VMA_ITERATOR(vmi, current->mm, 0);
> +
> +	vma = vma_next(&vmi);
> +	while (vma) {
> +		if (prev)
> +			prev_vm_end = prev->vm_end;
> +		if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
> +			if (is_reachable_by_call(prev_vm_end, vaddr))
> +				return prev_vm_end;
> +			if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
> +				return vma->vm_start - PAGE_SIZE;
> +		}
> +		prev = vma;
> +		vma = vma_next(&vmi);
> +	}
> +
> +	return 0;
> +}

This can be simplified afaics... We don't really need prev, and we can
use for_each_vma(),

	static unsigned long find_nearest_page(unsigned long vaddr)
	{
		struct vm_area_struct *vma;
		unsigned long prev_vm_end = PAGE_SIZE;
		VMA_ITERATOR(vmi, current->mm, 0);

		for_each_vma(vmi, vma) {
			if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
				if (is_reachable_by_call(prev_vm_end, vaddr))
					return prev_vm_end;
				if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
					return vma->vm_start - PAGE_SIZE;
			}
			prev_vm_end = vma->vm_end;
		}

		return 0;
	}

> +static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +	struct mm_struct *mm = current->mm;
> +	struct uprobe_trampoline *tramp;
> +	struct vm_area_struct *vma;
> +
> +	if (!user_64bit_mode(regs))
> +		return NULL;

Cosmetic, but I think it would be better to move this check into the
caller, uprobe_trampoline_get().

> +	vma = _install_special_mapping(mm, tramp->vaddr, PAGE_SIZE,
> +				VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
> +				&tramp_mapping);

Note that xol_add_vma() -> _install_special_mapping() uses VM_SEALED_SYSMAP.
Perhaps create_uprobe_trampoline() should use this flag too for consistency?

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe
  2025-04-21 21:44 ` [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
  2025-04-22 23:48   ` Andrii Nakryiko
@ 2025-04-27 15:51   ` Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 15:51 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/21, Jiri Olsa wrote:
>
> We do not allow execution of the uprobe syscall if the caller is not
> from the uprobe trampoline mapping.

...

> +SYSCALL_DEFINE0(uprobe)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +	unsigned long ip, sp, ax_r11_cx_ip[4];
> +	int err;
> +
> +	/* Allow execution only from uprobe trampolines. */
> +	if (!in_uprobe_trampoline(regs->ip))
> +		goto sigill;

I honestly don't understand why we need this check. Same for the similar
trampoline_check_ip() check in sys_uretprobe(). Nevermind, I won't argue.

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-21 21:44 ` [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
  2025-04-23  0:04   ` Andrii Nakryiko
@ 2025-04-27 17:11   ` Oleg Nesterov
  2025-04-28 13:24     ` Jiri Olsa
  2025-04-28 13:24     ` Jiri Olsa
  1 sibling, 2 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 17:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

I didn't actually read this patch yet, but let me ask anyway...

On 04/21, Jiri Olsa wrote:
>
> +static int swbp_optimize(struct vm_area_struct *vma, unsigned long vaddr, unsigned long tramp)
> +{
> +	struct write_opcode_ctx ctx = {
> +		.base = vaddr,
> +	};
> +	char call[5];
> +	int err;
> +
> +	relative_call(call, vaddr, tramp);
> +
> +	/*
> +	 * We are in state where breakpoint (int3) is installed on top of first
> +	 * byte of the nop5 instruction. We will do following steps to overwrite
> +	 * this to call instruction:
> +	 *
> +	 * - sync cores
> +	 * - write last 4 bytes of the call instruction
> +	 * - sync cores
> +	 * - update the call instruction opcode
> +	 */
> +
> +	text_poke_sync();

Hmm. I would like to understand why exactly we need at least this first
text_poke_sync() before "write last 4 bytes of the call instruction".


And... I don't suggest to do this right now, but I am wondering if we can
use mm_cpumask(vma->vm_mm) later, I guess we don't care if we race with
switch_mm_irqs_off() which can add another CPU to this mask...

> +void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
> +{
> +	struct mm_struct *mm = current->mm;
> +	uprobe_opcode_t insn[5];
> +
> +	/*
> +	 * Do not optimize if shadow stack is enabled, the return address hijack
> +	 * code in arch_uretprobe_hijack_return_addr updates wrong frame when
> +	 * the entry uprobe is optimized and the shadow stack crashes the app.
> +	 */
> +	if (shstk_is_enabled())
> +		return;

Not sure I fully understand the comment/problem, but what if
prctl(ARCH_SHSTK_ENABLE) is called after arch_uprobe_optimize() succeeds?

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-27 14:56   ` Oleg Nesterov
@ 2025-04-27 17:34     ` Oleg Nesterov
  2025-04-28 13:48       ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 17:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/27, Oleg Nesterov wrote:
>
> On 04/21, Jiri Olsa wrote:
> >
> > +static unsigned long find_nearest_page(unsigned long vaddr)
> > +{
> > +	struct vm_area_struct *vma, *prev = NULL;
> > +	unsigned long prev_vm_end = PAGE_SIZE;
> > +	VMA_ITERATOR(vmi, current->mm, 0);
> > +
> > +	vma = vma_next(&vmi);
> > +	while (vma) {
> > +		if (prev)
> > +			prev_vm_end = prev->vm_end;
> > +		if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
> > +			if (is_reachable_by_call(prev_vm_end, vaddr))
> > +				return prev_vm_end;
> > +			if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
> > +				return vma->vm_start - PAGE_SIZE;
> > +		}
> > +		prev = vma;
> > +		vma = vma_next(&vmi);
> > +	}
> > +
> > +	return 0;
> > +}
>
> This can be simplified afaics... We don't really need prev, and we can
> use for_each_vma(),
>
> 	static unsigned long find_nearest_page(unsigned long vaddr)
> 	{
> 		struct vm_area_struct *vma;
> 		unsigned long prev_vm_end = PAGE_SIZE;
> 		VMA_ITERATOR(vmi, current->mm, 0);
>
> 		for_each_vma(vmi, vma) {
> 			if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
> 				if (is_reachable_by_call(prev_vm_end, vaddr))
> 					return prev_vm_end;
> 				if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
> 					return vma->vm_start - PAGE_SIZE;
> 			}
> 			prev_vm_end = vma->vm_end;
> 		}
>
> 		return 0;
> 	}

Either way it doesn't look nice. If nothing else, we should respect
vm_start/end_gap(vma).

Can't we do something like

	struct vm_unmapped_area_info info = {};

	info.length = PAGE_SIZE;
	info.low_limit  = vaddr - INT_MIN + 5;
	info.high_limit = vaddr + INT_MAX;
	
	info.flags = VM_UNMAPPED_AREA_TOPDOWN; // makes sense?

	return vm_unmapped_area(&info);

instead ?

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
  2025-04-22 23:51   ` Andrii Nakryiko
  2025-04-27 14:56   ` Oleg Nesterov
@ 2025-04-27 18:04   ` Oleg Nesterov
  2025-04-28 13:52     ` Jiri Olsa
  2 siblings, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-27 18:04 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/21, Jiri Olsa wrote:
>
> +struct uprobe_trampoline {
> +	struct hlist_node	node;
> +	unsigned long		vaddr;
> +	atomic64_t		ref;
> +};

I don't really understand the point of uprobe_trampoline->ref...

set_orig_insn/swbp_unoptimize paths don't call uprobe_trampoline_put().
It is only called in the unlikely case when swbp_optimize() fails, so perhaps
we can kill this member and uprobe_trampoline_put()? At least in the initial
version.

> +static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
> +{
> +	if (tramp && atomic64_dec_and_test(&tramp->ref))
> +		destroy_uprobe_trampoline(tramp);
> +}

Why does it check tramp != NULL ?

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-27 14:13   ` Oleg Nesterov
@ 2025-04-28 10:51     ` Jiri Olsa
  2025-04-29 13:44       ` Jiri Olsa
  2025-05-06 13:11       ` Jiri Olsa
  0 siblings, 2 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 10:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 04:13:35PM +0200, Oleg Nesterov wrote:
> On 04/21, Jiri Olsa wrote:
> >
> > +static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> > +{
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	int err;
> > +
> > +	/* We are going to replace instruction, update ref_ctr. */
> > +	if (uprobe->ref_ctr_offset) {
> > +		err = update_ref_ctr(uprobe, mm, 1);
> > +		if (err)
> > +			return err;
> > +	}
> > +
> > +	err = set_swbp(&uprobe->arch, vma, vaddr);
> > +
> > +	/* Revert back reference counter if instruction update failed. */
> > +	if (err && uprobe->ref_ctr_offset)
> > +		update_ref_ctr(uprobe, mm, -1);
> > +	return err;
> >  }
> ...
> > +static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> > +{
> > +	int err = set_orig_insn(&uprobe->arch, vma, vaddr);
> > +
> > +	/* Revert back reference counter even if instruction update failed. */
> > +	if (uprobe->ref_ctr_offset)
> > +		update_ref_ctr(uprobe, vma->vm_mm, -1);
> > +	return err;
> >  }
> 
> This doesn't look right even in the simplest case...
> 
> To simplify, suppose that uprobe_register() needs to change a single mm/vma
> and set_swbp() fails. In this case uprobe_register() calls uprobe_unregister()
> which will find the same vma and call set_orig_refctr(). set_orig_insn() will
> do nothing. But update_ref_ctr(uprobe, vma->vm_mm, -1) is wrong/unbalanced.
> 
> The current code updates ref_ctr after the verify_opcode() check, so it doesn't
> have this problem.

ah right :-\

could set_swbp/set_orig_insn return > 0 in case the memory was actually updated?
and we would update the refctr based on that, like:

+static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
+{
+       struct mm_struct *mm = vma->vm_mm;
+       int err;
+
+       err = set_swbp(&uprobe->arch, vma, vaddr);
+       if (err > 0 && uprobe->ref_ctr_offset)
+               update_ref_ctr(uprobe, mm, 1);
+       return err;
+}

+static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
+{
+       int err = set_orig_insn(&uprobe->arch, vma, vaddr);
+
+       /* Revert the reference counter only if the instruction was actually updated. */
+       if (err > 0 && uprobe->ref_ctr_offset)
+               update_ref_ctr(uprobe, vma->vm_mm, -1);
+       return err;
+}

but then what if update_ref_ctr fails..

> 
> -------------------------------------------------------------------------------
> OTOH, I think that the current logic is not really correct either,
> 
> 	/* Revert back reference counter if instruction update failed. */
> 	if (ret < 0 && is_register && ref_ctr_updated)
> 		update_ref_ctr(uprobe, mm, -1);
> 
> I think that "Revert back reference counter" logic should not depend on
> is_register. Otherwise we can have the unbalanced update_ref_ctr(-1) if
> uprobe_unregister() fails, then another uprobe_register() comes at the
> same address, and after that uprobe_unregister() succeeds.

sounds good to me

jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
  2025-04-27 14:24   ` Oleg Nesterov
@ 2025-04-28 11:11     ` Jiri Olsa
  2025-04-28 11:40       ` Oleg Nesterov
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 11:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 04:24:01PM +0200, Oleg Nesterov wrote:
> On 04/21, Jiri Olsa wrote:
> >
> > @@ -1483,7 +1483,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
> >  	struct vm_area_struct *vma;
> >  	int err = 0;
> >
> > -	mmap_read_lock(mm);
> > +	mmap_write_lock(mm);
> 
> So uprobe_write_opcode() is always called under down_write(), right?
> Then this
> 
> 	* Called with mm->mmap_lock held for read or write.
> 
> comment should be probably updated.

yes

> 
> And perhaps the comment above mmap_write_lock() in register_for_each_vma()
> should be updated too... or even removed.

hum, not sure how it's related to this change, but will stare at it a bit more

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
  2025-04-28 11:11     ` Jiri Olsa
@ 2025-04-28 11:40       ` Oleg Nesterov
  0 siblings, 0 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-04-28 11:40 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On 04/28, Jiri Olsa wrote:
>
> On Sun, Apr 27, 2025 at 04:24:01PM +0200, Oleg Nesterov wrote:
> >
> > And perhaps the comment above mmap_write_lock() in register_for_each_vma()
> > should be updated too... or even removed.
>
> hum, not sure how it's related to this change, but will stare at it a bit more

That comment tries to explain why register_for_each_vma() has to take
mm->mmap_lock for writing. Without the described race it could use
mmap_read_lock(). See 84455e6923c79 for the details.

Now that we have another (obvious) reason for mmap_write_lock(mm), this
comment looks confusing.

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-27 17:11   ` Oleg Nesterov
@ 2025-04-28 13:24     ` Jiri Olsa
  2025-04-28 13:24     ` Jiri Olsa
  1 sibling, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 13:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 07:11:43PM +0200, Oleg Nesterov wrote:
> I didn't actually read this patch yet, but let me ask anyway...
> 
> On 04/21, Jiri Olsa wrote:
> >
> > +static int swbp_optimize(struct vm_area_struct *vma, unsigned long vaddr, unsigned long tramp)
> > +{
> > +	struct write_opcode_ctx ctx = {
> > +		.base = vaddr,
> > +	};
> > +	char call[5];
> > +	int err;
> > +
> > +	relative_call(call, vaddr, tramp);
> > +
> > +	/*
> > +	 * We are in state where breakpoint (int3) is installed on top of first
> > +	 * byte of the nop5 instruction. We will do following steps to overwrite
> > +	 * this to call instruction:
> > +	 *
> > +	 * - sync cores
> > +	 * - write last 4 bytes of the call instruction
> > +	 * - sync cores
> > +	 * - update the call instruction opcode
> > +	 */
> > +
> > +	text_poke_sync();
> 
> Hmm. I would like to understand why exactly we need at least this first
> text_poke_sync() before "write last 4 bytes of the call instruction".

I followed David's comment in here:

  https://lore.kernel.org/bpf/e206df95d98d4cbab77824cf7a32a80f@AcuMS.aculab.com/

  > That might work provided there are IPI (to flush the decode pipeline)
  > after the write of the 'int3' and one before the write of the 'call'.
  > You'll need to ensure the I-cache gets invalidated as well.


swbp_optimize is called when there's already int3 in place

> 
> 
> And... I don't suggest to do this right now, but I am wondering if we can
> use mm_cpumask(vma->vm_mm) later, I guess we don't care if we race with
> switch_mm_irqs_off() which can add another CPU to this mask...

hum, probably..

> 
> > +void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	uprobe_opcode_t insn[5];
> > +
> > +	/*
> > +	 * Do not optimize if shadow stack is enabled, the return address hijack
> > +	 * code in arch_uretprobe_hijack_return_addr updates wrong frame when
> > +	 * the entry uprobe is optimized and the shadow stack crashes the app.
> > +	 */
> > +	if (shstk_is_enabled())
> > +		return;
> 
> Not sure I fully understand the comment/problem, but what if
> prctl(ARCH_SHSTK_ENABLE) is called after arch_uprobe_optimize() succeeds?

I'll address this in separate email

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes
  2025-04-27 17:11   ` Oleg Nesterov
  2025-04-28 13:24     ` Jiri Olsa
@ 2025-04-28 13:24     ` Jiri Olsa
  1 sibling, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 13:24 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 07:11:43PM +0200, Oleg Nesterov wrote:

SNIP

> > +void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
> > +{
> > +	struct mm_struct *mm = current->mm;
> > +	uprobe_opcode_t insn[5];
> > +
> > +	/*
> > +	 * Do not optimize if shadow stack is enabled, the return address hijack
> > +	 * code in arch_uretprobe_hijack_return_addr updates wrong frame when
> > +	 * the entry uprobe is optimized and the shadow stack crashes the app.
> > +	 */
> > +	if (shstk_is_enabled())
> > +		return;
> 
> Not sure I fully understand the comment/problem, but ...

the issue is that sys_uprobe adjusts rsp to skip the uprobe trampoline stack frame
(which is call + 3x push), so the uprobe consumers see the expected stack

then if we need to hijack the return address we:
  - update the return address on the actual stack (updated rsp)
  - update the shadow stack with shstk_update_last_frame (the last shadow
    stack frame), which causes a mismatch and the app crashes on the
    trampoline's ret instruction

I think we could make that work, but to make it simple I think it's better
to skip it for now


> what if
> prctl(ARCH_SHSTK_ENABLE) is called after arch_uprobe_optimize() succeeds?

so that would look like this:

  foo:
    [int3 -> call tramp] hijack foo's return address
    ...

    prctl(ARCH_SHSTK_ENABLE)
    ...
    prctl(ARCH_SHSTK_DISABLE)

    ret -> jumps to uretprobe trampoline

at the time 'prctl(ARCH_SHSTK_ENABLE)' is called, the return address is already
hijacked/changed; in any case, IIUC you need to disable the shadow stack before
'foo' returns

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-27 17:34     ` Oleg Nesterov
@ 2025-04-28 13:48       ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 13:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 07:34:56PM +0200, Oleg Nesterov wrote:
> On 04/27, Oleg Nesterov wrote:
> >
> > On 04/21, Jiri Olsa wrote:
> > >
> > > +static unsigned long find_nearest_page(unsigned long vaddr)
> > > +{
> > > +	struct vm_area_struct *vma, *prev = NULL;
> > > +	unsigned long prev_vm_end = PAGE_SIZE;
> > > +	VMA_ITERATOR(vmi, current->mm, 0);
> > > +
> > > +	vma = vma_next(&vmi);
> > > +	while (vma) {
> > > +		if (prev)
> > > +			prev_vm_end = prev->vm_end;
> > > +		if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
> > > +			if (is_reachable_by_call(prev_vm_end, vaddr))
> > > +				return prev_vm_end;
> > > +			if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
> > > +				return vma->vm_start - PAGE_SIZE;
> > > +		}
> > > +		prev = vma;
> > > +		vma = vma_next(&vmi);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> >
> > This can be simplified afaics... We don't really need prev, and we can
> > use for_each_vma(),
> >
> > 	static unsigned long find_nearest_page(unsigned long vaddr)
> > 	{
> > 		struct vm_area_struct *vma;
> > 		unsigned long prev_vm_end = PAGE_SIZE;
> > 		VMA_ITERATOR(vmi, current->mm, 0);
> >
> > 		for_each_vma(vmi, vma) {
> > 			if (vma->vm_start - prev_vm_end  >= PAGE_SIZE) {
> > 				if (is_reachable_by_call(prev_vm_end, vaddr))
> > 					return prev_vm_end;
> > 				if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
> > 					return vma->vm_start - PAGE_SIZE;
> > 			}
> > 			prev_vm_end = vma->vm_end;
> > 		}
> >
> > 		return 0;
> > 	}
> 
> Either way it doesn't look nice. If nothing else, we should respect
> vm_start/end_gap(vma).
> 
> Can't we do something like
> 
> 	struct vm_unmapped_area_info info = {};
> 
> 	info.length = PAGE_SIZE;
> 	info.low_limit  = vaddr - INT_MIN + 5;
> 	info.high_limit = vaddr + INT_MAX;
> 	
> 	info.flags = VM_UNMAPPED_AREA_TOPDOWN; // makes sense?

so this would return the highest available space, right? the current code goes
from the bottom now, not sure what's preferred

> 
> 	return vm_unmapped_area(&info);
> 
> instead ?

yes, I did not realize we could use this, looks better, will try that

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
  2025-04-27 18:04   ` Oleg Nesterov
@ 2025-04-28 13:52     ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-28 13:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Sun, Apr 27, 2025 at 08:04:32PM +0200, Oleg Nesterov wrote:
> On 04/21, Jiri Olsa wrote:
> >
> > +struct uprobe_trampoline {
> > +	struct hlist_node	node;
> > +	unsigned long		vaddr;
> > +	atomic64_t		ref;
> > +};
> 
> I don't really understand the point of uprobe_trampoline->ref...
> 
> set_orig_insn/swbp_unoptimize paths don't call uprobe_trampoline_put().
> It is only called in unlikely case when swbp_optimize() fails, so perhaps
> we can kill this member and uprobe_trampoline_put() ? At least in the initial
> version.

right, we can remove that

> 
> > +static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
> > +{
> > +	if (tramp && atomic64_dec_and_test(&tramp->ref))
> > +		destroy_uprobe_trampoline(tramp);
> > +}
> 
> Why does it check tramp != NULL ?

I think some earlier version of the code could have called that with NULL,
will remove that

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-28 10:51     ` Jiri Olsa
@ 2025-04-29 13:44       ` Jiri Olsa
  2025-05-06 13:11       ` Jiri Olsa
  1 sibling, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-04-29 13:44 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 28, 2025 at 12:51:57PM +0200, Jiri Olsa wrote:
> On Sun, Apr 27, 2025 at 04:13:35PM +0200, Oleg Nesterov wrote:
> > On 04/21, Jiri Olsa wrote:
> > >
> > > +static int set_swbp_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> > > +{
> > > +	struct mm_struct *mm = vma->vm_mm;
> > > +	int err;
> > > +
> > > +	/* We are going to replace instruction, update ref_ctr. */
> > > +	if (uprobe->ref_ctr_offset) {
> > > +		err = update_ref_ctr(uprobe, mm, 1);
> > > +		if (err)
> > > +			return err;
> > > +	}
> > > +
> > > +	err = set_swbp(&uprobe->arch, vma, vaddr);
> > > +
> > > +	/* Revert back reference counter if instruction update failed. */
> > > +	if (err && uprobe->ref_ctr_offset)
> > > +		update_ref_ctr(uprobe, mm, -1);
> > > +	return err;
> > >  }
> > ...
> > > +static int set_orig_refctr(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long vaddr)
> > > +{
> > > +	int err = set_orig_insn(&uprobe->arch, vma, vaddr);
> > > +
> > > +	/* Revert back reference counter even if instruction update failed. */
> > > +	if (uprobe->ref_ctr_offset)
> > > +		update_ref_ctr(uprobe, vma->vm_mm, -1);
> > > +	return err;
> > >  }
> > 
> > This doesn't look right even in the simplest case...
> > 
> > To simplify, suppose that uprobe_register() needs to change a single mm/vma
> > and set_swbp() fails. In this case uprobe_register() calls uprobe_unregister()
> > which will find the same vma and call set_orig_refctr(). set_orig_insn() will
> > do nothing. But update_ref_ctr(uprobe, vma->vm_mm, -1) is wrong/unbalanced.
> > 
> > The current code updates ref_ctr after the verify_opcode() check, so it doesn't
> > have this problem.
> 
> ah right :-\
> 
> could set_swbp/set_orig_insn return > 0 in case the memory was actually updated?
> and we would update the refctr based on that, like:

ok, I think we need to keep the refcnt update inside write_insn and enable it
through an argument, so I can use write_insn from swbp_optimize/swbp_unoptimize
and tell it not to do the refcnt update

jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 22/22] man2: Add uprobe syscall page
  2025-04-22 20:45       ` Alejandro Colomar
@ 2025-05-01 21:26         ` Alejandro Colomar
  2025-05-02  8:47           ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Alejandro Colomar @ 2025-05-01 21:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

Hi Jiri,

On Tue, Apr 22, 2025 at 10:45:41PM +0200, Alejandro Colomar wrote:
> On Tue, Apr 22, 2025 at 04:01:56PM +0200, Jiri Olsa wrote:
> > > > +is an alternative to breakpoint instructions
> > > > +for triggering entry uprobe consumers.
> > > 
> > > What are breakpoint instructions?
> > 
> > it's int3 instruction to trigger breakpoint (on x86_64)
> 
> I guess it's something that people who do that stuff understand.
> I don't, but I guess your intended audience will be okay with it.  :)
> 
> > > The pages are almost identical.  Should we document both pages in the
> > > same page?
> > 
> > great, I was wondering if this was an option, looks much better
> > should we also add uprobe link, like below?
> 
> Yep, sure.  Thanks for the reminder!

From what I see, I should not yet merge the patch, right?  The kernel
code is under review, right?


Have a lovely night!
Alex

> 
> 
> Have a lovely night!
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 22/22] man2: Add uprobe syscall page
  2025-05-01 21:26         ` Alejandro Colomar
@ 2025-05-02  8:47           ` Jiri Olsa
  0 siblings, 0 replies; 74+ messages in thread
From: Jiri Olsa @ 2025-05-02  8:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Jiri Olsa, Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf,
	linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
	John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
	Alan Maguire, David Laight, Thomas Weißschuh, Ingo Molnar

On Thu, May 01, 2025 at 11:26:46PM +0200, Alejandro Colomar wrote:
> Hi Jiri,
> 
> On Tue, Apr 22, 2025 at 10:45:41PM +0200, Alejandro Colomar wrote:
> > On Tue, Apr 22, 2025 at 04:01:56PM +0200, Jiri Olsa wrote:
> > > > > +is an alternative to breakpoint instructions
> > > > > +for triggering entry uprobe consumers.
> > > > 
> > > > What are breakpoint instructions?
> > > 
> > > it's the int3 instruction that triggers the breakpoint (on x86_64)
> > 
> > I guess it's something that people who do that stuff understand.
> > I don't, but I guess your intended audience will be okay with it.  :)
> > 
> > > > The pages are almost identical.  Should we document both pages in the
> > > > same page?
> > > 
> > > great, I was wondering if this was an option, looks much better
> > > should we also add an uprobe link, like below?
> > 
> > Yep, sure.  Thanks for the reminder!
> 
> From what I see, I should not yet merge the patch, right?  The kernel
> code is under review, right?

right, we need to figure out other stuff first

thanks,
jirka

> 
> 
> Have a lovely night!
> Alex
> 
> > 
> > 
> > Have a lovely night!
> > Alex
> > 
> > -- 
> > <https://www.alejandro-colomar.es/>
> 
> 
> 
> -- 
> <https://www.alejandro-colomar.es/>



^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-04-28 10:51     ` Jiri Olsa
  2025-04-29 13:44       ` Jiri Olsa
@ 2025-05-06 13:11       ` Jiri Olsa
  2025-05-06 14:01         ` Oleg Nesterov
  1 sibling, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-05-06 13:11 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Mon, Apr 28, 2025 at 12:51:57PM +0200, Jiri Olsa wrote:
> On Sun, Apr 27, 2025 at 04:13:35PM +0200, Oleg Nesterov wrote:

SNIP

> > 
> > -------------------------------------------------------------------------------
> > OTOH, I think that the current logic is not really correct too,
> > 
> > 	/* Revert back reference counter if instruction update failed. */
> > 	if (ret < 0 && is_register && ref_ctr_updated)
> > 		update_ref_ctr(uprobe, mm, -1);
> > 
> > I think that "Revert back reference counter" logic should not depend on
> > is_register. Otherwise we can have the unbalanced update_ref_ctr(-1) if
> > uprobe_unregister() fails, then another uprobe_register() comes at the
> > same address, and after that uprobe_unregister() succeeds.
> 
> sounds good to me

actually, after a closer look, I don't see how this code could be triggered
in the first place... any hint on how to hit such a case? like:

  - ref_ctr_offset is updated

  - we fail somehow

  - "if (ret < 0 && ref_ctr_updated)" is true on the way out

thanks,
jirka

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-05-06 13:11       ` Jiri Olsa
@ 2025-05-06 14:01         ` Oleg Nesterov
  2025-05-08 22:56           ` Jiri Olsa
  0 siblings, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-05-06 14:01 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

I'm on PTO and traveling until May 15 without my working laptop, can't read
the code.

Quite possibly I am wrong, but let me try to recall what this code does...

- So. uprobe_register() succeeds and changes ref_ctr from 0 to 1.

- uprobe_unregister() fails but decrements ref_ctr back to zero. Because the
  "Revert back reference counter if instruction update failed" logic doesn't
  apply if is_register is true.

  Since uprobe_unregister() fails, this uprobe won't be removed. IIRC, we even
  have the warning about that.

- another uprobe_register() comes and re-uses the same uprobe. In this case
  install_breakpoint() will do nothing, ref_ctr won't be updated (right ?)

- uprobe_unregister() is called again and this time it succeeds. In this case
  ref_ctr is changed from 0 to -1. IIRC, we even have some warning for this
  case.

No?

Sorry, I can't check my thinking until I return.

Oleg.

On 05/06, Jiri Olsa wrote:
>
> On Mon, Apr 28, 2025 at 12:51:57PM +0200, Jiri Olsa wrote:
> > On Sun, Apr 27, 2025 at 04:13:35PM +0200, Oleg Nesterov wrote:
>
> SNIP
>
> > >
> > > -------------------------------------------------------------------------------
> > > OTOH, I think that the current logic is not really correct too,
> > >
> > > 	/* Revert back reference counter if instruction update failed. */
> > > 	if (ret < 0 && is_register && ref_ctr_updated)
> > > 		update_ref_ctr(uprobe, mm, -1);
> > >
> > > I think that "Revert back reference counter" logic should not depend on
> > > is_register. Otherwise we can have the unbalanced update_ref_ctr(-1) if
> > > uprobe_unregister() fails, then another uprobe_register() comes at the
> > > same address, and after that uprobe_unregister() succeeds.
> >
> > sounds good to me
>
> actually, after a closer look, I don't see how this code could be triggered
> in the first place... any hint on how to hit such a case? like:
>
>   - ref_ctr_offset is updated
>
>   - we fail somehow
>
>   - "if (ret < 0 && ref_ctr_updated)" is true on the way out
>
> thanks,
> jirka
>


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-05-06 14:01         ` Oleg Nesterov
@ 2025-05-08 22:56           ` Jiri Olsa
  2025-05-12 13:37             ` Oleg Nesterov
  0 siblings, 1 reply; 74+ messages in thread
From: Jiri Olsa @ 2025-05-08 22:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Jiri Olsa, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

On Tue, May 06, 2025 at 04:01:45PM +0200, Oleg Nesterov wrote:
> I'm on PTO and traveling until May 15 without my working laptop, can't read
> the code.
> 
> Quite possibly I am wrong, but let me try to recall what this code does...
> 
> - So. uprobe_register() succeeds and changes ref_ctr from 0 to 1.
> 
> - uprobe_unregister() fails but decrements ref_ctr back to zero. Because the
>   "Revert back reference counter if instruction update failed" logic doesn't
>   apply if is_register is true.
> 
>   Since uprobe_unregister() fails, this uprobe won't be removed. IIRC, we even
>   have the warning about that.
> 
> - another uprobe_register() comes and re-uses the same uprobe. In this case
>   install_breakpoint() will do nothing, ref_ctr won't be updated (right ?)

right, because int3 is still in place and verify_opcode returns 0

> 
> - uprobe_unregister() is called again and this time it succeeds. In this case
>   ref_ctr is changed from 0 to -1. IIRC, we even have some warning for this
>   case.

AFAICS that should not happen, there's a check below in __update_ref_ctr:

        if (unlikely(*ptr + d < 0)) {
                pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
                        "curr val: %d, delta: %d\n", vaddr, *ptr, d);
                ret = -EINVAL;
                goto out;
        }

        *ptr += d;
        ret = 0;
        ...


but it still prevents the uprobe from the 2nd register from triggering,
so I think the change you suggest makes sense


a few things first...

 - how do you make uprobe_unregister fail after a successful uprobe_register?
   I had to instrument the code to do that for me

 - I see one extra uprobe_write_opcode call during unregister (check below);
   it seems to do no harm, but looks strange


current code:

   1st register:

   - uprobe_register succeeds and changes ref_ctr_offset from 0 to 1

   1st unregister:

   - first there's an uprobe_perf_close -> uprobe_apply call that ends up in a
     remove_breakpoint call, which decrements ref_ctr_offset to 0 and fails

   - followed by a __probe_event_disable -> uprobe_unregister_nosync call
     that ends up in a remove_breakpoint call, which fails to decrement
     ref_ctr_offset to -1 (ref_ctr_offset stays 0) and so fails

   - uprobe is leaked

   2nd register:

   - another uprobe_register() comes and re-uses the same uprobe. In this case
     install_breakpoint() will do nothing, ref_ctr won't be updated, stays 0
     so uprobe WILL NOT trigger

   2nd unregister:

  -  both attempts (from uprobe_perf_close and __probe_event_disable as above)
     to write the original instruction will fail, because the ref_ctr_offset
     update fails and uprobe_write_opcode bails out


with the attached change we will do:

   1st register:

   - uprobe_register succeeds and changes ref_ctr_offset from 0 to 1

   1st unregister:

   - first there's an uprobe_perf_close -> uprobe_apply call that ends up in a
     remove_breakpoint call, which decrements ref_ctr_offset to 0, fails,
     and restores ref_ctr_offset to 1

   - followed by a __probe_event_disable -> uprobe_unregister_nosync call
     that ends up in a remove_breakpoint call, which does the same as the
     previous step; ref_ctr_offset is 1

   - uprobe is leaked

   2nd register:

   - another uprobe_register() comes and re-uses the same uprobe. In this case
     install_breakpoint() will do nothing, ref_ctr won't be updated, stays 1,
     so uprobe WILL trigger

   2nd unregister:

  -  succeeds, and ref_ctr_offset is 0


jirka


---
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 207432e92386..65bfe52ed729 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -589,8 +589,8 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
 
 out:
 	/* Revert back reference counter if instruction update failed. */
-	if (ret < 0 && is_register && ref_ctr_updated)
-		update_ref_ctr(uprobe, mm, -1);
+	if (ret < 0 && ref_ctr_updated)
+		update_ref_ctr(uprobe, mm, is_register ? -1 : 1);
 
 	/* try collapse pmd for compound page */
 	if (ret > 0)

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
  2025-05-08 22:56           ` Jiri Olsa
@ 2025-05-12 13:37             ` Oleg Nesterov
  0 siblings, 0 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-05-12 13:37 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
	linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
	Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
	David Laight, Thomas Weißschuh, Ingo Molnar

I am still traveling, will actually read your email when I get back...

On 05/09, Jiri Olsa wrote:
>
> On Tue, May 06, 2025 at 04:01:45PM +0200, Oleg Nesterov wrote:
> >
> > - uprobe_unregister() is called again and this time it succeeds. In this case
> >   ref_ctr is changed from 0 to -1. IIRC, we even have some warning for this
> >   case.
>
> AFAICS that should not happen, there's a check below in __update_ref_ctr:
>
>         if (unlikely(*ptr + d < 0)) {
>                 pr_warn("ref_ctr going negative. vaddr: 0x%lx, "
>                         "curr val: %d, delta: %d\n", vaddr, *ptr, d);
>                 ret = -EINVAL;
>                 goto out;
>         }

OK,

> few things first..
>
>  - how do you make uprobe_unregister fail after a successful uprobe_register?
>    I had to instrument the code to do that for me

I guess _unregister() should not fail "in practice" after
get_user_page + verify_opcode, yet I think we should not rely on this, if possible.

But I won't argue if you think we can ignore these "impossible" failures; it
just should be documented. Same for update_ref_ctr(), iirc it should "never"
fail if ref_offset is correct.

> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -589,8 +589,8 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
>
>  out:
>  	/* Revert back reference counter if instruction update failed. */
> -	if (ret < 0 && is_register && ref_ctr_updated)
> -		update_ref_ctr(uprobe, mm, -1);
> +	if (ret < 0 && ref_ctr_updated)
> +		update_ref_ctr(uprobe, mm, is_register ? -1 : 1);

Yes, this is what I meant.

Oleg.


^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2025-05-12 13:38 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-04-21 21:44 [PATCH perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 01/22] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 02/22] uprobes: Make copy_from_page global Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 03/22] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode Jiri Olsa
2025-04-22 23:48   ` Andrii Nakryiko
2025-04-27 14:13   ` Oleg Nesterov
2025-04-28 10:51     ` Jiri Olsa
2025-04-29 13:44       ` Jiri Olsa
2025-05-06 13:11       ` Jiri Olsa
2025-05-06 14:01         ` Oleg Nesterov
2025-05-08 22:56           ` Jiri Olsa
2025-05-12 13:37             ` Oleg Nesterov
2025-04-21 21:44 ` [PATCH perf/core 04/22] uprobes: Add uprobe_write function Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 05/22] uprobes: Add nbytes argument to uprobe_write Jiri Olsa
2025-04-22 23:48   ` Andrii Nakryiko
2025-04-21 21:44 ` [PATCH perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode Jiri Olsa
2025-04-22 23:48   ` Andrii Nakryiko
2025-04-21 21:44 ` [PATCH perf/core 07/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
2025-04-22 23:48   ` Andrii Nakryiko
2025-04-27 14:24   ` Oleg Nesterov
2025-04-28 11:11     ` Jiri Olsa
2025-04-28 11:40       ` Oleg Nesterov
2025-04-21 21:44 ` [PATCH perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
2025-04-22 23:51   ` Andrii Nakryiko
2025-04-27 14:56   ` Oleg Nesterov
2025-04-27 17:34     ` Oleg Nesterov
2025-04-28 13:48       ` Jiri Olsa
2025-04-27 18:04   ` Oleg Nesterov
2025-04-28 13:52     ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
2025-04-22 23:48   ` Andrii Nakryiko
2025-04-27 15:51   ` Oleg Nesterov
2025-04-21 21:44 ` [PATCH perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
2025-04-23  0:04   ` Andrii Nakryiko
2025-04-24 12:49     ` Jiri Olsa
2025-04-24 16:06       ` Andrii Nakryiko
2025-04-27 17:11   ` Oleg Nesterov
2025-04-28 13:24     ` Jiri Olsa
2025-04-28 13:24     ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 11/22] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
2025-04-23 17:33   ` Andrii Nakryiko
2025-04-24 12:49     ` Jiri Olsa
2025-04-24 16:29       ` Andrii Nakryiko
2025-04-24 18:20         ` Andrii Nakryiko
2025-04-25 13:20           ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
2025-04-23 17:34   ` Andrii Nakryiko
2025-04-21 21:44 ` [PATCH perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
2025-04-23 17:36   ` Andrii Nakryiko
2025-04-24 12:49     ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
2025-04-23 17:40   ` Andrii Nakryiko
2025-04-24 12:49     ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
2025-04-23 17:42   ` Andrii Nakryiko
2025-04-24 12:51     ` Jiri Olsa
2025-04-24 16:30       ` Andrii Nakryiko
2025-04-21 21:44 ` [PATCH perf/core 16/22] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
2025-04-23 17:44   ` Andrii Nakryiko
2025-04-21 21:44 ` [PATCH perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
2025-04-23 17:46   ` Andrii Nakryiko
2025-04-24 12:51     ` Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 19/22] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
2025-04-21 21:44 ` [PATCH perf/core 20/22] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
2025-04-21 23:04   ` Kees Cook
2025-04-21 21:44 ` [PATCH perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
2025-04-21 23:04   ` Kees Cook
2025-04-21 21:44 ` [PATCH 22/22] man2: Add uprobe syscall page Jiri Olsa
2025-04-22  7:00   ` Alejandro Colomar
2025-04-22 14:01     ` Jiri Olsa
2025-04-22 20:45       ` Alejandro Colomar
2025-05-01 21:26         ` Alejandro Colomar
2025-05-02  8:47           ` Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).