* [PATCH RFCv3 01/23] uprobes: Rename arch_uretprobe_trampoline function
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
We are about to add an uprobe trampoline, so clean up the namespace by
renaming arch_uprobe_trampoline to arch_uretprobe_trampoline.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/kernel/uprobes.c | 2 +-
include/linux/uprobes.h | 2 +-
kernel/events/uprobes.c | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9194695662b2..39521f1c4185 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -338,7 +338,7 @@ extern u8 uretprobe_trampoline_entry[];
extern u8 uretprobe_trampoline_end[];
extern u8 uretprobe_syscall_check[];
-void *arch_uprobe_trampoline(unsigned long *psize)
+void *arch_uretprobe_trampoline(unsigned long *psize)
{
static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
struct pt_regs *regs = task_pt_regs(current);
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 2e46b69ff0a6..37cd745640b8 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -224,7 +224,7 @@ extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs *regs);
extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
void *src, unsigned long len);
extern void uprobe_handle_trampoline(struct pt_regs *regs);
-extern void *arch_uprobe_trampoline(unsigned long *psize);
+extern void *arch_uretprobe_trampoline(unsigned long *psize);
extern unsigned long uprobe_get_trampoline_vaddr(void);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 70c84b9d7be3..e160445e7d07 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1708,7 +1708,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
return ret;
}
-void * __weak arch_uprobe_trampoline(unsigned long *psize)
+void * __weak arch_uretprobe_trampoline(unsigned long *psize)
{
static uprobe_opcode_t insn = UPROBE_SWBP_INSN;
@@ -1740,7 +1740,7 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
init_waitqueue_head(&area->wq);
/* Reserve the 1st slot for get_trampoline_vaddr() */
set_bit(0, area->bitmap);
- insns = arch_uprobe_trampoline(&insns_size);
+ insns = arch_uretprobe_trampoline(&insns_size);
arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
if (!xol_add_vma(mm, area))
--
2.49.0
* [PATCH RFCv3 02/23] uprobes: Make copy_from_page global
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Make copy_from_page global and add the uprobe prefix, renaming it to
uprobe_copy_from_page. Add the uprobe prefix to copy_to_page as well
for symmetry.
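For illustration, a caller now reads the opcode at a probed address like
this (a minimal sketch; verify_opcode() in the patch below does the same):

	uprobe_opcode_t old_opcode;

	uprobe_copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
	if (is_swbp_insn(&old_opcode))
		...;	/* a breakpoint is already installed at vaddr */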
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/uprobes.h | 1 +
kernel/events/uprobes.c | 16 ++++++++--------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 37cd745640b8..38803e8c8c3d 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -226,6 +226,7 @@ extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
extern void uprobe_handle_trampoline(struct pt_regs *regs);
extern void *arch_uretprobe_trampoline(unsigned long *psize);
extern unsigned long uprobe_get_trampoline_vaddr(void);
+extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e160445e7d07..5c9fc31c50f1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -250,14 +250,14 @@ bool __weak is_trap_insn(uprobe_opcode_t *insn)
return is_swbp_insn(insn);
}
-static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
+void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
{
void *kaddr = kmap_atomic(page);
memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
kunmap_atomic(kaddr);
}
-static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
+static void uprobe_copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
{
void *kaddr = kmap_atomic(page);
memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
@@ -278,7 +278,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
* is a trap variant; uprobes always wins over any other (gdb)
* breakpoint.
*/
- copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
+ uprobe_copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
is_swbp = is_swbp_insn(&old_opcode);
if (is_swbp_insn(new_opcode)) {
@@ -530,7 +530,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
__SetPageUptodate(new_page);
copy_highpage(new_page, old_page);
- copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+ uprobe_copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
if (!is_register) {
struct page *orig_page;
@@ -1036,7 +1036,7 @@ static int __copy_insn(struct address_space *mapping, struct file *filp,
if (IS_ERR(page))
return PTR_ERR(page);
- copy_from_page(page, offset, insn, nbytes);
+ uprobe_copy_from_page(page, offset, insn, nbytes);
put_page(page);
return 0;
@@ -1380,7 +1380,7 @@ struct uprobe *uprobe_register(struct inode *inode,
return ERR_PTR(-EINVAL);
/*
- * This ensures that copy_from_page(), copy_to_page() and
+ * This ensures that uprobe_copy_from_page(), uprobe_copy_to_page() and
* __update_ref_ctr() can't cross page boundary.
*/
if (!IS_ALIGNED(offset, UPROBE_SWBP_INSN_SIZE))
@@ -1869,7 +1869,7 @@ void __weak arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
void *src, unsigned long len)
{
/* Initialize the slot */
- copy_to_page(page, vaddr, src, len);
+ uprobe_copy_to_page(page, vaddr, src, len);
/*
* We probably need flush_icache_user_page() but it needs vma.
@@ -2364,7 +2364,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
if (result < 0)
return result;
- copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+ uprobe_copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
put_page(page);
out:
/* This needs to return true for any variant of the trap insn */
--
2.49.0
* [PATCH RFCv3 03/23] uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
The uprobe_write_opcode function currently also updates the ref_ctr
offset if one is defined for the uprobe.
This is not handy for the following changes, which need to make several
updates (writes) to install or remove an uprobe, but update the ref_ctr
offset just once.
Add set_swbp_refctr/set_orig_refctr helpers, which make sure the ref_ctr
offset is updated around the instruction write.
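Roughly, the install side then pairs one counter update with the write
(a sketch of what set_swbp_refctr() below does; the multi-write case
only arrives in later patches):

	err = update_ref_ctr(uprobe, mm, 1);		/* once per install */
	if (err)
		return err;
	err = set_swbp(&uprobe->arch, mm, vaddr);	/* the instruction write(s) */
	if (err)
		update_ref_ctr(uprobe, mm, -1);		/* roll back the one update */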
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 50 ++++++++++++++++++++++++++---------------
1 file changed, 32 insertions(+), 18 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5c9fc31c50f1..77b85b19f4c2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -473,15 +473,13 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode)
{
- struct uprobe *uprobe;
struct page *old_page, *new_page;
struct vm_area_struct *vma;
- int ret, is_register, ref_ctr_updated = 0;
+ int ret, is_register;
bool orig_page_huge = false;
unsigned int gup_flags = FOLL_FORCE;
is_register = is_swbp_insn(&opcode);
- uprobe = container_of(auprobe, struct uprobe, arch);
retry:
if (is_register)
@@ -506,15 +504,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
goto put_old;
}
- /* We are going to replace instruction, update ref_ctr. */
- if (!ref_ctr_updated && uprobe->ref_ctr_offset) {
- ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1);
- if (ret)
- goto put_old;
-
- ref_ctr_updated = 1;
- }
-
ret = 0;
if (!is_register && !PageAnon(old_page))
goto put_old;
@@ -565,10 +554,6 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (unlikely(ret == -EAGAIN))
goto retry;
- /* Revert back reference counter if instruction update failed. */
- if (ret && is_register && ref_ctr_updated)
- update_ref_ctr(uprobe, mm, -1);
-
/* try collapse pmd for compound page */
if (!ret && orig_page_huge)
collapse_pte_mapped_thp(mm, vaddr, false);
@@ -590,6 +575,25 @@ int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned
return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
}
+static int set_swbp_refctr(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
+{
+ int err;
+
+ /* We are going to replace instruction, update ref_ctr. */
+ if (uprobe->ref_ctr_offset) {
+ err = update_ref_ctr(uprobe, mm, 1);
+ if (err)
+ return err;
+ }
+
+ err = set_swbp(&uprobe->arch, mm, vaddr);
+
+ /* Revert back reference counter if instruction update failed. */
+ if (err && uprobe->ref_ctr_offset)
+ update_ref_ctr(uprobe, mm, -1);
+ return err;
+}
+
/**
* set_orig_insn - Restore the original instruction.
* @mm: the probed process address space.
@@ -606,6 +610,16 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
*(uprobe_opcode_t *)&auprobe->insn);
}
+static int set_orig_refctr(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
+{
+ int err = set_orig_insn(&uprobe->arch, mm, vaddr);
+
+ /* Revert back reference counter even if instruction update failed. */
+ if (uprobe->ref_ctr_offset)
+ update_ref_ctr(uprobe, mm, -1);
+ return err;
+}
+
/* uprobe should have guaranteed positive refcount */
static struct uprobe *get_uprobe(struct uprobe *uprobe)
{
@@ -1142,7 +1156,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
if (first_uprobe)
set_bit(MMF_HAS_UPROBES, &mm->flags);
- ret = set_swbp(&uprobe->arch, mm, vaddr);
+ ret = set_swbp_refctr(uprobe, mm, vaddr);
if (!ret)
clear_bit(MMF_RECALC_UPROBES, &mm->flags);
else if (first_uprobe)
@@ -1155,7 +1169,7 @@ static int
remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
{
set_bit(MMF_RECALC_UPROBES, &mm->flags);
- return set_orig_insn(&uprobe->arch, mm, vaddr);
+ return set_orig_refctr(uprobe, mm, vaddr);
}
struct map_info {
--
2.49.0
* [PATCH RFCv3 04/23] uprobes: Add uprobe_write function
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Add a uprobe_write function that does what uprobe_write_opcode did
so far, but allows the caller to pass a verify callback that checks
the memory location before writing the opcode.
It will be used in the following changes to simplify the checking logic.
uprobe_write_opcode now calls uprobe_write with verify_opcode as
the verify callback.
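A verify callback matching uprobe_write_verify_t could then look like
this (a sketch only; verify_nop5 is hypothetical, verify_opcode() below
is the real user). Following verify_opcode(), the callback returns a
negative errno, 0 to skip the write, or a positive value to proceed:

	static int verify_nop5(struct page *page, unsigned long vaddr,
			       uprobe_opcode_t *insn)
	{
		u8 old[5];

		uprobe_copy_from_page(page, vaddr, old, 5);
		/* write only if the expected bytes are not in place yet */
		return memcmp(old, insn, 5) ? 1 : 0;
	}

	err = uprobe_write(auprobe, mm, vaddr, insn, verify_nop5);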
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/uprobes.h | 4 ++++
kernel/events/uprobes.c | 13 ++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 38803e8c8c3d..1dbaebc30ff9 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -187,6 +187,8 @@ struct uprobes_state {
struct xol_area *xol_area;
};
+typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode);
+
extern void __init uprobes_init(void);
extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
@@ -195,6 +197,8 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
+ uprobe_opcode_t *opcode, uprobe_write_verify_t verify);
extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 77b85b19f4c2..546e8755cf6d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -472,6 +472,13 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
*/
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode)
+{
+ return uprobe_write(auprobe, mm, vaddr, &opcode, verify_opcode);
+}
+
+int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
+ unsigned long vaddr, uprobe_opcode_t *opcode,
+ uprobe_write_verify_t verify)
{
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -479,7 +486,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
bool orig_page_huge = false;
unsigned int gup_flags = FOLL_FORCE;
- is_register = is_swbp_insn(&opcode);
+ is_register = is_swbp_insn(opcode);
retry:
if (is_register)
@@ -489,7 +496,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (IS_ERR(old_page))
return PTR_ERR(old_page);
- ret = verify_opcode(old_page, vaddr, &opcode);
+ ret = verify(old_page, vaddr, opcode);
if (ret <= 0)
goto put_old;
@@ -519,7 +526,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
__SetPageUptodate(new_page);
copy_highpage(new_page, old_page);
- uprobe_copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
+ uprobe_copy_to_page(new_page, vaddr, opcode, UPROBE_SWBP_INSN_SIZE);
if (!is_register) {
struct page *orig_page;
--
2.49.0
* [PATCH RFCv3 05/23] uprobes: Add nbytes argument to uprobe_write_opcode
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Add an nbytes argument to uprobe_write_opcode as preparation
for writing whole instructions in the following changes.
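The breakpoint write is then just the 1-byte case, with larger writes
coming later (a sketch; the 5-byte rewrite is where the series is
heading, not part of this patch):

	/* today: the single breakpoint opcode, 1 byte on x86 */
	uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode);
	/* later: whole instructions, e.g. nbytes == 5 for a nop5/call update */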
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/uprobes.h | 4 ++--
kernel/events/uprobes.c | 14 +++++++-------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 1dbaebc30ff9..c69a05775394 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -187,7 +187,7 @@ struct uprobes_state {
struct xol_area *xol_area;
};
-typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode);
+typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode, int nbytes);
extern void __init uprobes_init(void);
extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
@@ -198,7 +198,7 @@ extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
- uprobe_opcode_t *opcode, uprobe_write_verify_t verify);
+ uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 546e8755cf6d..7ff1f07c8f79 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -264,7 +264,7 @@ static void uprobe_copy_to_page(struct page *page, unsigned long vaddr, const vo
kunmap_atomic(kaddr);
}
-static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode)
+static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode, int nbytes)
{
uprobe_opcode_t old_opcode;
bool is_swbp;
@@ -473,12 +473,12 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode)
{
- return uprobe_write(auprobe, mm, vaddr, &opcode, verify_opcode);
+ return uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode);
}
int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
- unsigned long vaddr, uprobe_opcode_t *opcode,
- uprobe_write_verify_t verify)
+ unsigned long vaddr, uprobe_opcode_t *insn,
+ int nbytes, uprobe_write_verify_t verify)
{
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -486,7 +486,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
bool orig_page_huge = false;
unsigned int gup_flags = FOLL_FORCE;
- is_register = is_swbp_insn(opcode);
+ is_register = is_swbp_insn(insn);
retry:
if (is_register)
@@ -496,7 +496,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (IS_ERR(old_page))
return PTR_ERR(old_page);
- ret = verify(old_page, vaddr, opcode);
+ ret = verify(old_page, vaddr, insn, nbytes);
if (ret <= 0)
goto put_old;
@@ -526,7 +526,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
__SetPageUptodate(new_page);
copy_highpage(new_page, old_page);
- uprobe_copy_to_page(new_page, vaddr, opcode, UPROBE_SWBP_INSN_SIZE);
+ uprobe_copy_to_page(new_page, vaddr, insn, nbytes);
if (!is_register) {
struct page *orig_page;
--
2.49.0
* [PATCH RFCv3 06/23] uprobes: Add orig argument to uprobe_write and uprobe_write_opcode
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
uprobe_write has a special path to restore the original page when
we write the original instruction back.
This happens when uprobe_write detects that we want to write anything
other than the breakpoint instruction.
In the following changes we want to use uprobe_write for multiple
updates, so add a new function argument to denote that this is an
original instruction update. This way uprobe_write can make the
appropriate checks and restore the original page when possible.
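Both call sites (in the patch below) pin down the flag's meaning:

	set_swbp():      uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN, false);
	set_orig_insn(): uprobe_write_opcode(auprobe, mm, vaddr,
					     *(uprobe_opcode_t *)&auprobe->insn, true);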
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/arm/probes/uprobes/core.c | 2 +-
include/linux/uprobes.h | 5 +++--
kernel/events/uprobes.c | 22 ++++++++++------------
3 files changed, 14 insertions(+), 15 deletions(-)
diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
index f5f790c6e5f8..54a90b565285 100644
--- a/arch/arm/probes/uprobes/core.c
+++ b/arch/arm/probes/uprobes/core.c
@@ -30,7 +30,7 @@ int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr)
{
return uprobe_write_opcode(auprobe, mm, vaddr,
- __opcode_to_mem_arm(auprobe->bpinsn));
+ __opcode_to_mem_arm(auprobe->bpinsn), false);
}
bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index c69a05775394..1b6a4e2b5464 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -196,9 +196,10 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
extern bool is_trap_insn(uprobe_opcode_t *insn);
extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
-extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
+extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
+ uprobe_opcode_t, bool);
extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
- uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
+ uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool orig);
extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 7ff1f07c8f79..92fed5e50ec1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -471,25 +471,23 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
* Return 0 (success) or a negative errno.
*/
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
- unsigned long vaddr, uprobe_opcode_t opcode)
+ unsigned long vaddr, uprobe_opcode_t opcode, bool orig)
{
- return uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode);
+ return uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode, orig);
}
int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t *insn,
- int nbytes, uprobe_write_verify_t verify)
+ int nbytes, uprobe_write_verify_t verify, bool orig)
{
struct page *old_page, *new_page;
struct vm_area_struct *vma;
- int ret, is_register;
+ int ret;
bool orig_page_huge = false;
unsigned int gup_flags = FOLL_FORCE;
- is_register = is_swbp_insn(insn);
-
retry:
- if (is_register)
+ if (!orig)
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
@@ -505,14 +503,14 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
goto put_old;
}
- if (WARN(!is_register && PageCompound(old_page),
+ if (WARN(orig && PageCompound(old_page),
"uprobe unregister should never work on compound page\n")) {
ret = -EINVAL;
goto put_old;
}
ret = 0;
- if (!is_register && !PageAnon(old_page))
+ if (orig && !PageAnon(old_page))
goto put_old;
ret = anon_vma_prepare(vma);
@@ -528,7 +526,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
copy_highpage(new_page, old_page);
uprobe_copy_to_page(new_page, vaddr, insn, nbytes);
- if (!is_register) {
+ if (orig) {
struct page *orig_page;
pgoff_t index;
@@ -579,7 +577,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
*/
int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
{
- return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN);
+ return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN, false);
}
static int set_swbp_refctr(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
@@ -614,7 +612,7 @@ int __weak
set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
{
return uprobe_write_opcode(auprobe, mm, vaddr,
- *(uprobe_opcode_t *)&auprobe->insn);
+ *(uprobe_opcode_t *)&auprobe->insn, true);
}
static int set_orig_refctr(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vaddr)
--
2.49.0
* Re: [PATCH RFCv3 06/23] uprobes: Add orig argument to uprobe_write and uprobe_write_opcode
From: Andrii Nakryiko @ 2025-04-04 20:33 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> The uprobe_write has special path to restore the original page when
> we write original instruction back.
>
> This happens when uprobe_write detects that we want to write anything
> else but breakpoint instruction.
>
> In following changes we want to use uprobe_write function for multiple
> updates, so adding new function argument to denote that this is the
> original instruction update. This way uprobe_write can make appropriate
> checks and restore the original page when possible.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> arch/arm/probes/uprobes/core.c | 2 +-
> include/linux/uprobes.h | 5 +++--
> kernel/events/uprobes.c | 22 ++++++++++------------
> 3 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
> index f5f790c6e5f8..54a90b565285 100644
> --- a/arch/arm/probes/uprobes/core.c
> +++ b/arch/arm/probes/uprobes/core.c
> @@ -30,7 +30,7 @@ int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
> unsigned long vaddr)
> {
> return uprobe_write_opcode(auprobe, mm, vaddr,
> - __opcode_to_mem_arm(auprobe->bpinsn));
> + __opcode_to_mem_arm(auprobe->bpinsn), false);
> }
>
> bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index c69a05775394..1b6a4e2b5464 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -196,9 +196,10 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
> extern bool is_trap_insn(uprobe_opcode_t *insn);
> extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
> extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
> -extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
> +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
> + uprobe_opcode_t, bool);
add arg names for humans?..
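something like (taking the names from the uprobes.c definition):

	extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
				       unsigned long vaddr, uprobe_opcode_t opcode, bool orig);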
> extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
> - uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
> + uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool orig);
> extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
> extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
> extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
[...]
* Re: [PATCH RFCv3 06/23] uprobes: Add orig argument to uprobe_write and uprobe_write_opcode
From: Jiri Olsa @ 2025-04-07 11:13 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Fri, Apr 04, 2025 at 01:33:02PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > The uprobe_write has special path to restore the original page when
> > we write original instruction back.
> >
> > This happens when uprobe_write detects that we want to write anything
> > else but breakpoint instruction.
> >
> > In following changes we want to use uprobe_write function for multiple
> > updates, so adding new function argument to denote that this is the
> > original instruction update. This way uprobe_write can make appropriate
> > checks and restore the original page when possible.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > arch/arm/probes/uprobes/core.c | 2 +-
> > include/linux/uprobes.h | 5 +++--
> > kernel/events/uprobes.c | 22 ++++++++++------------
> > 3 files changed, 14 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/arm/probes/uprobes/core.c b/arch/arm/probes/uprobes/core.c
> > index f5f790c6e5f8..54a90b565285 100644
> > --- a/arch/arm/probes/uprobes/core.c
> > +++ b/arch/arm/probes/uprobes/core.c
> > @@ -30,7 +30,7 @@ int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > unsigned long vaddr)
> > {
> > return uprobe_write_opcode(auprobe, mm, vaddr,
> > - __opcode_to_mem_arm(auprobe->bpinsn));
> > + __opcode_to_mem_arm(auprobe->bpinsn), false);
> > }
> >
> > bool arch_uprobe_ignore(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index c69a05775394..1b6a4e2b5464 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -196,9 +196,10 @@ extern bool is_swbp_insn(uprobe_opcode_t *insn);
> > extern bool is_trap_insn(uprobe_opcode_t *insn);
> > extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
> > extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
> > -extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
> > +extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
> > + uprobe_opcode_t, bool);
>
> add arg names for humans?..
yep, anything for humans.. ;-)
thanks,
jirka
>
> > extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
> > - uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify);
> > + uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool orig);
> > extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
> > extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
> > extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
>
> [...]
* [PATCH RFCv3 07/23] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Currently unapply_uprobe takes mmap_read_lock, but it might call
remove_breakpoint, which eventually changes user pages.
The current code writes either the breakpoint or the original
instruction, so it can probably get away with that, but the upcoming
changes use multiple instructions on the probed address, so we need
to ensure that any update to the mm's pages is exclusive.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 92fed5e50ec1..bd4bc62f80d7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1465,7 +1465,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
struct vm_area_struct *vma;
int err = 0;
- mmap_read_lock(mm);
+ mmap_write_lock(mm);
for_each_vma(vmi, vma) {
unsigned long vaddr;
loff_t offset;
@@ -1482,7 +1482,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
vaddr = offset_to_vaddr(vma, uprobe->offset);
err |= remove_breakpoint(uprobe, mm, vaddr);
}
- mmap_read_unlock(mm);
+ mmap_write_unlock(mm);
return err;
}
--
2.49.0
* [PATCH RFCv3 08/23] uprobes/x86: Add uprobe syscall to speed up uprobe
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Add a new uprobe syscall that calls the uprobe handlers for a given
'breakpoint' address.
The idea is that the 'breakpoint' address calls the user space
trampoline, which executes the uprobe syscall.
The syscall handler reads the return address of the initial call
to retrieve the original 'breakpoint' address. With this address
we find the related uprobe object and call its consumers.
The trampoline mapping (tramp_mapping below) is backed by one global
page initialized at __init time and shared by all the mapping
instances.
We do not allow the uprobe syscall to be executed if the caller is
not from the uprobe trampoline mapping.
The uprobe syscall ensures the consumer (bpf program) sees the
register values in the state before the trampoline was called.
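For reference, the user stack seen at syscall entry and the address
recovery (a sketch matching the trampoline asm and sys_uprobe below;
5 is the length of the call instruction that reached the trampoline):

	regs->sp + 0:  saved rax	(pushed last)
	regs->sp + 8:  saved r11
	regs->sp + 16: saved rcx	(pushed first)
	regs->sp + 24: return address of the original call

	/* probed address = return address - sizeof(call rel32) */
	regs->ip = ax_r11_cx_ip[3] - 5;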
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/kernel/uprobes.c | 134 +++++++++++++++++++++++++
include/linux/syscalls.h | 2 +
include/linux/uprobes.h | 1 +
kernel/events/uprobes.c | 22 ++++
kernel/sys_ni.c | 1 +
6 files changed, 161 insertions(+)
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5eb708bff1c7..88e388c7675b 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -345,6 +345,7 @@
333 common io_pgetevents sys_io_pgetevents
334 common rseq sys_rseq
335 common uretprobe sys_uretprobe
+336 common uprobe sys_uprobe
# don't use numbers 387 through 423, add new calls after the last
# 'common' entry
424 common pidfd_send_signal sys_pidfd_send_signal
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 39521f1c4185..b11dcd47edaa 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -429,6 +429,140 @@ SYSCALL_DEFINE0(uretprobe)
return -1;
}
+static int tramp_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma)
+{
+ return -EPERM;
+}
+
+static struct page *tramp_mapping_pages[2] __ro_after_init;
+
+static struct vm_special_mapping tramp_mapping = {
+ .name = "[uprobes-trampoline]",
+ .mremap = tramp_mremap,
+ .pages = tramp_mapping_pages,
+};
+
+static bool __in_uprobe_trampoline(unsigned long ip)
+{
+ struct vm_area_struct *vma = vma_lookup(current->mm, ip);
+
+ return vma && vma_is_special_mapping(vma, &tramp_mapping);
+}
+
+static bool in_uprobe_trampoline(unsigned long ip)
+{
+ struct mm_struct *mm = current->mm;
+ bool found, retry = true;
+ unsigned int seq;
+
+ rcu_read_lock();
+ if (mmap_lock_speculate_try_begin(mm, &seq)) {
+ found = __in_uprobe_trampoline(ip);
+ retry = mmap_lock_speculate_retry(mm, seq);
+ }
+ rcu_read_unlock();
+
+ if (retry) {
+ mmap_read_lock(mm);
+ found = __in_uprobe_trampoline(ip);
+ mmap_read_unlock(mm);
+ }
+ return found;
+}
+
+SYSCALL_DEFINE0(uprobe)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+ unsigned long ip, sp, ax_r11_cx_ip[4];
+ int err;
+
+ /* Allow execution only from uprobe trampolines. */
+ if (!in_uprobe_trampoline(regs->ip))
+ goto sigill;
+
+ err = copy_from_user(ax_r11_cx_ip, (void __user *)regs->sp, sizeof(ax_r11_cx_ip));
+ if (err)
+ goto sigill;
+
+ ip = regs->ip;
+
+ /*
+ * expose the "right" values of ax/r11/cx/ip/sp to uprobe_consumer/s, plus:
+ * - adjust ip to the probe address, call saved next instruction address
+ * - adjust sp to the probe's stack frame (check trampoline code)
+ */
+ regs->ax = ax_r11_cx_ip[0];
+ regs->r11 = ax_r11_cx_ip[1];
+ regs->cx = ax_r11_cx_ip[2];
+ regs->ip = ax_r11_cx_ip[3] - 5;
+ regs->sp += sizeof(ax_r11_cx_ip);
+ regs->orig_ax = -1;
+
+ sp = regs->sp;
+
+ handle_syscall_uprobe(regs, regs->ip);
+
+ /*
+ * Some of the uprobe consumers has changed sp, we can do nothing,
+ * just return via iret.
+ */
+ if (regs->sp != sp)
+ return regs->ax;
+
+ regs->sp -= sizeof(ax_r11_cx_ip);
+
+ /* for the case uprobe_consumer has changed ax/r11/cx */
+ ax_r11_cx_ip[0] = regs->ax;
+ ax_r11_cx_ip[1] = regs->r11;
+ ax_r11_cx_ip[2] = regs->cx;
+
+ /* keep return address unless we are instructed otherwise */
+ if (ax_r11_cx_ip[3] - 5 != regs->ip)
+ ax_r11_cx_ip[3] = regs->ip;
+
+ regs->ip = ip;
+
+ err = copy_to_user((void __user *)regs->sp, ax_r11_cx_ip, sizeof(ax_r11_cx_ip));
+ if (err)
+ goto sigill;
+
+ /* ensure sysret, see do_syscall_64() */
+ regs->r11 = regs->flags;
+ regs->cx = regs->ip;
+ return 0;
+
+sigill:
+ force_sig(SIGILL);
+ return -1;
+}
+
+asm (
+ ".pushsection .rodata\n"
+ ".balign " __stringify(PAGE_SIZE) "\n"
+ "uprobe_trampoline_entry:\n"
+ "push %rcx\n"
+ "push %r11\n"
+ "push %rax\n"
+ "movq $" __stringify(__NR_uprobe) ", %rax\n"
+ "syscall\n"
+ "pop %rax\n"
+ "pop %r11\n"
+ "pop %rcx\n"
+ "ret\n"
+ ".balign " __stringify(PAGE_SIZE) "\n"
+ ".popsection\n"
+);
+
+extern u8 uprobe_trampoline_entry[];
+
+static int __init arch_uprobes_init(void)
+{
+ tramp_mapping_pages[0] = virt_to_page(uprobe_trampoline_entry);
+ return 0;
+}
+
+late_initcall(arch_uprobes_init);
+
/*
* If arch_uprobe->insn doesn't use rip-relative addressing, return
* immediately. Otherwise, rewrite the instruction so that it accesses
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index c6333204d451..002f4e1debe5 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -994,6 +994,8 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int on);
asmlinkage long sys_uretprobe(void);
+asmlinkage long sys_uprobe(void);
+
/* pciconfig: alpha, arm, arm64, ia64, sparc */
asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
unsigned long off, unsigned long len,
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 1b6a4e2b5464..4a2b950beefd 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -232,6 +232,7 @@ extern void uprobe_handle_trampoline(struct pt_regs *regs);
extern void *arch_uretprobe_trampoline(unsigned long *psize);
extern unsigned long uprobe_get_trampoline_vaddr(void);
extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
+extern void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bd4bc62f80d7..4de04075576c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2742,6 +2742,28 @@ static void handle_swbp(struct pt_regs *regs)
rcu_read_unlock_trace();
}
+void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr)
+{
+ struct uprobe *uprobe;
+ int is_swbp;
+
+ rcu_read_lock_trace();
+ uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
+ if (!uprobe)
+ goto unlock;
+
+ if (!get_utask())
+ goto unlock;
+
+ if (arch_uprobe_ignore(&uprobe->arch, regs))
+ goto unlock;
+
+ handler_chain(uprobe, regs);
+
+ unlock:
+ rcu_read_unlock_trace();
+}
+
/*
* Perform required fix-ups and disable singlestep.
* Allow pending signals to take effect.
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index c00a86931f8c..bf5d05c635ff 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -392,3 +392,4 @@ COND_SYSCALL(setuid16);
COND_SYSCALL(rseq);
COND_SYSCALL(uretprobe);
+COND_SYSCALL(uprobe);
--
2.49.0
* Re: [PATCH RFCv3 08/23] uprobes/x86: Add uprobe syscall to speed up uprobe
From: Andrii Nakryiko @ 2025-04-04 20:33 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding new uprobe syscall that calls uprobe handlers for given
> 'breakpoint' address.
>
> The idea is that the 'breakpoint' address calls the user space
> trampoline which executes the uprobe syscall.
>
> The syscall handler reads the return address of the initial call
> to retrieve the original 'breakpoint' address. With this address
> we find the related uprobe object and call its consumers.
>
> Adding the arch_uprobe_trampoline_mapping function that provides
> uprobe trampoline mapping. This mapping is backed with one global
> page initialized at __init time and shared by the all the mapping
> instances.
>
> We do not allow to execute uprobe syscall if the caller is not
> from uprobe trampoline mapping.
>
> The uprobe syscall ensures the consumer (bpf program) sees registers
> values in the state before the trampoline was called.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/x86/kernel/uprobes.c | 134 +++++++++++++++++++++++++
> include/linux/syscalls.h | 2 +
> include/linux/uprobes.h | 1 +
> kernel/events/uprobes.c | 22 ++++
> kernel/sys_ni.c | 1 +
> 6 files changed, 161 insertions(+)
>
[...]
> +void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr)
> +{
> + struct uprobe *uprobe;
> + int is_swbp;
> +
> + rcu_read_lock_trace();
> + uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
> + if (!uprobe)
> + goto unlock;
> +
> + if (!get_utask())
> + goto unlock;
> +
> + if (arch_uprobe_ignore(&uprobe->arch, regs))
> + goto unlock;
> +
> + handler_chain(uprobe, regs);
> +
> + unlock:
> + rcu_read_unlock_trace();
we now have `guard(rcu_tasks_trace)();`, let's use that in this
function, seems like a good fit?
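i.e. something like (a sketch):

void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr)
{
        struct uprobe *uprobe;
        int is_swbp;

        guard(rcu_tasks_trace)();

        uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
        if (!uprobe)
                return;
        if (!get_utask())
                return;
        if (arch_uprobe_ignore(&uprobe->arch, regs))
                return;

        handler_chain(uprobe, regs);
}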
> +}
> +
> /*
> * Perform required fix-ups and disable singlestep.
> * Allow pending signals to take effect.
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index c00a86931f8c..bf5d05c635ff 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -392,3 +392,4 @@ COND_SYSCALL(setuid16);
> COND_SYSCALL(rseq);
>
> COND_SYSCALL(uretprobe);
> +COND_SYSCALL(uprobe);
> --
> 2.49.0
>
* Re: [PATCH RFCv3 08/23] uprobes/x86: Add uprobe syscall to speed up uprobe
From: Jiri Olsa @ 2025-04-07 10:58 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Fri, Apr 04, 2025 at 01:33:07PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding new uprobe syscall that calls uprobe handlers for given
> > 'breakpoint' address.
> >
> > The idea is that the 'breakpoint' address calls the user space
> > trampoline which executes the uprobe syscall.
> >
> > The syscall handler reads the return address of the initial call
> > to retrieve the original 'breakpoint' address. With this address
> > we find the related uprobe object and call its consumers.
> >
> > Adding the arch_uprobe_trampoline_mapping function that provides
> > uprobe trampoline mapping. This mapping is backed with one global
> > page initialized at __init time and shared by the all the mapping
> > instances.
> >
> > We do not allow to execute uprobe syscall if the caller is not
> > from uprobe trampoline mapping.
> >
> > The uprobe syscall ensures the consumer (bpf program) sees registers
> > values in the state before the trampoline was called.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> > arch/x86/kernel/uprobes.c | 134 +++++++++++++++++++++++++
> > include/linux/syscalls.h | 2 +
> > include/linux/uprobes.h | 1 +
> > kernel/events/uprobes.c | 22 ++++
> > kernel/sys_ni.c | 1 +
> > 6 files changed, 161 insertions(+)
> >
>
> [...]
>
> > +void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr)
> > +{
> > + struct uprobe *uprobe;
> > + int is_swbp;
> > +
> > + rcu_read_lock_trace();
> > + uprobe = find_active_uprobe_rcu(bp_vaddr, &is_swbp);
> > + if (!uprobe)
> > + goto unlock;
> > +
> > + if (!get_utask())
> > + goto unlock;
> > +
> > + if (arch_uprobe_ignore(&uprobe->arch, regs))
> > + goto unlock;
> > +
> > + handler_chain(uprobe, regs);
> > +
> > + unlock:
> > + rcu_read_unlock_trace();
>
> we now have `guard(rcu_tasks_trace)();`, let's use that in this
> function, seems like a good fit?
yes, will use it
thanks,
jirka
>
>
> > +}
> > +
> > /*
> > * Perform required fix-ups and disable singlestep.
> > * Allow pending signals to take effect.
> > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> > index c00a86931f8c..bf5d05c635ff 100644
> > --- a/kernel/sys_ni.c
> > +++ b/kernel/sys_ni.c
> > @@ -392,3 +392,4 @@ COND_SYSCALL(setuid16);
> > COND_SYSCALL(rseq);
> >
> > COND_SYSCALL(uretprobe);
> > +COND_SYSCALL(uprobe);
> > --
> > 2.49.0
> >
* [PATCH RFCv3 09/23] uprobes/x86: Add mapping for optimized uprobe trampolines
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Add support to install a special mapping for the user space trampoline
with the following functions:
uprobe_trampoline_get - find or add related uprobe_trampoline
uprobe_trampoline_put - remove ref or destroy uprobe_trampoline
The user space trampoline is exported as an architecture specific user
space special mapping (the tramp_mapping added earlier in the series).
The uprobe trampoline needs to be callable/reachable from the probe
address, so while searching for an available address we use the
is_reachable_by_call function to decide whether the uprobe trampoline
is callable from the probe address.
All uprobe_trampoline objects are stored in the uprobes_state object
and are cleaned up when the process mm_struct goes down. Add new arch
hooks for that, because this change is x86_64 specific.
Locking is provided by callers in the following changes.
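The reachability test is the rel32 range check of a call instruction:
the trampoline page must sit within +/-2G of the instruction following
the probed address, i.e. delta = vaddr + 5 - vtramp must fit in s32
(see is_reachable_by_call() below). Expected usage of the pair, roughly
(the actual callers arrive in later patches):

	tramp = uprobe_trampoline_get(vaddr);	/* find a reachable one or create it */
	if (!tramp)
		return -ENOMEM;
	/* ... emit a call to tramp->vaddr at the probe site (later patches) ... */
	uprobe_trampoline_put(tramp);		/* drop the ref, destroy on last put */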
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/kernel/uprobes.c | 123 ++++++++++++++++++++++++++++++++++++++
include/linux/uprobes.h | 6 ++
kernel/events/uprobes.c | 10 ++++
kernel/fork.c | 1 +
4 files changed, 140 insertions(+)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index b11dcd47edaa..5ee2cce4c63e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -742,6 +742,129 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
*sr = utask->autask.saved_scratch_register;
}
}
+
+struct uprobe_trampoline {
+ struct hlist_node node;
+ unsigned long vaddr;
+ atomic64_t ref;
+};
+
+static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
+{
+ long delta = (long)(vaddr + 5 - vtramp);
+
+ return delta >= INT_MIN && delta <= INT_MAX;
+}
+
+static unsigned long find_nearest_page(unsigned long vaddr)
+{
+ struct vm_area_struct *vma, *prev = NULL;
+ unsigned long prev_vm_end = PAGE_SIZE;
+ VMA_ITERATOR(vmi, current->mm, 0);
+
+ vma = vma_next(&vmi);
+ while (vma) {
+ if (prev)
+ prev_vm_end = prev->vm_end;
+ if (vma->vm_start - prev_vm_end >= PAGE_SIZE) {
+ if (is_reachable_by_call(prev_vm_end, vaddr))
+ return prev_vm_end;
+ if (is_reachable_by_call(vma->vm_start - PAGE_SIZE, vaddr))
+ return vma->vm_start - PAGE_SIZE;
+ }
+ prev = vma;
+ vma = vma_next(&vmi);
+ }
+
+ return 0;
+}
+
+static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
+{
+ struct pt_regs *regs = task_pt_regs(current);
+ const struct vm_special_mapping *mapping;
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ struct uprobe_trampoline *tramp;
+
+ mapping = user_64bit_mode(regs) ? &tramp_mapping : NULL;
+ if (!mapping)
+ return NULL;
+
+ vaddr = find_nearest_page(vaddr);
+ if (!vaddr)
+ return NULL;
+
+ tramp = kzalloc(sizeof(*tramp), GFP_KERNEL);
+ if (unlikely(!tramp))
+ return NULL;
+
+ atomic64_set(&tramp->ref, 1);
+ tramp->vaddr = vaddr;
+
+ vma = _install_special_mapping(mm, tramp->vaddr, PAGE_SIZE,
+ VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
+ mapping);
+ if (IS_ERR(vma))
+ goto free_area;
+ return tramp;
+
+ free_area:
+ kfree(tramp);
+ return NULL;
+}
+
+__maybe_unused
+static struct uprobe_trampoline *uprobe_trampoline_get(unsigned long vaddr)
+{
+ struct uprobes_state *state = &current->mm->uprobes_state;
+ struct uprobe_trampoline *tramp = NULL;
+
+ hlist_for_each_entry(tramp, &state->head_tramps, node) {
+ if (is_reachable_by_call(tramp->vaddr, vaddr)) {
+ atomic64_inc(&tramp->ref);
+ return tramp;
+ }
+ }
+
+ tramp = create_uprobe_trampoline(vaddr);
+ if (!tramp)
+ return NULL;
+
+ hlist_add_head(&tramp->node, &state->head_tramps);
+ return tramp;
+}
+
+static void destroy_uprobe_trampoline(struct uprobe_trampoline *tramp)
+{
+ hlist_del(&tramp->node);
+ kfree(tramp);
+}
+
+__maybe_unused
+static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
+{
+ if (tramp == NULL)
+ return;
+
+ if (atomic64_dec_and_test(&tramp->ref))
+ destroy_uprobe_trampoline(tramp);
+}
+
+void arch_uprobe_init_state(struct mm_struct *mm)
+{
+ INIT_HLIST_HEAD(&mm->uprobes_state.head_tramps);
+}
+
+void arch_uprobe_clear_state(struct mm_struct *mm)
+{
+ struct uprobes_state *state = &mm->uprobes_state;
+ struct uprobe_trampoline *tramp;
+ struct hlist_node *n;
+
+ hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
+ destroy_uprobe_trampoline(tramp);
+}
#else /* 32-bit: */
/*
* No RIP-relative addressing on 32-bit
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 4a2b950beefd..7bde68871150 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -17,6 +17,7 @@
#include <linux/wait.h>
#include <linux/timer.h>
#include <linux/seqlock.h>
+#include <linux/mutex.h>
struct uprobe;
struct vm_area_struct;
@@ -185,6 +186,9 @@ struct xol_area;
struct uprobes_state {
struct xol_area *xol_area;
+#ifdef CONFIG_X86_64
+ struct hlist_head head_tramps;
+#endif
};
typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode, int nbytes);
@@ -233,6 +237,8 @@ extern void *arch_uretprobe_trampoline(unsigned long *psize);
extern unsigned long uprobe_get_trampoline_vaddr(void);
extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len);
extern void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr);
+extern void arch_uprobe_clear_state(struct mm_struct *mm);
+extern void arch_uprobe_init_state(struct mm_struct *mm);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4de04075576c..9370df47ec71 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1793,6 +1793,14 @@ static struct xol_area *get_xol_area(void)
return area;
}
+void __weak arch_uprobe_clear_state(struct mm_struct *mm)
+{
+}
+
+void __weak arch_uprobe_init_state(struct mm_struct *mm)
+{
+}
+
/*
* uprobe_clear_state - Free the area allocated for slots.
*/
@@ -1804,6 +1812,8 @@ void uprobe_clear_state(struct mm_struct *mm)
delayed_uprobe_remove(NULL, mm);
mutex_unlock(&delayed_uprobe_lock);
+ arch_uprobe_clear_state(mm);
+
if (!area)
return;
diff --git a/kernel/fork.c b/kernel/fork.c
index e27fe5d5a15c..36a0f073b913 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1260,6 +1260,7 @@ static void mm_init_uprobes_state(struct mm_struct *mm)
{
#ifdef CONFIG_UPROBES
mm->uprobes_state.xol_area = NULL;
+ arch_uprobe_init_state(mm);
#endif
}
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
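As an aside, the get/put pattern in the patch above can be illustrated with a
rough user-space analogue. This is a hedged sketch only: the types and names
are made up, a plain singly linked list stands in for the hlist, there is no
locking (the kernel side is serialized by mmap_write_lock(mm)), and the
trampoline is just a recorded address instead of a page installed by
_install_special_mapping():

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

struct tramp {
	struct tramp *next;
	unsigned long vaddr;
	long ref;
};

static struct tramp *head;

/* a call rel32 reaches +/- 2GB, like is_reachable_by_call() */
static int reachable_by_call(unsigned long tramp, unsigned long vaddr)
{
	long delta = (long)(tramp - (vaddr + 5));

	return delta >= INT_MIN && delta <= INT_MAX;
}

static struct tramp *tramp_get(unsigned long vaddr)
{
	struct tramp *t;

	/* reuse any trampoline the call instruction can reach */
	for (t = head; t; t = t->next) {
		if (reachable_by_call(t->vaddr, vaddr)) {
			t->ref++;
			return t;
		}
	}
	t = calloc(1, sizeof(*t));
	if (!t)
		return NULL;
	t->ref = 1;
	t->vaddr = vaddr;	/* stand-in for find_nearest_page() */
	t->next = head;
	head = t;
	return t;
}

static void tramp_put(struct tramp *t)
{
	struct tramp **pp;

	if (!t || --t->ref)
		return;
	for (pp = &head; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			break;
		}
	}
	free(t);
}

int main(void)
{
	struct tramp *a = tramp_get(0x401000);
	struct tramp *b = tramp_get(0x402000);	/* nearby, reuses a */

	printf("shared: %d ref: %ld\n", a == b, a ? a->ref : 0);
	tramp_put(b);
	tramp_put(a);
	return 0;
}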
* [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (8 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 09/23] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-04-04 20:33 ` Andrii Nakryiko
2025-03-20 11:41 ` [PATCH RFCv3 11/23] uprobes/x86: Add support to optimize uprobes Jiri Olsa
` (14 subsequent siblings)
24 siblings, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding support to emulate nop5 as the original uprobe instruction.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 5ee2cce4c63e..1661e0ab2a3d 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -308,6 +308,11 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
return -ENOTSUPP;
}
+static int is_nop5_insn(uprobe_opcode_t *insn)
+{
+ return !memcmp(insn, x86_nops[5], 5);
+}
+
#ifdef CONFIG_X86_64
asm (
@@ -865,6 +870,11 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
destroy_uprobe_trampoline(tramp);
}
+
+static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
+{
+ return is_nop5_insn((uprobe_opcode_t *) &auprobe->insn);
+}
#else /* 32-bit: */
/*
* No RIP-relative addressing on 32-bit
@@ -878,6 +888,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
}
+static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
+{
+ return false;
+}
#endif /* CONFIG_X86_64 */
struct uprobe_xol_ops {
@@ -1109,6 +1123,8 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
break;
case 0x0f:
+ if (emulate_nop5_insn(auprobe))
+ goto setup;
if (insn->opcode.nbytes != 2)
return -ENOSYS;
/*
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-03-20 11:41 ` [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction Jiri Olsa
@ 2025-04-04 20:33 ` Andrii Nakryiko
2025-04-07 11:07 ` Jiri Olsa
0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2025-04-04 20:33 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding support to emulate nop5 as the original uprobe instruction.
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
This optimization is independent from the sys_uprobe, right? Maybe
send it as a stand-alone patch and let's land it sooner?
Also, how hard would it be to do the same for other nopX instructions?
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 5ee2cce4c63e..1661e0ab2a3d 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -308,6 +308,11 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
> return -ENOTSUPP;
> }
>
> +static int is_nop5_insn(uprobe_opcode_t *insn)
> +{
> + return !memcmp(insn, x86_nops[5], 5);
> +}
> +
> #ifdef CONFIG_X86_64
>
> asm (
> @@ -865,6 +870,11 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
> hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
> destroy_uprobe_trampoline(tramp);
> }
> +
> +static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
> +{
> + return is_nop5_insn((uprobe_opcode_t *) &auprobe->insn);
> +}
> #else /* 32-bit: */
> /*
> * No RIP-relative addressing on 32-bit
> @@ -878,6 +888,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> }
> +static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
> +{
> + return false;
> +}
> #endif /* CONFIG_X86_64 */
>
> struct uprobe_xol_ops {
> @@ -1109,6 +1123,8 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> break;
>
> case 0x0f:
> + if (emulate_nop5_insn(auprobe))
> + goto setup;
> if (insn->opcode.nbytes != 2)
> return -ENOSYS;
> /*
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-04-04 20:33 ` Andrii Nakryiko
@ 2025-04-07 11:07 ` Jiri Olsa
2025-04-08 20:21 ` Jiri Olsa
0 siblings, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2025-04-07 11:07 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Fri, Apr 04, 2025 at 01:33:11PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding support to emulate nop5 as the original uprobe instruction.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
> > 1 file changed, 16 insertions(+)
> >
>
> This optimization is independent from the sys_uprobe, right? Maybe
> send it as a stand-alone patch and let's land it sooner?
ok, will send it separately
> Also, how hard would it be to do the same for other nopX instructions?
will check, might be easy
thanks,
jirka
>
>
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 5ee2cce4c63e..1661e0ab2a3d 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -308,6 +308,11 @@ static int uprobe_init_insn(struct arch_uprobe *auprobe, struct insn *insn, bool
> > return -ENOTSUPP;
> > }
> >
> > +static int is_nop5_insn(uprobe_opcode_t *insn)
> > +{
> > + return !memcmp(insn, x86_nops[5], 5);
> > +}
> > +
> > #ifdef CONFIG_X86_64
> >
> > asm (
> > @@ -865,6 +870,11 @@ void arch_uprobe_clear_state(struct mm_struct *mm)
> > hlist_for_each_entry_safe(tramp, n, &state->head_tramps, node)
> > destroy_uprobe_trampoline(tramp);
> > }
> > +
> > +static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
> > +{
> > + return is_nop5_insn((uprobe_opcode_t *) &auprobe->insn);
> > +}
> > #else /* 32-bit: */
> > /*
> > * No RIP-relative addressing on 32-bit
> > @@ -878,6 +888,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > {
> > }
> > +static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
> > +{
> > + return false;
> > +}
> > #endif /* CONFIG_X86_64 */
> >
> > struct uprobe_xol_ops {
> > @@ -1109,6 +1123,8 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > break;
> >
> > case 0x0f:
> > + if (emulate_nop5_insn(auprobe))
> > + goto setup;
> > if (insn->opcode.nbytes != 2)
> > return -ENOSYS;
> > /*
> > --
> > 2.49.0
> >
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-04-07 11:07 ` Jiri Olsa
@ 2025-04-08 20:21 ` Jiri Olsa
2025-04-09 18:19 ` Andrii Nakryiko
0 siblings, 1 reply; 37+ messages in thread
From: Jiri Olsa @ 2025-04-08 20:21 UTC (permalink / raw)
To: Jiri Olsa
Cc: Andrii Nakryiko, Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko,
bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
On Mon, Apr 07, 2025 at 01:07:26PM +0200, Jiri Olsa wrote:
> On Fri, Apr 04, 2025 at 01:33:11PM -0700, Andrii Nakryiko wrote:
> > On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > >
> > > Adding support to emulate nop5 as the original uprobe instruction.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > > arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
> > > 1 file changed, 16 insertions(+)
> > >
> >
> > This optimization is independent from the sys_uprobe, right? Maybe
> > send it as a stand-alone patch and let's land it sooner?
>
> ok, will send it separately
>
> > Also, how hard would it be to do the same for other nopX instructions?
>
> will check, might be easy
we can't do all of them at the moment, nop1-nop8 are fine, but uprobe won't
attach on nop9/10/11 due to an unsupported prefix.. I guess the insn decoder
would need to be updated first
I'll send the nop5 emulation change, because of the above and also I don't
see a practical justification to emulate the other nops
jirka
---
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9194695662b2..6616cc9866cc 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -608,6 +608,21 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
*sr = utask->autask.saved_scratch_register;
}
}
+
+static bool emulate_nop_insn(struct arch_uprobe *auprobe)
+{
+ unsigned int i;
+
+ /*
+ * Uprobe is only allowed to be attached on nop1 through nop8. Further nop
+ * instructions have unsupported prefix and uprobe fails to attach on them.
+ */
+ for (i = 1; i < 9; i++) {
+ if (!memcmp(&auprobe->insn, x86_nops[i], i))
+ return true;
+ }
+ return false;
+}
#else /* 32-bit: */
/*
* No RIP-relative addressing on 32-bit
@@ -621,6 +636,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
}
+static bool emulate_nop_insn(struct arch_uprobe *auprobe)
+{
+ return false;
+}
#endif /* CONFIG_X86_64 */
struct uprobe_xol_ops {
@@ -840,6 +859,9 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
insn_byte_t p;
int i;
+ if (emulate_nop_insn(auprobe))
+ goto setup;
+
switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
^ permalink raw reply related [flat|nested] 37+ messages in thread
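As a reference for the loop above, the check can be sketched in user space
with the byte patterns written out. The table below is an illustrative copy
of the default 64-bit nop encodings that the kernel's x86_nops array provides
for lengths 1 through 8; treat the bytes as an assumption here, the
authoritative source is arch/x86/include/asm/nops.h:

#include <stdio.h>
#include <string.h>

/* illustrative copy of the x86_nops[1..8] byte patterns */
static const unsigned char nops[9][8] = {
	[1] = { 0x90 },
	[2] = { 0x66, 0x90 },
	[3] = { 0x0f, 0x1f, 0x00 },
	[4] = { 0x0f, 0x1f, 0x40, 0x00 },
	[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[6] = { 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	[7] = { 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	[8] = { 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* same shape as emulate_nop_insn() above */
static int is_nop_insn(const unsigned char *insn)
{
	unsigned int i;

	for (i = 1; i < 9; i++) {
		if (!memcmp(insn, nops[i], i))
			return 1;
	}
	return 0;
}

int main(void)
{
	unsigned char nop5[8] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

	printf("nop5 matches: %d\n", is_nop_insn(nop5));
	return 0;
}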
* Re: [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-04-08 20:21 ` Jiri Olsa
@ 2025-04-09 18:19 ` Andrii Nakryiko
2025-04-11 12:18 ` Jiri Olsa
0 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2025-04-09 18:19 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf, linux-kernel,
linux-trace-kernel, x86, Song Liu, Yonghong Song, John Fastabend,
Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
David Laight, Thomas Weißschuh
On Tue, Apr 8, 2025 at 1:22 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Apr 07, 2025 at 01:07:26PM +0200, Jiri Olsa wrote:
> > On Fri, Apr 04, 2025 at 01:33:11PM -0700, Andrii Nakryiko wrote:
> > > On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > >
> > > > Adding support to emulate nop5 as the original uprobe instruction.
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > > arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
> > > > 1 file changed, 16 insertions(+)
> > > >
> > >
> > > This optimization is independent from the sys_uprobe, right? Maybe
> > > send it as a stand-alone patch and let's land it sooner?
> >
> > ok, will send it separately
> >
> > > Also, how hard would it be to do the same for other nopX instructions?
> >
> > will check, might be easy
>
> we can't do all of them at the moment, nop1-nop8 are fine, but uprobe won't
> attach on nop9/10/11 due to an unsupported prefix.. I guess the insn decoder
> would need to be updated first
>
> I'll send the nop5 emulation change, because of the above and also I don't
> see a practical justification to emulate the other nops
>
Well, let me counter this approach: if we had nop5 emulation from
day one, then we could have just transparently switched USDT libraries
to use nop5 because they would work well both before and after your
sys_uprobe changes. But we cannot, and that WILL cause problems and
headaches to work around that limitation.
See where I'm going with this? I understand the general "don't build
feature unless you have a use case", but in this case it's just a
matter of generality and common sense: we emulate nop1 and nop5, what
reasons do we have to not emulate all the other nops? Within reason,
of course. If it's hard to do some nopX, then it would be hard to
justify without a specific use case. But it doesn't seem so, at least
for nop1-nop8, so why not?
tl;dr, let's add all the nops we can emulate now, in one go, instead
of spoon-feeding this support through the years (with lots of
unnecessary backwards compatibility headaches associated with that
approach).
> jirka
>
>
> ---
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 9194695662b2..6616cc9866cc 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -608,6 +608,21 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> *sr = utask->autask.saved_scratch_register;
> }
> }
> +
> +static bool emulate_nop_insn(struct arch_uprobe *auprobe)
> +{
> + unsigned int i;
> +
> + /*
> + * Uprobe is only allowed to be attached on nop1 through nop8. Further nop
> + * instructions have unsupported prefix and uprobe fails to attach on them.
> + */
> + for (i = 1; i < 9; i++) {
> + if (!memcmp(&auprobe->insn, x86_nops[i], i))
> + return true;
> + }
> + return false;
> +}
> #else /* 32-bit: */
> /*
> * No RIP-relative addressing on 32-bit
> @@ -621,6 +636,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> }
> +static bool emulate_nop_insn(struct arch_uprobe *auprobe)
> +{
> + return false;
> +}
> #endif /* CONFIG_X86_64 */
>
> struct uprobe_xol_ops {
> @@ -840,6 +859,9 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> insn_byte_t p;
> int i;
>
> + if (emulate_nop_insn(auprobe))
> + goto setup;
> +
> switch (opc1) {
> case 0xeb: /* jmp 8 */
> case 0xe9: /* jmp 32 */
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction
2025-04-09 18:19 ` Andrii Nakryiko
@ 2025-04-11 12:18 ` Jiri Olsa
0 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-04-11 12:18 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Jiri Olsa, Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire, David Laight, Thomas Weißschuh
On Wed, Apr 09, 2025 at 11:19:36AM -0700, Andrii Nakryiko wrote:
> On Tue, Apr 8, 2025 at 1:22 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Apr 07, 2025 at 01:07:26PM +0200, Jiri Olsa wrote:
> > > On Fri, Apr 04, 2025 at 01:33:11PM -0700, Andrii Nakryiko wrote:
> > > > On Thu, Mar 20, 2025 at 4:43 AM Jiri Olsa <jolsa@kernel.org> wrote:
> > > > >
> > > > > Adding support to emulate nop5 as the original uprobe instruction.
> > > > >
> > > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > > ---
> > > > > arch/x86/kernel/uprobes.c | 16 ++++++++++++++++
> > > > > 1 file changed, 16 insertions(+)
> > > > >
> > > >
> > > > This optimization is independent from the sys_uprobe, right? Maybe
> > > > send it as a stand-alone patch and let's land it sooner?
> > >
> > > ok, will send it separately
> > >
> > > > Also, how hard would it be to do the same for other nopX instructions?
> > >
> > > will check, might be easy
> >
> > we can't do all of them at the moment, nop1-nop8 are fine, but uprobe won't
> > attach on nop9/10/11 due to an unsupported prefix.. I guess the insn decoder
> > would need to be updated first
> >
> > I'll send the nop5 emulation change, because of the above and also I don't
> > see a practical justification to emulate the other nops
> >
>
> Well, let me counter this approach: if we had nop5 emulation from
> day one, then we could have just transparently switched USDT libraries
> to use nop5 because they would work well both before and after your
> sys_uprobe changes. But we cannot, and that WILL cause problems and
> headaches to work around that limitation.
>
> See where I'm going with this? I understand the general "don't build
> feature unless you have a use case", but in this case it's just a
> matter of generality and common sense: we emulate nop1 and nop5, what
> reasons do we have to not emulate all the other nops? Within reason,
> of course. If it's hard to do some nopX, then it would be hard to
> justify without a specific use case. But it doesn't seem so, at least
> for nop1-nop8, so why not?
>
> tl;dr, let's add all the nops we can emulate now, in one go, instead
> of spoon-feeding this support through the years (with lots of
> unnecessary backwards compatibility headaches associated with that
> approach).
ok, Oleg suggested a similar change, I sent v2 with that
thanks,
jirka
>
>
> > jirka
> >
> >
> > ---
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 9194695662b2..6616cc9866cc 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -608,6 +608,21 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > *sr = utask->autask.saved_scratch_register;
> > }
> > }
> > +
> > +static bool emulate_nop_insn(struct arch_uprobe *auprobe)
> > +{
> > + unsigned int i;
> > +
> > + /*
> > + * Uprobe is only allowed to be attached on nop1 through nop8. Further nop
> > + * instructions have unsupported prefix and uprobe fails to attach on them.
> > + */
> > + for (i = 1; i < 9; i++) {
> > + if (!memcmp(&auprobe->insn, x86_nops[i], i))
> > + return true;
> > + }
> > + return false;
> > +}
> > #else /* 32-bit: */
> > /*
> > * No RIP-relative addressing on 32-bit
> > @@ -621,6 +636,10 @@ static void riprel_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > {
> > }
> > +static bool emulate_nop_insn(struct arch_uprobe *auprobe)
> > +{
> > + return false;
> > +}
> > #endif /* CONFIG_X86_64 */
> >
> > struct uprobe_xol_ops {
> > @@ -840,6 +859,9 @@ static int branch_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > insn_byte_t p;
> > int i;
> >
> > + if (emulate_nop_insn(auprobe))
> > + goto setup;
> > +
> > switch (opc1) {
> > case 0xeb: /* jmp 8 */
> > case 0xe9: /* jmp 32 */
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH RFCv3 11/23] uprobes/x86: Add support to optimize uprobes
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (9 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 10/23] uprobes/x86: Add support to emulate nop5 instruction Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 12/23] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
` (13 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Putting together all the previously added pieces to support optimized
uprobes on top of the 5-byte nop instruction.
The current uprobe execution goes through the following steps:
- installs a breakpoint instruction over the original instruction
- the exception handler is hit and calls the related uprobe consumers
- and either simulates the original instruction or does out of line single step
execution of it
- returns to user space
The optimized uprobe path (see the encoding sketch below):
- checks that the original instruction is a 5-byte nop (plus other checks)
- adds (or reuses an existing) user space trampoline and overwrites the original
instruction (5-byte nop) with a call to the user space trampoline
- the user space trampoline executes the uprobe syscall that calls the related
uprobe consumers
- the trampoline returns back to the next instruction
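The overwrite in the second step boils down to encoding a call rel32 on top
of the 5-byte nop. Here is a minimal user-space sketch of just that encoding,
mirroring the relative_call() helper from the diff below; the addresses used
in main() are made up:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* rel32 in a call instruction is relative to the next instruction (from + 5) */
static void relative_call(uint8_t *dest, uint64_t from, uint64_t to)
{
	int32_t raddr = (int32_t)(to - (from + 5));

	dest[0] = 0xe8;			/* CALL rel32 opcode */
	memcpy(&dest[1], &raddr, 4);
}

int main(void)
{
	uint8_t insn[5];
	int i;

	/* made-up addresses: probed nop5 at 0x401000, trampoline at 0x402000 */
	relative_call(insn, 0x401000, 0x402000);
	for (i = 0; i < 5; i++)
		printf("%02x ", insn[i]);	/* prints: e8 fb 0f 00 00 */
	printf("\n");
	return 0;
}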
This approach won't speed up all uprobes, as it's limited to nop5 as the
original instruction, but we can use nop5 as the USDT probe instruction (which
uses a single byte nop ATM) and speed up the USDT probes.
This patch overrides the related arch functions used by uprobe_write_opcode
and set_orig_insn so they can install the call instruction if needed.
The arch_uprobe_optimize function triggers the uprobe optimization and is
called after the first uprobe hit. I originally had it called on uprobe
installation, but that clashed with the ELF loader, because the user space
trampoline was added in a place where the loader might need to put ELF
segments, so I decided to do it after the first uprobe hit, when loading
is done.
We do not unmap and release the uprobe trampoline when it's no longer needed,
because there's no easy way to make sure none of the threads is still
inside the trampoline. But we do not waste memory, because there's just a
single page backing all the uprobe trampoline mappings.
We do waste a frame on the page mapping for every 4GB range by keeping the
uprobe trampoline page mapped, but that seems ok.
We benefit from the fact that set_swbp and set_orig_insn are
called under mmap_write_lock(mm), so we can use the current instruction
as the state the uprobe is in - nop5/breakpoint/call trampoline -
and decide the needed action (optimize/un-optimize) based on that.
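A hedged sketch of that decision table, with invented names (the real logic
lives in set_swbp() and set_orig_insn() in the diff below):

#include <stdio.h>

/* illustration only: the instruction currently at the probe address is the
 * state, and attach/detach derive the needed action from it */
enum site_state { SITE_NOP5, SITE_INT3, SITE_CALL };

static const char *attach_action(enum site_state s)
{
	switch (s) {
	case SITE_NOP5: return "write int3, optimize to call on first hit";
	case SITE_INT3: return "already armed, nothing to do";
	case SITE_CALL: return "already optimized, nothing to do";
	}
	return "?";
}

static const char *detach_action(enum site_state s)
{
	switch (s) {
	case SITE_CALL: return "un-optimize: call -> int3 -> nop5";
	case SITE_INT3: return "restore original: int3 -> nop5";
	case SITE_NOP5: return "not armed, nothing to do";
	}
	return "?";
}

int main(void)
{
	printf("attach over nop5: %s\n", attach_action(SITE_NOP5));
	printf("detach over call: %s\n", detach_action(SITE_CALL));
	return 0;
}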
Attaching the speed up from benchs/run_bench_uprobes.sh script:
current:
usermode-count : 152.604 ± 0.044M/s
syscall-count : 13.359 ± 0.042M/s
--> uprobe-nop : 3.229 ± 0.002M/s
uprobe-push : 3.086 ± 0.004M/s
uprobe-ret : 1.114 ± 0.004M/s
uprobe-nop5 : 1.121 ± 0.005M/s
uretprobe-nop : 2.145 ± 0.002M/s
uretprobe-push : 2.070 ± 0.001M/s
uretprobe-ret : 0.931 ± 0.001M/s
uretprobe-nop5 : 0.957 ± 0.001M/s
after the change:
usermode-count : 152.448 ± 0.244M/s
syscall-count : 14.321 ± 0.059M/s
uprobe-nop : 3.148 ± 0.007M/s
uprobe-push : 2.976 ± 0.004M/s
uprobe-ret : 1.068 ± 0.003M/s
--> uprobe-nop5 : 7.038 ± 0.007M/s
uretprobe-nop : 2.109 ± 0.004M/s
uretprobe-push : 2.035 ± 0.001M/s
uretprobe-ret : 0.908 ± 0.001M/s
uretprobe-nop5 : 3.377 ± 0.009M/s
I see a bit more speed up on Intel (above) compared to AMD. The big nop5
speed up is partly due to emulating nop5 and partly due to the optimization.
The key speed up we are doing this for is the USDT switch from nop to nop5:
uprobe-nop : 3.148 ± 0.007M/s
uprobe-nop5 : 7.038 ± 0.007M/s
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
arch/x86/include/asm/uprobes.h | 7 +
arch/x86/kernel/uprobes.c | 269 ++++++++++++++++++++++++++++++++-
include/linux/uprobes.h | 6 +-
kernel/events/uprobes.c | 16 +-
4 files changed, 290 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 678fb546f0a7..1ee2e5115955 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -20,6 +20,11 @@ typedef u8 uprobe_opcode_t;
#define UPROBE_SWBP_INSN 0xcc
#define UPROBE_SWBP_INSN_SIZE 1
+enum {
+ ARCH_UPROBE_FLAG_CAN_OPTIMIZE = 0,
+ ARCH_UPROBE_FLAG_OPTIMIZE_FAIL = 1,
+};
+
struct uprobe_xol_ops;
struct arch_uprobe {
@@ -45,6 +50,8 @@ struct arch_uprobe {
u8 ilen;
} push;
};
+
+ unsigned long flags;
};
struct arch_uprobe_task {
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 1661e0ab2a3d..ff09211dada8 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -18,6 +18,7 @@
#include <asm/processor.h>
#include <asm/insn.h>
#include <asm/mmu_context.h>
+#include <asm/nops.h>
/* Post-execution fixups. */
@@ -819,7 +820,6 @@ static struct uprobe_trampoline *create_uprobe_trampoline(unsigned long vaddr)
return NULL;
}
-__maybe_unused
static struct uprobe_trampoline *uprobe_trampoline_get(unsigned long vaddr)
{
struct uprobes_state *state = &current->mm->uprobes_state;
@@ -846,7 +846,6 @@ static void destroy_uprobe_trampoline(struct uprobe_trampoline *tramp)
kfree(tramp);
}
-__maybe_unused
static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
{
if (tramp == NULL)
@@ -856,6 +855,212 @@ static void uprobe_trampoline_put(struct uprobe_trampoline *tramp)
destroy_uprobe_trampoline(tramp);
}
+enum {
+ OPT_PART,
+ OPT_INSN,
+ UNOPT_INT3,
+ UNOPT_PART,
+};
+
+struct write_opcode_ctx {
+ unsigned long base;
+ int update;
+};
+
+static int is_call_insn(uprobe_opcode_t *insn)
+{
+ return *insn == CALL_INSN_OPCODE;
+}
+
+static int verify_insn(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode,
+ int nbytes, void *data)
+{
+ struct write_opcode_ctx *ctx = data;
+ uprobe_opcode_t old_opcode[5];
+
+ uprobe_copy_from_page(page, ctx->base, (uprobe_opcode_t *) &old_opcode, 5);
+
+ switch (ctx->update) {
+ case OPT_PART:
+ case OPT_INSN:
+ if (is_swbp_insn(&old_opcode[0]))
+ return 1;
+ break;
+ case UNOPT_INT3:
+ if (is_call_insn(&old_opcode[0]))
+ return 1;
+ break;
+ case UNOPT_PART:
+ if (is_swbp_insn(&old_opcode[0]))
+ return 1;
+ break;
+ }
+
+ return -1;
+}
+
+static int write_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
+ uprobe_opcode_t *insn, int nbytes, void *ctx)
+{
+ return uprobe_write(auprobe, mm, vaddr, insn, nbytes, verify_insn, false, ctx);
+}
+
+static void relative_call(void *dest, long from, long to)
+{
+ struct __packed __arch_relative_insn {
+ u8 op;
+ s32 raddr;
+ } *insn;
+
+ insn = (struct __arch_relative_insn *)dest;
+ insn->raddr = (s32)(to - (from + 5));
+ insn->op = CALL_INSN_OPCODE;
+}
+
+static int swbp_optimize(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
+ unsigned long tramp)
+{
+ struct write_opcode_ctx ctx = {
+ .base = vaddr,
+ };
+ char call[5];
+ int err;
+
+ relative_call(call, vaddr, tramp);
+
+ /*
+ * We are in state where breakpoint (int3) is installed on top of first
+ * byte of the nop5 instruction. We will do following steps to overwrite
+ * this to call instruction:
+ *
+ * - sync cores
+ * - write last 4 bytes of the call instruction
+ * - sync cores
+ * - update the call instruction opcode
+ */
+
+ text_poke_sync();
+
+ ctx.update = OPT_PART;
+ err = write_insn(auprobe, mm, vaddr + 1, call + 1, 4, &ctx);
+ if (err)
+ return err;
+
+ text_poke_sync();
+
+ ctx.update = OPT_INSN;
+ return write_insn(auprobe, mm, vaddr, call, 1, &ctx);
+}
+
+static int swbp_unoptimize(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+{
+ uprobe_opcode_t int3 = UPROBE_SWBP_INSN;
+ struct write_opcode_ctx ctx = {
+ .base = vaddr,
+ };
+ int err;
+
+ /*
+ * We need to overwrite call instruction into nop5 instruction with
+ * breakpoint (int3) installed on top of its first byte. We will:
+ *
+ * - overwrite call opcode with breakpoint (int3)
+ * - sync cores
+ * - write last 4 bytes of the nop5 instruction
+ * - sync cores
+ */
+
+ ctx.update = UNOPT_INT3;
+ err = write_insn(auprobe, mm, vaddr, &int3, 1, &ctx);
+ if (err)
+ return err;
+
+ text_poke_sync();
+
+ ctx.update = UNOPT_PART;
+ err = write_insn(auprobe, mm, vaddr + 1, (uprobe_opcode_t *) auprobe->insn + 1, 4, &ctx);
+
+ text_poke_sync();
+ return err;
+}
+
+static int copy_from_vaddr(struct mm_struct *mm, unsigned long vaddr, void *dst, int len)
+{
+ unsigned int gup_flags = FOLL_FORCE|FOLL_SPLIT_PMD;
+ struct vm_area_struct *vma;
+ struct page *page;
+
+ page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+ uprobe_copy_from_page(page, vaddr, dst, len);
+ put_page(page);
+ return 0;
+}
+
+static bool __is_optimized(uprobe_opcode_t *insn, unsigned long vaddr)
+{
+ struct __packed __arch_relative_insn {
+ u8 op;
+ s32 raddr;
+ } *call = (struct __arch_relative_insn *) insn;
+
+ if (!is_call_insn(insn))
+ return false;
+ return __in_uprobe_trampoline(vaddr + 5 + call->raddr);
+}
+
+static int is_optimized(struct mm_struct *mm, unsigned long vaddr, bool *optimized)
+{
+ uprobe_opcode_t insn[5];
+ int err;
+
+ err = copy_from_vaddr(mm, vaddr, &insn, 5);
+ if (err)
+ return err;
+ *optimized = __is_optimized((uprobe_opcode_t *)&insn, vaddr);
+ return 0;
+}
+
+static bool should_optimize(struct arch_uprobe *auprobe)
+{
+ return !test_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags) &&
+ test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags);
+}
+
+int set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+{
+ if (should_optimize(auprobe)) {
+ bool optimized = false;
+ int err;
+
+ err = is_optimized(mm, vaddr, &optimized);
+ if (err || optimized)
+ return err;
+ }
+ return uprobe_write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN, false);
+}
+
+int set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr)
+{
+ /*
+ * We might be in race where we have optimized uprobe, but the uprobe
+ * was flagged as failed from another task, so we try to unoptimize
+ * the uprobe regardless the failed flag.
+ */
+ if (test_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags)) {
+ bool optimized = false;
+ int err;
+
+ err = is_optimized(mm, vaddr, &optimized);
+ if (err)
+ return err;
+ if (optimized)
+ WARN_ON_ONCE(swbp_unoptimize(auprobe, mm, vaddr));
+ }
+ return uprobe_write_opcode(auprobe, mm, vaddr, *(uprobe_opcode_t *)&auprobe->insn, true);
+}
+
void arch_uprobe_init_state(struct mm_struct *mm)
{
INIT_HLIST_HEAD(&mm->uprobes_state.head_tramps);
@@ -875,6 +1080,59 @@ static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
{
return is_nop5_insn((uprobe_opcode_t *) &auprobe->insn);
}
+
+static int __arch_uprobe_optimize(struct mm_struct *mm, struct arch_uprobe *auprobe,
+ unsigned long vaddr)
+{
+ struct uprobe_trampoline *tramp;
+ int err = 0;
+
+ tramp = uprobe_trampoline_get(vaddr);
+ if (!tramp)
+ return -1;
+ err = swbp_optimize(auprobe, mm, vaddr, tramp->vaddr);
+ if (WARN_ON_ONCE(err))
+ uprobe_trampoline_put(tramp);
+ return err;
+}
+
+void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+ struct mm_struct *mm = current->mm;
+ uprobe_opcode_t insn[5];
+
+ if (!should_optimize(auprobe))
+ return;
+
+ mmap_write_lock(mm);
+
+ /*
+ * Check if some other thread already optimized the uprobe for us,
+ * if it's the case just go away silently.
+ */
+ if (copy_from_vaddr(mm, vaddr, &insn, 5))
+ goto unlock;
+ if (!is_swbp_insn((uprobe_opcode_t*) &insn))
+ goto unlock;
+
+ /*
+ * If we fail to optimize the uprobe we set the fail bit so the
+ * above should_optimize will fail from now on.
+ */
+ if (__arch_uprobe_optimize(mm, auprobe, vaddr))
+ set_bit(ARCH_UPROBE_FLAG_OPTIMIZE_FAIL, &auprobe->flags);
+
+unlock:
+ mmap_write_unlock(mm);
+}
+
+static bool can_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+ if (!is_nop5_insn((uprobe_opcode_t *) &auprobe->insn))
+ return false;
+ /* We can't do cross page atomic writes yet. */
+ return PAGE_SIZE - (vaddr & ~PAGE_MASK) >= 5;
+}
#else /* 32-bit: */
/*
* No RIP-relative addressing on 32-bit
@@ -892,6 +1150,10 @@ static bool emulate_nop5_insn(struct arch_uprobe *auprobe)
{
return false;
}
+static bool can_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+ return false;
+}
#endif /* CONFIG_X86_64 */
struct uprobe_xol_ops {
@@ -1255,6 +1517,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (ret)
return ret;
+ if (can_optimize(auprobe, addr))
+ set_bit(ARCH_UPROBE_FLAG_CAN_OPTIMIZE, &auprobe->flags);
+
ret = branch_setup_xol_ops(auprobe, &insn);
if (ret != -ENOSYS)
return ret;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 7bde68871150..58868407fed0 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -191,7 +191,8 @@ struct uprobes_state {
#endif
};
-typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode, int nbytes);
+typedef int (*uprobe_write_verify_t)(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode,
+ int nbytes, void *data);
extern void __init uprobes_init(void);
extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr);
@@ -203,7 +204,7 @@ extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t, bool);
extern int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr,
- uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool orig);
+ uprobe_opcode_t *insn, int nbytes, uprobe_write_verify_t verify, bool orig, void *data);
extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister_nosync(struct uprobe *uprobe, struct uprobe_consumer *uc);
@@ -239,6 +240,7 @@ extern void uprobe_copy_from_page(struct page *page, unsigned long vaddr, void *
extern void handle_syscall_uprobe(struct pt_regs *regs, unsigned long bp_vaddr);
extern void arch_uprobe_clear_state(struct mm_struct *mm);
extern void arch_uprobe_init_state(struct mm_struct *mm);
+extern void arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr);
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 9370df47ec71..a6108bd0b8d7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -264,7 +264,8 @@ static void uprobe_copy_to_page(struct page *page, unsigned long vaddr, const vo
kunmap_atomic(kaddr);
}
-static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode, int nbytes)
+static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode,
+ int nbytes, void *data)
{
uprobe_opcode_t old_opcode;
bool is_swbp;
@@ -473,12 +474,12 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t opcode, bool orig)
{
- return uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode, orig);
+ return uprobe_write(auprobe, mm, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE, verify_opcode, orig, NULL);
}
int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
unsigned long vaddr, uprobe_opcode_t *insn,
- int nbytes, uprobe_write_verify_t verify, bool orig)
+ int nbytes, uprobe_write_verify_t verify, bool orig, void *data)
{
struct page *old_page, *new_page;
struct vm_area_struct *vma;
@@ -494,7 +495,7 @@ int uprobe_write(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (IS_ERR(old_page))
return PTR_ERR(old_page);
- ret = verify(old_page, vaddr, insn, nbytes);
+ ret = verify(old_page, vaddr, insn, nbytes, data);
if (ret <= 0)
goto put_old;
@@ -2677,6 +2678,10 @@ bool __weak arch_uretprobe_is_alive(struct return_instance *ret, enum rp_check c
return true;
}
+void __weak arch_uprobe_optimize(struct arch_uprobe *auprobe, unsigned long vaddr)
+{
+}
+
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -2741,6 +2746,9 @@ static void handle_swbp(struct pt_regs *regs)
handler_chain(uprobe, regs);
+ /* Try to optimize after first hit. */
+ arch_uprobe_optimize(&uprobe->arch, bp_vaddr);
+
if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
goto out;
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 12/23] selftests/bpf: Use 5-byte nop for x86 usdt probes
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (10 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 11/23] uprobes/x86: Add support to optimize uprobes Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 13/23] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
` (12 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Using a 5-byte nop for x86 usdt probes so we can switch
them to optimized uprobes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/sdt.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/sdt.h b/tools/testing/selftests/bpf/sdt.h
index 1fcfa5160231..1d62c06f5ddc 100644
--- a/tools/testing/selftests/bpf/sdt.h
+++ b/tools/testing/selftests/bpf/sdt.h
@@ -236,6 +236,13 @@ __extension__ extern unsigned long long __sdt_unsp;
#define _SDT_NOP nop
#endif
+/* Use 5 byte nop for x86_64 to allow optimizing uprobes. */
+#if defined(__x86_64__)
+# define _SDT_DEF_NOP _SDT_ASM_5(990: .byte 0x0f, 0x1f, 0x44, 0x00, 0x00)
+#else
+# define _SDT_DEF_NOP _SDT_ASM_1(990: _SDT_NOP)
+#endif
+
#define _SDT_NOTE_NAME "stapsdt"
#define _SDT_NOTE_TYPE 3
@@ -288,7 +295,7 @@ __extension__ extern unsigned long long __sdt_unsp;
#define _SDT_ASM_BODY(provider, name, pack_args, args, ...) \
_SDT_DEF_MACROS \
- _SDT_ASM_1(990: _SDT_NOP) \
+ _SDT_DEF_NOP \
_SDT_ASM_3( .pushsection .note.stapsdt,_SDT_ASM_AUTOGROUP,"note") \
_SDT_ASM_1( .balign 4) \
_SDT_ASM_3( .4byte 992f-991f, 994f-993f, _SDT_NOTE_TYPE) \
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
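For context, a minimal sketch of a probed program, assuming the patched
selftests sdt.h is on the include path; the provider and probe names below
are made up:

#include "sdt.h"

int main(void)
{
	/* on x86_64 this now emits the 5-byte nop 0f 1f 44 00 00 at the
	 * probe site (plus the stapsdt note), so a uprobe attached to
	 * provider "demo", probe "hello" can later be optimized into a
	 * call to the uprobe trampoline */
	STAP_PROBE(demo, hello);
	return 0;
}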
* [PATCH RFCv3 13/23] selftests/bpf: Reorg the uprobe_syscall test function
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (11 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 12/23] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 14/23] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
` (11 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding __test_uprobe_syscall with a non-x86_64 stub to execute all the tests,
so we don't need to keep adding non-x86_64 stub functions for new tests.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 34 +++++++------------
1 file changed, 12 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index c397336fe1ed..2b00f16406c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -350,29 +350,8 @@ static void test_uretprobe_shadow_stack(void)
ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
}
-#else
-static void test_uretprobe_regs_equal(void)
-{
- test__skip();
-}
-
-static void test_uretprobe_regs_change(void)
-{
- test__skip();
-}
-
-static void test_uretprobe_syscall_call(void)
-{
- test__skip();
-}
-static void test_uretprobe_shadow_stack(void)
-{
- test__skip();
-}
-#endif
-
-void test_uprobe_syscall(void)
+static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
test_uretprobe_regs_equal();
@@ -383,3 +362,14 @@ void test_uprobe_syscall(void)
if (test__start_subtest("uretprobe_shadow_stack"))
test_uretprobe_shadow_stack();
}
+#else
+static void __test_uprobe_syscall(void)
+{
+ test__skip();
+}
+#endif
+
+void test_uprobe_syscall(void)
+{
+ __test_uprobe_syscall();
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 14/23] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (12 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 13/23] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 15/23] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
` (10 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Renaming the uprobe_syscall_executed prog to test_uretprobe_multi
so it fits properly with the following changes that add more programs.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 8 ++++----
.../testing/selftests/bpf/progs/uprobe_syscall_executed.c | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 2b00f16406c8..3c74a079e6d9 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -277,10 +277,10 @@ static void test_uretprobe_syscall_call(void)
_exit(0);
}
- skel->links.test = bpf_program__attach_uprobe_multi(skel->progs.test, pid,
- "/proc/self/exe",
- "uretprobe_syscall_call", &opts);
- if (!ASSERT_OK_PTR(skel->links.test, "bpf_program__attach_uprobe_multi"))
+ skel->links.test_uretprobe_multi = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,
+ pid, "/proc/self/exe",
+ "uretprobe_syscall_call", &opts);
+ if (!ASSERT_OK_PTR(skel->links.test_uretprobe_multi, "bpf_program__attach_uprobe_multi"))
goto cleanup;
/* kick the child */
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
index 0d7f1a7db2e2..2e1b689ed4fb 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -10,8 +10,8 @@ char _license[] SEC("license") = "GPL";
int executed = 0;
SEC("uretprobe.multi")
-int test(struct pt_regs *regs)
+int test_uretprobe_multi(struct pt_regs *ctx)
{
- executed = 1;
+ executed++;
return 0;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 15/23] selftests/bpf: Add uprobe/usdt syscall tests
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (13 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 14/23] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 16/23] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
` (9 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding tests for optimized uprobe/usdt probes,
checking that we get the expected trampoline and that the attached
bpf programs get executed properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 257 ++++++++++++++++++
.../bpf/progs/uprobe_syscall_executed.c | 37 +++
2 files changed, 294 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 3c74a079e6d9..d648bf8eca64 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -14,6 +14,9 @@
#include <asm/prctl.h>
#include "uprobe_syscall.skel.h"
#include "uprobe_syscall_executed.skel.h"
+#include "sdt.h"
+
+#pragma GCC diagnostic ignored "-Wattributes"
__naked unsigned long uretprobe_regs_trigger(void)
{
@@ -351,6 +354,252 @@ static void test_uretprobe_shadow_stack(void)
ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
}
+#define TRAMP "[uprobes-trampoline]"
+
+__attribute__((aligned(16)))
+__nocf_check __weak __naked void uprobe_test(void)
+{
+ asm volatile (" \n"
+ ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00 \n"
+ "ret \n"
+ );
+}
+
+__attribute__((aligned(16)))
+__nocf_check __weak void usdt_test(void)
+{
+ STAP_PROBE(optimized_uprobe, usdt);
+}
+
+static int find_uprobes_trampoline(void **start, void **end)
+{
+ char line[128];
+ int ret = -1;
+ FILE *maps;
+
+ maps = fopen("/proc/self/maps", "r");
+ if (!maps) {
+ fprintf(stderr, "cannot open maps\n");
+ return -1;
+ }
+
+ while (fgets(line, sizeof(line), maps)) {
+ int m = -1;
+
+ /* We care only about private r-x mappings. */
+ if (sscanf(line, "%p-%p r-xp %*x %*x:%*x %*u %n", start, end, &m) != 2)
+ continue;
+ if (m < 0)
+ continue;
+ if (!strncmp(&line[m], TRAMP, sizeof(TRAMP)-1)) {
+ ret = 0;
+ break;
+ }
+ }
+
+ fclose(maps);
+ return ret;
+}
+
+static unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };
+
+static void *find_nop5(void *fn)
+{
+ int i;
+
+ for (i = 0; i < 10; i++) {
+ if (!memcmp(nop5, fn + i, 5))
+ return fn + i;
+ }
+ return NULL;
+}
+
+typedef void (__attribute__((nocf_check)) *trigger_t)(void);
+
+static void check_attach(struct uprobe_syscall_executed *skel, trigger_t trigger,
+ void *addr, int executed)
+{
+ void *tramp_start, *tramp_end;
+ struct __arch_relative_insn {
+ u8 op;
+ s32 raddr;
+ } __packed *call;
+
+ s32 delta;
+
+ /* Uprobe gets optimized after first trigger, so let's press twice. */
+ trigger();
+ trigger();
+
+ if (!ASSERT_OK(find_uprobes_trampoline(&tramp_start, &tramp_end), "uprobes_trampoline"))
+ return;
+
+ /* Make sure bpf program got executed.. */
+ ASSERT_EQ(skel->bss->executed, executed, "executed");
+
+ /* .. and check the trampoline is as expected. */
+ call = (struct __arch_relative_insn *) addr;
+ delta = (unsigned long) tramp_start - ((unsigned long) addr + 5);
+
+ ASSERT_EQ(call->op, 0xe8, "call");
+ ASSERT_EQ(call->raddr, delta, "delta");
+ ASSERT_EQ(tramp_end - tramp_start, 4096, "size");
+}
+
+static void check_detach(struct uprobe_syscall_executed *skel, trigger_t trigger, void *addr)
+{
+ void *tramp_start, *tramp_end;
+
+ /* [uprobes-trampoline] stays after detach */
+ ASSERT_OK(find_uprobes_trampoline(&tramp_start, &tramp_end), "uprobes_trampoline");
+ ASSERT_OK(memcmp(addr, nop5, 5), "nop5");
+}
+
+static void check(struct uprobe_syscall_executed *skel, struct bpf_link *link,
+ trigger_t trigger, void *addr, int executed)
+{
+ check_attach(skel, trigger, addr, executed);
+ bpf_link__destroy(link);
+ check_detach(skel, trigger, addr);
+}
+
+static void test_uprobe_legacy(void)
+{
+ struct uprobe_syscall_executed *skel = NULL;
+ LIBBPF_OPTS(bpf_uprobe_opts, opts,
+ .retprobe = true,
+ );
+ struct bpf_link *link;
+ unsigned long offset;
+
+ offset = get_uprobe_offset(&uprobe_test);
+ if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+ goto cleanup;
+
+ /* uprobe */
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ return;
+
+ link = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+ 0, "/proc/self/exe", offset, NULL);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+ goto cleanup;
+
+ check(skel, link, uprobe_test, uprobe_test, 2);
+
+ /* uretprobe */
+ skel->bss->executed = 0;
+
+ link = bpf_program__attach_uprobe_opts(skel->progs.test_uretprobe,
+ 0, "/proc/self/exe", offset, &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_opts"))
+ goto cleanup;
+
+ check(skel, link, uprobe_test, uprobe_test, 2);
+
+cleanup:
+ uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_multi(void)
+{
+ struct uprobe_syscall_executed *skel = NULL;
+ LIBBPF_OPTS(bpf_uprobe_multi_opts, opts);
+ struct bpf_link *link;
+ unsigned long offset;
+
+ offset = get_uprobe_offset(&uprobe_test);
+ if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+ goto cleanup;
+
+ opts.offsets = &offset;
+ opts.cnt = 1;
+
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ return;
+
+ /* uprobe.multi */
+ link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_multi,
+ 0, "/proc/self/exe", NULL, &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+ goto cleanup;
+
+ check(skel, link, uprobe_test, uprobe_test, 2);
+
+ /* uretprobe.multi */
+ skel->bss->executed = 0;
+ opts.retprobe = true;
+ link = bpf_program__attach_uprobe_multi(skel->progs.test_uretprobe_multi,
+ 0, "/proc/self/exe", NULL, &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+ goto cleanup;
+
+ check(skel, link, uprobe_test, uprobe_test, 2);
+
+cleanup:
+ uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_session(void)
+{
+ struct uprobe_syscall_executed *skel = NULL;
+ LIBBPF_OPTS(bpf_uprobe_multi_opts, opts,
+ .session = true,
+ );
+ struct bpf_link *link;
+ unsigned long offset;
+
+ offset = get_uprobe_offset(&uprobe_test);
+ if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+ goto cleanup;
+
+ opts.offsets = &offset;
+ opts.cnt = 1;
+
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ return;
+
+ link = bpf_program__attach_uprobe_multi(skel->progs.test_uprobe_session,
+ 0, "/proc/self/exe", NULL, &opts);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_uprobe_multi"))
+ goto cleanup;
+
+ check(skel, link, uprobe_test, uprobe_test, 4);
+
+cleanup:
+ uprobe_syscall_executed__destroy(skel);
+}
+
+static void test_uprobe_usdt(void)
+{
+ struct uprobe_syscall_executed *skel;
+ struct bpf_link *link;
+ void *addr;
+
+ errno = 0;
+ addr = find_nop5(usdt_test);
+ if (!ASSERT_OK_PTR(addr, "find_nop5"))
+ return;
+
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ return;
+
+ link = bpf_program__attach_usdt(skel->progs.test_usdt,
+ -1 /* all PIDs */, "/proc/self/exe",
+ "optimized_uprobe", "usdt", NULL);
+ if (!ASSERT_OK_PTR(link, "bpf_program__attach_usdt"))
+ goto cleanup;
+
+ check(skel, link, usdt_test, addr, 2);
+
+cleanup:
+ uprobe_syscall_executed__destroy(skel);
+}
+
static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
@@ -361,6 +610,14 @@ static void __test_uprobe_syscall(void)
test_uretprobe_syscall_call();
if (test__start_subtest("uretprobe_shadow_stack"))
test_uretprobe_shadow_stack();
+ if (test__start_subtest("uprobe_legacy"))
+ test_uprobe_legacy();
+ if (test__start_subtest("uprobe_multi"))
+ test_uprobe_multi();
+ if (test__start_subtest("uprobe_session"))
+ test_uprobe_session();
+ if (test__start_subtest("uprobe_usdt"))
+ test_uprobe_usdt();
}
#else
static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
index 2e1b689ed4fb..7bb4338c3ee2 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/usdt.bpf.h>
#include <string.h>
struct pt_regs regs;
@@ -9,9 +11,44 @@ char _license[] SEC("license") = "GPL";
int executed = 0;
+SEC("uprobe")
+int BPF_UPROBE(test_uprobe)
+{
+ executed++;
+ return 0;
+}
+
+SEC("uretprobe")
+int BPF_URETPROBE(test_uretprobe)
+{
+ executed++;
+ return 0;
+}
+
+SEC("uprobe.multi")
+int test_uprobe_multi(struct pt_regs *ctx)
+{
+ executed++;
+ return 0;
+}
+
SEC("uretprobe.multi")
int test_uretprobe_multi(struct pt_regs *ctx)
{
executed++;
return 0;
}
+
+SEC("uprobe.session")
+int test_uprobe_session(struct pt_regs *ctx)
+{
+ executed++;
+ return 0;
+}
+
+SEC("usdt")
+int test_usdt(struct pt_regs *ctx)
+{
+ executed++;
+ return 0;
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
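For reference, the /proc/self/maps entry that find_uprobes_trampoline() in
the test above matches looks roughly like this (addresses illustrative):

7f7fffff8000-7f7fffff9000 r-xp 00000000 00:00 0                          [uprobes-trampoline]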
* [PATCH RFCv3 16/23] selftests/bpf: Add hit/attach/detach race optimized uprobe test
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (14 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 15/23] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 17/23] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
` (8 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding a test that makes sure parallel execution of the uprobe and
attach/detach of the optimized uprobe on it works properly.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 74 +++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index d648bf8eca64..5c10cf173e6d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -600,6 +600,78 @@ static void test_uprobe_usdt(void)
uprobe_syscall_executed__destroy(skel);
}
+static volatile bool race_stop;
+
+static void *worker_trigger(void *arg)
+{
+ unsigned long rounds = 0;
+
+ while (!race_stop) {
+ uprobe_test();
+ rounds++;
+ }
+
+ printf("tid %d trigger rounds: %lu\n", gettid(), rounds);
+ return NULL;
+}
+
+static void *worker_attach(void *arg)
+{
+ struct uprobe_syscall_executed *skel;
+ unsigned long rounds = 0, offset;
+
+ offset = get_uprobe_offset(&uprobe_test);
+ if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+ return NULL;
+
+ skel = uprobe_syscall_executed__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "uprobe_syscall_executed__open_and_load"))
+ return NULL;
+
+ while (!race_stop) {
+ skel->links.test_uprobe = bpf_program__attach_uprobe_opts(skel->progs.test_uprobe,
+ 0, "/proc/self/exe", offset, NULL);
+ if (!ASSERT_OK_PTR(skel->links.test_uprobe, "bpf_program__attach_uprobe_opts"))
+ break;
+
+ bpf_link__destroy(skel->links.test_uprobe);
+ skel->links.test_uprobe = NULL;
+ rounds++;
+ }
+
+ printf("tid %d attach rounds: %lu hits: %d\n", gettid(), rounds, skel->bss->executed);
+ uprobe_syscall_executed__destroy(skel);
+ return NULL;
+}
+
+static void test_uprobe_race(void)
+{
+ int err, i, nr_threads;
+ pthread_t *threads;
+
+ nr_threads = libbpf_num_possible_cpus();
+ if (!ASSERT_GE(nr_threads, 0, "libbpf_num_possible_cpus"))
+ return;
+
+ threads = malloc(sizeof(*threads) * nr_threads);
+ if (!ASSERT_OK_PTR(threads, "malloc"))
+ return;
+
+ for (i = 0; i < nr_threads; i++) {
+ err = pthread_create(&threads[i], NULL, i % 2 ? worker_trigger : worker_attach,
+ NULL);
+ if (!ASSERT_OK(err, "pthread_create"))
+ goto cleanup;
+ }
+
+ sleep(4);
+
+cleanup:
+ race_stop = true;
+ for (nr_threads = i, i = 0; i < nr_threads; i++)
+ pthread_join(threads[i], NULL);
+}
+
static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
@@ -618,6 +690,8 @@ static void __test_uprobe_syscall(void)
test_uprobe_session();
if (test__start_subtest("uprobe_usdt"))
test_uprobe_usdt();
+ if (test__start_subtest("uprobe_race"))
+ test_uprobe_race();
}
#else
static void __test_uprobe_syscall(void)
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 17/23] selftests/bpf: Add uprobe syscall sigill signal test
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (15 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 16/23] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 18/23] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
` (7 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Make sure that calling the uprobe syscall from outside the uprobe
trampoline results in a SIGILL signal.
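For illustration, the same failure mode can be reproduced with a
minimal standalone program (a sketch, not part of the patch; the
syscall number matches the x86_64 definition used by the test below):

#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_uprobe
#define __NR_uprobe 336
#endif

int main(void)
{
	/* called directly, not from the uprobe trampoline, so the
	 * kernel kills the task with SIGILL instead of returning */
	syscall(__NR_uprobe);
	return 0; /* never reached */
}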
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 36 +++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index 5c10cf173e6d..b3518f48329c 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -672,6 +672,40 @@ static void test_uprobe_race(void)
pthread_join(threads[i], NULL);
}
+#ifndef __NR_uprobe
+#define __NR_uprobe 336
+#endif
+
+static void test_uprobe_sigill(void)
+{
+ int status, err, pid;
+
+ pid = fork();
+ if (!ASSERT_GE(pid, 0, "fork"))
+ return;
+ /* child */
+ if (pid == 0) {
+ asm volatile (
+ "pushq %rax\n"
+ "pushq %rcx\n"
+ "pushq %r11\n"
+ "movq $" __stringify(__NR_uprobe) ", %rax\n"
+ "syscall\n"
+ "popq %r11\n"
+ "popq %rcx\n"
+ "retq\n"
+ );
+ exit(0);
+ }
+
+ err = waitpid(pid, &status, 0);
+ ASSERT_EQ(err, pid, "waitpid");
+
+ /* verify the child got killed with SIGILL */
+ ASSERT_EQ(WIFSIGNALED(status), 1, "WIFSIGNALED");
+ ASSERT_EQ(WTERMSIG(status), SIGILL, "WTERMSIG");
+}
+
static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
@@ -692,6 +726,8 @@ static void __test_uprobe_syscall(void)
test_uprobe_usdt();
if (test__start_subtest("uprobe_race"))
test_uprobe_race();
+ if (test__start_subtest("uprobe_sigill"))
+ test_uprobe_sigill();
}
#else
static void __test_uprobe_syscall(void)
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 18/23] selftests/bpf: Add optimized usdt variant for basic usdt test
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (16 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 17/23] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 19/23] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
` (6 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding an optimized usdt variant for the basic usdt test to check
that usdt arguments are properly passed in the optimized code path.
The optimized variant triggers each probe twice, because the uprobe
gets optimized only after its first hit.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/prog_tests/usdt.c | 38 ++++++++++++-------
1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/usdt.c b/tools/testing/selftests/bpf/prog_tests/usdt.c
index 495d66414b57..3a5b5230bfa0 100644
--- a/tools/testing/selftests/bpf/prog_tests/usdt.c
+++ b/tools/testing/selftests/bpf/prog_tests/usdt.c
@@ -40,12 +40,19 @@ static void __always_inline trigger_func(int x) {
}
}
-static void subtest_basic_usdt(void)
+static void subtest_basic_usdt(bool optimized)
{
LIBBPF_OPTS(bpf_usdt_opts, opts);
struct test_usdt *skel;
struct test_usdt__bss *bss;
- int err, i;
+ int err, i, called;
+
+#define TRIGGER(x) ({ \
+ trigger_func(x); \
+ if (optimized) \
+ trigger_func(x); \
+ optimized ? 2 : 1; \
+ })
skel = test_usdt__open_and_load();
if (!ASSERT_OK_PTR(skel, "skel_open"))
@@ -66,11 +73,11 @@ static void subtest_basic_usdt(void)
if (!ASSERT_OK_PTR(skel->links.usdt0, "usdt0_link"))
goto cleanup;
- trigger_func(1);
+ called = TRIGGER(1);
- ASSERT_EQ(bss->usdt0_called, 1, "usdt0_called");
- ASSERT_EQ(bss->usdt3_called, 1, "usdt3_called");
- ASSERT_EQ(bss->usdt12_called, 1, "usdt12_called");
+ ASSERT_EQ(bss->usdt0_called, called, "usdt0_called");
+ ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
+ ASSERT_EQ(bss->usdt12_called, called, "usdt12_called");
ASSERT_EQ(bss->usdt0_cookie, 0xcafedeadbeeffeed, "usdt0_cookie");
ASSERT_EQ(bss->usdt0_arg_cnt, 0, "usdt0_arg_cnt");
@@ -119,11 +126,11 @@ static void subtest_basic_usdt(void)
* bpf_program__attach_usdt() handles this properly and attaches to
* all possible places of USDT invocation.
*/
- trigger_func(2);
+ called += TRIGGER(2);
- ASSERT_EQ(bss->usdt0_called, 2, "usdt0_called");
- ASSERT_EQ(bss->usdt3_called, 2, "usdt3_called");
- ASSERT_EQ(bss->usdt12_called, 2, "usdt12_called");
+ ASSERT_EQ(bss->usdt0_called, called, "usdt0_called");
+ ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
+ ASSERT_EQ(bss->usdt12_called, called, "usdt12_called");
/* only check values that depend on trigger_func()'s input value */
ASSERT_EQ(bss->usdt3_args[0], 2, "usdt3_arg1");
@@ -142,9 +149,9 @@ static void subtest_basic_usdt(void)
if (!ASSERT_OK_PTR(skel->links.usdt3, "usdt3_reattach"))
goto cleanup;
- trigger_func(3);
+ called += TRIGGER(3);
- ASSERT_EQ(bss->usdt3_called, 3, "usdt3_called");
+ ASSERT_EQ(bss->usdt3_called, called, "usdt3_called");
/* this time usdt3 has custom cookie */
ASSERT_EQ(bss->usdt3_cookie, 0xBADC00C51E, "usdt3_cookie");
ASSERT_EQ(bss->usdt3_arg_cnt, 3, "usdt3_arg_cnt");
@@ -158,6 +165,7 @@ static void subtest_basic_usdt(void)
cleanup:
test_usdt__destroy(skel);
+#undef TRIGGER
}
unsigned short test_usdt_100_semaphore SEC(".probes");
@@ -419,7 +427,11 @@ static void subtest_urandom_usdt(bool auto_attach)
void test_usdt(void)
{
if (test__start_subtest("basic"))
- subtest_basic_usdt();
+ subtest_basic_usdt(false);
+#ifdef __x86_64__
+ if (test__start_subtest("basic_optimized"))
+ subtest_basic_usdt(true);
+#endif
if (test__start_subtest("multispec"))
subtest_multispec_usdt();
if (test__start_subtest("urand_auto_attach"))
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 19/23] selftests/bpf: Add uprobe_regs_equal test
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (17 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 18/23] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 20/23] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
` (5 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Changing uretprobe_regs_trigger to allow testing both uprobe and
uretprobe, renaming it to uprobe_regs_trigger and the test itself
to uprobe_regs_equal.
We check that both uprobe and uretprobe probes (bpf programs) see
the expected registers, with a few exceptions (for example the
uprobe sees the original rax value, not the return value).
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../selftests/bpf/prog_tests/uprobe_syscall.c | 57 ++++++++++++++-----
.../selftests/bpf/progs/uprobe_syscall.c | 4 +-
2 files changed, 44 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index b3518f48329c..f1c297a9bb03 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -18,15 +18,17 @@
#pragma GCC diagnostic ignored "-Wattributes"
-__naked unsigned long uretprobe_regs_trigger(void)
+__attribute__((aligned(16)))
+__nocf_check __weak __naked unsigned long uprobe_regs_trigger(void)
{
asm volatile (
- "movq $0xdeadbeef, %rax\n"
+ ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00 \n"
+ "movq $0xdeadbeef, %rax \n"
"ret\n"
);
}
-__naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
+__naked void uprobe_regs(struct pt_regs *before, struct pt_regs *after)
{
asm volatile (
"movq %r15, 0(%rdi)\n"
@@ -47,15 +49,17 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
"movq $0, 120(%rdi)\n" /* orig_rax */
"movq $0, 128(%rdi)\n" /* rip */
"movq $0, 136(%rdi)\n" /* cs */
+ "pushq %rax\n"
"pushf\n"
"pop %rax\n"
"movq %rax, 144(%rdi)\n" /* eflags */
+ "pop %rax\n"
"movq %rsp, 152(%rdi)\n" /* rsp */
"movq $0, 160(%rdi)\n" /* ss */
/* save 2nd argument */
"pushq %rsi\n"
- "call uretprobe_regs_trigger\n"
+ "call uprobe_regs_trigger\n"
/* save return value and load 2nd argument pointer to rax */
"pushq %rax\n"
@@ -95,25 +99,37 @@ __naked void uretprobe_regs(struct pt_regs *before, struct pt_regs *after)
);
}
-static void test_uretprobe_regs_equal(void)
+static void test_uprobe_regs_equal(bool retprobe)
{
+ LIBBPF_OPTS(bpf_uprobe_opts, opts,
+ .retprobe = retprobe,
+ );
struct uprobe_syscall *skel = NULL;
struct pt_regs before = {}, after = {};
unsigned long *pb = (unsigned long *) &before;
unsigned long *pa = (unsigned long *) &after;
unsigned long *pp;
+ unsigned long offset;
unsigned int i, cnt;
- int err;
+
+ offset = get_uprobe_offset(&uprobe_regs_trigger);
+ if (!ASSERT_GE(offset, 0, "get_uprobe_offset"))
+ return;
skel = uprobe_syscall__open_and_load();
if (!ASSERT_OK_PTR(skel, "uprobe_syscall__open_and_load"))
goto cleanup;
- err = uprobe_syscall__attach(skel);
- if (!ASSERT_OK(err, "uprobe_syscall__attach"))
+ skel->links.probe = bpf_program__attach_uprobe_opts(skel->progs.probe,
+ 0, "/proc/self/exe", offset, &opts);
+ if (!ASSERT_OK_PTR(skel->links.probe, "bpf_program__attach_uprobe_opts"))
goto cleanup;
- uretprobe_regs(&before, &after);
+ /* make sure uprobe gets optimized */
+ if (!retprobe)
+ uprobe_regs_trigger();
+
+ uprobe_regs(&before, &after);
pp = (unsigned long *) &skel->bss->regs;
cnt = sizeof(before)/sizeof(*pb);
@@ -122,7 +138,7 @@ static void test_uretprobe_regs_equal(void)
unsigned int offset = i * sizeof(unsigned long);
/*
- * Check register before and after uretprobe_regs_trigger call
+ * Check register before and after uprobe_regs_trigger call
* that triggers the uretprobe.
*/
switch (offset) {
@@ -136,7 +152,7 @@ static void test_uretprobe_regs_equal(void)
/*
* Check register seen from bpf program and register after
- * uretprobe_regs_trigger call
+ * uprobe_regs_trigger call (with rax exception, check below).
*/
switch (offset) {
/*
@@ -149,6 +165,15 @@ static void test_uretprobe_regs_equal(void)
case offsetof(struct pt_regs, rsp):
case offsetof(struct pt_regs, ss):
break;
+ /*
+ * uprobe does not see return value in rax, it needs to see the
+ * original (before) rax value
+ */
+ case offsetof(struct pt_regs, rax):
+ if (!retprobe) {
+ ASSERT_EQ(pp[i], pb[i], "uprobe rax prog-before value check");
+ break;
+ }
default:
if (!ASSERT_EQ(pp[i], pa[i], "register prog-after value check"))
fprintf(stdout, "failed register offset %u\n", offset);
@@ -186,13 +211,13 @@ static void test_uretprobe_regs_change(void)
unsigned long cnt = sizeof(before)/sizeof(*pb);
unsigned int i, err, offset;
- offset = get_uprobe_offset(uretprobe_regs_trigger);
+ offset = get_uprobe_offset(uprobe_regs_trigger);
err = write_bpf_testmod_uprobe(offset);
if (!ASSERT_OK(err, "register_uprobe"))
return;
- uretprobe_regs(&before, &after);
+ uprobe_regs(&before, &after);
err = write_bpf_testmod_uprobe(0);
if (!ASSERT_OK(err, "unregister_uprobe"))
@@ -347,7 +372,7 @@ static void test_uretprobe_shadow_stack(void)
}
/* Run all of the uretprobe tests. */
- test_uretprobe_regs_equal();
+ test_uprobe_regs_equal(false);
test_uretprobe_regs_change();
test_uretprobe_syscall_call();
@@ -709,7 +734,7 @@ static void test_uprobe_sigill(void)
static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
- test_uretprobe_regs_equal();
+ test_uprobe_regs_equal(true);
if (test__start_subtest("uretprobe_regs_change"))
test_uretprobe_regs_change();
if (test__start_subtest("uretprobe_syscall_call"))
@@ -728,6 +753,8 @@ static void __test_uprobe_syscall(void)
test_uprobe_race();
if (test__start_subtest("uprobe_sigill"))
test_uprobe_sigill();
+ if (test__start_subtest("uprobe_regs_equal"))
+ test_uprobe_regs_equal(false);
}
#else
static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/progs/uprobe_syscall.c b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
index 8a4fa6c7ef59..e08c31669e5a 100644
--- a/tools/testing/selftests/bpf/progs/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/progs/uprobe_syscall.c
@@ -7,8 +7,8 @@ struct pt_regs regs;
char _license[] SEC("license") = "GPL";
-SEC("uretprobe//proc/self/exe:uretprobe_regs_trigger")
-int uretprobe(struct pt_regs *ctx)
+SEC("uprobe")
+int probe(struct pt_regs *ctx)
{
__builtin_memcpy(&regs, ctx, sizeof(regs));
return 0;
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 20/23] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (18 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 19/23] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 21/23] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
` (4 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Changing the test_uretprobe_regs_change test to test both uprobe
and uretprobe by adding an entry consumer handler to the testmod
and making it change one of the registers.
Making sure that the values changed by both uprobe and uretprobe
handlers propagate to user space.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
.../testing/selftests/bpf/prog_tests/uprobe_syscall.c | 11 +++++++----
tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 11 +++++++++--
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
index f1c297a9bb03..83e4b7b6095d 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c
@@ -203,7 +203,7 @@ static int write_bpf_testmod_uprobe(unsigned long offset)
return ret != n ? (int) ret : 0;
}
-static void test_uretprobe_regs_change(void)
+static void test_regs_change(void)
{
struct pt_regs before = {}, after = {};
unsigned long *pb = (unsigned long *) &before;
@@ -217,6 +217,9 @@ static void test_uretprobe_regs_change(void)
if (!ASSERT_OK(err, "register_uprobe"))
return;
+ /* make sure uprobe gets optimized */
+ uprobe_regs_trigger();
+
uprobe_regs(&before, &after);
err = write_bpf_testmod_uprobe(0);
@@ -373,8 +376,8 @@ static void test_uretprobe_shadow_stack(void)
/* Run all of the uretprobe tests. */
test_uprobe_regs_equal(false);
- test_uretprobe_regs_change();
test_uretprobe_syscall_call();
+ test_regs_change();
ARCH_PRCTL(ARCH_SHSTK_DISABLE, ARCH_SHSTK_SHSTK);
}
@@ -735,8 +738,6 @@ static void __test_uprobe_syscall(void)
{
if (test__start_subtest("uretprobe_regs_equal"))
test_uprobe_regs_equal(true);
- if (test__start_subtest("uretprobe_regs_change"))
- test_uretprobe_regs_change();
if (test__start_subtest("uretprobe_syscall_call"))
test_uretprobe_syscall_call();
if (test__start_subtest("uretprobe_shadow_stack"))
@@ -755,6 +756,8 @@ static void __test_uprobe_syscall(void)
test_uprobe_sigill();
if (test__start_subtest("uprobe_regs_equal"))
test_uprobe_regs_equal(false);
+ if (test__start_subtest("regs_change"))
+ test_regs_change();
}
#else
static void __test_uprobe_syscall(void)
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
index 3220f1d28697..08494fcf6a58 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c
@@ -496,15 +496,21 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = {
*/
#ifdef __x86_64__
+static int
+uprobe_handler(struct uprobe_consumer *self, struct pt_regs *regs, __u64 *data)
+{
+ regs->cx = 0x87654321feebdaed;
+ return 0;
+}
+
static int
uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
struct pt_regs *regs, __u64 *data)
{
regs->ax = 0x12345678deadbeef;
- regs->cx = 0x87654321feebdaed;
regs->r11 = (u64) -1;
- return true;
+ return 0;
}
struct testmod_uprobe {
@@ -516,6 +522,7 @@ struct testmod_uprobe {
static DEFINE_MUTEX(testmod_uprobe_mutex);
static struct testmod_uprobe uprobe = {
+ .consumer.handler = uprobe_handler,
.consumer.ret_handler = uprobe_ret_handler,
};
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 21/23] selftests/bpf: Add 5-byte nop uprobe trigger bench
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (19 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 20/23] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 22/23] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
` (3 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Add a 5-byte nop uprobe trigger bench (x86_64 specific) to measure
uprobes/uretprobes on top of the nop5 instruction.
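For context, here is how the new target compares to the existing
1-byte nop target (a sketch; the nop variant is paraphrased from the
current bench code):

__nocf_check __weak void uprobe_target_nop(void)
{
	/* 1-byte nop: always handled via the breakpoint path */
	asm volatile ("nop");
}

__nocf_check __weak void uprobe_target_nop5(void)
{
	/* 5-byte nop: can be rewritten by this series into a call
	 * to the uprobe trampoline */
	asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
}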
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/bpf/bench.c | 12 ++++++
.../selftests/bpf/benchs/bench_trigger.c | 42 +++++++++++++++++++
.../selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 1bd403a5ef7b..0fd8c9b0d38f 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -526,6 +526,12 @@ extern const struct bench bench_trig_uprobe_multi_push;
extern const struct bench bench_trig_uretprobe_multi_push;
extern const struct bench bench_trig_uprobe_multi_ret;
extern const struct bench bench_trig_uretprobe_multi_ret;
+#ifdef __x86_64__
+extern const struct bench bench_trig_uprobe_nop5;
+extern const struct bench bench_trig_uretprobe_nop5;
+extern const struct bench bench_trig_uprobe_multi_nop5;
+extern const struct bench bench_trig_uretprobe_multi_nop5;
+#endif
extern const struct bench bench_rb_libbpf;
extern const struct bench bench_rb_custom;
@@ -586,6 +592,12 @@ static const struct bench *benchs[] = {
&bench_trig_uretprobe_multi_push,
&bench_trig_uprobe_multi_ret,
&bench_trig_uretprobe_multi_ret,
+#ifdef __x86_64__
+ &bench_trig_uprobe_nop5,
+ &bench_trig_uretprobe_nop5,
+ &bench_trig_uprobe_multi_nop5,
+ &bench_trig_uretprobe_multi_nop5,
+#endif
/* ringbuf/perfbuf benchmarks */
&bench_rb_libbpf,
&bench_rb_custom,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 32e9f194d449..82327657846e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -333,6 +333,20 @@ static void *uprobe_producer_ret(void *input)
return NULL;
}
+#ifdef __x86_64__
+__nocf_check __weak void uprobe_target_nop5(void)
+{
+ asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
+}
+
+static void *uprobe_producer_nop5(void *input)
+{
+ while (true)
+ uprobe_target_nop5();
+ return NULL;
+}
+#endif
+
static void usetup(bool use_retprobe, bool use_multi, void *target_addr)
{
size_t uprobe_offset;
@@ -448,6 +462,28 @@ static void uretprobe_multi_ret_setup(void)
usetup(true, true /* use_multi */, &uprobe_target_ret);
}
+#ifdef __x86_64__
+static void uprobe_nop5_setup(void)
+{
+ usetup(false, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_nop5_setup(void)
+{
+ usetup(true, false /* !use_multi */, &uprobe_target_nop5);
+}
+
+static void uprobe_multi_nop5_setup(void)
+{
+ usetup(false, true /* use_multi */, &uprobe_target_nop5);
+}
+
+static void uretprobe_multi_nop5_setup(void)
+{
+ usetup(true, true /* use_multi */, &uprobe_target_nop5);
+}
+#endif
+
const struct bench bench_trig_syscall_count = {
.name = "trig-syscall-count",
.validate = trigger_validate,
@@ -506,3 +542,9 @@ BENCH_TRIG_USERMODE(uprobe_multi_ret, ret, "uprobe-multi-ret");
BENCH_TRIG_USERMODE(uretprobe_multi_nop, nop, "uretprobe-multi-nop");
BENCH_TRIG_USERMODE(uretprobe_multi_push, push, "uretprobe-multi-push");
BENCH_TRIG_USERMODE(uretprobe_multi_ret, ret, "uretprobe-multi-ret");
+#ifdef __x86_64__
+BENCH_TRIG_USERMODE(uprobe_nop5, nop5, "uprobe-nop5");
+BENCH_TRIG_USERMODE(uretprobe_nop5, nop5, "uretprobe-nop5");
+BENCH_TRIG_USERMODE(uprobe_multi_nop5, nop5, "uprobe-multi-nop5");
+BENCH_TRIG_USERMODE(uretprobe_multi_nop5, nop5, "uretprobe-multi-nop5");
+#endif
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
index af169f831f2f..03f55405484b 100755
--- a/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
+++ b/tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh
@@ -2,7 +2,7 @@
set -eufo pipefail
-for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret}
+for i in usermode-count syscall-count {uprobe,uretprobe}-{nop,push,ret,nop5}
do
summary=$(sudo ./bench -w2 -d5 -a trig-$i | tail -n1 | cut -d'(' -f1 | cut -d' ' -f3-)
printf "%-15s: %s\n" $i "$summary"
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 22/23] seccomp: passthrough uprobe systemcall without filtering
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (20 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 21/23] selftests/bpf: Add 5-byte nop uprobe trigger bench Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 11:41 ` [PATCH RFCv3 23/23] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
` (2 subsequent siblings)
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: Kees Cook, Eyal Birger, bpf, linux-kernel, linux-trace-kernel,
x86, Song Liu, Yonghong Song, John Fastabend, Hao Luo,
Steven Rostedt, Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding uprobe as another exception to the seccomp filter alongside
the uretprobe syscall.
Same as the uretprobe, the uprobe syscall is installed by the kernel
as a replacement for the breakpoint exception, is limited to the
x86_64 arch, and isn't expected to ever be supported on i386.
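The effect can be sketched with seccomp strict mode (a hypothetical
demo, not the kernel selftest; probed() stands in for any nop5
function with an optimized uprobe attached from the outside):

#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>
#include <unistd.h>

__attribute__((weak)) void probed(void)
{
	/* nop5 body, so an attached uprobe can be optimized */
	asm volatile (".byte 0x0f, 0x1f, 0x44, 0x00, 0x00");
}

int main(void)
{
	prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT);

	/* each optimized uprobe hit issues the uprobe syscall;
	 * with __NR_uprobe in mode1_syscalls this loop is no
	 * longer killed by seccomp */
	for (int i = 0; i < 1000; i++)
		probed();

	/* exit_group is not in the strict mode whitelist */
	syscall(SYS_exit, 0);
}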
Cc: Kees Cook <keescook@chromium.org>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/seccomp.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7bbb408431eb..44a469b01898 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -733,6 +733,26 @@ seccomp_prepare_user_filter(const char __user *user_filter)
}
#ifdef SECCOMP_ARCH_NATIVE
+static bool seccomp_uprobe_exception(struct seccomp_data *sd)
+{
+#if defined __NR_uretprobe || defined __NR_uprobe
+#ifdef SECCOMP_ARCH_COMPAT
+ if (sd->arch == SECCOMP_ARCH_NATIVE)
+#endif
+ {
+#ifdef __NR_uretprobe
+ if (sd->nr == __NR_uretprobe)
+ return true;
+#endif
+#ifdef __NR_uprobe
+ if (sd->nr == __NR_uprobe)
+ return true;
+#endif
+ }
+#endif
+ return false;
+}
+
/**
* seccomp_is_const_allow - check if filter is constant allow with given data
* @fprog: The BPF programs
@@ -750,13 +770,8 @@ static bool seccomp_is_const_allow(struct sock_fprog_kern *fprog,
return false;
/* Our single exception to filtering. */
-#ifdef __NR_uretprobe
-#ifdef SECCOMP_ARCH_COMPAT
- if (sd->arch == SECCOMP_ARCH_NATIVE)
-#endif
- if (sd->nr == __NR_uretprobe)
- return true;
-#endif
+ if (seccomp_uprobe_exception(sd))
+ return true;
for (pc = 0; pc < fprog->len; pc++) {
struct sock_filter *insn = &fprog->filter[pc];
@@ -1034,6 +1049,9 @@ static const int mode1_syscalls[] = {
__NR_seccomp_read, __NR_seccomp_write, __NR_seccomp_exit, __NR_seccomp_sigreturn,
#ifdef __NR_uretprobe
__NR_uretprobe,
+#endif
+#ifdef __NR_uprobe
+ __NR_uprobe,
#endif
-1, /* negative terminated */
};
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH RFCv3 23/23] selftests/seccomp: validate uprobe syscall passes through seccomp
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (21 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 22/23] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
@ 2025-03-20 11:41 ` Jiri Olsa
2025-03-20 12:23 ` [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Oleg Nesterov
2025-04-04 20:36 ` Andrii Nakryiko
24 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 11:41 UTC (permalink / raw)
To: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko
Cc: Kees Cook, Eyal Birger, bpf, linux-kernel, linux-trace-kernel,
x86, Song Liu, Yonghong Song, John Fastabend, Hao Luo,
Steven Rostedt, Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
Adding uprobe checks into the current uretprobe tests.
All the related tests are now executed with an attached uprobe
or uretprobe, or without any probe.
Renaming the test fixture to UPROBE, because it now covers both
probe types.
Cc: Kees Cook <keescook@chromium.org>
Cc: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++----
1 file changed, 86 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 14ba51b52095..794787786968 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -73,6 +73,14 @@
#define noinline __attribute__((noinline))
#endif
+#ifndef __nocf_check
+#define __nocf_check __attribute__((nocf_check))
+#endif
+
+#ifndef __naked
+#define __naked __attribute__((__naked__))
+#endif
+
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#define PR_GET_NO_NEW_PRIVS 39
@@ -4893,7 +4901,36 @@ TEST(tsync_vs_dead_thread_leader)
EXPECT_EQ(0, status);
}
-noinline int probed(void)
+#ifdef __x86_64__
+
+/*
+ * We need naked probed_uprobe function. Using __nocf_check
+ * check to skip possible endbr64 instruction and ignoring
+ * -Wattributes, otherwise the compilation might fail.
+ */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wattributes"
+
+__naked __nocf_check noinline int probed_uprobe(void)
+{
+ /*
+ * Optimized uprobe is possible only on top of nop5 instruction.
+ */
+ asm volatile (" \n"
+ ".byte 0x0f, 0x1f, 0x44, 0x00, 0x00 \n"
+ "ret \n"
+ );
+}
+#pragma GCC diagnostic pop
+
+#else
+noinline int probed_uprobe(void)
+{
+ return 1;
+}
+#endif
+
+noinline int probed_uretprobe(void)
{
return 1;
}
@@ -4946,35 +4983,46 @@ static ssize_t get_uprobe_offset(const void *addr)
return found ? (uintptr_t)addr - start + base : -1;
}
-FIXTURE(URETPROBE) {
+FIXTURE(UPROBE) {
int fd;
};
-FIXTURE_VARIANT(URETPROBE) {
+FIXTURE_VARIANT(UPROBE) {
/*
- * All of the URETPROBE behaviors can be tested with either
- * uretprobe attached or not
+ * All of the U(RET)PROBE behaviors can be tested with either
+ * u(ret)probe attached or not
*/
bool attach;
+ /*
+ * Test both uprobe and uretprobe.
+ */
+ bool uretprobe;
};
-FIXTURE_VARIANT_ADD(URETPROBE, attached) {
+FIXTURE_VARIANT_ADD(UPROBE, not_attached) {
+ .attach = false,
+ .uretprobe = false,
+};
+
+FIXTURE_VARIANT_ADD(UPROBE, uprobe_attached) {
.attach = true,
+ .uretprobe = false,
};
-FIXTURE_VARIANT_ADD(URETPROBE, not_attached) {
- .attach = false,
+FIXTURE_VARIANT_ADD(UPROBE, uretprobe_attached) {
+ .attach = true,
+ .uretprobe = true,
};
-FIXTURE_SETUP(URETPROBE)
+FIXTURE_SETUP(UPROBE)
{
const size_t attr_sz = sizeof(struct perf_event_attr);
struct perf_event_attr attr;
ssize_t offset;
int type, bit;
-#ifndef __NR_uretprobe
- SKIP(return, "__NR_uretprobe syscall not defined");
+#if !defined(__NR_uprobe) || !defined(__NR_uretprobe)
+ SKIP(return, "__NR_uprobe ot __NR_uretprobe syscalls not defined");
#endif
if (!variant->attach)
@@ -4984,12 +5032,17 @@ FIXTURE_SETUP(URETPROBE)
type = determine_uprobe_perf_type();
ASSERT_GE(type, 0);
- bit = determine_uprobe_retprobe_bit();
- ASSERT_GE(bit, 0);
- offset = get_uprobe_offset(probed);
+
+ if (variant->uretprobe) {
+ bit = determine_uprobe_retprobe_bit();
+ ASSERT_GE(bit, 0);
+ }
+
+ offset = get_uprobe_offset(variant->uretprobe ? probed_uretprobe : probed_uprobe);
ASSERT_GE(offset, 0);
- attr.config |= 1 << bit;
+ if (variant->uretprobe)
+ attr.config |= 1 << bit;
attr.size = attr_sz;
attr.type = type;
attr.config1 = ptr_to_u64("/proc/self/exe");
@@ -5000,7 +5053,7 @@ FIXTURE_SETUP(URETPROBE)
PERF_FLAG_FD_CLOEXEC);
}
-FIXTURE_TEARDOWN(URETPROBE)
+FIXTURE_TEARDOWN(UPROBE)
{
/* we could call close(self->fd), but we'd need extra filter for
* that and since we are calling _exit right away..
@@ -5014,11 +5067,17 @@ static int run_probed_with_filter(struct sock_fprog *prog)
return -1;
}
- probed();
+ /*
+ * Uprobe is optimized after first hit, so let's hit twice.
+ */
+ probed_uprobe();
+ probed_uprobe();
+
+ probed_uretprobe();
return 0;
}
-TEST_F(URETPROBE, uretprobe_default_allow)
+TEST_F(UPROBE, uprobe_default_allow)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
@@ -5031,7 +5090,7 @@ TEST_F(URETPROBE, uretprobe_default_allow)
ASSERT_EQ(0, run_probed_with_filter(&prog));
}
-TEST_F(URETPROBE, uretprobe_default_block)
+TEST_F(UPROBE, uprobe_default_block)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
@@ -5048,11 +5107,14 @@ TEST_F(URETPROBE, uretprobe_default_block)
ASSERT_EQ(0, run_probed_with_filter(&prog));
}
-TEST_F(URETPROBE, uretprobe_block_uretprobe_syscall)
+TEST_F(UPROBE, uprobe_block_syscall)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
+#ifdef __NR_uprobe
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uprobe, 1, 2),
+#endif
#ifdef __NR_uretprobe
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uretprobe, 0, 1),
#endif
@@ -5067,11 +5129,14 @@ TEST_F(URETPROBE, uretprobe_block_uretprobe_syscall)
ASSERT_EQ(0, run_probed_with_filter(&prog));
}
-TEST_F(URETPROBE, uretprobe_default_block_with_uretprobe_syscall)
+TEST_F(UPROBE, uprobe_default_block_with_syscall)
{
struct sock_filter filter[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
+#ifdef __NR_uprobe
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uprobe, 3, 0),
+#endif
#ifdef __NR_uretprobe
BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_uretprobe, 2, 0),
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (22 preceding siblings ...)
2025-03-20 11:41 ` [PATCH RFCv3 23/23] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
@ 2025-03-20 12:23 ` Oleg Nesterov
2025-03-20 13:51 ` Jiri Olsa
2025-04-04 20:36 ` Andrii Nakryiko
24 siblings, 1 reply; 37+ messages in thread
From: Oleg Nesterov @ 2025-03-20 12:23 UTC (permalink / raw)
To: Jiri Olsa, David Hildenbrand
Cc: Peter Zijlstra, Andrii Nakryiko, Eyal Birger, kees, bpf,
linux-kernel, linux-trace-kernel, x86, Song Liu, Yonghong Song,
John Fastabend, Hao Luo, Steven Rostedt, Masami Hiramatsu,
Alan Maguire, David Laight, Thomas Weißschuh
On 03/20, Jiri Olsa wrote:
>
> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
Just in case... This series conflicts with (imo very important) changes
from David,
[PATCH v2 0/3] kernel/events/uprobes: uprobe_write_opcode() rewrite
https://lore.kernel.org/all/20250318221457.3055598-1-david@redhat.com/
I think they should be merged first.
(and I am not sure yet, but it seems that we should clean up (fix?) the
update_ref_ctr() logic before other changes).
Oleg.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64
2025-03-20 12:23 ` [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Oleg Nesterov
@ 2025-03-20 13:51 ` Jiri Olsa
0 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-03-20 13:51 UTC (permalink / raw)
To: Oleg Nesterov
Cc: David Hildenbrand, Peter Zijlstra, Andrii Nakryiko, Eyal Birger,
kees, bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
On Thu, Mar 20, 2025 at 01:23:44PM +0100, Oleg Nesterov wrote:
> On 03/20, Jiri Olsa wrote:
> >
> > hi,
> > this patchset adds support to optimize usdt probes on top of 5-byte
> > nop instruction.
>
> Just in case... This series conflicts with (imo very important) changes
> from David,
>
> [PATCH v2 0/3] kernel/events/uprobes: uprobe_write_opcode() rewrite
> https://lore.kernel.org/all/20250318221457.3055598-1-david@redhat.com/
>
> I think they should be merged first.
ok, I'll check on those
thanks,
jirka
>
> (and I am not sure yet, but it seems that we should cleanup (fix?) the
> update_ref_ctr() logic before other changes).
>
> Oleg.
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64
2025-03-20 11:41 [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
` (23 preceding siblings ...)
2025-03-20 12:23 ` [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64 Oleg Nesterov
@ 2025-04-04 20:36 ` Andrii Nakryiko
2025-04-07 11:17 ` Jiri Olsa
24 siblings, 1 reply; 37+ messages in thread
From: Andrii Nakryiko @ 2025-04-04 20:36 UTC (permalink / raw)
To: Jiri Olsa
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, Eyal Birger, kees,
bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
On Thu, Mar 20, 2025 at 4:42 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> hi,
> this patchset adds support to optimize usdt probes on top of 5-byte
> nop instruction.
>
> The generic approach (optimize all uprobes) is hard due to emulating
> possible multiple original instructions and its related issues. The
> usdt case, which stores 5-byte nop seems much easier, so starting
> with that.
>
> The basic idea is to replace breakpoint exception with syscall which
> is faster on x86_64. For more details please see changelog of patch 8.
>
> The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> original instructions) in a loop and counts how many of those happened
> per second (the unit below is million loops).
>
> There's big speed up if you consider current usdt implementation
> (uprobe-nop) compared to proposed usdt (uprobe-nop5):
>
> current:
> usermode-count : 152.604 ± 0.044M/s
> syscall-count : 13.359 ± 0.042M/s
> --> uprobe-nop : 3.229 ± 0.002M/s
> uprobe-push : 3.086 ± 0.004M/s
> uprobe-ret : 1.114 ± 0.004M/s
> uprobe-nop5 : 1.121 ± 0.005M/s
> uretprobe-nop : 2.145 ± 0.002M/s
> uretprobe-push : 2.070 ± 0.001M/s
> uretprobe-ret : 0.931 ± 0.001M/s
> uretprobe-nop5 : 0.957 ± 0.001M/s
>
> after the change:
> usermode-count : 152.448 ± 0.244M/s
> syscall-count : 14.321 ± 0.059M/s
> uprobe-nop : 3.148 ± 0.007M/s
> uprobe-push : 2.976 ± 0.004M/s
> uprobe-ret : 1.068 ± 0.003M/s
> --> uprobe-nop5 : 7.038 ± 0.007M/s
> uretprobe-nop : 2.109 ± 0.004M/s
> uretprobe-push : 2.035 ± 0.001M/s
> uretprobe-ret : 0.908 ± 0.001M/s
> uretprobe-nop5 : 3.377 ± 0.009M/s
>
> I see bit more speed up on Intel (above) compared to AMD. The big nop5
> speed up is partly due to emulating nop5 and partly due to optimization.
>
> The key speed up we do this for is the USDT switch from nop to nop5:
> uprobe-nop : 3.148 ± 0.007M/s
> uprobe-nop5 : 7.038 ± 0.007M/s
>
>
> rfc v3 changes:
> - I tried to have just single syscall for both entry and return uprobe,
> but it turned out to be slower than having two separated syscalls,
> probably due to extra save/restore processing we have to do for
> argument reg, I see differences like:
>
> 2 syscalls: uprobe-nop5 : 7.038 ± 0.007M/s
> 1 syscall: uprobe-nop5 : 6.943 ± 0.003M/s
>
> - use instructions (nop5/int3/call) to determine the state of the
> uprobe update in the process
> - removed endbr instruction from uprobe trampoline
> - seccomp changes
>
> pending todo (or follow ups):
> - shadow stack fails for uprobe session setup, will fix it in next version
> - use PROCMAP_QUERY in tests
> - alloc 'struct uprobes_state' for mm_struct only when needed [Andrii]
All the pending TODO stuff seems pretty minor. So is there anything
else holding your patch set back from graduating out of RFC status?
David's uprobe_write_opcode() patch set landed, so you should be ready
to rebase and post a proper v1 now, right?
Performance wins are huge, looking forward to this making it into the
kernel soon!
>
> thanks,
> jirka
>
>
> Cc: Eyal Birger <eyal.birger@gmail.com>
> Cc: kees@kernel.org
> ---
> Jiri Olsa (23):
> uprobes: Rename arch_uretprobe_trampoline function
> uprobes: Make copy_from_page global
> uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
> uprobes: Add uprobe_write function
> uprobes: Add nbytes argument to uprobe_write_opcode
> uprobes: Add orig argument to uprobe_write and uprobe_write_opcode
> uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
> uprobes/x86: Add uprobe syscall to speed up uprobe
> uprobes/x86: Add mapping for optimized uprobe trampolines
> uprobes/x86: Add support to emulate nop5 instruction
> uprobes/x86: Add support to optimize uprobes
> selftests/bpf: Use 5-byte nop for x86 usdt probes
> selftests/bpf: Reorg the uprobe_syscall test function
> selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
> selftests/bpf: Add uprobe/usdt syscall tests
> selftests/bpf: Add hit/attach/detach race optimized uprobe test
> selftests/bpf: Add uprobe syscall sigill signal test
> selftests/bpf: Add optimized usdt variant for basic usdt test
> selftests/bpf: Add uprobe_regs_equal test
> selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
> selftests/bpf: Add 5-byte nop uprobe trigger bench
> seccomp: passthrough uprobe systemcall without filtering
> selftests/seccomp: validate uprobe syscall passes through seccomp
>
> arch/arm/probes/uprobes/core.c | 2 +-
> arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> arch/x86/include/asm/uprobes.h | 7 ++
> arch/x86/kernel/uprobes.c | 540 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/syscalls.h | 2 +
> include/linux/uprobes.h | 19 +++-
> kernel/events/uprobes.c | 141 +++++++++++++++++-------
> kernel/fork.c | 1 +
> kernel/seccomp.c | 32 ++++--
> kernel/sys_ni.c | 1 +
> tools/testing/selftests/bpf/bench.c | 12 +++
> tools/testing/selftests/bpf/benchs/bench_trigger.c | 42 ++++++++
> tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
> tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 453 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
> tools/testing/selftests/bpf/prog_tests/usdt.c | 38 ++++---
> tools/testing/selftests/bpf/progs/uprobe_syscall.c | 4 +-
> tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c | 41 ++++++-
> tools/testing/selftests/bpf/sdt.h | 9 +-
> tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 11 +-
> tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++----
> 20 files changed, 1338 insertions(+), 127 deletions(-)
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH RFCv3 00/23] uprobes: Add support to optimize usdt probes on x86_64
2025-04-04 20:36 ` Andrii Nakryiko
@ 2025-04-07 11:17 ` Jiri Olsa
0 siblings, 0 replies; 37+ messages in thread
From: Jiri Olsa @ 2025-04-07 11:17 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Oleg Nesterov, Peter Zijlstra, Andrii Nakryiko, Eyal Birger, kees,
bpf, linux-kernel, linux-trace-kernel, x86, Song Liu,
Yonghong Song, John Fastabend, Hao Luo, Steven Rostedt,
Masami Hiramatsu, Alan Maguire, David Laight,
Thomas Weißschuh
On Fri, Apr 04, 2025 at 01:36:13PM -0700, Andrii Nakryiko wrote:
> On Thu, Mar 20, 2025 at 4:42 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > hi,
> > this patchset adds support to optimize usdt probes on top of 5-byte
> > nop instruction.
> >
> > The generic approach (optimize all uprobes) is hard due to emulating
> > possible multiple original instructions and its related issues. The
> > usdt case, which stores 5-byte nop seems much easier, so starting
> > with that.
> >
> > The basic idea is to replace breakpoint exception with syscall which
> > is faster on x86_64. For more details please see changelog of patch 8.
> >
> > The run_bench_uprobes.sh benchmark triggers uprobe (on top of different
> > original instructions) in a loop and counts how many of those happened
> > per second (the unit below is million loops).
> >
> > There's big speed up if you consider current usdt implementation
> > (uprobe-nop) compared to proposed usdt (uprobe-nop5):
> >
> > current:
> > usermode-count : 152.604 ± 0.044M/s
> > syscall-count : 13.359 ± 0.042M/s
> > --> uprobe-nop : 3.229 ± 0.002M/s
> > uprobe-push : 3.086 ± 0.004M/s
> > uprobe-ret : 1.114 ± 0.004M/s
> > uprobe-nop5 : 1.121 ± 0.005M/s
> > uretprobe-nop : 2.145 ± 0.002M/s
> > uretprobe-push : 2.070 ± 0.001M/s
> > uretprobe-ret : 0.931 ± 0.001M/s
> > uretprobe-nop5 : 0.957 ± 0.001M/s
> >
> > after the change:
> > usermode-count : 152.448 ± 0.244M/s
> > syscall-count : 14.321 ± 0.059M/s
> > uprobe-nop : 3.148 ± 0.007M/s
> > uprobe-push : 2.976 ± 0.004M/s
> > uprobe-ret : 1.068 ± 0.003M/s
> > --> uprobe-nop5 : 7.038 ± 0.007M/s
> > uretprobe-nop : 2.109 ± 0.004M/s
> > uretprobe-push : 2.035 ± 0.001M/s
> > uretprobe-ret : 0.908 ± 0.001M/s
> > uretprobe-nop5 : 3.377 ± 0.009M/s
> >
> > I see bit more speed up on Intel (above) compared to AMD. The big nop5
> > speed up is partly due to emulating nop5 and partly due to optimization.
> >
> > The key speed up we do this for is the USDT switch from nop to nop5:
> > uprobe-nop : 3.148 ± 0.007M/s
> > uprobe-nop5 : 7.038 ± 0.007M/s
> >
> >
> > rfc v3 changes:
> > - I tried to have just single syscall for both entry and return uprobe,
> > but it turned out to be slower than having two separated syscalls,
> > probably due to extra save/restore processing we have to do for
> > argument reg, I see differences like:
> >
> > 2 syscalls: uprobe-nop5 : 7.038 ± 0.007M/s
> > 1 syscall: uprobe-nop5 : 6.943 ± 0.003M/s
> >
> > - use instructions (nop5/int3/call) to determine the state of the
> > uprobe update in the process
> > - removed endbr instruction from uprobe trampoline
> > - seccomp changes
> >
> > pending todo (or follow ups):
> > - shadow stack fails for uprobe session setup, will fix it in next version
> > - use PROCMAP_QUERY in tests
> > - alloc 'struct uprobes_state' for mm_struct only when needed [Andrii]
>
> All the pending TODO stuff seems pretty minor. So is there anything
> else holding your patch set from graduating out of RFC status?
>
> David's uprobe_write_opcode() patch set landed, so you should be ready
> to rebase and post a proper v1 now, right?
>
> Performance wins are huge, looking forward to this making it into the
> kernel soon!
I just saw a notification that those changes are on the way to the mm
tree. I have the rebase ready and want to post it this week, could be v1 ;-)
jirka
>
> >
> > thanks,
> > jirka
> >
> >
> > Cc: Eyal Birger <eyal.birger@gmail.com>
> > Cc: kees@kernel.org
> > ---
> > Jiri Olsa (23):
> > uprobes: Rename arch_uretprobe_trampoline function
> > uprobes: Make copy_from_page global
> > uprobes: Move ref_ctr_offset update out of uprobe_write_opcode
> > uprobes: Add uprobe_write function
> > uprobes: Add nbytes argument to uprobe_write_opcode
> > uprobes: Add orig argument to uprobe_write and uprobe_write_opcode
> > uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock
> > uprobes/x86: Add uprobe syscall to speed up uprobe
> > uprobes/x86: Add mapping for optimized uprobe trampolines
> > uprobes/x86: Add support to emulate nop5 instruction
> > uprobes/x86: Add support to optimize uprobes
> > selftests/bpf: Use 5-byte nop for x86 usdt probes
> > selftests/bpf: Reorg the uprobe_syscall test function
> > selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi
> > selftests/bpf: Add uprobe/usdt syscall tests
> > selftests/bpf: Add hit/attach/detach race optimized uprobe test
> > selftests/bpf: Add uprobe syscall sigill signal test
> > selftests/bpf: Add optimized usdt variant for basic usdt test
> > selftests/bpf: Add uprobe_regs_equal test
> > selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe
> > selftests/bpf: Add 5-byte nop uprobe trigger bench
> > seccomp: passthrough uprobe systemcall without filtering
> > selftests/seccomp: validate uprobe syscall passes through seccomp
> >
> > arch/arm/probes/uprobes/core.c | 2 +-
> > arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> > arch/x86/include/asm/uprobes.h | 7 ++
> > arch/x86/kernel/uprobes.c | 540 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > include/linux/syscalls.h | 2 +
> > include/linux/uprobes.h | 19 +++-
> > kernel/events/uprobes.c | 141 +++++++++++++++++-------
> > kernel/fork.c | 1 +
> > kernel/seccomp.c | 32 ++++--
> > kernel/sys_ni.c | 1 +
> > tools/testing/selftests/bpf/bench.c | 12 +++
> > tools/testing/selftests/bpf/benchs/bench_trigger.c | 42 ++++++++
> > tools/testing/selftests/bpf/benchs/run_bench_uprobes.sh | 2 +-
> > tools/testing/selftests/bpf/prog_tests/uprobe_syscall.c | 453 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
> > tools/testing/selftests/bpf/prog_tests/usdt.c | 38 ++++---
> > tools/testing/selftests/bpf/progs/uprobe_syscall.c | 4 +-
> > tools/testing/selftests/bpf/progs/uprobe_syscall_executed.c | 41 ++++++-
> > tools/testing/selftests/bpf/sdt.h | 9 +-
> > tools/testing/selftests/bpf/test_kmods/bpf_testmod.c | 11 +-
> > tools/testing/selftests/seccomp/seccomp_bpf.c | 107 ++++++++++++++----
> > 20 files changed, 1338 insertions(+), 127 deletions(-)
^ permalink raw reply [flat|nested] 37+ messages in thread