* [PATCH v4 0/9] uprobes: misc cleanups/simplifications
@ 2024-08-01 13:26 Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
` (9 more replies)
0 siblings, 10 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:26 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
(Andrii, I'll try to look at your new series on Weekend).
Changes:
- added the acks I got from Andrii, Masami, and Jiri
- new 4/9 patch from Jiri, fixes the unrelated bug in bpf_testmod
- adapt tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
to the API changes in 5/9 an 6/9
see the interdiff below.
Oleg.
---
include/linux/uprobes.h | 22 ++--
kernel/events/uprobes.c | 137 +++++++++------------
kernel/trace/bpf_trace.c | 25 ++--
kernel/trace/trace_uprobe.c | 31 ++---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 26 ++--
5 files changed, 103 insertions(+), 138 deletions(-)
-------------------------------------------------------------------------------
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -432,7 +432,7 @@ uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
struct testmod_uprobe {
struct path path;
- loff_t offset;
+ struct uprobe *uprobe;
struct uprobe_consumer consumer;
};
@@ -446,25 +446,25 @@ static int testmod_register_uprobe(loff_t offset)
{
int err = -EBUSY;
- if (uprobe.offset)
+ if (uprobe.uprobe)
return -EBUSY;
mutex_lock(&testmod_uprobe_mutex);
- if (uprobe.offset)
+ if (uprobe.uprobe)
goto out;
err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
if (err)
goto out;
- err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
- offset, 0, &uprobe.consumer);
- if (err)
+ uprobe.uprobe = uprobe_register(d_real_inode(uprobe.path.dentry),
+ offset, 0, &uprobe.consumer);
+ if (IS_ERR(uprobe.uprobe)) {
+ err = PTR_ERR(uprobe.uprobe);
path_put(&uprobe.path);
- else
- uprobe.offset = offset;
-
+ uprobe.uprobe = NULL;
+ }
out:
mutex_unlock(&testmod_uprobe_mutex);
return err;
@@ -474,10 +474,10 @@ static void testmod_unregister_uprobe(void)
{
mutex_lock(&testmod_uprobe_mutex);
- if (uprobe.offset) {
- uprobe_unregister(d_real_inode(uprobe.path.dentry),
- uprobe.offset, &uprobe.consumer);
- uprobe.offset = 0;
+ if (uprobe.uprobe) {
+ uprobe_unregister(uprobe.uprobe, &uprobe.consumer);
+ path_put(&uprobe.path);
+ uprobe.uprobe = NULL;
}
mutex_unlock(&testmod_uprobe_mutex);
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
` (8 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
it under mmap_read_lock() and this is correct.
And it is completely unclear why register_for_each_vma() takes mmap_lock
for writing, add a comment to explain that mmap_write_lock() is needed to
avoid the following race:
- A task T hits the bp installed by uprobe and calls
find_active_uprobe()
- uprobe_unregister() removes this uprobe/bp
- T calls find_uprobe() which returns NULL
- another uprobe_register() installs the bp at the same address
- T calls is_trap_at_addr() which returns true
- T returns to handle_swbp() and gets SIGTRAP.
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 73cc47708679..73dd12b09a7b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
* @vaddr: the virtual address to store the opcode.
* @opcode: opcode to be written at @vaddr.
*
- * Called with mm->mmap_lock held for write.
+ * Called with mm->mmap_lock held for read or write.
* Return 0 (success) or a negative errno.
*/
int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
@@ -1046,7 +1046,13 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
if (err && is_register)
goto free;
-
+ /*
+ * We take mmap_lock for writing to avoid the race with
+ * find_active_uprobe() which takes mmap_lock for reading.
+ * Thus this install_breakpoint() can not make
+ * is_trap_at_addr() true right after find_uprobe()
+ * returns NULL in find_active_uprobe().
+ */
mmap_write_lock(mm);
vma = find_vma(mm, info->vaddr);
if (!vma || !valid_vma(vma, is_register) ||
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
` (7 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
get_user_pages_remote() and the comment above it make no sense.
There is no task_struct passed into get_user_pages_remote() anymore, and
nowadays mm_account_fault() increments the current->min/maj_flt counters
regardless of FAULT_FLAG_REMOTE.
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 73dd12b09a7b..eb71428691bb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2035,13 +2035,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
if (likely(result == 0))
goto out;
- /*
- * The NULL 'tsk' here ensures that any faults that occur here
- * will not be accounted to the task. 'mm' *is* current->mm,
- * but we treat this as a 'remote' access since it is
- * essentially a kernel access to the memory.
- */
- result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
+ result = get_user_pages(vaddr, 1, FOLL_FORCE, &page);
if (result < 0)
return result;
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
` (6 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
From: Andrii Nakryiko <andrii@kernel.org>
Return -ENOMEM instead of NULL, which makes caller's error handling just
a touch simpler.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index eb71428691bb..dfe6306a63b1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
if (!uprobe)
- return NULL;
+ return ERR_PTR(-ENOMEM);
uprobe->inode = inode;
uprobe->offset = offset;
@@ -1167,8 +1167,6 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
retry:
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
- if (!uprobe)
- return -ENOMEM;
if (IS_ERR(uprobe))
return PTR_ERR(uprobe);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (2 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
` (5 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
From: Jiri Olsa <olsajiri@gmail.com>
testmod_unregister_uprobe() forgets to path_put(&uprobe.path).
Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index fd28c1157bd3..72f565af4f82 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -477,6 +477,7 @@ static void testmod_unregister_uprobe(void)
if (uprobe.offset) {
uprobe_unregister(d_real_inode(uprobe.path.dentry),
uprobe.offset, &uprobe.consumer);
+ path_put(&uprobe.path);
uprobe.offset = 0;
}
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 5/9] uprobes: kill uprobe_register_refctr()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (3 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
` (4 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
It doesn't make any sense to have 2 versions of _register(). Note that
trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
to check tu->ref_ctr_offset to decide which one should be used, it could
safely pass ref_ctr_offset == 0 to uprobe_register_refctr().
Add this argument to uprobe_register(), update the callers, and kill
uprobe_register_refctr().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/uprobes.h | 9 ++-----
kernel/events/uprobes.c | 24 +++++--------------
kernel/trace/bpf_trace.c | 8 +++----
kernel/trace/trace_uprobe.c | 7 +-----
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 4 ++--
5 files changed, 15 insertions(+), 37 deletions(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index b503fafb7fb3..440316fbf3c6 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,8 +110,7 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
-extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -152,11 +151,7 @@ static inline void uprobes_init(void)
#define uprobe_get_trap_addr(regs) instruction_pointer(regs)
static inline int
-uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
- return -ENOSYS;
-}
-static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
{
return -ENOSYS;
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index dfe6306a63b1..b7f40bad8abc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1121,25 +1121,26 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
EXPORT_SYMBOL_GPL(uprobe_unregister);
/*
- * __uprobe_register - register a probe
+ * uprobe_register - register a probe
* @inode: the file in which the probe has to be placed.
* @offset: offset from the start of the file.
+ * @ref_ctr_offset: offset of SDT marker / reference counter
* @uc: information on howto handle the probe..
*
- * Apart from the access refcount, __uprobe_register() takes a creation
+ * Apart from the access refcount, uprobe_register() takes a creation
* refcount (thro alloc_uprobe) if and only if this @uprobe is getting
* inserted into the rbtree (i.e first consumer for a @inode:@offset
* tuple). Creation refcount stops uprobe_unregister from freeing the
* @uprobe even before the register operation is complete. Creation
* refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of __uprobe_register() is required to keep @inode
+ * unregisters. Caller of uprobe_register() is required to keep @inode
* (and the containing mount) referenced.
*
* Return errno if it cannot successully install probes
* else return 0 (success)
*/
-static int __uprobe_register(struct inode *inode, loff_t offset,
- loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
+ struct uprobe_consumer *uc)
{
struct uprobe *uprobe;
int ret;
@@ -1189,21 +1190,8 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
goto retry;
return ret;
}
-
-int uprobe_register(struct inode *inode, loff_t offset,
- struct uprobe_consumer *uc)
-{
- return __uprobe_register(inode, offset, 0, uc);
-}
EXPORT_SYMBOL_GPL(uprobe_register);
-int uprobe_register_refctr(struct inode *inode, loff_t offset,
- loff_t ref_ctr_offset, struct uprobe_consumer *uc)
-{
- return __uprobe_register(inode, offset, ref_ctr_offset, uc);
-}
-EXPORT_SYMBOL_GPL(uprobe_register_refctr);
-
/*
* uprobe_apply - unregister an already registered probe.
* @inode: the file in which the probe has to be removed.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cd098846e251..afa909e17824 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3480,10 +3480,10 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
&bpf_uprobe_multi_link_lops, prog);
for (i = 0; i < cnt; i++) {
- err = uprobe_register_refctr(d_real_inode(link->path.dentry),
- uprobes[i].offset,
- uprobes[i].ref_ctr_offset,
- &uprobes[i].consumer);
+ err = uprobe_register(d_real_inode(link->path.dentry),
+ uprobes[i].offset,
+ uprobes[i].ref_ctr_offset,
+ &uprobes[i].consumer);
if (err) {
bpf_uprobe_unregister(&path, uprobes, i);
goto error_free;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c98e3b3386ba..1f590f989c1e 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1089,12 +1089,7 @@ static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
tu->consumer.filter = filter;
tu->inode = d_real_inode(tu->path.dentry);
- if (tu->ref_ctr_offset)
- ret = uprobe_register_refctr(tu->inode, tu->offset,
- tu->ref_ctr_offset, &tu->consumer);
- else
- ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
-
+ ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
if (ret)
tu->inode = NULL;
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 72f565af4f82..55f6905de743 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -458,8 +458,8 @@ static int testmod_register_uprobe(loff_t offset)
if (err)
goto out;
- err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
- offset, 0, &uprobe.consumer);
+ err = uprobe_register(d_real_inode(uprobe.path.dentry),
+ offset, 0, &uprobe.consumer);
if (err)
path_put(&uprobe.path);
else
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe *
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (4 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
` (3 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
This way uprobe_unregister() and uprobe_apply() can use "struct uprobe *"
rather than inode + offset. This simplifies the code and allows to avoid
the unnecessary find_uprobe() + put_uprobe() in these functions.
TODO: uprobe_unregister() still needs get_uprobe/put_uprobe to ensure that
this uprobe can't be freed before up_write(&uprobe->register_rwsem).
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
include/linux/uprobes.h | 15 ++---
kernel/events/uprobes.c | 56 ++++++++-----------
kernel/trace/bpf_trace.c | 25 ++++-----
kernel/trace/trace_uprobe.c | 26 ++++-----
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 25 ++++-----
5 files changed, 67 insertions(+), 80 deletions(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 440316fbf3c6..137ddfc0b2f8 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -16,6 +16,7 @@
#include <linux/types.h>
#include <linux/wait.h>
+struct uprobe;
struct vm_area_struct;
struct mm_struct;
struct inode;
@@ -110,9 +111,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
-extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
+extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc);
extern int uprobe_mmap(struct vm_area_struct *vma);
extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
extern void uprobe_start_dup_mmap(void);
@@ -150,18 +151,18 @@ static inline void uprobes_init(void)
#define uprobe_get_trap_addr(regs) instruction_pointer(regs)
-static inline int
+static inline struct uprobe *
uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
{
- return -ENOSYS;
+ return ERR_PTR(-ENOSYS);
}
static inline int
-uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
+uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
{
return -ENOSYS;
}
static inline void
-uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
{
}
static inline int uprobe_mmap(struct vm_area_struct *vma)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b7f40bad8abc..974474680820 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1099,20 +1099,14 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
delete_uprobe(uprobe);
}
-/*
+/**
* uprobe_unregister - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
+ * @uprobe: uprobe to remove
* @uc: identify which probe if multiple probes are colocated.
*/
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
{
- struct uprobe *uprobe;
-
- uprobe = find_uprobe(inode, offset);
- if (WARN_ON(!uprobe))
- return;
-
+ get_uprobe(uprobe);
down_write(&uprobe->register_rwsem);
__uprobe_unregister(uprobe, uc);
up_write(&uprobe->register_rwsem);
@@ -1120,7 +1114,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
}
EXPORT_SYMBOL_GPL(uprobe_unregister);
-/*
+/**
* uprobe_register - register a probe
* @inode: the file in which the probe has to be placed.
* @offset: offset from the start of the file.
@@ -1136,40 +1130,40 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
* unregisters. Caller of uprobe_register() is required to keep @inode
* (and the containing mount) referenced.
*
- * Return errno if it cannot successully install probes
- * else return 0 (success)
+ * Return: pointer to the new uprobe on success or an ERR_PTR on failure.
*/
-int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
- struct uprobe_consumer *uc)
+struct uprobe *uprobe_register(struct inode *inode,
+ loff_t offset, loff_t ref_ctr_offset,
+ struct uprobe_consumer *uc)
{
struct uprobe *uprobe;
int ret;
/* Uprobe must have at least one set consumer */
if (!uc->handler && !uc->ret_handler)
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
/* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
if (!inode->i_mapping->a_ops->read_folio &&
!shmem_mapping(inode->i_mapping))
- return -EIO;
+ return ERR_PTR(-EIO);
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
/*
* This ensures that copy_from_page(), copy_to_page() and
* __update_ref_ctr() can't cross page boundary.
*/
if (!IS_ALIGNED(offset, UPROBE_SWBP_INSN_SIZE))
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
if (!IS_ALIGNED(ref_ctr_offset, sizeof(short)))
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
retry:
uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
if (IS_ERR(uprobe))
- return PTR_ERR(uprobe);
+ return uprobe;
/*
* We can race with uprobe_unregister()->delete_uprobe().
@@ -1188,35 +1182,29 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
if (unlikely(ret == -EAGAIN))
goto retry;
- return ret;
+
+ return ret ? ERR_PTR(ret) : uprobe;
}
EXPORT_SYMBOL_GPL(uprobe_register);
-/*
- * uprobe_apply - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
+/**
+ * uprobe_apply - add or remove the breakpoints according to @uc->filter
+ * @uprobe: uprobe which "owns" the breakpoint
* @uc: consumer which wants to add more or remove some breakpoints
* @add: add or remove the breakpoints
+ * Return: 0 on success or negative error code.
*/
-int uprobe_apply(struct inode *inode, loff_t offset,
- struct uprobe_consumer *uc, bool add)
+int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
{
- struct uprobe *uprobe;
struct uprobe_consumer *con;
int ret = -ENOENT;
- uprobe = find_uprobe(inode, offset);
- if (WARN_ON(!uprobe))
- return ret;
-
down_write(&uprobe->register_rwsem);
for (con = uprobe->consumers; con && con != uc ; con = con->next)
;
if (con)
ret = register_for_each_vma(uprobe, add ? uc : NULL);
up_write(&uprobe->register_rwsem);
- put_uprobe(uprobe);
return ret;
}
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index afa909e17824..4e391daafa64 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3160,6 +3160,7 @@ struct bpf_uprobe {
loff_t offset;
unsigned long ref_ctr_offset;
u64 cookie;
+ struct uprobe *uprobe;
struct uprobe_consumer consumer;
};
@@ -3178,15 +3179,12 @@ struct bpf_uprobe_multi_run_ctx {
struct bpf_uprobe *uprobe;
};
-static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
- u32 cnt)
+static void bpf_uprobe_unregister(struct bpf_uprobe *uprobes, u32 cnt)
{
u32 i;
- for (i = 0; i < cnt; i++) {
- uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
- &uprobes[i].consumer);
- }
+ for (i = 0; i < cnt; i++)
+ uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
}
static void bpf_uprobe_multi_link_release(struct bpf_link *link)
@@ -3194,7 +3192,7 @@ static void bpf_uprobe_multi_link_release(struct bpf_link *link)
struct bpf_uprobe_multi_link *umulti_link;
umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
- bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
+ bpf_uprobe_unregister(umulti_link->uprobes, umulti_link->cnt);
if (umulti_link->task)
put_task_struct(umulti_link->task);
path_put(&umulti_link->path);
@@ -3480,12 +3478,13 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
&bpf_uprobe_multi_link_lops, prog);
for (i = 0; i < cnt; i++) {
- err = uprobe_register(d_real_inode(link->path.dentry),
- uprobes[i].offset,
- uprobes[i].ref_ctr_offset,
- &uprobes[i].consumer);
- if (err) {
- bpf_uprobe_unregister(&path, uprobes, i);
+ uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
+ uprobes[i].offset,
+ uprobes[i].ref_ctr_offset,
+ &uprobes[i].consumer);
+ if (IS_ERR(uprobes[i].uprobe)) {
+ err = PTR_ERR(uprobes[i].uprobe);
+ bpf_uprobe_unregister(uprobes, i);
goto error_free;
}
}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 1f590f989c1e..52e76a73fa7c 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -58,8 +58,8 @@ struct trace_uprobe {
struct dyn_event devent;
struct uprobe_consumer consumer;
struct path path;
- struct inode *inode;
char *filename;
+ struct uprobe *uprobe;
unsigned long offset;
unsigned long ref_ctr_offset;
unsigned long nhit;
@@ -1084,16 +1084,16 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
{
- int ret;
+ struct inode *inode = d_real_inode(tu->path.dentry);
+ struct uprobe *uprobe;
tu->consumer.filter = filter;
- tu->inode = d_real_inode(tu->path.dentry);
-
- ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
- if (ret)
- tu->inode = NULL;
+ uprobe = uprobe_register(inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
+ if (IS_ERR(uprobe))
+ return PTR_ERR(uprobe);
- return ret;
+ tu->uprobe = uprobe;
+ return 0;
}
static void __probe_event_disable(struct trace_probe *tp)
@@ -1104,11 +1104,11 @@ static void __probe_event_disable(struct trace_probe *tp)
WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));
list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
- if (!tu->inode)
+ if (!tu->uprobe)
continue;
- uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
- tu->inode = NULL;
+ uprobe_unregister(tu->uprobe, &tu->consumer);
+ tu->uprobe = NULL;
}
}
@@ -1305,7 +1305,7 @@ static int uprobe_perf_close(struct trace_event_call *call,
return 0;
list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
- ret = uprobe_apply(tu->inode, tu->offset, &tu->consumer, false);
+ ret = uprobe_apply(tu->uprobe, &tu->consumer, false);
if (ret)
break;
}
@@ -1329,7 +1329,7 @@ static int uprobe_perf_open(struct trace_event_call *call,
return 0;
list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
- err = uprobe_apply(tu->inode, tu->offset, &tu->consumer, true);
+ err = uprobe_apply(tu->uprobe, &tu->consumer, true);
if (err) {
uprobe_perf_close(call, event);
break;
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 55f6905de743..3c0515a27842 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -432,7 +432,7 @@ uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
struct testmod_uprobe {
struct path path;
- loff_t offset;
+ struct uprobe *uprobe;
struct uprobe_consumer consumer;
};
@@ -446,25 +446,25 @@ static int testmod_register_uprobe(loff_t offset)
{
int err = -EBUSY;
- if (uprobe.offset)
+ if (uprobe.uprobe)
return -EBUSY;
mutex_lock(&testmod_uprobe_mutex);
- if (uprobe.offset)
+ if (uprobe.uprobe)
goto out;
err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
if (err)
goto out;
- err = uprobe_register(d_real_inode(uprobe.path.dentry),
- offset, 0, &uprobe.consumer);
- if (err)
+ uprobe.uprobe = uprobe_register(d_real_inode(uprobe.path.dentry),
+ offset, 0, &uprobe.consumer);
+ if (IS_ERR(uprobe.uprobe)) {
+ err = PTR_ERR(uprobe.uprobe);
path_put(&uprobe.path);
- else
- uprobe.offset = offset;
-
+ uprobe.uprobe = NULL;
+ }
out:
mutex_unlock(&testmod_uprobe_mutex);
return err;
@@ -474,11 +474,10 @@ static void testmod_unregister_uprobe(void)
{
mutex_lock(&testmod_uprobe_mutex);
- if (uprobe.offset) {
- uprobe_unregister(d_real_inode(uprobe.path.dentry),
- uprobe.offset, &uprobe.consumer);
+ if (uprobe.uprobe) {
+ uprobe_unregister(uprobe.uprobe, &uprobe.consumer);
path_put(&uprobe.path);
- uprobe.offset = 0;
+ uprobe.uprobe = NULL;
}
mutex_unlock(&testmod_uprobe_mutex);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (5 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
` (2 subsequent siblings)
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
If register_for_each_vma() fails uprobe_register() can safely drop
uprobe->register_rwsem and use uprobe_unregister(). There is no worry
about the races with another register/unregister, consumer_add() was
already called so this case doesn't differ from _unregister() right
after the successful _register().
Yes this means the extra up_write() + down_write(), but this is the
slow and unlikely case anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 974474680820..5ea0aabe8774 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1174,16 +1174,18 @@ struct uprobe *uprobe_register(struct inode *inode,
if (likely(uprobe_is_active(uprobe))) {
consumer_add(uprobe, uc);
ret = register_for_each_vma(uprobe, uc);
- if (ret)
- __uprobe_unregister(uprobe, uc);
}
up_write(&uprobe->register_rwsem);
put_uprobe(uprobe);
- if (unlikely(ret == -EAGAIN))
- goto retry;
+ if (ret) {
+ if (unlikely(ret == -EAGAIN))
+ goto retry;
+ uprobe_unregister(uprobe, uc);
+ return ERR_PTR(ret);
+ }
- return ret ? ERR_PTR(ret) : uprobe;
+ return uprobe;
}
EXPORT_SYMBOL_GPL(uprobe_register);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (6 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
Fold __uprobe_unregister() into its single caller, uprobe_unregister().
A separate patch to simplify the next change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5ea0aabe8774..c06e1a5f1783 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1085,20 +1085,6 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
return err;
}
-static void
-__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
- int err;
-
- if (WARN_ON(!consumer_del(uprobe, uc)))
- return;
-
- err = register_for_each_vma(uprobe, NULL);
- /* TODO : cant unregister? schedule a worker thread */
- if (!uprobe->consumers && !err)
- delete_uprobe(uprobe);
-}
-
/**
* uprobe_unregister - unregister an already registered probe.
* @uprobe: uprobe to remove
@@ -1106,9 +1092,18 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
*/
void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
{
+ int err;
+
get_uprobe(uprobe);
down_write(&uprobe->register_rwsem);
- __uprobe_unregister(uprobe, uc);
+ if (WARN_ON(!consumer_del(uprobe, uc)))
+ err = -ENOENT;
+ else
+ err = register_for_each_vma(uprobe, NULL);
+
+ /* TODO : cant unregister? schedule a worker thread */
+ if (!err && !uprobe->consumers)
+ delete_uprobe(uprobe);
up_write(&uprobe->register_rwsem);
put_uprobe(uprobe);
}
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister()
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (7 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel
Kill the extra get_uprobe() + put_uprobe() in uprobe_unregister() and
move the possibly final put_uprobe() from delete_uprobe() to its only
caller, uprobe_unregister().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
kernel/events/uprobes.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c06e1a5f1783..f88b7ff20587 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -939,7 +939,6 @@ static void delete_uprobe(struct uprobe *uprobe)
rb_erase(&uprobe->rb_node, &uprobes_tree);
write_unlock(&uprobes_treelock);
RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
- put_uprobe(uprobe);
}
struct map_info {
@@ -1094,7 +1093,6 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
{
int err;
- get_uprobe(uprobe);
down_write(&uprobe->register_rwsem);
if (WARN_ON(!consumer_del(uprobe, uc)))
err = -ENOENT;
@@ -1102,10 +1100,16 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
err = register_for_each_vma(uprobe, NULL);
/* TODO : cant unregister? schedule a worker thread */
- if (!err && !uprobe->consumers)
- delete_uprobe(uprobe);
+ if (!err) {
+ if (!uprobe->consumers)
+ delete_uprobe(uprobe);
+ else
+ err = -EBUSY;
+ }
up_write(&uprobe->register_rwsem);
- put_uprobe(uprobe);
+
+ if (!err)
+ put_uprobe(uprobe);
}
EXPORT_SYMBOL_GPL(uprobe_unregister);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
` (8 preceding siblings ...)
2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:36 ` Peter Zijlstra
2024-08-01 18:58 ` Andrii Nakryiko
9 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-01 13:36 UTC (permalink / raw)
To: Oleg Nesterov
Cc: andrii, mhiramat, jolsa, rostedt, linux-kernel,
linux-trace-kernel
On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> (Andrii, I'll try to look at your new series on Weekend).
OK, I dropped all your previous patches and stuffed these in.
They should all be visible in queue/perf/core, and provided the robot
doesn't scream, I'll push them into tip/perf/core soonish.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
@ 2024-08-01 18:58 ` Andrii Nakryiko
2024-08-01 21:13 ` Andrii Nakryiko
0 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2024-08-01 18:58 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
linux-trace-kernel, bpf
+ bpf
On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > (Andrii, I'll try to look at your new series on Weekend).
>
> OK, I dropped all your previous patches and stuffed these in.
>
> They should all be visible in queue/perf/core, and provided the robot
> doesn't scream, I'll push them into tip/perf/core soonish.
Just FYI, it seems like tip/perf/core is currently broken for uprobes
(and by implication also queue/perf/core). Also torvalds/linux/master
master is broken. See what I'm getting when running BPF selftests
dealing with uprobes. Sometimes I only get that WARNING and nothing
else.
I'm bisecting at the moment with bpf/master being a "good" checkpoint,
will let you know once I bisect.
[ 34.343557] ------------[ cut here ]------------
[ 34.343906] WARNING: CPU: 3 PID: 2364 at
kernel/trace/trace_uprobe.c:1109 __probe_event_disable+0x26/0x80
[ 34.344468] Modules linked in:
[ 34.344488] BUG: unable to handle page fault for address: ffffc90001c5bea0
[ 34.345071] #PF: supervisor read access in kernel mode
[ 34.345370] #PF: error_code(0x0000) - not-present page
[ 34.345700] PGD 100000067 P4D 100000067 PUD ffff88810d86cd40
[ 34.346061] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 34.346377] CPU: 3 UID: 0 PID: 2364 Comm: test_progs Tainted: G
OE 6.11.0-rc1-00006-g6763ebdb4983 #115
[ 34.347052] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 34.347392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04
[ 34.348085] RIP: 0010:__wake_up_common+0x3e/0xc0
[ 34.348359] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
[ 34.349440] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
[ 34.349796] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
[ 34.350282] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
[ 34.350768] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
[ 34.351245] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[ 34.351703] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
[ 34.352112] FS: 00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
knlGS:0000000000000000
[ 34.352574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.352911] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
[ 34.353320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.353734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 34.354144] Call Trace:
[ 34.354290] <IRQ>
[ 34.354413] ? __die+0x1f/0x60
[ 34.354601] ? page_fault_oops+0x14c/0x450
[ 34.354848] ? search_bpf_extables+0xa8/0x150
[ 34.355105] ? fixup_exception+0x22/0x2d0
[ 34.355342] ? exc_page_fault+0x207/0x210
[ 34.355579] ? asm_exc_page_fault+0x22/0x30
[ 34.355829] ? __wake_up_common+0x3e/0xc0
[ 34.356065] __wake_up+0x32/0x60
[ 34.356261] ep_poll_callback+0x13e/0x270
[ 34.356502] __wake_up_common+0x7d/0xc0
[ 34.356731] __wake_up+0x32/0x60
[ 34.356961] irq_work_single+0x67/0x90
[ 34.357184] irq_work_run_list+0x26/0x40
[ 34.357442] update_process_times+0x83/0xa0
[ 34.357717] tick_nohz_handler+0x97/0x140
[ 34.357977] ? __pfx_tick_nohz_handler+0x10/0x10
[ 34.358283] __hrtimer_run_queues+0x19a/0x3b0
[ 34.358580] hrtimer_interrupt+0xfe/0x240
[ 34.358864] __sysvec_apic_timer_interrupt+0x87/0x210
[ 34.359189] sysvec_apic_timer_interrupt+0x98/0xc0
[ 34.359487] </IRQ>
[ 34.359614] <TASK>
[ 34.359745] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 34.360042] RIP: 0010:print_modules+0x27/0xd0
[ 34.360301] Code: 90 90 90 0f 1f 44 00 00 53 48 c7 c7 63 22 98 82
48 83 ec 18 e8 da 8d fc ff bf 01 00 00 00 e8 80 d1 f1
[ 34.361372] RSP: 0018:ffffc90001e5bc58 EFLAGS: 00000297
[ 34.361682] RAX: ffffffffa03f0948 RBX: 0000000000000000 RCX: 0000000000000000
[ 34.362091] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 0000000000000001
[ 34.362502] RBP: ffffc90001e5bd28 R08: 00000000fffeffff R09: 0000000000000001
[ 34.362917] R10: 0000000000000000 R11: ffffffff83299920 R12: ffffffff81233536
[ 34.363325] R13: 0000000000000009 R14: 0000000000000455 R15: ffffffff8298f924
[ 34.363740] ? __probe_event_disable+0x26/0x80
[ 34.364009] ? print_modules+0x20/0xd0
[ 34.364230] ? __probe_event_disable+0x26/0x80
[ 34.364488] __warn+0x6f/0x180
[ 34.364683] ? __probe_event_disable+0x26/0x80
[ 34.364945] report_bug+0x18d/0x1c0
[ 34.365156] handle_bug+0x3a/0x70
[ 34.365354] exc_invalid_op+0x13/0x60
[ 34.365571] asm_exc_invalid_op+0x16/0x20
[ 34.365838] RIP: 0010:__probe_event_disable+0x26/0x80
[ 34.366175] Code: 90 90 90 90 55 48 89 fd 53 48 8b 47 10 8b 90 38
01 00 00 85 d2 75 13 48 8b 88 40 01 00 00 48 8d 90 40
[ 34.367340] RSP: 0018:ffffc90001e5bdd0 EFLAGS: 00010287
[ 34.367651] RAX: ffff88810d86cc00 RBX: ffff88810d86cce0 RCX: ffff8881075c8168
[ 34.368067] RDX: ffff88810d86cd40 RSI: 0000000000000003 RDI: ffff88810a5e1db0
[ 34.368482] RBP: ffff88810a5e1db0 R08: 00000000ffffffff R09: 0000000000000000
[ 34.368896] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881075c8250
[ 34.369310] R13: ffff8881075c82f0 R14: ffffc90001e5bb80 R15: ffff8881075c8000
[ 34.369775] trace_uprobe_register+0x1a8/0x300
[ 34.370053] perf_trace_event_unreg.isra.0+0x22/0x80
[ 34.370342] perf_uprobe_destroy+0x3a/0x70
[ 34.370582] _free_event+0x114/0x580
[ 34.370806] perf_event_release_kernel+0x282/0x2c0
[ 34.371123] perf_release+0x11/0x20
[ 34.371354] __fput+0x102/0x2e0
[ 34.371561] task_work_run+0x55/0xa0
[ 34.371798] syscall_exit_to_user_mode+0x1dd/0x1f0
[ 34.372104] do_syscall_64+0x70/0x140
[ 34.372337] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 34.372659] RIP: 0033:0x7fd71c986a94
[ 34.372901] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84
00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 73
[ 34.373987] RSP: 002b:00007ffc09d83118 EFLAGS: 00000202 ORIG_RAX:
0000000000000003
[ 34.374426] RAX: 0000000000000000 RBX: 00007ffc09d835e8 RCX: 00007fd71c986a94
[ 34.374874] RDX: 000000000000000b RSI: 0000000000002401 RDI: 000000000000000c
[ 34.375325] RBP: 00007ffc09d83150 R08: 00000000049d39c7 R09: 0000000000000007
[ 34.375782] R10: 00000000049d71b0 R11: 0000000000000202 R12: 0000000000000000
[ 34.376245] R13: 00007ffc09d83608 R14: 00007fd71cae3000 R15: 000000000103cd90
[ 34.376661] </TASK>
[ 34.376794] Modules linked in: bpf_testmod(OE) aesni_intel(E)
crypto_simd(E) cryptd(E) kvm_intel(E) i2c_piix4(E) i2c_s)
[ 34.377900] CR2: ffffc90001c5bea0
[ 34.378097] ---[ end trace 0000000000000000 ]---
[ 34.378366] RIP: 0010:__wake_up_common+0x3e/0xc0
[ 34.378637] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
[ 34.379703] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
[ 34.380004] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
[ 34.380414] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
[ 34.380827] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
[ 34.381284] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[ 34.381736] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
[ 34.382198] FS: 00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
knlGS:0000000000000000
[ 34.382712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.383082] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
[ 34.383540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.383991] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 34.384407] Kernel panic - not syncing: Fatal exception in interrupt
[ 35.464708] Shutting down cpus with NMI
[ 35.465244] Kernel Offset: disabled
[ 35.465474] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-01 18:58 ` Andrii Nakryiko
@ 2024-08-01 21:13 ` Andrii Nakryiko
2024-08-02 8:27 ` Peter Zijlstra
2024-08-02 9:25 ` Peter Zijlstra
0 siblings, 2 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2024-08-01 21:13 UTC (permalink / raw)
To: Peter Zijlstra, Adrian Hunter
Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
linux-trace-kernel, bpf
On Thu, Aug 1, 2024 at 11:58 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> + bpf
>
> On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > > (Andrii, I'll try to look at your new series on Weekend).
> >
> > OK, I dropped all your previous patches and stuffed these in.
> >
> > They should all be visible in queue/perf/core, and provided the robot
> > doesn't scream, I'll push them into tip/perf/core soonish.
>
> Just FYI, it seems like tip/perf/core is currently broken for uprobes
> (and by implication also queue/perf/core). Also torvalds/linux/master
> master is broken. See what I'm getting when running BPF selftests
> dealing with uprobes. Sometimes I only get that WARNING and nothing
> else.
>
> I'm bisecting at the moment with bpf/master being a "good" checkpoint,
> will let you know once I bisect.
Ok, this bisected to:
675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
Reverting all (applied to tip/perf/core) four patches from that series:
6763ebdb4983 (tip/perf/core) perf/x86/intel: Do not enable large PEBS
for events with aux actions or aux sampling
6a45d8847597 perf/x86/intel/pt: Add support for pause / resume
675ad74989c2 perf/core: Add aux_pause, aux_resume, aux_start_paused
d92792a4b26e perf/x86/intel/pt: Fix sampling synchronization
... makes everything work again. I'll leave it up to you and Adrian to
figure this out.
>
> [ 34.343557] ------------[ cut here ]------------
> [ 34.343906] WARNING: CPU: 3 PID: 2364 at
> kernel/trace/trace_uprobe.c:1109 __probe_event_disable+0x26/0x80
> [ 34.344468] Modules linked in:
> [ 34.344488] BUG: unable to handle page fault for address: ffffc90001c5bea0
> [ 34.345071] #PF: supervisor read access in kernel mode
> [ 34.345370] #PF: error_code(0x0000) - not-present page
> [ 34.345700] PGD 100000067 P4D 100000067 PUD ffff88810d86cd40
> [ 34.346061] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 34.346377] CPU: 3 UID: 0 PID: 2364 Comm: test_progs Tainted: G
> OE 6.11.0-rc1-00006-g6763ebdb4983 #115
> [ 34.347052] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [ 34.347392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04
> [ 34.348085] RIP: 0010:__wake_up_common+0x3e/0xc0
> [ 34.348359] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
> 02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
> [ 34.349440] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
> [ 34.349796] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
> [ 34.350282] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
> [ 34.350768] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
> [ 34.351245] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
> [ 34.351703] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
> [ 34.352112] FS: 00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
> knlGS:0000000000000000
> [ 34.352574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 34.352911] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
> [ 34.353320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 34.353734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 34.354144] Call Trace:
> [ 34.354290] <IRQ>
> [ 34.354413] ? __die+0x1f/0x60
> [ 34.354601] ? page_fault_oops+0x14c/0x450
> [ 34.354848] ? search_bpf_extables+0xa8/0x150
> [ 34.355105] ? fixup_exception+0x22/0x2d0
> [ 34.355342] ? exc_page_fault+0x207/0x210
> [ 34.355579] ? asm_exc_page_fault+0x22/0x30
> [ 34.355829] ? __wake_up_common+0x3e/0xc0
> [ 34.356065] __wake_up+0x32/0x60
> [ 34.356261] ep_poll_callback+0x13e/0x270
> [ 34.356502] __wake_up_common+0x7d/0xc0
> [ 34.356731] __wake_up+0x32/0x60
> [ 34.356961] irq_work_single+0x67/0x90
> [ 34.357184] irq_work_run_list+0x26/0x40
> [ 34.357442] update_process_times+0x83/0xa0
> [ 34.357717] tick_nohz_handler+0x97/0x140
> [ 34.357977] ? __pfx_tick_nohz_handler+0x10/0x10
> [ 34.358283] __hrtimer_run_queues+0x19a/0x3b0
> [ 34.358580] hrtimer_interrupt+0xfe/0x240
> [ 34.358864] __sysvec_apic_timer_interrupt+0x87/0x210
> [ 34.359189] sysvec_apic_timer_interrupt+0x98/0xc0
> [ 34.359487] </IRQ>
> [ 34.359614] <TASK>
> [ 34.359745] asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 34.360042] RIP: 0010:print_modules+0x27/0xd0
> [ 34.360301] Code: 90 90 90 0f 1f 44 00 00 53 48 c7 c7 63 22 98 82
> 48 83 ec 18 e8 da 8d fc ff bf 01 00 00 00 e8 80 d1 f1
> [ 34.361372] RSP: 0018:ffffc90001e5bc58 EFLAGS: 00000297
> [ 34.361682] RAX: ffffffffa03f0948 RBX: 0000000000000000 RCX: 0000000000000000
> [ 34.362091] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 0000000000000001
> [ 34.362502] RBP: ffffc90001e5bd28 R08: 00000000fffeffff R09: 0000000000000001
> [ 34.362917] R10: 0000000000000000 R11: ffffffff83299920 R12: ffffffff81233536
> [ 34.363325] R13: 0000000000000009 R14: 0000000000000455 R15: ffffffff8298f924
> [ 34.363740] ? __probe_event_disable+0x26/0x80
> [ 34.364009] ? print_modules+0x20/0xd0
> [ 34.364230] ? __probe_event_disable+0x26/0x80
> [ 34.364488] __warn+0x6f/0x180
> [ 34.364683] ? __probe_event_disable+0x26/0x80
> [ 34.364945] report_bug+0x18d/0x1c0
> [ 34.365156] handle_bug+0x3a/0x70
> [ 34.365354] exc_invalid_op+0x13/0x60
> [ 34.365571] asm_exc_invalid_op+0x16/0x20
> [ 34.365838] RIP: 0010:__probe_event_disable+0x26/0x80
> [ 34.366175] Code: 90 90 90 90 55 48 89 fd 53 48 8b 47 10 8b 90 38
> 01 00 00 85 d2 75 13 48 8b 88 40 01 00 00 48 8d 90 40
> [ 34.367340] RSP: 0018:ffffc90001e5bdd0 EFLAGS: 00010287
> [ 34.367651] RAX: ffff88810d86cc00 RBX: ffff88810d86cce0 RCX: ffff8881075c8168
> [ 34.368067] RDX: ffff88810d86cd40 RSI: 0000000000000003 RDI: ffff88810a5e1db0
> [ 34.368482] RBP: ffff88810a5e1db0 R08: 00000000ffffffff R09: 0000000000000000
> [ 34.368896] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881075c8250
> [ 34.369310] R13: ffff8881075c82f0 R14: ffffc90001e5bb80 R15: ffff8881075c8000
> [ 34.369775] trace_uprobe_register+0x1a8/0x300
> [ 34.370053] perf_trace_event_unreg.isra.0+0x22/0x80
> [ 34.370342] perf_uprobe_destroy+0x3a/0x70
> [ 34.370582] _free_event+0x114/0x580
> [ 34.370806] perf_event_release_kernel+0x282/0x2c0
> [ 34.371123] perf_release+0x11/0x20
> [ 34.371354] __fput+0x102/0x2e0
> [ 34.371561] task_work_run+0x55/0xa0
> [ 34.371798] syscall_exit_to_user_mode+0x1dd/0x1f0
> [ 34.372104] do_syscall_64+0x70/0x140
> [ 34.372337] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 34.372659] RIP: 0033:0x7fd71c986a94
> [ 34.372901] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84
> 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 73
> [ 34.373987] RSP: 002b:00007ffc09d83118 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000003
> [ 34.374426] RAX: 0000000000000000 RBX: 00007ffc09d835e8 RCX: 00007fd71c986a94
> [ 34.374874] RDX: 000000000000000b RSI: 0000000000002401 RDI: 000000000000000c
> [ 34.375325] RBP: 00007ffc09d83150 R08: 00000000049d39c7 R09: 0000000000000007
> [ 34.375782] R10: 00000000049d71b0 R11: 0000000000000202 R12: 0000000000000000
> [ 34.376245] R13: 00007ffc09d83608 R14: 00007fd71cae3000 R15: 000000000103cd90
> [ 34.376661] </TASK>
> [ 34.376794] Modules linked in: bpf_testmod(OE) aesni_intel(E)
> crypto_simd(E) cryptd(E) kvm_intel(E) i2c_piix4(E) i2c_s)
> [ 34.377900] CR2: ffffc90001c5bea0
> [ 34.378097] ---[ end trace 0000000000000000 ]---
> [ 34.378366] RIP: 0010:__wake_up_common+0x3e/0xc0
> [ 34.378637] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
> 02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
> [ 34.379703] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
> [ 34.380004] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
> [ 34.380414] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
> [ 34.380827] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
> [ 34.381284] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
> [ 34.381736] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
> [ 34.382198] FS: 00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
> knlGS:0000000000000000
> [ 34.382712] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 34.383082] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
> [ 34.383540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 34.383991] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 34.384407] Kernel panic - not syncing: Fatal exception in interrupt
> [ 35.464708] Shutting down cpus with NMI
> [ 35.465244] Kernel Offset: disabled
> [ 35.465474] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-01 21:13 ` Andrii Nakryiko
@ 2024-08-02 8:27 ` Peter Zijlstra
2024-08-02 9:25 ` Peter Zijlstra
1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-02 8:27 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Adrian Hunter, Oleg Nesterov, andrii, mhiramat, jolsa, rostedt,
linux-kernel, linux-trace-kernel, bpf
On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 1, 2024 at 11:58 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > + bpf
> >
> > On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > > > (Andrii, I'll try to look at your new series on Weekend).
> > >
> > > OK, I dropped all your previous patches and stuffed these in.
> > >
> > > They should all be visible in queue/perf/core, and provided the robot
> > > doesn't scream, I'll push them into tip/perf/core soonish.
> >
> > Just FYI, it seems like tip/perf/core is currently broken for uprobes
> > (and by implication also queue/perf/core). Also torvalds/linux/master
> > master is broken. See what I'm getting when running BPF selftests
> > dealing with uprobes. Sometimes I only get that WARNING and nothing
> > else.
> >
> > I'm bisecting at the moment with bpf/master being a "good" checkpoint,
> > will let you know once I bisect.
>
> Ok, this bisected to:
>
> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
>
> Reverting all (applied to tip/perf/core) four patches from that series:
>
> 6763ebdb4983 (tip/perf/core) perf/x86/intel: Do not enable large PEBS
> for events with aux actions or aux sampling
> 6a45d8847597 perf/x86/intel/pt: Add support for pause / resume
> 675ad74989c2 perf/core: Add aux_pause, aux_resume, aux_start_paused
> d92792a4b26e perf/x86/intel/pt: Fix sampling synchronization
>
> ... makes everything work again. I'll leave it up to you and Adrian to
> figure this out.
Thanks for catching this. I'll go have a look.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-01 21:13 ` Andrii Nakryiko
2024-08-02 8:27 ` Peter Zijlstra
@ 2024-08-02 9:25 ` Peter Zijlstra
2024-08-02 11:02 ` Adrian Hunter
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-02 9:25 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Adrian Hunter, Oleg Nesterov, andrii, mhiramat, jolsa, rostedt,
linux-kernel, linux-trace-kernel, bpf
On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
> Ok, this bisected to:
>
> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
Adrian, there are at least two obvious bugs there:
- aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
right, that's the capability where events can output to AUX -- aka.
PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
PT/CoreSight thing.
- it sets aux_paused unconditionally, which is scribbling in the giant
union which is overwriting state set by perf_init_event().
But I think there's more problems, we need to do the aux_action
validation after perf_get_aux_event(), we can't know if having those
bits set makes sense before that. This means the perf_event_alloc() site
is wrong in the first place.
I'm going to drop these patches for now. Please rework.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-02 9:25 ` Peter Zijlstra
@ 2024-08-02 11:02 ` Adrian Hunter
2024-08-02 17:14 ` Adrian Hunter
0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2024-08-02 11:02 UTC (permalink / raw)
To: Peter Zijlstra, Andrii Nakryiko
Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
linux-trace-kernel, bpf
On 2/08/24 12:25, Peter Zijlstra wrote:
> On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
>
>> Ok, this bisected to:
>>
>> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
>
> Adrian, there are at least two obvious bugs there:
>
> - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
> right, that's the capability where events can output to AUX -- aka.
> PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
> PT/CoreSight thing.
>
> - it sets aux_paused unconditionally, which is scribbling in the giant
> union which is overwriting state set by perf_init_event().
>
> But I think there's more problems, we need to do the aux_action
> validation after perf_get_aux_event(), we can't know if having those
> bits set makes sense before that. This means the perf_event_alloc() site
> is wrong in the first place.
>
> I'm going to drop these patches for now. Please rework.
Yes I will do that.
FWIW, I'd expect the reported issue would go away with just:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4cb6e5a5f40..2072aaa4d449 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12151,7 +12151,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
err = -EOPNOTSUPP;
goto err_pmu;
}
- event->hw.aux_paused = event->attr.aux_start_paused;
+ if (event->attr.aux_start_paused)
+ event->hw.aux_paused = 1;
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
2024-08-02 11:02 ` Adrian Hunter
@ 2024-08-02 17:14 ` Adrian Hunter
0 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2024-08-02 17:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
linux-trace-kernel, bpf, Andrii Nakryiko
On 2/08/24 14:02, Adrian Hunter wrote:
> On 2/08/24 12:25, Peter Zijlstra wrote:
>> On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
>>
>>> Ok, this bisected to:
>>>
>>> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
>>
>> Adrian, there are at least two obvious bugs there:
>>
>> - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
>> right, that's the capability where events can output to AUX -- aka.
>> PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
>> PT/CoreSight thing.
Not sure about that.
In perf_event_alloc(), there is:
if (event->attr.aux_output &&
(!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
event->attr.aux_pause || event->attr.aux_resume)) {
err = -EOPNOTSUPP;
goto err_pmu;
}
which is to prevent aux_output with aux_pause or aux_resume.
That is because aux_output (i.e. PEBS-via-PT) has no interrupt
and so does not overflow. (Instead the PEBS record is written
by hardware to the Intel PT trace) No overflow => no (software)
aux_pause/aux_resume, so aux_output with aux_pause/aux_resume
does not make sense.
The PMU capability for aux_pause/aux_resume or aux_start_paused
is PERF_PMU_CAP_AUX_PAUSE. aux_pause/aux_resume are valid for
non-AUX events (member of the group), whereas aux_start_paused
is valid for the AUX event itself (group leader). For
aux_pause/aux_resume the group leader's PMU capability is
checked. For aux_start_paused the event's PMU capability is
checked.
>>
>> - it sets aux_paused unconditionally, which is scribbling in the giant
>> union which is overwriting state set by perf_init_event().
That definitely needs fixing, but the fix is just the diff
from my previous reply:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4cb6e5a5f40..2072aaa4d449 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12151,7 +12151,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
err = -EOPNOTSUPP;
goto err_pmu;
}
- event->hw.aux_paused = event->attr.aux_start_paused;
+ if (event->attr.aux_start_paused)
+ event->hw.aux_paused = 1;
if (cgroup_fd != -1) {
err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);
I tested that with:
# perf probe -x /root/main -a main
Added new event:
probe_main:main (on main in /root/main)
# perf record -e probe_main:main -- ./main
and it made the problem go away.
>>
>> But I think there's more problems, we need to do the aux_action
>> validation after perf_get_aux_event(), we can't know if having those
>> bits set makes sense before that. This means the perf_event_alloc() site
>> is wrong in the first place.
As above, aux_start_paused is used on the AUX event itself, so the
PMU capability is checked in perf_event_alloc:
if (event->attr.aux_start_paused &&
!(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
err = -EOPNOTSUPP;
goto err_pmu;
}
Whereas aux_pause/aux_resume are checked in perf_get_aux_event():
if ((event->attr.aux_pause || event->attr.aux_resume) &&
!(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
return 0;
That all seems OK, so please let me know if there is
something else to change.
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-08-02 17:14 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
2024-08-01 18:58 ` Andrii Nakryiko
2024-08-01 21:13 ` Andrii Nakryiko
2024-08-02 8:27 ` Peter Zijlstra
2024-08-02 9:25 ` Peter Zijlstra
2024-08-02 11:02 ` Adrian Hunter
2024-08-02 17:14 ` Adrian Hunter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).