linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/9] uprobes: misc cleanups/simplifications
@ 2024-08-01 13:26 Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:26 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

(Andrii, I'll try to look at your new series on Weekend).

Changes:

	- added the acks I got from Andrii, Masami, and Jiri
	- new 4/9 patch from Jiri, fixes the unrelated bug in bpf_testmod
	- adapt tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
	  to the API changes in 5/9 an 6/9

see the interdiff below.

Oleg.
---

 include/linux/uprobes.h                            |  22 ++--
 kernel/events/uprobes.c                            | 137 +++++++++------------
 kernel/trace/bpf_trace.c                           |  25 ++--
 kernel/trace/trace_uprobe.c                        |  31 ++---
 .../selftests/bpf/bpf_testmod/bpf_testmod.c        |  26 ++--
 5 files changed, 103 insertions(+), 138 deletions(-)

-------------------------------------------------------------------------------

--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -432,7 +432,7 @@ uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
 
 struct testmod_uprobe {
 	struct path path;
-	loff_t offset;
+	struct uprobe *uprobe;
 	struct uprobe_consumer consumer;
 };
 
@@ -446,25 +446,25 @@ static int testmod_register_uprobe(loff_t offset)
 {
 	int err = -EBUSY;
 
-	if (uprobe.offset)
+	if (uprobe.uprobe)
 		return -EBUSY;
 
 	mutex_lock(&testmod_uprobe_mutex);
 
-	if (uprobe.offset)
+	if (uprobe.uprobe)
 		goto out;
 
 	err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
 	if (err)
 		goto out;
 
-	err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
-				     offset, 0, &uprobe.consumer);
-	if (err)
+	uprobe.uprobe = uprobe_register(d_real_inode(uprobe.path.dentry),
+					offset, 0, &uprobe.consumer);
+	if (IS_ERR(uprobe.uprobe)) {
+		err = PTR_ERR(uprobe.uprobe);
 		path_put(&uprobe.path);
-	else
-		uprobe.offset = offset;
-
+		uprobe.uprobe = NULL;
+	}
 out:
 	mutex_unlock(&testmod_uprobe_mutex);
 	return err;
@@ -474,10 +474,10 @@ static void testmod_unregister_uprobe(void)
 {
 	mutex_lock(&testmod_uprobe_mutex);
 
-	if (uprobe.offset) {
-		uprobe_unregister(d_real_inode(uprobe.path.dentry),
-				  uprobe.offset, &uprobe.consumer);
-		uprobe.offset = 0;
+	if (uprobe.uprobe) {
+		uprobe_unregister(uprobe.uprobe, &uprobe.consumer);
+		path_put(&uprobe.path);
+		uprobe.uprobe = NULL;
 	}
 
 	mutex_unlock(&testmod_uprobe_mutex);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
it under mmap_read_lock() and this is correct.

And it is completely unclear why register_for_each_vma() takes mmap_lock
for writing, add a comment to explain that mmap_write_lock() is needed to
avoid the following race:

	- A task T hits the bp installed by uprobe and calls
	  find_active_uprobe()

	- uprobe_unregister() removes this uprobe/bp

	- T calls find_uprobe() which returns NULL

	- another uprobe_register() installs the bp at the same address

	- T calls is_trap_at_addr() which returns true

	- T returns to handle_swbp() and gets SIGTRAP.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 73cc47708679..73dd12b09a7b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
  * @vaddr: the virtual address to store the opcode.
  * @opcode: opcode to be written at @vaddr.
  *
- * Called with mm->mmap_lock held for write.
+ * Called with mm->mmap_lock held for read or write.
  * Return 0 (success) or a negative errno.
  */
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
@@ -1046,7 +1046,13 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 
 		if (err && is_register)
 			goto free;
-
+		/*
+		 * We take mmap_lock for writing to avoid the race with
+		 * find_active_uprobe() which takes mmap_lock for reading.
+		 * Thus this install_breakpoint() can not make
+		 * is_trap_at_addr() true right after find_uprobe()
+		 * returns NULL in find_active_uprobe().
+		 */
 		mmap_write_lock(mm);
 		vma = find_vma(mm, info->vaddr);
 		if (!vma || !valid_vma(vma, is_register) ||
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

get_user_pages_remote() and the comment above it make no sense.

There is no task_struct passed into get_user_pages_remote() anymore, and
nowadays mm_account_fault() increments the current->min/maj_flt counters
regardless of FAULT_FLAG_REMOTE.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 73dd12b09a7b..eb71428691bb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -2035,13 +2035,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	if (likely(result == 0))
 		goto out;
 
-	/*
-	 * The NULL 'tsk' here ensures that any faults that occur here
-	 * will not be accounted to the task.  'mm' *is* current->mm,
-	 * but we treat this as a 'remote' access since it is
-	 * essentially a kernel access to the memory.
-	 */
-	result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
+	result = get_user_pages(vaddr, 1, FOLL_FORCE, &page);
 	if (result < 0)
 		return result;
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

From: Andrii Nakryiko <andrii@kernel.org>

Return -ENOMEM instead of NULL, which makes caller's error handling just
a touch simpler.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index eb71428691bb..dfe6306a63b1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 
 	uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
 	if (!uprobe)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
@@ -1167,8 +1167,6 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 
  retry:
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
-	if (!uprobe)
-		return -ENOMEM;
 	if (IS_ERR(uprobe))
 		return PTR_ERR(uprobe);
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (2 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

From: Jiri Olsa <olsajiri@gmail.com>

testmod_unregister_uprobe() forgets to path_put(&uprobe.path).

Signed-off-by: Jiri Olsa <olsajiri@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index fd28c1157bd3..72f565af4f82 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -477,6 +477,7 @@ static void testmod_unregister_uprobe(void)
 	if (uprobe.offset) {
 		uprobe_unregister(d_real_inode(uprobe.path.dentry),
 				  uprobe.offset, &uprobe.consumer);
+		path_put(&uprobe.path);
 		uprobe.offset = 0;
 	}
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 5/9] uprobes: kill uprobe_register_refctr()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (3 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

It doesn't make any sense to have 2 versions of _register(). Note that
trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
to check tu->ref_ctr_offset to decide which one should be used, it could
safely pass ref_ctr_offset == 0 to uprobe_register_refctr().

Add this argument to uprobe_register(), update the callers, and kill
uprobe_register_refctr().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h                       |  9 ++-----
 kernel/events/uprobes.c                       | 24 +++++--------------
 kernel/trace/bpf_trace.c                      |  8 +++----
 kernel/trace/trace_uprobe.c                   |  7 +-----
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  4 ++--
 5 files changed, 15 insertions(+), 37 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index b503fafb7fb3..440316fbf3c6 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,8 +110,7 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
-extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -152,11 +151,7 @@ static inline void uprobes_init(void)
 #define uprobe_get_trap_addr(regs)	instruction_pointer(regs)
 
 static inline int
-uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	return -ENOSYS;
-}
-static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index dfe6306a63b1..b7f40bad8abc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1121,25 +1121,26 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
 /*
- * __uprobe_register - register a probe
+ * uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
+ * @ref_ctr_offset: offset of SDT marker / reference counter
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, __uprobe_register() takes a creation
+ * Apart from the access refcount, uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of __uprobe_register() is required to keep @inode
+ * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-static int __uprobe_register(struct inode *inode, loff_t offset,
-			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
+		    struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -1189,21 +1190,8 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 		goto retry;
 	return ret;
 }
-
-int uprobe_register(struct inode *inode, loff_t offset,
-		    struct uprobe_consumer *uc)
-{
-	return __uprobe_register(inode, offset, 0, uc);
-}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
-int uprobe_register_refctr(struct inode *inode, loff_t offset,
-			   loff_t ref_ctr_offset, struct uprobe_consumer *uc)
-{
-	return __uprobe_register(inode, offset, ref_ctr_offset, uc);
-}
-EXPORT_SYMBOL_GPL(uprobe_register_refctr);
-
 /*
  * uprobe_apply - unregister an already registered probe.
  * @inode: the file in which the probe has to be removed.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index cd098846e251..afa909e17824 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3480,10 +3480,10 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 		      &bpf_uprobe_multi_link_lops, prog);
 
 	for (i = 0; i < cnt; i++) {
-		err = uprobe_register_refctr(d_real_inode(link->path.dentry),
-					     uprobes[i].offset,
-					     uprobes[i].ref_ctr_offset,
-					     &uprobes[i].consumer);
+		err = uprobe_register(d_real_inode(link->path.dentry),
+				      uprobes[i].offset,
+				      uprobes[i].ref_ctr_offset,
+				      &uprobes[i].consumer);
 		if (err) {
 			bpf_uprobe_unregister(&path, uprobes, i);
 			goto error_free;
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c98e3b3386ba..1f590f989c1e 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1089,12 +1089,7 @@ static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
 
-	if (tu->ref_ctr_offset)
-		ret = uprobe_register_refctr(tu->inode, tu->offset,
-				tu->ref_ctr_offset, &tu->consumer);
-	else
-		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
-
+	ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
 	if (ret)
 		tu->inode = NULL;
 
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 72f565af4f82..55f6905de743 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -458,8 +458,8 @@ static int testmod_register_uprobe(loff_t offset)
 	if (err)
 		goto out;
 
-	err = uprobe_register_refctr(d_real_inode(uprobe.path.dentry),
-				     offset, 0, &uprobe.consumer);
+	err = uprobe_register(d_real_inode(uprobe.path.dentry),
+			      offset, 0, &uprobe.consumer);
 	if (err)
 		path_put(&uprobe.path);
 	else
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe *
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (4 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

This way uprobe_unregister() and uprobe_apply() can use "struct uprobe *"
rather than inode + offset. This simplifies the code and allows to avoid
the unnecessary find_uprobe() + put_uprobe() in these functions.

TODO: uprobe_unregister() still needs get_uprobe/put_uprobe to ensure that
this uprobe can't be freed before up_write(&uprobe->register_rwsem).

Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/uprobes.h                       | 15 ++---
 kernel/events/uprobes.c                       | 56 ++++++++-----------
 kernel/trace/bpf_trace.c                      | 25 ++++-----
 kernel/trace/trace_uprobe.c                   | 26 ++++-----
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   | 25 ++++-----
 5 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 440316fbf3c6..137ddfc0b2f8 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -16,6 +16,7 @@
 #include <linux/types.h>
 #include <linux/wait.h>
 
+struct uprobe;
 struct vm_area_struct;
 struct mm_struct;
 struct inode;
@@ -110,9 +111,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
-extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
+extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void uprobe_start_dup_mmap(void);
@@ -150,18 +151,18 @@ static inline void uprobes_init(void)
 
 #define uprobe_get_trap_addr(regs)	instruction_pointer(regs)
 
-static inline int
+static inline struct uprobe *
 uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
-	return -ENOSYS;
+	return ERR_PTR(-ENOSYS);
 }
 static inline int
-uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
+uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
 {
 	return -ENOSYS;
 }
 static inline void
-uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 }
 static inline int uprobe_mmap(struct vm_area_struct *vma)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b7f40bad8abc..974474680820 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1099,20 +1099,14 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 		delete_uprobe(uprobe);
 }
 
-/*
+/**
  * uprobe_unregister - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
+ * @uprobe: uprobe to remove
  * @uc: identify which probe if multiple probes are colocated.
  */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
+	get_uprobe(uprobe);
 	down_write(&uprobe->register_rwsem);
 	__uprobe_unregister(uprobe, uc);
 	up_write(&uprobe->register_rwsem);
@@ -1120,7 +1114,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
-/*
+/**
  * uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
@@ -1136,40 +1130,40 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
- * Return errno if it cannot successully install probes
- * else return 0 (success)
+ * Return: pointer to the new uprobe on success or an ERR_PTR on failure.
  */
-int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
-		    struct uprobe_consumer *uc)
+struct uprobe *uprobe_register(struct inode *inode,
+				loff_t offset, loff_t ref_ctr_offset,
+				struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
 
 	/* Uprobe must have at least one set consumer */
 	if (!uc->handler && !uc->ret_handler)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
 	if (!inode->i_mapping->a_ops->read_folio &&
 	    !shmem_mapping(inode->i_mapping))
-		return -EIO;
+		return ERR_PTR(-EIO);
 	/* Racy, just to catch the obvious mistakes */
 	if (offset > i_size_read(inode))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/*
 	 * This ensures that copy_from_page(), copy_to_page() and
 	 * __update_ref_ctr() can't cross page boundary.
 	 */
 	if (!IS_ALIGNED(offset, UPROBE_SWBP_INSN_SIZE))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 	if (!IS_ALIGNED(ref_ctr_offset, sizeof(short)))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
  retry:
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (IS_ERR(uprobe))
-		return PTR_ERR(uprobe);
+		return uprobe;
 
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
@@ -1188,35 +1182,29 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
 
 	if (unlikely(ret == -EAGAIN))
 		goto retry;
-	return ret;
+
+	return ret ? ERR_PTR(ret) : uprobe;
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
-/*
- * uprobe_apply - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
+/**
+ * uprobe_apply - add or remove the breakpoints according to @uc->filter
+ * @uprobe: uprobe which "owns" the breakpoint
  * @uc: consumer which wants to add more or remove some breakpoints
  * @add: add or remove the breakpoints
+ * Return: 0 on success or negative error code.
  */
-int uprobe_apply(struct inode *inode, loff_t offset,
-			struct uprobe_consumer *uc, bool add)
+int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
 {
-	struct uprobe *uprobe;
 	struct uprobe_consumer *con;
 	int ret = -ENOENT;
 
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return ret;
-
 	down_write(&uprobe->register_rwsem);
 	for (con = uprobe->consumers; con && con != uc ; con = con->next)
 		;
 	if (con)
 		ret = register_for_each_vma(uprobe, add ? uc : NULL);
 	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
 
 	return ret;
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index afa909e17824..4e391daafa64 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3160,6 +3160,7 @@ struct bpf_uprobe {
 	loff_t offset;
 	unsigned long ref_ctr_offset;
 	u64 cookie;
+	struct uprobe *uprobe;
 	struct uprobe_consumer consumer;
 };
 
@@ -3178,15 +3179,12 @@ struct bpf_uprobe_multi_run_ctx {
 	struct bpf_uprobe *uprobe;
 };
 
-static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
-				  u32 cnt)
+static void bpf_uprobe_unregister(struct bpf_uprobe *uprobes, u32 cnt)
 {
 	u32 i;
 
-	for (i = 0; i < cnt; i++) {
-		uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
-				  &uprobes[i].consumer);
-	}
+	for (i = 0; i < cnt; i++)
+		uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
 }
 
 static void bpf_uprobe_multi_link_release(struct bpf_link *link)
@@ -3194,7 +3192,7 @@ static void bpf_uprobe_multi_link_release(struct bpf_link *link)
 	struct bpf_uprobe_multi_link *umulti_link;
 
 	umulti_link = container_of(link, struct bpf_uprobe_multi_link, link);
-	bpf_uprobe_unregister(&umulti_link->path, umulti_link->uprobes, umulti_link->cnt);
+	bpf_uprobe_unregister(umulti_link->uprobes, umulti_link->cnt);
 	if (umulti_link->task)
 		put_task_struct(umulti_link->task);
 	path_put(&umulti_link->path);
@@ -3480,12 +3478,13 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 		      &bpf_uprobe_multi_link_lops, prog);
 
 	for (i = 0; i < cnt; i++) {
-		err = uprobe_register(d_real_inode(link->path.dentry),
-				      uprobes[i].offset,
-				      uprobes[i].ref_ctr_offset,
-				      &uprobes[i].consumer);
-		if (err) {
-			bpf_uprobe_unregister(&path, uprobes, i);
+		uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
+						    uprobes[i].offset,
+						    uprobes[i].ref_ctr_offset,
+						    &uprobes[i].consumer);
+		if (IS_ERR(uprobes[i].uprobe)) {
+			err = PTR_ERR(uprobes[i].uprobe);
+			bpf_uprobe_unregister(uprobes, i);
 			goto error_free;
 		}
 	}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 1f590f989c1e..52e76a73fa7c 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -58,8 +58,8 @@ struct trace_uprobe {
 	struct dyn_event		devent;
 	struct uprobe_consumer		consumer;
 	struct path			path;
-	struct inode			*inode;
 	char				*filename;
+	struct uprobe			*uprobe;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
@@ -1084,16 +1084,16 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
 {
-	int ret;
+	struct inode *inode = d_real_inode(tu->path.dentry);
+	struct uprobe *uprobe;
 
 	tu->consumer.filter = filter;
-	tu->inode = d_real_inode(tu->path.dentry);
-
-	ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
-	if (ret)
-		tu->inode = NULL;
+	uprobe = uprobe_register(inode, tu->offset, tu->ref_ctr_offset, &tu->consumer);
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
 
-	return ret;
+	tu->uprobe = uprobe;
+	return 0;
 }
 
 static void __probe_event_disable(struct trace_probe *tp)
@@ -1104,11 +1104,11 @@ static void __probe_event_disable(struct trace_probe *tp)
 	WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		if (!tu->inode)
+		if (!tu->uprobe)
 			continue;
 
-		uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
-		tu->inode = NULL;
+		uprobe_unregister(tu->uprobe, &tu->consumer);
+		tu->uprobe = NULL;
 	}
 }
 
@@ -1305,7 +1305,7 @@ static int uprobe_perf_close(struct trace_event_call *call,
 		return 0;
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		ret = uprobe_apply(tu->inode, tu->offset, &tu->consumer, false);
+		ret = uprobe_apply(tu->uprobe, &tu->consumer, false);
 		if (ret)
 			break;
 	}
@@ -1329,7 +1329,7 @@ static int uprobe_perf_open(struct trace_event_call *call,
 		return 0;
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		err = uprobe_apply(tu->inode, tu->offset, &tu->consumer, true);
+		err = uprobe_apply(tu->uprobe, &tu->consumer, true);
 		if (err) {
 			uprobe_perf_close(call, event);
 			break;
diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
index 55f6905de743..3c0515a27842 100644
--- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
+++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
@@ -432,7 +432,7 @@ uprobe_ret_handler(struct uprobe_consumer *self, unsigned long func,
 
 struct testmod_uprobe {
 	struct path path;
-	loff_t offset;
+	struct uprobe *uprobe;
 	struct uprobe_consumer consumer;
 };
 
@@ -446,25 +446,25 @@ static int testmod_register_uprobe(loff_t offset)
 {
 	int err = -EBUSY;
 
-	if (uprobe.offset)
+	if (uprobe.uprobe)
 		return -EBUSY;
 
 	mutex_lock(&testmod_uprobe_mutex);
 
-	if (uprobe.offset)
+	if (uprobe.uprobe)
 		goto out;
 
 	err = kern_path("/proc/self/exe", LOOKUP_FOLLOW, &uprobe.path);
 	if (err)
 		goto out;
 
-	err = uprobe_register(d_real_inode(uprobe.path.dentry),
-			      offset, 0, &uprobe.consumer);
-	if (err)
+	uprobe.uprobe = uprobe_register(d_real_inode(uprobe.path.dentry),
+					offset, 0, &uprobe.consumer);
+	if (IS_ERR(uprobe.uprobe)) {
+		err = PTR_ERR(uprobe.uprobe);
 		path_put(&uprobe.path);
-	else
-		uprobe.offset = offset;
-
+		uprobe.uprobe = NULL;
+	}
 out:
 	mutex_unlock(&testmod_uprobe_mutex);
 	return err;
@@ -474,11 +474,10 @@ static void testmod_unregister_uprobe(void)
 {
 	mutex_lock(&testmod_uprobe_mutex);
 
-	if (uprobe.offset) {
-		uprobe_unregister(d_real_inode(uprobe.path.dentry),
-				  uprobe.offset, &uprobe.consumer);
+	if (uprobe.uprobe) {
+		uprobe_unregister(uprobe.uprobe, &uprobe.consumer);
 		path_put(&uprobe.path);
-		uprobe.offset = 0;
+		uprobe.uprobe = NULL;
 	}
 
 	mutex_unlock(&testmod_uprobe_mutex);
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (5 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

If register_for_each_vma() fails uprobe_register() can safely drop
uprobe->register_rwsem and use uprobe_unregister(). There is no worry
about the races with another register/unregister, consumer_add() was
already called so this case doesn't differ from _unregister() right
after the successful _register().

Yes this means the extra up_write() + down_write(), but this is the
slow and unlikely case anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 974474680820..5ea0aabe8774 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1174,16 +1174,18 @@ struct uprobe *uprobe_register(struct inode *inode,
 	if (likely(uprobe_is_active(uprobe))) {
 		consumer_add(uprobe, uc);
 		ret = register_for_each_vma(uprobe, uc);
-		if (ret)
-			__uprobe_unregister(uprobe, uc);
 	}
 	up_write(&uprobe->register_rwsem);
 	put_uprobe(uprobe);
 
-	if (unlikely(ret == -EAGAIN))
-		goto retry;
+	if (ret) {
+		if (unlikely(ret == -EAGAIN))
+			goto retry;
+		uprobe_unregister(uprobe, uc);
+		return ERR_PTR(ret);
+	}
 
-	return ret ? ERR_PTR(ret) : uprobe;
+	return uprobe;
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (6 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
  2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

Fold __uprobe_unregister() into its single caller, uprobe_unregister().
A separate patch to simplify the next change.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 25 ++++++++++---------------
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5ea0aabe8774..c06e1a5f1783 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1085,20 +1085,6 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 	return err;
 }
 
-static void
-__uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
-{
-	int err;
-
-	if (WARN_ON(!consumer_del(uprobe, uc)))
-		return;
-
-	err = register_for_each_vma(uprobe, NULL);
-	/* TODO : cant unregister? schedule a worker thread */
-	if (!uprobe->consumers && !err)
-		delete_uprobe(uprobe);
-}
-
 /**
  * uprobe_unregister - unregister an already registered probe.
  * @uprobe: uprobe to remove
@@ -1106,9 +1092,18 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
  */
 void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
+	int err;
+
 	get_uprobe(uprobe);
 	down_write(&uprobe->register_rwsem);
-	__uprobe_unregister(uprobe, uc);
+	if (WARN_ON(!consumer_del(uprobe, uc)))
+		err = -ENOENT;
+	else
+		err = register_for_each_vma(uprobe, NULL);
+
+	/* TODO : cant unregister? schedule a worker thread */
+	if (!err && !uprobe->consumers)
+		delete_uprobe(uprobe);
 	up_write(&uprobe->register_rwsem);
 	put_uprobe(uprobe);
 }
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister()
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (7 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:27 ` Oleg Nesterov
  2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
  9 siblings, 0 replies; 17+ messages in thread
From: Oleg Nesterov @ 2024-08-01 13:27 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: jolsa, rostedt, linux-kernel, linux-trace-kernel

Kill the extra get_uprobe() + put_uprobe() in uprobe_unregister() and
move the possibly final put_uprobe() from delete_uprobe() to its only
caller, uprobe_unregister().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/events/uprobes.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c06e1a5f1783..f88b7ff20587 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -939,7 +939,6 @@ static void delete_uprobe(struct uprobe *uprobe)
 	rb_erase(&uprobe->rb_node, &uprobes_tree);
 	write_unlock(&uprobes_treelock);
 	RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
-	put_uprobe(uprobe);
 }
 
 struct map_info {
@@ -1094,7 +1093,6 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 	int err;
 
-	get_uprobe(uprobe);
 	down_write(&uprobe->register_rwsem);
 	if (WARN_ON(!consumer_del(uprobe, uc)))
 		err = -ENOENT;
@@ -1102,10 +1100,16 @@ void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 		err = register_for_each_vma(uprobe, NULL);
 
 	/* TODO : cant unregister? schedule a worker thread */
-	if (!err && !uprobe->consumers)
-		delete_uprobe(uprobe);
+	if (!err) {
+		if (!uprobe->consumers)
+			delete_uprobe(uprobe);
+		else
+			err = -EBUSY;
+	}
 	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
+
+	if (!err)
+		put_uprobe(uprobe);
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
-- 
2.25.1.362.g51ebf55


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
                   ` (8 preceding siblings ...)
  2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
@ 2024-08-01 13:36 ` Peter Zijlstra
  2024-08-01 18:58   ` Andrii Nakryiko
  9 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-01 13:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, jolsa, rostedt, linux-kernel,
	linux-trace-kernel

On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> (Andrii, I'll try to look at your new series on Weekend).

OK, I dropped all your previous patches and stuffed these in.

They should all be visible in queue/perf/core, and provided the robot
doesn't scream, I'll push them into tip/perf/core soonish.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
@ 2024-08-01 18:58   ` Andrii Nakryiko
  2024-08-01 21:13     ` Andrii Nakryiko
  0 siblings, 1 reply; 17+ messages in thread
From: Andrii Nakryiko @ 2024-08-01 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
	linux-trace-kernel, bpf

+ bpf

On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > (Andrii, I'll try to look at your new series on Weekend).
>
> OK, I dropped all your previous patches and stuffed these in.
>
> They should all be visible in queue/perf/core, and provided the robot
> doesn't scream, I'll push them into tip/perf/core soonish.

Just FYI, it seems like tip/perf/core is currently broken for uprobes
(and by implication also queue/perf/core). Also torvalds/linux/master
master is broken. See what I'm getting when running BPF selftests
dealing with uprobes. Sometimes I only get that WARNING and nothing
else.

I'm bisecting at the moment with bpf/master being a "good" checkpoint,
will let you know once I bisect.

[   34.343557] ------------[ cut here ]------------
[   34.343906] WARNING: CPU: 3 PID: 2364 at
kernel/trace/trace_uprobe.c:1109 __probe_event_disable+0x26/0x80
[   34.344468] Modules linked in:
[   34.344488] BUG: unable to handle page fault for address: ffffc90001c5bea0
[   34.345071] #PF: supervisor read access in kernel mode
[   34.345370] #PF: error_code(0x0000) - not-present page
[   34.345700] PGD 100000067 P4D 100000067 PUD ffff88810d86cd40
[   34.346061] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[   34.346377] CPU: 3 UID: 0 PID: 2364 Comm: test_progs Tainted: G
      OE      6.11.0-rc1-00006-g6763ebdb4983 #115
[   34.347052] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   34.347392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04
[   34.348085] RIP: 0010:__wake_up_common+0x3e/0xc0
[   34.348359] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
[   34.349440] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
[   34.349796] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
[   34.350282] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
[   34.350768] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
[   34.351245] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[   34.351703] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
[   34.352112] FS:  00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
knlGS:0000000000000000
[   34.352574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.352911] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
[   34.353320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   34.353734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   34.354144] Call Trace:
[   34.354290]  <IRQ>
[   34.354413]  ? __die+0x1f/0x60
[   34.354601]  ? page_fault_oops+0x14c/0x450
[   34.354848]  ? search_bpf_extables+0xa8/0x150
[   34.355105]  ? fixup_exception+0x22/0x2d0
[   34.355342]  ? exc_page_fault+0x207/0x210
[   34.355579]  ? asm_exc_page_fault+0x22/0x30
[   34.355829]  ? __wake_up_common+0x3e/0xc0
[   34.356065]  __wake_up+0x32/0x60
[   34.356261]  ep_poll_callback+0x13e/0x270
[   34.356502]  __wake_up_common+0x7d/0xc0
[   34.356731]  __wake_up+0x32/0x60
[   34.356961]  irq_work_single+0x67/0x90
[   34.357184]  irq_work_run_list+0x26/0x40
[   34.357442]  update_process_times+0x83/0xa0
[   34.357717]  tick_nohz_handler+0x97/0x140
[   34.357977]  ? __pfx_tick_nohz_handler+0x10/0x10
[   34.358283]  __hrtimer_run_queues+0x19a/0x3b0
[   34.358580]  hrtimer_interrupt+0xfe/0x240
[   34.358864]  __sysvec_apic_timer_interrupt+0x87/0x210
[   34.359189]  sysvec_apic_timer_interrupt+0x98/0xc0
[   34.359487]  </IRQ>
[   34.359614]  <TASK>
[   34.359745]  asm_sysvec_apic_timer_interrupt+0x16/0x20
[   34.360042] RIP: 0010:print_modules+0x27/0xd0
[   34.360301] Code: 90 90 90 0f 1f 44 00 00 53 48 c7 c7 63 22 98 82
48 83 ec 18 e8 da 8d fc ff bf 01 00 00 00 e8 80 d1 f1
[   34.361372] RSP: 0018:ffffc90001e5bc58 EFLAGS: 00000297
[   34.361682] RAX: ffffffffa03f0948 RBX: 0000000000000000 RCX: 0000000000000000
[   34.362091] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 0000000000000001
[   34.362502] RBP: ffffc90001e5bd28 R08: 00000000fffeffff R09: 0000000000000001
[   34.362917] R10: 0000000000000000 R11: ffffffff83299920 R12: ffffffff81233536
[   34.363325] R13: 0000000000000009 R14: 0000000000000455 R15: ffffffff8298f924
[   34.363740]  ? __probe_event_disable+0x26/0x80
[   34.364009]  ? print_modules+0x20/0xd0
[   34.364230]  ? __probe_event_disable+0x26/0x80
[   34.364488]  __warn+0x6f/0x180
[   34.364683]  ? __probe_event_disable+0x26/0x80
[   34.364945]  report_bug+0x18d/0x1c0
[   34.365156]  handle_bug+0x3a/0x70
[   34.365354]  exc_invalid_op+0x13/0x60
[   34.365571]  asm_exc_invalid_op+0x16/0x20
[   34.365838] RIP: 0010:__probe_event_disable+0x26/0x80
[   34.366175] Code: 90 90 90 90 55 48 89 fd 53 48 8b 47 10 8b 90 38
01 00 00 85 d2 75 13 48 8b 88 40 01 00 00 48 8d 90 40
[   34.367340] RSP: 0018:ffffc90001e5bdd0 EFLAGS: 00010287
[   34.367651] RAX: ffff88810d86cc00 RBX: ffff88810d86cce0 RCX: ffff8881075c8168
[   34.368067] RDX: ffff88810d86cd40 RSI: 0000000000000003 RDI: ffff88810a5e1db0
[   34.368482] RBP: ffff88810a5e1db0 R08: 00000000ffffffff R09: 0000000000000000
[   34.368896] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881075c8250
[   34.369310] R13: ffff8881075c82f0 R14: ffffc90001e5bb80 R15: ffff8881075c8000
[   34.369775]  trace_uprobe_register+0x1a8/0x300
[   34.370053]  perf_trace_event_unreg.isra.0+0x22/0x80
[   34.370342]  perf_uprobe_destroy+0x3a/0x70
[   34.370582]  _free_event+0x114/0x580
[   34.370806]  perf_event_release_kernel+0x282/0x2c0
[   34.371123]  perf_release+0x11/0x20
[   34.371354]  __fput+0x102/0x2e0
[   34.371561]  task_work_run+0x55/0xa0
[   34.371798]  syscall_exit_to_user_mode+0x1dd/0x1f0
[   34.372104]  do_syscall_64+0x70/0x140
[   34.372337]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   34.372659] RIP: 0033:0x7fd71c986a94
[   34.372901] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84
00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 73
[   34.373987] RSP: 002b:00007ffc09d83118 EFLAGS: 00000202 ORIG_RAX:
0000000000000003
[   34.374426] RAX: 0000000000000000 RBX: 00007ffc09d835e8 RCX: 00007fd71c986a94
[   34.374874] RDX: 000000000000000b RSI: 0000000000002401 RDI: 000000000000000c
[   34.375325] RBP: 00007ffc09d83150 R08: 00000000049d39c7 R09: 0000000000000007
[   34.375782] R10: 00000000049d71b0 R11: 0000000000000202 R12: 0000000000000000
[   34.376245] R13: 00007ffc09d83608 R14: 00007fd71cae3000 R15: 000000000103cd90
[   34.376661]  </TASK>
[   34.376794] Modules linked in: bpf_testmod(OE) aesni_intel(E)
crypto_simd(E) cryptd(E) kvm_intel(E) i2c_piix4(E) i2c_s)
[   34.377900] CR2: ffffc90001c5bea0
[   34.378097] ---[ end trace 0000000000000000 ]---
[   34.378366] RIP: 0010:__wake_up_common+0x3e/0xc0
[   34.378637] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
[   34.379703] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
[   34.380004] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
[   34.380414] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
[   34.380827] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
[   34.381284] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
[   34.381736] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
[   34.382198] FS:  00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
knlGS:0000000000000000
[   34.382712] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.383082] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
[   34.383540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   34.383991] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   34.384407] Kernel panic - not syncing: Fatal exception in interrupt
[   35.464708] Shutting down cpus with NMI
[   35.465244] Kernel Offset: disabled
[   35.465474] ---[ end Kernel panic - not syncing: Fatal exception in
interrupt ]---

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-01 18:58   ` Andrii Nakryiko
@ 2024-08-01 21:13     ` Andrii Nakryiko
  2024-08-02  8:27       ` Peter Zijlstra
  2024-08-02  9:25       ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2024-08-01 21:13 UTC (permalink / raw)
  To: Peter Zijlstra, Adrian Hunter
  Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
	linux-trace-kernel, bpf

On Thu, Aug 1, 2024 at 11:58 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> + bpf
>
> On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > > (Andrii, I'll try to look at your new series on Weekend).
> >
> > OK, I dropped all your previous patches and stuffed these in.
> >
> > They should all be visible in queue/perf/core, and provided the robot
> > doesn't scream, I'll push them into tip/perf/core soonish.
>
> Just FYI, it seems like tip/perf/core is currently broken for uprobes
> (and by implication also queue/perf/core). Also torvalds/linux/master
> master is broken. See what I'm getting when running BPF selftests
> dealing with uprobes. Sometimes I only get that WARNING and nothing
> else.
>
> I'm bisecting at the moment with bpf/master being a "good" checkpoint,
> will let you know once I bisect.

Ok, this bisected to:

675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")

Reverting all (applied to tip/perf/core) four patches from that series:

6763ebdb4983 (tip/perf/core) perf/x86/intel: Do not enable large PEBS
for events with aux actions or aux sampling
6a45d8847597 perf/x86/intel/pt: Add support for pause / resume
675ad74989c2 perf/core: Add aux_pause, aux_resume, aux_start_paused
d92792a4b26e perf/x86/intel/pt: Fix sampling synchronization

... makes everything work again. I'll leave it up to you and Adrian to
figure this out.

>
> [   34.343557] ------------[ cut here ]------------
> [   34.343906] WARNING: CPU: 3 PID: 2364 at
> kernel/trace/trace_uprobe.c:1109 __probe_event_disable+0x26/0x80
> [   34.344468] Modules linked in:
> [   34.344488] BUG: unable to handle page fault for address: ffffc90001c5bea0
> [   34.345071] #PF: supervisor read access in kernel mode
> [   34.345370] #PF: error_code(0x0000) - not-present page
> [   34.345700] PGD 100000067 P4D 100000067 PUD ffff88810d86cd40
> [   34.346061] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [   34.346377] CPU: 3 UID: 0 PID: 2364 Comm: test_progs Tainted: G
>       OE      6.11.0-rc1-00006-g6763ebdb4983 #115
> [   34.347052] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [   34.347392] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04
> [   34.348085] RIP: 0010:__wake_up_common+0x3e/0xc0
> [   34.348359] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
> 02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
> [   34.349440] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
> [   34.349796] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
> [   34.350282] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
> [   34.350768] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
> [   34.351245] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
> [   34.351703] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
> [   34.352112] FS:  00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
> knlGS:0000000000000000
> [   34.352574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   34.352911] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
> [   34.353320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   34.353734] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   34.354144] Call Trace:
> [   34.354290]  <IRQ>
> [   34.354413]  ? __die+0x1f/0x60
> [   34.354601]  ? page_fault_oops+0x14c/0x450
> [   34.354848]  ? search_bpf_extables+0xa8/0x150
> [   34.355105]  ? fixup_exception+0x22/0x2d0
> [   34.355342]  ? exc_page_fault+0x207/0x210
> [   34.355579]  ? asm_exc_page_fault+0x22/0x30
> [   34.355829]  ? __wake_up_common+0x3e/0xc0
> [   34.356065]  __wake_up+0x32/0x60
> [   34.356261]  ep_poll_callback+0x13e/0x270
> [   34.356502]  __wake_up_common+0x7d/0xc0
> [   34.356731]  __wake_up+0x32/0x60
> [   34.356961]  irq_work_single+0x67/0x90
> [   34.357184]  irq_work_run_list+0x26/0x40
> [   34.357442]  update_process_times+0x83/0xa0
> [   34.357717]  tick_nohz_handler+0x97/0x140
> [   34.357977]  ? __pfx_tick_nohz_handler+0x10/0x10
> [   34.358283]  __hrtimer_run_queues+0x19a/0x3b0
> [   34.358580]  hrtimer_interrupt+0xfe/0x240
> [   34.358864]  __sysvec_apic_timer_interrupt+0x87/0x210
> [   34.359189]  sysvec_apic_timer_interrupt+0x98/0xc0
> [   34.359487]  </IRQ>
> [   34.359614]  <TASK>
> [   34.359745]  asm_sysvec_apic_timer_interrupt+0x16/0x20
> [   34.360042] RIP: 0010:print_modules+0x27/0xd0
> [   34.360301] Code: 90 90 90 0f 1f 44 00 00 53 48 c7 c7 63 22 98 82
> 48 83 ec 18 e8 da 8d fc ff bf 01 00 00 00 e8 80 d1 f1
> [   34.361372] RSP: 0018:ffffc90001e5bc58 EFLAGS: 00000297
> [   34.361682] RAX: ffffffffa03f0948 RBX: 0000000000000000 RCX: 0000000000000000
> [   34.362091] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 0000000000000001
> [   34.362502] RBP: ffffc90001e5bd28 R08: 00000000fffeffff R09: 0000000000000001
> [   34.362917] R10: 0000000000000000 R11: ffffffff83299920 R12: ffffffff81233536
> [   34.363325] R13: 0000000000000009 R14: 0000000000000455 R15: ffffffff8298f924
> [   34.363740]  ? __probe_event_disable+0x26/0x80
> [   34.364009]  ? print_modules+0x20/0xd0
> [   34.364230]  ? __probe_event_disable+0x26/0x80
> [   34.364488]  __warn+0x6f/0x180
> [   34.364683]  ? __probe_event_disable+0x26/0x80
> [   34.364945]  report_bug+0x18d/0x1c0
> [   34.365156]  handle_bug+0x3a/0x70
> [   34.365354]  exc_invalid_op+0x13/0x60
> [   34.365571]  asm_exc_invalid_op+0x16/0x20
> [   34.365838] RIP: 0010:__probe_event_disable+0x26/0x80
> [   34.366175] Code: 90 90 90 90 55 48 89 fd 53 48 8b 47 10 8b 90 38
> 01 00 00 85 d2 75 13 48 8b 88 40 01 00 00 48 8d 90 40
> [   34.367340] RSP: 0018:ffffc90001e5bdd0 EFLAGS: 00010287
> [   34.367651] RAX: ffff88810d86cc00 RBX: ffff88810d86cce0 RCX: ffff8881075c8168
> [   34.368067] RDX: ffff88810d86cd40 RSI: 0000000000000003 RDI: ffff88810a5e1db0
> [   34.368482] RBP: ffff88810a5e1db0 R08: 00000000ffffffff R09: 0000000000000000
> [   34.368896] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881075c8250
> [   34.369310] R13: ffff8881075c82f0 R14: ffffc90001e5bb80 R15: ffff8881075c8000
> [   34.369775]  trace_uprobe_register+0x1a8/0x300
> [   34.370053]  perf_trace_event_unreg.isra.0+0x22/0x80
> [   34.370342]  perf_uprobe_destroy+0x3a/0x70
> [   34.370582]  _free_event+0x114/0x580
> [   34.370806]  perf_event_release_kernel+0x282/0x2c0
> [   34.371123]  perf_release+0x11/0x20
> [   34.371354]  __fput+0x102/0x2e0
> [   34.371561]  task_work_run+0x55/0xa0
> [   34.371798]  syscall_exit_to_user_mode+0x1dd/0x1f0
> [   34.372104]  do_syscall_64+0x70/0x140
> [   34.372337]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   34.372659] RIP: 0033:0x7fd71c986a94
> [   34.372901] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84
> 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 18 0e 00 00 73
> [   34.373987] RSP: 002b:00007ffc09d83118 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000003
> [   34.374426] RAX: 0000000000000000 RBX: 00007ffc09d835e8 RCX: 00007fd71c986a94
> [   34.374874] RDX: 000000000000000b RSI: 0000000000002401 RDI: 000000000000000c
> [   34.375325] RBP: 00007ffc09d83150 R08: 00000000049d39c7 R09: 0000000000000007
> [   34.375782] R10: 00000000049d71b0 R11: 0000000000000202 R12: 0000000000000000
> [   34.376245] R13: 00007ffc09d83608 R14: 00007fd71cae3000 R15: 000000000103cd90
> [   34.376661]  </TASK>
> [   34.376794] Modules linked in: bpf_testmod(OE) aesni_intel(E)
> crypto_simd(E) cryptd(E) kvm_intel(E) i2c_piix4(E) i2c_s)
> [   34.377900] CR2: ffffc90001c5bea0
> [   34.378097] ---[ end trace 0000000000000000 ]---
> [   34.378366] RIP: 0010:__wake_up_common+0x3e/0xc0
> [   34.378637] Code: 89 d4 55 53 48 89 fb 48 83 ec 08 8b 05 c3 05 a8
> 02 89 74 24 04 85 c0 75 6d 48 8b 43 48 48 83 c3 48 4f
> [   34.379703] RSP: 0018:ffffc900001d0d90 EFLAGS: 00010087
> [   34.380004] RAX: ffffc90001c5bea0 RBX: ffff88810138e0e8 RCX: 0000000000000001
> [   34.380414] RDX: 0000000080010005 RSI: ffffffff8295f82c RDI: ffffc90001c5be88
> [   34.380827] RBP: ffff88810138e0a0 R08: 0000000000000000 R09: 000000000000438f
> [   34.381284] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000001
> [   34.381736] R13: 0000000000000046 R14: 0000000000000000 R15: 0000000000000000
> [   34.382198] FS:  00007fd71c7b3d00(0000) GS:ffff88881ca00000(0000)
> knlGS:0000000000000000
> [   34.382712] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   34.383082] CR2: ffffc90001c5bea0 CR3: 000000010b456005 CR4: 0000000000370ef0
> [   34.383540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   34.383991] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   34.384407] Kernel panic - not syncing: Fatal exception in interrupt
> [   35.464708] Shutting down cpus with NMI
> [   35.465244] Kernel Offset: disabled
> [   35.465474] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-01 21:13     ` Andrii Nakryiko
@ 2024-08-02  8:27       ` Peter Zijlstra
  2024-08-02  9:25       ` Peter Zijlstra
  1 sibling, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-02  8:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Adrian Hunter, Oleg Nesterov, andrii, mhiramat, jolsa, rostedt,
	linux-kernel, linux-trace-kernel, bpf

On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 1, 2024 at 11:58 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > + bpf
> >
> > On Thu, Aug 1, 2024 at 6:36 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Aug 01, 2024 at 03:26:38PM +0200, Oleg Nesterov wrote:
> > > > (Andrii, I'll try to look at your new series on Weekend).
> > >
> > > OK, I dropped all your previous patches and stuffed these in.
> > >
> > > They should all be visible in queue/perf/core, and provided the robot
> > > doesn't scream, I'll push them into tip/perf/core soonish.
> >
> > Just FYI, it seems like tip/perf/core is currently broken for uprobes
> > (and by implication also queue/perf/core). Also torvalds/linux/master
> > master is broken. See what I'm getting when running BPF selftests
> > dealing with uprobes. Sometimes I only get that WARNING and nothing
> > else.
> >
> > I'm bisecting at the moment with bpf/master being a "good" checkpoint,
> > will let you know once I bisect.
> 
> Ok, this bisected to:
> 
> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
> 
> Reverting all (applied to tip/perf/core) four patches from that series:
> 
> 6763ebdb4983 (tip/perf/core) perf/x86/intel: Do not enable large PEBS
> for events with aux actions or aux sampling
> 6a45d8847597 perf/x86/intel/pt: Add support for pause / resume
> 675ad74989c2 perf/core: Add aux_pause, aux_resume, aux_start_paused
> d92792a4b26e perf/x86/intel/pt: Fix sampling synchronization
> 
> ... makes everything work again. I'll leave it up to you and Adrian to
> figure this out.

Thanks for catching this. I'll go have a look.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-01 21:13     ` Andrii Nakryiko
  2024-08-02  8:27       ` Peter Zijlstra
@ 2024-08-02  9:25       ` Peter Zijlstra
  2024-08-02 11:02         ` Adrian Hunter
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2024-08-02  9:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Adrian Hunter, Oleg Nesterov, andrii, mhiramat, jolsa, rostedt,
	linux-kernel, linux-trace-kernel, bpf

On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:

> Ok, this bisected to:
> 
> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")

Adrian, there are at least two obvious bugs there:

 - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
   right, that's the capability where events can output to AUX -- aka.
   PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
   PT/CoreSight thing.

 - it sets aux_paused unconditionally, which is scribbling in the giant
   union which is overwriting state set by perf_init_event().

But I think there's more problems, we need to do the aux_action
validation after perf_get_aux_event(), we can't know if having those
bits set makes sense before that. This means the perf_event_alloc() site
is wrong in the first place.

I'm going to drop these patches for now. Please rework.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-02  9:25       ` Peter Zijlstra
@ 2024-08-02 11:02         ` Adrian Hunter
  2024-08-02 17:14           ` Adrian Hunter
  0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2024-08-02 11:02 UTC (permalink / raw)
  To: Peter Zijlstra, Andrii Nakryiko
  Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
	linux-trace-kernel, bpf

On 2/08/24 12:25, Peter Zijlstra wrote:
> On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
> 
>> Ok, this bisected to:
>>
>> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
> 
> Adrian, there are at least two obvious bugs there:
> 
>  - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
>    right, that's the capability where events can output to AUX -- aka.
>    PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
>    PT/CoreSight thing.
> 
>  - it sets aux_paused unconditionally, which is scribbling in the giant
>    union which is overwriting state set by perf_init_event().
> 
> But I think there's more problems, we need to do the aux_action
> validation after perf_get_aux_event(), we can't know if having those
> bits set makes sense before that. This means the perf_event_alloc() site
> is wrong in the first place.
> 
> I'm going to drop these patches for now. Please rework.

Yes I will do that.

FWIW, I'd expect the reported issue would go away with just:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4cb6e5a5f40..2072aaa4d449 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12151,7 +12151,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		err = -EOPNOTSUPP;
 		goto err_pmu;
 	}
-	event->hw.aux_paused = event->attr.aux_start_paused;
+	if (event->attr.aux_start_paused)
+		event->hw.aux_paused = 1;
 
 	if (cgroup_fd != -1) {
 		err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 0/9] uprobes: misc cleanups/simplifications
  2024-08-02 11:02         ` Adrian Hunter
@ 2024-08-02 17:14           ` Adrian Hunter
  0 siblings, 0 replies; 17+ messages in thread
From: Adrian Hunter @ 2024-08-02 17:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Oleg Nesterov, andrii, mhiramat, jolsa, rostedt, linux-kernel,
	linux-trace-kernel, bpf, Andrii Nakryiko

On 2/08/24 14:02, Adrian Hunter wrote:
> On 2/08/24 12:25, Peter Zijlstra wrote:
>> On Thu, Aug 01, 2024 at 02:13:41PM -0700, Andrii Nakryiko wrote:
>>
>>> Ok, this bisected to:
>>>
>>> 675ad74989c2 ("perf/core: Add aux_pause, aux_resume, aux_start_paused")
>>
>> Adrian, there are at least two obvious bugs there:
>>
>>  - aux_action was key's off of PERF_PMU_CAP_AUX_OUTPUT, which is not
>>    right, that's the capability where events can output to AUX -- aka.
>>    PEBS-to-PT. It should be PERF_PMU_CAP_ITRACE, which is the
>>    PT/CoreSight thing.

Not sure about that.

In perf_event_alloc(), there is:

	if (event->attr.aux_output &&
	    (!(pmu->capabilities & PERF_PMU_CAP_AUX_OUTPUT) ||
	     event->attr.aux_pause || event->attr.aux_resume)) {
		err = -EOPNOTSUPP;
		goto err_pmu;
	}

which is to prevent aux_output with aux_pause or aux_resume.
That is because aux_output (i.e. PEBS-via-PT) has no interrupt
and so does not overflow.  (Instead the PEBS record is written
by hardware to the Intel PT trace)  No overflow => no (software)
aux_pause/aux_resume, so aux_output with aux_pause/aux_resume
does not make sense.

The PMU capability for aux_pause/aux_resume or aux_start_paused
is PERF_PMU_CAP_AUX_PAUSE.  aux_pause/aux_resume are valid for
non-AUX events (member of the group), whereas aux_start_paused
is valid for the AUX event itself (group leader).  For 
aux_pause/aux_resume the group leader's PMU capability is
checked.  For aux_start_paused the event's PMU capability is
checked.

>>
>>  - it sets aux_paused unconditionally, which is scribbling in the giant
>>    union which is overwriting state set by perf_init_event().

That definitely needs fixing, but the fix is just the diff
from my previous reply:

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e4cb6e5a5f40..2072aaa4d449 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12151,7 +12151,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		err = -EOPNOTSUPP;
 		goto err_pmu;
 	}
-	event->hw.aux_paused = event->attr.aux_start_paused;
+	if (event->attr.aux_start_paused)
+		event->hw.aux_paused = 1;
 
 	if (cgroup_fd != -1) {
 		err = perf_cgroup_connect(cgroup_fd, event, attr, group_leader);

I tested that with:

  # perf probe -x /root/main -a main
  Added new event:
    probe_main:main      (on main in /root/main)

  # perf record -e probe_main:main -- ./main

and it made the problem go away.

>>
>> But I think there's more problems, we need to do the aux_action
>> validation after perf_get_aux_event(), we can't know if having those
>> bits set makes sense before that. This means the perf_event_alloc() site
>> is wrong in the first place.

As above, aux_start_paused is used on the AUX event itself, so the
PMU capability is checked in perf_event_alloc:

	if (event->attr.aux_start_paused &&
	    !(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE)) {
		err = -EOPNOTSUPP;
		goto err_pmu;
	}

Whereas aux_pause/aux_resume are checked in perf_get_aux_event():

	if ((event->attr.aux_pause || event->attr.aux_resume) &&
	    !(group_leader->pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE))
		return 0;

That all seems OK, so please let me know if there is
something else to change.


^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-08-02 17:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-01 13:26 [PATCH v4 0/9] uprobes: misc cleanups/simplifications Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 1/9] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 2/9] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 3/9] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 4/9] selftests/bpf: fix uprobe.path leak in bpf_testmod Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 5/9] uprobes: kill uprobe_register_refctr() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 6/9] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 7/9] uprobes: change uprobe_register() to use uprobe_unregister() instead of __uprobe_unregister() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 8/9] uprobes: fold __uprobe_unregister() into uprobe_unregister() Oleg Nesterov
2024-08-01 13:27 ` [PATCH v4 9/9] uprobes: shift put_uprobe() from delete_uprobe() to uprobe_unregister() Oleg Nesterov
2024-08-01 13:36 ` [PATCH v4 0/9] uprobes: misc cleanups/simplifications Peter Zijlstra
2024-08-01 18:58   ` Andrii Nakryiko
2024-08-01 21:13     ` Andrii Nakryiko
2024-08-02  8:27       ` Peter Zijlstra
2024-08-02  9:25       ` Peter Zijlstra
2024-08-02 11:02         ` Adrian Hunter
2024-08-02 17:14           ` Adrian Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).