public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] uprobes: document mmap_lock, don't abuse get_user_pages_remote()
@ 2024-07-10 14:00 Oleg Nesterov
  2024-07-10 14:00 ` [PATCH 1/2] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 14:00 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: clm, jolsa, mingo, paulmck, linux-kernel

Peter, these simple cleanups should not conflict with your changes,
but I can resend them later if it causes any inconvenience.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-10 14:00 [PATCH 0/2] uprobes: document mmap_lock, don't abuse get_user_pages_remote() Oleg Nesterov
@ 2024-07-10 14:00 ` Oleg Nesterov
  2024-07-10 14:51   ` Masami Hiramatsu
  2024-07-10 14:01 ` [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
  2 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 14:00 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: clm, jolsa, mingo, paulmck, linux-kernel

The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
it under mmap_read_lock() and this is correct.

And it is completely unclear why register_for_each_vma() takes mmap_lock
for writing, add a comment to explain that mmap_write_lock() is needed to
avoid the following race:

	- A task T hits the bp installed by uprobe and calls
	  find_active_uprobe()

	- uprobe_unregister() removes this uprobe/bp

	- T calls find_uprobe() which returns NULL

	- another uprobe_register() installs the bp at the same address

	- T calls is_trap_at_addr() which returns true

	- T returns to handle_swbp() and gets SIGTRAP.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 2c83ba776fc7..d52b624a50fa 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
  * @vaddr: the virtual address to store the opcode.
  * @opcode: opcode to be written at @vaddr.
  *
- * Called with mm->mmap_lock held for write.
+ * Called with mm->mmap_lock held for read or write.
  * Return 0 (success) or a negative errno.
  */
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
@@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 
 		if (err && is_register)
 			goto free;
-
+		/*
+		 * We take mmap_lock for writing to avoid the race with
+		 * find_active_uprobe(), install_breakpoint() must not
+		 * make is_trap_at_addr() true right after find_uprobe()
+		 * returns NULL.
+		 */
 		mmap_write_lock(mm);
 		vma = find_vma(mm, info->vaddr);
 		if (!vma || !valid_vma(vma, is_register) ||
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote()
  2024-07-10 14:00 [PATCH 0/2] uprobes: document mmap_lock, don't abuse get_user_pages_remote() Oleg Nesterov
  2024-07-10 14:00 ` [PATCH 1/2] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
@ 2024-07-10 14:01 ` Oleg Nesterov
  2024-07-10 15:15   ` Andrii Nakryiko
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
  2 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 14:01 UTC (permalink / raw)
  To: andrii, mhiramat, peterz; +Cc: clm, jolsa, mingo, paulmck, linux-kernel

get_user_pages_remote() and the comment above it make no sense.

There is no task_struct passed into get_user_pages_remote() anymore, and
nowadays mm_account_fault() increments the current->min/maj_flt counters
regardless of FAULT_FLAG_REMOTE.

Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d52b624a50fa..1bdf386485d4 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
  * @vaddr: the virtual address to store the opcode.
  * @opcode: opcode to be written at @vaddr.
  *
- * Called with mm->mmap_lock held for read or write.
+ * Called with mm->mmap_lock held.
  * Return 0 (success) or a negative errno.
  */
 int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
@@ -2024,13 +2024,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	if (likely(result == 0))
 		goto out;
 
-	/*
-	 * The NULL 'tsk' here ensures that any faults that occur here
-	 * will not be accounted to the task.  'mm' *is* current->mm,
-	 * but we treat this as a 'remote' access since it is
-	 * essentially a kernel access to the memory.
-	 */
-	result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
+	result = get_user_pages(vaddr, 1, FOLL_FORCE, &page);
 	if (result < 0)
 		return result;
 
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-10 14:00 ` [PATCH 1/2] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
@ 2024-07-10 14:51   ` Masami Hiramatsu
  2024-07-10 15:10     ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2024-07-10 14:51 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On Wed, 10 Jul 2024 16:00:45 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> it under mmap_read_lock() and this is correct.
> 
> And it is completely unclear why register_for_each_vma() takes mmap_lock
> for writing, add a comment to explain that mmap_write_lock() is needed to
> avoid the following race:
> 
> 	- A task T hits the bp installed by uprobe and calls
> 	  find_active_uprobe()
> 
> 	- uprobe_unregister() removes this uprobe/bp
> 
> 	- T calls find_uprobe() which returns NULL
> 
> 	- another uprobe_register() installs the bp at the same address
> 
> 	- T calls is_trap_at_addr() which returns true
> 
> 	- T returns to handle_swbp() and gets SIGTRAP.
> 
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/events/uprobes.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 2c83ba776fc7..d52b624a50fa 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>   * @vaddr: the virtual address to store the opcode.
>   * @opcode: opcode to be written at @vaddr.
>   *
> - * Called with mm->mmap_lock held for write.
> + * Called with mm->mmap_lock held for read or write.
>   * Return 0 (success) or a negative errno.
>   */
>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>  
>  		if (err && is_register)
>  			goto free;
> -
> +		/*
> +		 * We take mmap_lock for writing to avoid the race with
> +		 * find_active_uprobe(), install_breakpoint() must not
> +		 * make is_trap_at_addr() true right after find_uprobe()
> +		 * returns NULL.

Sorry, I couldn't catch the latter part. What is the relationship of
taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?

You meant that find_active_uprobe() is using find_uprobe() which searchs
uprobe form rbtree? But it seems uprobe is already inserted to the rbtree
in alloc_uprobe() so find_uprobe() will not return NULL here, right?

Thank you,

> +		 */
>  		mmap_write_lock(mm);
>  		vma = find_vma(mm, info->vaddr);
>  		if (!vma || !valid_vma(vma, is_register) ||
> -- 
> 2.25.1.362.g51ebf55
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-10 14:51   ` Masami Hiramatsu
@ 2024-07-10 15:10     ` Oleg Nesterov
  2024-07-11  0:07       ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 15:10 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On 07/10, Masami Hiramatsu wrote:
>
> On Wed, 10 Jul 2024 16:00:45 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> > it under mmap_read_lock() and this is correct.
> >
> > And it is completely unclear why register_for_each_vma() takes mmap_lock
> > for writing, add a comment to explain that mmap_write_lock() is needed to
> > avoid the following race:
> >
> > 	- A task T hits the bp installed by uprobe and calls
> > 	  find_active_uprobe()
> >
> > 	- uprobe_unregister() removes this uprobe/bp
> >
> > 	- T calls find_uprobe() which returns NULL
> >
> > 	- another uprobe_register() installs the bp at the same address
> >
> > 	- T calls is_trap_at_addr() which returns true
> >
> > 	- T returns to handle_swbp() and gets SIGTRAP.

...

> >  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> >
> >  		if (err && is_register)
> >  			goto free;
> > -
> > +		/*
> > +		 * We take mmap_lock for writing to avoid the race with
> > +		 * find_active_uprobe(), install_breakpoint() must not
> > +		 * make is_trap_at_addr() true right after find_uprobe()
> > +		 * returns NULL.
>
> Sorry, I couldn't catch the latter part. What is the relationship of
> taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?

Please the the changelog above, it tries to explain this race with more
details...

> You meant that find_active_uprobe() is using find_uprobe() which searchs
> uprobe form rbtree?

Yes,

> But it seems uprobe is already inserted to the rbtree
> in alloc_uprobe() so find_uprobe() will not return NULL here, right?

uprobe_register() -> alloc_uprobe() can come after
find_active_uprobe() -> find_uprobe() returns NULL.

Now, if uprobe_register() -> register_for_each_vma() used mmap_read_lock(), it
could do install_breakpoint() before find_active_uprobe() calls is_trap_at_addr().

In this case find_active_uprobe() returns with uprobe == NULL and is_swbp == 1,
handle_swbp() treat this case as the "normal" int3 without uprobe and do

	if (!uprobe) {
		if (is_swbp > 0) {
			/* No matching uprobe; signal SIGTRAP. */
			force_sig(SIGTRAP);

Does this answer your question?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote()
  2024-07-10 14:01 ` [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
@ 2024-07-10 15:15   ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 15:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck,
	linux-kernel

On Wed, Jul 10, 2024 at 7:02 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> get_user_pages_remote() and the comment above it make no sense.
>
> There is no task_struct passed into get_user_pages_remote() anymore, and
> nowadays mm_account_fault() increments the current->min/maj_flt counters
> regardless of FAULT_FLAG_REMOTE.
>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/events/uprobes.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>

Makes sense

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index d52b624a50fa..1bdf386485d4 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -453,7 +453,7 @@ static int update_ref_ctr(struct uprobe *uprobe, struct mm_struct *mm,
>   * @vaddr: the virtual address to store the opcode.
>   * @opcode: opcode to be written at @vaddr.
>   *
> - * Called with mm->mmap_lock held for read or write.
> + * Called with mm->mmap_lock held.
>   * Return 0 (success) or a negative errno.
>   */
>  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> @@ -2024,13 +2024,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
>         if (likely(result == 0))
>                 goto out;
>
> -       /*
> -        * The NULL 'tsk' here ensures that any faults that occur here
> -        * will not be accounted to the task.  'mm' *is* current->mm,
> -        * but we treat this as a 'remote' access since it is
> -        * essentially a kernel access to the memory.
> -        */
> -       result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
> +       result = get_user_pages(vaddr, 1, FOLL_FORCE, &page);
>         if (result < 0)
>                 return result;
>
> --
> 2.25.1.362.g51ebf55
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/3] uprobes: future cleanups for review
  2024-07-10 14:00 [PATCH 0/2] uprobes: document mmap_lock, don't abuse get_user_pages_remote() Oleg Nesterov
  2024-07-10 14:00 ` [PATCH 1/2] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
  2024-07-10 14:01 ` [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
@ 2024-07-10 16:30 ` Oleg Nesterov
  2024-07-10 16:30   ` [PATCH 1/3] uprobes: kill uprobe_register_refctr() Oleg Nesterov
                     ` (3 more replies)
  2 siblings, 4 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 16:30 UTC (permalink / raw)
  To: andrii, mhiramat, peterz
  Cc: clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

On 07/10, Oleg Nesterov wrote:
>
> Peter, these simple cleanups should not conflict with your changes,
> but I can resend them later if it causes any inconvenience.

In fact I would like to push 2 more cleanups before the more significant
changes, but they certainly conflict with your ongoing work, albeit only
textually.

Let me send the patches for review anyway, perhaps you can take at least
the 1st one.

3/3 is only compile tested so far. Andrii, can you take a look?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/3] uprobes: kill uprobe_register_refctr()
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
@ 2024-07-10 16:30   ` Oleg Nesterov
  2024-07-10 18:03     ` Andrii Nakryiko
  2024-07-10 16:31   ` [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 16:30 UTC (permalink / raw)
  To: andrii, mhiramat, peterz
  Cc: clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

It doesn't make any sense to have 2 versions of _register(). Note that
trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
to check tu->ref_ctr_offset to decide which one should be used, it could
safely pass ref_ctr_offset == 0 to uprobe_register_refctr().

Add this argument to uprobe_register(), update the callers, and kill
uprobe_register_refctr().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/uprobes.h     |  9 ++-------
 kernel/events/uprobes.c     | 23 +++++------------------
 kernel/trace/bpf_trace.c    |  2 +-
 kernel/trace/trace_uprobe.c |  8 ++------
 4 files changed, 10 insertions(+), 32 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index f46e0ca0169c..aa89a8b67039 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,8 +110,7 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
-extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
 extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
 extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
@@ -149,11 +148,7 @@ static inline void uprobes_init(void)
 #define uprobe_get_trap_addr(regs)	instruction_pointer(regs)
 
 static inline int
-uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
-{
-	return -ENOSYS;
-}
-static inline int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
 	return -ENOSYS;
 }
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1bdf386485d4..4950decebe5c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1120,25 +1120,25 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
 /*
- * __uprobe_register - register a probe
+ * uprobe_register - register a probe
  * @inode: the file in which the probe has to be placed.
  * @offset: offset from the start of the file.
  * @uc: information on howto handle the probe..
  *
- * Apart from the access refcount, __uprobe_register() takes a creation
+ * Apart from the access refcount, uprobe_register() takes a creation
  * refcount (thro alloc_uprobe) if and only if this @uprobe is getting
  * inserted into the rbtree (i.e first consumer for a @inode:@offset
  * tuple).  Creation refcount stops uprobe_unregister from freeing the
  * @uprobe even before the register operation is complete. Creation
  * refcount is released when the last @uc for the @uprobe
- * unregisters. Caller of __uprobe_register() is required to keep @inode
+ * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
  *
  * Return errno if it cannot successully install probes
  * else return 0 (success)
  */
-static int __uprobe_register(struct inode *inode, loff_t offset,
-			     loff_t ref_ctr_offset, struct uprobe_consumer *uc)
+int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
+		    struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
@@ -1190,21 +1190,8 @@ static int __uprobe_register(struct inode *inode, loff_t offset,
 		goto retry;
 	return ret;
 }
-
-int uprobe_register(struct inode *inode, loff_t offset,
-		    struct uprobe_consumer *uc)
-{
-	return __uprobe_register(inode, offset, 0, uc);
-}
 EXPORT_SYMBOL_GPL(uprobe_register);
 
-int uprobe_register_refctr(struct inode *inode, loff_t offset,
-			   loff_t ref_ctr_offset, struct uprobe_consumer *uc)
-{
-	return __uprobe_register(inode, offset, ref_ctr_offset, uc);
-}
-EXPORT_SYMBOL_GPL(uprobe_register_refctr);
-
 /*
  * uprobe_apply - unregister an already registered probe.
  * @inode: the file in which the probe has to be removed.
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d1daeab1bbc1..467f358c8ce7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 		      &bpf_uprobe_multi_link_lops, prog);
 
 	for (i = 0; i < cnt; i++) {
-		err = uprobe_register_refctr(d_real_inode(link->path.dentry),
+		err = uprobe_register(d_real_inode(link->path.dentry),
 					     uprobes[i].offset,
 					     uprobes[i].ref_ctr_offset,
 					     &uprobes[i].consumer);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c98e3b3386ba..78a5c40e885a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1089,12 +1089,8 @@ static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
 	tu->consumer.filter = filter;
 	tu->inode = d_real_inode(tu->path.dentry);
 
-	if (tu->ref_ctr_offset)
-		ret = uprobe_register_refctr(tu->inode, tu->offset,
-				tu->ref_ctr_offset, &tu->consumer);
-	else
-		ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
-
+	ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
+			      &tu->consumer);
 	if (ret)
 		tu->inode = NULL;
 
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe()
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
  2024-07-10 16:30   ` [PATCH 1/3] uprobes: kill uprobe_register_refctr() Oleg Nesterov
@ 2024-07-10 16:31   ` Oleg Nesterov
  2024-07-11 15:18     ` Masami Hiramatsu
  2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
  2024-07-11  8:27   ` [PATCH 0/3] uprobes: future cleanups for review Peter Zijlstra
  3 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 16:31 UTC (permalink / raw)
  To: andrii, mhiramat, peterz
  Cc: clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

From: Andrii Nakryiko <andrii@kernel.org>

Return -ENOMEM instead of NULL, which makes caller's error handling just
a touch simpler.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/events/uprobes.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4950decebe5c..e5b7c6239970 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
 
 	uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
 	if (!uprobe)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	uprobe->inode = inode;
 	uprobe->offset = offset;
@@ -1166,8 +1166,6 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
 
  retry:
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
-	if (!uprobe)
-		return -ENOMEM;
 	if (IS_ERR(uprobe))
 		return PTR_ERR(uprobe);
 
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
  2024-07-10 16:30   ` [PATCH 1/3] uprobes: kill uprobe_register_refctr() Oleg Nesterov
  2024-07-10 16:31   ` [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
@ 2024-07-10 16:31   ` Oleg Nesterov
  2024-07-10 16:48     ` Jiri Olsa
                       ` (2 more replies)
  2024-07-11  8:27   ` [PATCH 0/3] uprobes: future cleanups for review Peter Zijlstra
  3 siblings, 3 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 16:31 UTC (permalink / raw)
  To: andrii, mhiramat, peterz
  Cc: clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
put_uprobe(). And to me this change simplifies the code a bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/uprobes.h     | 14 ++++++------
 kernel/events/uprobes.c     | 45 ++++++++++++-------------------------
 kernel/trace/bpf_trace.c    | 12 +++++-----
 kernel/trace/trace_uprobe.c | 28 +++++++++++------------
 4 files changed, 41 insertions(+), 58 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index aa89a8b67039..399509befcf4 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
 extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
 extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
 extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
-extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
-extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
-extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
+extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
+extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
+extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc);
 extern int uprobe_mmap(struct vm_area_struct *vma);
 extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
 extern void uprobe_start_dup_mmap(void);
@@ -147,18 +147,18 @@ static inline void uprobes_init(void)
 
 #define uprobe_get_trap_addr(regs)	instruction_pointer(regs)
 
-static inline int
+static inline struct uprobe *
 uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
 {
-	return -ENOSYS;
+	return ERR_PTR(-ENOSYS);
 }
 static inline int
-uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
+uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
 {
 	return -ENOSYS;
 }
 static inline void
-uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
 }
 static inline int uprobe_mmap(struct vm_area_struct *vma)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e5b7c6239970..dba41403d7b8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1100,22 +1100,15 @@ __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 
 /*
  * uprobe_unregister - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
+ * @uprobe: uprobe to remove
  * @offset: offset from the start of the file.
  * @uc: identify which probe if multiple probes are colocated.
  */
-void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
+void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
-	struct uprobe *uprobe;
-
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return;
-
 	down_write(&uprobe->register_rwsem);
 	__uprobe_unregister(uprobe, uc);
 	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
 }
 EXPORT_SYMBOL_GPL(uprobe_unregister);
 
@@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
  * refcount is released when the last @uc for the @uprobe
  * unregisters. Caller of uprobe_register() is required to keep @inode
  * (and the containing mount) referenced.
- *
- * Return errno if it cannot successully install probes
- * else return 0 (success)
  */
-int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
-		    struct uprobe_consumer *uc)
+struct uprobe *uprobe_register(struct inode *inode,
+				loff_t offset, loff_t ref_ctr_offset,
+				struct uprobe_consumer *uc)
 {
 	struct uprobe *uprobe;
 	int ret;
 
 	/* Uprobe must have at least one set consumer */
 	if (!uc->handler && !uc->ret_handler)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/* copy_insn() uses read_mapping_page() or shmem_read_mapping_page() */
 	if (!inode->i_mapping->a_ops->read_folio &&
 	    !shmem_mapping(inode->i_mapping))
-		return -EIO;
+		return ERR_PTR(-EIO);
 	/* Racy, just to catch the obvious mistakes */
 	if (offset > i_size_read(inode))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	/*
 	 * This ensures that copy_from_page(), copy_to_page() and
 	 * __update_ref_ctr() can't cross page boundary.
 	 */
 	if (!IS_ALIGNED(offset, UPROBE_SWBP_INSN_SIZE))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 	if (!IS_ALIGNED(ref_ctr_offset, sizeof(short)))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
  retry:
 	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
 	if (IS_ERR(uprobe))
-		return PTR_ERR(uprobe);
+		return uprobe;
 
 	/*
 	 * We can race with uprobe_unregister()->delete_uprobe().
@@ -1186,35 +1177,27 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
 
 	if (unlikely(ret == -EAGAIN))
 		goto retry;
-	return ret;
+
+	return ret ? ERR_PTR(ret) : uprobe;
 }
 EXPORT_SYMBOL_GPL(uprobe_register);
 
 /*
  * uprobe_apply - unregister an already registered probe.
- * @inode: the file in which the probe has to be removed.
- * @offset: offset from the start of the file.
  * @uc: consumer which wants to add more or remove some breakpoints
  * @add: add or remove the breakpoints
  */
-int uprobe_apply(struct inode *inode, loff_t offset,
-			struct uprobe_consumer *uc, bool add)
+int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
 {
-	struct uprobe *uprobe;
 	struct uprobe_consumer *con;
 	int ret = -ENOENT;
 
-	uprobe = find_uprobe(inode, offset);
-	if (WARN_ON(!uprobe))
-		return ret;
-
 	down_write(&uprobe->register_rwsem);
 	for (con = uprobe->consumers; con && con != uc ; con = con->next)
 		;
 	if (con)
 		ret = register_for_each_vma(uprobe, add ? uc : NULL);
 	up_write(&uprobe->register_rwsem);
-	put_uprobe(uprobe);
 
 	return ret;
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 467f358c8ce7..7571811127a2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3157,6 +3157,7 @@ struct bpf_uprobe {
 	loff_t offset;
 	unsigned long ref_ctr_offset;
 	u64 cookie;
+	struct uprobe *uprobe;
 	struct uprobe_consumer consumer;
 };
 
@@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
 {
 	u32 i;
 
-	for (i = 0; i < cnt; i++) {
-		uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
-				  &uprobes[i].consumer);
-	}
+	for (i = 0; i < cnt; i++)
+		uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
 }
 
 static void bpf_uprobe_multi_link_release(struct bpf_link *link)
@@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 		      &bpf_uprobe_multi_link_lops, prog);
 
 	for (i = 0; i < cnt; i++) {
-		err = uprobe_register(d_real_inode(link->path.dentry),
+		uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
 					     uprobes[i].offset,
 					     uprobes[i].ref_ctr_offset,
 					     &uprobes[i].consumer);
-		if (err) {
+		if (IS_ERR(uprobes[i].uprobe)) {
+			err = PTR_ERR(uprobes[i].uprobe);
 			bpf_uprobe_unregister(&path, uprobes, i);
 			goto error_free;
 		}
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 78a5c40e885a..2872955da202 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -58,8 +58,8 @@ struct trace_uprobe {
 	struct dyn_event		devent;
 	struct uprobe_consumer		consumer;
 	struct path			path;
-	struct inode			*inode;
 	char				*filename;
+	struct uprobe			*uprobe;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
 	unsigned long			nhit;
@@ -1084,17 +1084,17 @@ typedef bool (*filter_func_t)(struct uprobe_consumer *self,
 
 static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
 {
-	int ret;
+	struct inode *inode = d_real_inode(tu->path.dentry);
+	struct uprobe *uprobe;
 
 	tu->consumer.filter = filter;
-	tu->inode = d_real_inode(tu->path.dentry);
-
-	ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
-			      &tu->consumer);
-	if (ret)
-		tu->inode = NULL;
+	uprobe = uprobe_register(inode, tu->offset, tu->ref_ctr_offset,
+				&tu->consumer);
+	if (IS_ERR(uprobe))
+		return PTR_ERR(uprobe);
 
-	return ret;
+	tu->uprobe = uprobe;
+	return 0;
 }
 
 static void __probe_event_disable(struct trace_probe *tp)
@@ -1105,11 +1105,11 @@ static void __probe_event_disable(struct trace_probe *tp)
 	WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		if (!tu->inode)
+		if (!tu->uprobe)
 			continue;
 
-		uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
-		tu->inode = NULL;
+		uprobe_unregister(tu->uprobe, &tu->consumer);
+		tu->uprobe = NULL;
 	}
 }
 
@@ -1306,7 +1306,7 @@ static int uprobe_perf_close(struct trace_event_call *call,
 		return 0;
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		ret = uprobe_apply(tu->inode, tu->offset, &tu->consumer, false);
+		ret = uprobe_apply(tu->uprobe, &tu->consumer, false);
 		if (ret)
 			break;
 	}
@@ -1330,7 +1330,7 @@ static int uprobe_perf_open(struct trace_event_call *call,
 		return 0;
 
 	list_for_each_entry(tu, trace_probe_probe_list(tp), tp.list) {
-		err = uprobe_apply(tu->inode, tu->offset, &tu->consumer, true);
+		err = uprobe_apply(tu->uprobe, &tu->consumer, true);
 		if (err) {
 			uprobe_perf_close(call, event);
 			break;
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
@ 2024-07-10 16:48     ` Jiri Olsa
  2024-07-10 18:23       ` Andrii Nakryiko
  2024-07-10 19:20       ` Oleg Nesterov
  2024-07-10 18:21     ` Andrii Nakryiko
  2024-07-11  9:26     ` Oleg Nesterov
  2 siblings, 2 replies; 30+ messages in thread
From: Jiri Olsa @ 2024-07-10 16:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:

SNIP

> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 467f358c8ce7..7571811127a2 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
>  	loff_t offset;
>  	unsigned long ref_ctr_offset;
>  	u64 cookie;
> +	struct uprobe *uprobe;
>  	struct uprobe_consumer consumer;
>  };
>  
> @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
>  {
>  	u32 i;
>  
> -	for (i = 0; i < cnt; i++) {
> -		uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> -				  &uprobes[i].consumer);
> -	}

nice, we could also drop path argument now

jirka

> +	for (i = 0; i < cnt; i++)
> +		uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
>  }
>  
>  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>  		      &bpf_uprobe_multi_link_lops, prog);
>  
>  	for (i = 0; i < cnt; i++) {
> -		err = uprobe_register(d_real_inode(link->path.dentry),
> +		uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
>  					     uprobes[i].offset,
>  					     uprobes[i].ref_ctr_offset,
>  					     &uprobes[i].consumer);
> -		if (err) {
> +		if (IS_ERR(uprobes[i].uprobe)) {
> +			err = PTR_ERR(uprobes[i].uprobe);
>  			bpf_uprobe_unregister(&path, uprobes, i);
>  			goto error_free;
>  		}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()
  2024-07-10 16:30   ` [PATCH 1/3] uprobes: kill uprobe_register_refctr() Oleg Nesterov
@ 2024-07-10 18:03     ` Andrii Nakryiko
  2024-07-10 19:32       ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 18:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 9:32 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> It doesn't make any sense to have 2 versions of _register(). Note that
> trace_uprobe_enable(), the only user of uprobe_register(), doesn't need
> to check tu->ref_ctr_offset to decide which one should be used, it could
> safely pass ref_ctr_offset == 0 to uprobe_register_refctr().
>
> Add this argument to uprobe_register(), update the callers, and kill
> uprobe_register_refctr().
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  include/linux/uprobes.h     |  9 ++-------
>  kernel/events/uprobes.c     | 23 +++++------------------
>  kernel/trace/bpf_trace.c    |  2 +-
>  kernel/trace/trace_uprobe.c |  8 ++------
>  4 files changed, 10 insertions(+), 32 deletions(-)
>

LGTM with few nits below.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  /*
>   * uprobe_apply - unregister an already registered probe.
>   * @inode: the file in which the probe has to be removed.
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d1daeab1bbc1..467f358c8ce7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>                       &bpf_uprobe_multi_link_lops, prog);
>
>         for (i = 0; i < cnt; i++) {
> -               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> +               err = uprobe_register(d_real_inode(link->path.dentry),
>                                              uprobes[i].offset,
>                                              uprobes[i].ref_ctr_offset,
>                                              &uprobes[i].consumer);

please adjust indentation here

> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c98e3b3386ba..78a5c40e885a 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1089,12 +1089,8 @@ static int trace_uprobe_enable(struct trace_uprobe *tu, filter_func_t filter)
>         tu->consumer.filter = filter;
>         tu->inode = d_real_inode(tu->path.dentry);
>
> -       if (tu->ref_ctr_offset)
> -               ret = uprobe_register_refctr(tu->inode, tu->offset,
> -                               tu->ref_ctr_offset, &tu->consumer);
> -       else
> -               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> -
> +       ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
> +                             &tu->consumer);

doesn't fit under 100 characters? If it does, please keep as a single line.

>         if (ret)
>                 tu->inode = NULL;
>
> --
> 2.25.1.362.g51ebf55
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
  2024-07-10 16:48     ` Jiri Olsa
@ 2024-07-10 18:21     ` Andrii Nakryiko
  2024-07-10 20:16       ` Oleg Nesterov
  2024-07-11  9:26     ` Oleg Nesterov
  2 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 18:21 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
> put_uprobe(). And to me this change simplifies the code a bit.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  include/linux/uprobes.h     | 14 ++++++------
>  kernel/events/uprobes.c     | 45 ++++++++++++-------------------------
>  kernel/trace/bpf_trace.c    | 12 +++++-----
>  kernel/trace/trace_uprobe.c | 28 +++++++++++------------
>  4 files changed, 41 insertions(+), 58 deletions(-)
>
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index aa89a8b67039..399509befcf4 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h

I don't see struct uprobe forward-declared in this header, maybe we
should add it?

> @@ -110,9 +110,9 @@ extern bool is_trap_insn(uprobe_opcode_t *insn);
>  extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
>  extern unsigned long uprobe_get_trap_addr(struct pt_regs *regs);
>  extern int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t);
> -extern int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
> -extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
> -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> +extern struct uprobe *uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
> +extern int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool);
> +extern void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
>  extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
>  extern void uprobe_start_dup_mmap(void);
> @@ -147,18 +147,18 @@ static inline void uprobes_init(void)
>
>  #define uprobe_get_trap_addr(regs)     instruction_pointer(regs)
>
> -static inline int
> +static inline struct uprobe *
>  uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc)
>  {
> -       return -ENOSYS;
> +       return ERR_PTR(-ENOSYS);
>  }
>  static inline int
> -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
> +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
>  {
>         return -ENOSYS;
>  }

complete aside, when I was looking at this code I was wondering why we
even need uprobe_apply, it looks like some hacky variant of
uprobe_register and uprobe_unregister. I didn't dig deeper, but think
whether we even need this? If this is just to avoid (for some period)
some consumer callback calling, then that could be handled at the
consumer side by ignoring such calls.

callback call is cheap, it's the int3 handling that's expensive and
with uprobe_apply we are already paying it anyways, so what is this
for?

>  static inline void
> -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
>  }
>  static inline int uprobe_mmap(struct vm_area_struct *vma)

[...]

>
> @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
>   * refcount is released when the last @uc for the @uprobe
>   * unregisters. Caller of uprobe_register() is required to keep @inode
>   * (and the containing mount) referenced.
> - *
> - * Return errno if it cannot successully install probes
> - * else return 0 (success)

mention that it never returns NULL, but rather encodes error code
inside the pointer on error? It's an important part of the contract.

>   */
> -int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
> -                   struct uprobe_consumer *uc)
> +struct uprobe *uprobe_register(struct inode *inode,
> +                               loff_t offset, loff_t ref_ctr_offset,
> +                               struct uprobe_consumer *uc)
>  {

[...]

> @@ -1186,35 +1177,27 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
>
>         if (unlikely(ret == -EAGAIN))
>                 goto retry;
> -       return ret;
> +
> +       return ret ? ERR_PTR(ret) : uprobe;
>  }
>  EXPORT_SYMBOL_GPL(uprobe_register);
>
>  /*

this should be /** for doccomment checking (you'd get a warning for
missing @uprobe if there was this extra star)

>   * uprobe_apply - unregister an already registered probe.
> - * @inode: the file in which the probe has to be removed.
> - * @offset: offset from the start of the file.

add @uprobe description now?

>   * @uc: consumer which wants to add more or remove some breakpoints
>   * @add: add or remove the breakpoints
>   */
> -int uprobe_apply(struct inode *inode, loff_t offset,
> -                       struct uprobe_consumer *uc, bool add)
> +int uprobe_apply(struct uprobe *uprobe, struct uprobe_consumer *uc, bool add)
>  {
> -       struct uprobe *uprobe;
>         struct uprobe_consumer *con;
>         int ret = -ENOENT;
>

[...]

> @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
>  {
>         u32 i;
>
> -       for (i = 0; i < cnt; i++) {
> -               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> -                                 &uprobes[i].consumer);
> -       }
> +       for (i = 0; i < cnt; i++)

you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL
check if you null-out it below)

> +               uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
>  }
>
>  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>                       &bpf_uprobe_multi_link_lops, prog);
>
>         for (i = 0; i < cnt; i++) {
> -               err = uprobe_register(d_real_inode(link->path.dentry),
> +               uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),

will uprobe keep inode alive as long as uprobe is attached? If that's
the case we can get rid of link->path (have it only as a local
variable which we put as soon as we are done with registration). We
can probably do that clean up separately, I'll defer to Jiri.

>                                              uprobes[i].offset,
>                                              uprobes[i].ref_ctr_offset,
>                                              &uprobes[i].consumer);
> -               if (err) {
> +               if (IS_ERR(uprobes[i].uprobe)) {
> +                       err = PTR_ERR(uprobes[i].uprobe);

we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle
only NULL vs non-NULL case

or maybe better yet let's just have local struct uprobe variable and
only assign it if registration succeeded (still need NULL check in
bpf_uprobe_unregister above)

>                         bpf_uprobe_unregister(&path, uprobes, i);
>                         goto error_free;
>                 }

[...]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:48     ` Jiri Olsa
@ 2024-07-10 18:23       ` Andrii Nakryiko
  2024-07-10 19:38         ` Jiri Olsa
  2024-07-10 19:20       ` Oleg Nesterov
  1 sibling, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 18:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, andrii, mhiramat, peterz, clm, mingo, paulmck,
	rostedt, linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
>
> SNIP
>
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 467f358c8ce7..7571811127a2 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> >       loff_t offset;
> >       unsigned long ref_ctr_offset;
> >       u64 cookie;
> > +     struct uprobe *uprobe;
> >       struct uprobe_consumer consumer;
> >  };
> >
> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> >  {
> >       u32 i;
> >
> > -     for (i = 0; i < cnt; i++) {
> > -             uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > -                               &uprobes[i].consumer);
> > -     }
>
> nice, we could also drop path argument now

see my comments to Oleg, I think we can/should get rid of link->path
altogether if uprobe itself keeps inode alive.

BTW, Jiri, do we have any test for multi-uprobe that simulates partial
attachment success/failure (whichever way you want to look at it). It
would be super useful to have to check at least some error handling
code in the uprobe code base. If we don't, do you mind adding
something simple to BPF selftests?

>
> jirka
>
> > +     for (i = 0; i < cnt; i++)
> > +             uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
> >  }
> >
> >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >                     &bpf_uprobe_multi_link_lops, prog);
> >
> >       for (i = 0; i < cnt; i++) {
> > -             err = uprobe_register(d_real_inode(link->path.dentry),
> > +             uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
> >                                            uprobes[i].offset,
> >                                            uprobes[i].ref_ctr_offset,
> >                                            &uprobes[i].consumer);
> > -             if (err) {
> > +             if (IS_ERR(uprobes[i].uprobe)) {
> > +                     err = PTR_ERR(uprobes[i].uprobe);
> >                       bpf_uprobe_unregister(&path, uprobes, i);
> >                       goto error_free;
> >               }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:48     ` Jiri Olsa
  2024-07-10 18:23       ` Andrii Nakryiko
@ 2024-07-10 19:20       ` Oleg Nesterov
  1 sibling, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 19:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: andrii, mhiramat, peterz, clm, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On 07/10, Jiri Olsa wrote:
>
> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> >  {
> >  	u32 i;
> >
> > -	for (i = 0; i < cnt; i++) {
> > -		uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > -				  &uprobes[i].consumer);
> > -	}
>
> nice, we could also drop path argument now

Indeed, I missed that. Will send V2 when/if this makes any sense.

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] uprobes: kill uprobe_register_refctr()
  2024-07-10 18:03     ` Andrii Nakryiko
@ 2024-07-10 19:32       ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 19:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On 07/10, Andrii Nakryiko wrote:
>
> LGTM with few nits below.
>
> Acked-by: Andrii Nakryiko <andrii@kernel.org>

Thanks for looking at this.

> > @@ -3477,7 +3477,7 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >                       &bpf_uprobe_multi_link_lops, prog);
> >
> >         for (i = 0; i < cnt; i++) {
> > -               err = uprobe_register_refctr(d_real_inode(link->path.dentry),
> > +               err = uprobe_register(d_real_inode(link->path.dentry),
> >                                              uprobes[i].offset,
> >                                              uprobes[i].ref_ctr_offset,
> >                                              &uprobes[i].consumer);
>
> please adjust indentation here

OK,

> > -       if (tu->ref_ctr_offset)
> > -               ret = uprobe_register_refctr(tu->inode, tu->offset,
> > -                               tu->ref_ctr_offset, &tu->consumer);
> > -       else
> > -               ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
> > -
> > +       ret = uprobe_register(tu->inode, tu->offset, tu->ref_ctr_offset,
> > +                             &tu->consumer);
>
> doesn't fit under 100 characters? If it does, please keep as a single line.

OK, will do.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 18:23       ` Andrii Nakryiko
@ 2024-07-10 19:38         ` Jiri Olsa
  2024-07-10 19:48           ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Jiri Olsa @ 2024-07-10 19:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Oleg Nesterov, andrii, mhiramat, peterz, clm, mingo,
	paulmck, rostedt, linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
> >
> > SNIP
> >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 467f358c8ce7..7571811127a2 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> > >       loff_t offset;
> > >       unsigned long ref_ctr_offset;
> > >       u64 cookie;
> > > +     struct uprobe *uprobe;
> > >       struct uprobe_consumer consumer;
> > >  };
> > >
> > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> > >  {
> > >       u32 i;
> > >
> > > -     for (i = 0; i < cnt; i++) {
> > > -             uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > > -                               &uprobes[i].consumer);
> > > -     }
> >
> > nice, we could also drop path argument now
> 
> see my comments to Oleg, I think we can/should get rid of link->path
> altogether if uprobe itself keeps inode alive.

yea, I was thinking of that, but then it's kind of useful to have it in
bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from
first uprobe in the link, but ok, still probably worth to remove it ;-)

anyway as you wrote it's ok for follow up cleanup, I'll check on that

> 
> BTW, Jiri, do we have any test for multi-uprobe that simulates partial
> attachment success/failure (whichever way you want to look at it). It
> would be super useful to have to check at least some error handling
> code in the uprobe code base. If we don't, do you mind adding
> something simple to BPF selftests?

there's test_attach_api_fails, but I think all checked fails are before
actually calling uprobe_register function

I think there are few ways to fail the uprobe_register, like install it
on top of int3.. will check add some test for that

jirka

> 
> >
> > jirka
> >
> > > +     for (i = 0; i < cnt; i++)
> > > +             uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
> > >  }
> > >
> > >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > >                     &bpf_uprobe_multi_link_lops, prog);
> > >
> > >       for (i = 0; i < cnt; i++) {
> > > -             err = uprobe_register(d_real_inode(link->path.dentry),
> > > +             uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
> > >                                            uprobes[i].offset,
> > >                                            uprobes[i].ref_ctr_offset,
> > >                                            &uprobes[i].consumer);
> > > -             if (err) {
> > > +             if (IS_ERR(uprobes[i].uprobe)) {
> > > +                     err = PTR_ERR(uprobes[i].uprobe);
> > >                       bpf_uprobe_unregister(&path, uprobes, i);
> > >                       goto error_free;
> > >               }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 19:38         ` Jiri Olsa
@ 2024-07-10 19:48           ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 19:48 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Oleg Nesterov, andrii, mhiramat, peterz, clm, mingo, paulmck,
	rostedt, linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 12:38 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Jul 10, 2024 at 11:23:10AM -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 10, 2024 at 9:49 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Wed, Jul 10, 2024 at 06:31:33PM +0200, Oleg Nesterov wrote:
> > >
> > > SNIP
> > >
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index 467f358c8ce7..7571811127a2 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -3157,6 +3157,7 @@ struct bpf_uprobe {
> > > >       loff_t offset;
> > > >       unsigned long ref_ctr_offset;
> > > >       u64 cookie;
> > > > +     struct uprobe *uprobe;
> > > >       struct uprobe_consumer consumer;
> > > >  };
> > > >
> > > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> > > >  {
> > > >       u32 i;
> > > >
> > > > -     for (i = 0; i < cnt; i++) {
> > > > -             uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > > > -                               &uprobes[i].consumer);
> > > > -     }
> > >
> > > nice, we could also drop path argument now
> >
> > see my comments to Oleg, I think we can/should get rid of link->path
> > altogether if uprobe itself keeps inode alive.
>
> yea, I was thinking of that, but then it's kind of useful to have it in
> bpf_uprobe_multi_link_fill_link_info, otherwise we have to take it from
> first uprobe in the link, but ok, still probably worth to remove it ;-)

if we need it for link_info, probably cleaner to just keep it, no big deal then

>
> anyway as you wrote it's ok for follow up cleanup, I'll check on that
>
> >
> > BTW, Jiri, do we have any test for multi-uprobe that simulates partial
> > attachment success/failure (whichever way you want to look at it). It
> > would be super useful to have to check at least some error handling
> > code in the uprobe code base. If we don't, do you mind adding
> > something simple to BPF selftests?
>
> there's test_attach_api_fails, but I think all checked fails are before
> actually calling uprobe_register function
>
> I think there are few ways to fail the uprobe_register, like install it
> on top of int3.. will check add some test for that
>

great, thank you!

> jirka
>
> >
> > >
> > > jirka
> > >
> > > > +     for (i = 0; i < cnt; i++)
> > > > +             uprobe_unregister(uprobes[i].uprobe, &uprobes[i].consumer);
> > > >  }
> > > >
> > > >  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> > > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > > >                     &bpf_uprobe_multi_link_lops, prog);
> > > >
> > > >       for (i = 0; i < cnt; i++) {
> > > > -             err = uprobe_register(d_real_inode(link->path.dentry),
> > > > +             uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
> > > >                                            uprobes[i].offset,
> > > >                                            uprobes[i].ref_ctr_offset,
> > > >                                            &uprobes[i].consumer);
> > > > -             if (err) {
> > > > +             if (IS_ERR(uprobes[i].uprobe)) {
> > > > +                     err = PTR_ERR(uprobes[i].uprobe);
> > > >                       bpf_uprobe_unregister(&path, uprobes, i);
> > > >                       goto error_free;
> > > >               }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 18:21     ` Andrii Nakryiko
@ 2024-07-10 20:16       ` Oleg Nesterov
  2024-07-10 20:46         ` Andrii Nakryiko
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-10 20:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On 07/10, Andrii Nakryiko wrote:
>
> On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
> > put_uprobe(). And to me this change simplifies the code a bit.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  include/linux/uprobes.h     | 14 ++++++------
> >  kernel/events/uprobes.c     | 45 ++++++++++++-------------------------
> >  kernel/trace/bpf_trace.c    | 12 +++++-----
> >  kernel/trace/trace_uprobe.c | 28 +++++++++++------------
> >  4 files changed, 41 insertions(+), 58 deletions(-)
> >
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index aa89a8b67039..399509befcf4 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
>
> I don't see struct uprobe forward-declared in this header, maybe we
> should add it?

Probably yes, thanks... Although the current code already uses
struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y.
Thanks, will add.

> >  static inline int
> > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
> > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
> >  {
> >         return -ENOSYS;
> >  }
>
> complete aside, when I was looking at this code I was wondering why we
> even need uprobe_apply, it looks like some hacky variant of
> uprobe_register and uprobe_unregister.

All I can say is that

	- I can hardly recall this logic, I'll try to do this tomorrow
	  and write another email

	- in any case this logic is ugly and needs more cleanups

	- but this patch only tries to simplify this code without any
	  visible changes.

> > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
> >   * refcount is released when the last @uc for the @uprobe
> >   * unregisters. Caller of uprobe_register() is required to keep @inode
> >   * (and the containing mount) referenced.
> > - *
> > - * Return errno if it cannot successully install probes
> > - * else return 0 (success)
>
> mention that it never returns NULL, but rather encodes error code
> inside the pointer on error? It's an important part of the contract.

OK...

> >  /*
>
> this should be /** for doccomment checking (you'd get a warning for
> missing @uprobe if there was this extra star)

Well, this is what we have before this patch, but OK

> >   * uprobe_apply - unregister an already registered probe.
> > - * @inode: the file in which the probe has to be removed.
> > - * @offset: offset from the start of the file.
>
> add @uprobe description now?

If only I knew what this @uprobe description can say ;)

> > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> >  {
> >         u32 i;
> >
> > -       for (i = 0; i < cnt; i++) {
> > -               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > -                                 &uprobes[i].consumer);
> > -       }
> > +       for (i = 0; i < cnt; i++)
>
> you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL
> check if you null-out it below)

Hmm... are you sure? I'll re-check... See also the end of my email.

> > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >                       &bpf_uprobe_multi_link_lops, prog);
> >
> >         for (i = 0; i < cnt; i++) {
> > -               err = uprobe_register(d_real_inode(link->path.dentry),
> > +               uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
>
> will uprobe keep inode alive as long as uprobe is attached?

Why? No, this patch doesn't (shouldn't) change the current behaviour.

The users still need kern_path/path_put to, say, prevent umount.

> we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle
> only NULL vs non-NULL case

Not sure I understand...

> or maybe better yet let's just have local struct uprobe variable and
> only assign it if registration succeeded

We can, but

> (still need NULL check in
> bpf_uprobe_unregister above)

Why?

If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on
error, it passes cnt = i where is the 1st failed uprobe index. IOW,
uptobes[i].uprobe can't be IS_ERR_OR_NULL() in the 0..cnt-1 range.

If it is called by bpf_uprobe_multi_link_release() then the whole
0..umulti_link->cnt-1 range should be fine?

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 20:16       ` Oleg Nesterov
@ 2024-07-10 20:46         ` Andrii Nakryiko
  0 siblings, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-10 20:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 1:18 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/10, Andrii Nakryiko wrote:
> >
> > On Wed, Jul 10, 2024 at 9:33 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > This way uprobe_unregister() and uprobe_apply() do not need find_uprobe() +
> > > put_uprobe(). And to me this change simplifies the code a bit.
> > >
> > > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > > ---
> > >  include/linux/uprobes.h     | 14 ++++++------
> > >  kernel/events/uprobes.c     | 45 ++++++++++++-------------------------
> > >  kernel/trace/bpf_trace.c    | 12 +++++-----
> > >  kernel/trace/trace_uprobe.c | 28 +++++++++++------------
> > >  4 files changed, 41 insertions(+), 58 deletions(-)
> > >
> > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > > index aa89a8b67039..399509befcf4 100644
> > > --- a/include/linux/uprobes.h
> > > +++ b/include/linux/uprobes.h
> >
> > I don't see struct uprobe forward-declared in this header, maybe we
> > should add it?
>
> Probably yes, thanks... Although the current code already uses
> struct uprobes * without forward-declaration at least if CONFIG_UPROBES=y.
> Thanks, will add.
>

Yep, I saw that and was wondering as well.

> > >  static inline int
> > > -uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool add)
> > > +uprobe_apply(struct uprobe* uprobe, struct uprobe_consumer *uc, bool add)
> > >  {
> > >         return -ENOSYS;
> > >  }
> >
> > complete aside, when I was looking at this code I was wondering why we
> > even need uprobe_apply, it looks like some hacky variant of
> > uprobe_register and uprobe_unregister.
>
> All I can say is that
>
>         - I can hardly recall this logic, I'll try to do this tomorrow
>           and write another email
>
>         - in any case this logic is ugly and needs more cleanups
>
>         - but this patch only tries to simplify this code without any
>           visible changes.

yep, that's why it's an aside, up to you

>
> > > @@ -1133,41 +1126,39 @@ EXPORT_SYMBOL_GPL(uprobe_unregister);
> > >   * refcount is released when the last @uc for the @uprobe
> > >   * unregisters. Caller of uprobe_register() is required to keep @inode
> > >   * (and the containing mount) referenced.
> > > - *
> > > - * Return errno if it cannot successully install probes
> > > - * else return 0 (success)
> >
> > mention that it never returns NULL, but rather encodes error code
> > inside the pointer on error? It's an important part of the contract.
>
> OK...
>
> > >  /*
> >
> > this should be /** for doccomment checking (you'd get a warning for
> > missing @uprobe if there was this extra star)
>
> Well, this is what we have before this patch, but OK
>
> > >   * uprobe_apply - unregister an already registered probe.
> > > - * @inode: the file in which the probe has to be removed.
> > > - * @offset: offset from the start of the file.
> >
> > add @uprobe description now?
>
> If only I knew what this @uprobe description can say ;)

I'm pointing this out because I accidentally used /** for comment for
some function, and I got some bot report about missing argument. I
think /** makes sense for documenting "public API" function, so which
is why all the above.

>
> > > @@ -3180,10 +3181,8 @@ static void bpf_uprobe_unregister(struct path *path, struct bpf_uprobe *uprobes,
> > >  {
> > >         u32 i;
> > >
> > > -       for (i = 0; i < cnt; i++) {
> > > -               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> > > -                                 &uprobes[i].consumer);
> > > -       }
> > > +       for (i = 0; i < cnt; i++)
> >
> > you'll now need !IS_ERR_OR_NULL(uprobes[i].uprobe) check (or just NULL
> > check if you null-out it below)
>
> Hmm... are you sure? I'll re-check... See also the end of my email.

no, you are right, it should be fine

>
> > > @@ -3477,11 +3476,12 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > >                       &bpf_uprobe_multi_link_lops, prog);
> > >
> > >         for (i = 0; i < cnt; i++) {
> > > -               err = uprobe_register(d_real_inode(link->path.dentry),
> > > +               uprobes[i].uprobe = uprobe_register(d_real_inode(link->path.dentry),
> >
> > will uprobe keep inode alive as long as uprobe is attached?
>
> Why? No, this patch doesn't (shouldn't) change the current behaviour.
>
> The users still need kern_path/path_put to, say, prevent umount.

ok, then link->path has to stay, I was just wondering if we need to
keep it alive

>
> > we can NULL-out uprobe on error for bpf_uprobe_unregister() to handle
> > only NULL vs non-NULL case
>
> Not sure I understand...
>
> > or maybe better yet let's just have local struct uprobe variable and
> > only assign it if registration succeeded
>
> We can, but
>
> > (still need NULL check in
> > bpf_uprobe_unregister above)
>
> Why?
>
> If bpf_uprobe_unregister() is called by bpf_uprobe_multi_link_attach() on
> error, it passes cnt = i where is the 1st failed uprobe index. IOW,
> uptobes[i].uprobe can't be IS_ERR_OR_NULL() in the 0..cnt-1 range.
>
> If it is called by bpf_uprobe_multi_link_release() then the whole
> 0..umulti_link->cnt-1 range should be fine?

yes, you are absolutely right, I missed that for this partial failure
case we pass i as count into bpf_uprobe_unregister(), sorry about the
noise!


please feel free to add my ack for the next revision:

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>
> Oleg.
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-10 15:10     ` Oleg Nesterov
@ 2024-07-11  0:07       ` Masami Hiramatsu
  2024-07-11  9:49         ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2024-07-11  0:07 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On Wed, 10 Jul 2024 17:10:07 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/10, Masami Hiramatsu wrote:
> >
> > On Wed, 10 Jul 2024 16:00:45 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > The comment above uprobe_write_opcode() is wrong, unapply_uprobe() calls
> > > it under mmap_read_lock() and this is correct.
> > >
> > > And it is completely unclear why register_for_each_vma() takes mmap_lock
> > > for writing, add a comment to explain that mmap_write_lock() is needed to
> > > avoid the following race:
> > >
> > > 	- A task T hits the bp installed by uprobe and calls
> > > 	  find_active_uprobe()
> > >
> > > 	- uprobe_unregister() removes this uprobe/bp
> > >
> > > 	- T calls find_uprobe() which returns NULL
> > >
> > > 	- another uprobe_register() installs the bp at the same address
> > >
> > > 	- T calls is_trap_at_addr() which returns true
> > >
> > > 	- T returns to handle_swbp() and gets SIGTRAP.
> 
> ...
> 
> > >  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > >
> > >  		if (err && is_register)
> > >  			goto free;
> > > -
> > > +		/*
> > > +		 * We take mmap_lock for writing to avoid the race with
> > > +		 * find_active_uprobe(), install_breakpoint() must not
> > > +		 * make is_trap_at_addr() true right after find_uprobe()
> > > +		 * returns NULL.
> >
> > Sorry, I couldn't catch the latter part. What is the relationship of
> > taking the mmap_lock and install_breakpoint() and is_trap_at_addr() here?
> 
> Please the the changelog above, it tries to explain this race with more
> details...

OK, but it seems we should write the above longer explanation here.
What about the comment like this?

/*
 * We take mmap_lock for writing to avoid the race with
 * find_active_uprobe() and is_trap_at_adder() in reader
 * side.
 * If the reader, which hits a swbp and is handling it,
 * does not take mmap_lock for reading, it is possible
 * that find_active_uprobe() returns NULL (because
 * uprobe_unregister() removes uprobes right before that),
 * but is_trap_at_addr() can return true afterwards (because
 * another thread calls uprobe_register() on the same address).
 * This causes unexpected SIGTRAP on reader thread.
 * Taking mmap_lock avoids this race.
*/

> 
> > You meant that find_active_uprobe() is using find_uprobe() which searchs
> > uprobe form rbtree?
> 
> Yes,
> 
> > But it seems uprobe is already inserted to the rbtree
> > in alloc_uprobe() so find_uprobe() will not return NULL here, right?
> 
> uprobe_register() -> alloc_uprobe() can come after
> find_active_uprobe() -> find_uprobe() returns NULL.
> 
> Now, if uprobe_register() -> register_for_each_vma() used mmap_read_lock(), it
> could do install_breakpoint() before find_active_uprobe() calls is_trap_at_addr().
> 
> In this case find_active_uprobe() returns with uprobe == NULL and is_swbp == 1,
> handle_swbp() treat this case as the "normal" int3 without uprobe and do
> 
> 	if (!uprobe) {
> 		if (is_swbp > 0) {
> 			/* No matching uprobe; signal SIGTRAP. */
> 			force_sig(SIGTRAP);
> 
> Does this answer your question?

No, thanks for the explanation.

Thank you!

> 
> Oleg.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] uprobes: future cleanups for review
  2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
                     ` (2 preceding siblings ...)
  2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
@ 2024-07-11  8:27   ` Peter Zijlstra
  2024-07-11  8:45     ` Oleg Nesterov
  3 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2024-07-11  8:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Wed, Jul 10, 2024 at 06:30:22PM +0200, Oleg Nesterov wrote:
> On 07/10, Oleg Nesterov wrote:
> >
> > Peter, these simple cleanups should not conflict with your changes,
> > but I can resend them later if it causes any inconvenience.
> 
> In fact I would like to push 2 more cleanups before the more significant
> changes, but they certainly conflict with your ongoing work, albeit only
> textually.
> 
> Let me send the patches for review anyway, perhaps you can take at least
> the 1st one.
> 
> 3/3 is only compile tested so far. Andrii, can you take a look?

I was going to post a new version today, but I can wait and rebase on
top of / include these 5 patches (or more, these things tend to grow).


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/3] uprobes: future cleanups for review
  2024-07-11  8:27   ` [PATCH 0/3] uprobes: future cleanups for review Peter Zijlstra
@ 2024-07-11  8:45     ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-11  8:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: andrii, mhiramat, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On 07/11, Peter Zijlstra wrote:
>
> On Wed, Jul 10, 2024 at 06:30:22PM +0200, Oleg Nesterov wrote:
> >
> > In fact I would like to push 2 more cleanups before the more significant
> > changes, but they certainly conflict with your ongoing work, albeit only
> > textually.
> >
> > Let me send the patches for review anyway, perhaps you can take at least
> > the 1st one.
> >
> > 3/3 is only compile tested so far. Andrii, can you take a look?
>
> I was going to post a new version today, but I can wait and rebase on

No, I do not want to delay your changes, please send the new version,
I'll rebase these cleanups on top of it.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
  2024-07-10 16:48     ` Jiri Olsa
  2024-07-10 18:21     ` Andrii Nakryiko
@ 2024-07-11  9:26     ` Oleg Nesterov
  2024-07-11 17:11       ` Andrii Nakryiko
  2 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-11  9:26 UTC (permalink / raw)
  To: andrii, mhiramat, peterz
  Cc: clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

On 07/10, Oleg Nesterov wrote:
>
> -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
>  {
> -	struct uprobe *uprobe;
> -
> -	uprobe = find_uprobe(inode, offset);
> -	if (WARN_ON(!uprobe))
> -		return;
> -
>  	down_write(&uprobe->register_rwsem);
>  	__uprobe_unregister(uprobe, uc);
>  	up_write(&uprobe->register_rwsem);
> -	put_uprobe(uprobe);

OK, this is obviously wrong, needs get_uprobe/put_uprobe. __uprobe_unregister()
can free this uprobe, so up_write(&uprobe->register_rwsem) is not safe.

I'll send V2 on top of Peter's new version.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-11  0:07       ` Masami Hiramatsu
@ 2024-07-11  9:49         ` Oleg Nesterov
  2024-07-11 14:19           ` Masami Hiramatsu
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-11  9:49 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On 07/11, Masami Hiramatsu wrote:
>
> On Wed, 10 Jul 2024 17:10:07 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > >  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > > >
> > > >  		if (err && is_register)
> > > >  			goto free;
> > > > -
> > > > +		/*
> > > > +		 * We take mmap_lock for writing to avoid the race with
> > > > +		 * find_active_uprobe(), install_breakpoint() must not
> > > > +		 * make is_trap_at_addr() true right after find_uprobe()
> > > > +		 * returns NULL.
> > >

...

> OK, but it seems we should write the above longer explanation here.
> What about the comment like this?

Well, I am biased, but your version looks much more confusing to me...

> /*
>  * We take mmap_lock for writing to avoid the race with
>  * find_active_uprobe() and is_trap_at_adder() in reader
>  * side.
>  * If the reader, which hits a swbp and is handling it,
>  * does not take mmap_lock for reading,

this looks as if the reader which hits a swbp takes mmap_lock for reading
because of this race. No, find_active_uprobe() needs mmap_read_lock() for
vma_lookup, get_user_pages, etc.

> it is possible
>  * that find_active_uprobe() returns NULL (because
>  * uprobe_unregister() removes uprobes right before that),
>  * but is_trap_at_addr() can return true afterwards (because
>  * another thread calls uprobe_register() on the same address).
     ^^^^^^^^^^^^^^^
We are the thread which called uprobe_register(), we are going to
do install_breakpoint().

And btw, not that I think this makes sense, but register_for_each_vma()
could probably do

	if (is_register)
		mmap_write_lock(mm);
	else
		mmap_read_lock(mm);

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-11  9:49         ` Oleg Nesterov
@ 2024-07-11 14:19           ` Masami Hiramatsu
  2024-07-11 15:25             ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Masami Hiramatsu @ 2024-07-11 14:19 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On Thu, 11 Jul 2024 11:49:40 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 07/11, Masami Hiramatsu wrote:
> >
> > On Wed, 10 Jul 2024 17:10:07 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > > >  int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > > > > @@ -1046,7 +1046,12 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
> > > > >
> > > > >  		if (err && is_register)
> > > > >  			goto free;
> > > > > -
> > > > > +		/*
> > > > > +		 * We take mmap_lock for writing to avoid the race with
> > > > > +		 * find_active_uprobe(), install_breakpoint() must not
> > > > > +		 * make is_trap_at_addr() true right after find_uprobe()
> > > > > +		 * returns NULL.
> > > >
> 
> ...
> 
> > OK, but it seems we should write the above longer explanation here.
> > What about the comment like this?
> 
> Well, I am biased, but your version looks much more confusing to me...
> 
> > /*
> >  * We take mmap_lock for writing to avoid the race with
> >  * find_active_uprobe() and is_trap_at_adder() in reader
> >  * side.
> >  * If the reader, which hits a swbp and is handling it,
> >  * does not take mmap_lock for reading,
> 
> this looks as if the reader which hits a swbp takes mmap_lock for reading
> because of this race. No, find_active_uprobe() needs mmap_read_lock() for
> vma_lookup, get_user_pages, etc.

OK, so it is for looking up VMA. (But in the end, this rock protects both
the VMAs and uprobes, right?)

> 
> > it is possible
> >  * that find_active_uprobe() returns NULL (because
> >  * uprobe_unregister() removes uprobes right before that),
> >  * but is_trap_at_addr() can return true afterwards (because
> >  * another thread calls uprobe_register() on the same address).
>      ^^^^^^^^^^^^^^^
> We are the thread which called uprobe_register(), we are going to
> do install_breakpoint().

Ah, yes :)

What about this?

	 * We take mmap_lock for writing to avoid the race with
	 * find_active_uprobe(), which takes mmap_lock for reading.
	 * Thus this install_breakpoint() must not make
	 * is_trap_at_addr() true right after find_uprobe()
	 * returns NULL in find_active_uprobe().


> 
> And btw, not that I think this makes sense, but register_for_each_vma()
> could probably do
> 
> 	if (is_register)
> 		mmap_write_lock(mm);
> 	else
> 		mmap_read_lock(mm);

Agreed.

Thank you,

> 
> Oleg.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe()
  2024-07-10 16:31   ` [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
@ 2024-07-11 15:18     ` Masami Hiramatsu
  0 siblings, 0 replies; 30+ messages in thread
From: Masami Hiramatsu @ 2024-07-11 15:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, peterz, clm, jolsa, mingo, paulmck, rostedt, linux-kernel,
	linux-trace-kernel

On Wed, 10 Jul 2024 18:31:11 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> From: Andrii Nakryiko <andrii@kernel.org>
> 
> Return -ENOMEM instead of NULL, which makes caller's error handling just
> a touch simpler.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Looks good to me.

Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

> ---
>  kernel/events/uprobes.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 4950decebe5c..e5b7c6239970 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -725,7 +725,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset,
>  
>  	uprobe = kzalloc(sizeof(struct uprobe), GFP_KERNEL);
>  	if (!uprobe)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>  
>  	uprobe->inode = inode;
>  	uprobe->offset = offset;
> @@ -1166,8 +1166,6 @@ int uprobe_register(struct inode *inode, loff_t offset, loff_t ref_ctr_offset,
>  
>   retry:
>  	uprobe = alloc_uprobe(inode, offset, ref_ctr_offset);
> -	if (!uprobe)
> -		return -ENOMEM;
>  	if (IS_ERR(uprobe))
>  		return PTR_ERR(uprobe);
>  
> -- 
> 2.25.1.362.g51ebf55
> 
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] uprobes: document the usage of mm->mmap_lock
  2024-07-11 14:19           ` Masami Hiramatsu
@ 2024-07-11 15:25             ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-11 15:25 UTC (permalink / raw)
  To: Masami Hiramatsu; +Cc: andrii, peterz, clm, jolsa, mingo, paulmck, linux-kernel

On 07/11, Masami Hiramatsu wrote:
>
> What about this?
>
> 	 * We take mmap_lock for writing to avoid the race with
> 	 * find_active_uprobe(), which takes mmap_lock for reading.
> 	 * Thus this install_breakpoint() must not make
> 	 * is_trap_at_addr() true right after find_uprobe()
> 	 * returns NULL in find_active_uprobe().

Thanks! will change.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-11  9:26     ` Oleg Nesterov
@ 2024-07-11 17:11       ` Andrii Nakryiko
  2024-07-11 18:26         ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Andrii Nakryiko @ 2024-07-11 17:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On Thu, Jul 11, 2024 at 2:28 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 07/10, Oleg Nesterov wrote:
> >
> > -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> > +void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
> >  {
> > -     struct uprobe *uprobe;
> > -
> > -     uprobe = find_uprobe(inode, offset);
> > -     if (WARN_ON(!uprobe))
> > -             return;
> > -
> >       down_write(&uprobe->register_rwsem);
> >       __uprobe_unregister(uprobe, uc);
> >       up_write(&uprobe->register_rwsem);
> > -     put_uprobe(uprobe);
>
> OK, this is obviously wrong, needs get_uprobe/put_uprobe. __uprobe_unregister()
> can free this uprobe, so up_write(&uprobe->register_rwsem) is not safe.

uprobe_register(), given it returns an uprobe instance to the caller
should keep refcount on it (it belongs to uprobe_consumer). That's
what I did for my patches, are you going to do that as well?

We basically do the same thing, just interfaces look a bit different.


>
> I'll send V2 on top of Peter's new version.
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe *
  2024-07-11 17:11       ` Andrii Nakryiko
@ 2024-07-11 18:26         ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2024-07-11 18:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: andrii, mhiramat, peterz, clm, jolsa, mingo, paulmck, rostedt,
	linux-kernel, linux-trace-kernel

On 07/11, Andrii Nakryiko wrote:
>
> On Thu, Jul 11, 2024 at 2:28 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > On 07/10, Oleg Nesterov wrote:
> > >
> > > -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> > > +void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
> > >  {
> > > -     struct uprobe *uprobe;
> > > -
> > > -     uprobe = find_uprobe(inode, offset);
> > > -     if (WARN_ON(!uprobe))
> > > -             return;
> > > -
> > >       down_write(&uprobe->register_rwsem);
> > >       __uprobe_unregister(uprobe, uc);
> > >       up_write(&uprobe->register_rwsem);
> > > -     put_uprobe(uprobe);
> >
> > OK, this is obviously wrong, needs get_uprobe/put_uprobe. __uprobe_unregister()
> > can free this uprobe, so up_write(&uprobe->register_rwsem) is not safe.
>
> uprobe_register(), given it returns an uprobe instance to the caller
> should keep refcount on it (it belongs to uprobe_consumer).

Of course. And again, this patch doesn't change the curent behaviour.

> That's
> what I did for my patches, are you going to do that as well?
>
> We basically do the same thing, just interfaces look a bit different.

Not sure. Well I do not really know, I didn't read your series to the
end, sorry ;) The same for V1/V2 from Peter so far.

But let me say this just in case... With or without this change,
currently uprobe_consumer doesn't have an "individual" ref to uprobe.
The fact that uprobe->consumers != NULL adds a reference.

Lets not discuss if this is good or bad right now, this cleanup is
only cleanup.

------------------------------------------------------------------------
Now, let me add another "just in case" note to explain what I am going
to do in V2.

So. this patch should turn uprobe_unregister() into something like

	void uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
	{
		// Ugly !!!! please kill me!!!
		get_uprobe(uprobe);
		down_write(&uprobe->register_rwsem);
		__uprobe_unregister(uprobe, uc);
		up_write(&uprobe->register_rwsem);
		put_uprobe(uprobe);
	}

to simplify this change. And the next (simple) patch will kill these
get_uprobe + put_uprobe, we just need to shift the (possibly) final
put_uprobe() from delete_uprobe() to unregister().

But of course, I will recheck before I send V2.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2024-07-11 18:27 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-10 14:00 [PATCH 0/2] uprobes: document mmap_lock, don't abuse get_user_pages_remote() Oleg Nesterov
2024-07-10 14:00 ` [PATCH 1/2] uprobes: document the usage of mm->mmap_lock Oleg Nesterov
2024-07-10 14:51   ` Masami Hiramatsu
2024-07-10 15:10     ` Oleg Nesterov
2024-07-11  0:07       ` Masami Hiramatsu
2024-07-11  9:49         ` Oleg Nesterov
2024-07-11 14:19           ` Masami Hiramatsu
2024-07-11 15:25             ` Oleg Nesterov
2024-07-10 14:01 ` [PATCH 2/2] uprobes: is_trap_at_addr: don't use get_user_pages_remote() Oleg Nesterov
2024-07-10 15:15   ` Andrii Nakryiko
2024-07-10 16:30 ` [PATCH 0/3] uprobes: future cleanups for review Oleg Nesterov
2024-07-10 16:30   ` [PATCH 1/3] uprobes: kill uprobe_register_refctr() Oleg Nesterov
2024-07-10 18:03     ` Andrii Nakryiko
2024-07-10 19:32       ` Oleg Nesterov
2024-07-10 16:31   ` [PATCH 2/3] uprobes: simplify error handling for alloc_uprobe() Oleg Nesterov
2024-07-11 15:18     ` Masami Hiramatsu
2024-07-10 16:31   ` [PATCH 3/3] uprobes: make uprobe_register() return struct uprobe * Oleg Nesterov
2024-07-10 16:48     ` Jiri Olsa
2024-07-10 18:23       ` Andrii Nakryiko
2024-07-10 19:38         ` Jiri Olsa
2024-07-10 19:48           ` Andrii Nakryiko
2024-07-10 19:20       ` Oleg Nesterov
2024-07-10 18:21     ` Andrii Nakryiko
2024-07-10 20:16       ` Oleg Nesterov
2024-07-10 20:46         ` Andrii Nakryiko
2024-07-11  9:26     ` Oleg Nesterov
2024-07-11 17:11       ` Andrii Nakryiko
2024-07-11 18:26         ` Oleg Nesterov
2024-07-11  8:27   ` [PATCH 0/3] uprobes: future cleanups for review Peter Zijlstra
2024-07-11  8:45     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox