BPF List
 help / color / mirror / Atom feed
* Re: [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list
       [not found] ` <20240711110400.880800153@infradead.org>
@ 2024-07-12 21:06   ` Andrii Nakryiko
  2024-07-15 11:25     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-12 21:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

+ bpf@vger, please cc bpf ML for the next revision, these changes are
very relevant there as well, thanks

On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> With handle_swbp() hitting concurrently on (all) CPUs the
> uprobe->register_rwsem can get very contended. Add an SRCU instance to
> cover the consumer list and consumer lifetime.
>
> Since the consumer are externally embedded structures, unregister will
> have to suffer a synchronize_srcu().
>
> A notably complication is the UPROBE_HANDLER_REMOVE logic which can
> race against uprobe_register() such that it might want to remove a
> freshly installer handler that didn't get called. In order to close
> this hole, a seqcount is added. With that, the removal path can tell
> if anything changed and bail out of the removal.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/uprobes.c |   60 ++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 50 insertions(+), 10 deletions(-)
>

[...]

> @@ -800,7 +808,7 @@ static bool consumer_del(struct uprobe *
>         down_write(&uprobe->consumer_rwsem);
>         for (con = &uprobe->consumers; *con; con = &(*con)->next) {
>                 if (*con == uc) {
> -                       *con = uc->next;
> +                       WRITE_ONCE(*con, uc->next);

I have a dumb and mechanical question.

Above in consumer_add() you are doing WRITE_ONCE() for uc->next
assignment, but rcu_assign_pointer() for uprobe->consumers. Here, you
are doing WRITE_ONCE() for the same operation, if it so happens that
uc == *con == uprobe->consumers. So is rcu_assign_pointer() necessary
in consumer_addr()? If yes, we should have it here as well, no? And if
not, why bother with it in consumer_add()?

>                         ret = true;
>                         break;
>                 }
> @@ -1139,9 +1147,13 @@ void uprobe_unregister(struct inode *ino
>                 return;
>
>         down_write(&uprobe->register_rwsem);
> +       raw_write_seqcount_begin(&uprobe->register_seq);
>         __uprobe_unregister(uprobe, uc);
> +       raw_write_seqcount_end(&uprobe->register_seq);
>         up_write(&uprobe->register_rwsem);
>         put_uprobe(uprobe);
> +
> +       synchronize_srcu(&uprobes_srcu);
>  }
>  EXPORT_SYMBOL_GPL(uprobe_unregister);

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 07/11] perf/uprobe: Split uprobe_unregister()
       [not found] ` <20240711110400.987380024@infradead.org>
@ 2024-07-12 21:10   ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-12 21:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

+ bpf

On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> With uprobe_unregister() having grown a synchronize_srcu(), it becomes
> fairly slow to call. Esp. since both users of this API call it in a
> loop.
>
> Peel off the sync_srcu() and do it once, after the loop.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
> ---
>  include/linux/uprobes.h     |    8 ++++++--
>  kernel/events/uprobes.c     |    8 ++++++--
>  kernel/trace/bpf_trace.c    |    6 ++++--
>  kernel/trace/trace_uprobe.c |    6 +++++-
>  4 files changed, 21 insertions(+), 7 deletions(-)
>

BPF side of things looks good:

Acked-by: Andrii Nakryiko <andrii@kernel.org>


> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -113,7 +113,8 @@ extern int uprobe_write_opcode(struct ar
>  extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
>  extern int uprobe_register_refctr(struct inode *inode, loff_t offset, loff_t ref_ctr_offset, struct uprobe_consumer *uc);
>  extern int uprobe_apply(struct inode *inode, loff_t offset, struct uprobe_consumer *uc, bool);
> -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> +extern void uprobe_unregister_nosync(struct inode *inode, loff_t offset, struct uprobe_consumer *uc);
> +extern void uprobe_unregister_sync(void);
>  extern int uprobe_mmap(struct vm_area_struct *vma);
>  extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end);
>  extern void uprobe_start_dup_mmap(void);
> @@ -163,7 +164,10 @@ uprobe_apply(struct inode *inode, loff_t
>         return -ENOSYS;
>  }
>  static inline void
> -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +uprobe_unregister_nosync(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +{
> +}
> +static inline void uprobes_unregister_sync(void)
>  {
>  }
>  static inline int uprobe_mmap(struct vm_area_struct *vma)
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1138,7 +1138,7 @@ __uprobe_unregister(struct uprobe *uprob
>   * @offset: offset from the start of the file.
>   * @uc: identify which probe if multiple probes are colocated.
>   */
> -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
> +void uprobe_unregister_nosync(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
>  {
>         struct uprobe *uprobe;
>
> @@ -1152,10 +1152,14 @@ void uprobe_unregister(struct inode *ino
>         raw_write_seqcount_end(&uprobe->register_seq);
>         up_write(&uprobe->register_rwsem);
>         put_uprobe(uprobe);
> +}
> +EXPORT_SYMBOL_GPL(uprobe_unregister_nosync);
>
> +void uprobe_unregister_sync(void)
> +{
>         synchronize_srcu(&uprobes_srcu);
>  }
> -EXPORT_SYMBOL_GPL(uprobe_unregister);
> +EXPORT_SYMBOL_GPL(uprobe_unregister_sync);
>
>  /*
>   * __uprobe_register - register a probe
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -3181,9 +3181,11 @@ static void bpf_uprobe_unregister(struct
>         u32 i;
>
>         for (i = 0; i < cnt; i++) {
> -               uprobe_unregister(d_real_inode(path->dentry), uprobes[i].offset,
> -                                 &uprobes[i].consumer);
> +               uprobe_unregister_nosync(d_real_inode(path->dentry), uprobes[i].offset,
> +                                        &uprobes[i].consumer);
>         }
> +       if (cnt)
> +               uprobe_unregister_sync();
>  }
>
>  static void bpf_uprobe_multi_link_release(struct bpf_link *link)
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1104,6 +1104,7 @@ static int trace_uprobe_enable(struct tr
>  static void __probe_event_disable(struct trace_probe *tp)
>  {
>         struct trace_uprobe *tu;
> +       bool sync = false;
>
>         tu = container_of(tp, struct trace_uprobe, tp);
>         WARN_ON(!uprobe_filter_is_empty(tu->tp.event->filter));
> @@ -1112,9 +1113,12 @@ static void __probe_event_disable(struct
>                 if (!tu->inode)
>                         continue;
>
> -               uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
> +               uprobe_unregister_nosync(tu->inode, tu->offset, &tu->consumer);
> +               sync = true;
>                 tu->inode = NULL;
>         }
> +       if (sync)
> +               uprobe_unregister_sync();
>  }
>
>  static int probe_event_enable(struct trace_event_call *call,
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 08/11] perf/uprobe: Convert (some) uprobe->refcount to SRCU
       [not found] ` <20240711110401.096506262@infradead.org>
@ 2024-07-12 21:21   ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-12 21:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

+ bpf

On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> With handle_swbp() hitting concurrently on (all) CPUs, potentially on
> the same uprobe, the uprobe->refcount can get *very* hot. Move the
> struct uprobe lifetime into uprobes_srcu such that it covers both the
> uprobe and the uprobe->consumers list.
>
> With this, handle_swbp() can use a single large SRCU critical section
> to avoid taking a refcount on the uprobe for it's duration.
>
> Notably, the single-step and uretprobe paths need a reference that
> leaves handle_swbp() and will, for now, still use ->refcount.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/uprobes.c |   68 ++++++++++++++++++++++++++++--------------------
>  1 file changed, 41 insertions(+), 27 deletions(-)
>
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -51,7 +51,7 @@ static struct mutex uprobes_mmap_mutex[U
>  DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem);
>

[...]

> @@ -1982,22 +1990,31 @@ pre_ssout(struct uprobe *uprobe, struct
>         if (!utask)
>                 return -ENOMEM;
>
> +       utask->active_uprobe = try_get_uprobe(uprobe);
> +       if (!utask->active_uprobe)
> +               return -ESRCH;
> +
>         xol_vaddr = xol_get_insn_slot(uprobe);
> -       if (!xol_vaddr)
> -               return -ENOMEM;
> +       if (!xol_vaddr) {
> +               err = -ENOMEM;
> +               goto err_uprobe;
> +       }
>
>         utask->xol_vaddr = xol_vaddr;
>         utask->vaddr = bp_vaddr;
>
>         err = arch_uprobe_pre_xol(&uprobe->arch, regs);
> -       if (unlikely(err)) {
> -               xol_free_insn_slot(current);

let's keep this here, because you later remove err_uprobe part and
err_xol is only jumped to from here; it's better to just drop err_xol
and err_uprobe altogether and keep the original xol_free_insn_slot()
here.

> -               return err;
> -       }
> +       if (unlikely(err))
> +               goto err_xol;
>
> -       utask->active_uprobe = uprobe;
>         utask->state = UTASK_SSTEP;
>         return 0;
> +
> +err_xol:
> +       xol_free_insn_slot(current);
> +err_uprobe:
> +       put_uprobe(utask->active_uprobe);

utask->active_uprobe = NULL;

let's not leave garbage in utask (even if you remove this later anyways)

> +       return err;
>  }
>
>  /*

[...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 10/11] perf/uprobe: Convert single-step and uretprobe to SRCU
       [not found] ` <20240711110401.311168524@infradead.org>
@ 2024-07-12 21:28   ` Andrii Nakryiko
  2024-07-15 11:59     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-12 21:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

+ bpf

On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Both single-step and uretprobes take a refcount on struct uprobe in
> handle_swbp() in order to ensure struct uprobe stays extant until a
> next trap.
>
> Since uprobe_unregister() only cares about the uprobe_consumer
> life-time, and these intra-trap sections can be arbitrarily large,
> create a second SRCU domain to cover these.
>
> Notably, a uretprobe with a registered return_instance that never
> triggers -- because userspace -- will currently pin the
> return_instance and related uprobe until the task dies. With this
> convertion to SRCU this behaviour will inhibit freeing of all uprobes.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/uprobes.h |    2 +
>  kernel/events/uprobes.c |   60 +++++++++++++++++++++++-------------------------
>  2 files changed, 31 insertions(+), 31 deletions(-)
>
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -78,6 +78,7 @@ struct uprobe_task {
>
>         struct return_instance          *return_instances;
>         unsigned int                    depth;
> +       unsigned int                    active_srcu_idx;
>  };
>
>  struct return_instance {
> @@ -86,6 +87,7 @@ struct return_instance {
>         unsigned long           stack;          /* stack pointer */
>         unsigned long           orig_ret_vaddr; /* original return address */
>         bool                    chained;        /* true, if instance is nested */
> +       int                     srcu_idx;
>
>         struct return_instance  *next;          /* keep as stack */
>  };
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -54,6 +54,15 @@ DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem)
>   * Covers uprobe->consumers lifetime as well as struct uprobe.
>   */
>  DEFINE_STATIC_SRCU(uprobes_srcu);
> +/*
> + * Covers return_instance->uprobe and utask->active_uprobe. Separate from
> + * uprobe_srcu because uprobe_unregister() doesn't need to wait for this
> + * and these lifetimes can be fairly long.
> + *
> + * Notably, these sections span userspace and as such use
> + * __srcu_read_{,un}lock() to elide lockdep.
> + */
> +DEFINE_STATIC_SRCU(uretprobes_srcu);
>
>  /* Have a copy of original instruction */
>  #define UPROBE_COPY_INSN       0
> @@ -596,25 +605,24 @@ set_orig_insn(struct arch_uprobe *auprob
>                         *(uprobe_opcode_t *)&auprobe->insn);
>  }
>
> -static struct uprobe *try_get_uprobe(struct uprobe *uprobe)
> -{
> -       if (refcount_inc_not_zero(&uprobe->ref))
> -               return uprobe;
> -       return NULL;
> -}
> -
>  static struct uprobe *get_uprobe(struct uprobe *uprobe)
>  {
>         refcount_inc(&uprobe->ref);
>         return uprobe;
>  }
>
> -static void uprobe_free_rcu(struct rcu_head *rcu)
> +static void uprobe_free_stage2(struct rcu_head *rcu)
>  {
>         struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
>         kfree(uprobe);
>  }
>
> +static void uprobe_free_stage1(struct rcu_head *rcu)
> +{
> +       struct uprobe *uprobe = container_of(rcu, struct uprobe, rcu);
> +       call_srcu(&uretprobes_srcu, &uprobe->rcu, uprobe_free_stage2);
> +}
> +
>  static void put_uprobe(struct uprobe *uprobe)
>  {
>         if (refcount_dec_and_test(&uprobe->ref)) {
> @@ -626,7 +634,7 @@ static void put_uprobe(struct uprobe *up
>                 mutex_lock(&delayed_uprobe_lock);
>                 delayed_uprobe_remove(uprobe, NULL);
>                 mutex_unlock(&delayed_uprobe_lock);
> -               call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_rcu);
> +               call_srcu(&uprobes_srcu, &uprobe->rcu, uprobe_free_stage1);
>         }
>  }
>
> @@ -1753,7 +1761,7 @@ unsigned long uprobe_get_trap_addr(struc
>  static struct return_instance *free_ret_instance(struct return_instance *ri)
>  {
>         struct return_instance *next = ri->next;
> -       put_uprobe(ri->uprobe);
> +       __srcu_read_unlock(&uretprobes_srcu, ri->srcu_idx);
>         kfree(ri);
>         return next;
>  }
> @@ -1771,7 +1779,7 @@ void uprobe_free_utask(struct task_struc
>                 return;
>
>         if (utask->active_uprobe)
> -               put_uprobe(utask->active_uprobe);
> +               __srcu_read_unlock(&uretprobes_srcu, utask->active_srcu_idx);
>
>         ri = utask->return_instances;
>         while (ri)
> @@ -1814,7 +1822,7 @@ static int dup_utask(struct task_struct
>                         return -ENOMEM;
>
>                 *n = *o;
> -               get_uprobe(n->uprobe);
> +               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);

do we need to add this __srcu_clone_read_lock hack just to avoid
taking a refcount in dup_utask (i.e., on process fork)? This is not
that frequent and performance-sensitive case, so it seems like it
should be fine to take refcount and avoid doing srcu_read_unlock() in
a new process. Just like the case with long-running uretprobes where
you convert SRCU lock into refcount.

>                 n->next = NULL;
>
>                 *p = n;
> @@ -1931,14 +1939,10 @@ static void prepare_uretprobe(struct upr
>         if (!ri)
>                 return;
>
> -       ri->uprobe = try_get_uprobe(uprobe);
> -       if (!ri->uprobe)
> -               goto err_mem;
> -
>         trampoline_vaddr = get_trampoline_vaddr();
>         orig_ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
>         if (orig_ret_vaddr == -1)
> -               goto err_uprobe;
> +               goto err_mem;
>
>         /* drop the entries invalidated by longjmp() */
>         chained = (orig_ret_vaddr == trampoline_vaddr);
> @@ -1956,11 +1960,13 @@ static void prepare_uretprobe(struct upr
>                          * attack from user-space.
>                          */
>                         uprobe_warn(current, "handle tail call");
> -                       goto err_uprobe;
> +                       goto err_mem;
>                 }
>                 orig_ret_vaddr = utask->return_instances->orig_ret_vaddr;
>         }
>
> +       ri->srcu_idx = __srcu_read_lock(&uretprobes_srcu);
> +       ri->uprobe = uprobe;
>         ri->func = instruction_pointer(regs);
>         ri->stack = user_stack_pointer(regs);
>         ri->orig_ret_vaddr = orig_ret_vaddr;
> @@ -1972,8 +1978,6 @@ static void prepare_uretprobe(struct upr
>
>         return;
>
> -err_uprobe:
> -       uprobe_put(ri->uprobe);
>  err_mem:
>         kfree(ri);
>  }
> @@ -1990,15 +1994,9 @@ pre_ssout(struct uprobe *uprobe, struct
>         if (!utask)
>                 return -ENOMEM;
>
> -       utask->active_uprobe = try_get_uprobe(uprobe);
> -       if (!utask->active_uprobe)
> -               return -ESRCH;
> -
>         xol_vaddr = xol_get_insn_slot(uprobe);
> -       if (!xol_vaddr) {
> -               err = -ENOMEM;
> -               goto err_uprobe;
> -       }
> +       if (!xol_vaddr)
> +               return -ENOMEM;
>
>         utask->xol_vaddr = xol_vaddr;
>         utask->vaddr = bp_vaddr;
> @@ -2007,13 +2005,13 @@ pre_ssout(struct uprobe *uprobe, struct
>         if (unlikely(err))
>                 goto err_xol;
>
> +       utask->active_srcu_idx = __srcu_read_lock(&uretprobes_srcu);
> +       utask->active_uprobe = uprobe;
>         utask->state = UTASK_SSTEP;
>         return 0;
>
>  err_xol:
>         xol_free_insn_slot(current);
> -err_uprobe:
> -       put_uprobe(utask->active_uprobe);
>         return err;
>  }
>
> @@ -2366,7 +2364,7 @@ static void handle_singlestep(struct upr
>         else
>                 WARN_ON_ONCE(1);
>
> -       put_uprobe(uprobe);
> +       __srcu_read_unlock(&uretprobes_srcu, utask->active_srcu_idx);
>         utask->active_uprobe = NULL;
>         utask->state = UTASK_RUNNING;
>         xol_free_insn_slot(current);
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 11/11] perf/uprobe: Add uretprobe timer
       [not found] ` <20240711110401.412779774@infradead.org>
@ 2024-07-12 21:43   ` Andrii Nakryiko
  2024-07-15 11:41     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-12 21:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

+ bpf

On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> In order to put a bound on the uretprobe_srcu critical section, add a
> timer to uprobe_task. Upon every RI added or removed the timer is
> pushed forward to now + 1s. If the timer were ever to fire, it would
> convert the SRCU 'reference' to a refcount reference if possible.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/uprobes.h |    8 +++++
>  kernel/events/uprobes.c |   67 ++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 69 insertions(+), 6 deletions(-)
>
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -15,6 +15,7 @@
>  #include <linux/rbtree.h>
>  #include <linux/types.h>
>  #include <linux/wait.h>
> +#include <linux/timer.h>
>
>  struct vm_area_struct;
>  struct mm_struct;
> @@ -79,6 +80,10 @@ struct uprobe_task {
>         struct return_instance          *return_instances;
>         unsigned int                    depth;
>         unsigned int                    active_srcu_idx;
> +
> +       struct timer_list               ri_timer;
> +       struct callback_head            ri_task_work;
> +       struct task_struct              *task;
>  };
>
>  struct return_instance {
> @@ -86,7 +91,8 @@ struct return_instance {
>         unsigned long           func;
>         unsigned long           stack;          /* stack pointer */
>         unsigned long           orig_ret_vaddr; /* original return address */
> -       bool                    chained;        /* true, if instance is nested */
> +       u8                      chained;        /* true, if instance is nested */
> +       u8                      has_ref;

Why bool -> u8 switch? You don't touch chained, so why change its
type? And for has_ref you interchangeably use 0 and true for the same
field. Let's stick to bool as there is nothing wrong with it?

>         int                     srcu_idx;
>
>         struct return_instance  *next;          /* keep as stack */

[...]

> @@ -1822,13 +1864,20 @@ static int dup_utask(struct task_struct
>                         return -ENOMEM;
>
>                 *n = *o;
> -               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> +               if (n->uprobe) {
> +                       if (n->has_ref)
> +                               get_uprobe(n->uprobe);
> +                       else
> +                               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> +               }
>                 n->next = NULL;
>
>                 *p = n;
>                 p = &n->next;
>                 n_utask->depth++;
>         }
> +       if (n_utask->return_instances)
> +               mod_timer(&n_utask->ri_timer, jiffies + HZ);

let's add #define for HZ, so it's adjusted in just one place (instead
of 3 as it is right now)

Also, we can have up to 64 levels of uretprobe nesting, so,
technically, the user can cause a delay of 64 seconds in total. Maybe
let's use something smaller than a full second? After all, if the
user-space function has high latency, then this refcount congestion is
much less of a problem. I'd set it to something like 50-100 ms for
starters.

>
>         return 0;
>  }
> @@ -1967,6 +2016,7 @@ static void prepare_uretprobe(struct upr
>
>         ri->srcu_idx = __srcu_read_lock(&uretprobes_srcu);
>         ri->uprobe = uprobe;
> +       ri->has_ref = 0;
>         ri->func = instruction_pointer(regs);
>         ri->stack = user_stack_pointer(regs);
>         ri->orig_ret_vaddr = orig_ret_vaddr;
> @@ -1976,6 +2026,8 @@ static void prepare_uretprobe(struct upr
>         ri->next = utask->return_instances;
>         utask->return_instances = ri;
>
> +       mod_timer(&utask->ri_timer, jiffies + HZ);
> +
>         return;
>
>  err_mem:
> @@ -2204,6 +2256,9 @@ handle_uretprobe_chain(struct return_ins
>         struct uprobe *uprobe = ri->uprobe;
>         struct uprobe_consumer *uc;
>
> +       if (!uprobe)
> +               return;
> +
>         guard(srcu)(&uprobes_srcu);
>
>         for_each_consumer_rcu(uc, uprobe->consumers) {
> @@ -2250,8 +2305,10 @@ static void handle_trampoline(struct pt_
>
>                 instruction_pointer_set(regs, ri->orig_ret_vaddr);
>                 do {
> -                       if (valid)
> +                       if (valid) {
>                                 handle_uretprobe_chain(ri, regs);
> +                               mod_timer(&utask->ri_timer, jiffies + HZ);
> +                       }
>                         ri = free_ret_instance(ri);
>                         utask->depth--;
>                 } while (ri != next);
>
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list
  2024-07-12 21:06   ` [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list Andrii Nakryiko
@ 2024-07-15 11:25     ` Peter Zijlstra
  2024-07-15 17:30       ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-07-15 11:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Fri, Jul 12, 2024 at 02:06:08PM -0700, Andrii Nakryiko wrote:
> + bpf@vger, please cc bpf ML for the next revision, these changes are
> very relevant there as well, thanks
> 
> On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > With handle_swbp() hitting concurrently on (all) CPUs the
> > uprobe->register_rwsem can get very contended. Add an SRCU instance to
> > cover the consumer list and consumer lifetime.
> >
> > Since the consumer are externally embedded structures, unregister will
> > have to suffer a synchronize_srcu().
> >
> > A notably complication is the UPROBE_HANDLER_REMOVE logic which can
> > race against uprobe_register() such that it might want to remove a
> > freshly installer handler that didn't get called. In order to close
> > this hole, a seqcount is added. With that, the removal path can tell
> > if anything changed and bail out of the removal.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/events/uprobes.c |   60 ++++++++++++++++++++++++++++++++++++++++--------
> >  1 file changed, 50 insertions(+), 10 deletions(-)
> >
> 
> [...]
> 
> > @@ -800,7 +808,7 @@ static bool consumer_del(struct uprobe *
> >         down_write(&uprobe->consumer_rwsem);
> >         for (con = &uprobe->consumers; *con; con = &(*con)->next) {
> >                 if (*con == uc) {
> > -                       *con = uc->next;
> > +                       WRITE_ONCE(*con, uc->next);
> 
> I have a dumb and mechanical question.
> 
> Above in consumer_add() you are doing WRITE_ONCE() for uc->next
> assignment, but rcu_assign_pointer() for uprobe->consumers. Here, you
> are doing WRITE_ONCE() for the same operation, if it so happens that
> uc == *con == uprobe->consumers. So is rcu_assign_pointer() necessary
> in consumer_addr()? If yes, we should have it here as well, no? And if
> not, why bother with it in consumer_add()?

add is a publish and needs to ensure all stores to the object are
ordered before the object is linked in. Note that rcu_assign_pointer()
is basically a fancy way of writing smp_store_release().

del otoh does not publish, it removes and doesn't need ordering.

It does however need to ensure the pointer write itself isn't torn. That
is, without the WRITE_ONCE() the compiler is free to do byte stores in
order to update the 8 byte pointer value (assuming 64bit). This is
pretty dumb, but very much permitted by C and also utterly fatal in the
case where an RCU iteration comes by and reads a half-half pointer
value.

> >                         ret = true;
> >                         break;
> >                 }
> > @@ -1139,9 +1147,13 @@ void uprobe_unregister(struct inode *ino
> >                 return;
> >
> >         down_write(&uprobe->register_rwsem);
> > +       raw_write_seqcount_begin(&uprobe->register_seq);
> >         __uprobe_unregister(uprobe, uc);
> > +       raw_write_seqcount_end(&uprobe->register_seq);
> >         up_write(&uprobe->register_rwsem);
> >         put_uprobe(uprobe);
> > +
> > +       synchronize_srcu(&uprobes_srcu);
> >  }
> >  EXPORT_SYMBOL_GPL(uprobe_unregister);
> 
> [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 11/11] perf/uprobe: Add uretprobe timer
  2024-07-12 21:43   ` [PATCH v2 11/11] perf/uprobe: Add uretprobe timer Andrii Nakryiko
@ 2024-07-15 11:41     ` Peter Zijlstra
  2024-07-15 17:34       ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-07-15 11:41 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Fri, Jul 12, 2024 at 02:43:52PM -0700, Andrii Nakryiko wrote:
> + bpf
> 
> On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > In order to put a bound on the uretprobe_srcu critical section, add a
> > timer to uprobe_task. Upon every RI added or removed the timer is
> > pushed forward to now + 1s. If the timer were ever to fire, it would
> > convert the SRCU 'reference' to a refcount reference if possible.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  include/linux/uprobes.h |    8 +++++
> >  kernel/events/uprobes.c |   67 ++++++++++++++++++++++++++++++++++++++++++++----
> >  2 files changed, 69 insertions(+), 6 deletions(-)
> >
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -15,6 +15,7 @@
> >  #include <linux/rbtree.h>
> >  #include <linux/types.h>
> >  #include <linux/wait.h>
> > +#include <linux/timer.h>
> >
> >  struct vm_area_struct;
> >  struct mm_struct;
> > @@ -79,6 +80,10 @@ struct uprobe_task {
> >         struct return_instance          *return_instances;
> >         unsigned int                    depth;
> >         unsigned int                    active_srcu_idx;
> > +
> > +       struct timer_list               ri_timer;
> > +       struct callback_head            ri_task_work;
> > +       struct task_struct              *task;
> >  };
> >
> >  struct return_instance {
> > @@ -86,7 +91,8 @@ struct return_instance {
> >         unsigned long           func;
> >         unsigned long           stack;          /* stack pointer */
> >         unsigned long           orig_ret_vaddr; /* original return address */
> > -       bool                    chained;        /* true, if instance is nested */
> > +       u8                      chained;        /* true, if instance is nested */
> > +       u8                      has_ref;
> 
> Why bool -> u8 switch? You don't touch chained, so why change its
> type? And for has_ref you interchangeably use 0 and true for the same
> field. Let's stick to bool as there is nothing wrong with it?

sizeof(_Bool) is implementation defined. It is 1 for x86_64, but there
are platforms where it ends up begin 4 (some PowerPC ABIs among others.
I didn't want to grow this structure for no reason.

> >         int                     srcu_idx;
> >
> >         struct return_instance  *next;          /* keep as stack */
> 
> [...]
> 
> > @@ -1822,13 +1864,20 @@ static int dup_utask(struct task_struct
> >                         return -ENOMEM;
> >
> >                 *n = *o;
> > -               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> > +               if (n->uprobe) {
> > +                       if (n->has_ref)
> > +                               get_uprobe(n->uprobe);
> > +                       else
> > +                               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> > +               }
> >                 n->next = NULL;
> >
> >                 *p = n;
> >                 p = &n->next;
> >                 n_utask->depth++;
> >         }
> > +       if (n_utask->return_instances)
> > +               mod_timer(&n_utask->ri_timer, jiffies + HZ);
> 
> let's add #define for HZ, so it's adjusted in just one place (instead
> of 3 as it is right now)

Can do I suppose.

> Also, we can have up to 64 levels of uretprobe nesting, so,
> technically, the user can cause a delay of 64 seconds in total. Maybe
> let's use something smaller than a full second? After all, if the
> user-space function has high latency, then this refcount congestion is
> much less of a problem. I'd set it to something like 50-100 ms for
> starters.

Before you know it we'll have a sysctl :/ But sure, we can do something
shorter.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 10/11] perf/uprobe: Convert single-step and uretprobe to SRCU
  2024-07-12 21:28   ` [PATCH v2 10/11] perf/uprobe: Convert single-step and uretprobe " Andrii Nakryiko
@ 2024-07-15 11:59     ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2024-07-15 11:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Fri, Jul 12, 2024 at 02:28:13PM -0700, Andrii Nakryiko wrote:

> > @@ -1814,7 +1822,7 @@ static int dup_utask(struct task_struct
> >                         return -ENOMEM;
> >
> >                 *n = *o;
> > -               get_uprobe(n->uprobe);
> > +               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> 
> do we need to add this __srcu_clone_read_lock hack just to avoid
> taking a refcount in dup_utask (i.e., on process fork)? This is not
> that frequent and performance-sensitive case, so it seems like it
> should be fine to take refcount and avoid doing srcu_read_unlock() in
> a new process. Just like the case with long-running uretprobes where
> you convert SRCU lock into refcount.

Yes, I suppose that is now possible too. But it makes the patches harder
to split. Let me ponder that after I get it to pass your stress thing.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list
  2024-07-15 11:25     ` Peter Zijlstra
@ 2024-07-15 17:30       ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-15 17:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Mon, Jul 15, 2024 at 4:25 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jul 12, 2024 at 02:06:08PM -0700, Andrii Nakryiko wrote:
> > + bpf@vger, please cc bpf ML for the next revision, these changes are
> > very relevant there as well, thanks
> >
> > On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > With handle_swbp() hitting concurrently on (all) CPUs the
> > > uprobe->register_rwsem can get very contended. Add an SRCU instance to
> > > cover the consumer list and consumer lifetime.
> > >
> > > Since the consumer are externally embedded structures, unregister will
> > > have to suffer a synchronize_srcu().
> > >
> > > A notably complication is the UPROBE_HANDLER_REMOVE logic which can
> > > race against uprobe_register() such that it might want to remove a
> > > freshly installer handler that didn't get called. In order to close
> > > this hole, a seqcount is added. With that, the removal path can tell
> > > if anything changed and bail out of the removal.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  kernel/events/uprobes.c |   60 ++++++++++++++++++++++++++++++++++++++++--------
> > >  1 file changed, 50 insertions(+), 10 deletions(-)
> > >
> >
> > [...]
> >
> > > @@ -800,7 +808,7 @@ static bool consumer_del(struct uprobe *
> > >         down_write(&uprobe->consumer_rwsem);
> > >         for (con = &uprobe->consumers; *con; con = &(*con)->next) {
> > >                 if (*con == uc) {
> > > -                       *con = uc->next;
> > > +                       WRITE_ONCE(*con, uc->next);
> >
> > I have a dumb and mechanical question.
> >
> > Above in consumer_add() you are doing WRITE_ONCE() for uc->next
> > assignment, but rcu_assign_pointer() for uprobe->consumers. Here, you
> > are doing WRITE_ONCE() for the same operation, if it so happens that
> > uc == *con == uprobe->consumers. So is rcu_assign_pointer() necessary
> > in consumer_addr()? If yes, we should have it here as well, no? And if
> > not, why bother with it in consumer_add()?
>
> add is a publish and needs to ensure all stores to the object are
> ordered before the object is linked in. Note that rcu_assign_pointer()
> is basically a fancy way of writing smp_store_release().
>
> del otoh does not publish, it removes and doesn't need ordering.
>
> It does however need to ensure the pointer write itself isn't torn. That
> is, without the WRITE_ONCE() the compiler is free to do byte stores in
> order to update the 8 byte pointer value (assuming 64bit). This is
> pretty dumb, but very much permitted by C and also utterly fatal in the
> case where an RCU iteration comes by and reads a half-half pointer
> value.
>

Thanks for elaborating, very helpful! It's basically the same idea as
with rb_find_add_rcu(), got it.

> > >                         ret = true;
> > >                         break;
> > >                 }
> > > @@ -1139,9 +1147,13 @@ void uprobe_unregister(struct inode *ino
> > >                 return;
> > >
> > >         down_write(&uprobe->register_rwsem);
> > > +       raw_write_seqcount_begin(&uprobe->register_seq);
> > >         __uprobe_unregister(uprobe, uc);
> > > +       raw_write_seqcount_end(&uprobe->register_seq);
> > >         up_write(&uprobe->register_rwsem);
> > >         put_uprobe(uprobe);
> > > +
> > > +       synchronize_srcu(&uprobes_srcu);
> > >  }
> > >  EXPORT_SYMBOL_GPL(uprobe_unregister);
> >
> > [...]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 11/11] perf/uprobe: Add uretprobe timer
  2024-07-15 11:41     ` Peter Zijlstra
@ 2024-07-15 17:34       ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-15 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, andrii, oleg, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Mon, Jul 15, 2024 at 4:41 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jul 12, 2024 at 02:43:52PM -0700, Andrii Nakryiko wrote:
> > + bpf
> >
> > On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > In order to put a bound on the uretprobe_srcu critical section, add a
> > > timer to uprobe_task. Upon every RI added or removed the timer is
> > > pushed forward to now + 1s. If the timer were ever to fire, it would
> > > convert the SRCU 'reference' to a refcount reference if possible.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > >  include/linux/uprobes.h |    8 +++++
> > >  kernel/events/uprobes.c |   67 ++++++++++++++++++++++++++++++++++++++++++++----
> > >  2 files changed, 69 insertions(+), 6 deletions(-)
> > >
> > > --- a/include/linux/uprobes.h
> > > +++ b/include/linux/uprobes.h
> > > @@ -15,6 +15,7 @@
> > >  #include <linux/rbtree.h>
> > >  #include <linux/types.h>
> > >  #include <linux/wait.h>
> > > +#include <linux/timer.h>
> > >
> > >  struct vm_area_struct;
> > >  struct mm_struct;
> > > @@ -79,6 +80,10 @@ struct uprobe_task {
> > >         struct return_instance          *return_instances;
> > >         unsigned int                    depth;
> > >         unsigned int                    active_srcu_idx;
> > > +
> > > +       struct timer_list               ri_timer;
> > > +       struct callback_head            ri_task_work;
> > > +       struct task_struct              *task;
> > >  };
> > >
> > >  struct return_instance {
> > > @@ -86,7 +91,8 @@ struct return_instance {
> > >         unsigned long           func;
> > >         unsigned long           stack;          /* stack pointer */
> > >         unsigned long           orig_ret_vaddr; /* original return address */
> > > -       bool                    chained;        /* true, if instance is nested */
> > > +       u8                      chained;        /* true, if instance is nested */
> > > +       u8                      has_ref;
> >
> > Why bool -> u8 switch? You don't touch chained, so why change its
> > type? And for has_ref you interchangeably use 0 and true for the same
> > field. Let's stick to bool as there is nothing wrong with it?
>
> sizeof(_Bool) is implementation defined. It is 1 for x86_64, but there
> are platforms where it ends up begin 4 (some PowerPC ABIs among others.
> I didn't want to grow this structure for no reason.

There are tons of bools in the kernel, surprised that we (kernel
makefiles) don't do anything on PowerPC to keep it consistent with the
rest of the world. Oh well, it just kind of looks off when there is a
mix of 0 and true used for the same field.

>
> > >         int                     srcu_idx;
> > >
> > >         struct return_instance  *next;          /* keep as stack */
> >
> > [...]
> >
> > > @@ -1822,13 +1864,20 @@ static int dup_utask(struct task_struct
> > >                         return -ENOMEM;
> > >
> > >                 *n = *o;
> > > -               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> > > +               if (n->uprobe) {
> > > +                       if (n->has_ref)
> > > +                               get_uprobe(n->uprobe);
> > > +                       else
> > > +                               __srcu_clone_read_lock(&uretprobes_srcu, n->srcu_idx);
> > > +               }
> > >                 n->next = NULL;
> > >
> > >                 *p = n;
> > >                 p = &n->next;
> > >                 n_utask->depth++;
> > >         }
> > > +       if (n_utask->return_instances)
> > > +               mod_timer(&n_utask->ri_timer, jiffies + HZ);
> >
> > let's add #define for HZ, so it's adjusted in just one place (instead
> > of 3 as it is right now)
>
> Can do I suppose.

thanks!

>
> > Also, we can have up to 64 levels of uretprobe nesting, so,
> > technically, the user can cause a delay of 64 seconds in total. Maybe
> > let's use something smaller than a full second? After all, if the
> > user-space function has high latency, then this refcount congestion is
> > much less of a problem. I'd set it to something like 50-100 ms for
> > starters.
>
> Before you know it we'll have a sysctl :/ But sure, we can do something
> shorter.

:) let's hope we won't need sysctl (I don't think we will, FWIW)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 00/11] perf/uprobe: Optimize uprobes
       [not found]       ` <CAEf4BzY-r2EcQEVxA=kDUvx-wX3t0hsG+66=iKTS5ZaAJF4zjw@mail.gmail.com>
@ 2024-07-19 18:42         ` Andrii Nakryiko
  2024-07-27  0:18           ` Andrii Nakryiko
  0 siblings, 1 reply; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-19 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oleg, mingo, andrii, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Mon, Jul 15, 2024 at 11:10 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 15, 2024 at 10:10 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 15, 2024 at 7:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:
> > >
> > > > But then I also ran it on Linux built from perf/uprobes branch (these
> > > > patches), and after a few seconds I see that there is no more
> > > > attachment/detachment happening. Eventually I got splats, which you
> > > > can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> > > > to run it inside my QEMU image.
> > >
> > > So them git voodoo incantations did work and I got it built. I'm running
> > > that exact same line above (minus the sudo, because test box only has a
> > > root account I think) on real hardware.
> > >
> > > I'm now ~100 periods in and wondering what 'eventually' means...
> >
> > So I was running in a qemu set up with 16 cores on top of bare metal's
> > 80 core CPU (Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz). I just tried
> > it again, and I can reproduce it within first few periods:
> >
> > WORKING HARD!..
> >
> > PERIOD #1 STATS:
> > FUNC CALLS               919632
> > UPROBE HITS              706351
> > URETPROBE HITS           641679
> > ATTACHED LINKS              951
> > ATTACHED UPROBES           2421
> > ATTACHED URETPROBES        2343
> > MMAP CALLS                33533
> > FORKS CALLS                 241
> >
> > PERIOD #2 STATS:
> > FUNC CALLS                11444
> > UPROBE HITS               14320
> > URETPROBE HITS             9896
> > ATTACHED LINKS               26
> > ATTACHED UPROBES             75
> > ATTACHED URETPROBES          61
> > MMAP CALLS                39093
> > FORKS CALLS                  14
> >
> > PERIOD #3 STATS:
> > FUNC CALLS                  230
> > UPROBE HITS                 152
> > URETPROBE HITS              145
> > ATTACHED LINKS                2
> > ATTACHED UPROBES              2
> > ATTACHED URETPROBES           2
> > MMAP CALLS                39121
> > FORKS CALLS                   0
> >
> > PERIOD #4 STATS:
> > FUNC CALLS                    0
> > UPROBE HITS                   0
> > URETPROBE HITS                0
> > ATTACHED LINKS                0
> > ATTACHED UPROBES              0
> > ATTACHED URETPROBES           0
> > MMAP CALLS                39010
> > FORKS CALLS                   0
> >
> > You can see in the second period all the numbers drop and by period #4
> > (which is about 20 seconds in) anything but mmap()ing stops. When I
> > said "eventually" I meant about a minute tops, however long it takes
> > to do soft lockup detection, 23 seconds this time.
> >
> > So it should be very fast.
> >
> > Note that I'm running with debug kernel configuration (see [0] for
> > full kernel config), here are debug-related settings, in case that
> > makes a difference:
> >
> > $ cat ~/linux-build/default/.config | rg -i debug | rg -v '^#'
> > CONFIG_X86_DEBUGCTLMSR=y
> > CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> > CONFIG_BLK_DEBUG_FS=y
> > CONFIG_PNP_DEBUG_MESSAGES=y
> > CONFIG_AIC7XXX_DEBUG_MASK=0
> > CONFIG_AIC79XX_DEBUG_MASK=0
> > CONFIG_SCSI_MVSAS_DEBUG=y
> > CONFIG_DM_DEBUG=y
> > CONFIG_MLX4_DEBUG=y
> > CONFIG_USB_SERIAL_DEBUG=m
> > CONFIG_INFINIBAND_MTHCA_DEBUG=y
> > CONFIG_INFINIBAND_IPOIB_DEBUG=y
> > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
> > CONFIG_CIFS_DEBUG=y
> > CONFIG_DLM_DEBUG=y
> > CONFIG_DEBUG_BUGVERBOSE=y
> > CONFIG_DEBUG_KERNEL=y
> > CONFIG_DEBUG_INFO=y
> > CONFIG_DEBUG_INFO_DWARF4=y
> > CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
> > CONFIG_DEBUG_INFO_BTF=y
> > CONFIG_DEBUG_INFO_BTF_MODULES=y
> > CONFIG_DEBUG_FS=y
> > CONFIG_DEBUG_FS_ALLOW_ALL=y
> > CONFIG_ARCH_HAS_DEBUG_WX=y
> > CONFIG_HAVE_DEBUG_KMEMLEAK=y
> > CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
> > CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
> > CONFIG_SCHED_DEBUG=y
> > CONFIG_DEBUG_PREEMPT=y
> > CONFIG_LOCK_DEBUGGING_SUPPORT=y
> > CONFIG_DEBUG_RT_MUTEXES=y
> > CONFIG_DEBUG_SPINLOCK=y
> > CONFIG_DEBUG_MUTEXES=y
> > CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> > CONFIG_DEBUG_RWSEMS=y
> > CONFIG_DEBUG_LOCK_ALLOC=y
> > CONFIG_DEBUG_LOCKDEP=y
> > CONFIG_DEBUG_ATOMIC_SLEEP=y
> > CONFIG_DEBUG_IRQFLAGS=y
> > CONFIG_X86_DEBUG_FPU=y
> > CONFIG_FAULT_INJECTION_DEBUG_FS=y
> >
> >   [0] https://gist.github.com/anakryiko/97a023a95b30fb0fe607ff743433e64b
> >
> > >
> > > Also, this is a 2 socket, 10 core per socket, 2 threads per core
> > > ivybridge thing, are those parameters sufficient?
> >
> > Should be, I guess? It might be VM vs bare metal differences, though.
> > I'll try to run this on bare metal with more production-like kernel
> > configuration to see if I can still trigger this. Will let you know
> > the results when I get them.
>
> Ok, so I ran it on bare metal host with production config. I didn't
> really bother to specify parameters (so just one thread for
> everything, the default):
>
> # ./uprobe-stress
> WORKING HARD!..
>
> PERIOD #1 STATS:
> FUNC CALLS              2959843
> UPROBE HITS             1001312
> URETPROBE HITS                0
> ATTACHED LINKS                6
> ATTACHED UPROBES             28
> ATTACHED URETPROBES           0
> MMAP CALLS                 8143
> FORKS CALLS                 301
>
> PERIOD #2 STATS:
> FUNC CALLS                    0
> UPROBE HITS              822826
> URETPROBE HITS                0
> ATTACHED LINKS                0
> ATTACHED UPROBES              0
> ATTACHED URETPROBES           0
> MMAP CALLS                 8006
> FORKS CALLS                 270
>
> PERIOD #3 STATS:
> FUNC CALLS                    0
> UPROBE HITS              889534
> URETPROBE HITS                0
> ATTACHED LINKS                0
> ATTACHED UPROBES              0
> ATTACHED URETPROBES           0
> MMAP CALLS                 8004
> FORKS CALLS                 288
>
> PERIOD #4 STATS:
> FUNC CALLS                    0
> UPROBE HITS              886506
> URETPROBE HITS                0
> ATTACHED LINKS                0
> ATTACHED UPROBES              0
> ATTACHED URETPROBES           0
> MMAP CALLS                 8120
> FORKS CALLS                 285
>
> PERIOD #5 STATS:
> FUNC CALLS                    0
> UPROBE HITS              804556
> URETPROBE HITS                0
> ATTACHED LINKS                0
> ATTACHED UPROBES              0
> ATTACHED URETPROBES           0
> MMAP CALLS                 7131
> FORKS CALLS                 263
> ^C
> EXITING...
>
> Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:06:33 ...
>  kernel:[ 2194.334618] watchdog: BUG: soft lockup - CPU#71 stuck for
> 48s! [uprobe-stress:69900]
>
> It was weird on the very first period (no uretprobes, small amount of
> attachments). And sure enough (gmail will reformat below in the
> garbage, so [0] has the splat with the original formatting).
>
>   [0] https://gist.github.com/anakryiko/3e3ddcccc5ea3ca70ce90b5491485fdc
>
> I also keep getting:
>
> Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:09:41 ...
>  kernel:[ 2382.334088] watchdog: BUG: soft lockup - CPU#71 stuck for
> 223s! [uprobe-stress:69900]
>
> so it's not just a temporary slowdown
>
>
> [ 2166.893057] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 2166.904199] rcu:     71-....: (20999 ticks this GP)
> idle=2c84/1/0x4000000000000000 softirq=30158/30158 fqs=8110
> [ 2166.923810] rcu:              hardirqs   softirqs   csw/system
> [ 2166.934939] rcu:      number:        0        183            0
> [ 2166.946064] rcu:     cputime:       60          0        10438
> ==> 10549(ms)
> [ 2166.959969] rcu:     (t=21065 jiffies g=369217 q=207850 ncpus=80)
> [ 2166.971619] CPU: 71 PID: 69900 Comm: uprobe-stress Tainted: G S
>      E      6.10.0-rc7-00071-g9423ae8ef6ff #62
> [ 2166.992275] Hardware name: Quanta Tioga Pass Single Side
> 01-0032211004/Tioga Pass Single Side, BIOS F08_3A24 05/13/2020
> [ 2167.013804] RIP: 0010:uprobe_notify_resume+0x622/0xe20
> [ 2167.024064] Code: 8d 9d c0 00 00 00 48 89 df 4c 89 e6 e8 d7 f9 ff
> ff 84 c0 0f 85 c6 06 00 00 48 89 5c 24 20 41 8b 6d 58 40 f6 c5 01 74
> 23 f3 90 <eb> f2 83 7c 24 18 00 48 8b 44 24 10 0f 8e 71 01 00 00 bf 05
> 00 00
> [ 2167.061543] RSP: 0000:ffffc9004a49fe78 EFLAGS: 00000202
> [ 2167.071973] RAX: 0000000000000000 RBX: ffff88a11d307fc0 RCX: ffff88a120752c40
> [ 2167.086223] RDX: 00000000000042ec RSI: ffffc9004a49ff58 RDI: ffff88a11d307fc0
> [ 2167.100472] RBP: 0000000000000003 R08: ffff88a12516e500 R09: ffff88a12516f208
> [ 2167.114717] R10: 00000000004042ec R11: 000000000000000f R12: ffffc9004a49ff58
> [ 2167.128967] R13: ffff88a11d307f00 R14: 00000000004042ec R15: ffff88a09042e000
> [ 2167.143213] FS:  00007fd252000640(0000) GS:ffff88bfffbc0000(0000)
> knlGS:0000000000000000
> [ 2167.159368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2167.170843] CR2: 00007fd244000b60 CR3: 000000209090b001 CR4: 00000000007706f0
> [ 2167.185091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2167.199340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2167.213586] PKRU: 55555554
> [ 2167.218994] Call Trace:
> [ 2167.223883]  <IRQ>
> [ 2167.227905]  ? rcu_dump_cpu_stacks+0x77/0xd0
> [ 2167.236433]  ? print_cpu_stall+0x150/0x2a0
> [ 2167.244615]  ? rcu_sched_clock_irq+0x319/0x490
> [ 2167.253487]  ? update_process_times+0x71/0xa0
> [ 2167.262191]  ? tick_nohz_handler+0xc0/0x100
> [ 2167.270544]  ? tick_setup_sched_timer+0x170/0x170
> [ 2167.279937]  ? __hrtimer_run_queues+0xe3/0x250
> [ 2167.288815]  ? hrtimer_interrupt+0xf0/0x390
> [ 2167.297168]  ? __sysvec_apic_timer_interrupt+0x47/0x110
> [ 2167.307602]  ? sysvec_apic_timer_interrupt+0x68/0x80
> [ 2167.317519]  </IRQ>
> [ 2167.321710]  <TASK>
> [ 2167.325905]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 2167.336517]  ? uprobe_notify_resume+0x622/0xe20
> [ 2167.345565]  ? uprobe_notify_resume+0x609/0xe20
> [ 2167.354612]  ? __se_sys_futex+0xf3/0x180
> [ 2167.362445]  ? arch_uprobe_exception_notify+0x29/0x40
> [ 2167.372533]  ? notify_die+0x51/0xb0
> [ 2167.379503]  irqentry_exit_to_user_mode+0x7f/0xd0
> [ 2167.388896]  asm_exc_int3+0x35/0x40
> [ 2167.395862] RIP: 0033:0x4042ec
> [ 2167.401966] Code: fc 8b 45 fc 89 c7 e8 6f 07 00 00 83 c0 01 c9 c3
> cc 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 55 07 00 00 83 c0
> 01 c9 c3 <cc> 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 3b 07 00
> 00 83
> [ 2167.439439] RSP: 002b:00007fd251fff8a8 EFLAGS: 00000206
> [ 2167.449874] RAX: 00000000004042ec RBX: 00007fd252000640 RCX: 000000000000001c
> [ 2167.464122] RDX: 0000000000000033 RSI: 0000000000000064 RDI: 0000000000000033
> [ 2167.478368] RBP: 00007fd251fff8d0 R08: 00007fd2523fa234 R09: 00007fd2523fa280
> [ 2167.492617] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd252000640
> [ 2167.506866] R13: 0000000000000016 R14: 00007fd252289930 R15: 0000000000000000
> [ 2167.521117]  </TASK>

Peter,

Did you manage to reproduce this?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 00/11] perf/uprobe: Optimize uprobes
  2024-07-19 18:42         ` [PATCH v2 00/11] perf/uprobe: Optimize uprobes Andrii Nakryiko
@ 2024-07-27  0:18           ` Andrii Nakryiko
  0 siblings, 0 replies; 12+ messages in thread
From: Andrii Nakryiko @ 2024-07-27  0:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: oleg, mingo, andrii, linux-kernel, linux-trace-kernel, rostedt,
	mhiramat, jolsa, clm, paulmck, bpf

On Fri, Jul 19, 2024 at 11:42 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jul 15, 2024 at 11:10 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jul 15, 2024 at 10:10 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jul 15, 2024 at 7:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Thu, Jul 11, 2024 at 09:57:44PM -0700, Andrii Nakryiko wrote:
> > > >
> > > > > But then I also ran it on Linux built from perf/uprobes branch (these
> > > > > patches), and after a few seconds I see that there is no more
> > > > > attachment/detachment happening. Eventually I got splats, which you
> > > > > can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
> > > > > to run it inside my QEMU image.
> > > >
> > > > So them git voodoo incantations did work and I got it built. I'm running
> > > > that exact same line above (minus the sudo, because test box only has a
> > > > root account I think) on real hardware.
> > > >
> > > > I'm now ~100 periods in and wondering what 'eventually' means...
> > >
> > > So I was running in a qemu set up with 16 cores on top of bare metal's
> > > 80 core CPU (Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz). I just tried
> > > it again, and I can reproduce it within first few periods:
> > >
> > > WORKING HARD!..
> > >
> > > PERIOD #1 STATS:
> > > FUNC CALLS               919632
> > > UPROBE HITS              706351
> > > URETPROBE HITS           641679
> > > ATTACHED LINKS              951
> > > ATTACHED UPROBES           2421
> > > ATTACHED URETPROBES        2343
> > > MMAP CALLS                33533
> > > FORKS CALLS                 241
> > >
> > > PERIOD #2 STATS:
> > > FUNC CALLS                11444
> > > UPROBE HITS               14320
> > > URETPROBE HITS             9896
> > > ATTACHED LINKS               26
> > > ATTACHED UPROBES             75
> > > ATTACHED URETPROBES          61
> > > MMAP CALLS                39093
> > > FORKS CALLS                  14
> > >
> > > PERIOD #3 STATS:
> > > FUNC CALLS                  230
> > > UPROBE HITS                 152
> > > URETPROBE HITS              145
> > > ATTACHED LINKS                2
> > > ATTACHED UPROBES              2
> > > ATTACHED URETPROBES           2
> > > MMAP CALLS                39121
> > > FORKS CALLS                   0
> > >
> > > PERIOD #4 STATS:
> > > FUNC CALLS                    0
> > > UPROBE HITS                   0
> > > URETPROBE HITS                0
> > > ATTACHED LINKS                0
> > > ATTACHED UPROBES              0
> > > ATTACHED URETPROBES           0
> > > MMAP CALLS                39010
> > > FORKS CALLS                   0
> > >
> > > You can see in the second period all the numbers drop and by period #4
> > > (which is about 20 seconds in) anything but mmap()ing stops. When I
> > > said "eventually" I meant about a minute tops, however long it takes
> > > to do soft lockup detection, 23 seconds this time.
> > >
> > > So it should be very fast.
> > >
> > > Note that I'm running with debug kernel configuration (see [0] for
> > > full kernel config), here are debug-related settings, in case that
> > > makes a difference:
> > >
> > > $ cat ~/linux-build/default/.config | rg -i debug | rg -v '^#'
> > > CONFIG_X86_DEBUGCTLMSR=y
> > > CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> > > CONFIG_BLK_DEBUG_FS=y
> > > CONFIG_PNP_DEBUG_MESSAGES=y
> > > CONFIG_AIC7XXX_DEBUG_MASK=0
> > > CONFIG_AIC79XX_DEBUG_MASK=0
> > > CONFIG_SCSI_MVSAS_DEBUG=y
> > > CONFIG_DM_DEBUG=y
> > > CONFIG_MLX4_DEBUG=y
> > > CONFIG_USB_SERIAL_DEBUG=m
> > > CONFIG_INFINIBAND_MTHCA_DEBUG=y
> > > CONFIG_INFINIBAND_IPOIB_DEBUG=y
> > > CONFIG_INFINIBAND_IPOIB_DEBUG_DATA=y
> > > CONFIG_CIFS_DEBUG=y
> > > CONFIG_DLM_DEBUG=y
> > > CONFIG_DEBUG_BUGVERBOSE=y
> > > CONFIG_DEBUG_KERNEL=y
> > > CONFIG_DEBUG_INFO=y
> > > CONFIG_DEBUG_INFO_DWARF4=y
> > > CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
> > > CONFIG_DEBUG_INFO_BTF=y
> > > CONFIG_DEBUG_INFO_BTF_MODULES=y
> > > CONFIG_DEBUG_FS=y
> > > CONFIG_DEBUG_FS_ALLOW_ALL=y
> > > CONFIG_ARCH_HAS_DEBUG_WX=y
> > > CONFIG_HAVE_DEBUG_KMEMLEAK=y
> > > CONFIG_ARCH_HAS_DEBUG_VM_PGTABLE=y
> > > CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
> > > CONFIG_SCHED_DEBUG=y
> > > CONFIG_DEBUG_PREEMPT=y
> > > CONFIG_LOCK_DEBUGGING_SUPPORT=y
> > > CONFIG_DEBUG_RT_MUTEXES=y
> > > CONFIG_DEBUG_SPINLOCK=y
> > > CONFIG_DEBUG_MUTEXES=y
> > > CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> > > CONFIG_DEBUG_RWSEMS=y
> > > CONFIG_DEBUG_LOCK_ALLOC=y
> > > CONFIG_DEBUG_LOCKDEP=y
> > > CONFIG_DEBUG_ATOMIC_SLEEP=y
> > > CONFIG_DEBUG_IRQFLAGS=y
> > > CONFIG_X86_DEBUG_FPU=y
> > > CONFIG_FAULT_INJECTION_DEBUG_FS=y
> > >
> > >   [0] https://gist.github.com/anakryiko/97a023a95b30fb0fe607ff743433e64b
> > >
> > > >
> > > > Also, this is a 2 socket, 10 core per socket, 2 threads per core
> > > > ivybridge thing, are those parameters sufficient?
> > >
> > > Should be, I guess? It might be VM vs bare metal differences, though.
> > > I'll try to run this on bare metal with more production-like kernel
> > > configuration to see if I can still trigger this. Will let you know
> > > the results when I get them.
> >
> > Ok, so I ran it on bare metal host with production config. I didn't
> > really bother to specify parameters (so just one thread for
> > everything, the default):
> >
> > # ./uprobe-stress
> > WORKING HARD!..
> >
> > PERIOD #1 STATS:
> > FUNC CALLS              2959843
> > UPROBE HITS             1001312
> > URETPROBE HITS                0
> > ATTACHED LINKS                6
> > ATTACHED UPROBES             28
> > ATTACHED URETPROBES           0
> > MMAP CALLS                 8143
> > FORKS CALLS                 301
> >
> > PERIOD #2 STATS:
> > FUNC CALLS                    0
> > UPROBE HITS              822826
> > URETPROBE HITS                0
> > ATTACHED LINKS                0
> > ATTACHED UPROBES              0
> > ATTACHED URETPROBES           0
> > MMAP CALLS                 8006
> > FORKS CALLS                 270
> >
> > PERIOD #3 STATS:
> > FUNC CALLS                    0
> > UPROBE HITS              889534
> > URETPROBE HITS                0
> > ATTACHED LINKS                0
> > ATTACHED UPROBES              0
> > ATTACHED URETPROBES           0
> > MMAP CALLS                 8004
> > FORKS CALLS                 288
> >
> > PERIOD #4 STATS:
> > FUNC CALLS                    0
> > UPROBE HITS              886506
> > URETPROBE HITS                0
> > ATTACHED LINKS                0
> > ATTACHED UPROBES              0
> > ATTACHED URETPROBES           0
> > MMAP CALLS                 8120
> > FORKS CALLS                 285
> >
> > PERIOD #5 STATS:
> > FUNC CALLS                    0
> > UPROBE HITS              804556
> > URETPROBE HITS                0
> > ATTACHED LINKS                0
> > ATTACHED UPROBES              0
> > ATTACHED URETPROBES           0
> > MMAP CALLS                 7131
> > FORKS CALLS                 263
> > ^C
> > EXITING...
> >
> > Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:06:33 ...
> >  kernel:[ 2194.334618] watchdog: BUG: soft lockup - CPU#71 stuck for
> > 48s! [uprobe-stress:69900]
> >
> > It was weird on the very first period (no uretprobes, small amount of
> > attachments). And sure enough (gmail will reformat below in the
> > garbage, so [0] has the splat with the original formatting).
> >
> >   [0] https://gist.github.com/anakryiko/3e3ddcccc5ea3ca70ce90b5491485fdc
> >
> > I also keep getting:
> >
> > Message from syslogd@kerneltest003.10.atn6.facebook.com at Jul 15 11:09:41 ...
> >  kernel:[ 2382.334088] watchdog: BUG: soft lockup - CPU#71 stuck for
> > 223s! [uprobe-stress:69900]
> >
> > so it's not just a temporary slowdown
> >
> >
> > [ 2166.893057] rcu: INFO: rcu_sched self-detected stall on CPU
> > [ 2166.904199] rcu:     71-....: (20999 ticks this GP)
> > idle=2c84/1/0x4000000000000000 softirq=30158/30158 fqs=8110
> > [ 2166.923810] rcu:              hardirqs   softirqs   csw/system
> > [ 2166.934939] rcu:      number:        0        183            0
> > [ 2166.946064] rcu:     cputime:       60          0        10438
> > ==> 10549(ms)
> > [ 2166.959969] rcu:     (t=21065 jiffies g=369217 q=207850 ncpus=80)
> > [ 2166.971619] CPU: 71 PID: 69900 Comm: uprobe-stress Tainted: G S
> >      E      6.10.0-rc7-00071-g9423ae8ef6ff #62
> > [ 2166.992275] Hardware name: Quanta Tioga Pass Single Side
> > 01-0032211004/Tioga Pass Single Side, BIOS F08_3A24 05/13/2020
> > [ 2167.013804] RIP: 0010:uprobe_notify_resume+0x622/0xe20
> > [ 2167.024064] Code: 8d 9d c0 00 00 00 48 89 df 4c 89 e6 e8 d7 f9 ff
> > ff 84 c0 0f 85 c6 06 00 00 48 89 5c 24 20 41 8b 6d 58 40 f6 c5 01 74
> > 23 f3 90 <eb> f2 83 7c 24 18 00 48 8b 44 24 10 0f 8e 71 01 00 00 bf 05
> > 00 00
> > [ 2167.061543] RSP: 0000:ffffc9004a49fe78 EFLAGS: 00000202
> > [ 2167.071973] RAX: 0000000000000000 RBX: ffff88a11d307fc0 RCX: ffff88a120752c40
> > [ 2167.086223] RDX: 00000000000042ec RSI: ffffc9004a49ff58 RDI: ffff88a11d307fc0
> > [ 2167.100472] RBP: 0000000000000003 R08: ffff88a12516e500 R09: ffff88a12516f208
> > [ 2167.114717] R10: 00000000004042ec R11: 000000000000000f R12: ffffc9004a49ff58
> > [ 2167.128967] R13: ffff88a11d307f00 R14: 00000000004042ec R15: ffff88a09042e000
> > [ 2167.143213] FS:  00007fd252000640(0000) GS:ffff88bfffbc0000(0000)
> > knlGS:0000000000000000
> > [ 2167.159368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2167.170843] CR2: 00007fd244000b60 CR3: 000000209090b001 CR4: 00000000007706f0
> > [ 2167.185091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 2167.199340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 2167.213586] PKRU: 55555554
> > [ 2167.218994] Call Trace:
> > [ 2167.223883]  <IRQ>
> > [ 2167.227905]  ? rcu_dump_cpu_stacks+0x77/0xd0
> > [ 2167.236433]  ? print_cpu_stall+0x150/0x2a0
> > [ 2167.244615]  ? rcu_sched_clock_irq+0x319/0x490
> > [ 2167.253487]  ? update_process_times+0x71/0xa0
> > [ 2167.262191]  ? tick_nohz_handler+0xc0/0x100
> > [ 2167.270544]  ? tick_setup_sched_timer+0x170/0x170
> > [ 2167.279937]  ? __hrtimer_run_queues+0xe3/0x250
> > [ 2167.288815]  ? hrtimer_interrupt+0xf0/0x390
> > [ 2167.297168]  ? __sysvec_apic_timer_interrupt+0x47/0x110
> > [ 2167.307602]  ? sysvec_apic_timer_interrupt+0x68/0x80
> > [ 2167.317519]  </IRQ>
> > [ 2167.321710]  <TASK>
> > [ 2167.325905]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > [ 2167.336517]  ? uprobe_notify_resume+0x622/0xe20
> > [ 2167.345565]  ? uprobe_notify_resume+0x609/0xe20
> > [ 2167.354612]  ? __se_sys_futex+0xf3/0x180
> > [ 2167.362445]  ? arch_uprobe_exception_notify+0x29/0x40
> > [ 2167.372533]  ? notify_die+0x51/0xb0
> > [ 2167.379503]  irqentry_exit_to_user_mode+0x7f/0xd0
> > [ 2167.388896]  asm_exc_int3+0x35/0x40
> > [ 2167.395862] RIP: 0033:0x4042ec
> > [ 2167.401966] Code: fc 8b 45 fc 89 c7 e8 6f 07 00 00 83 c0 01 c9 c3
> > cc 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 55 07 00 00 83 c0
> > 01 c9 c3 <cc> 48 89 e5 48 83 ec 10 89 7d fc 8b 45 fc 89 c7 e8 3b 07 00
> > 00 83
> > [ 2167.439439] RSP: 002b:00007fd251fff8a8 EFLAGS: 00000206
> > [ 2167.449874] RAX: 00000000004042ec RBX: 00007fd252000640 RCX: 000000000000001c
> > [ 2167.464122] RDX: 0000000000000033 RSI: 0000000000000064 RDI: 0000000000000033
> > [ 2167.478368] RBP: 00007fd251fff8d0 R08: 00007fd2523fa234 R09: 00007fd2523fa280
> > [ 2167.492617] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd252000640
> > [ 2167.506866] R13: 0000000000000016 R14: 00007fd252289930 R15: 0000000000000000
> > [ 2167.521117]  </TASK>
>
> Peter,
>
> Did you manage to reproduce this?

Ping.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-07-27  0:19 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240711110235.098009979@infradead.org>
     [not found] ` <20240711110400.880800153@infradead.org>
2024-07-12 21:06   ` [PATCH v2 06/11] perf/uprobe: SRCU-ify uprobe->consumer list Andrii Nakryiko
2024-07-15 11:25     ` Peter Zijlstra
2024-07-15 17:30       ` Andrii Nakryiko
     [not found] ` <20240711110400.987380024@infradead.org>
2024-07-12 21:10   ` [PATCH v2 07/11] perf/uprobe: Split uprobe_unregister() Andrii Nakryiko
     [not found] ` <20240711110401.096506262@infradead.org>
2024-07-12 21:21   ` [PATCH v2 08/11] perf/uprobe: Convert (some) uprobe->refcount to SRCU Andrii Nakryiko
     [not found] ` <20240711110401.311168524@infradead.org>
2024-07-12 21:28   ` [PATCH v2 10/11] perf/uprobe: Convert single-step and uretprobe " Andrii Nakryiko
2024-07-15 11:59     ` Peter Zijlstra
     [not found] ` <20240711110401.412779774@infradead.org>
2024-07-12 21:43   ` [PATCH v2 11/11] perf/uprobe: Add uretprobe timer Andrii Nakryiko
2024-07-15 11:41     ` Peter Zijlstra
2024-07-15 17:34       ` Andrii Nakryiko
     [not found] ` <CAEf4BzZ+ygwfk8FKn5AS_Ny=igvGcFzdDLE2FjcvwjCKazEWMA@mail.gmail.com>
     [not found]   ` <20240715144536.GI14400@noisy.programming.kicks-ass.net>
     [not found]     ` <CAEf4BzZuR883FEuKAXp3DY1iJcL+ST8eNq5ioq8oRpDyg0w8Kw@mail.gmail.com>
     [not found]       ` <CAEf4BzY-r2EcQEVxA=kDUvx-wX3t0hsG+66=iKTS5ZaAJF4zjw@mail.gmail.com>
2024-07-19 18:42         ` [PATCH v2 00/11] perf/uprobe: Optimize uprobes Andrii Nakryiko
2024-07-27  0:18           ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox