* [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
@ 2024-08-09 19:23 Andrii Nakryiko
  2024-08-13 13:30 ` Masami Hiramatsu
  2024-08-13 14:50 ` Oleg Nesterov
  0 siblings, 2 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2024-08-09 19:23 UTC (permalink / raw)
  To: linux-trace-kernel, rostedt, mhiramat
  Cc: peterz, oleg, bpf, linux-kernel, jolsa, Andrii Nakryiko

trace_uprobe->nhit counter is not incremented atomically, so its value
is questionable when the uprobe is hit on multiple CPUs simultaneously.

Also, incrementing this shared counter across many CPUs causes heavy
cache line bouncing, limiting uprobe/uretprobe performance scaling with
the number of CPUs.

Solve both problems by making this a per-CPU counter.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/trace/trace_uprobe.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 52e76a73fa7c..002f801a7ab4 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/rculist.h>
 #include <linux/filter.h>
+#include <linux/percpu.h>
 
 #include "trace_dynevent.h"
 #include "trace_probe.h"
@@ -62,7 +63,7 @@ struct trace_uprobe {
 	struct uprobe			*uprobe;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
-	unsigned long			nhit;
+	unsigned long __percpu		*nhits;
 	struct trace_probe		tp;
 };
 
@@ -337,6 +338,12 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
 	if (!tu)
 		return ERR_PTR(-ENOMEM);
 
+	tu->nhits = alloc_percpu(unsigned long);
+	if (!tu->nhits) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
 	ret = trace_probe_init(&tu->tp, event, group, true, nargs);
 	if (ret < 0)
 		goto error;
@@ -349,6 +356,7 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
 	return tu;
 
 error:
+	free_percpu(tu->nhits);
 	kfree(tu);
 
 	return ERR_PTR(ret);
@@ -362,6 +370,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
 	path_put(&tu->path);
 	trace_probe_cleanup(&tu->tp);
 	kfree(tu->filename);
+	free_percpu(tu->nhits);
 	kfree(tu);
 }
 
@@ -815,13 +824,21 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
 {
 	struct dyn_event *ev = v;
 	struct trace_uprobe *tu;
+	unsigned long nhits;
+	int cpu;
 
 	if (!is_trace_uprobe(ev))
 		return 0;
 
 	tu = to_trace_uprobe(ev);
+
+	nhits = 0;
+	for_each_possible_cpu(cpu) {
+		nhits += READ_ONCE(*per_cpu_ptr(tu->nhits, cpu));
+	}
+
 	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
-			trace_probe_name(&tu->tp), tu->nhit);
+		   trace_probe_name(&tu->tp), nhits);
 	return 0;
 }
 
@@ -1507,7 +1524,8 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 	int ret = 0;
 
 	tu = container_of(con, struct trace_uprobe, consumer);
-	tu->nhit++;
+
+	this_cpu_inc(*tu->nhits);
 
 	udd.tu = tu;
 	udd.bp_addr = instruction_pointer(regs);
-- 
2.43.5
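
The idea in the commit message can be tried outside the kernel with a
small userspace sketch. This is only an analogy, not the kernel
mechanism: threads stand in for CPUs, a padded array slot stands in for
each CPU's copy of the counter, and the names hit_slot, worker_arg and
count_hits() are hypothetical. The 64-byte cache line size is an
assumption as well. Each writer touches only its own slot, so no update
is lost and no cache line is shared between writers; the reader sums
all slots, much as probes_profile_seq_show() does in the patch.

```c
/* Userspace analogy only: threads stand in for CPUs. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define CACHE_LINE 64	/* assumed cache line size */

/* One slot per worker; the padding keeps counters on separate lines. */
struct hit_slot {
	unsigned long nhits;
	char pad[CACHE_LINE - sizeof(unsigned long)];
};

struct worker_arg {
	struct hit_slot *slot;
	unsigned long iters;
};

static void *worker(void *p)
{
	struct worker_arg *arg = p;
	unsigned long i;

	/* Only this thread writes its slot, so a plain increment is safe. */
	for (i = 0; i < arg->iters; i++)
		arg->slot->nhits++;
	return NULL;
}

/* Run nthreads writers, then sum the slots as the reader side would. */
unsigned long count_hits(int nthreads, unsigned long iters)
{
	struct hit_slot *slots;
	struct worker_arg *args;
	pthread_t *tids;
	unsigned long total = 0;
	int i;

	assert(nthreads > 0);
	slots = calloc(nthreads, sizeof(*slots));
	args = calloc(nthreads, sizeof(*args));
	tids = calloc(nthreads, sizeof(*tids));
	if (!slots || !args || !tids)
		abort();

	for (i = 0; i < nthreads; i++) {
		args[i].slot = &slots[i];
		args[i].iters = iters;
		if (pthread_create(&tids[i], NULL, worker, &args[i]))
			abort();
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);

	/* Reader side: walk every slot and add up the partial counts. */
	for (i = 0; i < nthreads; i++)
		total += slots[i].nhits;

	free(tids);
	free(args);
	free(slots);
	return total;
}
```

Calling count_hits(4, 100000) should return exactly 400000, whereas a
single unsigned long incremented with a plain ++ from all four threads
could come up short. The cost moves to the reader, which has to visit
every slot, an acceptable trade when reads are rare.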



* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-09 19:23 [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one Andrii Nakryiko
@ 2024-08-13 13:30 ` Masami Hiramatsu
  2024-08-13 15:41   ` Oleg Nesterov
  2024-08-13 14:50 ` Oleg Nesterov
  1 sibling, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2024-08-13 13:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, rostedt, peterz, oleg, bpf, linux-kernel,
	jolsa

On Fri,  9 Aug 2024 12:23:57 -0700
Andrii Nakryiko <andrii@kernel.org> wrote:

> trace_uprobe->nhit counter is not incremented atomically, so its value
> is questionable when the uprobe is hit on multiple CPUs simultaneously.
> 
> Also, incrementing this shared counter across many CPUs causes heavy
> cache line bouncing, limiting uprobe/uretprobe performance scaling with
> the number of CPUs.
> 
> Solve both problems by making this a per-CPU counter.
> 

This looks good to me. I would like to pick this up for linux-trace/probes/for-next.

> @@ -62,7 +63,7 @@ struct trace_uprobe {
>  	struct uprobe			*uprobe;

BTW, what is this change? I couldn't cleanly apply this to the v6.11-rc3.
Which tree are you working on? (Did I miss something?)

Thanks,

>  	unsigned long			offset;
>  	unsigned long			ref_ctr_offset;
> -	unsigned long			nhit;
> +	unsigned long __percpu		*nhits;
>  	struct trace_probe		tp;
>  };
>  
> @@ -337,6 +338,12 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
>  	if (!tu)
>  		return ERR_PTR(-ENOMEM);
>  
> +	tu->nhits = alloc_percpu(unsigned long);
> +	if (!tu->nhits) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +
>  	ret = trace_probe_init(&tu->tp, event, group, true, nargs);
>  	if (ret < 0)
>  		goto error;
> @@ -349,6 +356,7 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
>  	return tu;
>  
>  error:
> +	free_percpu(tu->nhits);
>  	kfree(tu);
>  
>  	return ERR_PTR(ret);
> @@ -362,6 +370,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
>  	path_put(&tu->path);
>  	trace_probe_cleanup(&tu->tp);
>  	kfree(tu->filename);
> +	free_percpu(tu->nhits);
>  	kfree(tu);
>  }
>  
> @@ -815,13 +824,21 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
>  {
>  	struct dyn_event *ev = v;
>  	struct trace_uprobe *tu;
> +	unsigned long nhits;
> +	int cpu;
>  
>  	if (!is_trace_uprobe(ev))
>  		return 0;
>  
>  	tu = to_trace_uprobe(ev);
> +
> +	nhits = 0;
> +	for_each_possible_cpu(cpu) {
> +		nhits += READ_ONCE(*per_cpu_ptr(tu->nhits, cpu));
> +	}
> +
>  	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> -			trace_probe_name(&tu->tp), tu->nhit);
> +		   trace_probe_name(&tu->tp), nhits);
>  	return 0;
>  }
>  
> @@ -1507,7 +1524,8 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
>  	int ret = 0;
>  
>  	tu = container_of(con, struct trace_uprobe, consumer);
> -	tu->nhit++;
> +
> +	this_cpu_inc(*tu->nhits);
>  
>  	udd.tu = tu;
>  	udd.bp_addr = instruction_pointer(regs);
> -- 
> 2.43.5
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-09 19:23 [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one Andrii Nakryiko
  2024-08-13 13:30 ` Masami Hiramatsu
@ 2024-08-13 14:50 ` Oleg Nesterov
  2024-08-13 17:05   ` Andrii Nakryiko
  1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2024-08-13 14:50 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, rostedt, mhiramat, peterz, bpf, linux-kernel,
	jolsa

On 08/09, Andrii Nakryiko wrote:
>
> @@ -815,13 +824,21 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
>  {
>  	struct dyn_event *ev = v;
>  	struct trace_uprobe *tu;
> +	unsigned long nhits;
> +	int cpu;
>
>  	if (!is_trace_uprobe(ev))
>  		return 0;
>
>  	tu = to_trace_uprobe(ev);
> +
> +	nhits = 0;
> +	for_each_possible_cpu(cpu) {
> +		nhits += READ_ONCE(*per_cpu_ptr(tu->nhits, cpu));

why not

		nhits += per_cpu(*tu->nhits, cpu);

?

See for example per_cpu_sum() or nr_processes(), per_cpu() should work just fine...

Other than that

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
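
For readers comparing the two spellings discussed above: in the kernel,
per_cpu(var, cpu) is defined as (*per_cpu_ptr(&(var), cpu)), so
per_cpu(*tu->nhits, cpu) reaches the same per-CPU slot as
*per_cpu_ptr(tu->nhits, cpu), just without the READ_ONCE() wrapper. The
toy model below illustrates that equivalence in userspace. Everything
prefixed toy_ is hypothetical, and the layout (one fixed-size area per
CPU, with a per-CPU pointer pointing into CPU 0's area) is a
simplifying assumption rather than how the kernel arranges per-CPU
memory.

```c
/* Toy userspace model; not kernel code. Needs GCC/Clang __typeof__. */
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Each toy "CPU" owns one copy of this area. */
struct toy_area {
	unsigned long nhits;
};

static struct toy_area toy_areas[TOY_NR_CPUS];

/* Shift a pointer into CPU 0's area to the same field in cpu's area. */
#define toy_per_cpu_ptr(ptr, cpu) \
	((__typeof__(ptr))((char *)(ptr) + \
			   (size_t)(cpu) * sizeof(struct toy_area)))

/* Mirrors the kernel's definition of per_cpu() on top of per_cpu_ptr(). */
#define toy_per_cpu(var, cpu)	(*toy_per_cpu_ptr(&(var), cpu))

/* Sum in the style of the v2 patch: dereference the shifted pointer. */
unsigned long toy_sum_ptr(unsigned long *nhits)
{
	unsigned long total = 0;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		total += *toy_per_cpu_ptr(nhits, cpu);
	return total;
}

/* Sum in the style suggested in review: name the variable directly. */
unsigned long toy_sum_var(unsigned long *nhits)
{
	unsigned long total = 0;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		total += toy_per_cpu(*nhits, cpu);
	return total;
}
```

Passing &toy_areas[0].nhits to either function yields the same total,
which is the sense in which per_cpu() "should work just fine" here.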



* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-13 13:30 ` Masami Hiramatsu
@ 2024-08-13 15:41   ` Oleg Nesterov
  2024-08-25 10:15     ` Masami Hiramatsu
  0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2024-08-13 15:41 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, peterz, bpf,
	linux-kernel, jolsa

On 08/13, Masami Hiramatsu wrote:
>
> > @@ -62,7 +63,7 @@ struct trace_uprobe {
> >  	struct uprobe			*uprobe;
>
> BTW, what is this change? I couldn't cleanly apply this to the v6.11-rc3.
> Which tree are you working on? (Did I miss something?)

tip/perf/core

See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/diff/kernel/trace/trace_uprobe.c?h=perf/core&id=3c83a9ad0295eb63bdeb81d821b8c3b9417fbcac

Oleg.



* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-13 14:50 ` Oleg Nesterov
@ 2024-08-13 17:05   ` Andrii Nakryiko
  0 siblings, 0 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2024-08-13 17:05 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, mhiramat, peterz,
	bpf, linux-kernel, jolsa

On Tue, Aug 13, 2024 at 7:50 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 08/09, Andrii Nakryiko wrote:
> >
> > @@ -815,13 +824,21 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> >  {
> >       struct dyn_event *ev = v;
> >       struct trace_uprobe *tu;
> > +     unsigned long nhits;
> > +     int cpu;
> >
> >       if (!is_trace_uprobe(ev))
> >               return 0;
> >
> >       tu = to_trace_uprobe(ev);
> > +
> > +     nhits = 0;
> > +     for_each_possible_cpu(cpu) {
> > +             nhits += READ_ONCE(*per_cpu_ptr(tu->nhits, cpu));
>
> why not
>
>                 nhits += per_cpu(*tu->nhits, cpu);
>
> ?
>
> See for example per_cpu_sum() or nr_processes(), per_cpu() should work just fine...
>

I just monkeyed it from some existing code somewhere in the BPF code
base. I like per_cpu, will send a v3 and rebase it onto a linux-trace
tree.

> Other than that
>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
>


* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-13 15:41   ` Oleg Nesterov
@ 2024-08-25 10:15     ` Masami Hiramatsu
  2024-08-26 16:17       ` Andrii Nakryiko
  0 siblings, 1 reply; 7+ messages in thread
From: Masami Hiramatsu @ 2024-08-25 10:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, peterz, bpf,
	linux-kernel, jolsa

On Tue, 13 Aug 2024 17:41:04 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 08/13, Masami Hiramatsu wrote:
> >
> > > @@ -62,7 +63,7 @@ struct trace_uprobe {
> > >  	struct uprobe			*uprobe;
> >
> > BTW, what is this change? I couldn't cleanly apply this to the v6.11-rc3.
> > Which tree are you working on? (Did I miss something?)
> 
> tip/perf/core
> 
> See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/diff/kernel/trace/trace_uprobe.c?h=perf/core&id=3c83a9ad0295eb63bdeb81d821b8c3b9417fbcac

OK, let me consider rebasing onto tip/perf/core.

Thank you,

> 
> Oleg.
> 


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH v2] uprobes: make trace_uprobe->nhit counter a per-CPU one
  2024-08-25 10:15     ` Masami Hiramatsu
@ 2024-08-26 16:17       ` Andrii Nakryiko
  0 siblings, 0 replies; 7+ messages in thread
From: Andrii Nakryiko @ 2024-08-26 16:17 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Oleg Nesterov, Andrii Nakryiko, linux-trace-kernel, rostedt,
	peterz, bpf, linux-kernel, jolsa

On Sun, Aug 25, 2024 at 3:15 AM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 13 Aug 2024 17:41:04 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 08/13, Masami Hiramatsu wrote:
> > >
> > > > @@ -62,7 +63,7 @@ struct trace_uprobe {
> > > >   struct uprobe                   *uprobe;
> > >
> > > BTW, what is this change? I couldn't cleanly apply this to the v6.11-rc3.
> > > Which tree are you working on? (Did I miss something?)
> >
> > tip/perf/core
> >
> > See https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/diff/kernel/trace/trace_uprobe.c?h=perf/core&id=3c83a9ad0295eb63bdeb81d821b8c3b9417fbcac
>
> OK, let me consider rebasing onto tip/perf/core.
>

Hey Masami,

I've posted v3 rebased onto linux-trace/probes/for-next, so you
shouldn't need to rebase anything just for this. See [0] for the
latest revision.

  [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/

> Thank you,
>
> >
> > Oleg.
> >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
