* [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
@ 2024-08-05 20:28 Andrii Nakryiko
  2024-08-06  7:37 ` Jiri Olsa
  0 siblings, 1 reply; 5+ messages in thread
From: Andrii Nakryiko @ 2024-08-05 20:28 UTC (permalink / raw)
  To: linux-trace-kernel, rostedt, mhiramat
  Cc: peterz, oleg, bpf, linux-kernel, jolsa, Andrii Nakryiko

trace_uprobe->nhit counter is not incremented atomically, so its value
is bogus in practice. On the other hand, it's actually a pretty big
uprobe scalability problem due to heavy cache line bouncing between CPUs
triggering the same uprobe.

Drop it and emit an obviously unrealistic value in its stead in the
uprobe_profile seq file.

The alternative would be allocating per-CPU counter, but I'm not sure
it's justified.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/trace/trace_uprobe.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 52e76a73fa7c..5d38207db479 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -62,7 +62,6 @@ struct trace_uprobe {
 	struct uprobe			*uprobe;
 	unsigned long			offset;
 	unsigned long			ref_ctr_offset;
-	unsigned long			nhit;
 	struct trace_probe		tp;
 };
 
@@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
 
 	tu = to_trace_uprobe(ev);
 	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
-			trace_probe_name(&tu->tp), tu->nhit);
+		   trace_probe_name(&tu->tp), ULONG_MAX);
 	return 0;
 }
 
@@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
 	int ret = 0;
 
 	tu = container_of(con, struct trace_uprobe, consumer);
-	tu->nhit++;
 
 	udd.tu = tu;
 	udd.bp_addr = instruction_pointer(regs);
-- 
2.43.5



* Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
  2024-08-05 20:28 [PATCH] uprobes: get rid of bogus trace_uprobe hit counter Andrii Nakryiko
@ 2024-08-06  7:37 ` Jiri Olsa
  2024-08-06 17:26   ` Andrii Nakryiko
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2024-08-06  7:37 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: linux-trace-kernel, rostedt, mhiramat, peterz, oleg, bpf,
	linux-kernel

On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> trace_uprobe->nhit counter is not incremented atomically, so its value
> is bogus in practice. On the other hand, it's actually a pretty big
> uprobe scalability problem due to heavy cache line bouncing between CPUs
> triggering the same uprobe.

so you're seeing that in the benchmark, right? I'm curious how bad
the numbers are

> 
> Drop it and emit an obviously unrealistic value in its stead in the
> uprobe_profile seq file.
> 
> The alternative would be allocating per-CPU counter, but I'm not sure
> it's justified.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  kernel/trace/trace_uprobe.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 52e76a73fa7c..5d38207db479 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -62,7 +62,6 @@ struct trace_uprobe {
>  	struct uprobe			*uprobe;
>  	unsigned long			offset;
>  	unsigned long			ref_ctr_offset;
> -	unsigned long			nhit;
>  	struct trace_probe		tp;
>  };
>  
> @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
>  
>  	tu = to_trace_uprobe(ev);
>  	seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> -			trace_probe_name(&tu->tp), tu->nhit);
> +		   trace_probe_name(&tu->tp), ULONG_MAX);

seems harsh.. would it be that bad to create per cpu counter for that?

jirka

>  	return 0;
>  }
>  
> @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
>  	int ret = 0;
>  
>  	tu = container_of(con, struct trace_uprobe, consumer);
> -	tu->nhit++;
>  
>  	udd.tu = tu;
>  	udd.bp_addr = instruction_pointer(regs);
> -- 
> 2.43.5
> 


* Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
  2024-08-06  7:37 ` Jiri Olsa
@ 2024-08-06 17:26   ` Andrii Nakryiko
  2024-08-07 21:43     ` Masami Hiramatsu
  0 siblings, 1 reply; 5+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 17:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, linux-trace-kernel, rostedt, mhiramat, peterz,
	oleg, bpf, linux-kernel

On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > trace_uprobe->nhit counter is not incremented atomically, so its value
> > is bogus in practice. On the other hand, it's actually a pretty big
> > uprobe scalability problem due to heavy cache line bouncing between CPUs
> > triggering the same uprobe.
>
> so you're seeing that in the benchmark, right? I'm curious how bad
> the numbers are
>

Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
work), this one was the last limiter to linear scalability.

With this counter, I was topping out at about 12 mln/s uprobe
triggering (I think it was 32 CPUs, but I don't remember exactly now).
About 30% of CPU cycles were spent in this increment.

But those 30% don't paint the full picture. Once the counter is
removed, the same uprobe throughput jumps to 62 mln/s or so. So we
definitely have to do something about it.
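
For anyone curious why a plain shared increment behaves this badly, here is a
minimal user-space analogue (purely illustrative, not the actual benchmark or
any kernel code): N threads bump one unsynchronized counter the same way
uprobe_dispatcher() bumped tu->nhit. The final value comes up short because
the racy read-modify-write loses updates, and the shared cache line bounces
between CPUs the whole time:

  /* build with: gcc -O2 -pthread counter-race.c (file name is illustrative) */
  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 8
  #define ITERS    1000000UL

  static unsigned long nhit;	/* shared, bumped with no synchronization */

  static void *hammer(void *arg)
  {
  	unsigned long i;

  	for (i = 0; i < ITERS; i++)
  		nhit++;		/* load + add + store: concurrent updates get lost */
  	return NULL;
  }

  int main(void)
  {
  	pthread_t tids[NTHREADS];
  	int i;

  	for (i = 0; i < NTHREADS; i++)
  		pthread_create(&tids[i], NULL, hammer, NULL);
  	for (i = 0; i < NTHREADS; i++)
  		pthread_join(tids[i], NULL);

  	/* expected 8000000, typically prints noticeably less */
  	printf("nhit = %lu (expected %lu)\n", nhit, NTHREADS * ITERS);
  	return 0;
  }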

> >
> > Drop it and emit an obviously unrealistic value in its stead in the
> > uprobe_profile seq file.
> >
> > The alternative would be allocating per-CPU counter, but I'm not sure
> > it's justified.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  kernel/trace/trace_uprobe.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index 52e76a73fa7c..5d38207db479 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -62,7 +62,6 @@ struct trace_uprobe {
> >       struct uprobe                   *uprobe;
> >       unsigned long                   offset;
> >       unsigned long                   ref_ctr_offset;
> > -     unsigned long                   nhit;
> >       struct trace_probe              tp;
> >  };
> >
> > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> >
> >       tu = to_trace_uprobe(ev);
> >       seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > -                     trace_probe_name(&tu->tp), tu->nhit);
> > +                trace_probe_name(&tu->tp), ULONG_MAX);
>
> seems harsh.. would it be that bad to create per cpu counter for that?

Well, consider this patch a conversation starter. There are two
reasons why I'm removing the counter instead of doing per-CPU one:

  - it's less work to send out a patch pointing out the problem (but
the solution might change)
  - this counter was never correct in the presence of multiple
threads, so I'm not sure how useful it is.

Yes, I think we can do per-CPU counters, but do we want to pay the
memory price? That's what I want to get from Masami, Steven, or Peter
(whoever cares enough).
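
To make the memory price concrete, here is a rough sketch (untested, not the
posted patch) of what the per-CPU variant could look like; the cost is one
unsigned long per possible CPU for each trace_uprobe:

  struct trace_uprobe {
  	struct uprobe			*uprobe;
  	unsigned long			offset;
  	unsigned long			ref_ctr_offset;
  	unsigned long __percpu		*nhits;	/* sizeof(long) * nr_possible_cpus */
  	struct trace_probe		tp;
  };

  /* at probe creation time (e.g. in alloc_trace_uprobe()) */
  tu->nhits = alloc_percpu(unsigned long);
  if (!tu->nhits)
  	return ERR_PTR(-ENOMEM);
  /* ... and free_percpu(tu->nhits) on probe teardown */

  /* hot path in uprobe_dispatcher(): bump only the local CPU's slot */
  this_cpu_inc(*tu->nhits);

  /* read side in probes_profile_seq_show(): sum the per-CPU slots */
  unsigned long nhits = 0;
  int cpu;

  for_each_possible_cpu(cpu)
  	nhits += per_cpu(*tu->nhits, cpu);

  seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
  	   trace_probe_name(&tu->tp), nhits);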

>
> jirka
>
> >       return 0;
> >  }
> >
> > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> >       int ret = 0;
> >
> >       tu = container_of(con, struct trace_uprobe, consumer);
> > -     tu->nhit++;
> >
> >       udd.tu = tu;
> >       udd.bp_addr = instruction_pointer(regs);
> > --
> > 2.43.5
> >


* Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
  2024-08-06 17:26   ` Andrii Nakryiko
@ 2024-08-07 21:43     ` Masami Hiramatsu
  2024-08-09 18:43       ` Andrii Nakryiko
  0 siblings, 1 reply; 5+ messages in thread
From: Masami Hiramatsu @ 2024-08-07 21:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiri Olsa, Andrii Nakryiko, linux-trace-kernel, rostedt, mhiramat,
	peterz, oleg, bpf, linux-kernel

On Tue, 6 Aug 2024 10:26:25 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > > trace_uprobe->nhit counter is not incremented atomically, so its value
> > > is bogus in practice. On the other hand, it's actually a pretty big
> > > uprobe scalability problem due to heavy cache line bouncing between CPUs
> > > triggering the same uprobe.
> >
> > so you're seeing that in the benchmark, right? I'm curious how bad
> > the numbers are
> >
> 
> Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
> work), this one was the last limiter to linear scalability.
> 
> With this counter, I was topping out at about 12 mln/s uprobe
> triggering (I think it was 32 CPUs, but I don't remember exactly now).
> About 30% of CPU cycles were spent in this increment.
> 
> But those 30% don't paint the full picture. Once the counter is
> removed, the same uprobe throughput jumps to 62 mln/s or so. So we
> definitely have to do something about it.
> 
> > >
> > > Drop it and emit an obviously unrealistic value in its stead in the
> > > uprobe_profile seq file.
> > >
> > > The alternative would be allocating per-CPU counter, but I'm not sure
> > > it's justified.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  kernel/trace/trace_uprobe.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > index 52e76a73fa7c..5d38207db479 100644
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > >       struct uprobe                   *uprobe;
> > >       unsigned long                   offset;
> > >       unsigned long                   ref_ctr_offset;
> > > -     unsigned long                   nhit;
> > >       struct trace_probe              tp;
> > >  };
> > >
> > > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> > >
> > >       tu = to_trace_uprobe(ev);
> > >       seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > > -                     trace_probe_name(&tu->tp), tu->nhit);
> > > +                trace_probe_name(&tu->tp), ULONG_MAX);
> >
> > seems harsh.. would it be that bad to create per cpu counter for that?
> 
> Well, consider this patch a conversation starter. There are two
> reasons why I'm removing the counter instead of doing per-CPU one:
> 
>   - it's less work to send out a patch pointing out the problem (but
> the solution might change)
>   - this counter was never correct in the presence of multiple
> threads, so I'm not sure how useful it is.
> 
> Yes, I think we can do per-CPU counters, but do we want to pay the
> memory price? That's what I want to get from Masami, Steven, or Peter
> (whoever cares enough).

I would like to make it a per-cpu counter *and* make it kconfig optional.
Or just remove it along with the file (but that changes the user interface
without offering an option).

For kprobes, the profile file is useful because it shows the "missed"
counter, which tells the user whether the trace data dropped some events or
not. But if the uprobes profile only shows the number of hits, we can use
the histogram trigger if needed.

Thank you,

> 
> >
> > jirka
> >
> > >       return 0;
> > >  }
> > >
> > > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > >       int ret = 0;
> > >
> > >       tu = container_of(con, struct trace_uprobe, consumer);
> > > -     tu->nhit++;
> > >
> > >       udd.tu = tu;
> > >       udd.bp_addr = instruction_pointer(regs);
> > > --
> > > 2.43.5
> > >


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>


* Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
  2024-08-07 21:43     ` Masami Hiramatsu
@ 2024-08-09 18:43       ` Andrii Nakryiko
  0 siblings, 0 replies; 5+ messages in thread
From: Andrii Nakryiko @ 2024-08-09 18:43 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Andrii Nakryiko, linux-trace-kernel, rostedt, peterz,
	oleg, bpf, linux-kernel

On Wed, Aug 7, 2024 at 2:44 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 6 Aug 2024 10:26:25 -0700
> Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> > On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > > > trace_uprobe->nhit counter is not incremented atomically, so its value
> > > > is bogus in practice. On the other hand, it's actually a pretty big
> > > > uprobe scalability problem due to heavy cache line bouncing between CPUs
> > > > triggering the same uprobe.
> > >
> > > so you're seeing that in the benchmark, right? I'm curious how bad
> > > the numbers are
> > >
> >
> > Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
> > work), this one was the last limiter to linear scalability.
> >
> > With this counter, I was topping out at about 12 mln/s uprobe
> > triggering (I think it was 32 CPUs, but I don't remember exactly now).
> > About 30% of CPU cycles were spent in this increment.
> >
> > But those 30% don't paint the full picture. Once the counter is
> > removed, the same uprobe throughput jumps to 62 mln/s or so. So we
> > definitely have to do something about it.
> >
> > > >
> > > > Drop it and emit an obviously unrealistic value in its stead in the
> > > > uprobe_profile seq file.
> > > >
> > > > The alternative would be allocating per-CPU counter, but I'm not sure
> > > > it's justified.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  kernel/trace/trace_uprobe.c | 4 +---
> > > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >
> > > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > > index 52e76a73fa7c..5d38207db479 100644
> > > > --- a/kernel/trace/trace_uprobe.c
> > > > +++ b/kernel/trace/trace_uprobe.c
> > > > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > > >       struct uprobe                   *uprobe;
> > > >       unsigned long                   offset;
> > > >       unsigned long                   ref_ctr_offset;
> > > > -     unsigned long                   nhit;
> > > >       struct trace_probe              tp;
> > > >  };
> > > >
> > > > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> > > >
> > > >       tu = to_trace_uprobe(ev);
> > > >       seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > > > -                     trace_probe_name(&tu->tp), tu->nhit);
> > > > +                trace_probe_name(&tu->tp), ULONG_MAX);
> > >
> > > seems harsh.. would it be that bad to create per cpu counter for that?
> >
> > Well, consider this patch a conversation starter. There are two
> > reasons why I'm removing the counter instead of doing per-CPU one:
> >
> >   - it's less work to send out a patch pointing out the problem (but
> > the solution might change)
> >   - this counter was never correct in the presence of multiple
> > threads, so I'm not sure how useful it is.
> >
> > Yes, I think we can do per-CPU counters, but do we want to pay the
> > memory price? That's what I want to get from Masami, Steven, or Peter
> > (whoever cares enough).
>
> I would like to make it a per-cpu counter *and* make it kconfig optional.
> Or just remove it along with the file (but that changes the user interface
> without offering an option).
>
> For kprobes, the profile file is useful because it shows the "missed"
> counter, which tells the user whether the trace data dropped some events or
> not. But if the uprobes profile only shows the number of hits, we can use
> the histogram trigger if needed.
>

I really don't like extra Kconfig options for something like this. So
I'll just go ahead and switch this to per-CPU counters
unconditionally.

For BPF, this is unnecessary memory overhead, but as we shift to
multi-uprobe (which doesn't go through trace_uprobe logic), it
probably isn't a big deal over the longer term.

> Thank you,
>
> >
> > >
> > > jirka
> > >
> > > >       return 0;
> > > >  }
> > > >
> > > > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > > >       int ret = 0;
> > > >
> > > >       tu = container_of(con, struct trace_uprobe, consumer);
> > > > -     tu->nhit++;
> > > >
> > > >       udd.tu = tu;
> > > >       udd.bp_addr = instruction_pointer(regs);
> > > > --
> > > > 2.43.5
> > > >
>
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>


