Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter


From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <olsajiri@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	linux-trace-kernel@vger.kernel.org, rostedt@goodmis.org,
	mhiramat@kernel.org, peterz@infradead.org, oleg@redhat.com,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] uprobes: get rid of bogus trace_uprobe hit counter
Date: Thu, 8 Aug 2024 06:43:53 +0900
Message-ID: <20240808064353.7470f6bfab89bd28dbcdebe0@kernel.org>
In-Reply-To: <CAEf4Bzaq86fPVGWtXqvxLtbsk06coGBebnAO5YiuvuUF2v7++w@mail.gmail.com>

On Tue, 6 Aug 2024 10:26:25 -0700
Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:

> On Tue, Aug 6, 2024 at 12:37 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Aug 05, 2024 at 01:28:03PM -0700, Andrii Nakryiko wrote:
> > > trace_uprobe->nhit counter is not incremented atomically, so its value
> > > is bogus in practice. On the other hand, it's actually a pretty big
> > > uprobe scalability problem due to heavy cache line bouncing between CPUs
> > > triggering the same uprobe.
> >
> > so you're seeing that in the benchmark, right? I'm curious how bad
> > the numbers are
> >
> 
> Yes. So, once we get rid of all the uprobe/uretprobe/mm locks (ongoing
> work), this one was the last limiter to linear scalability.
> 
> With this counter, I was topping out at about 12 mln/s uprobe
> triggering (I think it was 32 CPUs, but I don't remember exactly now).
> About 30% of CPU cycles were spent in this increment.
> 
> But those 30% don't paint the full picture. Once the counter is
> removed, the same uprobe throughput jumps to 62 mln/s or so. So we
> definitely have to do something about it.
> 
> > >
> > > Drop it and emit an obviously unrealistic value in its place in the
> > > uprobe_profile seq file.
> > >
> > > The alternative would be allocating a per-CPU counter, but I'm not sure
> > > it's justified.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  kernel/trace/trace_uprobe.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > > index 52e76a73fa7c..5d38207db479 100644
> > > --- a/kernel/trace/trace_uprobe.c
> > > +++ b/kernel/trace/trace_uprobe.c
> > > @@ -62,7 +62,6 @@ struct trace_uprobe {
> > >       struct uprobe                   *uprobe;
> > >       unsigned long                   offset;
> > >       unsigned long                   ref_ctr_offset;
> > > -     unsigned long                   nhit;
> > >       struct trace_probe              tp;
> > >  };
> > >
> > > @@ -821,7 +820,7 @@ static int probes_profile_seq_show(struct seq_file *m, void *v)
> > >
> > >       tu = to_trace_uprobe(ev);
> > >       seq_printf(m, "  %s %-44s %15lu\n", tu->filename,
> > > -                     trace_probe_name(&tu->tp), tu->nhit);
> > > +                trace_probe_name(&tu->tp), ULONG_MAX);
> >
> > seems harsh... would it be that bad to create a per-CPU counter for that?
> 
> Well, consider this patch a conversation starter. There are two
> reasons why I'm removing the counter instead of doing per-CPU one:
> 
>   - it's less work to send out a patch pointing out the problem (but
> the solution might change)
>   - this counter was never correct in the presence of multiple
> threads, so I'm not sure how useful it is.
> 
> Yes, I think we can do per-CPU counters, but do we want to pay the
> memory price? That's what I want to get from Masami, Steven, or Peter
> (whoever cares enough).

I would like to make it a per-CPU counter *and* make it optional via Kconfig.
Or just remove it together with the file (but that changes the user
interface with no option to keep it).

For kprobes, the profile file is useful because it shows the "missed"
counter, which tells users whether their trace data dropped any events.
But if the uprobe profile only shows the number of hits, we can use a
histogram trigger instead if needed.

Thank you,

> 
> >
> > jirka
> >
> > >       return 0;
> > >  }
> > >
> > > @@ -1507,7 +1506,6 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
> > >       int ret = 0;
> > >
> > >       tu = container_of(con, struct trace_uprobe, consumer);
> > > -     tu->nhit++;
> > >
> > >       udd.tu = tu;
> > >       udd.bp_addr = instruction_pointer(regs);
> > > --
> > > 2.43.5
> > >


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>

Thread overview: 5+ messages
2024-08-05 20:28 [PATCH] uprobes: get rid of bogus trace_uprobe hit counter Andrii Nakryiko
2024-08-06  7:37 ` Jiri Olsa
2024-08-06 17:26   ` Andrii Nakryiko
2024-08-07 21:43     ` Masami Hiramatsu [this message]
2024-08-09 18:43       ` Andrii Nakryiko
