Re: [RFC PATCH v11 3/8] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: "Wang, Weilin" <weilin.wang@intel.com>
Cc: Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	"Hunter, Adrian" <adrian.hunter@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	"linux-perf-users@vger.kernel.org"
	<linux-perf-users@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Taylor, Perry" <perry.taylor@intel.com>,
	"Alt, Samantha" <samantha.alt@intel.com>,
	"Biggers, Caleb" <caleb.biggers@intel.com>
Subject: Re: [RFC PATCH v11 3/8] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
Date: Sat, 8 Jun 2024 19:27:56 -0700	[thread overview]
Message-ID: <ZmUTLOeLcdYs-cqe@google.com> (raw)
In-Reply-To: <CO6PR11MB5635DC04091BBE1FEC59642DEEFB2@CO6PR11MB5635.namprd11.prod.outlook.com>

On Fri, Jun 07, 2024 at 08:45:13PM +0000, Wang, Weilin wrote:
> 
> 
> > -----Original Message-----
> > From: Namhyung Kim <namhyung@kernel.org>
> > Sent: Friday, June 7, 2024 12:20 PM
> > To: Wang, Weilin <weilin.wang@intel.com>
> > Cc: Ian Rogers <irogers@google.com>; Arnaldo Carvalho de Melo
> > <acme@kernel.org>; Peter Zijlstra <peterz@infradead.org>; Ingo Molnar
> > <mingo@redhat.com>; Alexander Shishkin
> > <alexander.shishkin@linux.intel.com>; Jiri Olsa <jolsa@kernel.org>; Hunter,
> > Adrian <adrian.hunter@intel.com>; Kan Liang <kan.liang@linux.intel.com>;
> > linux-perf-users@vger.kernel.org; linux-kernel@vger.kernel.org; Taylor, Perry
> > <perry.taylor@intel.com>; Alt, Samantha <samantha.alt@intel.com>; Biggers,
> > Caleb <caleb.biggers@intel.com>
> > Subject: Re: [RFC PATCH v11 3/8] perf stat: Fork and launch perf record when
> > perf stat needs to get retire latency value for a metric.
> > 
> > On Fri, Jun 07, 2024 at 01:07:12AM +0000, Wang, Weilin wrote:
[SNIP]
> > > > > @@ -2186,6 +2240,9 @@ static int evsel__open_cpu(struct evsel *evsel,
> > > > struct perf_cpu_map *cpus,
> > > > >  		return 0;
> > > > >  	}
> > > > >
> > > > > +	if (evsel__is_retire_lat(evsel))
> > > > > +		return tpebs_start(evsel->evlist, cpus);
> > > >
> > > > As it works with evlist, I think it's better to put this code there.
> > > > But it seems perf stat doesn't call the evlist API for open, then we
> > > > can add this to somewhere in __run_perf_stat() directly.
> > > >
> > > > > +
> > > > >  	err = __evsel__prepare_open(evsel, cpus, threads);
> > > > >  	if (err)
> > > > >  		return err;
> > > > > @@ -2376,6 +2433,8 @@ int evsel__open(struct evsel *evsel, struct
> > > > perf_cpu_map *cpus,
> > > > >
> > > > >  void evsel__close(struct evsel *evsel)
> > > > >  {
> > > > > +	if (evsel__is_retire_lat(evsel))
> > > > > +		tpebs_delete();
> > > >
> > > > Ditto.
> > >
> > > Hi Namhyung,
> > >
> > > I hope both this and the one above on open could stay in evsel level because
> > > these are operations on retire_latency evsel.
> > 
> > Then I think you need to remove the specific evsel not the all tpebs
> > events.
> > 
> > > At the same time, a lot of the
> > > previous several versions of work was to move TPEBS code out from perf
> > stat to
> > > evsel to make it more generic. I think move these back to __run_perf_stat()
> > are
> > > opposite to that goal.
> > 
> > Oh, I meant you can have the logic in utils/intel-tpebs.c but add a call
> > to tpebs_delete() in __run_perf_stat().  I think it'd better to keep it
> > in evlist__close() but we don't use evlist__open() for perf stat so it's
> > not symmetric. :(
> > 
> > Anyway, all I want to say is that tpebs APIs work on evlist level.  So I
> > think it's natural that they are called for the whole list, not for an
> > event/evsel.
> 
> I think we're trying to work at evsel level and open(remove) or close one 
> retire_latency evsel at a time. In addition to that, we put all the required retire_latency
> together in one perf record launch in order to reduce overhead to fork multiple perf 
> record. I hope this makes sense. 

Well.. I think we can do something like this in the current code.

__run_perf_stat():
  ...

  tpebs__start(evlist, target);

  evlist__for_each_cpu(...) {
      if (create_perf_steat_counter() < 0) {
          ....

instead of doing it in the evsel__open().  What's the issue with this
approach?

> 
> > 
> > >
> > >
> > > >
> > > > >  	perf_evsel__close(&evsel->core);
> > > > >  	perf_evsel__free_id(&evsel->core);
> > > > >  }
> > > > > @@ -3341,6 +3400,9 @@ static int store_evsel_ids(struct evsel *evsel,
> > > > struct evlist *evlist)
> > > > >  {
> > > > >  	int cpu_map_idx, thread;
> > > > >
> > > > > +	if (evsel__is_retire_lat(evsel))
> > > > > +		return 0;
> > > > > +
> > > > >  	for (cpu_map_idx = 0; cpu_map_idx < xyarray__max_x(evsel->core.fd);
> > > > cpu_map_idx++) {
> > > > >  		for (thread = 0; thread < xyarray__max_y(evsel->core.fd);
> > > > >  		     thread++) {
> > > > > diff --git a/tools/perf/util/intel-tpebs.c b/tools/perf/util/intel-tpebs.c
> > > > > new file mode 100644
> > > > > index 000000000000..37b7a4f92dd9
> > > > > --- /dev/null
> > > > > +++ b/tools/perf/util/intel-tpebs.c
> > > > > @@ -0,0 +1,397 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > > +/*
> > > > > + * intel_tpebs.c: Intel TPEBS support
> > > > > + */
> > > > > +
> > > > > +
> > > > > +#include <sys/param.h>
> > > > > +#include <subcmd/run-command.h>
> > > > > +#include <thread.h>
> > > > > +#include "intel-tpebs.h"
> > > > > +#include <linux/list.h>
> > > > > +#include <linux/zalloc.h>
> > > > > +#include <linux/err.h>
> > > > > +#include "sample.h"
> > > > > +#include "debug.h"
> > > > > +#include "evlist.h"
> > > > > +#include "evsel.h"
> > > > > +#include "session.h"
> > > > > +#include "tool.h"
> > > > > +#include "cpumap.h"
> > > > > +#include "metricgroup.h"
> > > > > +#include <sys/stat.h>
> > > > > +#include <sys/file.h>
> > > > > +#include <poll.h>
> > > > > +#include <math.h>
> > > > > +
> > > > > +#define PERF_DATA		"-"
> > > > > +
> > > > > +bool tpebs_recording;
> > > > > +static pid_t tpebs_pid = -1;
> > > > > +static size_t tpebs_event_size;
> > > > > +static pthread_t tpebs_reader_thread;
> > > > > +static struct child_process *tpebs_cmd;
> > > > > +static struct list_head tpebs_results = LIST_HEAD_INIT(tpebs_results);
> > > >
> > > > It can be 'static LIST_HEAD(tpebs_results);'
> > > >
> > > > > +
> > > > > +struct tpebs_retire_lat {
> > > > > +	struct list_head nd;
> > > > > +	/* Event name */
> > > > > +	const char *name;
> > > > > +	/* Event name with the TPEBS modifier R */
> > > > > +	const char *tpebs_name;
> > > > > +	/* Count of retire_latency values found in sample data */
> > > > > +	size_t count;
> > > > > +	/* Sum of all the retire_latency values in sample data */
> > > > > +	int sum;
> > > > > +	/* Average of retire_latency, val = sum / count */
> > > > > +	double val;
> > > > > +};
> > > > > +
> > > > > +static int get_perf_record_args(const char **record_argv, char buf[],
> > > > > +				const char *cpumap_buf)
> > > > > +{
> > > > > +	struct tpebs_retire_lat *e;
> > > > > +	int i = 0;
> > > > > +
> > > > > +	pr_debug("Prepare perf record for retire_latency\n");
> > > > > +
> > > > > +	record_argv[i++] = "perf";
> > > > > +	record_argv[i++] = "record";
> > > > > +	record_argv[i++] = "-W";
> > > > > +	record_argv[i++] = "--synth=no";
> > > > > +	record_argv[i++] = buf;
> > > > > +
> > > > > +	if (cpumap_buf) {
> > > > > +		record_argv[i++] = "-C";
> > > > > +		record_argv[i++] = cpumap_buf;
> > > > > +	}
> > > > > +
> > > > > +	record_argv[i++] = "-a";
> > > > > +
> > > > > +	if (!cpumap_buf) {
> > > > > +		pr_err("Require cpumap list to run sampling.\n");
> > > > > +		return -ECANCELED;
> > > > > +	}
> > > >
> > > > Hmm.. I thought you supported system wide collection, not sure if it has
> > > > a cpumap.  Anyway this check makes the earlier one meaningless - you
> > > > need the cpumap always, right?
> > >
> > > TPEBS should be work with "-a" or "-C". I'm not sure what the cpumap
> > would be
> > > for "-a". Would it make sense to add "-a" only when cpumap_buf is NULL?
> > 
> > I think the best way is to check target__has_cpu().
> Yes this is an ideal way to check. But since the tpebs_start() is called in evsel, I'm
> wondering do we still have target here?

Please see above.

Thanks,
Namhyung

> 
> > 
> > >
> > > >
> > > > > +
> > > > > +	list_for_each_entry(e, &tpebs_results, nd) {
> > > > > +		record_argv[i++] = "-e";
> > > > > +		record_argv[i++] = e->name;
> > > > > +	}
> > > > > +
> > > > > +	record_argv[i++] = "-o";
> > > > > +	record_argv[i++] = PERF_DATA;
> > > > > +
> > > > > +	return 0;
> > > > > +}

next prev parent reply	other threads:[~2024-06-09  2:27 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05  5:21 [RFC PATCH v11 0/8]TPEBS counting mode support weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 1/8] perf parse-events: Add a retirement latency modifier weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 2/8] perf data: Allow to use given fd in data->file.fd weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 3/8] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
2024-06-06 23:20   ` Namhyung Kim
2024-06-07  1:07     ` Wang, Weilin
2024-06-07 19:20       ` Namhyung Kim
2024-06-07 20:45         ` Wang, Weilin
2024-06-09  2:27           ` Namhyung Kim [this message]
2024-06-09  3:02             ` Wang, Weilin
2024-06-10 23:58               ` Namhyung Kim
2024-06-05  5:21 ` [RFC PATCH v11 4/8] perf stat: Plugin retire_lat value from sampled data to evsel weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 5/8] perf vendor events intel: Add MTL metric json files weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 6/8] perf stat: Add command line option for enabling tpebs recording weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 7/8] perf Document: Add TPEBS to Documents weilin.wang
2024-06-05  5:21 ` [RFC PATCH v11 8/8] perf test: Add test for Intel TPEBS counting mode weilin.wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmUTLOeLcdYs-cqe@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=caleb.biggers@intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=perry.taylor@intel.com \
    --cc=peterz@infradead.org \
    --cc=samantha.alt@intel.com \
    --cc=weilin.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.