From: Andi Kleen <ak@linux.intel.com>
To: "Wang, Weilin" <weilin.wang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
"Hunter, Adrian" <adrian.hunter@intel.com>,
Kan Liang <kan.liang@linux.intel.com>,
"linux-perf-users@vger.kernel.org"
<linux-perf-users@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Taylor, Perry" <perry.taylor@intel.com>,
"Alt, Samantha" <samantha.alt@intel.com>,
"Biggers, Caleb" <caleb.biggers@intel.com>
Subject: Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
Date: Wed, 13 Mar 2024 08:55:12 -0700
Message-ID: <ZfHMYM3iWlsODtjP@tassilo>
In-Reply-To: <CO6PR11MB56351D1706A9C46D80982AECEE2A2@CO6PR11MB5635.namprd11.prod.outlook.com>
On Wed, Mar 13, 2024 at 03:31:14PM +0000, Wang, Weilin wrote:
>
>
> >
> > "Wang, Weilin" <weilin.wang@intel.com> writes:
> >
> > >>
> > >> weilin.wang@intel.com writes:
> > >>
> > >> > From: Weilin Wang <weilin.wang@intel.com>
> > >> >
> > >> > When retire_latency value is used in a metric formula, perf stat would
> > >> > fork a perf record process with "-e" and "-W" options. Perf record will
> > >> > collect required retire_latency values in parallel while perf stat is
> > >> > collecting counting values.
> > >>
> > >> How does that work when the workload is specified on the command line?
> > >> The workload would run twice? That is very inefficient and may not
> > >> work if it's a large workload.
> > >>
> > >> The perf tool infrastructure is imho not up to the task of such
> > >> parallel collection.
> > >>
> > >> Also it won't work for very long collections because you will get a
> > >> very large perf.data. Better to use a pipeline.
> > >>
> > >> I think it would be better if you made it a separate operation that can
> > >> generate a file that is then consumed by perf stat. This is also more efficient
> > >> because often the calibration is only needed once. And it's all under
> > >> user control so no nasty surprises.
> > >>
> > >
> > > The workload runs only once, under perf stat. Perf record is forked by
> > > perf stat and runs in parallel with perf stat. Perf stat will send perf
> > > record a signal to terminate after perf stat stops collecting count values.
> >
> > I don't understand how the perf record filters on the workload created by
> > perf stat. At a minimum you would need -p to connect to the pid
> > of the parent, but IIRC -p doesn't follow children, so if the workload
> > forked it wouldn't work.
> >
> > I think your approach may only work with -a, but perhaps I'm missing
> > something (-a is often not usable due to restrictions).
> >
> > Also, if perf stat runs in interval mode and you only get the data
> > at the end, how would that work?
> >
> > IIRC I wrestled with all these questions for toplev (which has a
> > similar feature), and in the end I concluded that doing it automatically
> > has far too many problems.
> >
>
> Yes, you are completely right that there is a limitation: we can only support -a
> and -C, and we do not support -I right now. I'm wondering if we could support "-I"
> as a next step by processing the sampled data on the fly.
-I is very tricky in a separate process. How do you align the two
intervals on long runs without drift? I don't know of a reliable
way to do it in the general case using only time.
Also, just the lack of support for forked workloads without -a is fatal imho. That's
likely one of the most common cases.
A separate step is a far better model imho:
- It is under full user control, with no surprises
- No uncontrolled multiplexing
- Often it is fine to measure once and cache the data (see the sketch below)
It cannot deal with -I properly either (short of some form of
phase detection), but at least it doesn't make false promises
to that effect.
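To make the "measure once and cache the data" point concrete, here is a rough
sketch of what such a calibration step could look like. This is illustration
only, not the pmu-tools implementation: the event name, the cache file name,
and the exact perf record / perf script invocations and output parsing are
assumptions that would need to be checked against the perf version in use.

#!/usr/bin/env python3
# Sketch of a one-off calibration step: sample retire latencies once and
# cache per-event averages in a JSON file that later runs can read.
# Event name, file names and the perf script parsing are illustrative only.
import json
import subprocess
from collections import defaultdict

EVENTS = ["MEM_INST_RETIRED.STLB_HIT_LOADS"]   # hypothetical event list
CACHE_FILE = "retlat-cache.json"               # hypothetical cache location

def calibrate(workload_cmd):
    # Record weighted samples while a short calibration workload runs.
    # "-e" and "-W" mirror the options mentioned in the patch description;
    # whether they are sufficient for retire latency depends on CPU and perf.
    subprocess.run(["perf", "record", "-W", "-e", ",".join(EVENTS),
                    "-o", "calib.data", "--"] + workload_cmd, check=True)

    # Dump the samples as text and average the weight per event. The field
    # selection here is an assumption; see perf-script(1) for your version.
    out = subprocess.run(["perf", "script", "-i", "calib.data",
                          "-F", "event,weight"],
                         check=True, capture_output=True, text=True)
    sums, counts = defaultdict(float), defaultdict(int)
    for line in out.stdout.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        event, weight = parts[0].rstrip(":"), parts[-1]
        try:
            sums[event] += float(weight)
            counts[event] += 1
        except ValueError:
            continue
    averages = {e: sums[e] / counts[e] for e in counts}
    with open(CACHE_FILE, "w") as f:
        json.dump(averages, f, indent=2)
    return averages

if __name__ == "__main__":
    print(calibrate(["sleep", "1"]))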
The way to do it is to have defaults in a JSON file
that the user can override with a calibration step.
There is a JSON format that is already used by some other tools.
This is my implementation:
https://github.com/andikleen/pmu-tools/blob/master/genretlat.py
https://github.com/andikleen/pmu-tools/blob/89861055b53e57ba0b7c6348745b2fbe6615c068/toplev.py#L1031
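And, purely to illustrate the defaults-plus-override idea above (the file name,
default value and JSON layout are made up for this example, not the format the
tools above use), the consuming side could look roughly like this:

#!/usr/bin/env python3
# Sketch of the consuming side: built-in defaults for retire latencies,
# optionally overridden by a user-supplied calibration file.
# The default value, file name and JSON layout are assumptions.
import json
import os

# Shipped defaults, e.g. generated once per CPU model (value made up).
DEFAULT_RETIRE_LATENCIES = {
    "MEM_INST_RETIRED.STLB_HIT_LOADS": 7.0,
}

def load_retire_latencies(calibration_file="retlat-cache.json"):
    """Return per-event retire latencies, preferring user calibration."""
    latencies = dict(DEFAULT_RETIRE_LATENCIES)
    if os.path.exists(calibration_file):
        with open(calibration_file) as f:
            latencies.update(json.load(f))   # user-calibrated values win
    return latencies

if __name__ == "__main__":
    for event, latency in load_retire_latencies().items():
        print(f"{event}: {latency:.1f}")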
-Andi
Thread overview: 19+ messages
2024-03-12 23:49 [RFC PATCH v4 0/6] TPEBS counting mode support weilin.wang
2024-03-12 23:49 ` [RFC PATCH v4 1/6] perf stat: Parse and find tpebs events when parsing metrics to prepare for perf record sampling weilin.wang
2024-03-12 23:58 ` Andi Kleen
2024-03-13 0:27 ` Wang, Weilin
2024-03-12 23:49 ` [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric weilin.wang
2024-03-13 0:03 ` Andi Kleen
2024-03-13 0:26 ` Wang, Weilin
2024-03-13 0:56 ` Andi Kleen
2024-03-13 15:31 ` Wang, Weilin
2024-03-13 15:55 ` Andi Kleen [this message]
2024-03-13 16:23 ` Wang, Weilin
2024-03-14 0:00 ` Andi Kleen
2024-03-24 3:39 ` Ian Rogers
2024-03-12 23:49 ` [RFC PATCH v4 3/6] perf stat: Add retire latency values into the expr_parse_ctx to prepare for final metric calculation weilin.wang
2024-03-24 3:45 ` Ian Rogers
2024-03-12 23:49 ` [RFC PATCH v4 4/6] perf stat: Create another thread for sample data processing weilin.wang
2024-03-12 23:49 ` [RFC PATCH v4 5/6] perf stat: Add retire latency print functions to print out at the very end of print out weilin.wang
2024-03-12 23:49 ` [RFC PATCH v4 6/6] perf vendor events intel: Add MTL metric json files weilin.wang
2024-03-24 4:01 ` [RFC PATCH v4 0/6] TPEBS counting mode support Ian Rogers