Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.

From: Andi Kleen <ak@linux.intel.com>
To: "Wang, Weilin" <weilin.wang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	"Hunter, Adrian" <adrian.hunter@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	"linux-perf-users@vger.kernel.org"
	<linux-perf-users@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Taylor, Perry" <perry.taylor@intel.com>,
	"Alt, Samantha" <samantha.alt@intel.com>,
	"Biggers, Caleb" <caleb.biggers@intel.com>
Subject: Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when perf stat needs to get retire latency value for a metric.
Date: Wed, 13 Mar 2024 08:55:12 -0700
Message-ID: <ZfHMYM3iWlsODtjP@tassilo>
In-Reply-To: <CO6PR11MB56351D1706A9C46D80982AECEE2A2@CO6PR11MB5635.namprd11.prod.outlook.com>

On Wed, Mar 13, 2024 at 03:31:14PM +0000, Wang, Weilin wrote:
> 
> 
> > -----Original Message-----
> > From: Andi Kleen <ak@linux.intel.com>
> > Sent: Tuesday, March 12, 2024 5:56 PM
> > To: Wang, Weilin <weilin.wang@intel.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>; Ian Rogers
> > <irogers@google.com>; Arnaldo Carvalho de Melo <acme@kernel.org>; Peter
> > Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>;
> > Alexander Shishkin <alexander.shishkin@linux.intel.com>; Jiri Olsa
> > <jolsa@kernel.org>; Hunter, Adrian <adrian.hunter@intel.com>; Kan Liang
> > <kan.liang@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> > kernel@vger.kernel.org; Taylor, Perry <perry.taylor@intel.com>; Alt, Samantha
> > <samantha.alt@intel.com>; Biggers, Caleb <caleb.biggers@intel.com>
> > Subject: Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record when
> > perf stat needs to get retire latency value for a metric.
> > 
> > "Wang, Weilin" <weilin.wang@intel.com> writes:
> > 
> > >> -----Original Message-----
> > >> From: Andi Kleen <ak@linux.intel.com>
> > >> Sent: Tuesday, March 12, 2024 5:03 PM
> > >> To: Wang, Weilin <weilin.wang@intel.com>
> > >> Cc: Namhyung Kim <namhyung@kernel.org>; Ian Rogers
> > >> <irogers@google.com>; Arnaldo Carvalho de Melo <acme@kernel.org>;
> > Peter
> > >> Zijlstra <peterz@infradead.org>; Ingo Molnar <mingo@redhat.com>;
> > >> Alexander Shishkin <alexander.shishkin@linux.intel.com>; Jiri Olsa
> > >> <jolsa@kernel.org>; Hunter, Adrian <adrian.hunter@intel.com>; Kan Liang
> > >> <kan.liang@linux.intel.com>; linux-perf-users@vger.kernel.org; linux-
> > >> kernel@vger.kernel.org; Taylor, Perry <perry.taylor@intel.com>; Alt,
> > Samantha
> > >> <samantha.alt@intel.com>; Biggers, Caleb <caleb.biggers@intel.com>
> > >> Subject: Re: [RFC PATCH v4 2/6] perf stat: Fork and launch perf record
> > when
> > >> perf stat needs to get retire latency value for a metric.
> > >>
> > >> weilin.wang@intel.com writes:
> > >>
> > >> > From: Weilin Wang <weilin.wang@intel.com>
> > >> >
> > >> > When retire_latency value is used in a metric formula, perf stat would fork a
> > >> > perf record process with "-e" and "-W" options. Perf record will collect
> > >> > required retire_latency values in parallel while perf stat is collecting
> > >> > counting values.
> > >>
> > >> How does that work when the workload is specified on the command line?
> > >> The workload would run twice? That is very inefficient and may not
> > >> work if it's a large workload.
> > >>
> > >> The perf tool infrastructure is imho not up to the task of such
> > >> parallel collection.
> > >>
> > >> Also it won't work for very long collections because you will get a
> > >> very large perf.data. Better to use a pipeline.
> > >>
> > >> I think it would be better if you made it a separate operation that can
> > >> generate a file that is then consumed by perf stat. This is also more efficient
> > >> because often the calibration is only needed once. And it's all under
> > >> user control so no nasty surprises.
> > >>
> > >
> > > Workload runs only once with perf stat. Perf record is forked by perf stat and run
> > > in parallel with perf stat. Perf stat will send perf record a signal to terminate after
> > > perf stat stops collecting count value.
> > 
> > I don't understand how the perf record filters on the workload created by
> > the perf stat. At a minimum you would need -p to connect to the pid
> > of the parent, but IIRC -p doesn't follow children, so if it forked
> > it wouldn't work.
> > 
> > I think your approach may only work with -a, but perhaps I'm missing
> > something (-a is often not usable due to restrictions)
> > 
> > Also if perf stat runs in interval mode and you only get the data
> > at the end how would that work?
> > 
> > iirc i wrestled with all these questions for toplev (which has a
> > similar feature) and in the end i concluded doing it automatically
> > has far too many problems.
> > 
> 
> Yes, you are completely right that there is a limitation: we can only support -a and -C,
> and -I is not supported for now. I'm wondering if we could support "-I" in a next step by
> processing sampled data on the go.

-I is very tricky in a separate process. How do you align the two
intervals on long runs without drift? I don't know of a reliable
way to do it in the general case using only time.
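
To illustrate the drift concern, here is a minimal sketch (not taken from
perf or pmu-tools). It assumes two processes that each re-arm a relative
interval timer and overshoot by a small fixed amount per wakeup; the function
names and all numbers are hypothetical.

```python
def boundaries_us(interval_us, overshoot_us, count):
    """Timestamps (in microseconds) of `count` interval boundaries for a
    process that re-arms a relative timer and overshoots by a fixed amount."""
    now = 0
    stamps = []
    for _ in range(count):
        now += interval_us + overshoot_us
        stamps.append(now)
    return stamps


def first_misaligned_interval(interval_us, overshoot_a_us, overshoot_b_us,
                              count, tolerance_us):
    """Return the 1-based number of the first interval whose boundaries in the
    two processes differ by more than `tolerance_us`, or None if none does."""
    a = boundaries_us(interval_us, overshoot_a_us, count)
    b = boundaries_us(interval_us, overshoot_b_us, count)
    for number, (ta, tb) in enumerate(zip(a, b), start=1):
        if abs(ta - tb) > tolerance_us:
            return number
    return None


# Hypothetical numbers: 1 s intervals, one process overshoots by 1 ms per
# wakeup, and a 100 ms misalignment is considered too much.
print(first_misaligned_interval(1_000_000, 0, 1_000, 1000, 100_000))
```

With these made-up numbers the boundaries are more than 100 ms apart from
interval 101 onwards, even though each individual wakeup is only 1 ms late.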

Also, just the lack of support for forking workloads without -a is fatal imho.
That's likely one of the most common cases.

Separate is a far better model imho:

- It is under full user control, with no surprises
- No uncontrolled multiplexing
- Often it is fine to measure once and cache the data

The separate model cannot deal with -I properly either (short of some
form of phase detection), but at least it doesn't give false promises
to that effect.

The way to do it is to have defaults in a JSON file
that the user can override with a calibration step.
There is a JSON format that is used by some other tools.

This is my implementation:
https://github.com/andikleen/pmu-tools/blob/master/genretlat.py
https://github.com/andikleen/pmu-tools/blob/89861055b53e57ba0b7c6348745b2fbe6615c068/toplev.py#L1031
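
The defaults-plus-override scheme described above could look roughly like
the sketch below. This is only an illustration, not the code or the JSON
format used by genretlat.py or toplev.py: the event names, the default
values, the flat {"event": latency} layout and the function name are all
assumptions made up for the example.

```python
import json

# Hypothetical built-in defaults: event name -> retire latency value.
# Placeholder names and numbers, not real measurements.
DEFAULT_RETLAT = {"EVENT_A": 10.0, "EVENT_B": 4.0}


def load_retire_latencies(calibration_path=None, defaults=DEFAULT_RETLAT):
    """Return the retire latency values to use in metric formulas.

    Start from the shipped defaults and, if the user ran a separate
    calibration step, let the measured values in that JSON file (assumed
    to be a flat {"event": latency} object) override them.
    """
    values = dict(defaults)  # copy so the defaults are never modified
    if calibration_path is not None:
        with open(calibration_path) as f:
            measured = json.load(f)
        values.update(measured)  # calibrated values win over defaults
    return values
```

Events that were not calibrated keep their default value, so the
calibration file only has to be generated once and can be reused across runs.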


-Andi
