Re: [PATCH] perf scripts python: Add a script to run instances of perf script in parallel

linux-perf-users.vger.kernel.org archive mirror

From: Adrian Hunter <adrian.hunter@intel.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org
Subject: Re: [PATCH] perf scripts python: Add a script to run instances of perf script in parallel
Date: Mon, 11 Mar 2024 19:52:01 +0200
Message-ID: <3a92ebdb-8923-46af-a020-0e12233262a9@intel.com>
In-Reply-To: <Ze8ttn4bxBrYi63h@tassilo>

On 11/03/24 18:13, Andi Kleen wrote:
> On Sun, Mar 10, 2024 at 09:35:02PM +0200, Adrian Hunter wrote:
>> Add a Python script to run a perf script command multiple times in
>> parallel, using perf script options --cpu and --time so that each job
>> processes a different chunk of the data.
>>
>> Refer to the script's own help text at the end of the patch for more
>> details.
>>
>> The script is useful for Intel PT traces, which can be efficiently
>> decoded by perf script when split by CPU and/or time ranges. Running
>> jobs in parallel can decrease the overall decoding time.
> 
> This only optimizes for the run time of the decoder. Often when you do
> analysis you have a non-trivial part of it in some analysis script too,
> but you currently have no direct / easy way to parallelize that. It would
> be better to support parallel pipelines.

It will parallelize any scripts and / or dlfilters that perf script
itself executes.

> 
> TBH I'm not sure the script is worth it. If you need to do parallel
> pipelines (which imho is the common case) it's probably better to just
> write a custom shell script, which is not that difficult.

It can be a pain to figure out how best to split the data if it is not
evenly distributed.
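
To illustrate the splitting problem, here is a minimal sketch (not code
from the patch) that cuts a trace into time ranges holding roughly equal
numbers of events rather than equal spans of time. The function name and
the idea of a pre-collected, sorted list of timestamps are assumptions
made for the example.

```python
def split_by_event_count(timestamps, jobs):
    """Return (start, end) time ranges with roughly equal event counts.

    timestamps: sorted list of event times (hypothetical input, e.g.
    gathered from a quick first pass over the data).
    jobs: desired number of ranges.
    """
    if not timestamps or jobs < 1:
        return []
    n = len(timestamps)
    jobs = min(jobs, n)  # never create more ranges than events
    # Index of the first event in each range.
    starts = [i * n // jobs for i in range(jobs)]
    ranges = []
    for k, first in enumerate(starts):
        last = starts[k + 1] - 1 if k + 1 < jobs else n - 1
        ranges.append((timestamps[first], timestamps[last]))
    return ranges


# Events bunched at the start and sparse afterwards.
print(split_by_event_count([0, 1, 2, 3, 100, 200, 300, 400], 2))
```

Splitting the same data into two equal time spans would put five events in
one job and three in the other; splitting by count gives four each. Events
sharing a timestamp at a boundary would need extra care that this sketch
does not attempt.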

The script also has value as a reference or starting point for
users.

>                                                           It might be
> better to have a helper that makes writing such scripts easier, 
> e.g. figuring out reasonable options for manual parallelization
> based on the input file. I think parts of your script do that, maybe
> it is usable for that.

The --dry-run option shows the perf script commands, but an option
to pipe through another command could be added.
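
As a rough sketch of what such generated commands could look like, the
following builds one perf script command line per CPU and time range
using the --cpu and --time options, optionally piped through another
command. The function name, the pipe_to parameter and the default file
name are assumptions for illustration; this is not the patch's
implementation or its --dry-run output format.

```python
import shlex


def build_commands(cpus, time_ranges, perf_data="perf.data", pipe_to=None):
    """Return shell command lines, one per (cpu, time range) pair.

    time_ranges holds (start, end) tuples in seconds.
    pipe_to is a hypothetical downstream command, e.g. an analysis script.
    """
    commands = []
    for cpu in cpus:
        for start, end in time_ranges:
            args = [
                "perf", "script", "-i", perf_data,
                "--cpu", str(cpu),
                "--time", f"{start:.9f},{end:.9f}",
            ]
            line = shlex.join(args)  # quotes arguments where needed
            if pipe_to:
                line += " | " + pipe_to
            commands.append(line)
    return commands


for cmd in build_commands([0, 1], [(1.0, 2.0)], pipe_to="wc -l"):
    print(cmd)
```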

> 
> Also as a default output it would be better to just merge the 
> original output in order and output it on stdout.

That assumes that the output comes from perf script printf
output and not a perf script _script_.

If the data is split by CPU, it will not be in time order
if it is simply concatenated back together.
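
For output that does carry a timestamp per line, a time-ordered merge of
per-CPU results is possible with a k-way merge instead of concatenation.
This is a sketch only: it assumes each job's output has already been
parsed into (timestamp, line) pairs that are sorted within that job, and
the function name is made up for the example.

```python
import heapq


def merge_in_time_order(streams):
    """Yield lines from several per-job streams in global time order.

    streams: iterables of (timestamp, line) tuples, each already sorted
    by timestamp (an assumption about how the output was parsed).
    """
    for _, line in heapq.merge(*streams, key=lambda record: record[0]):
        yield line


cpu0 = [(1.0, "cpu0 first"), (3.0, "cpu0 second")]
cpu1 = [(2.0, "cpu1 first"), (4.0, "cpu1 second")]
print(list(merge_in_time_order([cpu0, cpu1])))
```

heapq.merge reads its inputs lazily, so the per-job outputs do not all
have to be held in memory at once.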

> 
> You should probably limit the number of jobs to some minimum
> length, otherwise on systems with many CPUs there might be
> inefficiently short jobs.

That happens for Intel PT (64 PSB minimum), but could be added
for the normal case also.
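
One simple way to express such a limit, shown here only as a sketch with
made-up names, is to cap the job count so that every job receives at
least a minimum amount of work. The unit could be PSBs, samples or
seconds; the value 64 below just echoes the Intel PT minimum mentioned
above.

```python
def limit_jobs(total_units, max_jobs, min_units_per_job):
    """Return a job count that keeps each job at or above a minimum size.

    total_units: amount of work in the trace (hypothetical unit).
    max_jobs: upper bound, e.g. the number of CPUs available.
    min_units_per_job: smallest amount of work worth a separate job.
    """
    by_size = total_units // min_units_per_job
    # Always run at least one job, even for very small inputs.
    return max(1, min(max_jobs, by_size))


print(limit_jobs(1000, 64, 64))  # fewer jobs than CPUs for a small trace
```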
