From: Namhyung Kim <namhyung@kernel.org>
To: Andi Kleen <ak@linux.intel.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
Date: Thu, 25 Jul 2024 17:31:20 -0700
Message-ID: <ZqLuWGKZjdUrkd0L@google.com>
In-Reply-To: <20240724190137.3810429-3-ak@linux.intel.com>

On Wed, Jul 24, 2024 at 12:01:36PM -0700, Andi Kleen wrote:
> This fixes a regression with perf script -F +metric originally caused by:
> 
> commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> Author: Ian Rogers <irogers@google.com>
> Date:   Sun Feb 19 01:28:46 2023 -0800
> 
>     perf metric: Directly use counts rather than saved_value
> 
> In the perf script environment the evsel wouldn't allocate an aggr
> values array, which led to a -1 reference because the metric
> evaluation would try to reference NULL - 1 (for aggr_idx)
> 
> Give the perf script evsels a single CPU aggr setup. That's
> enough because the groups are always contiguous, so no need
> to store more than one CPU's worth of values.
> 
> Before
> 
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> % perf script -F +metric
> Segmentation fault (core dumped)
> 
> After:
> 
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
> % perf script -F +metric
>        perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
> ...
> 
> Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ----
> 
> v2: Reformat code
> v3: Work around bogus warning
> v4: Set up aggr map only for metrics case to keep perf stat record
> working
> v5: Broken version
> v6: Only set up limited aggregation mode with -F +metric. Add conflict
> checks with perf stat record files.
> v7: Remove some unnecessary conflict checks. Fix buffer overflow. Minor cleanups.
> ---
>  tools/perf/builtin-script.c | 42 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c16224b1fef3..8058bb19a956 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -335,7 +335,6 @@ struct evsel_script {
>         FILE *fp;
>         u64  samples;
>         /* For metric output */
> -       u64  val;
>         int  gnum;
>  };
>  
> @@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
>  	if (evsel_script(leader)->gnum++ == 0)
>  		perf_stat__reset_shadow_stats();
> -	val = sample->period * evsel->scale;
> -	evsel_script(evsel)->val = val;
> +	val = sample->period;
> +	/*
> +	 * Always use the first storage because the groups are contiguous

Without leader sampling, we cannot guarantee that the events in a group
fire together, right?


> +	 * and there's no need to handle multiple indexes for anything

Actually, I think this is a behavior change: it switches the
aggregation mode from NONE to GLOBAL.

> +	 */
> +	evsel->stats->aggr[0].counts.val = val;
>  	if (evsel_script(leader)->gnum == leader->core.nr_members) {
>  		for_each_group_member (ev2, leader) {
>  			perf_stat__print_shadow_stats(&stat_config, ev2,
> -						      evsel_script(ev2)->val,
> -						      sample->cpu,
> +						      evsel->stats->aggr[0].counts.val,
> +						      0,

As I said to Ian, we should pass a proper aggr_idx here, not just 0, to
support correct aggregation.  For now I think the only possible choices
are AGGR_NONE (for cpu-wide record) or AGGR_THREAD (for per-task record).
The value should then be an index into the cpu map or the thread map.

I think the existing sample->cpu can be incorrect for cpu-wide records
too, in the case of a non-contiguous CPU list like
`perf record -C 1,3,5 ...`.
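
To make the non-contiguous case concrete, here is a minimal,
self-contained sketch. It is not perf code: cpu_list_index() is a
hypothetical stand-in for whatever cpu map lookup perf would use, and
the CPU list is a plain int array. With `-C 1,3,5` only three slots
exist, so the raw CPU number 5 is out of range as an array index,
while its position in the list (2) is valid.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): return the position of
 * `cpu` within the recorded CPU list, or -1 if it was not recorded.
 * The returned position is what an aggr_idx would need to be when
 * one slot is allocated per recorded CPU.
 */
static int cpu_list_index(const int *cpus, size_t nr, int cpu)
{
	for (size_t i = 0; i < nr; i++) {
		if (cpus[i] == cpu)
			return (int)i;
	}
	return -1;
}
```

For a list of {1, 3, 5}, CPU 3 maps to index 1 and CPU 5 to index 2;
CPU 2 is not in the list and yields -1.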

>  						      &ctx,
>  						      NULL);
>  		}
> @@ -2325,6 +2328,20 @@ static void process_event(struct perf_script *script,
>  		fflush(fp);
>  }
>  
> +static void check_metric_conflict(void)
> +{
> +	int i;
> +	/*
> +	 * Avoid conflict with the aggregation mode used for the metric printing.
> +	 */
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> +			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> +			exit(1);
> +		}
> +	}
> +}
> +
>  static struct scripting_ops	*scripting_ops;
>  
>  static void __process_stat(struct evsel *counter, u64 tstamp)
> @@ -2334,6 +2351,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
>  	struct perf_cpu cpu;
>  	static int header_printed;
>  
> +	check_metric_conflict();
> +
>  	if (!header_printed) {
>  		printf("%3s %8s %15s %15s %15s %15s %s\n",
>  		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
> @@ -3725,6 +3744,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
>  {
>  	perf_event__read_stat_config(&stat_config, &event->stat_config);
>  
> +	check_metric_conflict();
> +
>  	/*
>  	 * Aggregation modes are not used since post-processing scripts are
>  	 * supposed to take care of such requirements
> @@ -4088,6 +4109,17 @@ int cmd_script(int argc, const char **argv)
>  
>  	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> +			stat_config.aggr_map = cpu_aggr_map__empty_new(1);
> +			err = -ENOMEM;
> +			if (!stat_config.aggr_map)
> +				goto out;
> +			err = 0;
> +			stat_config.aggr_map->nr = 1;

It should be the number of entries in the cpu map or thread map.
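
As a rough sketch of that sizing rule, assuming the two cases mentioned
above (cpu-wide record sized by the cpu map, per-task record sized by
the thread map): struct record_maps and aggr_entries() below are
hypothetical names for illustration and do not exist in perf.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a recording session; not a perf structure. */
struct record_maps {
	bool   per_task;    /* true: per-task record, false: cpu-wide record */
	size_t nr_cpus;     /* entries in the cpu map */
	size_t nr_threads;  /* entries in the thread map */
};

/*
 * Number of aggregation slots to allocate: one per thread for a
 * per-task record, otherwise one per recorded CPU.
 */
static size_t aggr_entries(const struct record_maps *m)
{
	return m->per_task ? m->nr_threads : m->nr_cpus;
}
```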

Thanks,
Namhyung


> +			break;
> +		}
> +	}
>  
>  	if (symbol_conf.guestmount ||
>  	    symbol_conf.default_guest_vmlinux_name ||
> -- 
> 2.45.2
> 
