Re: [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Namhyung Kim <namhyung@kernel.org>
To: Ian Rogers <irogers@google.com>
Cc: Thomas Falcon <thomas.falcon@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Ben Gainey <ben.gainey@arm.com>,
	James Clark <james.clark@linaro.org>,
	Howard Chu <howardchu95@gmail.com>,
	Weilin Wang <weilin.wang@intel.com>,
	Levi Yun <yeoreum.yun@arm.com>,
	"Dr. David Alan Gilbert" <linux@treblig.org>,
	Zhongqiu Han <quic_zhonhan@quicinc.com>,
	Blake Jones <blakejones@google.com>,
	Yicong Yang <yangyicong@hisilicon.com>,
	Anubhav Shelat <ashelat@redhat.com>,
	Thomas Richter <tmricht@linux.ibm.com>,
	Jean-Philippe Romain <jean-philippe.romain@foss.st.com>,
	Song Liu <song@kernel.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes
Date: Wed, 16 Jul 2025 13:28:05 -0700	[thread overview]
Message-ID: <aHgLVcxpEphtYLsB@google.com> (raw)
In-Reply-To: <20250627192417.1157736-13-irogers@google.com>

On Fri, Jun 27, 2025 at 12:24:17PM -0700, Ian Rogers wrote:
> Counting events system-wide with a specified CPU prior to this change worked:
> ```
> $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1
> 
>   Performance counter stats for 'system wide':
> 
>      59,393,419,099      msr/tsc/
>      33,927,965,927      msr/tsc,cpu=cpu_core/
>      25,465,608,044      msr/tsc,cpu=cpu_atom/
> ```
> 
> However, when counting with process the counts became system wide:
> ```
> $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
>  10.1: Basic parsing test                                            : Ok
>  10.2: Parsing without PMU name                                      : Ok
>  10.3: Parsing with PMU name                                         : Ok
> 
>  Performance counter stats for 'perf test -F 10':
> 
>         59,233,549      msr/tsc/
>         59,227,556      msr/tsc,cpu=cpu_core/
>         59,224,053      msr/tsc,cpu=cpu_atom/
> ```
> 
> Make the handling of CPU maps with event parsing clearer. When an
> event is parsed creating an evsel the cpus should be either the PMU's
> cpumask or user specified CPUs.
> 
> Update perf_evlist__propagate_maps so that it doesn't clobber the user
> specified CPUs. Try to make the behavior clearer, firstly fix up
> missing cpumasks. Next, perform sanity checks and adjustments from the
> global evlist CPU requests and for the PMU including simplifying to
> the "any CPU"(-1) value. Finally remove the event if the cpumask is
> empty.
> 
> So that events are opened with a CPU and a thread change stat's
> create_perf_stat_counter to give both.
> 
> With the change things are fixed:
> ```
> $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
>  10.1: Basic parsing test                                            : Ok
>  10.2: Parsing without PMU name                                      : Ok
>  10.3: Parsing with PMU name                                         : Ok
> 
>  Performance counter stats for 'perf test -F 10':
> 
>         63,704,975      msr/tsc/
>         47,060,704      msr/tsc,cpu=cpu_core/                        (4.62%)
>         16,640,591      msr/tsc,cpu=cpu_atom/                        (2.18%)
> ```
> 
> However, note the "--no-scale" option is used. This is necessary as
> the running time for the event on the counter isn't the same as the
> enabled time because the thread doesn't necessarily run on the CPUs
> specified for the counter. All counter values are scaled with:
> 
>   scaled_value = value * time_enabled / time_running
> 
> and so without --no-scale the scaled_value becomes very large. This
> problem already exists on hybrid systems for the same reason. Here are
> 2 runs of the same code with an instructions event that counts the
> same on both types of core, there is no real multiplexing happening on
> the event:

This is unfortunate.  The event is in a task context but also has a CPU
constraint.  So it's not schedulable on other CPUs.

A problem is that it cannot distinguish the real multiplexing..

> 
> ```
> $ perf stat -e instructions perf test -F 10
> ...
>  Performance counter stats for 'perf test -F 10':
> 
>         87,896,447      cpu_atom/instructions/                       (14.37%)
>         98,171,964      cpu_core/instructions/                       (85.63%)
> ...
> $ perf stat --no-scale -e instructions perf test -F 10
> ...
>  Performance counter stats for 'perf test -F 10':
> 
>         13,069,890      cpu_atom/instructions/                       (19.32%)
>         83,460,274      cpu_core/instructions/                       (80.68%)
> ...
> ```
> The scaling has inflated per-PMU instruction counts and the overall
> count by 2x.
> 
> To fix this the kernel needs changing when a task+CPU event (or just
> task event on hybrid) is scheduled out. A fix could be that the state
> isn't inactive but off for such events, so that time_enabled counts
> don't accumulate on them.

Right, maybe we need to add a new state (UNSCHEDULABLE?) to skip
updating the enabled time.

Thanks,
Namhyung

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/evlist.c        | 118 ++++++++++++++++++++++-----------
>  tools/perf/util/parse-events.c |  10 ++-
>  tools/perf/util/stat.c         |   6 +-
>  3 files changed, 86 insertions(+), 48 deletions(-)
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 9d9dec21f510..2d2236400220 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -36,49 +36,87 @@ void perf_evlist__init(struct perf_evlist *evlist)
>  static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>  					  struct perf_evsel *evsel)
>  {
> -	if (evsel->system_wide) {
> -		/* System wide: set the cpu map of the evsel to all online CPUs. */
> -		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__new_online_cpus();
> -	} else if (evlist->has_user_cpus && evsel->is_pmu_core) {
> -		/*
> -		 * User requested CPUs on a core PMU, ensure the requested CPUs
> -		 * are valid by intersecting with those of the PMU.
> -		 */
> +	if (perf_cpu_map__is_empty(evsel->cpus)) {
> +		if (perf_cpu_map__is_empty(evsel->pmu_cpus)) {
> +			/*
> +			 * Assume the unset PMU cpus were for a system-wide
> +			 * event, like a software or tracepoint.
> +			 */
> +			evsel->pmu_cpus = perf_cpu_map__new_online_cpus();
> +		}
> +		if (evlist->has_user_cpus && !evsel->system_wide) {
> +			/*
> +			 * Use the user CPUs unless the evsel is set to be
> +			 * system wide, such as the dummy event.
> +			 */
> +			evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> +		} else {
> +			/*
> +			 * System wide and other modes, assume the cpu map
> +			 * should be set to all PMU CPUs.
> +			 */
> +			evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> +		}
> +	}
> +	/*
> +	 * Avoid "any CPU"(-1) for uncore and PMUs that require a CPU, even if
> +	 * requested.
> +	 */
> +	if (evsel->requires_cpu && perf_cpu_map__has_any_cpu(evsel->cpus)) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->pmu_cpus);
> +		evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> +	}
>  
> -		/*
> -		 * Empty cpu lists would eventually get opened as "any" so remove
> -		 * genuinely empty ones before they're opened in the wrong place.
> -		 */
> -		if (perf_cpu_map__is_empty(evsel->cpus)) {
> -			struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> -
> -			perf_evlist__remove(evlist, evsel);
> -			/* Keep idx contiguous */
> -			if (next)
> -				list_for_each_entry_from(next, &evlist->entries, node)
> -					next->idx--;
> +	/*
> +	 * Globally requested CPUs replace user requested unless the evsel is
> +	 * set to be system wide.
> +	 */
> +	if (evlist->has_user_cpus && !evsel->system_wide) {
> +		assert(!perf_cpu_map__has_any_cpu(evlist->user_requested_cpus));
> +		if (!perf_cpu_map__equal(evsel->cpus, evlist->user_requested_cpus)) {
> +			perf_cpu_map__put(evsel->cpus);
> +			evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
>  		}
> -	} else if (!evsel->pmu_cpus || evlist->has_user_cpus ||
> -		(!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
> -		/*
> -		 * The PMU didn't specify a default cpu map, this isn't a core
> -		 * event and the user requested CPUs or the evlist user
> -		 * requested CPUs have the "any CPU" (aka dummy) CPU value. In
> -		 * which case use the user requested CPUs rather than the PMU
> -		 * ones.
> -		 */
> +	}
> +
> +	/* Ensure cpus only references valid PMU CPUs. */
> +	if (!perf_cpu_map__has_any_cpu(evsel->cpus) &&
> +	    !perf_cpu_map__is_subset(evsel->pmu_cpus, evsel->cpus)) {
> +		struct perf_cpu_map *tmp = perf_cpu_map__intersect(evsel->pmu_cpus, evsel->cpus);
> +
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> -	} else if (evsel->cpus != evsel->pmu_cpus) {
> -		/*
> -		 * No user requested cpu map but the PMU cpu map doesn't match
> -		 * the evsel's. Reset it back to the PMU cpu map.
> -		 */
> +		evsel->cpus = tmp;
> +	}
> +
> +	/*
> +	 * Was event requested on all the PMU's CPUs but the user requested is
> +	 * any CPU (-1)? If so switch to using any CPU (-1) to reduce the number
> +	 * of events.
> +	 */
> +	if (!evsel->system_wide &&
> +	    perf_cpu_map__equal(evsel->cpus, evsel->pmu_cpus) &&
> +	    perf_cpu_map__has_any_cpu(evlist->user_requested_cpus)) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> +		evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> +	}
> +
> +	/* Sanity check assert before the evsel is potentially removed. */
> +	assert(!evsel->requires_cpu || !perf_cpu_map__has_any_cpu(evsel->cpus));
> +
> +	/*
> +	 * Empty cpu lists would eventually get opened as "any" so remove
> +	 * genuinely empty ones before they're opened in the wrong place.
> +	 */
> +	if (perf_cpu_map__is_empty(evsel->cpus)) {
> +		struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> +
> +		perf_evlist__remove(evlist, evsel);
> +		/* Keep idx contiguous */
> +		if (next)
> +			list_for_each_entry_from(next, &evlist->entries, node)
> +				next->idx--;
> +
> +		return;
>  	}
>  
>  	if (evsel->system_wide) {
> @@ -98,6 +136,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
>  
>  	evlist->needs_map_propagation = true;
>  
> +	/* Clear the all_cpus set which will be merged into during propagation. */
> +	perf_cpu_map__put(evlist->all_cpus);
> +	evlist->all_cpus = NULL;
> +
>  	list_for_each_entry_safe(evsel, n, &evlist->entries, node)
>  		__perf_evlist__propagate_maps(evlist, evsel);
>  }
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 4092a43aa84e..0ff7ae75d8f9 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -313,20 +313,18 @@ __add_event(struct list_head *list, int *idx,
>  	if (pmu) {
>  		is_pmu_core = pmu->is_core;
>  		pmu_cpus = perf_cpu_map__get(pmu->cpus);
> +		if (perf_cpu_map__is_empty(pmu_cpus))
> +			pmu_cpus = cpu_map__online();
>  	} else {
>  		is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
>  			       attr->type == PERF_TYPE_HW_CACHE);
>  		pmu_cpus = is_pmu_core ? cpu_map__online() : NULL;
>  	}
>  
> -	if (has_user_cpus) {
> +	if (has_user_cpus)
>  		cpus = perf_cpu_map__get(user_cpus);
> -		/* Existing behavior that pmu_cpus matches the given user ones. */
> -		perf_cpu_map__put(pmu_cpus);
> -		pmu_cpus = perf_cpu_map__get(user_cpus);
> -	} else {
> +	else
>  		cpus = perf_cpu_map__get(pmu_cpus);
> -	}
>  
>  	if (init_attr)
>  		event_attr_init(attr);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 355a7d5c8ab8..8d3bcdb69d37 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -769,8 +769,6 @@ int create_perf_stat_counter(struct evsel *evsel,
>  			attr->enable_on_exec = 1;
>  	}
>  
> -	if (target__has_cpu(target) && !target__has_per_thread(target))
> -		return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx);
> -
> -	return evsel__open_per_thread(evsel, evsel->core.threads);
> +	return evsel__open_per_cpu_and_thread(evsel, evsel__cpus(evsel), cpu_map_idx,
> +					      evsel->core.threads);
>  }
> -- 
> 2.50.0.727.gbf7dc18ff4-goog
>

next prev parent reply	other threads:[~2025-07-16 20:28 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
2025-06-27 19:24 ` [PATCH v1 01/12] perf parse-events: Warn if a cpu term is unsupported by a CPU Ian Rogers
2025-06-27 19:24 ` [PATCH v1 02/12] perf stat: Avoid buffer overflow to the aggregation map Ian Rogers
2025-06-27 19:24 ` [PATCH v1 03/12] perf stat: Don't size aggregation ids from user_requested_cpus Ian Rogers
2025-06-27 19:24 ` [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU Ian Rogers
2025-07-16 20:09   ` Namhyung Kim
2025-07-16 20:25     ` Ian Rogers
2025-07-18 17:56       ` Namhyung Kim
2025-06-27 19:24 ` [PATCH v1 05/12] perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask Ian Rogers
2025-06-27 19:24 ` [PATCH v1 06/12] libperf evsel: Rename own_cpus to pmu_cpus Ian Rogers
2025-06-27 19:24 ` [PATCH v1 07/12] libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete Ian Rogers
2025-06-27 19:24 ` [PATCH v1 08/12] perf evsel: Use libperf perf_evsel__exit Ian Rogers
2025-06-27 19:24 ` [PATCH v1 09/12] perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu Ian Rogers
2025-06-27 19:24 ` [PATCH v1 10/12] perf parse-events: Minor __add_event refactoring Ian Rogers
2025-06-27 19:24 ` [PATCH v1 11/12] perf evsel: Add evsel__open_per_cpu_and_thread Ian Rogers
2025-06-27 19:24 ` [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes Ian Rogers
2025-07-16 20:28   ` Namhyung Kim [this message]
2025-07-17  0:04     ` Ian Rogers
2025-07-15 19:55 ` [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
2025-07-16 20:03   ` Falcon, Thomas
2025-07-21 16:13 ` James Clark
2025-07-21 17:44   ` Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aHgLVcxpEphtYLsB@google.com \
    --to=namhyung@kernel.org \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ashelat@redhat.com \
    --cc=ben.gainey@arm.com \
    --cc=blakejones@google.com \
    --cc=howardchu95@gmail.com \
    --cc=irogers@google.com \
    --cc=james.clark@linaro.org \
    --cc=jean-philippe.romain@foss.st.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux@treblig.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=quic_zhonhan@quicinc.com \
    --cc=song@kernel.org \
    --cc=thomas.falcon@intel.com \
    --cc=tmricht@linux.ibm.com \
    --cc=weilin.wang@intel.com \
    --cc=yangyicong@hisilicon.com \
    --cc=yeoreum.yun@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.