[PATCH] perf record: Usability enhancement for Auto Counter Reload

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] perf record: Usability enhancement for Auto Counter Reload
@ 2025-05-12 21:09 Thomas Falcon
  2025-05-13 15:31 ` Ian Rogers
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Falcon @ 2025-05-12 21:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang
  Cc: linux-kernel, linux-perf-users, Thomas Falcon

The Auto Counter Reload (ACR)[1] feature is used to track the
relative rates of two or more perf events, only sampling
when a given threshold is exceeded. This helps reduce overhead
and unnecessary samples. However, enabling this feature
currently requires setting two parameters:

 -- Event sampling period ("period")
 -- acr_mask, which determines which events get reloaded
    when the sample period is reached.

For example, in the following command:

perf record -e "{cpu_atom/branch-misses,period=200000,\
acr_mask=0x2/ppu,cpu_atom/branch-instructions,period=1000000,\
acr_mask=0x3/u}" -- ./mispredict

The goal is to limit event sampling to cases when the
branch miss rate exceeds 20%. If the branch instructions
sample period is exceeded first, both events are reloaded.
If branch misses exceed their threshold first, only the
second counter is reloaded, and a sample is taken.

To simplify this, provide a new “ratio-to-prev” event term
that works alongside the period event option or -c option.
This would allow users to specify the desired relative rate
between events as a ratio, making configuration more intuitive.

With this enhancement, the equivalent command would be:

perf record -e "{cpu_atom/branch-misses/ppu,\
cpu_atom/branch-instructions,period=1000000,ratio_to_prev=5/u}" \
-- ./mispredict

or

perf record -e "{cpu_atom/branch-misses/ppu,\
cpu_atom/branch-instructions,ratio-to-prev=5/u}" -c 1000000 \
-- ./mispredict

[1] https://lore.kernel.org/lkml/20250327195217.2683619-1-kan.liang@linux.intel.com/

Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>

---
 tools/perf/Documentation/intel-acr.txt |  45 +++++++++++
 tools/perf/Documentation/perf-list.txt |   2 +
 tools/perf/arch/x86/util/evsel.c       | 100 ++++++++++++++++++++++++-
 tools/perf/util/evsel.c                |   2 +
 tools/perf/util/evsel_config.h         |   1 +
 tools/perf/util/parse-events.c         |  10 +++
 tools/perf/util/parse-events.h         |   3 +-
 tools/perf/util/parse-events.l         |   1 +
 tools/perf/util/pmu.c                  |   3 +-
 9 files changed, 164 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-acr.txt

diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
new file mode 100644
index 000000000000..db835c769e1c
--- /dev/null
+++ b/tools/perf/Documentation/intel-acr.txt
@@ -0,0 +1,45 @@
+Intel Auto Counter Reload Support
+---------------------------------
+Support for Intel Auto Counter Reload in perf tools
+
+Auto counter reload provides a means for software to specify to hardware
+that certain counters, if supported, should be automatically reloaded
+upon overflow of chosen counters. By taking a sample only if the rate of
+one event exceeds some threshold relative to the rate of another event,
+this feature enables software to sample based on the relative rate of
+two or more events. To enable this, the user must provide a sample period
+term and a bitmask ("acr_mask") for each relevant event specifying the
+counters in an event group to reload if the event's specified sample
+period is exceeded.
+
+For example, if the user desires to measure a scenario when IPC > 2,
+the event group might look like the one below:
+
+	perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
+	cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
+
+In this case, if the "instructions" counter exceeds the sample period of
+200000, the second counter, "cycles", will be reset and a sample will be
+taken. If "cycles" is exceeded first, both counters in the group will be
+reset. In this way, samples will only be taken for cases where IPC > 2.
+
+ratio-to-prev Event Term
+------------------------
+To simplify this, an event term "ratio-to-prev" is provided which is used
+alongside the sample period term n or the -c/--count option. This would
+allow users to specify the desired relative rate between events as a
+ratio.
+
+The command above would then become
+
+	perf record -e {cpu_atom/instructions/, \
+	cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
+
+ratio-to-prev is the ratio of the event using the term relative
+to the previous event in the group, which will always be 1,
+for a 1:0.5 or 2:1 ratio.
+
+To sample for IPC < 2 for example, the events need to be reordered:
+
+	perf record -e {cpu_atom/cycles/, \
+	cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true
diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 8914f12d2b85..ba809fa1e8c6 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -376,6 +376,8 @@ Support raw format:
 . '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
   a certain kind of events.
 
+include::intel-acr.txt[]
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-top[1],
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 3dd29ba2c23b..b93dbbed2c8e 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -1,7 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <stdio.h>
 #include <stdlib.h>
+#include "util/evlist.h"
 #include "util/evsel.h"
+#include "util/evsel_config.h"
 #include "util/env.h"
 #include "util/pmu.h"
 #include "util/pmus.h"
@@ -89,6 +91,97 @@ int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
 			 event_name);
 }
 
+static void evsel__apply_ratio_to_prev(struct evsel *evsel,
+				struct perf_event_attr *attr,
+				const char *buf)
+{
+	struct perf_event_attr *prev_attr = NULL;
+	struct evsel *evsel_prev = NULL;
+	struct evsel *leader = evsel__leader(evsel);
+	struct evsel *pos;
+	const char *name = "acr_mask";
+	int evsel_idx = 0;
+	__u64 ev_mask, pr_ev_mask;
+	double rtp;
+
+	rtp = strtod(buf, NULL);
+	if (rtp <= 0) {
+		pr_err("Invalid ratio-to-prev value %lf\n", rtp);
+		return;
+	}
+	if (evsel == evsel__leader(evsel)) {
+		pr_err("Invalid use of ratio-to-prev term without preceding element in group\n");
+		return;
+	}
+	if (!evsel->pmu->is_core) {
+		pr_err("Event using ratio-to-prev term must have a core PMU\n");
+		return;
+	}
+	if (!perf_pmu__has_format(evsel->pmu, name)) {
+		pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
+		return;
+	}
+	if (perf_pmu__format_type(evsel->pmu, name) !=
+			PERF_PMU_FORMAT_VALUE_CONFIG2) {
+		pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
+		return;
+	}
+	if (attr->freq) {
+		pr_err("Event period term or count (-c) must be set when using ratio-to-prev term.\n");
+		return;
+	}
+
+	evsel_prev = evsel__prev(evsel);
+	if (!evsel_prev) {
+		pr_err("Previous event does not exist.\n");
+		return;
+	}
+
+	prev_attr = &evsel_prev->core.attr;
+
+	prev_attr->sample_period = (__u64)(attr->sample_period / rtp);
+	prev_attr->freq = 0;
+	evsel__reset_sample_bit(evsel_prev, PERIOD);
+
+	for_each_group_evsel(pos, leader) {
+		if (pos == evsel)
+			break;
+		evsel_idx++;
+	}
+
+	/*
+	 * acr_mask (config2) is calculated using the event's index in
+	 * the event group. The first event will use the index of the
+	 * second event as its mask (e.g., 0x2), indicating that the
+	 * second event counter will be reset and a sample taken for
+	 * the first event if its counter overflows. The second event
+	 * will use the mask consisting of the first and second bits
+	 * (e.g., 0x3), meaning both counters will be reset if the
+	 * second event counter overflows.
+	 */
+
+	ev_mask = 1ull << evsel_idx;
+	pr_ev_mask = 1ull << (evsel_idx - 1);
+
+	prev_attr->config2 = ev_mask;
+	attr->config2 = ev_mask | pr_ev_mask;
+}
+
+static void intel__post_evsel_config(struct evsel *evsel,
+			      struct perf_event_attr *attr)
+{
+	struct evsel_config_term *term;
+	struct list_head *config_terms = &evsel->config_terms;
+	const char *rtp_buf = NULL;
+
+	list_for_each_entry(term, config_terms, list) {
+		if (term->type == EVSEL__CONFIG_TERM_RATIO_TO_PREV) {
+			rtp_buf = term->val.str;
+			evsel__apply_ratio_to_prev(evsel, attr, rtp_buf);
+		}
+	}
+}
+
 static void ibs_l3miss_warn(void)
 {
 	pr_warning(
@@ -101,7 +194,12 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
 	struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
 	static int warned_once;
 
-	if (warned_once || !x86__is_amd_cpu())
+	if (!x86__is_amd_cpu()) {
+		intel__post_evsel_config(evsel, attr);
+		return;
+	}
+
+	if (warned_once)
 		return;
 
 	evsel_pmu = evsel__find_pmu(evsel);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b60461e16804..5028232afeb7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1189,6 +1189,8 @@ static void evsel__apply_config_terms(struct evsel *evsel,
 			break;
 		case EVSEL__CONFIG_TERM_CFG_CHG:
 			break;
+		case EVSEL__CONFIG_TERM_RATIO_TO_PREV:
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
index af52a1516d0b..26c69d9ce788 100644
--- a/tools/perf/util/evsel_config.h
+++ b/tools/perf/util/evsel_config.h
@@ -28,6 +28,7 @@ enum evsel_term_type {
 	EVSEL__CONFIG_TERM_AUX_ACTION,
 	EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
 	EVSEL__CONFIG_TERM_CFG_CHG,
+	EVSEL__CONFIG_TERM_RATIO_TO_PREV,
 };
 
 struct evsel_config_term {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 7297ca3a4eec..4ea8d4ffabdb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -806,6 +806,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
 		[PARSE_EVENTS__TERM_TYPE_RAW]                   = "raw",
 		[PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE]          = "legacy-cache",
 		[PARSE_EVENTS__TERM_TYPE_HARDWARE]              = "hardware",
+		[PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV]         = "ratio-to-prev",
 	};
 	if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
 		return "unknown term";
@@ -855,6 +856,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
 	case PARSE_EVENTS__TERM_TYPE_RAW:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 	case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+	case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
 	default:
 		if (!err)
 			return false;
@@ -982,6 +984,9 @@ do {									   \
 			return -EINVAL;
 		}
 		break;
+	case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
+		CHECK_TYPE_VAL(STR);
+		break;
 	case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
 	case PARSE_EVENTS__TERM_TYPE_USER:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
@@ -1109,6 +1114,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_RAW:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 	case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+	case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
 	default:
 		if (err) {
 			parse_events_error__handle(err, term->err_term,
@@ -1233,6 +1239,9 @@ do {								\
 			ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
 					    term->val.num, term->weak);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
+			ADD_CONFIG_TERM_STR(RATIO_TO_PREV, term->val.str, term->weak);
+			break;
 		case PARSE_EVENTS__TERM_TYPE_USER:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG:
 		case PARSE_EVENTS__TERM_TYPE_CONFIG1:
@@ -1297,6 +1306,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
 		case PARSE_EVENTS__TERM_TYPE_RAW:
 		case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 		case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+		case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
 		default:
 			break;
 		}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e176a34ab088..a9de95dd337a 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -80,7 +80,8 @@ enum parse_events__term_type {
 	PARSE_EVENTS__TERM_TYPE_RAW,
 	PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
 	PARSE_EVENTS__TERM_TYPE_HARDWARE,
-#define	__PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
+	PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV,
+#define	__PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV + 1)
 };
 
 struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 7ed86e3e34e3..49fe1811fe68 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -335,6 +335,7 @@ aux-output		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
 aux-action		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
 aux-sample-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
 metric-id		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
+ratio-to-prev		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV); }
 cpu-cycles|cycles				{ return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
 stalled-cycles-frontend|idle-cycles-frontend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
 stalled-cycles-backend|idle-cycles-backend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index d08972aa461c..8b5b5a6adb29 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1470,7 +1470,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
 			break;
 		case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
 			return -EINVAL;
-		case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
+		case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
 			/* Skip non-config terms. */
 			break;
 		default:
@@ -1852,6 +1852,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
 		"aux-output",
 		"aux-action=(pause|resume|start-paused)",
 		"aux-sample-size=number",
+		"ratio-to-prev=string",
 	};
 	struct perf_pmu_format *format;
 	int ret;
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf record: Usability enhancement for Auto Counter Reload
  2025-05-12 21:09 [PATCH] perf record: Usability enhancement for Auto Counter Reload Thomas Falcon
@ 2025-05-13 15:31 ` Ian Rogers
  2025-05-14 18:05   ` Falcon, Thomas
  0 siblings, 1 reply; 4+ messages in thread
From: Ian Rogers @ 2025-05-13 15:31 UTC (permalink / raw)
  To: Thomas Falcon
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, linux-kernel, linux-perf-users

On Mon, May 12, 2025 at 2:09 PM Thomas Falcon <thomas.falcon@intel.com> wrote:
>
> The Auto Counter Reload (ACR)[1] feature is used to track the
> relative rates of two or more perf events, only sampling
> when a given threshold is exceeded. This helps reduce overhead
> and unnecessary samples. However, enabling this feature
> currently requires setting two parameters:
>
>  -- Event sampling period ("period")
>  -- acr_mask, which determines which events get reloaded
>     when the sample period is reached.
>
> For example, in the following command:
>
> perf record -e "{cpu_atom/branch-misses,period=200000,\
> acr_mask=0x2/ppu,cpu_atom/branch-instructions,period=1000000,\
> acr_mask=0x3/u}" -- ./mispredict
>
> The goal is to limit event sampling to cases when the
> branch miss rate exceeds 20%. If the branch instructions
> sample period is exceeded first, both events are reloaded.
> If branch misses exceed their threshold first, only the
> second counter is reloaded, and a sample is taken.
>
> To simplify this, provide a new “ratio-to-prev” event term
> that works alongside the period event option or -c option.
> This would allow users to specify the desired relative rate
> between events as a ratio, making configuration more intuitive.
>
> With this enhancement, the equivalent command would be:
>
> perf record -e "{cpu_atom/branch-misses/ppu,\
> cpu_atom/branch-instructions,period=1000000,ratio_to_prev=5/u}" \
> -- ./mispredict
>
> or
>
> perf record -e "{cpu_atom/branch-misses/ppu,\
> cpu_atom/branch-instructions,ratio-to-prev=5/u}" -c 1000000 \
> -- ./mispredict

Thanks Thomas. I'm wondering if ratio-to-prev should be a generic term
such that periods can be set as a ratio of each on non-Intel?

> [1] https://lore.kernel.org/lkml/20250327195217.2683619-1-kan.liang@linux.intel.com/
>
> Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
>
> ---
>  tools/perf/Documentation/intel-acr.txt |  45 +++++++++++
>  tools/perf/Documentation/perf-list.txt |   2 +
>  tools/perf/arch/x86/util/evsel.c       | 100 ++++++++++++++++++++++++-
>  tools/perf/util/evsel.c                |   2 +
>  tools/perf/util/evsel_config.h         |   1 +
>  tools/perf/util/parse-events.c         |  10 +++
>  tools/perf/util/parse-events.h         |   3 +-
>  tools/perf/util/parse-events.l         |   1 +
>  tools/perf/util/pmu.c                  |   3 +-
>  9 files changed, 164 insertions(+), 3 deletions(-)
>  create mode 100644 tools/perf/Documentation/intel-acr.txt
>
> diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
> new file mode 100644
> index 000000000000..db835c769e1c
> --- /dev/null
> +++ b/tools/perf/Documentation/intel-acr.txt
> @@ -0,0 +1,45 @@
> +Intel Auto Counter Reload Support
> +---------------------------------
> +Support for Intel Auto Counter Reload in perf tools
> +
> +Auto counter reload provides a means for software to specify to hardware
> +that certain counters, if supported, should be automatically reloaded
> +upon overflow of chosen counters. By taking a sample only if the rate of
> +one event exceeds some threshold relative to the rate of another event,
> +this feature enables software to sample based on the relative rate of
> +two or more events. To enable this, the user must provide a sample period
> +term and a bitmask ("acr_mask") for each relevant event specifying the
> +counters in an event group to reload if the event's specified sample
> +period is exceeded.
> +
> +For example, if the user desires to measure a scenario when IPC > 2,
> +the event group might look like the one below:
> +
> +       perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
> +       cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
> +
> +In this case, if the "instructions" counter exceeds the sample period of
> +200000, the second counter, "cycles", will be reset and a sample will be
> +taken. If "cycles" is exceeded first, both counters in the group will be
> +reset. In this way, samples will only be taken for cases where IPC > 2.

Could this definition include the meaning of acr_mask? I can see that
the 2 periods create an IPC of 2, but I can't see why the acr_mask
needs to be 2 and 3.

> +
> +ratio-to-prev Event Term
> +------------------------
> +To simplify this, an event term "ratio-to-prev" is provided which is used
> +alongside the sample period term n or the -c/--count option. This would
> +allow users to specify the desired relative rate between events as a
> +ratio.

Should there be an opposite ratio-to-next?

> +
> +The command above would then become
> +
> +       perf record -e {cpu_atom/instructions/, \
> +       cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
> +
> +ratio-to-prev is the ratio of the event using the term relative
> +to the previous event in the group, which will always be 1,
> +for a 1:0.5 or 2:1 ratio.
> +
> +To sample for IPC < 2 for example, the events need to be reordered:
> +
> +       perf record -e {cpu_atom/cycles/, \
> +       cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true

We allow "software" events in groups with hardware events. The current
list of software events is in perf_pmu__is_software and contains a few
surprises like "msr":
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2134
presumably ratio-to-prev should apply to the previous event in the
list that is on the same PMU?

> diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> index 8914f12d2b85..ba809fa1e8c6 100644
> --- a/tools/perf/Documentation/perf-list.txt
> +++ b/tools/perf/Documentation/perf-list.txt
> @@ -376,6 +376,8 @@ Support raw format:
>  . '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
>    a certain kind of events.
>
> +include::intel-acr.txt[]
> +
>  SEE ALSO
>  --------
>  linkperf:perf-stat[1], linkperf:perf-top[1],
> diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
> index 3dd29ba2c23b..b93dbbed2c8e 100644
> --- a/tools/perf/arch/x86/util/evsel.c
> +++ b/tools/perf/arch/x86/util/evsel.c
> @@ -1,7 +1,9 @@
>  // SPDX-License-Identifier: GPL-2.0
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include "util/evlist.h"
>  #include "util/evsel.h"
> +#include "util/evsel_config.h"
>  #include "util/env.h"
>  #include "util/pmu.h"
>  #include "util/pmus.h"
> @@ -89,6 +91,97 @@ int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
>                          event_name);
>  }
>
> +static void evsel__apply_ratio_to_prev(struct evsel *evsel,
> +                               struct perf_event_attr *attr,
> +                               const char *buf)
> +{
> +       struct perf_event_attr *prev_attr = NULL;
> +       struct evsel *evsel_prev = NULL;
> +       struct evsel *leader = evsel__leader(evsel);
> +       struct evsel *pos;
> +       const char *name = "acr_mask";
> +       int evsel_idx = 0;
> +       __u64 ev_mask, pr_ev_mask;
> +       double rtp;
> +
> +       rtp = strtod(buf, NULL);
> +       if (rtp <= 0) {
> +               pr_err("Invalid ratio-to-prev value %lf\n", rtp);

It would be nice to fail this early during parsing so that we can
identify the part of the parse string that is invalid. I'm guessing it
is this way because the parse_events__term_val_type are either an
integer or a string.

> +               return;
> +       }
> +       if (evsel == evsel__leader(evsel)) {
> +               pr_err("Invalid use of ratio-to-prev term without preceding element in group\n");

I'm wondering how we can prevent this happening due to event
reordering in parse_events__sort_events_and_fix_groups:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2095

> +               return;
> +       }
> +       if (!evsel->pmu->is_core) {
> +               pr_err("Event using ratio-to-prev term must have a core PMU\n");

Would a stronger test here be the same PMU?

> +               return;
> +       }
> +       if (!perf_pmu__has_format(evsel->pmu, name)) {
> +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
> +               return;
> +       }
> +       if (perf_pmu__format_type(evsel->pmu, name) !=
> +                       PERF_PMU_FORMAT_VALUE_CONFIG2) {
> +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);

I wonder if the acr_mask support could be in an
arch_evsel__apply_ratio_to_prev and the non-acr_mask stuff be generic?
If nothing else this would aid testing.

> +               return;
> +       }
> +       if (attr->freq) {
> +               pr_err("Event period term or count (-c) must be set when using ratio-to-prev term.\n");
> +               return;
> +       }
> +
> +       evsel_prev = evsel__prev(evsel);
> +       if (!evsel_prev) {
> +               pr_err("Previous event does not exist.\n");
> +               return;
> +       }
> +
> +       prev_attr = &evsel_prev->core.attr;
> +
> +       prev_attr->sample_period = (__u64)(attr->sample_period / rtp);
> +       prev_attr->freq = 0;
> +       evsel__reset_sample_bit(evsel_prev, PERIOD);
> +
> +       for_each_group_evsel(pos, leader) {
> +               if (pos == evsel)
> +                       break;
> +               evsel_idx++;
> +       }
> +
> +       /*
> +        * acr_mask (config2) is calculated using the event's index in
> +        * the event group. The first event will use the index of the
> +        * second event as its mask (e.g., 0x2), indicating that the
> +        * second event counter will be reset and a sample taken for
> +        * the first event if its counter overflows. The second event
> +        * will use the mask consisting of the first and second bits
> +        * (e.g., 0x3), meaning both counters will be reset if the
> +        * second event counter overflows.
> +        */
> +
> +       ev_mask = 1ull << evsel_idx;
> +       pr_ev_mask = 1ull << (evsel_idx - 1);
> +
> +       prev_attr->config2 = ev_mask;
> +       attr->config2 = ev_mask | pr_ev_mask;
> +}
> +
> +static void intel__post_evsel_config(struct evsel *evsel,
> +                             struct perf_event_attr *attr)
> +{
> +       struct evsel_config_term *term;
> +       struct list_head *config_terms = &evsel->config_terms;
> +       const char *rtp_buf = NULL;
> +
> +       list_for_each_entry(term, config_terms, list) {
> +               if (term->type == EVSEL__CONFIG_TERM_RATIO_TO_PREV) {
> +                       rtp_buf = term->val.str;
> +                       evsel__apply_ratio_to_prev(evsel, attr, rtp_buf);
> +               }
> +       }
> +}
> +
>  static void ibs_l3miss_warn(void)
>  {
>         pr_warning(
> @@ -101,7 +194,12 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
>         struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
>         static int warned_once;
>
> -       if (warned_once || !x86__is_amd_cpu())
> +       if (!x86__is_amd_cpu()) {
> +               intel__post_evsel_config(evsel, attr);
> +               return;
> +       }
> +
> +       if (warned_once)
>                 return;
>
>         evsel_pmu = evsel__find_pmu(evsel);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index b60461e16804..5028232afeb7 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1189,6 +1189,8 @@ static void evsel__apply_config_terms(struct evsel *evsel,
>                         break;
>                 case EVSEL__CONFIG_TERM_CFG_CHG:
>                         break;
> +               case EVSEL__CONFIG_TERM_RATIO_TO_PREV:
> +                       break;
>                 default:
>                         break;
>                 }
> diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
> index af52a1516d0b..26c69d9ce788 100644
> --- a/tools/perf/util/evsel_config.h
> +++ b/tools/perf/util/evsel_config.h
> @@ -28,6 +28,7 @@ enum evsel_term_type {
>         EVSEL__CONFIG_TERM_AUX_ACTION,
>         EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
>         EVSEL__CONFIG_TERM_CFG_CHG,
> +       EVSEL__CONFIG_TERM_RATIO_TO_PREV,
>  };
>
>  struct evsel_config_term {
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 7297ca3a4eec..4ea8d4ffabdb 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -806,6 +806,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
>                 [PARSE_EVENTS__TERM_TYPE_RAW]                   = "raw",
>                 [PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE]          = "legacy-cache",
>                 [PARSE_EVENTS__TERM_TYPE_HARDWARE]              = "hardware",
> +               [PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV]         = "ratio-to-prev",
>         };
>         if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
>                 return "unknown term";
> @@ -855,6 +856,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
>         case PARSE_EVENTS__TERM_TYPE_RAW:
>         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
>         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
>         default:
>                 if (!err)
>                         return false;
> @@ -982,6 +984,9 @@ do {                                                                           \
>                         return -EINVAL;
>                 }
>                 break;
> +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> +               CHECK_TYPE_VAL(STR);
> +               break;
>         case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
>         case PARSE_EVENTS__TERM_TYPE_USER:
>         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> @@ -1109,6 +1114,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
>         case PARSE_EVENTS__TERM_TYPE_RAW:
>         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
>         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
>         default:
>                 if (err) {
>                         parse_events_error__handle(err, term->err_term,
> @@ -1233,6 +1239,9 @@ do {                                                              \
>                         ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
>                                             term->val.num, term->weak);
>                         break;
> +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> +                       ADD_CONFIG_TERM_STR(RATIO_TO_PREV, term->val.str, term->weak);
> +                       break;
>                 case PARSE_EVENTS__TERM_TYPE_USER:
>                 case PARSE_EVENTS__TERM_TYPE_CONFIG:
>                 case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> @@ -1297,6 +1306,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
>                 case PARSE_EVENTS__TERM_TYPE_RAW:
>                 case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
>                 case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
>                 default:
>                         break;
>                 }
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index e176a34ab088..a9de95dd337a 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -80,7 +80,8 @@ enum parse_events__term_type {
>         PARSE_EVENTS__TERM_TYPE_RAW,
>         PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
>         PARSE_EVENTS__TERM_TYPE_HARDWARE,
> -#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
> +       PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV,
> +#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV + 1)
>  };
>
>  struct parse_events_term {
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 7ed86e3e34e3..49fe1811fe68 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -335,6 +335,7 @@ aux-output          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
>  aux-action             { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
>  aux-sample-size                { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
>  metric-id              { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> +ratio-to-prev          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV); }
>  cpu-cycles|cycles                              { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
>  stalled-cycles-frontend|idle-cycles-frontend   { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
>  stalled-cycles-backend|idle-cycles-backend     { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index d08972aa461c..8b5b5a6adb29 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1470,7 +1470,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
>                         break;
>                 case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
>                         return -EINVAL;
> -               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
> +               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
>                         /* Skip non-config terms. */
>                         break;
>                 default:
> @@ -1852,6 +1852,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
>                 "aux-output",
>                 "aux-action=(pause|resume|start-paused)",
>                 "aux-sample-size=number",
> +               "ratio-to-prev=string",
>         };
>         struct perf_pmu_format *format;
>         int ret;

Some places I think testing can be added are:
* The event parsing test:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/parse-events.c?h=perf-tools-next#n2143
* The perf record test:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh?h=perf-tools-next

I wonder for if acr_mask is present (or ratio-to-prev) then
arch_evsel__must_be_in_group should return true:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/evsel.c?h=perf-tools-next#n63
this is used to force topdown events into groups, so perhaps we can do
similar forcing and make the use of the {} for the group optional (or
fixed by the tool).

Thanks,
Ian

> --
> 2.48.1
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf record: Usability enhancement for Auto Counter Reload
  2025-05-13 15:31 ` Ian Rogers
@ 2025-05-14 18:05   ` Falcon, Thomas
  2025-05-19 16:17     ` Ian Rogers
  0 siblings, 1 reply; 4+ messages in thread
From: Falcon, Thomas @ 2025-05-14 18:05 UTC (permalink / raw)
  To: irogers@google.com
  Cc: alexander.shishkin@linux.intel.com,
	linux-perf-users@vger.kernel.org, peterz@infradead.org,
	acme@kernel.org, mingo@redhat.com, Hunter, Adrian,
	namhyung@kernel.org, jolsa@kernel.org,
	linux-kernel@vger.kernel.org, kan.liang@linux.intel.com,
	mark.rutland@arm.com

On Tue, 2025-05-13 at 08:31 -0700, Ian Rogers wrote:
> On Mon, May 12, 2025 at 2:09 PM Thomas Falcon <thomas.falcon@intel.com> wrote:
> > 
> > The Auto Counter Reload (ACR)[1] feature is used to track the
> > relative rates of two or more perf events, only sampling
> > when a given threshold is exceeded. This helps reduce overhead
> > and unnecessary samples. However, enabling this feature
> > currently requires setting two parameters:
> > 
> >  -- Event sampling period ("period")
> >  -- acr_mask, which determines which events get reloaded
> >     when the sample period is reached.
> > 
> > For example, in the following command:
> > 
> > perf record -e "{cpu_atom/branch-misses,period=200000,\
> > acr_mask=0x2/ppu,cpu_atom/branch-instructions,period=1000000,\
> > acr_mask=0x3/u}" -- ./mispredict
> > 
> > The goal is to limit event sampling to cases when the
> > branch miss rate exceeds 20%. If the branch instructions
> > sample period is exceeded first, both events are reloaded.
> > If branch misses exceed their threshold first, only the
> > second counter is reloaded, and a sample is taken.
> > 
> > To simplify this, provide a new “ratio-to-prev” event term
> > that works alongside the period event option or -c option.
> > This would allow users to specify the desired relative rate
> > between events as a ratio, making configuration more intuitive.
> > 
> > With this enhancement, the equivalent command would be:
> > 
> > perf record -e "{cpu_atom/branch-misses/ppu,\
> > cpu_atom/branch-instructions,period=1000000,ratio_to_prev=5/u}" \
> > -- ./mispredict
> > 
> > or
> > 
> > perf record -e "{cpu_atom/branch-misses/ppu,\
> > cpu_atom/branch-instructions,ratio-to-prev=5/u}" -c 1000000 \
> > -- ./mispredict
> 
> Thanks Thomas. I'm wondering if ratio-to-prev should be a generic term
> such that periods can be set as a ratio of each on non-Intel?
> 
> > [1] https://lore.kernel.org/lkml/20250327195217.2683619-1-kan.liang@linux.intel.com/
> > 
> > Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
> > 
> > ---
> >  tools/perf/Documentation/intel-acr.txt |  45 +++++++++++
> >  tools/perf/Documentation/perf-list.txt |   2 +
> >  tools/perf/arch/x86/util/evsel.c       | 100 ++++++++++++++++++++++++-
> >  tools/perf/util/evsel.c                |   2 +
> >  tools/perf/util/evsel_config.h         |   1 +
> >  tools/perf/util/parse-events.c         |  10 +++
> >  tools/perf/util/parse-events.h         |   3 +-
> >  tools/perf/util/parse-events.l         |   1 +
> >  tools/perf/util/pmu.c                  |   3 +-
> >  9 files changed, 164 insertions(+), 3 deletions(-)
> >  create mode 100644 tools/perf/Documentation/intel-acr.txt
> > 
> > diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
> > new file mode 100644
> > index 000000000000..db835c769e1c
> > --- /dev/null
> > +++ b/tools/perf/Documentation/intel-acr.txt
> > @@ -0,0 +1,45 @@
> > +Intel Auto Counter Reload Support
> > +---------------------------------
> > +Support for Intel Auto Counter Reload in perf tools
> > +
> > +Auto counter reload provides a means for software to specify to hardware
> > +that certain counters, if supported, should be automatically reloaded
> > +upon overflow of chosen counters. By taking a sample only if the rate of
> > +one event exceeds some threshold relative to the rate of another event,
> > +this feature enables software to sample based on the relative rate of
> > +two or more events. To enable this, the user must provide a sample period
> > +term and a bitmask ("acr_mask") for each relevant event specifying the
> > +counters in an event group to reload if the event's specified sample
> > +period is exceeded.
> > +
> > +For example, if the user desires to measure a scenario when IPC > 2,
> > +the event group might look like the one below:
> > +
> > +       perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
> > +       cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
> > +
> > +In this case, if the "instructions" counter exceeds the sample period of
> > +200000, the second counter, "cycles", will be reset and a sample will be
> > +taken. If "cycles" is exceeded first, both counters in the group will be
> > +reset. In this way, samples will only be taken for cases where IPC > 2.
> 
> Could this definition include the meaning of acr_mask? I can see that
> the 2 periods create an IPC of 2, but I can't see why the acr_mask
> needs to be 2 and 3.

Hi Ian, thanks for reviewing. I will include an explanation for the acr mask in a new version.

> 
> > +
> > +ratio-to-prev Event Term
> > +------------------------
> > +To simplify this, an event term "ratio-to-prev" is provided which is used
> > +alongside the sample period term n or the -c/--count option. This would
> > +allow users to specify the desired relative rate between events as a
> > +ratio.
> 
> Should there be an opposite ratio-to-next?
> 
> > +
> > +The command above would then become
> > +
> > +       perf record -e {cpu_atom/instructions/, \
> > +       cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
> > +
> > +ratio-to-prev is the ratio of the event using the term relative
> > +to the previous event in the group, which will always be 1,
> > +for a 1:0.5 or 2:1 ratio.
> > +
> > +To sample for IPC < 2 for example, the events need to be reordered:
> > +
> > +       perf record -e {cpu_atom/cycles/, \
> > +       cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true
> 
> We allow "software" events in groups with hardware events. The current
> list of software events is in perf_pmu__is_software and contains a few
> surprises like "msr":
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2134
> presumably ratio-to-prev should apply to the previous event in the
> list that is on the same PMU?

Yes, that's right.

> 
> > diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> > index 8914f12d2b85..ba809fa1e8c6 100644
> > --- a/tools/perf/Documentation/perf-list.txt
> > +++ b/tools/perf/Documentation/perf-list.txt
> > @@ -376,6 +376,8 @@ Support raw format:
> >  . '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
> >    a certain kind of events.
> > 
> > +include::intel-acr.txt[]
> > +
> >  SEE ALSO
> >  --------
> >  linkperf:perf-stat[1], linkperf:perf-top[1],
> > diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
> > index 3dd29ba2c23b..b93dbbed2c8e 100644
> > --- a/tools/perf/arch/x86/util/evsel.c
> > +++ b/tools/perf/arch/x86/util/evsel.c
> > @@ -1,7 +1,9 @@
> >  // SPDX-License-Identifier: GPL-2.0
> >  #include <stdio.h>
> >  #include <stdlib.h>
> > +#include "util/evlist.h"
> >  #include "util/evsel.h"
> > +#include "util/evsel_config.h"
> >  #include "util/env.h"
> >  #include "util/pmu.h"
> >  #include "util/pmus.h"
> > @@ -89,6 +91,97 @@ int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
> >                          event_name);
> >  }
> > 
> > +static void evsel__apply_ratio_to_prev(struct evsel *evsel,
> > +                               struct perf_event_attr *attr,
> > +                               const char *buf)
> > +{
> > +       struct perf_event_attr *prev_attr = NULL;
> > +       struct evsel *evsel_prev = NULL;
> > +       struct evsel *leader = evsel__leader(evsel);
> > +       struct evsel *pos;
> > +       const char *name = "acr_mask";
> > +       int evsel_idx = 0;
> > +       __u64 ev_mask, pr_ev_mask;
> > +       double rtp;
> > +
> > +       rtp = strtod(buf, NULL);
> > +       if (rtp <= 0) {
> > +               pr_err("Invalid ratio-to-prev value %lf\n", rtp);
> 
> It would be nice to fail this early during parsing so that we can
> identify the part of the parse string that is invalid. I'm guessing it
> is this way because the parse_events__term_val_type are either an
> integer or a string.

Thanks, that does sound better. I'll look into that for v2.

> 
> > +               return;
> > +       }
> > +       if (evsel == evsel__leader(evsel)) {
> > +               pr_err("Invalid use of ratio-to-prev term without preceding element in group\n");
> 
> I'm wondering how we can prevent this happening due to event
> reordering in parse_events__sort_events_and_fix_groups:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2095
> 

Is there a way for a user to force a specific event order? Or avoid event reordering?

> > +               return;
> > +       }
> > +       if (!evsel->pmu->is_core) {
> > +               pr_err("Event using ratio-to-prev term must have a core PMU\n");
> 
> Would a stronger test here be the same PMU?

The previous check should make sure that the events are in the same group, which implies that they
are either from the same PMU or a SW event. This check then covers the SW event and non-core PMU
case.

> 
> > +               return;
> > +       }
> > +       if (!perf_pmu__has_format(evsel->pmu, name)) {
> > +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
> > +               return;
> > +       }
> > +       if (perf_pmu__format_type(evsel->pmu, name) !=
> > +                       PERF_PMU_FORMAT_VALUE_CONFIG2) {
> > +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
> 
> I wonder if the acr_mask support could be in an
> arch_evsel__apply_ratio_to_prev and the non-acr_mask stuff be generic?
> If nothing else this would aid testing.
> 
> > +               return;
> > +       }
> > +       if (attr->freq) {
> > +               pr_err("Event period term or count (-c) must be set when using ratio-to-prev term.\n");
> > +               return;
> > +       }
> > +
> > +       evsel_prev = evsel__prev(evsel);
> > +       if (!evsel_prev) {
> > +               pr_err("Previous event does not exist.\n");
> > +               return;
> > +       }
> > +
> > +       prev_attr = &evsel_prev->core.attr;
> > +
> > +       prev_attr->sample_period = (__u64)(attr->sample_period / rtp);
> > +       prev_attr->freq = 0;
> > +       evsel__reset_sample_bit(evsel_prev, PERIOD);
> > +
> > +       for_each_group_evsel(pos, leader) {
> > +               if (pos == evsel)
> > +                       break;
> > +               evsel_idx++;
> > +       }
> > +
> > +       /*
> > +        * acr_mask (config2) is calculated using the event's index in
> > +        * the event group. The first event will use the index of the
> > +        * second event as its mask (e.g., 0x2), indicating that the
> > +        * second event counter will be reset and a sample taken for
> > +        * the first event if its counter overflows. The second event
> > +        * will use the mask consisting of the first and second bits
> > +        * (e.g., 0x3), meaning both counters will be reset if the
> > +        * second event counter overflows.
> > +        */
> > +
> > +       ev_mask = 1ull << evsel_idx;
> > +       pr_ev_mask = 1ull << (evsel_idx - 1);
> > +
> > +       prev_attr->config2 = ev_mask;
> > +       attr->config2 = ev_mask | pr_ev_mask;
> > +}
> > +
> > +static void intel__post_evsel_config(struct evsel *evsel,
> > +                             struct perf_event_attr *attr)
> > +{
> > +       struct evsel_config_term *term;
> > +       struct list_head *config_terms = &evsel->config_terms;
> > +       const char *rtp_buf = NULL;
> > +
> > +       list_for_each_entry(term, config_terms, list) {
> > +               if (term->type == EVSEL__CONFIG_TERM_RATIO_TO_PREV) {
> > +                       rtp_buf = term->val.str;
> > +                       evsel__apply_ratio_to_prev(evsel, attr, rtp_buf);
> > +               }
> > +       }
> > +}
> > +
> >  static void ibs_l3miss_warn(void)
> >  {
> >         pr_warning(
> > @@ -101,7 +194,12 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
> >         struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
> >         static int warned_once;
> > 
> > -       if (warned_once || !x86__is_amd_cpu())
> > +       if (!x86__is_amd_cpu()) {
> > +               intel__post_evsel_config(evsel, attr);
> > +               return;
> > +       }
> > +
> > +       if (warned_once)
> >                 return;
> > 
> >         evsel_pmu = evsel__find_pmu(evsel);
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index b60461e16804..5028232afeb7 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -1189,6 +1189,8 @@ static void evsel__apply_config_terms(struct evsel *evsel,
> >                         break;
> >                 case EVSEL__CONFIG_TERM_CFG_CHG:
> >                         break;
> > +               case EVSEL__CONFIG_TERM_RATIO_TO_PREV:
> > +                       break;
> >                 default:
> >                         break;
> >                 }
> > diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
> > index af52a1516d0b..26c69d9ce788 100644
> > --- a/tools/perf/util/evsel_config.h
> > +++ b/tools/perf/util/evsel_config.h
> > @@ -28,6 +28,7 @@ enum evsel_term_type {
> >         EVSEL__CONFIG_TERM_AUX_ACTION,
> >         EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
> >         EVSEL__CONFIG_TERM_CFG_CHG,
> > +       EVSEL__CONFIG_TERM_RATIO_TO_PREV,
> >  };
> > 
> >  struct evsel_config_term {
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 7297ca3a4eec..4ea8d4ffabdb 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -806,6 +806,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
> >                 [PARSE_EVENTS__TERM_TYPE_RAW]                   = "raw",
> >                 [PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE]          = "legacy-cache",
> >                 [PARSE_EVENTS__TERM_TYPE_HARDWARE]              = "hardware",
> > +               [PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV]         = "ratio-to-prev",
> >         };
> >         if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
> >                 return "unknown term";
> > @@ -855,6 +856,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
> >         case PARSE_EVENTS__TERM_TYPE_RAW:
> >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> >         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> >         default:
> >                 if (!err)
> >                         return false;
> > @@ -982,6 +984,9 @@ do {                                                                           \
> >                         return -EINVAL;
> >                 }
> >                 break;
> > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > +               CHECK_TYPE_VAL(STR);
> > +               break;
> >         case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
> >         case PARSE_EVENTS__TERM_TYPE_USER:
> >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> > @@ -1109,6 +1114,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
> >         case PARSE_EVENTS__TERM_TYPE_RAW:
> >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> >         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> >         default:
> >                 if (err) {
> >                         parse_events_error__handle(err, term->err_term,
> > @@ -1233,6 +1239,9 @@ do {                                                              \
> >                         ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
> >                                             term->val.num, term->weak);
> >                         break;
> > +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > +                       ADD_CONFIG_TERM_STR(RATIO_TO_PREV, term->val.str, term->weak);
> > +                       break;
> >                 case PARSE_EVENTS__TERM_TYPE_USER:
> >                 case PARSE_EVENTS__TERM_TYPE_CONFIG:
> >                 case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> > @@ -1297,6 +1306,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
> >                 case PARSE_EVENTS__TERM_TYPE_RAW:
> >                 case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> >                 case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> >                 default:
> >                         break;
> >                 }
> > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> > index e176a34ab088..a9de95dd337a 100644
> > --- a/tools/perf/util/parse-events.h
> > +++ b/tools/perf/util/parse-events.h
> > @@ -80,7 +80,8 @@ enum parse_events__term_type {
> >         PARSE_EVENTS__TERM_TYPE_RAW,
> >         PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
> >         PARSE_EVENTS__TERM_TYPE_HARDWARE,
> > -#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
> > +       PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV,
> > +#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV + 1)
> >  };
> > 
> >  struct parse_events_term {
> > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> > index 7ed86e3e34e3..49fe1811fe68 100644
> > --- a/tools/perf/util/parse-events.l
> > +++ b/tools/perf/util/parse-events.l
> > @@ -335,6 +335,7 @@ aux-output          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
> >  aux-action             { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
> >  aux-sample-size                { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
> >  metric-id              { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> > +ratio-to-prev          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV); }
> >  cpu-cycles|cycles                              { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> >  stalled-cycles-frontend|idle-cycles-frontend   { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> >  stalled-cycles-backend|idle-cycles-backend     { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > index d08972aa461c..8b5b5a6adb29 100644
> > --- a/tools/perf/util/pmu.c
> > +++ b/tools/perf/util/pmu.c
> > @@ -1470,7 +1470,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
> >                         break;
> >                 case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
> >                         return -EINVAL;
> > -               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > +               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> >                         /* Skip non-config terms. */
> >                         break;
> >                 default:
> > @@ -1852,6 +1852,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
> >                 "aux-output",
> >                 "aux-action=(pause|resume|start-paused)",
> >                 "aux-sample-size=number",
> > +               "ratio-to-prev=string",
> >         };
> >         struct perf_pmu_format *format;
> >         int ret;
> 
> Some places I think testing can be added are:
> * The event parsing test:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/parse-events.c?h=perf-tools-next#n2143
> * The perf record test:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh?h=perf-tools-next
> 

Thanks, I'll look into adding more testing here. 

> I wonder for if acr_mask is present (or ratio-to-prev) then
> arch_evsel__must_be_in_group should return true:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/evsel.c?h=perf-tools-next#n63
> this is used to force topdown events into groups, so perhaps we can do
> similar forcing and make the use of the {} for the group optional (or
> fixed by the tool).

Thanks, this might be an answer to my previous question.

Thanks, Tom
> 
> Thanks,
> Ian
> 
> > --
> > 2.48.1
> > 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf record: Usability enhancement for Auto Counter Reload
  2025-05-14 18:05   ` Falcon, Thomas
@ 2025-05-19 16:17     ` Ian Rogers
  0 siblings, 0 replies; 4+ messages in thread
From: Ian Rogers @ 2025-05-19 16:17 UTC (permalink / raw)
  To: Falcon, Thomas
  Cc: alexander.shishkin@linux.intel.com,
	linux-perf-users@vger.kernel.org, peterz@infradead.org,
	acme@kernel.org, mingo@redhat.com, Hunter, Adrian,
	namhyung@kernel.org, jolsa@kernel.org,
	linux-kernel@vger.kernel.org, kan.liang@linux.intel.com,
	mark.rutland@arm.com

On Wed, May 14, 2025 at 11:05 AM Falcon, Thomas <thomas.falcon@intel.com> wrote:
>
> On Tue, 2025-05-13 at 08:31 -0700, Ian Rogers wrote:
> > On Mon, May 12, 2025 at 2:09 PM Thomas Falcon <thomas.falcon@intel.com> wrote:
> > >
> > > The Auto Counter Reload (ACR)[1] feature is used to track the
> > > relative rates of two or more perf events, only sampling
> > > when a given threshold is exceeded. This helps reduce overhead
> > > and unnecessary samples. However, enabling this feature
> > > currently requires setting two parameters:
> > >
> > >  -- Event sampling period ("period")
> > >  -- acr_mask, which determines which events get reloaded
> > >     when the sample period is reached.
> > >
> > > For example, in the following command:
> > >
> > > perf record -e "{cpu_atom/branch-misses,period=200000,\
> > > acr_mask=0x2/ppu,cpu_atom/branch-instructions,period=1000000,\
> > > acr_mask=0x3/u}" -- ./mispredict
> > >
> > > The goal is to limit event sampling to cases when the
> > > branch miss rate exceeds 20%. If the branch instructions
> > > sample period is exceeded first, both events are reloaded.
> > > If branch misses exceed their threshold first, only the
> > > second counter is reloaded, and a sample is taken.
> > >
> > > To simplify this, provide a new “ratio-to-prev” event term
> > > that works alongside the period event option or -c option.
> > > This would allow users to specify the desired relative rate
> > > between events as a ratio, making configuration more intuitive.
> > >
> > > With this enhancement, the equivalent command would be:
> > >
> > > perf record -e "{cpu_atom/branch-misses/ppu,\
> > > cpu_atom/branch-instructions,period=1000000,ratio_to_prev=5/u}" \
> > > -- ./mispredict
> > >
> > > or
> > >
> > > perf record -e "{cpu_atom/branch-misses/ppu,\
> > > cpu_atom/branch-instructions,ratio-to-prev=5/u}" -c 1000000 \
> > > -- ./mispredict
> >
> > Thanks Thomas. I'm wondering if ratio-to-prev should be a generic term
> > such that periods can be set as a ratio of each on non-Intel?
> >
> > > [1] https://lore.kernel.org/lkml/20250327195217.2683619-1-kan.liang@linux.intel.com/
> > >
> > > Signed-off-by: Thomas Falcon <thomas.falcon@intel.com>
> > >
> > > ---
> > >  tools/perf/Documentation/intel-acr.txt |  45 +++++++++++
> > >  tools/perf/Documentation/perf-list.txt |   2 +
> > >  tools/perf/arch/x86/util/evsel.c       | 100 ++++++++++++++++++++++++-
> > >  tools/perf/util/evsel.c                |   2 +
> > >  tools/perf/util/evsel_config.h         |   1 +
> > >  tools/perf/util/parse-events.c         |  10 +++
> > >  tools/perf/util/parse-events.h         |   3 +-
> > >  tools/perf/util/parse-events.l         |   1 +
> > >  tools/perf/util/pmu.c                  |   3 +-
> > >  9 files changed, 164 insertions(+), 3 deletions(-)
> > >  create mode 100644 tools/perf/Documentation/intel-acr.txt
> > >
> > > diff --git a/tools/perf/Documentation/intel-acr.txt b/tools/perf/Documentation/intel-acr.txt
> > > new file mode 100644
> > > index 000000000000..db835c769e1c
> > > --- /dev/null
> > > +++ b/tools/perf/Documentation/intel-acr.txt
> > > @@ -0,0 +1,45 @@
> > > +Intel Auto Counter Reload Support
> > > +---------------------------------
> > > +Support for Intel Auto Counter Reload in perf tools
> > > +
> > > +Auto counter reload provides a means for software to specify to hardware
> > > +that certain counters, if supported, should be automatically reloaded
> > > +upon overflow of chosen counters. By taking a sample only if the rate of
> > > +one event exceeds some threshold relative to the rate of another event,
> > > +this feature enables software to sample based on the relative rate of
> > > +two or more events. To enable this, the user must provide a sample period
> > > +term and a bitmask ("acr_mask") for each relevant event specifying the
> > > +counters in an event group to reload if the event's specified sample
> > > +period is exceeded.
> > > +
> > > +For example, if the user desires to measure a scenario when IPC > 2,
> > > +the event group might look like the one below:
> > > +
> > > +       perf record -e {cpu_atom/instructions,period=200000,acr_mask=0x2/, \
> > > +       cpu_atom/cycles,period=100000,acr_mask=0x3/} -- true
> > > +
> > > +In this case, if the "instructions" counter exceeds the sample period of
> > > +200000, the second counter, "cycles", will be reset and a sample will be
> > > +taken. If "cycles" is exceeded first, both counters in the group will be
> > > +reset. In this way, samples will only be taken for cases where IPC > 2.
> >
> > Could this definition include the meaning of acr_mask? I can see that
> > the 2 periods create an IPC of 2, but I can't see why the acr_mask
> > needs to be 2 and 3.
>
> Hi Ian, thanks for reviewing. I will include an explanation for the acr mask in a new version.
>
> >
> > > +
> > > +ratio-to-prev Event Term
> > > +------------------------
> > > +To simplify this, an event term "ratio-to-prev" is provided which is used
> > > +alongside the sample period term n or the -c/--count option. This would
> > > +allow users to specify the desired relative rate between events as a
> > > +ratio.
> >
> > Should there be an opposite ratio-to-next?
> >
> > > +
> > > +The command above would then become
> > > +
> > > +       perf record -e {cpu_atom/instructions/, \
> > > +       cpu_atom/cycles,period=100000,ratio-to-prev=0.5/} -- true
> > > +
> > > +ratio-to-prev is the ratio of the event using the term relative
> > > +to the previous event in the group, which will always be 1,
> > > +for a 1:0.5 or 2:1 ratio.
> > > +
> > > +To sample for IPC < 2 for example, the events need to be reordered:
> > > +
> > > +       perf record -e {cpu_atom/cycles/, \
> > > +       cpu_atom/instructions,period=200000,ratio-to-prev=2.0/} -- true
> >
> > We allow "software" events in groups with hardware events. The current
> > list of software events is in perf_pmu__is_software and contains a few
> > surprises like "msr":
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2134
> > presumably ratio-to-prev should apply to the previous event in the
> > list that is on the same PMU?
>
> Yes, that's right.
>
> >
> > > diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
> > > index 8914f12d2b85..ba809fa1e8c6 100644
> > > --- a/tools/perf/Documentation/perf-list.txt
> > > +++ b/tools/perf/Documentation/perf-list.txt
> > > @@ -376,6 +376,8 @@ Support raw format:
> > >  . '--raw-dump [hw|sw|cache|tracepoint|pmu|event_glob]', shows the raw-dump of
> > >    a certain kind of events.
> > >
> > > +include::intel-acr.txt[]
> > > +
> > >  SEE ALSO
> > >  --------
> > >  linkperf:perf-stat[1], linkperf:perf-top[1],
> > > diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
> > > index 3dd29ba2c23b..b93dbbed2c8e 100644
> > > --- a/tools/perf/arch/x86/util/evsel.c
> > > +++ b/tools/perf/arch/x86/util/evsel.c
> > > @@ -1,7 +1,9 @@
> > >  // SPDX-License-Identifier: GPL-2.0
> > >  #include <stdio.h>
> > >  #include <stdlib.h>
> > > +#include "util/evlist.h"
> > >  #include "util/evsel.h"
> > > +#include "util/evsel_config.h"
> > >  #include "util/env.h"
> > >  #include "util/pmu.h"
> > >  #include "util/pmus.h"
> > > @@ -89,6 +91,97 @@ int arch_evsel__hw_name(struct evsel *evsel, char *bf, size_t size)
> > >                          event_name);
> > >  }
> > >
> > > +static void evsel__apply_ratio_to_prev(struct evsel *evsel,
> > > +                               struct perf_event_attr *attr,
> > > +                               const char *buf)
> > > +{
> > > +       struct perf_event_attr *prev_attr = NULL;
> > > +       struct evsel *evsel_prev = NULL;
> > > +       struct evsel *leader = evsel__leader(evsel);
> > > +       struct evsel *pos;
> > > +       const char *name = "acr_mask";
> > > +       int evsel_idx = 0;
> > > +       __u64 ev_mask, pr_ev_mask;
> > > +       double rtp;
> > > +
> > > +       rtp = strtod(buf, NULL);
> > > +       if (rtp <= 0) {
> > > +               pr_err("Invalid ratio-to-prev value %lf\n", rtp);
> >
> > It would be nice to fail this early during parsing so that we can
> > identify the part of the parse string that is invalid. I'm guessing it
> > is this way because the parse_events__term_val_type are either an
> > integer or a string.
>
> Thanks, that does sound better. I'll look into that for v2.
>
> >
> > > +               return;
> > > +       }
> > > +       if (evsel == evsel__leader(evsel)) {
> > > +               pr_err("Invalid use of ratio-to-prev term without preceding element in group\n");
> >
> > I'm wondering how we can prevent this happening due to event
> > reordering in parse_events__sort_events_and_fix_groups:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.c?h=perf-tools-next#n2095
> >
>
> Is there a way for a user to force a specific event order? Or avoid event reordering?

Sorry for the delay in replying to this, There isn't a way to avoid
event reordering in part as it is a property of how the parsing works.
I can give a bit of a back story on why there is event reordering.
Initially events were reordered for uncore PMUs. On my alderlake I
have uncore_imc_free_running_0 and uncore_imc_free_running_1 and they
have the events data_read and data_write. If I grouped the events
like:
```
$ perf stat -e '{data_read,data_write}' -a sleep 1
```
then at parse time the evlist will be
{uncore_imc_0/data_read/,uncore_imc_1/data_read/,uncore_imc_0/data_write/,uncore_imc_1/data_write/}.
It isn't allowed to have events for different PMUs within the same
group, and we also want to group the uncore_imc_0/data_write/ event
with the uncore_imc_0/data_read/ event (and similarly for
uncore_imc_1). There was previous logic that I rewrote as
parse_events__sort_events_and_fix_groups that fixed issues like
software events, but I also used it to fix the perf metric events. The
perf metric events on performance cores require the slots event to
come first and to be in a group with it. For metrics we pretty much
grab events in the order they are in the metric and then try to
program them in a group. There were proposals to provide an event
order with metrics so they could be programmed properly somehow, but
then what about uncore, etc. What we have is messy, but the messiness
comes about from perf metric events and uncore, it somewhat works well
for metrics as they needn't worry at all about event order and
grouping.

Thanks,
Ian

> > > +               return;
> > > +       }
> > > +       if (!evsel->pmu->is_core) {
> > > +               pr_err("Event using ratio-to-prev term must have a core PMU\n");
> >
> > Would a stronger test here be the same PMU?
>
> The previous check should make sure that the events are in the same group, which implies that they
> are either from the same PMU or a SW event. This check then covers the SW event and non-core PMU
> case.
>
> >
> > > +               return;
> > > +       }
> > > +       if (!perf_pmu__has_format(evsel->pmu, name)) {
> > > +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
> > > +               return;
> > > +       }
> > > +       if (perf_pmu__format_type(evsel->pmu, name) !=
> > > +                       PERF_PMU_FORMAT_VALUE_CONFIG2) {
> > > +               pr_err("'%s' does not have acr_mask format support\n", evsel->pmu->name);
> >
> > I wonder if the acr_mask support could be in an
> > arch_evsel__apply_ratio_to_prev and the non-acr_mask stuff be generic?
> > If nothing else this would aid testing.
> >
> > > +               return;
> > > +       }
> > > +       if (attr->freq) {
> > > +               pr_err("Event period term or count (-c) must be set when using ratio-to-prev term.\n");
> > > +               return;
> > > +       }
> > > +
> > > +       evsel_prev = evsel__prev(evsel);
> > > +       if (!evsel_prev) {
> > > +               pr_err("Previous event does not exist.\n");
> > > +               return;
> > > +       }
> > > +
> > > +       prev_attr = &evsel_prev->core.attr;
> > > +
> > > +       prev_attr->sample_period = (__u64)(attr->sample_period / rtp);
> > > +       prev_attr->freq = 0;
> > > +       evsel__reset_sample_bit(evsel_prev, PERIOD);
> > > +
> > > +       for_each_group_evsel(pos, leader) {
> > > +               if (pos == evsel)
> > > +                       break;
> > > +               evsel_idx++;
> > > +       }
> > > +
> > > +       /*
> > > +        * acr_mask (config2) is calculated using the event's index in
> > > +        * the event group. The first event will use the index of the
> > > +        * second event as its mask (e.g., 0x2), indicating that the
> > > +        * second event counter will be reset and a sample taken for
> > > +        * the first event if its counter overflows. The second event
> > > +        * will use the mask consisting of the first and second bits
> > > +        * (e.g., 0x3), meaning both counters will be reset if the
> > > +        * second event counter overflows.
> > > +        */
> > > +
> > > +       ev_mask = 1ull << evsel_idx;
> > > +       pr_ev_mask = 1ull << (evsel_idx - 1);
> > > +
> > > +       prev_attr->config2 = ev_mask;
> > > +       attr->config2 = ev_mask | pr_ev_mask;
> > > +}
> > > +
> > > +static void intel__post_evsel_config(struct evsel *evsel,
> > > +                             struct perf_event_attr *attr)
> > > +{
> > > +       struct evsel_config_term *term;
> > > +       struct list_head *config_terms = &evsel->config_terms;
> > > +       const char *rtp_buf = NULL;
> > > +
> > > +       list_for_each_entry(term, config_terms, list) {
> > > +               if (term->type == EVSEL__CONFIG_TERM_RATIO_TO_PREV) {
> > > +                       rtp_buf = term->val.str;
> > > +                       evsel__apply_ratio_to_prev(evsel, attr, rtp_buf);
> > > +               }
> > > +       }
> > > +}
> > > +
> > >  static void ibs_l3miss_warn(void)
> > >  {
> > >         pr_warning(
> > > @@ -101,7 +194,12 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
> > >         struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
> > >         static int warned_once;
> > >
> > > -       if (warned_once || !x86__is_amd_cpu())
> > > +       if (!x86__is_amd_cpu()) {
> > > +               intel__post_evsel_config(evsel, attr);
> > > +               return;
> > > +       }
> > > +
> > > +       if (warned_once)
> > >                 return;
> > >
> > >         evsel_pmu = evsel__find_pmu(evsel);
> > > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > > index b60461e16804..5028232afeb7 100644
> > > --- a/tools/perf/util/evsel.c
> > > +++ b/tools/perf/util/evsel.c
> > > @@ -1189,6 +1189,8 @@ static void evsel__apply_config_terms(struct evsel *evsel,
> > >                         break;
> > >                 case EVSEL__CONFIG_TERM_CFG_CHG:
> > >                         break;
> > > +               case EVSEL__CONFIG_TERM_RATIO_TO_PREV:
> > > +                       break;
> > >                 default:
> > >                         break;
> > >                 }
> > > diff --git a/tools/perf/util/evsel_config.h b/tools/perf/util/evsel_config.h
> > > index af52a1516d0b..26c69d9ce788 100644
> > > --- a/tools/perf/util/evsel_config.h
> > > +++ b/tools/perf/util/evsel_config.h
> > > @@ -28,6 +28,7 @@ enum evsel_term_type {
> > >         EVSEL__CONFIG_TERM_AUX_ACTION,
> > >         EVSEL__CONFIG_TERM_AUX_SAMPLE_SIZE,
> > >         EVSEL__CONFIG_TERM_CFG_CHG,
> > > +       EVSEL__CONFIG_TERM_RATIO_TO_PREV,
> > >  };
> > >
> > >  struct evsel_config_term {
> > > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > > index 7297ca3a4eec..4ea8d4ffabdb 100644
> > > --- a/tools/perf/util/parse-events.c
> > > +++ b/tools/perf/util/parse-events.c
> > > @@ -806,6 +806,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
> > >                 [PARSE_EVENTS__TERM_TYPE_RAW]                   = "raw",
> > >                 [PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE]          = "legacy-cache",
> > >                 [PARSE_EVENTS__TERM_TYPE_HARDWARE]              = "hardware",
> > > +               [PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV]         = "ratio-to-prev",
> > >         };
> > >         if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
> > >                 return "unknown term";
> > > @@ -855,6 +856,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
> > >         case PARSE_EVENTS__TERM_TYPE_RAW:
> > >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> > >         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > >         default:
> > >                 if (!err)
> > >                         return false;
> > > @@ -982,6 +984,9 @@ do {                                                                           \
> > >                         return -EINVAL;
> > >                 }
> > >                 break;
> > > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > > +               CHECK_TYPE_VAL(STR);
> > > +               break;
> > >         case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
> > >         case PARSE_EVENTS__TERM_TYPE_USER:
> > >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> > > @@ -1109,6 +1114,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
> > >         case PARSE_EVENTS__TERM_TYPE_RAW:
> > >         case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> > >         case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > > +       case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > >         default:
> > >                 if (err) {
> > >                         parse_events_error__handle(err, term->err_term,
> > > @@ -1233,6 +1239,9 @@ do {                                                              \
> > >                         ADD_CONFIG_TERM_VAL(AUX_SAMPLE_SIZE, aux_sample_size,
> > >                                             term->val.num, term->weak);
> > >                         break;
> > > +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > > +                       ADD_CONFIG_TERM_STR(RATIO_TO_PREV, term->val.str, term->weak);
> > > +                       break;
> > >                 case PARSE_EVENTS__TERM_TYPE_USER:
> > >                 case PARSE_EVENTS__TERM_TYPE_CONFIG:
> > >                 case PARSE_EVENTS__TERM_TYPE_CONFIG1:
> > > @@ -1297,6 +1306,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
> > >                 case PARSE_EVENTS__TERM_TYPE_RAW:
> > >                 case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
> > >                 case PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > > +               case PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > >                 default:
> > >                         break;
> > >                 }
> > > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> > > index e176a34ab088..a9de95dd337a 100644
> > > --- a/tools/perf/util/parse-events.h
> > > +++ b/tools/perf/util/parse-events.h
> > > @@ -80,7 +80,8 @@ enum parse_events__term_type {
> > >         PARSE_EVENTS__TERM_TYPE_RAW,
> > >         PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
> > >         PARSE_EVENTS__TERM_TYPE_HARDWARE,
> > > -#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
> > > +       PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV,
> > > +#define        __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV + 1)
> > >  };
> > >
> > >  struct parse_events_term {
> > > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> > > index 7ed86e3e34e3..49fe1811fe68 100644
> > > --- a/tools/perf/util/parse-events.l
> > > +++ b/tools/perf/util/parse-events.l
> > > @@ -335,6 +335,7 @@ aux-output          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
> > >  aux-action             { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
> > >  aux-sample-size                { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
> > >  metric-id              { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> > > +ratio-to-prev          { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV); }
> > >  cpu-cycles|cycles                              { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> > >  stalled-cycles-frontend|idle-cycles-frontend   { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> > >  stalled-cycles-backend|idle-cycles-backend     { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > > diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> > > index d08972aa461c..8b5b5a6adb29 100644
> > > --- a/tools/perf/util/pmu.c
> > > +++ b/tools/perf/util/pmu.c
> > > @@ -1470,7 +1470,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
> > >                         break;
> > >                 case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
> > >                         return -EINVAL;
> > > -               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
> > > +               case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_RATIO_TO_PREV:
> > >                         /* Skip non-config terms. */
> > >                         break;
> > >                 default:
> > > @@ -1852,6 +1852,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
> > >                 "aux-output",
> > >                 "aux-action=(pause|resume|start-paused)",
> > >                 "aux-sample-size=number",
> > > +               "ratio-to-prev=string",
> > >         };
> > >         struct perf_pmu_format *format;
> > >         int ret;
> >
> > Some places I think testing can be added are:
> > * The event parsing test:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/parse-events.c?h=perf-tools-next#n2143
> > * The perf record test:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh?h=perf-tools-next
> >
>
> Thanks, I'll look into adding more testing here.
>
> > I wonder for if acr_mask is present (or ratio-to-prev) then
> > arch_evsel__must_be_in_group should return true:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/evsel.c?h=perf-tools-next#n63
> > this is used to force topdown events into groups, so perhaps we can do
> > similar forcing and make the use of the {} for the group optional (or
> > fixed by the tool).
>
> Thanks, this might be an answer to my previous question.
>
> Thanks, Tom
> >
> > Thanks,
> > Ian
> >
> > > --
> > > 2.48.1
> > >
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-05-19 16:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 21:09 [PATCH] perf record: Usability enhancement for Auto Counter Reload Thomas Falcon
2025-05-13 15:31 ` Ian Rogers
2025-05-14 18:05   ` Falcon, Thomas
2025-05-19 16:17     ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).