* [PATCH 0/5] perf: arm_spe: Add format option for discard mode
@ 2024-12-17 11:56 James Clark
2024-12-17 11:56 ` [PATCH 1/5] " James Clark
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
Discard mode is a way to enable SPE related PMU events without the
overhead of recording any data. Add a format option, tests and docs for
it.
In theory we could make the driver drop calls to allocate the aux buffer
when discard mode is enabled. This would give a small memory saving,
but I think there is potential to interfere with any tools that don't
expect this so I left the aux allocation untouched. Even old tools that
don't know about discard mode will be able to use it because we publish
the format option. Skipping the aux buffer allocation has to be
implemented in each tool, which I've done in Perf.
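A minimal sketch of how a script could check for the published format option before asking for discard mode. It assumes the sysfs layout that the selftest in patch 4 relies on (arm_spe_* devices with a format/discard file); the function name spe_discard_pmus and its optional base-directory argument are hypothetical and exist only so the sketch can be run against a fake directory tree.

```shell
# Print the name of every SPE PMU instance that publishes the 'discard'
# format option. The base directory argument is an assumption made for
# testing; it defaults to the real sysfs location.
spe_discard_pmus() {
    base=${1:-/sys/bus/event_source/devices}
    for dev in "$base"/arm_spe_*; do
        # An unmatched glob stays literal, so the -e test also covers
        # systems that have no SPE PMU at all.
        if [ -e "$dev/format/discard" ]; then
            basename "$dev"
        fi
    done
}
```

An empty result would mean that no instance publishes the option, so an event that sets discard should not be expected to open.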
Tested on the FVP with SAMPLE_FEED_OP (0x812D):
$ perf stat -e armv8_pmuv3/event=0x812D/ -- true
Performance counter stats for 'true':
0 armv8_pmuv3/event=0x812D/
$ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
$ perf stat -e armv8_pmuv3/event=0x812D/ -- true
Performance counter stats for 'true':
17350 armv8_pmuv3/event=0x812D/
James Clark (5):
perf: arm_spe: Add format option for discard mode
perf tool: arm-spe: Pull out functions for aux buffer and tracking
setup
perf tool: arm-spe: Don't allocate buffer or tracking event in discard
mode
perf test: arm_spe: Add test for discard mode
perf docs: arm_spe: Document new discard mode
drivers/perf/arm_spe_pmu.c | 23 ++++++
tools/perf/Documentation/perf-arm-spe.txt | 11 +++
tools/perf/arch/arm64/util/arm-spe.c | 90 +++++++++++++++--------
tools/perf/tests/shell/test_arm_spe.sh | 30 ++++++++
4 files changed, 122 insertions(+), 32 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/5] perf: arm_spe: Add format option for discard mode
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
@ 2024-12-17 11:56 ` James Clark
2024-12-17 11:56 ` [PATCH 2/5] perf tool: arm-spe: Pull out functions for aux buffer and tracking setup James Clark
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
FEAT_SPEv1p2 (optional from Armv8.6) adds a discard mode that allows all
SPE data to be discarded rather than written to memory. Add a format
bit for this mode.
If the mode isn't supported, the format bit isn't published and attempts
to use it will result in -EOPNOTSUPP. Allocating an aux buffer is still
allowed even though it won't be written to so that old tools continue to
work, but updated tools can choose to skip this step.
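As a rough illustration of what the format bit means for a raw config value, the sketch below turns a format spec into a bit mask. It assumes the published format file reads config:35, which is what the LO/HI defines in the diff suggest; the helper name format_spec_to_mask is hypothetical.

```shell
# Convert a perf format spec such as "config:35" or "config:16-31"
# into a hexadecimal bit mask. Assumes the field is narrower than
# 64 bits so the shift does not overflow shell arithmetic.
format_spec_to_mask() {
    bits=${1#*:}          # strip the "config:" prefix
    lo=${bits%-*}         # first bit of the field
    hi=${bits#*-}         # last bit (equals lo for single-bit fields)
    width=$((hi - lo + 1))
    printf '0x%x\n' $(( ((1 << width) - 1) << lo ))
}
```

For example, format_spec_to_mask config:35 prints 0x800000000, so under that assumption arm_spe/discard/ sets the same bit as arm_spe/config=0x800000000/.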
Signed-off-by: James Clark <james.clark@linaro.org>
---
drivers/perf/arm_spe_pmu.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c
index fd5b78732603..9aaf3f98e6f5 100644
--- a/drivers/perf/arm_spe_pmu.c
+++ b/drivers/perf/arm_spe_pmu.c
@@ -193,6 +193,9 @@ static const struct attribute_group arm_spe_pmu_cap_group = {
#define ATTR_CFG_FLD_store_filter_CFG config /* PMSFCR_EL1.ST */
#define ATTR_CFG_FLD_store_filter_LO 34
#define ATTR_CFG_FLD_store_filter_HI 34
+#define ATTR_CFG_FLD_discard_CFG config /* PMBLIMITR_EL1.FM = DISCARD */
+#define ATTR_CFG_FLD_discard_LO 35
+#define ATTR_CFG_FLD_discard_HI 35
#define ATTR_CFG_FLD_event_filter_CFG config1 /* PMSEVFR_EL1 */
#define ATTR_CFG_FLD_event_filter_LO 0
@@ -216,6 +219,7 @@ GEN_PMU_FORMAT_ATTR(store_filter);
GEN_PMU_FORMAT_ATTR(event_filter);
GEN_PMU_FORMAT_ATTR(inv_event_filter);
GEN_PMU_FORMAT_ATTR(min_latency);
+GEN_PMU_FORMAT_ATTR(discard);
static struct attribute *arm_spe_pmu_formats_attr[] = {
&format_attr_ts_enable.attr,
@@ -228,9 +232,15 @@ static struct attribute *arm_spe_pmu_formats_attr[] = {
&format_attr_event_filter.attr,
&format_attr_inv_event_filter.attr,
&format_attr_min_latency.attr,
+ &format_attr_discard.attr,
NULL,
};
+static bool discard_unsupported(struct arm_spe_pmu *spe_pmu)
+{
+ return spe_pmu->pmsver < ID_AA64DFR0_EL1_PMSVer_V1P2;
+}
+
static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
struct attribute *attr,
int unused)
@@ -238,6 +248,9 @@ static umode_t arm_spe_pmu_format_attr_is_visible(struct kobject *kobj,
struct device *dev = kobj_to_dev(kobj);
struct arm_spe_pmu *spe_pmu = dev_get_drvdata(dev);
+ if (attr == &format_attr_discard.attr && discard_unsupported(spe_pmu))
+ return 0;
+
if (attr == &format_attr_inv_event_filter.attr && !(spe_pmu->features & SPE_PMU_FEAT_INV_FILT_EVT))
return 0;
@@ -502,6 +515,12 @@ static void arm_spe_perf_aux_output_begin(struct perf_output_handle *handle,
u64 base, limit;
struct arm_spe_pmu_buf *buf;
+ if (ATTR_CFG_GET_FLD(&event->attr, discard)) {
+ limit = FIELD_PREP(PMBLIMITR_EL1_FM, PMBLIMITR_EL1_FM_DISCARD);
+ limit |= PMBLIMITR_EL1_E;
+ goto out_write_limit;
+ }
+
/* Start a new aux session */
buf = perf_aux_output_begin(handle, event);
if (!buf) {
@@ -743,6 +762,10 @@ static int arm_spe_pmu_event_init(struct perf_event *event)
!(spe_pmu->features & SPE_PMU_FEAT_FILT_LAT))
return -EOPNOTSUPP;
+ if (ATTR_CFG_GET_FLD(&event->attr, discard) &&
+ discard_unsupported(spe_pmu))
+ return -EOPNOTSUPP;
+
set_spe_event_has_cx(event);
reg = arm_spe_event_to_pmscr(event);
if (reg & (PMSCR_EL1_PA | PMSCR_EL1_PCT))
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/5] perf tool: arm-spe: Pull out functions for aux buffer and tracking setup
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
2024-12-17 11:56 ` [PATCH 1/5] " James Clark
@ 2024-12-17 11:56 ` James Clark
2024-12-17 11:56 ` [PATCH 3/5] perf tool: arm-spe: Don't allocate buffer or tracking event in discard mode James Clark
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
These won't be used in the next commit in discard mode, so put them in
their own functions. No functional changes intended.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/arch/arm64/util/arm-spe.c | 83 +++++++++++++++++-----------
1 file changed, 51 insertions(+), 32 deletions(-)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 22b19dcc6beb..1b543855f206 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -274,33 +274,9 @@ static void arm_spe_setup_evsel(struct evsel *evsel, struct perf_cpu_map *cpus)
evsel__set_sample_bit(evsel, PHYS_ADDR);
}
-static int arm_spe_recording_options(struct auxtrace_record *itr,
- struct evlist *evlist,
- struct record_opts *opts)
+static int arm_spe_setup_aux_buffer(struct record_opts *opts)
{
- struct arm_spe_recording *sper =
- container_of(itr, struct arm_spe_recording, itr);
- struct evsel *evsel, *tmp;
- struct perf_cpu_map *cpus = evlist->core.user_requested_cpus;
bool privileged = perf_event_paranoid_check(-1);
- struct evsel *tracking_evsel;
- int err;
-
- sper->evlist = evlist;
-
- evlist__for_each_entry(evlist, evsel) {
- if (evsel__is_aux_event(evsel)) {
- if (!strstarts(evsel->pmu->name, ARM_SPE_PMU_NAME)) {
- pr_err("Found unexpected auxtrace event: %s\n",
- evsel->pmu->name);
- return -EINVAL;
- }
- opts->full_auxtrace = true;
- }
- }
-
- if (!opts->full_auxtrace)
- return 0;
/*
* we are in snapshot mode.
@@ -330,6 +306,9 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
return -EINVAL;
}
+
+ pr_debug2("%sx snapshot size: %zu\n", ARM_SPE_PMU_NAME,
+ opts->auxtrace_snapshot_size);
}
/* We are in full trace mode but '-m,xyz' wasn't specified */
@@ -355,14 +334,15 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
}
}
- if (opts->auxtrace_snapshot_mode)
- pr_debug2("%sx snapshot size: %zu\n", ARM_SPE_PMU_NAME,
- opts->auxtrace_snapshot_size);
+ return 0;
+}
- evlist__for_each_entry_safe(evlist, tmp, evsel) {
- if (evsel__is_aux_event(evsel))
- arm_spe_setup_evsel(evsel, cpus);
- }
+static int arm_spe_setup_tracking_event(struct evlist *evlist,
+ struct record_opts *opts)
+{
+ int err;
+ struct evsel *tracking_evsel;
+ struct perf_cpu_map *cpus = evlist->core.user_requested_cpus;
/* Add dummy event to keep tracking */
err = parse_event(evlist, "dummy:u");
@@ -388,6 +368,45 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
return 0;
}
+static int arm_spe_recording_options(struct auxtrace_record *itr,
+ struct evlist *evlist,
+ struct record_opts *opts)
+{
+ struct arm_spe_recording *sper =
+ container_of(itr, struct arm_spe_recording, itr);
+ struct evsel *evsel, *tmp;
+ struct perf_cpu_map *cpus = evlist->core.user_requested_cpus;
+
+ int err;
+
+ sper->evlist = evlist;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel__is_aux_event(evsel)) {
+ if (!strstarts(evsel->pmu->name, ARM_SPE_PMU_NAME)) {
+ pr_err("Found unexpected auxtrace event: %s\n",
+ evsel->pmu->name);
+ return -EINVAL;
+ }
+ opts->full_auxtrace = true;
+ }
+ }
+
+ if (!opts->full_auxtrace)
+ return 0;
+
+ evlist__for_each_entry_safe(evlist, tmp, evsel) {
+ if (evsel__is_aux_event(evsel))
+ arm_spe_setup_evsel(evsel, cpus);
+ }
+
+ err = arm_spe_setup_aux_buffer(opts);
+ if (err)
+ return err;
+
+ return arm_spe_setup_tracking_event(evlist, opts);
+}
+
static int arm_spe_parse_snapshot_options(struct auxtrace_record *itr __maybe_unused,
struct record_opts *opts,
const char *str)
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/5] perf tool: arm-spe: Don't allocate buffer or tracking event in discard mode
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
2024-12-17 11:56 ` [PATCH 1/5] " James Clark
2024-12-17 11:56 ` [PATCH 2/5] perf tool: arm-spe: Pull out functions for aux buffer and tracking setup James Clark
@ 2024-12-17 11:56 ` James Clark
2024-12-17 11:56 ` [PATCH 4/5] perf test: arm_spe: Add test for " James Clark
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
The buffer will never be written to so don't bother allocating it. The
tracking event is also not required.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/arch/arm64/util/arm-spe.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 1b543855f206..4301181b8e45 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -376,7 +376,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
container_of(itr, struct arm_spe_recording, itr);
struct evsel *evsel, *tmp;
struct perf_cpu_map *cpus = evlist->core.user_requested_cpus;
-
+ bool discard = false;
int err;
sper->evlist = evlist;
@@ -396,10 +396,17 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
return 0;
evlist__for_each_entry_safe(evlist, tmp, evsel) {
- if (evsel__is_aux_event(evsel))
+ if (evsel__is_aux_event(evsel)) {
arm_spe_setup_evsel(evsel, cpus);
+ if (evsel->core.attr.config &
+ perf_pmu__format_bits(evsel->pmu, "discard"))
+ discard = true;
+ }
}
+ if (discard)
+ return 0;
+
err = arm_spe_setup_aux_buffer(opts);
if (err)
return err;
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/5] perf test: arm_spe: Add test for discard mode
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
` (2 preceding siblings ...)
2024-12-17 11:56 ` [PATCH 3/5] perf tool: arm-spe: Don't allocate buffer or tracking event in discard mode James Clark
@ 2024-12-17 11:56 ` James Clark
2024-12-17 11:56 ` [PATCH 5/5] perf docs: arm_spe: Document new " James Clark
2024-12-18 10:39 ` [PATCH 0/5] perf: arm_spe: Add format option for " Yeo Reum Yun
5 siblings, 0 replies; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
Add a test that checks that there were no AUX or AUXTRACE events
recorded when discard mode is used.
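The check can be sketched in isolation as a small filter over the output of perf report --stats. The function name stats_have_no_aux_data is hypothetical; the match pattern mirrors the one used in the test below and makes no other assumption about the layout of the stats output.

```shell
# Succeeds (exit 0) when the `perf report --stats` text on stdin mentions
# neither AUX nor AUXTRACE events, i.e. no aux data was recorded.
stats_have_no_aux_data() {
    ! grep -Eq 'AUX events|AUXTRACE events'
}
```

It would be used as: perf report -i perf.data --stats | stats_have_no_aux_data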
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/tests/shell/test_arm_spe.sh | 30 ++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/tools/perf/tests/shell/test_arm_spe.sh b/tools/perf/tests/shell/test_arm_spe.sh
index 3258368634f7..a69aab70dd8a 100755
--- a/tools/perf/tests/shell/test_arm_spe.sh
+++ b/tools/perf/tests/shell/test_arm_spe.sh
@@ -107,7 +107,37 @@ arm_spe_system_wide_test() {
arm_spe_report "SPE system-wide testing" $err
}
+arm_spe_discard_test() {
+ echo "SPE discard mode"
+
+ for f in /sys/bus/event_source/devices/arm_spe_*; do
+ if [ -e "$f/format/discard" ]; then
+ cpu=$(cut -c -1 "$f/cpumask")
+ break
+ fi
+ done
+
+ if [ -z $cpu ]; then
+ arm_spe_report "SPE discard mode not present" 2
+ return
+ fi
+
+ # Test can use wildcard SPE instance and Perf will only open the event
+ # on instances that have that format flag. But make sure the target
+ # runs on an instance with discard mode otherwise we're not testing
+ # anything.
+ perf record -o ${perfdata} -e arm_spe/discard/ -N -B --no-bpf-event \
+ -- taskset --cpu-list $cpu true
+
+ if perf report -i ${perfdata} --stats | grep 'AUX events\|AUXTRACE events'; then
+ arm_spe_report "SPE discard mode found unexpected data" 1
+ else
+ arm_spe_report "SPE discard mode" 0
+ fi
+}
+
arm_spe_snapshot_test
arm_spe_system_wide_test
+arm_spe_discard_test
exit $glb_err
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 5/5] perf docs: arm_spe: Document new discard mode
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
` (3 preceding siblings ...)
2024-12-17 11:56 ` [PATCH 4/5] perf test: arm_spe: Add test for " James Clark
@ 2024-12-17 11:56 ` James Clark
2024-12-18 0:54 ` Ian Rogers
2024-12-18 10:39 ` [PATCH 0/5] perf: arm_spe: Add format option for " Yeo Reum Yun
5 siblings, 1 reply; 11+ messages in thread
From: James Clark @ 2024-12-17 11:56 UTC (permalink / raw)
To: linux-arm-kernel, linux-perf-users
Cc: James Clark, Will Deacon, Mark Rutland, Peter Zijlstra,
Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
Document the flag, hint what it's used for and give an example with
other useful options to get minimal output.
Signed-off-by: James Clark <james.clark@linaro.org>
---
tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
index de2b0b479249..588eead438bc 100644
--- a/tools/perf/Documentation/perf-arm-spe.txt
+++ b/tools/perf/Documentation/perf-arm-spe.txt
@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
+ discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
@@ -220,6 +221,16 @@ Common errors
Increase sampling interval (see above)
+Discard mode
+~~~~~~~~~~~~
+
+SPE PMU events can be used without the overhead of collecting sample data if
+discard mode is supported (optional from Armv8.6). First run a system wide SPE
+session (or on the core of interest) using options to minimize output. Then run
+perf stat:
+
+ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
+ perf stat -e SAMPLE_FEED_LD
SEE ALSO
--------
--
2.34.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode
2024-12-17 11:56 ` [PATCH 5/5] perf docs: arm_spe: Document new " James Clark
@ 2024-12-18 0:54 ` Ian Rogers
2024-12-18 10:07 ` James Clark
0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2024-12-18 0:54 UTC (permalink / raw)
To: James Clark
Cc: linux-arm-kernel, linux-perf-users, Will Deacon, Mark Rutland,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@linaro.org> wrote:
>
> Document the flag, hint what it's used for and give an example with
> other useful options to get minimal output.
>
> Signed-off-by: James Clark <james.clark@linaro.org>
> ---
> tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
> index de2b0b479249..588eead438bc 100644
> --- a/tools/perf/Documentation/perf-arm-spe.txt
> +++ b/tools/perf/Documentation/perf-arm-spe.txt
> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
> pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
> store_filter=1 - collect stores only (PMSFCR.ST)
> ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
> + discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
>
> +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
> than only the execution latency.
> @@ -220,6 +221,16 @@ Common errors
>
> Increase sampling interval (see above)
>
> +Discard mode
> +~~~~~~~~~~~~
> +
> +SPE PMU events can be used without the overhead of collecting sample data if
> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
> +session (or on the core of interest) using options to minimize output. Then run
> +perf stat:
> +
> + perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> + perf stat -e SAMPLE_FEED_LD
Perhaps clarify this should be an ARM SPE event? It seems strange to
have one perf command affect a later one, the purpose of things like
event multiplexing is to hide the hardware limits. I'd prefer if the
last bit was like:
```
Then run perf stat with an SPE event on the same PMU:
perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
perf stat -e arm_spe/SAMPLE_FEED_LD/
```
Thanks,
Ian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode
2024-12-18 0:54 ` Ian Rogers
@ 2024-12-18 10:07 ` James Clark
2024-12-18 19:47 ` Ian Rogers
0 siblings, 1 reply; 11+ messages in thread
From: James Clark @ 2024-12-18 10:07 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-arm-kernel, linux-perf-users, Will Deacon, Mark Rutland,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
On 18/12/2024 12:54 am, Ian Rogers wrote:
> On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@linaro.org> wrote:
>>
>> Document the flag, hint what it's used for and give an example with
>> other useful options to get minimal output.
>>
>> Signed-off-by: James Clark <james.clark@linaro.org>
>> ---
>> tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
>> index de2b0b479249..588eead438bc 100644
>> --- a/tools/perf/Documentation/perf-arm-spe.txt
>> +++ b/tools/perf/Documentation/perf-arm-spe.txt
>> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
>> pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
>> store_filter=1 - collect stores only (PMSFCR.ST)
>> ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
>> + discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
>>
>> +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
>> than only the execution latency.
>> @@ -220,6 +221,16 @@ Common errors
>>
>> Increase sampling interval (see above)
>>
>> +Discard mode
>> +~~~~~~~~~~~~
>> +
>> +SPE PMU events can be used without the overhead of collecting sample data if
>> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
>> +session (or on the core of interest) using options to minimize output. Then run
>> +perf stat:
>> +
>> + perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>> + perf stat -e SAMPLE_FEED_LD
>
> Perhaps clarify this should be an ARM SPE event? It seems strange to
> have one perf command affect a later one, the purpose of things like
> event multiplexing is to hide the hardware limits. I'd prefer if the
> last bit was like:
> ```
> Then run perf stat with an SPE event on the same PMU:
>
> perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> perf stat -e arm_spe/SAMPLE_FEED_LD/
> ``
>
> Thanks,
> Ian
Hi Ian,
Confusingly this isn't an SPE event, it is a normal PMU event. The fact
that one Perf command affects the other is because these events only
count when SPE is enabled. When it's enabled it has an effect on a
per-core level which is why in the example I made it simpler by enabling
SPE system wide.
SPE is an exclusive PMU like Coresight and some others so it can't be
affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
event would be, but as long as SPE stays enabled it will count the
right thing regardless of multiplexing.
Thanks
James
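The dependency described above (the core PMU event only counts while SPE is enabled on that core) can be sketched as a wrapper that keeps a discard-mode session alive for the duration of a perf stat run. The helper name spe_discard_stat and the PERF override variable are hypothetical; the perf options are the ones from the documentation patch, and a system-wide session normally needs sufficient privileges.

```shell
# Hypothetical helper: run a command under `perf stat` for one core PMU
# event while a system-wide discard-mode SPE session is active.
# PERF is an assumed override for the perf binary, added for testing.
spe_discard_stat() {
    event=$1
    shift
    perf_bin=${PERF:-perf}

    # Step 1: enable SPE on all CPUs without collecting sample data.
    "$perf_bin" record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - \
        > /dev/null 2>&1 &
    record_pid=$!

    # Step 2: count the core PMU event. This sketch does not wait for the
    # SPE session to finish starting, so very short commands may be
    # partly missed.
    status=0
    "$perf_bin" stat -e "$event" -- "$@" || status=$?

    # Step 3: stop the background SPE session.
    kill "$record_pid" 2> /dev/null
    wait "$record_pid" 2> /dev/null || true
    return "$status"
}
```

For example: spe_discard_stat SAMPLE_FEED_LD true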
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/5] perf: arm_spe: Add format option for discard mode
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
` (4 preceding siblings ...)
2024-12-17 11:56 ` [PATCH 5/5] perf docs: arm_spe: Document new " James Clark
@ 2024-12-18 10:39 ` Yeo Reum Yun
5 siblings, 0 replies; 11+ messages in thread
From: Yeo Reum Yun @ 2024-12-18 10:39 UTC (permalink / raw)
To: James Clark, linux-arm-kernel@lists.infradead.org,
linux-perf-users@vger.kernel.org
Cc: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Alexander Shishkin,
Jiri Olsa, Ian Rogers, Adrian Hunter, Liang, Kan, John Garry,
Mike Leach, Leo Yan, Graham Woodward,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
This patch series looks good to me.
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode
2024-12-18 10:07 ` James Clark
@ 2024-12-18 19:47 ` Ian Rogers
2024-12-19 10:10 ` James Clark
0 siblings, 1 reply; 11+ messages in thread
From: Ian Rogers @ 2024-12-18 19:47 UTC (permalink / raw)
To: James Clark
Cc: linux-arm-kernel, linux-perf-users, Will Deacon, Mark Rutland,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
On Wed, Dec 18, 2024 at 2:07 AM James Clark <james.clark@linaro.org> wrote:
>
> On 18/12/2024 12:54 am, Ian Rogers wrote:
> > On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> Document the flag, hint what it's used for and give an example with
> >> other useful options to get minimal output.
> >>
> >> Signed-off-by: James Clark <james.clark@linaro.org>
> >> ---
> >> tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
> >> 1 file changed, 11 insertions(+)
> >>
> >> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
> >> index de2b0b479249..588eead438bc 100644
> >> --- a/tools/perf/Documentation/perf-arm-spe.txt
> >> +++ b/tools/perf/Documentation/perf-arm-spe.txt
> >> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
> >> pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
> >> store_filter=1 - collect stores only (PMSFCR.ST)
> >> ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
> >> + discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
> >>
> >> +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
> >> than only the execution latency.
> >> @@ -220,6 +221,16 @@ Common errors
> >>
> >> Increase sampling interval (see above)
> >>
> >> +Discard mode
> >> +~~~~~~~~~~~~
> >> +
> >> +SPE PMU events can be used without the overhead of collecting sample data if
> >> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
> >> +session (or on the core of interest) using options to minimize output. Then run
> >> +perf stat:
> >> +
> >> + perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> >> + perf stat -e SAMPLE_FEED_LD
> >
> > Perhaps clarify this should be an ARM SPE event? It seems strange to
> > have one perf command affect a later one, the purpose of things like
> > event multiplexing is to hide the hardware limits. I'd prefer if the
> > last bit was like:
> > ```
> > Then run perf stat with an SPE event on the same PMU:
> >
> > perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
> > perf stat -e arm_spe/SAMPLE_FEED_LD/
> > ``
> >
> > Thanks,
> > Ian
>
> Hi Ian,
>
> Confusingly this isn't an SPE event, it is a normal PMU event. The fact
> that one Perf command affects the other is because these events only
> count when SPE is enabled. When it's enabled it has an effect on a
> per-core level which is why in the example I made it simpler by enabling
> SPE system wide.
>
> SPE is an exclusive PMU like Coresight and some others so it can't be
> affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
> would be, but as long as SPE stays enabled it will count the right thing
> regardless of multiplexing.
Thanks James, sorry for my SPE ignorance. I'm smiling about the use of
the word exclusive. When I was trying to make the tests run in
parallel I used a file lock - so shared and exclusive. There were a
lot of issues with that, hence switching to 2 phases in the test,
parallel then sequential but I kept the "exclusive" tag for want of a
better word. Perhaps the notion of an exclusive PMU existed previously
but maybe I've accidentally invented the term by way of a failed file
lock experiment :-)
Presumably the two PMUs side-effecting each other is a known thing. I
wonder if we can capture this in the documentation. When you say
"normal PMU event" you mean core PMU events?
Thanks,
Ian
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode
2024-12-18 19:47 ` Ian Rogers
@ 2024-12-19 10:10 ` James Clark
0 siblings, 0 replies; 11+ messages in thread
From: James Clark @ 2024-12-19 10:10 UTC (permalink / raw)
To: Ian Rogers
Cc: linux-arm-kernel, linux-perf-users, Will Deacon, Mark Rutland,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Liang, Kan, John Garry, Mike Leach, Leo Yan, Graham Woodward,
linux-kernel, bpf
On 18/12/2024 7:47 pm, Ian Rogers wrote:
> On Wed, Dec 18, 2024 at 2:07 AM James Clark <james.clark@linaro.org> wrote:
>>
>> On 18/12/2024 12:54 am, Ian Rogers wrote:
>>> On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>> Document the flag, hint what it's used for and give an example with
>>>> other useful options to get minimal output.
>>>>
>>>> Signed-off-by: James Clark <james.clark@linaro.org>
>>>> ---
>>>> tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
>>>> index de2b0b479249..588eead438bc 100644
>>>> --- a/tools/perf/Documentation/perf-arm-spe.txt
>>>> +++ b/tools/perf/Documentation/perf-arm-spe.txt
>>>> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
>>>> pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
>>>> store_filter=1 - collect stores only (PMSFCR.ST)
>>>> ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
>>>> + discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
>>>>
>>>> +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
>>>> than only the execution latency.
>>>> @@ -220,6 +221,16 @@ Common errors
>>>>
>>>> Increase sampling interval (see above)
>>>>
>>>> +Discard mode
>>>> +~~~~~~~~~~~~
>>>> +
>>>> +SPE PMU events can be used without the overhead of collecting sample data if
>>>> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
>>>> +session (or on the core of interest) using options to minimize output. Then run
>>>> +perf stat:
>>>> +
>>>> + perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>>>> + perf stat -e SAMPLE_FEED_LD
>>>
>>> Perhaps clarify this should be an ARM SPE event? It seems strange to
>>> have one perf command affect a later one, the purpose of things like
>>> event multiplexing is to hide the hardware limits. I'd prefer if the
>>> last bit was like:
>>> ```
>>> Then run perf stat with an SPE event on the same PMU:
>>>
>>> perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>>> perf stat -e arm_spe/SAMPLE_FEED_LD/
>>> ``
>>>
>>> Thanks,
>>> Ian
>>
>> Hi Ian,
>>
>> Confusingly this isn't an SPE event, it is a normal PMU event. The fact
>> that one Perf command affects the other is because these events only
>> count when SPE is enabled. When it's enabled it has an effect on a
>> per-core level which is why in the example I made it simpler by enabling
>> SPE system wide.
>>
>> SPE is an exclusive PMU like Coresight and some others so it can't be
>> affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
>> would be, but as long as SPE stays enabled it will count the right thing
>> regardless of multiplexing.
>
> Thanks James, sorry for my SPE ignorance. I'm smiling about the use of
> the word exclusive. When I was trying to make the tests run in
> parallel I used a file lock - so shared and exclusive. There were a
> lot of issues with that, hence switching to 2 phases in the test,
> parallel then sequential but I kept the "exclusive" tag for want of a
> better word. Perhaps the notion of an exclusive PMU existed previously

Yeah, see PERF_PMU_CAP_EXCLUSIVE. Hopefully it doesn't cause too much
confusion; the context of test vs PMU should make it clear.

> but maybe I've accidentally invented the term by way of a failed file
> lock experiment :-)
>
> Presumably the two PMUs side-effecting each other is a known thing. I
> wonder if we can capture this in the documentation. When you say
> "normal PMU event" you mean core PMU events?
>
> Thanks,
> Ian

It should be a known thing, yes. Discard mode doesn't change this
behavior anyway; it just makes one use case of it better. I can add
another section about it to this SPE manpage in a v2, as that's
probably the best place for it.

And yes, I meant core PMU event. I can clarify that the second example
command is for a core PMU to avoid any doubt.
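
As a rough sketch of that two-command flow (not part of the series): the
script below starts SPE system wide in discard mode, counts a core PMU event
around a command, then stops the SPE session. It assumes the SPE PMU shows up
in sysfs as arm_spe_0 and that the core PMU is called armv8_pmuv3 as on the
FVP in the cover letter; both names can differ per system. The function and
variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: count a core PMU event while SPE runs in discard mode.
# Assumptions (not from the thread): the SPE PMU is exposed as arm_spe_0 in
# sysfs, and the core PMU is named armv8_pmuv3 as on the FVP.

SPE_SYSFS=${SPE_SYSFS:-/sys/bus/event_source/devices/arm_spe_0}
CORE_EVENT=${CORE_EVENT:-armv8_pmuv3/event=0x812D/}   # SAMPLE_FEED_OP

# Succeeds if the PMU directory passed as $1 publishes a 'discard' format option.
spe_has_discard() {
    [ -e "$1/format/discard" ]
}

# Start SPE system wide in discard mode, count the core PMU event around the
# given command, then stop the SPE session and return perf stat's status.
count_with_discard() {
    perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null 2>&1 &
    spe_pid=$!
    sleep 1                          # crude wait for the SPE session to start
    perf stat -e "$CORE_EVENT" -- "$@"
    status=$?
    kill "$spe_pid" 2>/dev/null      # SIGTERM ends the background perf record
    wait "$spe_pid" 2>/dev/null
    return "$status"
}

if command -v perf > /dev/null 2>&1 && spe_has_discard "$SPE_SYSFS"; then
    count_with_discard true
else
    echo "discard mode not available, skipping"
fi
```

On a machine without perf or without the discard format option the script
only prints the skip message. SPE_SYSFS and CORE_EVENT can be overridden from
the environment to match the local PMU names.
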
Thanks
James
Thread overview: 11+ messages
2024-12-17 11:56 [PATCH 0/5] perf: arm_spe: Add format option for discard mode James Clark
2024-12-17 11:56 ` [PATCH 1/5] " James Clark
2024-12-17 11:56 ` [PATCH 2/5] perf tool: arm-spe: Pull out functions for aux buffer and tracking setup James Clark
2024-12-17 11:56 ` [PATCH 3/5] perf tool: arm-spe: Don't allocate buffer or tracking event in discard mode James Clark
2024-12-17 11:56 ` [PATCH 4/5] perf test: arm_spe: Add test for " James Clark
2024-12-17 11:56 ` [PATCH 5/5] perf docs: arm_spe: Document new " James Clark
2024-12-18 0:54 ` Ian Rogers
2024-12-18 10:07 ` James Clark
2024-12-18 19:47 ` Ian Rogers
2024-12-19 10:10 ` James Clark
2024-12-18 10:39 ` [PATCH 0/5] perf: arm_spe: Add format option for " Yeo Reum Yun