* [PATCH v2 0/5] A mechanism for efficient support for per-function metrics
@ 2025-01-06 12:01 mark.barnett
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Mark Barnett <mark.barnett@arm.com>
This patch introduces the concept of an alternating sample period to
perf core and provides the necessary basic changes in the tools to
activate that option.
The primary use case for this change is to be able to enable collecting
per-function performance metrics using the Arm PMU, as per the following
approach:
* Starting with a simple periodic sampling (hotspot) profile,
augment each sample with PMU counters accumulated over a short window
up to the point the sample was taken.
* For each sample, perform some filtering to improve attribution of
the accumulated PMU counters (ensure they are attributed to a single
function).
* For each function accumulate a total for each PMU counter so that
metrics may be derived.
Without modification, sampling at a typical rate associated with
hotspot profiling (~1ms) leads to poor results. Such an approach gives
a reasonable estimate of where the profiled application is spending
time for relatively low overhead, but the PMU counters cannot easily be
attributed to a single function as the window over which they are
collected is too large. A modern CPU may execute many millions of
instructions across many thousands of functions within a 1ms window.
With this approach, the per-function metrics tend toward some average
value across the top N functions in the profile.
In order to ensure a reasonable likelihood that the counters are
attributed to a single function, the sampling window must be rather
short; typically something on the order of a few hundred cycles works
well, as tested on a range of aarch64 Cortex and Neoverse cores.
As it stands, it is possible to achieve this with perf using a very high
sampling rate (e.g. ~300 cycles), but there are at least three major
concerns with this approach:
* For speculative, out-of-order cores, can the results be accurately
attributed to a given function or the given sample window?
* A short sample window is not guaranteed to cover a single function.
* The overhead of sampling every few hundred cycles is very high and
is highly likely to cause throttling, which is undesirable as it leads
to patchy results; i.e. the profile alternates between periods of
high-frequency samples followed by longer periods of no samples.
This patch does not address the first two points directly. Some means
to address those are discussed in the RFC v2 cover letter. The patch
focuses on addressing the final point, though, helpfully, this approach
also provides a way to perform basic filtering for the second point.
The alternating sample period allows us to do two things:
* We can control the risk of throttling and reduce overhead by
alternating between a long and short period. This allows us to
decouple the "periodic" sampling rate (as might be used for hotspot
profiling) from the short sampling window needed for collecting
the PMU counters.
* The sample taken at the end of the long period can otherwise be
discarded (as the PMU data is not useful), but the
PERF_RECORD_CALLCHAIN information can be used to identify the current
function at the start of the short sample window. This is useful
for filtering samples where the PMU counter data cannot be attributed
to a single function.
There are several reasons why it is desirable to reduce the overhead and
risk of throttling:
* PMU counter overflow typically causes an interrupt into the kernel;
this affects program runtime, and can affect things like branch
prediction, cache locality and so on which can skew the metrics.
* The very high sample rate produces significant amounts of data.
Depending on the configuration of the profiling session and machine,
it is easily possible to produce many orders of magnitude more data
which is costly for tools to post-process and increases the chance
of data loss. This is especially relevant on larger core count
systems where it is very easy to produce massive recordings.
Whilst the kernel will throttle such a configuration,
which helps to mitigate a large portion of the bandwidth and capture
overhead, it is not something that can be controlled for on a per
event basis, or for non-root users, and because throttling is
controlled as a percentage of time, its effects vary from machine to
machine. AIUI throttling may also produce an uneven temporal
distribution of samples. Finally, whilst throttling does a good job
at reducing the overall amount of data produced, it still leads to
much larger captures than with this method; typically we have
observed 1-2 orders of magnitude larger captures.
This patch set modifies perf core to support alternating between two
sample_period values, providing a simple and inexpensive way for tools
to separate out the sample window (time over which events are
counted) from the sample period (time between interesting samples).
It is expected to be used with the cycle counter event, alternating
between a long and short period and subsequently discarding the counter
data for samples with the long period. The combined long and short
period gives the overall sampling period, and the short sample period
gives the sample window. The symbol taken from the sample at the end of
the long period can be used by tools to ensure correct attribution as
described previously. The cycle counter is recommended as it provides
fair temporal distribution of samples as would be required for the
per-symbol sample count mentioned previously, and because the PMU can
be programmed to overflow after a sufficiently short window (which may
not be possible with a software timer, for example). This patch does
not restrict usage to the cycle counter; there may be other novel uses
based on different events, or more appropriate counters on other
architectures. This patch set does not modify or otherwise disable the
kernel's existing throttling behaviour; if a configuration is given
that would lead to high CPU usage, then throttling still occurs.
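As an illustration, opening such an event directly with the
perf_event_open syscall might look like the following minimal sketch
(this assumes userspace headers from this patch set, so that
perf_event_attr carries the new alt_sample_period field; error handling
is omitted):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int open_alternating_cycles(void)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);               /* PERF_ATTR_SIZE_VER9 */
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_period = 999700;            /* long period */
        attr.alt_sample_period = 300;           /* short counting window */
        attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                           PERF_SAMPLE_PERIOD;
        attr.exclude_kernel = 1;
        attr.disabled = 1;

        /* monitor the calling thread on any CPU */
        return perf_event_open(&attr, 0, -1, -1, 0);
}

Samples whose PERF_SAMPLE_PERIOD value is the long period mark the end
of a long window and can be discarded once their IP/callchain has been
noted; samples carrying the short period hold the counts of interest.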
To test this, a simple `perf script` based python script was developed.
For a limited set of Arm PMU events it will post-process a
`perf record` capture and generate a table of metrics. Alongside this, a
benchmark application was developed that rotates through a sequence
of different classes of behaviour that can be detected by the Arm PMU
(e.g. mispredicts, cache misses, different instruction mixes). The path
through the benchmark can be rotated after each iteration so as to
ensure the results don't land on some lucky harmonic with the sample
period. The script can be used with and without this patch, allowing
comparison of the results. Testing was performed on Juno (A53+A57),
N1SDP, Graviton 2 and 3. In addition, this approach has been applied to
a few of Arm's tools projects and has correctly identified improvements
and regressions.
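The core of the filtering that such a script performs can be sketched
as follows (a toy model in C, assuming samples have already been
decoded from the perf.data file, and ignoring alt-period-jitter, which
would require matching the long period against a range):

#include <stdint.h>
#include <string.h>

struct sample {
        const char *leaf_sym;   /* symbol at the sampled IP */
        uint64_t    period;     /* PERF_SAMPLE_PERIOD value */
        uint64_t    counts[4];  /* deltas for the grouped counters */
};

/*
 * Accumulate counter deltas per symbol, but only for short-window
 * samples whose symbol matches the one seen at the end of the
 * preceding long period, i.e. where the short window plausibly
 * stayed within a single function.
 */
static void accumulate(const struct sample *s, size_t n, uint64_t long_period,
                       void (*add)(const char *sym, const uint64_t counts[4]))
{
        const char *pending = NULL;     /* symbol at start of short window */

        for (size_t i = 0; i < n; i++) {
                if (s[i].period == long_period) {
                        pending = s[i].leaf_sym;  /* counter data discarded */
                } else if (pending && !strcmp(pending, s[i].leaf_sym)) {
                        add(s[i].leaf_sym, s[i].counts);
                        pending = NULL;
                } else {
                        pending = NULL;           /* ambiguous: drop sample */
                }
        }
}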
Headline results from testing indicate that a ~300-cycle sample window
gives good results with or without this patch. Typical output on N1SDP
(Neoverse-N1) for the provided benchmark when run as:
perf record -T --sample-cpu --call-graph fp,4 --user-callchains \
-k CLOCK_MONOTONIC_RAW \
-e '{cycles/period=999700,alt-period=300/,instructions,branch-misses,cache-references,cache-misses}:uS' \
benchmark 0 1
perf script -s generate-function-metrics.py -- -s discard
looks like this (reformatted for email brevity):
Symbol # CPI BM/KI CM/KI %CM %CY %I %BM %L1DA %L1DM
fp_divider_stalls 6553 4.9 0.0 0.0 0.0 41.8 22.9 0.1 0.6 0.0
int_divider_stalls 4741 3.5 0.0 0.0 1.1 28.3 21.5 0.1 1.9 0.2
isb 3414 20.1 0.2 0.0 0.4 17.6 2.3 0.1 0.8 0.0
branch_mispredicts 1234 1.1 33.0 0.0 0.0 6.1 15.2 99.0 71.6 0.1
double_to_int 694 0.5 0.0 0.0 0.6 3.4 19.1 0.1 1.2 0.1
nops 417 0.3 0.2 0.0 2.8 1.9 18.3 0.6 0.4 0.1
dcache_miss 185 3.6 0.4 184.7 53.8 0.7 0.5 0.0 18.4 99.1
(CPI = Cycles/Instruction, BM/KI = Branch Misses per 1000 Instruction,
CM/KI = Cache Misses per 1000 Instruction, %CM = Percent of Cache
accesses that miss, %CY = Percentage of total cycles, %I = Percentage
of total instructions, %BM = Percentage of total branch mispredicts,
%L1DA = Percentage of total cache accesses, %L1DM = Percentage of total
cache misses)
When the patch is used, the resulting `perf.data` files are typically
between 25-50x smaller than without, and take ~25x less time for the
python script to post-process. For example, running the following:
perf record -i -vvv -e '{cycles/period=1000000/,instructions}:uS' benchmark 0 1
perf record -i -vvv -e '{cycles/period=1000/,instructions}:uS' benchmark 0 1
perf record -i -vvv -e '{cycles/period=300/,instructions}:uS' benchmark 0 1
produces captures on N1SDP (Neoverse-N1) of the following sizes:
* period=1000000: 2.601 MB perf.data (55780 samples), script time = 0m0.362s
* period=1000: 283.749 MB perf.data (6162932 samples), script time = 0m33.100s
* period=300: 304.281 MB perf.data (6614182 samples), script time = 0m35.826s
The "script time" is the user time from running "time perf script -s generate-function-metrics.py"
on the recording. Similar processing times were observed for "time perf report --stdio|cat"
as well.
By comparison, with the patch active:
perf record -i -vvv -e '{cycles/period=999700,alt-period=300/,instructions}:uS' benchmark 0 1
produces 4.923 MB perf.data (107512 samples), and script time = 0m0.578s.
This is, as expected, ~2x the size and ~2x the number of samples of
the period=1000000 recording. When compared to the period=300 recording,
the results from the provided post-processing script are (within margin
of error) the same, but the data file is ~62x smaller. The same effect
is seen in the post-processing script runtime.
Notably, without the patch enabled, L1D cache miss rates are often
higher than with it, which we attribute to the increased impact on the
cache of trapping into the kernel every 300 cycles.
These results are given with `perf_cpu_time_max_percent=25`. When tested
with `perf_cpu_time_max_percent=100` the size and time comparisons are
more significant. Disabling throttling did not lead to obvious
improvements in the collected metrics, suggesting that the sampling
approach is sufficient to collect representative metrics.
Cursory testing on a Xeon(R) W-2145 with a 300 *instruction* sample
window (with and without the patch) suggests this approach might work
for some counters. Using the same test script, it was possible to
identify branch mispredicts correctly. However, whilst the patch is
functionally correct, differences between the architectures may mean
that the approach it enables does not apply as a means to collect
per-function metrics on x86.
Changes since v1:
- Rebased onto perf-tools-next, as per request from Ian Rogers.
- Removed unnecessary code that truncated period_left to 0 and restarted
the PMU.
- Renamed variables to use the shorter 'alt_period' instead of
'alternative_period'.
- Added patch #5 that addresses an issue in the x86 and PowerPC drivers that
caused the opposite period to be reported in the sample record.
Changes since RFC v2:
- Rebased on v6.12-rc6.
Changes since RFC v1:
- Rebased on v6.9-rc1.
- Refactored from arm_pmu based extension to core feature
- Added the ability to jitter the sample window based on feedback
from Andi Kleen.
- Modified perf tool to parse the "alt-period" and "alt-period-jitter"
terms in the event specification.
Ben Gainey (4):
perf: Allow periodic events to alternate between two sample periods
perf: Allow adding fixed random jitter to the alternate sampling
period
tools/perf: Modify event parser to support alt-period term
tools/perf: Modify event parser to support alt-period-jitter term
Mark Barnett (1):
perf: Record sample last_period before updating
arch/powerpc/perf/core-book3s.c | 3 +-
arch/powerpc/perf/core-fsl-emb.c | 3 +-
arch/x86/events/core.c | 4 +-
arch/x86/events/intel/core.c | 3 +-
arch/x86/events/intel/knc.c | 3 +-
include/linux/perf_event.h | 5 +++
include/uapi/linux/perf_event.h | 10 ++++-
kernel/events/core.c | 44 ++++++++++++++++++-
tools/include/uapi/linux/perf_event.h | 10 ++++-
tools/perf/tests/shell/attr/base-record | 4 +-
tools/perf/tests/shell/attr/base-record-spe | 2 +
tools/perf/tests/shell/attr/base-stat | 4 +-
tools/perf/tests/shell/attr/system-wide-dummy | 4 +-
.../attr/test-record-alt-period-jitter-term | 13 ++++++
.../shell/attr/test-record-alt-period-term | 12 +++++
.../tests/shell/attr/test-record-dummy-C0 | 4 +-
tools/perf/tests/shell/lib/attr.py | 2 +
tools/perf/util/evsel.c | 2 +
tools/perf/util/parse-events.c | 30 +++++++++++++
tools/perf/util/parse-events.h | 4 +-
tools/perf/util/parse-events.l | 2 +
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/pmu.c | 4 +-
23 files changed, 159 insertions(+), 14 deletions(-)
create mode 100644 tools/perf/tests/shell/attr/test-record-alt-period-jitter-term
create mode 100644 tools/perf/tests/shell/attr/test-record-alt-period-term
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
@ 2025-01-06 12:01 ` mark.barnett
2025-01-21 13:01 ` Leo Yan
2025-01-31 18:44 ` Rob Herring
2025-01-06 12:01 ` [PATCH v2 2/5] perf: Allow adding fixed random jitter to the alternate sampling period mark.barnett
` (4 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Ben Gainey <ben.gainey@arm.com>
This change modifies perf_event_attr to add a second, alternative
sample period field, and modifies the core perf overflow handling
such that when specified an event will alternate between two sample
periods.
Currently, perf does not provide a mechanism for decoupling the period
over which counters are counted from the period between samples. This is
problematic for building a tool to measure per-function metrics derived
from a sampled counter group. Ideally such a tool wants a very small
sample window in order to correctly attribute the metrics to a given
function, but prefers a larger sample period that provides representative
coverage without excessive probe effect, triggering throttling, or
generating excessive amounts of data.
By alternating between a long and short sample_period and subsequently
discarding the long samples, tools may decouple the period between
samples that the tool cares about from the window of time over which
interesting counts are collected.
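For example, with sample_period=999700 and alt_sample_period=300 the
event overflows alternately after 999700 and then 300 events: one
sample of interest is produced per 1,000,000 events counted, and the
counter values attached to that sample are accumulated over only the
preceding 300-event window.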
It is expected that typically tools would use this feature with the
cycles or instructions events as an approximation for time, but no
restrictions are applied to which events this can be applied to.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
---
include/linux/perf_event.h | 5 +++++
include/uapi/linux/perf_event.h | 3 +++
kernel/events/core.c | 37 ++++++++++++++++++++++++++++++++-
3 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index cb99ec8c9e96..cbb332f4e19c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -276,6 +276,11 @@ struct hw_perf_event {
*/
u64 freq_time_stamp;
u64 freq_count_stamp;
+
+ /*
+ * Indicates that the alternative sample period is used
+ */
+ bool using_alt_sample_period;
#endif
};
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0524d541d4e3..499a8673df8e 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -379,6 +379,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
+#define PERF_ATTR_SIZE_VER9 144 /* add: alt_sample_period */
/*
* Hardware event_id to monitor via a performance monitoring event:
@@ -531,6 +532,8 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+
+ __u64 alt_sample_period;
};
/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 065f9188b44a..7e339d12363a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4178,6 +4178,8 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
s64 period, sample_period;
s64 delta;
+ WARN_ON_ONCE(hwc->using_alt_sample_period);
+
period = perf_calculate_period(event, nsec, count);
delta = (s64)(period - hwc->sample_period);
@@ -9850,6 +9852,7 @@ static int __perf_event_overflow(struct perf_event *event,
int throttle, struct perf_sample_data *data,
struct pt_regs *regs)
{
+ struct hw_perf_event *hwc = &event->hw;
int events = atomic_read(&event->event_limit);
int ret = 0;
@@ -9869,6 +9872,18 @@ static int __perf_event_overflow(struct perf_event *event,
!bpf_overflow_handler(event, data, regs))
goto out;
+ /*
+ * Swap the sample period to the alternative period
+ */
+ if (event->attr.alt_sample_period) {
+ bool using_alt = hwc->using_alt_sample_period;
+ u64 sample_period = (using_alt ? event->attr.sample_period
+ : event->attr.alt_sample_period);
+
+ hwc->sample_period = sample_period;
+ hwc->using_alt_sample_period = !using_alt;
+ }
+
/*
* XXX event_limit might not quite work as expected on inherited
* events
@@ -12291,9 +12306,19 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
if (attr->freq && attr->sample_freq)
hwc->sample_period = 1;
hwc->last_period = hwc->sample_period;
-
local64_set(&hwc->period_left, hwc->sample_period);
+ if (attr->alt_sample_period) {
+ hwc->sample_period = attr->alt_sample_period;
+ hwc->using_alt_sample_period = true;
+ }
+
+ /*
+ * alt_sample_period cannot be used with freq
+ */
+ if (attr->freq && attr->alt_sample_period)
+ goto err_ns;
+
/*
* We do not support PERF_SAMPLE_READ on inherited events unless
* PERF_SAMPLE_TID is also selected, which allows inherited events to
@@ -12763,9 +12788,19 @@ SYSCALL_DEFINE5(perf_event_open,
if (attr.freq) {
if (attr.sample_freq > sysctl_perf_event_sample_rate)
return -EINVAL;
+ if (attr.alt_sample_period)
+ return -EINVAL;
} else {
if (attr.sample_period & (1ULL << 63))
return -EINVAL;
+ if (attr.alt_sample_period) {
+ if (!attr.sample_period)
+ return -EINVAL;
+ if (attr.alt_sample_period & (1ULL << 63))
+ return -EINVAL;
+ if (attr.alt_sample_period == attr.sample_period)
+ attr.alt_sample_period = 0;
+ }
}
/* Only privileged users can get physical addresses */
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/5] perf: Allow adding fixed random jitter to the alternate sampling period
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
@ 2025-01-06 12:01 ` mark.barnett
2025-01-06 12:01 ` [PATCH v2 3/5] tools/perf: Modify event parser to support alt-period term mark.barnett
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Ben Gainey <ben.gainey@arm.com>
This change modifies the core perf overflow handler, adding some small
random jitter to each sample period whenever an event switches between the
two alternate sample periods. A new flag is added to perf_event_attr to
opt into this behaviour.
This change follows the discussion in [1], where it is recognized that it
may be possible for certain patterns of execution to end up with biased
results.
[1] https://lore.kernel.org/linux-perf-users/Zc24eLqZycmIg3d2@tassilo/
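For reference, when jitter_alt_period is non-zero the jitter added on
each period swap is drawn uniformly from [0, (2 << jitter_alt_period) - 1],
so the upper bound for a given field value can be sketched as:

/*
 * Upper bound of the random jitter for jitter_alt_period = j (1..7);
 * j = 0 disables jitter entirely, per the overflow handler below.
 */
static unsigned int max_jitter(unsigned int j)
{
        return (2u << j) - 1;   /* j=1 -> 3, j=3 -> 15, j=7 -> 255 */
}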
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
---
include/uapi/linux/perf_event.h | 7 ++++++-
kernel/events/core.c | 9 ++++++++-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 499a8673df8e..c0076ce8f80a 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -461,7 +461,12 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ /*
+ * Add a limited amount of jitter on each alternate period, where
+ * the jitter is between [0, (2<<jitter_alt_period) - 1]
+ */
+ jitter_alt_period : 3,
+ __reserved_1 : 23;
union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7e339d12363a..c5480965df32 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -15,6 +15,7 @@
#include <linux/idr.h>
#include <linux/file.h>
#include <linux/poll.h>
+#include <linux/random.h>
#include <linux/slab.h>
#include <linux/hash.h>
#include <linux/tick.h>
@@ -9878,7 +9879,10 @@ static int __perf_event_overflow(struct perf_event *event,
if (event->attr.alt_sample_period) {
bool using_alt = hwc->using_alt_sample_period;
u64 sample_period = (using_alt ? event->attr.sample_period
- : event->attr.alt_sample_period);
+ : event->attr.alt_sample_period)
+ + (event->attr.jitter_alt_period
+ ? get_random_u32_below(2 << event->attr.jitter_alt_period)
+ : 0);
hwc->sample_period = sample_period;
hwc->using_alt_sample_period = !using_alt;
@@ -12803,6 +12807,9 @@ SYSCALL_DEFINE5(perf_event_open,
}
}
+ if (attr.jitter_alt_period && !attr.alt_sample_period)
+ return -EINVAL;
+
/* Only privileged users can get physical addresses */
if ((attr.sample_type & PERF_SAMPLE_PHYS_ADDR)) {
err = perf_allow_kernel(&attr);
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 3/5] tools/perf: Modify event parser to support alt-period term
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
2025-01-06 12:01 ` [PATCH v2 2/5] perf: Allow adding fixed random jitter to the alternate sampling period mark.barnett
@ 2025-01-06 12:01 ` mark.barnett
2025-01-06 12:01 ` [PATCH v2 4/5] tools/perf: Modify event parser to support alt-period-jitter term mark.barnett
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Ben Gainey <ben.gainey@arm.com>
parse-events is modified, adding the "alt-period" term which can be used
to specify the alternative sampling period.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
---
tools/include/uapi/linux/perf_event.h | 3 +++
tools/perf/tests/shell/attr/base-record | 3 ++-
tools/perf/tests/shell/attr/base-record-spe | 1 +
tools/perf/tests/shell/attr/base-stat | 3 ++-
tools/perf/tests/shell/attr/system-wide-dummy | 3 ++-
.../tests/shell/attr/test-record-alt-period-term | 12 ++++++++++++
tools/perf/tests/shell/attr/test-record-dummy-C0 | 3 ++-
tools/perf/tests/shell/lib/attr.py | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/parse-events.c | 15 +++++++++++++++
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/pmu.c | 1 +
14 files changed, 46 insertions(+), 5 deletions(-)
create mode 100644 tools/perf/tests/shell/attr/test-record-alt-period-term
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0524d541d4e3..499a8673df8e 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -379,6 +379,7 @@ enum perf_event_read_format {
#define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
#define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
#define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
+#define PERF_ATTR_SIZE_VER9 144 /* add: alt_sample_period */
/*
* Hardware event_id to monitor via a performance monitoring event:
@@ -531,6 +532,8 @@ struct perf_event_attr {
__u64 sig_data;
__u64 config3; /* extension of config2 */
+
+ __u64 alt_sample_period;
};
/*
diff --git a/tools/perf/tests/shell/attr/base-record b/tools/perf/tests/shell/attr/base-record
index b44e4e6e4443..28a7233f7bc1 100644
--- a/tools/perf/tests/shell/attr/base-record
+++ b/tools/perf/tests/shell/attr/base-record
@@ -5,7 +5,7 @@ group_fd=-1
flags=0|8
cpu=*
type=0|1
-size=136
+size=144
config=0|1
sample_period=*
sample_type=263
@@ -39,3 +39,4 @@ config2=0
branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
+alt_sample_period=0
diff --git a/tools/perf/tests/shell/attr/base-record-spe b/tools/perf/tests/shell/attr/base-record-spe
index 08fa96b59240..ad8eb72e655a 100644
--- a/tools/perf/tests/shell/attr/base-record-spe
+++ b/tools/perf/tests/shell/attr/base-record-spe
@@ -38,3 +38,4 @@ config2=*
branch_sample_type=*
sample_regs_user=*
sample_stack_user=*
+alt_sample_period=0
diff --git a/tools/perf/tests/shell/attr/base-stat b/tools/perf/tests/shell/attr/base-stat
index fccd8ec4d1b0..2de92c5c300d 100644
--- a/tools/perf/tests/shell/attr/base-stat
+++ b/tools/perf/tests/shell/attr/base-stat
@@ -5,7 +5,7 @@ group_fd=-1
flags=0|8
cpu=*
type=0
-size=136
+size=144
config=0
sample_period=0
sample_type=65536
@@ -39,3 +39,4 @@ config2=0
branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
+alt_sample_period=0
diff --git a/tools/perf/tests/shell/attr/system-wide-dummy b/tools/perf/tests/shell/attr/system-wide-dummy
index a1e1d6a263bf..c0a17bb3c022 100644
--- a/tools/perf/tests/shell/attr/system-wide-dummy
+++ b/tools/perf/tests/shell/attr/system-wide-dummy
@@ -7,7 +7,7 @@ cpu=*
pid=-1
flags=8
type=1
-size=136
+size=144
config=9
sample_period=1
# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
@@ -50,3 +50,4 @@ config2=0
branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
+alt_sample_period=0
diff --git a/tools/perf/tests/shell/attr/test-record-alt-period-term b/tools/perf/tests/shell/attr/test-record-alt-period-term
new file mode 100644
index 000000000000..fcdb790adcd3
--- /dev/null
+++ b/tools/perf/tests/shell/attr/test-record-alt-period-term
@@ -0,0 +1,12 @@
+[config]
+command = record
+args = --no-bpf-event -e cycles/period=3,alt-period=2/ -- kill >/dev/null 2>&1
+ret = 1
+kernel_since = 6.11
+
+[event-10:base-record]
+sample_period=3
+alt_sample_period=2
+
+freq=0
+sample_type=7
diff --git a/tools/perf/tests/shell/attr/test-record-dummy-C0 b/tools/perf/tests/shell/attr/test-record-dummy-C0
index 91499405fff4..e6315918a95e 100644
--- a/tools/perf/tests/shell/attr/test-record-dummy-C0
+++ b/tools/perf/tests/shell/attr/test-record-dummy-C0
@@ -10,7 +10,7 @@ cpu=0
pid=-1
flags=8
type=1
-size=136
+size=144
config=9
sample_period=4000
# PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
@@ -53,3 +53,4 @@ config2=0
branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
+alt_sample_period=0
diff --git a/tools/perf/tests/shell/lib/attr.py b/tools/perf/tests/shell/lib/attr.py
index 3db9a7d78715..04e95f76005a 100644
--- a/tools/perf/tests/shell/lib/attr.py
+++ b/tools/perf/tests/shell/lib/attr.py
@@ -91,6 +91,7 @@ class Event(dict):
'branch_sample_type',
'sample_regs_user',
'sample_stack_user',
+ 'alt_sample_period',
]
def add(self, data):
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 697428efa644..a796e49ef2f6 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -180,6 +180,7 @@ static int store_event(struct perf_event_attr *attr, pid_t pid, struct perf_cpu
WRITE_ASS(branch_sample_type, "llu");
WRITE_ASS(sample_regs_user, "llu");
WRITE_ASS(sample_stack_user, PRIu32);
+ WRITE_ASS(alt_sample_period, "llu");
fclose(file);
return 0;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1e23faa364b1..41e82347a020 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -799,6 +799,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
[PARSE_EVENTS__TERM_TYPE_RAW] = "raw",
[PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE] = "legacy-cache",
[PARSE_EVENTS__TERM_TYPE_HARDWARE] = "hardware",
+ [PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD] = "alt-period",
};
if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
return "unknown term";
@@ -827,6 +828,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
case PARSE_EVENTS__TERM_TYPE_NAME:
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
+ case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
case PARSE_EVENTS__TERM_TYPE_PERCORE:
return true;
case PARSE_EVENTS__TERM_TYPE_USER:
@@ -975,6 +977,16 @@ do { \
return -EINVAL;
}
break;
+ case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
+ CHECK_TYPE_VAL(NUM);
+ if (term->val.num == 0) {
+ parse_events_error__handle(err, term->err_val,
+ strdup("expected a non-zero value"),
+ NULL);
+ return -EINVAL;
+ }
+ attr->alt_sample_period = term->val.num;
+ break;
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_USER:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
@@ -1102,6 +1114,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_RAW:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+ case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
default:
if (err) {
parse_events_error__handle(err, term->err_term,
@@ -1236,6 +1249,7 @@ do { \
case PARSE_EVENTS__TERM_TYPE_RAW:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+ case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
default:
break;
}
@@ -1290,6 +1304,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
case PARSE_EVENTS__TERM_TYPE_RAW:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+ case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
default:
break;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e176a34ab088..d00bb6c5d9ab 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -80,7 +80,8 @@ enum parse_events__term_type {
PARSE_EVENTS__TERM_TYPE_RAW,
PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
PARSE_EVENTS__TERM_TYPE_HARDWARE,
-#define __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
+ PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD,
+#define __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD + 1)
};
struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index bf7f73548605..3538f5bca5df 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -324,6 +324,7 @@ aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
aux-action { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
+alt-period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD); }
cpu-cycles|cycles { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
stalled-cycles-frontend|idle-cycles-frontend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
stalled-cycles-backend|idle-cycles-backend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index c7f3543b9921..e823240b7dd8 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -334,6 +334,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(aux_start_paused, p_unsigned);
PRINT_ATTRf(aux_pause, p_unsigned);
PRINT_ATTRf(aux_resume, p_unsigned);
+ PRINT_ATTRf(alt_sample_period, p_unsigned);
return ret;
}
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6206c8fe2bf9..376cdec933c2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1775,6 +1775,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
"aux-output",
"aux-action=(pause|resume|start-paused)",
"aux-sample-size=number",
+ "alt-period=number",
};
struct perf_pmu_format *format;
int ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 4/5] tools/perf: Modify event parser to support alt-period-jitter term
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
` (2 preceding siblings ...)
2025-01-06 12:01 ` [PATCH v2 3/5] tools/perf: Modify event parser to support alt-period term mark.barnett
@ 2025-01-06 12:01 ` mark.barnett
2025-01-06 12:01 ` [PATCH v2 5/5] perf: Record sample last_period before updating mark.barnett
2025-01-22 16:47 ` [PATCH v2 0/5] A mechanism for efficient support for per-function metrics James Clark
5 siblings, 0 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Ben Gainey <ben.gainey@arm.com>
parse-events is modified, adding the "alt-period-jitter" term which
can be used to enable random jitter of the alternative sample
period.
Signed-off-by: Ben Gainey <ben.gainey@arm.com>
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
---
tools/include/uapi/linux/perf_event.h | 7 ++++++-
tools/perf/tests/shell/attr/base-record | 1 +
tools/perf/tests/shell/attr/base-record-spe | 1 +
tools/perf/tests/shell/attr/base-stat | 1 +
tools/perf/tests/shell/attr/system-wide-dummy | 1 +
.../shell/attr/test-record-alt-period-jitter-term | 13 +++++++++++++
tools/perf/tests/shell/attr/test-record-dummy-C0 | 1 +
tools/perf/tests/shell/lib/attr.py | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/parse-events.c | 15 +++++++++++++++
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 1 +
tools/perf/util/pmu.c | 3 ++-
13 files changed, 46 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/tests/shell/attr/test-record-alt-period-jitter-term
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 499a8673df8e..c0076ce8f80a 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -461,7 +461,12 @@ struct perf_event_attr {
inherit_thread : 1, /* children only inherit if cloned with CLONE_THREAD */
remove_on_exec : 1, /* event is removed from task on exec */
sigtrap : 1, /* send synchronous SIGTRAP on event */
- __reserved_1 : 26;
+ /*
+ * Add a limited amount of jitter on each alternate period, where
+ * the jitter is between [0, (2<<jitter_alt_period) - 1]
+ */
+ jitter_alt_period : 3,
+ __reserved_1 : 23;
union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/tools/perf/tests/shell/attr/base-record b/tools/perf/tests/shell/attr/base-record
index 28a7233f7bc1..1f5ab125c78d 100644
--- a/tools/perf/tests/shell/attr/base-record
+++ b/tools/perf/tests/shell/attr/base-record
@@ -40,3 +40,4 @@ branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
alt_sample_period=0
+jitter_alt_period=0
diff --git a/tools/perf/tests/shell/attr/base-record-spe b/tools/perf/tests/shell/attr/base-record-spe
index ad8eb72e655a..b35ca04b2ce4 100644
--- a/tools/perf/tests/shell/attr/base-record-spe
+++ b/tools/perf/tests/shell/attr/base-record-spe
@@ -39,3 +39,4 @@ branch_sample_type=*
sample_regs_user=*
sample_stack_user=*
alt_sample_period=0
+jitter_alt_period=0
diff --git a/tools/perf/tests/shell/attr/base-stat b/tools/perf/tests/shell/attr/base-stat
index 2de92c5c300d..2d90a055686a 100644
--- a/tools/perf/tests/shell/attr/base-stat
+++ b/tools/perf/tests/shell/attr/base-stat
@@ -40,3 +40,4 @@ branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
alt_sample_period=0
+jitter_alt_period=0
diff --git a/tools/perf/tests/shell/attr/system-wide-dummy b/tools/perf/tests/shell/attr/system-wide-dummy
index c0a17bb3c022..527707b505e0 100644
--- a/tools/perf/tests/shell/attr/system-wide-dummy
+++ b/tools/perf/tests/shell/attr/system-wide-dummy
@@ -51,3 +51,4 @@ branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
alt_sample_period=0
+jitter_alt_period=0
diff --git a/tools/perf/tests/shell/attr/test-record-alt-period-jitter-term b/tools/perf/tests/shell/attr/test-record-alt-period-jitter-term
new file mode 100644
index 000000000000..6b31c898c905
--- /dev/null
+++ b/tools/perf/tests/shell/attr/test-record-alt-period-jitter-term
@@ -0,0 +1,13 @@
+[config]
+command = record
+args = --no-bpf-event -e cycles/period=3,alt-period=2,alt-period-jitter=7/ -- kill >/dev/null 2>&1
+ret = 1
+kernel_since = 6.11
+
+[event-10:base-record]
+sample_period=3
+alt_sample_period=2
+jitter_alt_period=7
+
+freq=0
+sample_type=7
diff --git a/tools/perf/tests/shell/attr/test-record-dummy-C0 b/tools/perf/tests/shell/attr/test-record-dummy-C0
index e6315918a95e..436534df0434 100644
--- a/tools/perf/tests/shell/attr/test-record-dummy-C0
+++ b/tools/perf/tests/shell/attr/test-record-dummy-C0
@@ -54,3 +54,4 @@ branch_sample_type=0
sample_regs_user=0
sample_stack_user=0
alt_sample_period=0
+jitter_alt_period=0
diff --git a/tools/perf/tests/shell/lib/attr.py b/tools/perf/tests/shell/lib/attr.py
index 04e95f76005a..d15363e925fe 100644
--- a/tools/perf/tests/shell/lib/attr.py
+++ b/tools/perf/tests/shell/lib/attr.py
@@ -92,6 +92,7 @@ class Event(dict):
'sample_regs_user',
'sample_stack_user',
'alt_sample_period',
+ 'jitter_alt_period',
]
def add(self, data):
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a796e49ef2f6..420bf0ce55eb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -181,6 +181,7 @@ static int store_event(struct perf_event_attr *attr, pid_t pid, struct perf_cpu
WRITE_ASS(sample_regs_user, "llu");
WRITE_ASS(sample_stack_user, PRIu32);
WRITE_ASS(alt_sample_period, "llu");
+ WRITE_ASS(jitter_alt_period, "d");
fclose(file);
return 0;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 41e82347a020..53079053a044 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -800,6 +800,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
[PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE] = "legacy-cache",
[PARSE_EVENTS__TERM_TYPE_HARDWARE] = "hardware",
[PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD] = "alt-period",
+ [PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER] = "alt-period-jitter",
};
if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
return "unknown term";
@@ -829,6 +830,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
case PARSE_EVENTS__TERM_TYPE_METRIC_ID:
case PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD:
case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
+ case PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
case PARSE_EVENTS__TERM_TYPE_PERCORE:
return true;
case PARSE_EVENTS__TERM_TYPE_USER:
@@ -987,6 +989,16 @@ do { \
}
attr->alt_sample_period = term->val.num;
break;
+ case PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
+ CHECK_TYPE_VAL(NUM);
+ if ((unsigned int)term->val.num > 7) {
+ parse_events_error__handle(err, term->err_val,
+ strdup("expected a value between 0-7"),
+ NULL);
+ return -EINVAL;
+ }
+ attr->jitter_alt_period = (unsigned int)term->val.num;
+ break;
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
case PARSE_EVENTS__TERM_TYPE_USER:
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
@@ -1115,6 +1127,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
+ case PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
default:
if (err) {
parse_events_error__handle(err, term->err_term,
@@ -1250,6 +1263,7 @@ do { \
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
+ case PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
default:
break;
}
@@ -1305,6 +1319,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
case PARSE_EVENTS__TERM_TYPE_HARDWARE:
case PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD:
+ case PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
default:
break;
}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index d00bb6c5d9ab..dafd4b4d0f0e 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -81,7 +81,8 @@ enum parse_events__term_type {
PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
PARSE_EVENTS__TERM_TYPE_HARDWARE,
PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD,
-#define __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD + 1)
+ PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER,
+#define __PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER + 1)
};
struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 3538f5bca5df..ceb19d41766a 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -325,6 +325,7 @@ aux-action { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
aux-sample-size { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
metric-id { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
alt-period { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_ALT_SAMPLE_PERIOD); }
+alt-period-jitter { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER); }
cpu-cycles|cycles { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
stalled-cycles-frontend|idle-cycles-frontend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
stalled-cycles-backend|idle-cycles-backend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 376cdec933c2..d8e145d6abff 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1400,7 +1400,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
break;
case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
return -EINVAL;
- case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
+ case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_ALT_PERIOD_JITTER:
/* Skip non-config terms. */
break;
default:
@@ -1776,6 +1776,7 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
"aux-action=(pause|resume|start-paused)",
"aux-sample-size=number",
"alt-period=number",
+ "alt-period-jitter=number",
};
struct perf_pmu_format *format;
int ret;
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 5/5] perf: Record sample last_period before updating
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
` (3 preceding siblings ...)
2025-01-06 12:01 ` [PATCH v2 4/5] tools/perf: Modify event parser to support alt-period-jitter term mark.barnett
@ 2025-01-06 12:01 ` mark.barnett
2025-01-21 17:22 ` Leo Yan
2025-01-22 5:03 ` kernel test robot
2025-01-22 16:47 ` [PATCH v2 0/5] A mechanism for efficient support for per-function metrics James Clark
5 siblings, 2 replies; 15+ messages in thread
From: mark.barnett @ 2025-01-06 12:01 UTC (permalink / raw)
To: peterz, mingo, acme, namhyung, irogers
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, Mark Barnett
From: Mark Barnett <mark.barnett@arm.com>
This change alters the PowerPC and x86 driver implementations to record
the last sample period before the event is updated for the next period.
A common pattern in PMU driver implementations is to have a
"*_event_set_period" function which takes care of updating the various
period-related fields in a perf_event structure. In most cases, the
drivers choose to call this function after initializing a sample data
structure with perf_sample_data_init. The x86 and PowerPC drivers
deviate from this, choosing to update the period before initializing the
sample data. When using an event with an alternate sample period, this
causes an incorrect period to be written to the sample data that gets
reported to userspace.
Signed-off-by: Mark Barnett <mark.barnett@arm.com>
---
arch/powerpc/perf/core-book3s.c | 3 ++-
arch/powerpc/perf/core-fsl-emb.c | 3 ++-
arch/x86/events/core.c | 4 +++-
arch/x86/events/intel/core.c | 3 ++-
arch/x86/events/intel/knc.c | 3 ++-
5 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 2b79171ee185..234803441caa 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2231,6 +2231,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
struct pt_regs *regs)
{
u64 period = event->hw.sample_period;
+ const u64 last_period = event->hw.last_period;
s64 prev, delta, left;
int record = 0;
@@ -2296,7 +2297,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
if (record) {
struct perf_sample_data data;
- perf_sample_data_init(&data, ~0ULL, event->hw.last_period);
+ perf_sample_data_init(&data, ~0ULL, last_period);
if (event->attr.sample_type & PERF_SAMPLE_ADDR_TYPE)
perf_get_data_addr(event, regs, &data.addr);
diff --git a/arch/powerpc/perf/core-fsl-emb.c b/arch/powerpc/perf/core-fsl-emb.c
index 1a53ab08447c..d2ffcc7021c5 100644
--- a/arch/powerpc/perf/core-fsl-emb.c
+++ b/arch/powerpc/perf/core-fsl-emb.c
@@ -590,6 +590,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
struct pt_regs *regs)
{
u64 period = event->hw.sample_period;
+ const u64 last_period = event->hw.last_period;
s64 prev, delta, left;
int record = 0;
@@ -632,7 +633,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
if (record) {
struct perf_sample_data data;
- perf_sample_data_init(&data, 0, event->hw.last_period);
+ perf_sample_data_init(&data, 0, last_period);
if (perf_event_overflow(event, &data, regs))
fsl_emb_pmu_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index c75c482d4c52..39891fef4395 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1673,6 +1673,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
struct cpu_hw_events *cpuc;
struct perf_event *event;
int idx, handled = 0;
+ u64 last_period;
u64 val;
cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1692,6 +1693,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
continue;
event = cpuc->events[idx];
+ last_period = event->hw.last_period;
val = static_call(x86_pmu_update)(event);
if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
@@ -1705,7 +1707,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
if (!static_call(x86_pmu_set_period)(event))
continue;
- perf_sample_data_init(&data, 0, event->hw.last_period);
+ perf_sample_data_init(&data, 0, last_period);
if (has_branch_stack(event))
perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2e1e26846050..17d8c9b8738f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3103,6 +3103,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ const u64 last_period = event->hw.last_period;
handled++;
@@ -3112,7 +3113,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
if (!intel_pmu_save_and_restart(event))
continue;
- perf_sample_data_init(&data, 0, event->hw.last_period);
+ perf_sample_data_init(&data, 0, last_period);
if (has_branch_stack(event))
intel_pmu_lbr_save_brstack(&data, cpuc, event);
diff --git a/arch/x86/events/intel/knc.c b/arch/x86/events/intel/knc.c
index 034a1f6a457c..e8720ac5200b 100644
--- a/arch/x86/events/intel/knc.c
+++ b/arch/x86/events/intel/knc.c
@@ -241,6 +241,7 @@ static int knc_pmu_handle_irq(struct pt_regs *regs)
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ const u64 last_period = event->hw.last_period;
handled++;
@@ -250,7 +251,7 @@ static int knc_pmu_handle_irq(struct pt_regs *regs)
if (!intel_pmu_save_and_restart(event))
continue;
- perf_sample_data_init(&data, 0, event->hw.last_period);
+ perf_sample_data_init(&data, 0, last_period);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
@ 2025-01-21 13:01 ` Leo Yan
2025-03-07 20:28 ` Mark Barnett
2025-01-31 18:44 ` Rob Herring
1 sibling, 1 reply; 15+ messages in thread
From: Leo Yan @ 2025-01-21 13:01 UTC (permalink / raw)
To: mark.barnett@arm.com
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On Mon, Jan 06, 2025 at 12:01:52PM +0000, mark.barnett@arm.com wrote:
> From: Ben Gainey <ben.gainey@arm.com>
>
> This change modifies perf_event_attr to add a second, alternative
> sample period field, and modifies the core perf overflow handling
> such that when specified an event will alternate between two sample
> periods.
>
> Currently, perf does not provide a mechanism for decoupling the period
> over which counters are counted from the period between samples. This is
> problematic for building a tool to measure per-function metrics derived
> from a sampled counter group. Ideally such a tool wants a very small
> sample window in order to correctly attribute the metrics to a given
> function, but prefers a larger sample period that provides representative
> coverage without excessive probe effect, triggering throttling, or
> generating excessive amounts of data.
>
> By alternating between a long and short sample_period and subsequently
> discarding the long samples, tools may decouple the period between
> samples that the tool cares about from the window of time over which
> interesting counts are collected.
>
> It is expected that typically tools would use this feature with the
> cycles or instructions events as an approximation for time, but no
> restrictions are applied to which events this can be applied to.
>
> Signed-off-by: Ben Gainey <ben.gainey@arm.com>
> Signed-off-by: Mark Barnett <mark.barnett@arm.com>
> ---
> include/linux/perf_event.h | 5 +++++
> include/uapi/linux/perf_event.h | 3 +++
> kernel/events/core.c | 37 ++++++++++++++++++++++++++++++++-
> 3 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index cb99ec8c9e96..cbb332f4e19c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -276,6 +276,11 @@ struct hw_perf_event {
> */
> u64 freq_time_stamp;
> u64 freq_count_stamp;
> +
> + /*
> + * Indicates that the alternative sample period is used
> + */
> + bool using_alt_sample_period;
> #endif
> };
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0524d541d4e3..499a8673df8e 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -379,6 +379,7 @@ enum perf_event_read_format {
> #define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
> #define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
> #define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
> +#define PERF_ATTR_SIZE_VER9 144 /* add: alt_sample_period */
>
> /*
> * Hardware event_id to monitor via a performance monitoring event:
> @@ -531,6 +532,8 @@ struct perf_event_attr {
> __u64 sig_data;
>
> __u64 config3; /* extension of config2 */
> +
> + __u64 alt_sample_period;
> };
>
> /*
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 065f9188b44a..7e339d12363a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4178,6 +4178,8 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
> s64 period, sample_period;
> s64 delta;
>
> + WARN_ON_ONCE(hwc->using_alt_sample_period);
> +
> period = perf_calculate_period(event, nsec, count);
>
> delta = (s64)(period - hwc->sample_period);
> @@ -9850,6 +9852,7 @@ static int __perf_event_overflow(struct perf_event *event,
> int throttle, struct perf_sample_data *data,
> struct pt_regs *regs)
> {
> + struct hw_perf_event *hwc = &event->hw;
> int events = atomic_read(&event->event_limit);
> int ret = 0;
>
> @@ -9869,6 +9872,18 @@ static int __perf_event_overflow(struct perf_event *event,
> !bpf_overflow_handler(event, data, regs))
> goto out;
>
> + /*
> + * Swap the sample period to the alternative period
> + */
> + if (event->attr.alt_sample_period) {
> + bool using_alt = hwc->using_alt_sample_period;
> + u64 sample_period = (using_alt ? event->attr.sample_period
> + : event->attr.alt_sample_period);
> +
> + hwc->sample_period = sample_period;
> + hwc->using_alt_sample_period = !using_alt;
> + }
> +
> /*
> * XXX event_limit might not quite work as expected on inherited
> * events
> @@ -12291,9 +12306,19 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> if (attr->freq && attr->sample_freq)
> hwc->sample_period = 1;
> hwc->last_period = hwc->sample_period;
> -
Redundant change here?
> local64_set(&hwc->period_left, hwc->sample_period);
>
> + if (attr->alt_sample_period) {
> + hwc->sample_period = attr->alt_sample_period;
> + hwc->using_alt_sample_period = true;
> + }
My understanding is that this sets a short sample window for the first
period. Should `hwc->period_left` also be initialized with the updated
sample period?
> +
> + /*
> + * alt_sample_period cannot be used with freq
> + */
> + if (attr->freq && attr->alt_sample_period)
> + goto err_ns;
> +
It is better to validate parameters first, so move this check before
the adjustment for the alt sample period.
> /*
> * We do not support PERF_SAMPLE_READ on inherited events unless
> * PERF_SAMPLE_TID is also selected, which allows inherited events to
> @@ -12763,9 +12788,19 @@ SYSCALL_DEFINE5(perf_event_open,
> if (attr.freq) {
> if (attr.sample_freq > sysctl_perf_event_sample_rate)
> return -EINVAL;
> + if (attr.alt_sample_period)
> + return -EINVAL;
> } else {
> if (attr.sample_period & (1ULL << 63))
> return -EINVAL;
> + if (attr.alt_sample_period) {
> + if (!attr.sample_period)
> + return -EINVAL;
> + if (attr.alt_sample_period & (1ULL << 63))
> + return -EINVAL;
> + if (attr.alt_sample_period == attr.sample_period)
> + attr.alt_sample_period = 0;
In theory, the attr.alt_sample_period should be less than
attr.sample_period, right?
Thanks,
Leo
> + }
> }
>
> /* Only privileged users can get physical addresses */
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 5/5] perf: Record sample last_period before updating
2025-01-06 12:01 ` [PATCH v2 5/5] perf: Record sample last_period before updating mark.barnett
@ 2025-01-21 17:22 ` Leo Yan
2025-01-22 5:03 ` kernel test robot
1 sibling, 0 replies; 15+ messages in thread
From: Leo Yan @ 2025-01-21 17:22 UTC (permalink / raw)
To: mark.barnett@arm.com
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On Mon, Jan 06, 2025 at 12:01:56PM +0000, mark.barnett@arm.com wrote:
> From: Mark Barnett <mark.barnett@arm.com>
>
> This change alters the PowerPC and x86 driver implementations to record
> the last sample period before the event is updated for the next period.
>
> A common pattern in PMU driver implementations is to have a
> "*_event_set_period" function which takes care of updating the various
> period-related fields in a perf_event structure. In most cases, the
> drivers choose to call this function after initializing a sample data
> structure with perf_sample_data_init. The x86 and PowerPC drivers
> deviate from this, choosing to update the period before initializing the
> sample data. When using an event with an alternate sample period, this
> causes an incorrect period to be written to the sample data that gets
> reported to userspace.
I found a conflict when applying this patch on linux-next [1]; it
conflicts with the commit below:
commit faac6f105ef169e2e5678c14e1ffebf2a7d780b6
Author: Yabin Cui <yabinc@google.com>
Date: Wed May 15 12:36:09 2024 -0700
perf/core: Check sample_type in perf_sample_save_brstack
Check sample_type in perf_sample_save_brstack() to prevent
saving branch stack data when it isn't required.
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Yabin Cui <yabinc@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240515193610.2350456-4-yabinc@google.com
Please consider rebasing the patch to fix the conflict in the next spin.
Thanks,
Leo
[1] git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
* Re: [PATCH v2 5/5] perf: Record sample last_period before updating
2025-01-06 12:01 ` [PATCH v2 5/5] perf: Record sample last_period before updating mark.barnett
2025-01-21 17:22 ` Leo Yan
@ 2025-01-22 5:03 ` kernel test robot
1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-01-22 5:03 UTC (permalink / raw)
To: mark.barnett
Cc: oe-lkp, lkp, linuxppc-dev, linux-perf-users, linux-kernel, peterz,
mingo, acme, namhyung, irogers, ben.gainey, deepak.surti, ak,
will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-arm-kernel, Mark Barnett, oliver.sang
Hello,
kernel test robot noticed "BUG:KASAN:null-ptr-deref_in_handle_pmi_common" on:
commit: b16c01fbc96460a72789c04e0e2a8f8437eab05b ("[PATCH v2 5/5] perf: Record sample last_period before updating")
url: https://github.com/intel-lab-lkp/linux/commits/mark-barnett-arm-com/perf-Allow-periodic-events-to-alternate-between-two-sample-periods/20250106-203820
base: https://git.kernel.org/cgit/linux/kernel/git/perf/perf-tools-next.git perf-tools-next
patch link: https://lore.kernel.org/all/20250106120156.227273-6-mark.barnett@arm.com/
patch subject: [PATCH v2 5/5] perf: Record sample last_period before updating
in testcase: kernel-selftests-bpf
version:
with following parameters:
group: bpf
config: x86_64-rhel-9.4-bpf
compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202501221114.c06f7c72-lkp@intel.com
[ 1693.204121][ C3] ==================================================================
[ 1693.204127][ C3] BUG: KASAN: null-ptr-deref in handle_pmi_common+0x218/0x630
[ 1693.204138][ C3] Read of size 8 at addr 0000000000000200 by task (udev-worker)/62767
[ 1693.204143][ C3]
[ 1693.204146][ C3] CPU: 3 UID: 0 PID: 62767 Comm: (udev-worker) Tainted: G S OE 6.13.0-rc2-00267-gb16c01fbc964 #1
[ 1693.204156][ C3] Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 1693.204159][ C3] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 1693.204163][ C3] Call Trace:
[ 1693.204166][ C3] <NMI>
[ 1693.204169][ C3] dump_stack_lvl+0x62/0x90
[ 1693.204178][ C3] kasan_report+0xb9/0xf0
[ 1693.204189][ C3] ? handle_pmi_common+0x218/0x630
[ 1693.204202][ C3] handle_pmi_common+0x218/0x630
[ 1693.204216][ C3] ? __pfx_handle_pmi_common+0x10/0x10
[ 1693.204239][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.204245][ C3] ? trace_lock_acquire+0x118/0x150
[ 1693.204255][ C3] ? intel_bts_interrupt+0xcc/0x270
[ 1693.204264][ C3] ? __pfx_intel_bts_interrupt+0x10/0x10
[ 1693.204279][ C3] intel_pmu_handle_irq+0x152/0x320
[ 1693.204290][ C3] perf_event_nmi_handler+0x37/0x60
[ 1693.204299][ C3] nmi_handle+0xb2/0x240
[ 1693.204311][ C3] default_do_nmi+0x45/0x110
[ 1693.204321][ C3] exc_nmi+0x100/0x190
[ 1693.204329][ C3] end_repeat_nmi+0xf/0x53
[ 1693.204335][ C3] RIP: 0010:kasan_check_range+0x38/0x1b0
[ 1693.204344][ C3] Code: 44 0f b6 c2 48 01 f0 55 53 0f 82 d7 00 00 00 eb 0f cc cc cc 48 b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff <48> 39 c7 0f 82 b3 00 00 00 4c 8d 54 37 ff 48 89 fd 48 b8 00 00 00
[ 1693.204349][ C3] RSP: 0018:ffff8882c850f9b8 EFLAGS: 00000086
[ 1693.204355][ C3] RAX: ffff800000000000 RBX: 000000000000001b RCX: ffffffff812458aa
[ 1693.204359][ C3] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff86ee1140
[ 1693.204363][ C3] RBP: 0000000000000009 R08: 0000000000000000 R09: fffffbfff0ddc234
[ 1693.204367][ C3] R10: ffffffff86ee11a7 R11: 0000000000000000 R12: ffff888107450f08
[ 1693.204371][ C3] R13: 0000000000000200 R14: ffff888107450000 R15: ffff888107450f28
[ 1693.204381][ C3] ? mark_lock+0x6a/0x530
[ 1693.204393][ C3] ? kasan_check_range+0x38/0x1b0
[ 1693.204403][ C3] ? kasan_check_range+0x38/0x1b0
[ 1693.204413][ C3] </NMI>
[ 1693.204415][ C3] <TASK>
[ 1693.204419][ C3] mark_lock+0x6a/0x530
[ 1693.204430][ C3] mark_usage+0xbb/0x1a0
[ 1693.204439][ C3] __lock_acquire+0x50e/0xf90
[ 1693.204451][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.204459][ C3] lock_acquire+0x123/0x2e0
[ 1693.204468][ C3] ? bpf_trace_run2+0x115/0x310
[ 1693.204479][ C3] ? __pfx_lock_acquire+0x10/0x10
[ 1693.204491][ C3] ? lock_acquire+0x123/0x2e0
[ 1693.204499][ C3] ? __might_fault+0x74/0xc0
[ 1693.204509][ C3] ? find_held_lock+0x83/0xa0
[ 1693.204519][ C3] bpf_trace_run2+0x129/0x310
[ 1693.204526][ C3] ? bpf_trace_run2+0x115/0x310
[ 1693.204534][ C3] ? __pfx_bpf_trace_run2+0x10/0x10
[ 1693.204541][ C3] ? lock_is_held_type+0x9a/0x110
[ 1693.204551][ C3] ? __might_fault+0x74/0xc0
[ 1693.204562][ C3] __bpf_trace_sys_enter+0x33/0x60
[ 1693.204570][ C3] syscall_trace_enter+0x1b8/0x260
[ 1693.204579][ C3] do_syscall_64+0x139/0x170
[ 1693.204585][ C3] ? __pfx___lock_release+0x10/0x10
[ 1693.204600][ C3] ? __might_fault+0x74/0xc0
[ 1693.204609][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.204615][ C3] ? trace_rseq_update+0xb9/0xf0
[ 1693.204624][ C3] ? __rseq_handle_notify_resume+0x321/0x3a0
[ 1693.204632][ C3] ? do_epoll_wait+0xd1/0xf0
[ 1693.204642][ C3] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 1693.204652][ C3] ? __might_fault+0x74/0xc0
[ 1693.204661][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.204667][ C3] ? mark_held_locks+0x24/0x90
[ 1693.204677][ C3] ? lockdep_hardirqs_on_prepare+0x131/0x200
[ 1693.204687][ C3] ? syscall_exit_to_user_mode+0xa2/0x2a0
[ 1693.204694][ C3] ? do_syscall_64+0x98/0x170
[ 1693.204699][ C3] ? mark_held_locks+0x24/0x90
[ 1693.204709][ C3] ? lockdep_hardirqs_on_prepare+0x131/0x200
[ 1693.204718][ C3] ? syscall_exit_to_user_mode+0xa2/0x2a0
[ 1693.204723][ C3] ? do_syscall_64+0x98/0x170
[ 1693.204726][ C3] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1693.204731][ C3] RIP: 0033:0x7fcc237cb899
[ 1693.204734][ C3] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 51 fd 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
[ 1693.204737][ C3] RSP: 002b:00007ffeb82d1788 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[ 1693.204741][ C3] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fcc237cb899
[ 1693.204743][ C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d
[ 1693.204745][ C3] RBP: 00007ffeb82d1970 R08: 0000000000000000 R09: 0000000000000000
[ 1693.204747][ C3] R10: 0000000000000022 R11: 0000000000000246 R12: 00007ffeb82d1980
[ 1693.204749][ C3] R13: 00007ffeb82d19c8 R14: 000055d5d68d8a50 R15: 0000000000000000
[ 1693.204755][ C3] </TASK>
[ 1693.204757][ C3] ==================================================================
[ 1693.204758][ C3] Disabling lock debugging due to kernel taint
[ 1693.204761][ C3] BUG: kernel NULL pointer dereference, address: 0000000000000200
[ 1693.204762][ C3] #PF: supervisor read access in kernel mode
[ 1693.204764][ C3] #PF: error_code(0x0000) - not-present page
[ 1693.204766][ C3] PGD 0 P4D 0
[ 1693.204769][ C3] Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
[ 1693.204772][ C3] CPU: 3 UID: 0 PID: 62767 Comm: (udev-worker) Tainted: G S B OE 6.13.0-rc2-00267-gb16c01fbc964 #1
[ 1693.204778][ C3] Tainted: [S]=CPU_OUT_OF_SPEC, [B]=BAD_PAGE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 1693.204779][ C3] Hardware name: Dell Inc. OptiPlex 7050/062KRH, BIOS 1.2.0 12/22/2016
[ 1693.204781][ C3] RIP: 0010:handle_pmi_common+0x222/0x630
[ 1693.204785][ C3] Code: 74 24 48 41 83 c5 01 4b 8d 3c f4 e8 e8 02 6a 00 4f 8b 3c f4 49 8d bf 00 02 00 00 e8 d8 02 6a 00 48 8b 54 24 40 be 08 00 00 00 <49> 8b 87 00 02 00 00 48 89 44 24 38 4c 89 f0 48 c1 e8 06 48 8d 3c
[ 1693.204788][ C3] RSP: 0018:fffffe00000e6b80 EFLAGS: 00010086
[ 1693.204790][ C3] RAX: 0000000000000001 RBX: 1fffffc00001cd7c RCX: ffffffff81144e56
[ 1693.204792][ C3] RDX: ffff8887337a9e80 RSI: 0000000000000008 RDI: ffffffff867c2f80
[ 1693.204794][ C3] RBP: fffffe00000e6df0 R08: 0000000000000001 R09: fffffbfff0cf85f0
[ 1693.204796][ C3] R10: ffffffff867c2f87 R11: 0000000000000001 R12: ffff8887337a9c80
[ 1693.204798][ C3] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 1693.204800][ C3] FS: 00007fcc230b18c0(0000) GS:ffff888733780000(0000) knlGS:0000000000000000
[ 1693.204803][ C3] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1693.204805][ C3] CR2: 0000000000000200 CR3: 000000034bbd6006 CR4: 00000000003726f0
[ 1693.204807][ C3] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1693.204808][ C3] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1693.204810][ C3] Call Trace:
[ 1693.204811][ C3] <NMI>
[ 1693.204813][ C3] ? __die+0x1f/0x60
[ 1693.204817][ C3] ? page_fault_oops+0x8d/0xc0
[ 1693.204822][ C3] ? exc_page_fault+0x57/0xe0
[ 1693.204827][ C3] ? asm_exc_page_fault+0x22/0x30
[ 1693.204834][ C3] ? add_taint+0x26/0x90
[ 1693.204842][ C3] ? handle_pmi_common+0x222/0x630
[ 1693.204856][ C3] ? __pfx_handle_pmi_common+0x10/0x10
[ 1693.204879][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.204885][ C3] ? trace_lock_acquire+0x118/0x150
[ 1693.204894][ C3] ? intel_bts_interrupt+0xcc/0x270
[ 1693.204904][ C3] ? __pfx_intel_bts_interrupt+0x10/0x10
[ 1693.204918][ C3] intel_pmu_handle_irq+0x152/0x320
[ 1693.204928][ C3] perf_event_nmi_handler+0x37/0x60
[ 1693.204936][ C3] nmi_handle+0xb2/0x240
[ 1693.204947][ C3] default_do_nmi+0x45/0x110
[ 1693.204955][ C3] exc_nmi+0x100/0x190
[ 1693.204964][ C3] end_repeat_nmi+0xf/0x53
[ 1693.204969][ C3] RIP: 0010:kasan_check_range+0x38/0x1b0
[ 1693.204977][ C3] Code: 44 0f b6 c2 48 01 f0 55 53 0f 82 d7 00 00 00 eb 0f cc cc cc 48 b8 00 00 00 00 00 00 00 ff eb 0a 48 b8 00 00 00 00 00 80 ff ff <48> 39 c7 0f 82 b3 00 00 00 4c 8d 54 37 ff 48 89 fd 48 b8 00 00 00
[ 1693.204982][ C3] RSP: 0018:ffff8882c850f9b8 EFLAGS: 00000086
[ 1693.204987][ C3] RAX: ffff800000000000 RBX: 000000000000001b RCX: ffffffff812458aa
[ 1693.204991][ C3] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff86ee1140
[ 1693.204995][ C3] RBP: 0000000000000009 R08: 0000000000000000 R09: fffffbfff0ddc234
[ 1693.204999][ C3] R10: ffffffff86ee11a7 R11: 0000000000000000 R12: ffff888107450f08
[ 1693.205003][ C3] R13: 0000000000000200 R14: ffff888107450000 R15: ffff888107450f28
[ 1693.205012][ C3] ? mark_lock+0x6a/0x530
[ 1693.205023][ C3] ? kasan_check_range+0x38/0x1b0
[ 1693.205034][ C3] ? kasan_check_range+0x38/0x1b0
[ 1693.205044][ C3] </NMI>
[ 1693.205046][ C3] <TASK>
[ 1693.205049][ C3] mark_lock+0x6a/0x530
[ 1693.205060][ C3] mark_usage+0xbb/0x1a0
[ 1693.205069][ C3] __lock_acquire+0x50e/0xf90
[ 1693.205081][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.205090][ C3] lock_acquire+0x123/0x2e0
[ 1693.205098][ C3] ? bpf_trace_run2+0x115/0x310
[ 1693.205108][ C3] ? __pfx_lock_acquire+0x10/0x10
[ 1693.205120][ C3] ? lock_acquire+0x123/0x2e0
[ 1693.205128][ C3] ? __might_fault+0x74/0xc0
[ 1693.205137][ C3] ? find_held_lock+0x83/0xa0
[ 1693.205147][ C3] bpf_trace_run2+0x129/0x310
[ 1693.205154][ C3] ? bpf_trace_run2+0x115/0x310
[ 1693.205161][ C3] ? __pfx_bpf_trace_run2+0x10/0x10
[ 1693.205168][ C3] ? lock_is_held_type+0x9a/0x110
[ 1693.205177][ C3] ? __might_fault+0x74/0xc0
[ 1693.205189][ C3] __bpf_trace_sys_enter+0x33/0x60
[ 1693.205196][ C3] syscall_trace_enter+0x1b8/0x260
[ 1693.205205][ C3] do_syscall_64+0x139/0x170
[ 1693.205211][ C3] ? __pfx___lock_release+0x10/0x10
[ 1693.205225][ C3] ? __might_fault+0x74/0xc0
[ 1693.205234][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.205240][ C3] ? trace_rseq_update+0xb9/0xf0
[ 1693.205248][ C3] ? __rseq_handle_notify_resume+0x321/0x3a0
[ 1693.205255][ C3] ? do_epoll_wait+0xd1/0xf0
[ 1693.205264][ C3] ? __pfx___rseq_handle_notify_resume+0x10/0x10
[ 1693.205273][ C3] ? __might_fault+0x74/0xc0
[ 1693.205281][ C3] ? rcu_is_watching+0x1c/0x50
[ 1693.205287][ C3] ? mark_held_locks+0x24/0x90
[ 1693.205297][ C3] ? lockdep_hardirqs_on_prepare+0x131/0x200
[ 1693.205306][ C3] ? syscall_exit_to_user_mode+0xa2/0x2a0
[ 1693.205313][ C3] ? do_syscall_64+0x98/0x170
[ 1693.205318][ C3] ? mark_held_locks+0x24/0x90
[ 1693.205328][ C3] ? lockdep_hardirqs_on_prepare+0x131/0x200
[ 1693.205336][ C3] ? syscall_exit_to_user_mode+0xa2/0x2a0
[ 1693.205343][ C3] ? do_syscall_64+0x98/0x170
[ 1693.205350][ C3] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1693.205359][ C3] RIP: 0033:0x7fcc237cb899
[ 1693.205363][ C3] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 51 fd 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
[ 1693.205369][ C3] RSP: 002b:00007ffeb82d1788 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[ 1693.205374][ C3] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007fcc237cb899
[ 1693.205379][ C3] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d
[ 1693.205382][ C3] RBP: 00007ffeb82d1970 R08: 0000000000000000 R09: 0000000000000000
[ 1693.205386][ C3] R10: 0000000000000022 R11: 0000000000000246 R12: 00007ffeb82d1980
[ 1693.205389][ C3] R13: 00007ffeb82d19c8 R14: 000055d5d68d8a50 R15: 0000000000000000
[ 1693.205402][ C3] </TASK>
[ 1693.205404][ C3] Modules linked in: cls_matchall tls sch_fq 8021q garp mrp stp llc dummy tun ipip tunnel4 ip_tunnel iptable_raw xt_connmark bpf_testmod(OE) veth cls_bpf sch_ingress rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver openvswitch nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample snd_hda_codec_hdmi snd_ctl_led intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component btrfs blake2b_generic x86_pkg_temp_thermal xor zstd_compress intel_powerclamp raid6_pq libcrc32c snd_soc_avs snd_soc_hda_codec snd_hda_ext_core coretemp i915 snd_soc_core sd_mod snd_compress cec kvm_intel sg snd_hda_intel drm_buddy snd_intel_dspcfg ttm snd_intel_sdw_acpi kvm snd_hda_codec crct10dif_pclmul drm_display_helper crc32_pclmul crc32c_intel snd_hda_core ghash_clmulni_intel dell_pc dell_wmi drm_kms_helper snd_hwdep mei_wdt i2c_designware_platform rapl snd_pcm intel_gtt ipmi_devintf platform_profile i2c_designware_core
[ 1693.205551][ C3] intel_cstate snd_timer dell_wmi_aio agpgart ahci dell_smbios ipmi_msghandler wmi_bmof dell_wmi_descriptor sparse_keymap dcdbas libahci mei_me snd video i2c_i801 pcspkr intel_uncore intel_lpss_pci libata intel_lpss mei i2c_smbus soundcore idma64 intel_pmc_core intel_vsec pmt_telemetry wmi pinctrl_sunrisepoint pmt_class acpi_pad binfmt_misc drm dm_mod ip_tables x_tables sch_fq_codel [last unloaded: bpf_test_no_cfi(OE)]
[ 1693.205635][ C3] CR2: 0000000000000200
[ 1693.205638][ C3] ---[ end trace 0000000000000000 ]---
[ 1693.205641][ C3] RIP: 0010:handle_pmi_common+0x222/0x630
[ 1693.205648][ C3] Code: 74 24 48 41 83 c5 01 4b 8d 3c f4 e8 e8 02 6a 00 4f 8b 3c f4 49 8d bf 00 02 00 00 e8 d8 02 6a 00 48 8b 54 24 40 be 08 00 00 00 <49> 8b 87 00 02 00 00 48 89 44 24 38 4c 89 f0 48 c1 e8 06 48 8d 3c
[ 1693.205653][ C3] RSP: 0018:fffffe00000e6b80 EFLAGS: 00010086
[ 1693.205658][ C3] RAX: 0000000000000001 RBX: 1fffffc00001cd7c RCX: ffffffff81144e56
[ 1693.205662][ C3] RDX: ffff8887337a9e80 RSI: 0000000000000008 RDI: ffffffff867c2f80
[ 1693.205666][ C3] RBP: fffffe00000e6df0 R08: 0000000000000001 R09: fffffbfff0cf85f0
[ 1693.205670][ C3] R10: ffffffff867c2f87 R11: 0000000000000001 R12: ffff8887337a9c80
[ 1693.205673][ C3] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 1693.205677][ C3] FS: 00007fcc230b18c0(0000) GS:ffff888733780000(0000) knlGS:0000000000000000
[ 1693.205681][ C3] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1693.205685][ C3] CR2: 0000000000000200 CR3: 000000034bbd6006 CR4: 00000000003726f0
[ 1693.205688][ C3] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1693.205691][ C3] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1693.205695][ C3] Kernel panic - not syncing: Fatal exception in interrupt
[ 1693.205723][ C3] Kernel Offset: disabled
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250122/202501221114.c06f7c72-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 0/5] A mechanism for efficient support for per-function metrics
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
` (4 preceding siblings ...)
2025-01-06 12:01 ` [PATCH v2 5/5] perf: Record sample last_period before updating mark.barnett
@ 2025-01-22 16:47 ` James Clark
2025-02-07 19:23 ` Mark Barnett
5 siblings, 1 reply; 15+ messages in thread
From: James Clark @ 2025-01-22 16:47 UTC (permalink / raw)
To: mark.barnett
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, peterz, mingo, acme, namhyung,
irogers
On 06/01/2025 12:01 pm, mark.barnett@arm.com wrote:
> From: Mark Barnett <mark.barnett@arm.com>
>
> This patch introduces the concept of an alternating sample rate to perf
> core and provides the necessary basic changes in the tools to activate
> that option.
>
[...]
> Changes since v1:
> - Rebased onto perf-tools-next, as per request from Ian Rogers.
> - Removed unnecessary code that truncated period_left to 0 and restarted
> the PMU.
> - Renamed variables to use the shorter 'alt_period' instead of
> 'alternative_period'.
> - Added patch #5 that addresses an issue in the x86 and PowerPC drivers that
> caused the opposite period to be reported in the sample record.
>
It sounds like it would be better if patch 5 came first; otherwise the
feature is introduced as broken.
* Re: [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
2025-01-21 13:01 ` Leo Yan
@ 2025-01-31 18:44 ` Rob Herring
2025-02-07 19:18 ` Mark Barnett
1 sibling, 1 reply; 15+ messages in thread
From: Rob Herring @ 2025-01-31 18:44 UTC (permalink / raw)
To: mark.barnett
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On Mon, Jan 6, 2025 at 6:12 AM <mark.barnett@arm.com> wrote:
>
> From: Ben Gainey <ben.gainey@arm.com>
>
> This change modifies perf_event_attr to add a second, alternative
> sample period field, and modifies the core perf overflow handling
> such that when specified an event will alternate between two sample
> periods.
>
> Currently, perf does not provide a mechanism for decoupling the period
> over which counters are counted from the period between samples. This is
> problematic for building a tool to measure per-function metrics derived
> from a sampled counter group. Ideally such a tool wants a very small
> sample window in order to correctly attribute the metrics to a given
> function, but prefers a larger sample period that provides representative
> coverage without excessive probe effect, triggering throttling, or
> generating excessive amounts of data.
>
> By alternating between a long and short sample_period and subsequently
> discarding the long samples, tools may decouple the period between
> samples that the tool cares about from the window of time over which
> interesting counts are collected.
>
> It is expected that typically tools would use this feature with the
> cycles or instructions events as an approximation for time, but no
> restrictions are applied to which events this can be applied to.
>
> Signed-off-by: Ben Gainey <ben.gainey@arm.com>
> Signed-off-by: Mark Barnett <mark.barnett@arm.com>
> ---
> include/linux/perf_event.h | 5 +++++
> include/uapi/linux/perf_event.h | 3 +++
> kernel/events/core.c | 37 ++++++++++++++++++++++++++++++++-
> 3 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index cb99ec8c9e96..cbb332f4e19c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -276,6 +276,11 @@ struct hw_perf_event {
> */
> u64 freq_time_stamp;
> u64 freq_count_stamp;
> +
> + /*
> + * Indicates that the alternative sample period is used
> + */
> + bool using_alt_sample_period;
8 bytes more for a single bit of data. I think we can avoid it. More below.
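One hedged alternative for keeping the state without the padded bool,
separate from the suggestion below, would be a single bit (illustrative
only; the real hw_perf_event layout and padding may differ):

	/* a 1-bit flag avoids growing the struct by a padded 8-byte slot */
	unsigned int using_alt_sample_period : 1;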
> #endif
> };
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 0524d541d4e3..499a8673df8e 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -379,6 +379,7 @@ enum perf_event_read_format {
> #define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
> #define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
> #define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
> +#define PERF_ATTR_SIZE_VER9 144 /* add: alt_sample_period */
>
> /*
> * Hardware event_id to monitor via a performance monitoring event:
> @@ -531,6 +532,8 @@ struct perf_event_attr {
> __u64 sig_data;
>
> __u64 config3; /* extension of config2 */
> +
> + __u64 alt_sample_period;
> };
>
> /*
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 065f9188b44a..7e339d12363a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4178,6 +4178,8 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
> s64 period, sample_period;
> s64 delta;
>
> + WARN_ON_ONCE(hwc->using_alt_sample_period);
> +
> period = perf_calculate_period(event, nsec, count);
>
> delta = (s64)(period - hwc->sample_period);
> @@ -9850,6 +9852,7 @@ static int __perf_event_overflow(struct perf_event *event,
> int throttle, struct perf_sample_data *data,
> struct pt_regs *regs)
> {
> + struct hw_perf_event *hwc = &event->hw;
> int events = atomic_read(&event->event_limit);
> int ret = 0;
>
> @@ -9869,6 +9872,18 @@ static int __perf_event_overflow(struct perf_event *event,
> !bpf_overflow_handler(event, data, regs))
> goto out;
>
> + /*
> + * Swap the sample period to the alternative period
> + */
> + if (event->attr.alt_sample_period) {
> + bool using_alt = hwc->using_alt_sample_period;
> + u64 sample_period = (using_alt ? event->attr.sample_period
> + : event->attr.alt_sample_period);
> +
> + hwc->sample_period = sample_period;
> + hwc->using_alt_sample_period = !using_alt;
> + }
Wouldn't something like this avoid the need for using_alt_sample_period:
if (event->attr.alt_sample_period) {
if (hwc->sample_period == event->attr.sample_period)
hwc->sample_period = event->attr.alt_sample_period;
else
hwc->sample_period = event->attr.sample_period;
}
Rob
* Re: [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-01-31 18:44 ` Rob Herring
@ 2025-02-07 19:18 ` Mark Barnett
0 siblings, 0 replies; 15+ messages in thread
From: Mark Barnett @ 2025-02-07 19:18 UTC (permalink / raw)
To: Rob Herring
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On 1/31/25 18:44, Rob Herring wrote:
> On Mon, Jan 6, 2025 at 6:12 AM <mark.barnett@arm.com> wrote:
>>
>> From: Ben Gainey <ben.gainey@arm.com>
>>
>> This change modifies perf_event_attr to add a second, alternative
>> sample period field, and modifies the core perf overflow handling
>> such that when specified an event will alternate between two sample
>> periods.
>>
>> Currently, perf does not provide a mechanism for decoupling the period
>> over which counters are counted from the period between samples. This is
>> problematic for building a tool to measure per-function metrics derived
>> from a sampled counter group. Ideally such a tool wants a very small
>> sample window in order to correctly attribute the metrics to a given
>> function, but prefers a larger sample period that provides representative
>> coverage without excessive probe effect, triggering throttling, or
>> generating excessive amounts of data.
>>
>> By alternating between a long and short sample_period and subsequently
>> discarding the long samples, tools may decouple the period between
>> samples that the tool cares about from the window of time over which
>> interesting counts are collected.
>>
>> It is expected that typically tools would use this feature with the
>> cycles or instructions events as an approximation for time, but no
>> restrictions are applied to which events this can be applied to.
>>
>> Signed-off-by: Ben Gainey <ben.gainey@arm.com>
>> Signed-off-by: Mark Barnett <mark.barnett@arm.com>
>> ---
>> include/linux/perf_event.h | 5 +++++
>> include/uapi/linux/perf_event.h | 3 +++
>> kernel/events/core.c | 37 ++++++++++++++++++++++++++++++++-
>> 3 files changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index cb99ec8c9e96..cbb332f4e19c 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -276,6 +276,11 @@ struct hw_perf_event {
>> */
>> u64 freq_time_stamp;
>> u64 freq_count_stamp;
>> +
>> + /*
>> + * Indicates that the alternative sample period is used
>> + */
>> + bool using_alt_sample_period;
>
> 8 bytes more for a single bit of data. I think we can avoid it. More below.
>
>> #endif
>> };
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index 0524d541d4e3..499a8673df8e 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -379,6 +379,7 @@ enum perf_event_read_format {
>> #define PERF_ATTR_SIZE_VER6 120 /* add: aux_sample_size */
>> #define PERF_ATTR_SIZE_VER7 128 /* add: sig_data */
>> #define PERF_ATTR_SIZE_VER8 136 /* add: config3 */
>> +#define PERF_ATTR_SIZE_VER9 144 /* add: alt_sample_period */
>>
>> /*
>> * Hardware event_id to monitor via a performance monitoring event:
>> @@ -531,6 +532,8 @@ struct perf_event_attr {
>> __u64 sig_data;
>>
>> __u64 config3; /* extension of config2 */
>> +
>> + __u64 alt_sample_period;
>> };
>>
>> /*
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 065f9188b44a..7e339d12363a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -4178,6 +4178,8 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
>> s64 period, sample_period;
>> s64 delta;
>>
>> + WARN_ON_ONCE(hwc->using_alt_sample_period);
>> +
>> period = perf_calculate_period(event, nsec, count);
>>
>> delta = (s64)(period - hwc->sample_period);
>> @@ -9850,6 +9852,7 @@ static int __perf_event_overflow(struct perf_event *event,
>> int throttle, struct perf_sample_data *data,
>> struct pt_regs *regs)
>> {
>> + struct hw_perf_event *hwc = &event->hw;
>> int events = atomic_read(&event->event_limit);
>> int ret = 0;
>>
>> @@ -9869,6 +9872,18 @@ static int __perf_event_overflow(struct perf_event *event,
>> !bpf_overflow_handler(event, data, regs))
>> goto out;
>>
>> + /*
>> + * Swap the sample period to the alternative period
>> + */
>> + if (event->attr.alt_sample_period) {
>> + bool using_alt = hwc->using_alt_sample_period;
>> + u64 sample_period = (using_alt ? event->attr.sample_period
>> + : event->attr.alt_sample_period);
>> +
>> + hwc->sample_period = sample_period;
>> + hwc->using_alt_sample_period = !using_alt;
>> + }
>
> Wouldn't something like this avoid the need for using_alt_sample_period:
>
> if (event->attr.alt_sample_period) {
> if (hwc->sample_period == event->attr.sample_period)
> hwc->sample_period = event->attr.alt_sample_period;
> else
> hwc->sample_period = event->attr.sample_period;
> }
>
> Rob
Hi Rob,
Thanks for looking it over. That would work for this patch, but the
second patch in this series adds a variable jitter to the sample
periods, so 'hwc->sample_period' would no longer be directly comparable
to 'attr.sample_period'.
Mark
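For illustration, a hedged sketch of why that comparison breaks once
the jitter patch is applied; the helper call and variable names here
are assumptions for the sketch, not the actual patch 2 code:

	/* patch 2 (paraphrased): perturb each programmed period by a jitter */
	u64 jitter = get_random_u32_below(jitter_range);  /* assumed helper */
	hwc->sample_period = base_period + jitter;

	/*
	 * With jitter applied, hwc->sample_period matches neither
	 * attr.sample_period nor attr.alt_sample_period exactly, so the
	 * active period cannot be inferred by comparison; an explicit
	 * flag such as using_alt_sample_period is still needed.
	 */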
* Re: [PATCH v2 0/5] A mechanism for efficient support for per-function metrics
2025-01-22 16:47 ` [PATCH v2 0/5] A mechanism for efficient support for per-function metrics James Clark
@ 2025-02-07 19:23 ` Mark Barnett
0 siblings, 0 replies; 15+ messages in thread
From: Mark Barnett @ 2025-02-07 19:23 UTC (permalink / raw)
To: James Clark
Cc: ben.gainey, deepak.surti, ak, will, james.clark, mark.rutland,
alexander.shishkin, jolsa, adrian.hunter, linux-perf-users,
linux-kernel, linux-arm-kernel, peterz, mingo, acme, namhyung,
irogers
On 1/22/25 16:47, James Clark wrote:
>
>
> On 06/01/2025 12:01 pm, mark.barnett@arm.com wrote:
>> From: Mark Barnett <mark.barnett@arm.com>
>>
>> This patch introduces the concept of an alternating sample rate to perf
>> core and provides the necessary basic changes in the tools to activate
>> that option.
>>
>
> [...]
>
>> Changes since v1:
>> - Rebased onto perf-tools-next, as per request from Ian Rogers.
>> - Removed unnecessary code that truncated period_left to 0 and
>> restarted
>> the PMU.
>> - Renamed variables to use the shorter 'alt_period' instead of
>> 'alternative_period'.
>> - Added patch #5 that addresses an issue in the x86 and PowerPC
>> drivers that
>> caused the opposite period to be reported in the sample record.
>
> It sounds like it would be better if patch 5 came first; otherwise the
> feature is introduced as broken.
>
>
Thanks, James. I'll re-order them with the next submission.
Mark
* Re: [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-01-21 13:01 ` Leo Yan
@ 2025-03-07 20:28 ` Mark Barnett
2025-03-10 10:55 ` Leo Yan
0 siblings, 1 reply; 15+ messages in thread
From: Mark Barnett @ 2025-03-07 20:28 UTC (permalink / raw)
To: Leo Yan
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On 1/21/25 13:01, Leo Yan wrote:
>> local64_set(&hwc->period_left, hwc->sample_period);
>>
>> + if (attr->alt_sample_period) {
>> + hwc->sample_period = attr->alt_sample_period;
>> + hwc->using_alt_sample_period = true;
>> + }
>
> My understanding is that this sets a short sample window for the first
> period. Should `hwc->period_left` also be initialized with the updated
> sample period?
>
It sets the long period first: hwc->period_left is used to program the
PMU when setting up the event, and hwc->sample_period is queued up as
the next period to switch to.
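Put as a sketch, the initialization order from the quoted hunks is
(comments added for illustration):

	hwc->last_period = hwc->sample_period;
	/* period_left keeps the long period; it programs the PMU first */
	local64_set(&hwc->period_left, hwc->sample_period);

	if (attr->alt_sample_period) {
		/* the short window is queued as the next period to switch to */
		hwc->sample_period = attr->alt_sample_period;
		hwc->using_alt_sample_period = true;
	}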
>> +
>> + /*
>> + * alt_sample_period cannot be used with freq
>> + */
>> + if (attr->freq && attr->alt_sample_period)
>> + goto err_ns;
>> +
>
> It is good practice to validate parameters first, so move this check
> before the alt sample period adjustment.
>
Ack. Done.
>> /*
>> * We do not support PERF_SAMPLE_READ on inherited events unless
>> * PERF_SAMPLE_TID is also selected, which allows inherited events to
>> @@ -12763,9 +12788,19 @@ SYSCALL_DEFINE5(perf_event_open,
>> if (attr.freq) {
>> if (attr.sample_freq > sysctl_perf_event_sample_rate)
>> return -EINVAL;
>> + if (attr.alt_sample_period)
>> + return -EINVAL;
>> } else {
>> if (attr.sample_period & (1ULL << 63))
>> return -EINVAL;
>> + if (attr.alt_sample_period) {
>> + if (!attr.sample_period)
>> + return -EINVAL;
>> + if (attr.alt_sample_period & (1ULL << 63))
>> + return -EINVAL;
>> + if (attr.alt_sample_period == attr.sample_period)
>> + attr.alt_sample_period = 0;
>
> In theory, the attr.alt_sample_period should be less than
> attr.sample_period, right?
>
Added some validation for this.
* Re: [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods
2025-03-07 20:28 ` Mark Barnett
@ 2025-03-10 10:55 ` Leo Yan
0 siblings, 0 replies; 15+ messages in thread
From: Leo Yan @ 2025-03-10 10:55 UTC (permalink / raw)
To: Mark Barnett
Cc: peterz, mingo, acme, namhyung, irogers, ben.gainey, deepak.surti,
ak, will, james.clark, mark.rutland, alexander.shishkin, jolsa,
adrian.hunter, linux-perf-users, linux-kernel, linux-arm-kernel
On Fri, Mar 07, 2025 at 08:28:13PM +0000, Mark Barnett wrote:
> On 1/21/25 13:01, Leo Yan wrote:
> > > local64_set(&hwc->period_left, hwc->sample_period);
> > > + if (attr->alt_sample_period) {
> > > + hwc->sample_period = attr->alt_sample_period;
> > > + hwc->using_alt_sample_period = true;
> > > + }
> >
> > My understanding is that this sets a short sample window for the first
> > period. Should `hwc->period_left` also be initialized with the updated
> > sample period?
> >
>
> It sets the long period first: hwc->period_left is used to program the PMU
> when setting up the event, and hwc->sample_period is queued up as the next
> period to switch to.
Makes sense to me. Thanks for the explanation.
Leo
end of thread, other threads: [~2025-03-10 10:55 UTC | newest]
Thread overview: 15+ messages
2025-01-06 12:01 [PATCH v2 0/5] A mechanism for efficient support for per-function metrics mark.barnett
2025-01-06 12:01 ` [PATCH v2 1/5] perf: Allow periodic events to alternate between two sample periods mark.barnett
2025-01-21 13:01 ` Leo Yan
2025-03-07 20:28 ` Mark Barnett
2025-03-10 10:55 ` Leo Yan
2025-01-31 18:44 ` Rob Herring
2025-02-07 19:18 ` Mark Barnett
2025-01-06 12:01 ` [PATCH v2 2/5] perf: Allow adding fixed random jitter to the alternate sampling period mark.barnett
2025-01-06 12:01 ` [PATCH v2 3/5] tools/perf: Modify event parser to support alt-period term mark.barnett
2025-01-06 12:01 ` [PATCH v2 4/5] tools/perf: Modify event parser to support alt-period-jitter term mark.barnett
2025-01-06 12:01 ` [PATCH v2 5/5] perf: Record sample last_period before updating mark.barnett
2025-01-21 17:22 ` Leo Yan
2025-01-22 5:03 ` kernel test robot
2025-01-22 16:47 ` [PATCH v2 0/5] A mechanism for efficient support for per-function metrics James Clark
2025-02-07 19:23 ` Mark Barnett