* [PATCH] drivers/perf: riscv: do not restart throttled events after overflow
@ 2026-04-15 3:20 Zhanpeng Zhang
From: Zhanpeng Zhang @ 2026-04-15 3:20 UTC (permalink / raw)
To: atish.patra, anup, will, palmer, pjw, cuiyunhui
Cc: linux-riscv, linux-kernel, yuanzhu, Zhanpeng Zhang
Perf core uses `event->hw.interrupts == MAX_INTERRUPTS` to keep
throttled events stopped until it explicitly unthrottles them later.
However, the RISC-V PMU driver currently restarts all counters
unconditionally at the end of the overflow handler, which bypasses the
perf core's throttling mechanism. An unreasonably small sampling period
such as `perf top -c 20` can therefore cause an IRQ storm and
eventually lead to a soft lockup.
Fix this by filtering the counter start/restart mask: do not restart
counters whose events the perf core has already marked as throttled.
This preserves the throttling effect and prevents interrupt storms
under such workloads.
Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
---
drivers/perf/riscv_pmu_sbi.c | 45 ++++++++++++++++++++++++++++++++++--
1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
index 385af5e6e6d0..664f9b86c468 100644
--- a/drivers/perf/riscv_pmu_sbi.c
+++ b/drivers/perf/riscv_pmu_sbi.c
@@ -60,6 +60,33 @@ asm volatile(ALTERNATIVE( \
#define PERF_EVENT_FLAG_LEGACY BIT(SYSCTL_LEGACY)
PMU_FORMAT_ATTR(event, "config:0-55");
+static inline bool rvpmu_perf_event_is_throttled(struct perf_event *event)
+{
+ return event->hw.interrupts == MAX_INTERRUPTS;
+}
+
+/*
+ * Return a mask of counters that must not be restarted.
+ * `base` is the starting bit index to limit the mask to this long word.
+ */
+static inline unsigned long rvpmu_get_throttled_mask(struct perf_event **events,
+ unsigned long mask,
+ int base)
+{
+ unsigned long tmp = mask, throttled = 0;
+ int bit = -1;
+ int nr_bits = min_t(int, BITS_PER_LONG, RISCV_MAX_COUNTERS - base);
+
+ for_each_set_bit(bit, &tmp, nr_bits) {
+ struct perf_event *event = events[bit];
+
+ if (!event || rvpmu_perf_event_is_throttled(event))
+ throttled |= BIT(bit);
+ }
+
+ return throttled;
+}
+
PMU_FORMAT_ATTR(firmware, "config:62-63");
static bool sbi_v2_available;
@@ -1005,6 +1032,8 @@ static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_
for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
if (ctr_ovf_mask & BIT(idx)) {
event = cpu_hw_evt->events[idx];
+ if (!event || rvpmu_perf_event_is_throttled(event))
+ continue;
hwc = &event->hw;
max_period = riscv_pmu_ctr_get_width_mask(event);
init_val = local64_read(&hwc->prev_count) & max_period;
@@ -1017,13 +1046,25 @@ static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_
}
for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
+ int base = i * BITS_PER_LONG;
+ unsigned long throttled;
+ unsigned long used;
+ unsigned long start_mask;
+
+ used = cpu_hw_evt->used_hw_ctrs[i];
+ throttled = rvpmu_get_throttled_mask(cpu_hw_evt->events + base,
+ used, base);
+ start_mask = used & ~throttled;
+ if (!start_mask)
+ continue;
+
/* Restore the counter values to relative indices for used hw counters */
for_each_set_bit(idx, &cpu_hw_evt->used_hw_ctrs[i], BITS_PER_LONG)
sdata->ctr_values[idx] =
cpu_hw_evt->snapshot_cval_shcopy[idx + i * BITS_PER_LONG];
/* Start all the counters in a single shot */
- sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
- cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
+ sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG,
+ start_mask, flag, 0, 0, 0);
}
}
--
2.50.1 (Apple Git-155)
* Re: [PATCH] drivers/perf: riscv: do not restart throttled events after overflow
@ 2026-04-15 5:44 ` Michael Ellerman
From: Michael Ellerman @ 2026-04-15 5:44 UTC (permalink / raw)
To: Zhanpeng Zhang, atish.patra, anup, will, palmer, pjw, cuiyunhui
Cc: linux-riscv, linux-kernel, yuanzhu
On 15/4/2026 13:20, Zhanpeng Zhang wrote:
> Perf core uses `event->hw.interrupts == MAX_INTERRUPTS` to keep
> throttled events stopped until it explicitly unthrottles them later.
>
> However, the RISC-V PMU driver currently restarts all counters
> unconditionally at the end of the overflow handler, which bypasses the
> perf core's throttling mechanism. An unreasonably small sampling period
> such as `perf top -c 20` can therefore cause an IRQ storm and
> eventually lead to a soft lockup.
>
> Fix this by filtering the counter start/restart mask: do not restart
> counters whose events the perf core has already marked as throttled.
> This preserves the throttling effect and prevents interrupt storms
> under such workloads.
>
> Signed-off-by: Zhanpeng Zhang <zhangzhanpeng.jasper@bytedance.com>
> ---
> drivers/perf/riscv_pmu_sbi.c | 45 ++++++++++++++++++++++++++++++++++--
> 1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 385af5e6e6d0..664f9b86c468 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -60,6 +60,33 @@ asm volatile(ALTERNATIVE( \
> #define PERF_EVENT_FLAG_LEGACY BIT(SYSCTL_LEGACY)
>
> PMU_FORMAT_ATTR(event, "config:0-55");
> +static inline bool rvpmu_perf_event_is_throttled(struct perf_event *event)
> +{
> + return event->hw.interrupts == MAX_INTERRUPTS;
> +}
I don't see any other arch or driver code looking for hw.interrupts ==
MAX_INTERRUPTS.
The events/core.c code calls event->pmu->stop(event, 0) when an event is
throttled. I think the arch/driver code should be using that as the
signal that the event should no longer be running.
So something like filtering out stopped events (PERF_HES_STOPPED) seems
like it would be the right fix.
> +/*
> + * Return a mask of counters that must not be restarted.
> + * `base` is the starting bit index to limit the mask to this long word.
> + */
> +static inline unsigned long rvpmu_get_throttled_mask(struct perf_event **events,
> + unsigned long mask,
> + int base)
> +{
> + unsigned long tmp = mask, throttled = 0;
> + int bit = -1;
> + int nr_bits = min_t(int, BITS_PER_LONG, RISCV_MAX_COUNTERS - base);
> +
> + for_each_set_bit(bit, &tmp, nr_bits) {
> + struct perf_event *event = events[bit];
> +
> + if (!event || rvpmu_perf_event_is_throttled(event))
> + throttled |= BIT(bit);
> + }
> +
> + return throttled;
> +}
> +
> PMU_FORMAT_ATTR(firmware, "config:62-63");
>
> static bool sbi_v2_available;
> @@ -1005,6 +1032,8 @@ static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_
> for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
> if (ctr_ovf_mask & BIT(idx)) {
> event = cpu_hw_evt->events[idx];
> + if (!event || rvpmu_perf_event_is_throttled(event))
> + continue;
> hwc = &event->hw;
> max_period = riscv_pmu_ctr_get_width_mask(event);
> init_val = local64_read(&hwc->prev_count) & max_period;
> @@ -1017,13 +1046,25 @@ static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_
> }
>
> for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> + int base = i * BITS_PER_LONG;
> + unsigned long throttled;
> + unsigned long used;
> + unsigned long start_mask;
> +
> + used = cpu_hw_evt->used_hw_ctrs[i];
> + throttled = rvpmu_get_throttled_mask(cpu_hw_evt->events + base,
> + used, base);
> + start_mask = used & ~throttled;
> + if (!start_mask)
> + continue;
> +
> /* Restore the counter values to relative indices for used hw counters */
> for_each_set_bit(idx, &cpu_hw_evt->used_hw_ctrs[i], BITS_PER_LONG)
> sdata->ctr_values[idx] =
> cpu_hw_evt->snapshot_cval_shcopy[idx + i * BITS_PER_LONG];
> /* Start all the counters in a single shot */
> - sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
> - cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
> + sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG,
> + start_mask, flag, 0, 0, 0);
> }
> }
You only patch the snapshot path. Is the non-snapshot path unaffected?
If so, you should explain why.
Can you identify the commit where the bad behaviour was introduced, so
the fix can carry a Fixes: tag? Maybe it was when snapshot support was
added, if indeed it's only the snapshot path that is buggy.
cheers