Message-ID: <6f07e5eb-306a-4d29-840c-c29771cff9bd@kernel.org>
Date: Wed, 15 Apr 2026 15:44:16 +1000
Subject: Re: [PATCH] drivers/perf: riscv: do not restart throttled events after overflow
From: Michael Ellerman
To: Zhanpeng Zhang, atish.patra@linux.dev, anup@brainfault.org, will@kernel.org,
 palmer@dabbelt.com, pjw@kernel.org, cuiyunhui@bytedance.com
Cc: linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
 yuanzhu@bytedance.com
References: <20260415032017.10712-1-zhangzhanpeng.jasper@bytedance.com>
In-Reply-To: <20260415032017.10712-1-zhangzhanpeng.jasper@bytedance.com>

On 15/4/2026 13:20, Zhanpeng Zhang wrote:
> Perf core uses `event->hw.interrupts == MAX_INTERRUPTS` to keep
> throttled events stopped until it explicitly unthrottles them later.
>
> However, the current RISC-V perf/PMU driver unconditionally restarts
> all counters at the end of the overflow handler, which bypasses the
> perf core's throttle mechanism. Therefore, an unreasonably small
> sampling period such as `perf top -c 20` may cause an IRQ storm and
> eventually lead to a soft lockup.
>
> Fix this by filtering the counter start/restart mask: do not restart
> counters for events already marked as throttled by the perf core. This
> retains the throttle effect and prevents interrupt storms in such
> workloads.
>
> Signed-off-by: Zhanpeng Zhang
> ---
>  drivers/perf/riscv_pmu_sbi.c | 45 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/perf/riscv_pmu_sbi.c b/drivers/perf/riscv_pmu_sbi.c
> index 385af5e6e6d0..664f9b86c468 100644
> --- a/drivers/perf/riscv_pmu_sbi.c
> +++ b/drivers/perf/riscv_pmu_sbi.c
> @@ -60,6 +60,33 @@ asm volatile(ALTERNATIVE( \
>  #define PERF_EVENT_FLAG_LEGACY BIT(SYSCTL_LEGACY)
>  
>  PMU_FORMAT_ATTR(event, "config:0-55");
> +static inline bool rvpmu_perf_event_is_throttled(struct perf_event *event)
> +{
> +	return event->hw.interrupts == MAX_INTERRUPTS;
> +}

I don't see any other arch or driver code looking for hw.interrupts ==
MAX_INTERRUPTS.

The events/core.c code calls event->pmu->stop(event, 0) when an event is
throttled. I think the arch/driver code should be using that as the
signal that the event should no longer be running. So something like
filtering out stopped events (PERF_HES_STOPPED) seems like it would be
the right fix.

> +/*
> + * Return a mask of counters that must not be restarted.
> + * `base` is the starting bit index to limit the mask to this long word.
> + */
> +static inline unsigned long rvpmu_get_throttled_mask(struct perf_event **events,
> +						     unsigned long mask,
> +						     int base)
> +{
> +	unsigned long tmp = mask, throttled = 0;
> +	int bit = -1;
> +	int nr_bits = min_t(int, BITS_PER_LONG, RISCV_MAX_COUNTERS - base);
> +
> +	for_each_set_bit(bit, &tmp, nr_bits) {
> +		struct perf_event *event = events[bit];
> +
> +		if (!event || rvpmu_perf_event_is_throttled(event))
> +			throttled |= BIT(bit);
> +	}
> +
> +	return throttled;
> +}
> +
>  PMU_FORMAT_ATTR(firmware, "config:62-63");
>  
>  static bool sbi_v2_available;
> @@ -1005,6 +1032,8 @@ static inline void pmu_sbi_start_ovf_ctrs_snapshot(struct cpu_hw_events *cpu_hw_
>  	for_each_set_bit(idx, cpu_hw_evt->used_hw_ctrs, RISCV_MAX_COUNTERS) {
>  		if (ctr_ovf_mask & BIT(idx)) {
>  			event = cpu_hw_evt->events[idx];
> +			if (!event || rvpmu_perf_event_is_throttled(event))
> +				continue;
>  			hwc = &event->hw;
>  			max_period = riscv_pmu_ctr_get_width_mask(event);
>  			init_val = local64_read(&hwc->prev_count) & max_period;
> @@ -1017,13 +1046,25 @@
>  	}
>  
>  	for (i = 0; i < BITS_TO_LONGS(RISCV_MAX_COUNTERS); i++) {
> +		int base = i * BITS_PER_LONG;
> +		unsigned long throttled;
> +		unsigned long used;
> +		unsigned long start_mask;
> +
> +		used = cpu_hw_evt->used_hw_ctrs[i];
> +		throttled = rvpmu_get_throttled_mask(cpu_hw_evt->events + base,
> +						     used, base);
> +		start_mask = used & ~throttled;
> +		if (!start_mask)
> +			continue;
> +
>  		/* Restore the counter values to relative indices for used hw counters */
>  		for_each_set_bit(idx, &cpu_hw_evt->used_hw_ctrs[i], BITS_PER_LONG)
>  			sdata->ctr_values[idx] =
>  				cpu_hw_evt->snapshot_cval_shcopy[idx + i * BITS_PER_LONG];
>  		/* Start all the counters in a single shot */
> -		sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, idx * BITS_PER_LONG,
> -			  cpu_hw_evt->used_hw_ctrs[i], flag, 0, 0, 0);
> +		sbi_ecall(SBI_EXT_PMU, SBI_EXT_PMU_COUNTER_START, i * BITS_PER_LONG,
> +			  start_mask, flag, 0, 0, 0);
>  	}
>  }

You only patch the snapshot case. Is the non-snapshot path unaffected?
If so you should explain why.

Can you identify a commit where the bad behaviour was introduced? Maybe
when snapshot support was added, if indeed it's only the snapshot path
that is buggy.

cheers