[PATCH V3] perf: Fix the throttle error of some clock events

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH V3] perf: Fix the throttle error of some clock events
@ 2025-06-04 17:15 kan.liang
  2025-06-04 20:08 ` Ian Rogers
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: kan.liang @ 2025-06-04 17:15 UTC (permalink / raw)
  To: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users
  Cc: eranian, ctshao, tmricht, Kan Liang, Leo Yan, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote

From: Kan Liang <kan.liang@linux.intel.com>

Both ARM and IBM CI reports RCU stall, which can be reproduced by the
below perf command.
  perf record -a -e cpu-clock -- sleep 2

The issue is introduced by the generic throttle patch set, which
unconditionally invoke the event_stop() when throttle is triggered.

The cpu-clock and task-clock are two special SW events, which rely on
the hrtimer. The throttle is invoked in the hrtimer handler. The
event_stop()->hrtimer_cancel() waits for the handler to finish, which is
a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
be used to stop the timer.

There may be two ways to fix it.
- Introduce a PMU flag to track the case. Avoid the event_stop in
  perf_event_throttle() if the flag is detected.
  It has been implemented in the
  https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
  The new flag was thought to be an overkill for the issue.
- Add a check in the event_stop. Return immediately if the throttle is
  invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
  method to stop the timer.

The latter is implemented here.

Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
the order the same as perf_event_unthrottle(). Except the patch, no one
checks the hw.interrupts in the stop(). There is no impact from the
order change.

Reported-by: Leo Yan <leo.yan@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V2:
- Apply a different way to fix the issue.
  Remove all Tested-by since a different way is applied
- Update the change log
- Add more Reported-by

 kernel/events/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f34c99f8ce8f..46441c23475d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
 
 static void perf_event_throttle(struct perf_event *event)
 {
-	event->pmu->stop(event, 0);
 	event->hw.interrupts = MAX_INTERRUPTS;
+	event->pmu->stop(event, 0);
 	if (event == event->group_leader)
 		perf_log_throttle(event, 0);
 }
@@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (is_sampling_event(event)) {
+	/*
+	 * The throttle can be triggered in the hrtimer handler.
+	 * The HRTIMER_NORESTART should be used to stop the timer,
+	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 */
+	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-04 17:15 [PATCH V3] perf: Fix the throttle error of some clock events kan.liang
@ 2025-06-04 20:08 ` Ian Rogers
  2025-06-04 23:21 ` Alexei Starovoitov
  2025-06-05  6:39 ` Ingo Molnar
  2 siblings, 0 replies; 16+ messages in thread
From: Ian Rogers @ 2025-06-04 20:08 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, namhyung, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Leo Yan,
	Aishwarya TCV, Alexei Starovoitov, Venkat Rao Bagalkote

On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> below perf command.
>   perf record -a -e cpu-clock -- sleep 2
>
> The issue is introduced by the generic throttle patch set, which
> unconditionally invoke the event_stop() when throttle is triggered.
>
> The cpu-clock and task-clock are two special SW events, which rely on
> the hrtimer. The throttle is invoked in the hrtimer handler. The
> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> be used to stop the timer.
>
> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>   perf_event_throttle() if the flag is detected.
>   It has been implemented in the
>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>   The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>   method to stop the timer.
>
> The latter is implemented here.
>
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
>
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

Reviewed-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>
> Changes since V2:
> - Apply a different way to fix the issue.
>   Remove all Tested-by since a different way is applied
> - Update the change log
> - Add more Reported-by
>
>  kernel/events/core.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f34c99f8ce8f..46441c23475d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>
>  static void perf_event_throttle(struct perf_event *event)
>  {
> -       event->pmu->stop(event, 0);
>         event->hw.interrupts = MAX_INTERRUPTS;
> +       event->pmu->stop(event, 0);
>         if (event == event->group_leader)
>                 perf_log_throttle(event, 0);
>  }
> @@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  {
>         struct hw_perf_event *hwc = &event->hw;
>
> -       if (is_sampling_event(event)) {
> +       /*
> +        * The throttle can be triggered in the hrtimer handler.
> +        * The HRTIMER_NORESTART should be used to stop the timer,
> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> +        */
> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
>
> --
> 2.38.1
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-04 17:15 [PATCH V3] perf: Fix the throttle error of some clock events kan.liang
  2025-06-04 20:08 ` Ian Rogers
@ 2025-06-04 23:21 ` Alexei Starovoitov
  2025-06-05 13:46   ` Liang, Kan
  2025-06-05  6:39 ` Ingo Molnar
  2 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-04 23:21 UTC (permalink / raw)
  To: Kan Liang
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> below perf command.
>   perf record -a -e cpu-clock -- sleep 2
>
> The issue is introduced by the generic throttle patch set, which
> unconditionally invoke the event_stop() when throttle is triggered.
>
> The cpu-clock and task-clock are two special SW events, which rely on
> the hrtimer. The throttle is invoked in the hrtimer handler. The
> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> be used to stop the timer.
>
> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>   perf_event_throttle() if the flag is detected.
>   It has been implemented in the
>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>   The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>   method to stop the timer.
>
> The latter is implemented here.
>
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
>
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

It seems the patch fixes one issue and introduces another ?

Looks like the throttle event is sticky.
Once it's reached the perf_event no longer works ?
Repro:
test_progs -t perf_branches/perf_branches_no_hw
#250/2   perf_branches/perf_branches_no_hw:OK

test_progs -t stacktrace_build_id_nmi
#393     stacktrace_build_id_nmi:OK

test_progs -t perf_branches/perf_branches_no_hw
perf_branches/perf_branches_no_hw:FAIL

Maybe it's an unrelated bug.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-04 17:15 [PATCH V3] perf: Fix the throttle error of some clock events kan.liang
  2025-06-04 20:08 ` Ian Rogers
  2025-06-04 23:21 ` Alexei Starovoitov
@ 2025-06-05  6:39 ` Ingo Molnar
  2025-06-05 13:51   ` Liang, Kan
  2 siblings, 1 reply; 16+ messages in thread
From: Ingo Molnar @ 2025-06-05  6:39 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Leo Yan,
	Aishwarya TCV, Alexei Starovoitov, Venkat Rao Bagalkote


* kan.liang@linux.intel.com <kan.liang@linux.intel.com> wrote:

> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> below perf command.
>   perf record -a -e cpu-clock -- sleep 2
> 
> The issue is introduced by the generic throttle patch set, which
> unconditionally invoke the event_stop() when throttle is triggered.
> 
> The cpu-clock and task-clock are two special SW events, which rely on
> the hrtimer. The throttle is invoked in the hrtimer handler. The
> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> be used to stop the timer.
> 
> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>   perf_event_throttle() if the flag is detected.
>   It has been implemented in the
>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>   The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>   method to stop the timer.
> 
> The latter is implemented here.
> 
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
> 
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

Any idea which commit introduced this bug?

Was it:

  9734e25fbf5a perf: Fix the throttle logic for a group
 
plus the followup driver updates?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-04 23:21 ` Alexei Starovoitov
@ 2025-06-05 13:46   ` Liang, Kan
  2025-06-05 16:58     ` Liang, Kan
  2025-06-05 18:45     ` Alexei Starovoitov
  0 siblings, 2 replies; 16+ messages in thread
From: Liang, Kan @ 2025-06-05 13:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>> below perf command.
>>   perf record -a -e cpu-clock -- sleep 2
>>
>> The issue is introduced by the generic throttle patch set, which
>> unconditionally invoke the event_stop() when throttle is triggered.
>>
>> The cpu-clock and task-clock are two special SW events, which rely on
>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>> be used to stop the timer.
>>
>> There may be two ways to fix it.
>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>   perf_event_throttle() if the flag is detected.
>>   It has been implemented in the
>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>   The new flag was thought to be an overkill for the issue.
>> - Add a check in the event_stop. Return immediately if the throttle is
>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>   method to stop the timer.
>>
>> The latter is implemented here.
>>
>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>> the order the same as perf_event_unthrottle(). Except the patch, no one
>> checks the hw.interrupts in the stop(). There is no impact from the
>> order change.
>>
>> Reported-by: Leo Yan <leo.yan@arm.com>
>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
> It seems the patch fixes one issue and introduces another ?
> 
> Looks like the throttle event is sticky.
> Once it's reached the perf_event no longer works ?

No. It should still work even the throttle is triggered.

sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
/proc/sys/kernel/perf_event_max_sample_rate'
sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.564 MB perf.data (613 samples) ]
sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.545 MB perf.data (502 samples) ]


> Repro:
> test_progs -t perf_branches/perf_branches_no_hw
> #250/2   perf_branches/perf_branches_no_hw:OK
> 
> test_progs -t stacktrace_build_id_nmi
> #393     stacktrace_build_id_nmi:OK
> 
> test_progs -t perf_branches/perf_branches_no_hw
> perf_branches/perf_branches_no_hw:FAIL
> 

Do you have more logs regarding where it's failed?

Thanks,
Kan> Maybe it's an unrelated bug.
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05  6:39 ` Ingo Molnar
@ 2025-06-05 13:51   ` Liang, Kan
  0 siblings, 0 replies; 16+ messages in thread
From: Liang, Kan @ 2025-06-05 13:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Leo Yan,
	Aishwarya TCV, Alexei Starovoitov, Venkat Rao Bagalkote



On 2025-06-05 2:39 a.m., Ingo Molnar wrote:
> 
> * kan.liang@linux.intel.com <kan.liang@linux.intel.com> wrote:
> 
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>> below perf command.
>>   perf record -a -e cpu-clock -- sleep 2
>>
>> The issue is introduced by the generic throttle patch set, which
>> unconditionally invoke the event_stop() when throttle is triggered.
>>
>> The cpu-clock and task-clock are two special SW events, which rely on
>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>> be used to stop the timer.
>>
>> There may be two ways to fix it.
>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>   perf_event_throttle() if the flag is detected.
>>   It has been implemented in the
>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>   The new flag was thought to be an overkill for the issue.
>> - Add a check in the event_stop. Return immediately if the throttle is
>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>   method to stop the timer.
>>
>> The latter is implemented here.
>>
>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>> the order the same as perf_event_unthrottle(). Except the patch, no one
>> checks the hw.interrupts in the stop(). There is no impact from the
>> order change.
>>
>> Reported-by: Leo Yan <leo.yan@arm.com>
>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
> Any idea which commit introduced this bug?
> 
> Was it:
> 
>   9734e25fbf5a perf: Fix the throttle logic for a group
>  

Yes.
Since it is still in the tip.git, I'm not sure if the commit ID is valid
for the Fixes tag. so I didn't mention the commit ID in the log.

Thanks,
Kan

> plus the followup driver updates?
> > Thanks,
> 
> 	Ingo


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 13:46   ` Liang, Kan
@ 2025-06-05 16:58     ` Liang, Kan
  2025-06-05 18:45     ` Alexei Starovoitov
  1 sibling, 0 replies; 16+ messages in thread
From: Liang, Kan @ 2025-06-05 16:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-05 9:46 a.m., Liang, Kan wrote:
> 
> 
> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>>> below perf command.
>>>   perf record -a -e cpu-clock -- sleep 2
>>>
>>> The issue is introduced by the generic throttle patch set, which
>>> unconditionally invoke the event_stop() when throttle is triggered.
>>>
>>> The cpu-clock and task-clock are two special SW events, which rely on
>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>> be used to stop the timer.
>>>
>>> There may be two ways to fix it.
>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>>   perf_event_throttle() if the flag is detected.
>>>   It has been implemented in the
>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>   The new flag was thought to be an overkill for the issue.
>>> - Add a check in the event_stop. Return immediately if the throttle is
>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>   method to stop the timer.
>>>
>>> The latter is implemented here.
>>>
>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>> checks the hw.interrupts in the stop(). There is no impact from the
>>> order change.
>>>
>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>
>> It seems the patch fixes one issue and introduces another ?
>>
>> Looks like the throttle event is sticky.
>> Once it's reached the perf_event no longer works ?
> 
> No. It should still work even the throttle is triggered.
> 
> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
> /proc/sys/kernel/perf_event_max_sample_rate'
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.564 MB perf.data (613 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.545 MB perf.data (502 samples) ]
> 
> 
>> Repro:
>> test_progs -t perf_branches/perf_branches_no_hw
>> #250/2   perf_branches/perf_branches_no_hw:OK
>>
>> test_progs -t stacktrace_build_id_nmi
>> #393     stacktrace_build_id_nmi:OK
>>
>> test_progs -t perf_branches/perf_branches_no_hw
>> perf_branches/perf_branches_no_hw:FAIL
>>
> 
> Do you have more logs regarding where it's failed?
> 
> Thanks,
> Kan> Maybe it's an unrelated bug.

I tried the tests on my machine. I don't see any issues.

sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t
perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t
stacktrace_build_id_nmi
#376     stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t
perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t
stacktrace_build_id_nmi
#376     stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t
perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED


Thanks,
Kan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 13:46   ` Liang, Kan
  2025-06-05 16:58     ` Liang, Kan
@ 2025-06-05 18:45     ` Alexei Starovoitov
  2025-06-05 20:24       ` Liang, Kan
  1 sibling, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-05 18:45 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> > On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
> >>
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> >> below perf command.
> >>   perf record -a -e cpu-clock -- sleep 2
> >>
> >> The issue is introduced by the generic throttle patch set, which
> >> unconditionally invoke the event_stop() when throttle is triggered.
> >>
> >> The cpu-clock and task-clock are two special SW events, which rely on
> >> the hrtimer. The throttle is invoked in the hrtimer handler. The
> >> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> >> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> >> be used to stop the timer.
> >>
> >> There may be two ways to fix it.
> >> - Introduce a PMU flag to track the case. Avoid the event_stop in
> >>   perf_event_throttle() if the flag is detected.
> >>   It has been implemented in the
> >>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> >>   The new flag was thought to be an overkill for the issue.
> >> - Add a check in the event_stop. Return immediately if the throttle is
> >>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
> >>   method to stop the timer.
> >>
> >> The latter is implemented here.
> >>
> >> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> >> the order the same as perf_event_unthrottle(). Except the patch, no one
> >> checks the hw.interrupts in the stop(). There is no impact from the
> >> order change.
> >>
> >> Reported-by: Leo Yan <leo.yan@arm.com>
> >> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> >> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> >> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> >> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> >> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> >> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> >
> > It seems the patch fixes one issue and introduces another ?
> >
> > Looks like the throttle event is sticky.
> > Once it's reached the perf_event no longer works ?
>
> No. It should still work even the throttle is triggered.
>
> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
> /proc/sys/kernel/perf_event_max_sample_rate'
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]

With the patch applied above command hangs in my VM:

# perf record -a -e cpu-clock -c10000 -- sleep 1
[   43.656855] hrtimer: interrupt took 21640 ns
[   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
[   68.561056] Modules linked in: bpf_preload
[   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
6.15.0-12294-gc89e5202e569 #1172 PREEMPT
[   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
[   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
ff ff ff <48> c7 c0 c0 f
[   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
[   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
[   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
[   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
[   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
[   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
[   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
knlGS:0000000000000000
[   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
[   68.561113] Call Trace:
[   68.561170]  <IRQ>
[   68.561174]  irq_exit_rcu+0x91/0xb0
[   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
[   68.561181]  </IRQ>
[   68.561181]  <TASK>
[   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
[   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
fe ff ff <31> c0 5b 5d 9
[   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
[   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
[   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
[   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
[   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
[   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
[   68.561192]  ? sw_perf_event_destroy+0x70/0x70
[   68.561197]  smp_call_function_single+0xc4/0x110
[   68.561199]  ? sw_perf_event_destroy+0x70/0x70
[   68.561200]  event_function_call+0x160/0x170
[   68.561202]  ? ctx_resched+0x2d0/0x2d0
[   68.561205]  ? perf_event_set_state+0x60/0x60
[   68.561206]  ? _perf_event_disable+0x50/0x50
[   68.561208]  perf_event_for_each_child+0x37/0x80
[   68.561209]  ? _perf_event_disable+0x50/0x50
[   68.561211]  _perf_ioctl+0x1df/0xad0
[   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
[   68.561216]  ? avc_has_perm+0x72/0x160
[   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
[   68.561222]  perf_ioctl+0x42/0x70
[   68.561224]  __x64_sys_ioctl+0x8f/0xd0
[   68.561226]  do_syscall_64+0x46/0x160
[   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

but only after a fresh boot.

If I run bpf selftests before that it works.
Like test_progs -t stacktrace_build_id_nmi
followed by the same
perf record -a -e cpu-clock ..
it works fine.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 18:45     ` Alexei Starovoitov
@ 2025-06-05 20:24       ` Liang, Kan
  2025-06-05 20:46         ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-06-05 20:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>>
>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>
>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>>>> below perf command.
>>>>   perf record -a -e cpu-clock -- sleep 2
>>>>
>>>> The issue is introduced by the generic throttle patch set, which
>>>> unconditionally invoke the event_stop() when throttle is triggered.
>>>>
>>>> The cpu-clock and task-clock are two special SW events, which rely on
>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>>> be used to stop the timer.
>>>>
>>>> There may be two ways to fix it.
>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>>>   perf_event_throttle() if the flag is detected.
>>>>   It has been implemented in the
>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>>   The new flag was thought to be an overkill for the issue.
>>>> - Add a check in the event_stop. Return immediately if the throttle is
>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>>   method to stop the timer.
>>>>
>>>> The latter is implemented here.
>>>>
>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>>> checks the hw.interrupts in the stop(). There is no impact from the
>>>> order change.
>>>>
>>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> It seems the patch fixes one issue and introduces another ?
>>>
>>> Looks like the throttle event is sticky.
>>> Once it's reached the perf_event no longer works ?
>>
>> No. It should still work even the throttle is triggered.
>>
>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
>> /proc/sys/kernel/perf_event_max_sample_rate'
>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> 
> With the patch applied above command hangs in my VM:
> 
> # perf record -a -e cpu-clock -c10000 -- sleep 1
> [   43.656855] hrtimer: interrupt took 21640 ns
> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
> [   68.561056] Modules linked in: bpf_preload
> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
> ff ff ff <48> c7 c0 c0 f
> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
> knlGS:0000000000000000
> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
> [   68.561113] Call Trace:
> [   68.561170]  <IRQ>
> [   68.561174]  irq_exit_rcu+0x91/0xb0
> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
> [   68.561181]  </IRQ>
> [   68.561181]  <TASK>
> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
> fe ff ff <31> c0 5b 5d 9
> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
> [   68.561197]  smp_call_function_single+0xc4/0x110
> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
> [   68.561200]  event_function_call+0x160/0x170
> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
> [   68.561205]  ? perf_event_set_state+0x60/0x60
> [   68.561206]  ? _perf_event_disable+0x50/0x50
> [   68.561208]  perf_event_for_each_child+0x37/0x80
> [   68.561209]  ? _perf_event_disable+0x50/0x50
> [   68.561211]  _perf_ioctl+0x1df/0xad0
> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
> [   68.561216]  ? avc_has_perm+0x72/0x160
> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
> [   68.561222]  perf_ioctl+0x42/0x70
> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
> [   68.561226]  do_syscall_64+0x46/0x160
> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> 
> but only after a fresh boot.
> 
> If I run bpf selftests before that it works.
> Like test_progs -t stacktrace_build_id_nmi
> followed by the same
> perf record -a -e cpu-clock ..
> it works fine.
> 

There should be a bug in V3. When stops in the throttle, the event
should not be updated, stop(event, 0). But the cpu_clock_event_stop()
doesn't handle the flag. That changes the behavior a little bit.

Could you please try the below patch and see if it helps?

diff --git a/kernel/events/core.c b/kernel/events/core.c
index cd85b1820e7d..b6c57ba24e78 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
perf_event *event, bool start)

 static void perf_event_throttle(struct perf_event *event)
 {
-	event->pmu->stop(event, 0);
 	event->hw.interrupts = MAX_INTERRUPTS;
+	event->pmu->stop(event, 0);
 	if (event == event->group_leader)
 		perf_log_throttle(event, 0);
 }
@@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;

-	if (is_sampling_event(event)) {
+	/*
+	 * The throttle can be triggered in the hrtimer handler.
+	 * The HRTIMER_NORESTART should be used to stop the timer,
+	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 */
+	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));

@@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
perf_event *event, int flags)
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
-	cpu_clock_event_update(event);
+	if (flags & PERF_EF_UPDATE)
+		cpu_clock_event_update(event);
 }

 static int cpu_clock_event_add(struct perf_event *event, int flags)


Thanks,
Kan


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 20:24       ` Liang, Kan
@ 2025-06-05 20:46         ` Alexei Starovoitov
  2025-06-05 23:50           ` Liang, Kan
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-05 20:46 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
> > On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> >>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
> >>>>
> >>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>
> >>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> >>>> below perf command.
> >>>>   perf record -a -e cpu-clock -- sleep 2
> >>>>
> >>>> The issue is introduced by the generic throttle patch set, which
> >>>> unconditionally invoke the event_stop() when throttle is triggered.
> >>>>
> >>>> The cpu-clock and task-clock are two special SW events, which rely on
> >>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
> >>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> >>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> >>>> be used to stop the timer.
> >>>>
> >>>> There may be two ways to fix it.
> >>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
> >>>>   perf_event_throttle() if the flag is detected.
> >>>>   It has been implemented in the
> >>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> >>>>   The new flag was thought to be an overkill for the issue.
> >>>> - Add a check in the event_stop. Return immediately if the throttle is
> >>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
> >>>>   method to stop the timer.
> >>>>
> >>>> The latter is implemented here.
> >>>>
> >>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> >>>> the order the same as perf_event_unthrottle(). Except the patch, no one
> >>>> checks the hw.interrupts in the stop(). There is no impact from the
> >>>> order change.
> >>>>
> >>>> Reported-by: Leo Yan <leo.yan@arm.com>
> >>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> >>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> >>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> >>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> >>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> >>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> >>>
> >>> It seems the patch fixes one issue and introduces another ?
> >>>
> >>> Looks like the throttle event is sticky.
> >>> Once it's reached the perf_event no longer works ?
> >>
> >> No. It should still work even the throttle is triggered.
> >>
> >> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
> >> /proc/sys/kernel/perf_event_max_sample_rate'
> >> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> >> [ perf record: Woken up 1 times to write data ]
> >> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> >
> > With the patch applied above command hangs in my VM:
> >
> > # perf record -a -e cpu-clock -c10000 -- sleep 1
> > [   43.656855] hrtimer: interrupt took 21640 ns
> > [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
> > [   68.561056] Modules linked in: bpf_preload
> > [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
> > 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
> > [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> > [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
> > [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
> > 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
> > ff ff ff <48> c7 c0 c0 f
> > [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
> > [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
> > [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
> > [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
> > [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
> > [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> > [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
> > knlGS:0000000000000000
> > [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
> > [   68.561113] Call Trace:
> > [   68.561170]  <IRQ>
> > [   68.561174]  irq_exit_rcu+0x91/0xb0
> > [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
> > [   68.561181]  </IRQ>
> > [   68.561181]  <TASK>
> > [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
> > [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
> > 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
> > fe ff ff <31> c0 5b 5d 9
> > [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
> > [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
> > [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
> > [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
> > [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
> > [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
> > [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
> > [   68.561197]  smp_call_function_single+0xc4/0x110
> > [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
> > [   68.561200]  event_function_call+0x160/0x170
> > [   68.561202]  ? ctx_resched+0x2d0/0x2d0
> > [   68.561205]  ? perf_event_set_state+0x60/0x60
> > [   68.561206]  ? _perf_event_disable+0x50/0x50
> > [   68.561208]  perf_event_for_each_child+0x37/0x80
> > [   68.561209]  ? _perf_event_disable+0x50/0x50
> > [   68.561211]  _perf_ioctl+0x1df/0xad0
> > [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
> > [   68.561216]  ? avc_has_perm+0x72/0x160
> > [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
> > [   68.561222]  perf_ioctl+0x42/0x70
> > [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
> > [   68.561226]  do_syscall_64+0x46/0x160
> > [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >
> > but only after a fresh boot.
> >
> > If I run bpf selftests before that it works.
> > Like test_progs -t stacktrace_build_id_nmi
> > followed by the same
> > perf record -a -e cpu-clock ..
> > it works fine.
> >
>
> There should be a bug in V3. When stops in the throttle, the event
> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
> doesn't handle the flag. That changes the behavior a little bit.
>
> Could you please try the below patch and see if it helps?
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cd85b1820e7d..b6c57ba24e78 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
> perf_event *event, bool start)
>
>  static void perf_event_throttle(struct perf_event *event)
>  {
> -       event->pmu->stop(event, 0);
>         event->hw.interrupts = MAX_INTERRUPTS;
> +       event->pmu->stop(event, 0);
>         if (event == event->group_leader)
>                 perf_log_throttle(event, 0);
>  }
> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
> perf_event *event)
>  {
>         struct hw_perf_event *hwc = &event->hw;
>
> -       if (is_sampling_event(event)) {
> +       /*
> +        * The throttle can be triggered in the hrtimer handler.
> +        * The HRTIMER_NORESTART should be used to stop the timer,
> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> +        */
> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
>
> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
> perf_event *event, int flags)
>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>  {
>         perf_swevent_cancel_hrtimer(event);
> -       cpu_clock_event_update(event);
> +       if (flags & PERF_EF_UPDATE)
> +               cpu_clock_event_update(event);
>  }

Nope. The last hunk didn't make any difference.
Same soft lockup.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 20:46         ` Alexei Starovoitov
@ 2025-06-05 23:50           ` Liang, Kan
  2025-06-06  0:39             ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-06-05 23:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-05 4:46 p.m., Alexei Starovoitov wrote:
> On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
>>> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>>>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>
>>>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>>>>>> below perf command.
>>>>>>   perf record -a -e cpu-clock -- sleep 2
>>>>>>
>>>>>> The issue is introduced by the generic throttle patch set, which
>>>>>> unconditionally invoke the event_stop() when throttle is triggered.
>>>>>>
>>>>>> The cpu-clock and task-clock are two special SW events, which rely on
>>>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>>>>> be used to stop the timer.
>>>>>>
>>>>>> There may be two ways to fix it.
>>>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>>>>>   perf_event_throttle() if the flag is detected.
>>>>>>   It has been implemented in the
>>>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>>>>   The new flag was thought to be an overkill for the issue.
>>>>>> - Add a check in the event_stop. Return immediately if the throttle is
>>>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>>>>   method to stop the timer.
>>>>>>
>>>>>> The latter is implemented here.
>>>>>>
>>>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>>>>> checks the hw.interrupts in the stop(). There is no impact from the
>>>>>> order change.
>>>>>>
>>>>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>>>
>>>>> It seems the patch fixes one issue and introduces another ?
>>>>>
>>>>> Looks like the throttle event is sticky.
>>>>> Once it's reached the perf_event no longer works ?
>>>>
>>>> No. It should still work even the throttle is triggered.
>>>>
>>>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
>>>> /proc/sys/kernel/perf_event_max_sample_rate'
>>>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
>>>
>>> With the patch applied above command hangs in my VM:
>>>
>>> # perf record -a -e cpu-clock -c10000 -- sleep 1
>>> [   43.656855] hrtimer: interrupt took 21640 ns
>>> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
>>> [   68.561056] Modules linked in: bpf_preload
>>> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
>>> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
>>> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
>>> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
>>> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
>>> ff ff ff <48> c7 c0 c0 f
>>> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
>>> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
>>> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
>>> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
>>> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
>>> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
>>> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
>>> knlGS:0000000000000000
>>> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
>>> [   68.561113] Call Trace:
>>> [   68.561170]  <IRQ>
>>> [   68.561174]  irq_exit_rcu+0x91/0xb0
>>> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
>>> [   68.561181]  </IRQ>
>>> [   68.561181]  <TASK>
>>> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
>>> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
>>> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
>>> fe ff ff <31> c0 5b 5d 9
>>> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
>>> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
>>> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
>>> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
>>> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
>>> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
>>> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
>>> [   68.561197]  smp_call_function_single+0xc4/0x110
>>> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
>>> [   68.561200]  event_function_call+0x160/0x170
>>> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
>>> [   68.561205]  ? perf_event_set_state+0x60/0x60
>>> [   68.561206]  ? _perf_event_disable+0x50/0x50
>>> [   68.561208]  perf_event_for_each_child+0x37/0x80
>>> [   68.561209]  ? _perf_event_disable+0x50/0x50
>>> [   68.561211]  _perf_ioctl+0x1df/0xad0
>>> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
>>> [   68.561216]  ? avc_has_perm+0x72/0x160
>>> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
>>> [   68.561222]  perf_ioctl+0x42/0x70
>>> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
>>> [   68.561226]  do_syscall_64+0x46/0x160
>>> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>
>>> but only after a fresh boot.
>>>
>>> If I run bpf selftests before that it works.
>>> Like test_progs -t stacktrace_build_id_nmi
>>> followed by the same
>>> perf record -a -e cpu-clock ..
>>> it works fine.
>>>
>>
>> There should be a bug in V3. When stops in the throttle, the event
>> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
>> doesn't handle the flag. That changes the behavior a little bit.
>>
>> Could you please try the below patch and see if it helps?
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index cd85b1820e7d..b6c57ba24e78 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
>> perf_event *event, bool start)
>>
>>  static void perf_event_throttle(struct perf_event *event)
>>  {
>> -       event->pmu->stop(event, 0);
>>         event->hw.interrupts = MAX_INTERRUPTS;
>> +       event->pmu->stop(event, 0);
>>         if (event == event->group_leader)
>>                 perf_log_throttle(event, 0);
>>  }
>> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
>> perf_event *event)
>>  {
>>         struct hw_perf_event *hwc = &event->hw;
>>
>> -       if (is_sampling_event(event)) {
>> +       /*
>> +        * The throttle can be triggered in the hrtimer handler.
>> +        * The HRTIMER_NORESTART should be used to stop the timer,
>> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
>> +        */
>> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
>>
>> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
>> perf_event *event, int flags)
>>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>>  {
>>         perf_swevent_cancel_hrtimer(event);
>> -       cpu_clock_event_update(event);
>> +       if (flags & PERF_EF_UPDATE)
>> +               cpu_clock_event_update(event);
>>  }
> 
> Nope. The last hunk didn't make any difference.
> Same soft lockup.

Thanks for the verification.

I have some questions, could you please help to clarify?

- What's the value of proc/sys/kernel/perf_event_max_sample_rate in the
test?

- Can the soft lockup issue be produced by both bpf selftests and the
perf record command? Or just the perf record command?

- Can the soft lockup issue be reproduced with the v2 patchset?
https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/

- Furthermore, can the soft lockup issue be reproduced after reverting
the recent generic throttle logic fix?
commit e800ac51202f ("perf: Only dump the throttle log for the leader")
commit 9734e25fbf5a ("perf: Fix the throttle logic for a group")

Thanks,
Kan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-05 23:50           ` Liang, Kan
@ 2025-06-06  0:39             ` Alexei Starovoitov
  2025-06-06 13:05               ` Liang, Kan
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-06  0:39 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Thu, Jun 5, 2025 at 4:50 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-06-05 4:46 p.m., Alexei Starovoitov wrote:
> > On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
> >>> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> >>>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
> >>>>>>
> >>>>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>>>
> >>>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> >>>>>> below perf command.
> >>>>>>   perf record -a -e cpu-clock -- sleep 2
> >>>>>>
> >>>>>> The issue is introduced by the generic throttle patch set, which
> >>>>>> unconditionally invoke the event_stop() when throttle is triggered.
> >>>>>>
> >>>>>> The cpu-clock and task-clock are two special SW events, which rely on
> >>>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
> >>>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> >>>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> >>>>>> be used to stop the timer.
> >>>>>>
> >>>>>> There may be two ways to fix it.
> >>>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
> >>>>>>   perf_event_throttle() if the flag is detected.
> >>>>>>   It has been implemented in the
> >>>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> >>>>>>   The new flag was thought to be an overkill for the issue.
> >>>>>> - Add a check in the event_stop. Return immediately if the throttle is
> >>>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
> >>>>>>   method to stop the timer.
> >>>>>>
> >>>>>> The latter is implemented here.
> >>>>>>
> >>>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> >>>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
> >>>>>> checks the hw.interrupts in the stop(). There is no impact from the
> >>>>>> order change.
> >>>>>>
> >>>>>> Reported-by: Leo Yan <leo.yan@arm.com>
> >>>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> >>>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> >>>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >>>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> >>>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> >>>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> >>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> >>>>>
> >>>>> It seems the patch fixes one issue and introduces another ?
> >>>>>
> >>>>> Looks like the throttle event is sticky.
> >>>>> Once it's reached the perf_event no longer works ?
> >>>>
> >>>> No. It should still work even the throttle is triggered.
> >>>>
> >>>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
> >>>> /proc/sys/kernel/perf_event_max_sample_rate'
> >>>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> >>>> [ perf record: Woken up 1 times to write data ]
> >>>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> >>>
> >>> With the patch applied above command hangs in my VM:
> >>>
> >>> # perf record -a -e cpu-clock -c10000 -- sleep 1
> >>> [   43.656855] hrtimer: interrupt took 21640 ns
> >>> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
> >>> [   68.561056] Modules linked in: bpf_preload
> >>> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
> >>> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
> >>> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> >>> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
> >>> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
> >>> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
> >>> ff ff ff <48> c7 c0 c0 f
> >>> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
> >>> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
> >>> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
> >>> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
> >>> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
> >>> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> >>> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
> >>> knlGS:0000000000000000
> >>> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
> >>> [   68.561113] Call Trace:
> >>> [   68.561170]  <IRQ>
> >>> [   68.561174]  irq_exit_rcu+0x91/0xb0
> >>> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
> >>> [   68.561181]  </IRQ>
> >>> [   68.561181]  <TASK>
> >>> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> >>> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
> >>> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
> >>> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
> >>> fe ff ff <31> c0 5b 5d 9
> >>> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
> >>> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
> >>> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
> >>> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
> >>> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
> >>> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
> >>> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
> >>> [   68.561197]  smp_call_function_single+0xc4/0x110
> >>> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
> >>> [   68.561200]  event_function_call+0x160/0x170
> >>> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
> >>> [   68.561205]  ? perf_event_set_state+0x60/0x60
> >>> [   68.561206]  ? _perf_event_disable+0x50/0x50
> >>> [   68.561208]  perf_event_for_each_child+0x37/0x80
> >>> [   68.561209]  ? _perf_event_disable+0x50/0x50
> >>> [   68.561211]  _perf_ioctl+0x1df/0xad0
> >>> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
> >>> [   68.561216]  ? avc_has_perm+0x72/0x160
> >>> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
> >>> [   68.561222]  perf_ioctl+0x42/0x70
> >>> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
> >>> [   68.561226]  do_syscall_64+0x46/0x160
> >>> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >>>
> >>> but only after a fresh boot.
> >>>
> >>> If I run bpf selftests before that it works.
> >>> Like test_progs -t stacktrace_build_id_nmi
> >>> followed by the same
> >>> perf record -a -e cpu-clock ..
> >>> it works fine.
> >>>
> >>
> >> There should be a bug in V3. When stops in the throttle, the event
> >> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
> >> doesn't handle the flag. That changes the behavior a little bit.
> >>
> >> Could you please try the below patch and see if it helps?
> >>
> >> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> index cd85b1820e7d..b6c57ba24e78 100644
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
> >> perf_event *event, bool start)
> >>
> >>  static void perf_event_throttle(struct perf_event *event)
> >>  {
> >> -       event->pmu->stop(event, 0);
> >>         event->hw.interrupts = MAX_INTERRUPTS;
> >> +       event->pmu->stop(event, 0);
> >>         if (event == event->group_leader)
> >>                 perf_log_throttle(event, 0);
> >>  }
> >> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
> >> perf_event *event)
> >>  {
> >>         struct hw_perf_event *hwc = &event->hw;
> >>
> >> -       if (is_sampling_event(event)) {
> >> +       /*
> >> +        * The throttle can be triggered in the hrtimer handler.
> >> +        * The HRTIMER_NORESTART should be used to stop the timer,
> >> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> >> +        */
> >> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
> >>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
> >>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
> >>
> >> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
> >> perf_event *event, int flags)
> >>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
> >>  {
> >>         perf_swevent_cancel_hrtimer(event);
> >> -       cpu_clock_event_update(event);
> >> +       if (flags & PERF_EF_UPDATE)
> >> +               cpu_clock_event_update(event);
> >>  }
> >
> > Nope. The last hunk didn't make any difference.
> > Same soft lockup.
>
> Thanks for the verification.
>
> I have some questions, could you please help to clarify?
>
> - What's the value of proc/sys/kernel/perf_event_max_sample_rate in the
> test?

100000

> - Can the soft lockup issue be produced by both bpf selftests and the
> perf record command? Or just the perf record command?

just perf record.

> - Can the soft lockup issue be reproduced with the v2 patchset?
> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/

yes. With v2
perf record -a -e cpu-clock -c10000 -- sleep 1

soft locks up as well with the same stack trace.

> - Furthermore, can the soft lockup issue be reproduced after reverting
> the recent generic throttle logic fix?
> commit e800ac51202f ("perf: Only dump the throttle log for the leader")
> commit 9734e25fbf5a ("perf: Fix the throttle logic for a group")

yes.

perf record -a -e cpu-clock -- sleep 1
works

but
perf record -a -e cpu-clock -c100 -- sleep 1
locks up.
-c1, -c100, -c10000 all lockup.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-06  0:39             ` Alexei Starovoitov
@ 2025-06-06 13:05               ` Liang, Kan
  2025-06-06 17:42                 ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-06-06 13:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-05 8:39 p.m., Alexei Starovoitov wrote:
> On Thu, Jun 5, 2025 at 4:50 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2025-06-05 4:46 p.m., Alexei Starovoitov wrote:
>>> On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
>>>>> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>>>>>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>>>>>>
>>>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>>>
>>>>>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>>>>>>>> below perf command.
>>>>>>>>   perf record -a -e cpu-clock -- sleep 2
>>>>>>>>
>>>>>>>> The issue is introduced by the generic throttle patch set, which
>>>>>>>> unconditionally invoke the event_stop() when throttle is triggered.
>>>>>>>>
>>>>>>>> The cpu-clock and task-clock are two special SW events, which rely on
>>>>>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>>>>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>>>>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>>>>>>> be used to stop the timer.
>>>>>>>>
>>>>>>>> There may be two ways to fix it.
>>>>>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>>>>>>>   perf_event_throttle() if the flag is detected.
>>>>>>>>   It has been implemented in the
>>>>>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>>>>>>   The new flag was thought to be an overkill for the issue.
>>>>>>>> - Add a check in the event_stop. Return immediately if the throttle is
>>>>>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>>>>>>   method to stop the timer.
>>>>>>>>
>>>>>>>> The latter is implemented here.
>>>>>>>>
>>>>>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>>>>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>>>>>>> checks the hw.interrupts in the stop(). There is no impact from the
>>>>>>>> order change.
>>>>>>>>
>>>>>>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>>>>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>>>>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>>>>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>>>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>>>>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>>>>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>>>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>>>>>
>>>>>>> It seems the patch fixes one issue and introduces another ?
>>>>>>>
>>>>>>> Looks like the throttle event is sticky.
>>>>>>> Once it's reached the perf_event no longer works ?
>>>>>>
>>>>>> No. It should still work even the throttle is triggered.
>>>>>>
>>>>>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
>>>>>> /proc/sys/kernel/perf_event_max_sample_rate'
>>>>>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
>>>>>
>>>>> With the patch applied above command hangs in my VM:
>>>>>
>>>>> # perf record -a -e cpu-clock -c10000 -- sleep 1
>>>>> [   43.656855] hrtimer: interrupt took 21640 ns
>>>>> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
>>>>> [   68.561056] Modules linked in: bpf_preload
>>>>> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
>>>>> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
>>>>> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>>> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
>>>>> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
>>>>> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
>>>>> ff ff ff <48> c7 c0 c0 f
>>>>> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
>>>>> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
>>>>> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
>>>>> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
>>>>> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
>>>>> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
>>>>> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
>>>>> knlGS:0000000000000000
>>>>> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
>>>>> [   68.561113] Call Trace:
>>>>> [   68.561170]  <IRQ>
>>>>> [   68.561174]  irq_exit_rcu+0x91/0xb0
>>>>> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
>>>>> [   68.561181]  </IRQ>
>>>>> [   68.561181]  <TASK>
>>>>> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>>>> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
>>>>> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
>>>>> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
>>>>> fe ff ff <31> c0 5b 5d 9
>>>>> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
>>>>> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
>>>>> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
>>>>> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
>>>>> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
>>>>> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
>>>>> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
>>>>> [   68.561197]  smp_call_function_single+0xc4/0x110
>>>>> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
>>>>> [   68.561200]  event_function_call+0x160/0x170
>>>>> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
>>>>> [   68.561205]  ? perf_event_set_state+0x60/0x60
>>>>> [   68.561206]  ? _perf_event_disable+0x50/0x50
>>>>> [   68.561208]  perf_event_for_each_child+0x37/0x80
>>>>> [   68.561209]  ? _perf_event_disable+0x50/0x50
>>>>> [   68.561211]  _perf_ioctl+0x1df/0xad0
>>>>> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
>>>>> [   68.561216]  ? avc_has_perm+0x72/0x160
>>>>> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
>>>>> [   68.561222]  perf_ioctl+0x42/0x70
>>>>> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
>>>>> [   68.561226]  do_syscall_64+0x46/0x160
>>>>> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>>
>>>>> but only after a fresh boot.
>>>>>
>>>>> If I run bpf selftests before that it works.
>>>>> Like test_progs -t stacktrace_build_id_nmi
>>>>> followed by the same
>>>>> perf record -a -e cpu-clock ..
>>>>> it works fine.
>>>>>
>>>>
>>>> There should be a bug in V3. When stops in the throttle, the event
>>>> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
>>>> doesn't handle the flag. That changes the behavior a little bit.
>>>>
>>>> Could you please try the below patch and see if it helps?
>>>>
>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>> index cd85b1820e7d..b6c57ba24e78 100644
>>>> --- a/kernel/events/core.c
>>>> +++ b/kernel/events/core.c
>>>> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
>>>> perf_event *event, bool start)
>>>>
>>>>  static void perf_event_throttle(struct perf_event *event)
>>>>  {
>>>> -       event->pmu->stop(event, 0);
>>>>         event->hw.interrupts = MAX_INTERRUPTS;
>>>> +       event->pmu->stop(event, 0);
>>>>         if (event == event->group_leader)
>>>>                 perf_log_throttle(event, 0);
>>>>  }
>>>> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
>>>> perf_event *event)
>>>>  {
>>>>         struct hw_perf_event *hwc = &event->hw;
>>>>
>>>> -       if (is_sampling_event(event)) {
>>>> +       /*
>>>> +        * The throttle can be triggered in the hrtimer handler.
>>>> +        * The HRTIMER_NORESTART should be used to stop the timer,
>>>> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
>>>> +        */
>>>> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>>>>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>>>>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
>>>>
>>>> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
>>>> perf_event *event, int flags)
>>>>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>>>>  {
>>>>         perf_swevent_cancel_hrtimer(event);
>>>> -       cpu_clock_event_update(event);
>>>> +       if (flags & PERF_EF_UPDATE)
>>>> +               cpu_clock_event_update(event);
>>>>  }
>>>
>>> Nope. The last hunk didn't make any difference.
>>> Same soft lockup.
>>
>> Thanks for the verification.
>>
>> I have some questions, could you please help to clarify?
>>
>> - What's the value of proc/sys/kernel/perf_event_max_sample_rate in the
>> test?
> 
> 100000
> 
>> - Can the soft lockup issue be produced by both bpf selftests and the
>> perf record command? Or just the perf record command?
> 
> just perf record.
> 
>> - Can the soft lockup issue be reproduced with the v2 patchset?
>> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> 
> yes. With v2
> perf record -a -e cpu-clock -c10000 -- sleep 1
> 
> soft locks up as well with the same stack trace.
> 
>> - Furthermore, can the soft lockup issue be reproduced after reverting
>> the recent generic throttle logic fix?
>> commit e800ac51202f ("perf: Only dump the throttle log for the leader")
>> commit 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> 
> yes.
> 
> perf record -a -e cpu-clock -- sleep 1
> works
> 
> but
> perf record -a -e cpu-clock -c100 -- sleep 1
> locks up.
> -c1, -c100, -c10000 all lockup.

Thanks for the confirmation.

The soft lockup issue is likely an existing issue in virtualization,
unrelated to the recent throttle patches.

The issue should be caused by the overwhelming overflows.
The -c means fixed period. It's too small for the SW event in VM.
For 1 GHZ CPU, the -c10000 will trigger 100000 samples per second.
-c1 will trigger 1000000000 samples per second.
The perf_event_max_sample_rate is set to 100000. The throttle mechanism
should be triggered for the above case, but even with 100000 samples per
second. It seems still be big number in virtualization.

There is no such issue without -c. I think it's because of the
auto-adjust mechanism for the freq mode. The period is automatically
adjusted, so a reasonable number of samples can be generated per second.

If you change the proc/sys/kernel/perf_event_max_sample_rate to a small
value, e.g., 1000, I think the issue should be gone.

Could you please give a try?

Thanks,
Kan


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-06 13:05               ` Liang, Kan
@ 2025-06-06 17:42                 ` Alexei Starovoitov
  2025-06-06 18:38                   ` Liang, Kan
  0 siblings, 1 reply; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-06 17:42 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Fri, Jun 6, 2025 at 6:05 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2025-06-05 8:39 p.m., Alexei Starovoitov wrote:
> > On Thu, Jun 5, 2025 at 4:50 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2025-06-05 4:46 p.m., Alexei Starovoitov wrote:
> >>> On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
> >>>>> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
> >>>>>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
> >>>>>>>>
> >>>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
> >>>>>>>>
> >>>>>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> >>>>>>>> below perf command.
> >>>>>>>>   perf record -a -e cpu-clock -- sleep 2
> >>>>>>>>
> >>>>>>>> The issue is introduced by the generic throttle patch set, which
> >>>>>>>> unconditionally invoke the event_stop() when throttle is triggered.
> >>>>>>>>
> >>>>>>>> The cpu-clock and task-clock are two special SW events, which rely on
> >>>>>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
> >>>>>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> >>>>>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> >>>>>>>> be used to stop the timer.
> >>>>>>>>
> >>>>>>>> There may be two ways to fix it.
> >>>>>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
> >>>>>>>>   perf_event_throttle() if the flag is detected.
> >>>>>>>>   It has been implemented in the
> >>>>>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> >>>>>>>>   The new flag was thought to be an overkill for the issue.
> >>>>>>>> - Add a check in the event_stop. Return immediately if the throttle is
> >>>>>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
> >>>>>>>>   method to stop the timer.
> >>>>>>>>
> >>>>>>>> The latter is implemented here.
> >>>>>>>>
> >>>>>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> >>>>>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
> >>>>>>>> checks the hw.interrupts in the stop(). There is no impact from the
> >>>>>>>> order change.
> >>>>>>>>
> >>>>>>>> Reported-by: Leo Yan <leo.yan@arm.com>
> >>>>>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> >>>>>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> >>>>>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> >>>>>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> >>>>>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> >>>>>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> >>>>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> >>>>>>>
> >>>>>>> It seems the patch fixes one issue and introduces another ?
> >>>>>>>
> >>>>>>> Looks like the throttle event is sticky.
> >>>>>>> Once it's reached the perf_event no longer works ?
> >>>>>>
> >>>>>> No. It should still work even the throttle is triggered.
> >>>>>>
> >>>>>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
> >>>>>> /proc/sys/kernel/perf_event_max_sample_rate'
> >>>>>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> >>>>>> [ perf record: Woken up 1 times to write data ]
> >>>>>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> >>>>>
> >>>>> With the patch applied above command hangs in my VM:
> >>>>>
> >>>>> # perf record -a -e cpu-clock -c10000 -- sleep 1
> >>>>> [   43.656855] hrtimer: interrupt took 21640 ns
> >>>>> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
> >>>>> [   68.561056] Modules linked in: bpf_preload
> >>>>> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
> >>>>> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
> >>>>> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >>>>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> >>>>> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
> >>>>> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
> >>>>> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
> >>>>> ff ff ff <48> c7 c0 c0 f
> >>>>> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
> >>>>> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
> >>>>> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
> >>>>> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
> >>>>> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
> >>>>> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
> >>>>> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
> >>>>> knlGS:0000000000000000
> >>>>> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
> >>>>> [   68.561113] Call Trace:
> >>>>> [   68.561170]  <IRQ>
> >>>>> [   68.561174]  irq_exit_rcu+0x91/0xb0
> >>>>> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
> >>>>> [   68.561181]  </IRQ>
> >>>>> [   68.561181]  <TASK>
> >>>>> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> >>>>> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
> >>>>> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
> >>>>> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
> >>>>> fe ff ff <31> c0 5b 5d 9
> >>>>> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
> >>>>> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
> >>>>> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
> >>>>> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
> >>>>> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
> >>>>> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
> >>>>> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
> >>>>> [   68.561197]  smp_call_function_single+0xc4/0x110
> >>>>> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
> >>>>> [   68.561200]  event_function_call+0x160/0x170
> >>>>> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
> >>>>> [   68.561205]  ? perf_event_set_state+0x60/0x60
> >>>>> [   68.561206]  ? _perf_event_disable+0x50/0x50
> >>>>> [   68.561208]  perf_event_for_each_child+0x37/0x80
> >>>>> [   68.561209]  ? _perf_event_disable+0x50/0x50
> >>>>> [   68.561211]  _perf_ioctl+0x1df/0xad0
> >>>>> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
> >>>>> [   68.561216]  ? avc_has_perm+0x72/0x160
> >>>>> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
> >>>>> [   68.561222]  perf_ioctl+0x42/0x70
> >>>>> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
> >>>>> [   68.561226]  do_syscall_64+0x46/0x160
> >>>>> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >>>>>
> >>>>> but only after a fresh boot.
> >>>>>
> >>>>> If I run bpf selftests before that it works.
> >>>>> Like test_progs -t stacktrace_build_id_nmi
> >>>>> followed by the same
> >>>>> perf record -a -e cpu-clock ..
> >>>>> it works fine.
> >>>>>
> >>>>
> >>>> There should be a bug in V3. When stops in the throttle, the event
> >>>> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
> >>>> doesn't handle the flag. That changes the behavior a little bit.
> >>>>
> >>>> Could you please try the below patch and see if it helps?
> >>>>
> >>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >>>> index cd85b1820e7d..b6c57ba24e78 100644
> >>>> --- a/kernel/events/core.c
> >>>> +++ b/kernel/events/core.c
> >>>> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
> >>>> perf_event *event, bool start)
> >>>>
> >>>>  static void perf_event_throttle(struct perf_event *event)
> >>>>  {
> >>>> -       event->pmu->stop(event, 0);
> >>>>         event->hw.interrupts = MAX_INTERRUPTS;
> >>>> +       event->pmu->stop(event, 0);
> >>>>         if (event == event->group_leader)
> >>>>                 perf_log_throttle(event, 0);
> >>>>  }
> >>>> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
> >>>> perf_event *event)
> >>>>  {
> >>>>         struct hw_perf_event *hwc = &event->hw;
> >>>>
> >>>> -       if (is_sampling_event(event)) {
> >>>> +       /*
> >>>> +        * The throttle can be triggered in the hrtimer handler.
> >>>> +        * The HRTIMER_NORESTART should be used to stop the timer,
> >>>> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> >>>> +        */
> >>>> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
> >>>>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
> >>>>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
> >>>>
> >>>> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
> >>>> perf_event *event, int flags)
> >>>>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
> >>>>  {
> >>>>         perf_swevent_cancel_hrtimer(event);
> >>>> -       cpu_clock_event_update(event);
> >>>> +       if (flags & PERF_EF_UPDATE)
> >>>> +               cpu_clock_event_update(event);
> >>>>  }
> >>>
> >>> Nope. The last hunk didn't make any difference.
> >>> Same soft lockup.
> >>
> >> Thanks for the verification.
> >>
> >> I have some questions, could you please help to clarify?
> >>
> >> - What's the value of proc/sys/kernel/perf_event_max_sample_rate in the
> >> test?
> >
> > 100000
> >
> >> - Can the soft lockup issue be produced by both bpf selftests and the
> >> perf record command? Or just the perf record command?
> >
> > just perf record.
> >
> >> - Can the soft lockup issue be reproduced with the v2 patchset?
> >> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
> >
> > yes. With v2
> > perf record -a -e cpu-clock -c10000 -- sleep 1
> >
> > soft locks up as well with the same stack trace.
> >
> >> - Furthermore, can the soft lockup issue be reproduced after reverting
> >> the recent generic throttle logic fix?
> >> commit e800ac51202f ("perf: Only dump the throttle log for the leader")
> >> commit 9734e25fbf5a ("perf: Fix the throttle logic for a group")
> >
> > yes.
> >
> > perf record -a -e cpu-clock -- sleep 1
> > works
> >
> > but
> > perf record -a -e cpu-clock -c100 -- sleep 1
> > locks up.
> > -c1, -c100, -c10000 all lockup.
>
> Thanks for the confirmation.
>
> The soft lockup issue is likely an existing issue in virtualization,
> unrelated to the recent throttle patches.
>
> The issue should be caused by the overwhelming overflows.
> The -c means fixed period. It's too small for the SW event in VM.
> For 1 GHZ CPU, the -c10000 will trigger 100000 samples per second.
> -c1 will trigger 1000000000 samples per second.
> The perf_event_max_sample_rate is set to 100000. The throttle mechanism
> should be triggered for the above case, but even with 100000 samples per
> second. It seems still be big number in virtualization.
>
> There is no such issue without -c. I think it's because of the
> auto-adjust mechanism for the freq mode. The period is automatically
> adjusted, so a reasonable number of samples can be generated per second.
>
> If you change the proc/sys/kernel/perf_event_max_sample_rate to a small
> value, e.g., 1000, I think the issue should be gone.
>
> Could you please give a try?

Smaller value helps, but this is not an acceptable workaround.
The kernel shouldn't lock up like that with a simple perf command.
Pls spin up a VM, reproduce it and fix it.
The issue affects everyone. My VM is not special in any way.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-06 17:42                 ` Alexei Starovoitov
@ 2025-06-06 18:38                   ` Liang, Kan
  2025-06-06 18:40                     ` Alexei Starovoitov
  0 siblings, 1 reply; 16+ messages in thread
From: Liang, Kan @ 2025-06-06 18:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote



On 2025-06-06 1:42 p.m., Alexei Starovoitov wrote:
> On Fri, Jun 6, 2025 at 6:05 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2025-06-05 8:39 p.m., Alexei Starovoitov wrote:
>>> On Thu, Jun 5, 2025 at 4:50 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025-06-05 4:46 p.m., Alexei Starovoitov wrote:
>>>>> On Thu, Jun 5, 2025 at 1:24 PM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2025-06-05 2:45 p.m., Alexei Starovoitov wrote:
>>>>>>> On Thu, Jun 5, 2025 at 6:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>>>>>>>>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>>>>>
>>>>>>>>>> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
>>>>>>>>>> below perf command.
>>>>>>>>>>   perf record -a -e cpu-clock -- sleep 2
>>>>>>>>>>
>>>>>>>>>> The issue is introduced by the generic throttle patch set, which
>>>>>>>>>> unconditionally invoke the event_stop() when throttle is triggered.
>>>>>>>>>>
>>>>>>>>>> The cpu-clock and task-clock are two special SW events, which rely on
>>>>>>>>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>>>>>>>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>>>>>>>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>>>>>>>>> be used to stop the timer.
>>>>>>>>>>
>>>>>>>>>> There may be two ways to fix it.
>>>>>>>>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>>>>>>>>>   perf_event_throttle() if the flag is detected.
>>>>>>>>>>   It has been implemented in the
>>>>>>>>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>>>>>>>>   The new flag was thought to be an overkill for the issue.
>>>>>>>>>> - Add a check in the event_stop. Return immediately if the throttle is
>>>>>>>>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>>>>>>>>   method to stop the timer.
>>>>>>>>>>
>>>>>>>>>> The latter is implemented here.
>>>>>>>>>>
>>>>>>>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>>>>>>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>>>>>>>>> checks the hw.interrupts in the stop(). There is no impact from the
>>>>>>>>>> order change.
>>>>>>>>>>
>>>>>>>>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>>>>>>>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>>>>>>>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>>>>>>>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>>>>>>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>>>>>>>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>>>>>>>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>>>>>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>>>>>>>
>>>>>>>>> It seems the patch fixes one issue and introduces another ?
>>>>>>>>>
>>>>>>>>> Looks like the throttle event is sticky.
>>>>>>>>> Once it's reached the perf_event no longer works ?
>>>>>>>>
>>>>>>>> No. It should still work even the throttle is triggered.
>>>>>>>>
>>>>>>>> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 >
>>>>>>>> /proc/sys/kernel/perf_event_max_sample_rate'
>>>>>>>> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
>>>>>>>> [ perf record: Woken up 1 times to write data ]
>>>>>>>> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
>>>>>>>
>>>>>>> With the patch applied above command hangs in my VM:
>>>>>>>
>>>>>>> # perf record -a -e cpu-clock -c10000 -- sleep 1
>>>>>>> [   43.656855] hrtimer: interrupt took 21640 ns
>>>>>>> [   68.561052] watchdog: BUG: soft lockup - CPU#0 stuck for 41s! [perf:2253]
>>>>>>> [   68.561056] Modules linked in: bpf_preload
>>>>>>> [   68.561060] CPU: 0 UID: 0 PID: 2253 Comm: perf Not tainted
>>>>>>> 6.15.0-12294-gc89e5202e569 #1172 PREEMPT
>>>>>>> [   68.561062] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>>>> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
>>>>>>> [   68.561063] RIP: 0010:handle_softirqs+0x77/0x2a0
>>>>>>> [   68.561070] Code: 6b 96 02 00 01 00 00 bd 0a 00 00 00 44 89 64 24
>>>>>>> 14 89 6c 24 10 40 88 7c 24 04 31 c0 65 66 89 05 5f 6b 96 02 fb bb ff
>>>>>>> ff ff ff <48> c7 c0 c0 f
>>>>>>> [   68.561071] RSP: 0018:ffa0000000003fa0 EFLAGS: 00000246
>>>>>>> [   68.561072] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0
>>>>>>> [   68.561073] RDX: 0000000000000007 RSI: ff1100010212e100 RDI: 0000000000000000
>>>>>>> [   68.561074] RBP: 000000000000000a R08: 0000000000000000 R09: 7fffffffffffffff
>>>>>>> [   68.561074] R10: 00000005a3af2140 R11: 0000000000004601 R12: 0000000000400100
>>>>>>> [   68.561075] R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000000
>>>>>>> [   68.561111] FS:  00007fa1b536f780(0000) GS:ff110004abe26000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> [   68.561112] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [   68.561113] CR2: 0000000001c67a30 CR3: 000000010f2f3003 CR4: 00000000003716f0
>>>>>>> [   68.561113] Call Trace:
>>>>>>> [   68.561170]  <IRQ>
>>>>>>> [   68.561174]  irq_exit_rcu+0x91/0xb0
>>>>>>> [   68.561176]  sysvec_apic_timer_interrupt+0x71/0x90
>>>>>>> [   68.561181]  </IRQ>
>>>>>>> [   68.561181]  <TASK>
>>>>>>> [   68.561182]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>>>>>> [   68.561184] RIP: 0010:generic_exec_single+0x33/0x120
>>>>>>> [   68.561188] Code: 65 39 3d 5c 4d 89 02 74 28 3b 3d b8 f5 17 02 0f
>>>>>>> 83 de 00 00 00 89 f8 48 0f a3 05 d8 f1 17 02 0f 83 ce 00 00 00 e8 bd
>>>>>>> fe ff ff <31> c0 5b 5d 9
>>>>>>> [   68.561189] RSP: 0018:ffa00000025a3cc0 EFLAGS: 00000206
>>>>>>> [   68.561190] RAX: 0000000000000000 RBX: ffffffff8145b310 RCX: ff110004abe26000
>>>>>>> [   68.561190] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ff1100042fa26540
>>>>>>> [   68.561191] RBP: 0000000000000202 R08: 0000000000000000 R09: 0000000000000000
>>>>>>> [   68.561191] R10: ffa00000025a3ee0 R11: 0000000000000000 R12: ffa00000025a3d40
>>>>>>> [   68.561192] R13: ff11000100881220 R14: ff11000100880fc0 R15: 0000000000000000
>>>>>>> [   68.561192]  ? sw_perf_event_destroy+0x70/0x70
>>>>>>> [   68.561197]  smp_call_function_single+0xc4/0x110
>>>>>>> [   68.561199]  ? sw_perf_event_destroy+0x70/0x70
>>>>>>> [   68.561200]  event_function_call+0x160/0x170
>>>>>>> [   68.561202]  ? ctx_resched+0x2d0/0x2d0
>>>>>>> [   68.561205]  ? perf_event_set_state+0x60/0x60
>>>>>>> [   68.561206]  ? _perf_event_disable+0x50/0x50
>>>>>>> [   68.561208]  perf_event_for_each_child+0x37/0x80
>>>>>>> [   68.561209]  ? _perf_event_disable+0x50/0x50
>>>>>>> [   68.561211]  _perf_ioctl+0x1df/0xad0
>>>>>>> [   68.561213]  ? __set_cpus_allowed_ptr+0x71/0x80
>>>>>>> [   68.561216]  ? avc_has_perm+0x72/0x160
>>>>>>> [   68.561219]  ? ldsem_down_write+0x1bb/0x1fc
>>>>>>> [   68.561222]  perf_ioctl+0x42/0x70
>>>>>>> [   68.561224]  __x64_sys_ioctl+0x8f/0xd0
>>>>>>> [   68.561226]  do_syscall_64+0x46/0x160
>>>>>>> [   68.561228]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>>>>
>>>>>>> but only after a fresh boot.
>>>>>>>
>>>>>>> If I run bpf selftests before that it works.
>>>>>>> Like test_progs -t stacktrace_build_id_nmi
>>>>>>> followed by the same
>>>>>>> perf record -a -e cpu-clock ..
>>>>>>> it works fine.
>>>>>>>
>>>>>>
>>>>>> There should be a bug in V3. When stops in the throttle, the event
>>>>>> should not be updated, stop(event, 0). But the cpu_clock_event_stop()
>>>>>> doesn't handle the flag. That changes the behavior a little bit.
>>>>>>
>>>>>> Could you please try the below patch and see if it helps?
>>>>>>
>>>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>>>> index cd85b1820e7d..b6c57ba24e78 100644
>>>>>> --- a/kernel/events/core.c
>>>>>> +++ b/kernel/events/core.c
>>>>>> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct
>>>>>> perf_event *event, bool start)
>>>>>>
>>>>>>  static void perf_event_throttle(struct perf_event *event)
>>>>>>  {
>>>>>> -       event->pmu->stop(event, 0);
>>>>>>         event->hw.interrupts = MAX_INTERRUPTS;
>>>>>> +       event->pmu->stop(event, 0);
>>>>>>         if (event == event->group_leader)
>>>>>>                 perf_log_throttle(event, 0);
>>>>>>  }
>>>>>> @@ -11777,7 +11777,12 @@ static void perf_swevent_cancel_hrtimer(struct
>>>>>> perf_event *event)
>>>>>>  {
>>>>>>         struct hw_perf_event *hwc = &event->hw;
>>>>>>
>>>>>> -       if (is_sampling_event(event)) {
>>>>>> +       /*
>>>>>> +        * The throttle can be triggered in the hrtimer handler.
>>>>>> +        * The HRTIMER_NORESTART should be used to stop the timer,
>>>>>> +        * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
>>>>>> +        */
>>>>>> +       if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>>>>>>                 ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>>>>>>                 local64_set(&hwc->period_left, ktime_to_ns(remaining));
>>>>>>
>>>>>> @@ -11832,7 +11837,8 @@ static void cpu_clock_event_start(struct
>>>>>> perf_event *event, int flags)
>>>>>>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>>>>>>  {
>>>>>>         perf_swevent_cancel_hrtimer(event);
>>>>>> -       cpu_clock_event_update(event);
>>>>>> +       if (flags & PERF_EF_UPDATE)
>>>>>> +               cpu_clock_event_update(event);
>>>>>>  }
>>>>>
>>>>> Nope. The last hunk didn't make any difference.
>>>>> Same soft lockup.
>>>>
>>>> Thanks for the verification.
>>>>
>>>> I have some questions, could you please help to clarify?
>>>>
>>>> - What's the value of proc/sys/kernel/perf_event_max_sample_rate in the
>>>> test?
>>>
>>> 100000
>>>
>>>> - Can the soft lockup issue be produced by both bpf selftests and the
>>>> perf record command? Or just the perf record command?
>>>
>>> just perf record.
>>>
>>>> - Can the soft lockup issue be reproduced with the v2 patchset?
>>>> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>
>>> yes. With v2
>>> perf record -a -e cpu-clock -c10000 -- sleep 1
>>>
>>> soft locks up as well with the same stack trace.
>>>
>>>> - Furthermore, can the soft lockup issue be reproduced after reverting
>>>> the recent generic throttle logic fix?
>>>> commit e800ac51202f ("perf: Only dump the throttle log for the leader")
>>>> commit 9734e25fbf5a ("perf: Fix the throttle logic for a group")
>>>
>>> yes.
>>>
>>> perf record -a -e cpu-clock -- sleep 1
>>> works
>>>
>>> but
>>> perf record -a -e cpu-clock -c100 -- sleep 1
>>> locks up.
>>> -c1, -c100, -c10000 all lockup.
>>
>> Thanks for the confirmation.
>>
>> The soft lockup issue is likely an existing issue in virtualization,
>> unrelated to the recent throttle patches.
>>
>> The issue should be caused by the overwhelming overflows.
>> The -c means fixed period. It's too small for the SW event in VM.
>> For 1 GHZ CPU, the -c10000 will trigger 100000 samples per second.
>> -c1 will trigger 1000000000 samples per second.
>> The perf_event_max_sample_rate is set to 100000. The throttle mechanism
>> should be triggered for the above case, but even with 100000 samples per
>> second. It seems still be big number in virtualization.
>>
>> There is no such issue without -c. I think it's because of the
>> auto-adjust mechanism for the freq mode. The period is automatically
>> adjusted, so a reasonable number of samples can be generated per second.
>>
>> If you change the proc/sys/kernel/perf_event_max_sample_rate to a small
>> value, e.g., 1000, I think the issue should be gone.
>>
>> Could you please give a try?
> 
> Smaller value helps, but this is not an acceptable workaround.
> The kernel shouldn't lock up like that with a simple perf command.

Right, but the soft lockup is a different issue. Even without the recent
generic throttle logic change, it still can be triggered, right?

How does the below sound?
I will send a V4 with the small change I made yesterday. Please help to
verify it and see if it fixes your original report (test_progs cases).
If yes, let's merge it first, so it can unblock other people (ARM CI,
IBM CI, Vince...).

Then I will go back and see what I can do for this long-standing issue
in VM.

Thanks,
Kan




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] perf: Fix the throttle error of some clock events
  2025-06-06 18:38                   ` Liang, Kan
@ 2025-06-06 18:40                     ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2025-06-06 18:40 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Ian Rogers,
	Mark Rutland, LKML, linux-perf-use., Stephane Eranian,
	Chun-Tse Shao, Thomas Richter, Leo Yan, Aishwarya TCV,
	Venkat Rao Bagalkote

On Fri, Jun 6, 2025 at 11:38 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
> Right, but the soft lockup is a different issue. Even without the recent
> generic throttle logic change, it still can be triggered, right?
>
> How does the below sound?
> I will send a V4 with the small change I made yesterday. Please help to
> verify it and see if it fixes your original report (test_progs cases).
> If yes, let's merge it first, so it can unblock other people (ARM CI,
> IBM CI, Vince...).

sounds good. definitely address the regression asap,
since BPF CI is also blocked.

> Then I will go back and see what I can do for this long-standing issue
> in VM.
>
> Thanks,
> Kan
>
>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-06-06 18:40 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-04 17:15 [PATCH V3] perf: Fix the throttle error of some clock events kan.liang
2025-06-04 20:08 ` Ian Rogers
2025-06-04 23:21 ` Alexei Starovoitov
2025-06-05 13:46   ` Liang, Kan
2025-06-05 16:58     ` Liang, Kan
2025-06-05 18:45     ` Alexei Starovoitov
2025-06-05 20:24       ` Liang, Kan
2025-06-05 20:46         ` Alexei Starovoitov
2025-06-05 23:50           ` Liang, Kan
2025-06-06  0:39             ` Alexei Starovoitov
2025-06-06 13:05               ` Liang, Kan
2025-06-06 17:42                 ` Alexei Starovoitov
2025-06-06 18:38                   ` Liang, Kan
2025-06-06 18:40                     ` Alexei Starovoitov
2025-06-05  6:39 ` Ingo Molnar
2025-06-05 13:51   ` Liang, Kan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).