linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V4] perf: Fix the throttle error of some clock events
@ 2025-06-06 19:25 kan.liang
  2025-06-09  7:43 ` Venkat Rao Bagalkote
  2025-06-09 12:34 ` Leo Yan
  0 siblings, 2 replies; 7+ messages in thread
From: kan.liang @ 2025-06-06 19:25 UTC (permalink / raw)
  To: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users
  Cc: eranian, ctshao, tmricht, Kan Liang, Leo Yan, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver

From: Kan Liang <kan.liang@linux.intel.com>

Both ARM and IBM CI reports RCU stall, which can be reproduced by the
below perf command.
  perf record -a -e cpu-clock -- sleep 2

The issue is introduced by the generic throttle patch set, which
unconditionally invoke the event_stop() when throttle is triggered.

The cpu-clock and task-clock are two special SW events, which rely on
the hrtimer. The throttle is invoked in the hrtimer handler. The
event_stop()->hrtimer_cancel() waits for the handler to finish, which is
a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
be used to stop the timer.

There may be two ways to fix it.
- Introduce a PMU flag to track the case. Avoid the event_stop in
  perf_event_throttle() if the flag is detected.
  It has been implemented in the
  https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
  The new flag was thought to be an overkill for the issue.
- Add a check in the event_stop. Return immediately if the throttle is
  invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
  method to stop the timer.

The latter is implemented here.

Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
the order the same as perf_event_unthrottle(). Except the patch, no one
checks the hw.interrupts in the stop(). There is no impact from the
order change.

When stops in the throttle, the event should not be updated,
stop(event, 0). But the cpu_clock_event_stop() doesn't handle the flag.
In logic, it's wrong. But it didn't bring any problems with the old
code, because the stop() was not invoked when handling the throttle.
Checking the flag before updating the event.

Reported-by: Leo Yan <leo.yan@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

The patch is to fix the issue introduced by

  9734e25fbf5a perf: Fix the throttle logic for a group

It is still in the tip.git, I'm not sure if the commit ID is valid. So
the Fixes tag is not added.

There are some discussions regarding to a soft lockup issue.
That is an existing issue, which are not introduced by the above change.
It should be fixed separately.
https://lore.kernel.org/lkml/CAADnVQ+Lx0HWEM8xdLC80wco3BTUPAD_2dQ-3oZFiECZMcw2aQ@mail.gmail.com/

Changes since V3:
- Check before update in event_stop()
- Add Reviewed-by from Ian

 kernel/events/core.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f34c99f8ce8f..cc77f127e11a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
 
 static void perf_event_throttle(struct perf_event *event)
 {
-	event->pmu->stop(event, 0);
 	event->hw.interrupts = MAX_INTERRUPTS;
+	event->pmu->stop(event, 0);
 	if (event == event->group_leader)
 		perf_log_throttle(event, 0);
 }
@@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (is_sampling_event(event)) {
+	/*
+	 * The throttle can be triggered in the hrtimer handler.
+	 * The HRTIMER_NORESTART should be used to stop the timer,
+	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 */
+	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
@@ -11804,7 +11809,8 @@ static void cpu_clock_event_start(struct perf_event *event, int flags)
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
-	cpu_clock_event_update(event);
+	if (flags & PERF_EF_UPDATE)
+		cpu_clock_event_update(event);
 }
 
 static int cpu_clock_event_add(struct perf_event *event, int flags)
@@ -11882,7 +11888,8 @@ static void task_clock_event_start(struct perf_event *event, int flags)
 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
 	perf_swevent_cancel_hrtimer(event);
-	task_clock_event_update(event, event->ctx->time);
+	if (flags & PERF_EF_UPDATE)
+		task_clock_event_update(event, event->ctx->time);
 }
 
 static int task_clock_event_add(struct perf_event *event, int flags)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-06 19:25 [PATCH V4] perf: Fix the throttle error of some clock events kan.liang
@ 2025-06-09  7:43 ` Venkat Rao Bagalkote
  2025-06-09 12:34 ` Leo Yan
  1 sibling, 0 replies; 7+ messages in thread
From: Venkat Rao Bagalkote @ 2025-06-09  7:43 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, namhyung, irogers, mark.rutland,
	linux-kernel, linux-perf-users
  Cc: eranian, ctshao, tmricht, Leo Yan, Aishwarya TCV,
	Alexei Starovoitov, Vince Weaver


On 07/06/25 12:55 am, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Both ARM and IBM CI reports RCU stall, which can be reproduced by the
> below perf command.
>    perf record -a -e cpu-clock -- sleep 2
>
> The issue is introduced by the generic throttle patch set, which
> unconditionally invoke the event_stop() when throttle is triggered.
>
> The cpu-clock and task-clock are two special SW events, which rely on
> the hrtimer. The throttle is invoked in the hrtimer handler. The
> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
> be used to stop the timer.
>
> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>    perf_event_throttle() if the flag is detected.
>    It has been implemented in the
>    https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>    The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>    invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>    method to stop the timer.
>
> The latter is implemented here.
>
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
>
> When stops in the throttle, the event should not be updated,
> stop(event, 0). But the cpu_clock_event_stop() doesn't handle the flag.
> In logic, it's wrong. But it didn't bring any problems with the old
> code, because the stop() was not invoked when handling the throttle.
> Checking the flag before updating the event.
>
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Reported-by: Vince Weaver <vincent.weaver@maine.edu>
> Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/
> Reviewed-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---


Tested this patch by applying on top of 6.16.0-rc1, and it fixes the 
reported issue. Hence,


Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>


Regards,

Venkat.

>
> The patch is to fix the issue introduced by
>
>    9734e25fbf5a perf: Fix the throttle logic for a group
>
> It is still in the tip.git, I'm not sure if the commit ID is valid. So
> the Fixes tag is not added.
>
> There are some discussions regarding to a soft lockup issue.
> That is an existing issue, which are not introduced by the above change.
> It should be fixed separately.
> https://lore.kernel.org/lkml/CAADnVQ+Lx0HWEM8xdLC80wco3BTUPAD_2dQ-3oZFiECZMcw2aQ@mail.gmail.com/
>
> Changes since V3:
> - Check before update in event_stop()
> - Add Reviewed-by from Ian
>
>   kernel/events/core.c | 15 +++++++++++----
>   1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f34c99f8ce8f..cc77f127e11a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>   
>   static void perf_event_throttle(struct perf_event *event)
>   {
> -	event->pmu->stop(event, 0);
>   	event->hw.interrupts = MAX_INTERRUPTS;
> +	event->pmu->stop(event, 0);
>   	if (event == event->group_leader)
>   		perf_log_throttle(event, 0);
>   }
> @@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>   {
>   	struct hw_perf_event *hwc = &event->hw;
>   
> -	if (is_sampling_event(event)) {
> +	/*
> +	 * The throttle can be triggered in the hrtimer handler.
> +	 * The HRTIMER_NORESTART should be used to stop the timer,
> +	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> +	 */
> +	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>   		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>   		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>   
> @@ -11804,7 +11809,8 @@ static void cpu_clock_event_start(struct perf_event *event, int flags)
>   static void cpu_clock_event_stop(struct perf_event *event, int flags)
>   {
>   	perf_swevent_cancel_hrtimer(event);
> -	cpu_clock_event_update(event);
> +	if (flags & PERF_EF_UPDATE)
> +		cpu_clock_event_update(event);
>   }
>   
>   static int cpu_clock_event_add(struct perf_event *event, int flags)
> @@ -11882,7 +11888,8 @@ static void task_clock_event_start(struct perf_event *event, int flags)
>   static void task_clock_event_stop(struct perf_event *event, int flags)
>   {
>   	perf_swevent_cancel_hrtimer(event);
> -	task_clock_event_update(event, event->ctx->time);
> +	if (flags & PERF_EF_UPDATE)
> +		task_clock_event_update(event, event->ctx->time);
>   }
>   
>   static int task_clock_event_add(struct perf_event *event, int flags)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-06 19:25 [PATCH V4] perf: Fix the throttle error of some clock events kan.liang
  2025-06-09  7:43 ` Venkat Rao Bagalkote
@ 2025-06-09 12:34 ` Leo Yan
  2025-06-09 13:48   ` Liang, Kan
  1 sibling, 1 reply; 7+ messages in thread
From: Leo Yan @ 2025-06-09 12:34 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver

Hi Kan,

On Fri, Jun 06, 2025 at 12:25:46PM -0700, kan.liang@linux.intel.com wrote:

[...]

> There may be two ways to fix it.
> - Introduce a PMU flag to track the case. Avoid the event_stop in
>   perf_event_throttle() if the flag is detected.
>   It has been implemented in the
>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>   The new flag was thought to be an overkill for the issue.
> - Add a check in the event_stop. Return immediately if the throttle is
>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>   method to stop the timer.
> 
> The latter is implemented here.
> 
> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> the order the same as perf_event_unthrottle(). Except the patch, no one
> checks the hw.interrupts in the stop(). There is no impact from the
> order change.
> 
> When stops in the throttle, the event should not be updated,
> stop(event, 0).

I am confused for this conclusion. When a CPU or task clock event is
stopped by throttling, should it also be updated? Otherwise, we will
lose accouting for the period prior to the throttling.

I saw you exchanged with Alexei for a soft lockup issue, the reply [1]
shows that skipping event update on throttling does not help to
resolve the lockup issue.

Could you elaberate why we don't need to update clock events when
throttling?

Thanks,
Leo

[1] https://lore.kernel.org/linux-perf-users/CAADnVQKRJKsG08KkEriuBQop0LgDr+c9rkNE6MUh_n3rzZoXVQ@mail.gmail.com/

> But the cpu_clock_event_stop() doesn't handle the flag.
> In logic, it's wrong. But it didn't bring any problems with the old
> code, because the stop() was not invoked when handling the throttle.
> Checking the flag before updating the event.
> 
> Reported-by: Leo Yan <leo.yan@arm.com>
> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
> Reported-by: Vince Weaver <vincent.weaver@maine.edu>
> Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/
> Reviewed-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> 
> The patch is to fix the issue introduced by
> 
>   9734e25fbf5a perf: Fix the throttle logic for a group
> 
> It is still in the tip.git, I'm not sure if the commit ID is valid. So
> the Fixes tag is not added.
> 
> There are some discussions regarding to a soft lockup issue.
> That is an existing issue, which are not introduced by the above change.
> It should be fixed separately.
> https://lore.kernel.org/lkml/CAADnVQ+Lx0HWEM8xdLC80wco3BTUPAD_2dQ-3oZFiECZMcw2aQ@mail.gmail.com/
> 
> Changes since V3:
> - Check before update in event_stop()
> - Add Reviewed-by from Ian
> 
>  kernel/events/core.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f34c99f8ce8f..cc77f127e11a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>  
>  static void perf_event_throttle(struct perf_event *event)
>  {
> -	event->pmu->stop(event, 0);
>  	event->hw.interrupts = MAX_INTERRUPTS;
> +	event->pmu->stop(event, 0);
>  	if (event == event->group_leader)
>  		perf_log_throttle(event, 0);
>  }
> @@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  
> -	if (is_sampling_event(event)) {
> +	/*
> +	 * The throttle can be triggered in the hrtimer handler.
> +	 * The HRTIMER_NORESTART should be used to stop the timer,
> +	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
> +	 */
> +	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>  		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>  		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>  
> @@ -11804,7 +11809,8 @@ static void cpu_clock_event_start(struct perf_event *event, int flags)
>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>  {
>  	perf_swevent_cancel_hrtimer(event);
> -	cpu_clock_event_update(event);
> +	if (flags & PERF_EF_UPDATE)
> +		cpu_clock_event_update(event);
>  }
>  
>  static int cpu_clock_event_add(struct perf_event *event, int flags)
> @@ -11882,7 +11888,8 @@ static void task_clock_event_start(struct perf_event *event, int flags)
>  static void task_clock_event_stop(struct perf_event *event, int flags)
>  {
>  	perf_swevent_cancel_hrtimer(event);
> -	task_clock_event_update(event, event->ctx->time);
> +	if (flags & PERF_EF_UPDATE)
> +		task_clock_event_update(event, event->ctx->time);
>  }
>  
>  static int task_clock_event_add(struct perf_event *event, int flags)
> -- 
> 2.38.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-09 12:34 ` Leo Yan
@ 2025-06-09 13:48   ` Liang, Kan
  2025-06-09 18:36     ` Leo Yan
  0 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2025-06-09 13:48 UTC (permalink / raw)
  To: Leo Yan
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver



On 2025-06-09 8:34 a.m., Leo Yan wrote:
> Hi Kan,
> 
> On Fri, Jun 06, 2025 at 12:25:46PM -0700, kan.liang@linux.intel.com wrote:
> 
> [...]
> 
>> There may be two ways to fix it.
>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>   perf_event_throttle() if the flag is detected.
>>   It has been implemented in the
>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>   The new flag was thought to be an overkill for the issue.
>> - Add a check in the event_stop. Return immediately if the throttle is
>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>   method to stop the timer.
>>
>> The latter is implemented here.
>>
>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>> the order the same as perf_event_unthrottle(). Except the patch, no one
>> checks the hw.interrupts in the stop(). There is no impact from the
>> order change.
>>
>> When stops in the throttle, the event should not be updated,
>> stop(event, 0).
> 
> I am confused for this conclusion. When a CPU or task clock event is
> stopped by throttling, should it also be updated? Otherwise, we will
> lose accouting for the period prior to the throttling.
> 
> I saw you exchanged with Alexei for a soft lockup issue, the reply [1]
> shows that skipping event update on throttling does not help to
> resolve the lockup issue.
> 
> Could you elaberate why we don't need to update clock events when
> throttling?
> 

This is to follow the existing behavior before the throttling fix*.
When throttling is triggered, the stop(event, 0); will be invoked.
As my understanding, it's because the period is not changed with
throttling. So we don't need to update the period.

But if the period is changed, the update is required. You may find an
example in the perf_adjust_freq_unthr_events(). In the freq mode,
stop(event, PERF_EF_UPDATE) is actually invoked for the triggered event.

For the clock event, the existing behavior before the throttling fix* is
not to invoke the stop() in throttling. It relies on the
HRTIMER_NORESTART instead. My previous throttling fix changes the
behavior. It invokes both stop() and HRTIMER_NORESTART. Now, this patch
change the behavior back.

Regarding the soft lockup issue, it's a different issue in
virtualization. It should be an old issue which is not introduced by my
throttling fix.

Thanks,
Kan

* The throttling fix is 9734e25fbf5a ("perf: Fix the throttle logic for
a group")


> Thanks,
> Leo
> 
> [1] https://lore.kernel.org/linux-perf-users/CAADnVQKRJKsG08KkEriuBQop0LgDr+c9rkNE6MUh_n3rzZoXVQ@mail.gmail.com/
> 
>> But the cpu_clock_event_stop() doesn't handle the flag.
>> In logic, it's wrong. But it didn't bring any problems with the old
>> code, because the stop() was not invoked when handling the throttle.
>> Checking the flag before updating the event.
>>
>> Reported-by: Leo Yan <leo.yan@arm.com>
>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>> Reported-by: Vince Weaver <vincent.weaver@maine.edu>
>> Closes: https://lore.kernel.org/lkml/4ce106d0-950c-aadc-0b6a-f0215cd39913@maine.edu/
>> Reviewed-by: Ian Rogers <irogers@google.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>
>> The patch is to fix the issue introduced by
>>
>>   9734e25fbf5a perf: Fix the throttle logic for a group
>>
>> It is still in the tip.git, I'm not sure if the commit ID is valid. So
>> the Fixes tag is not added.
>>
>> There are some discussions regarding to a soft lockup issue.
>> That is an existing issue, which are not introduced by the above change.
>> It should be fixed separately.
>> https://lore.kernel.org/lkml/CAADnVQ+Lx0HWEM8xdLC80wco3BTUPAD_2dQ-3oZFiECZMcw2aQ@mail.gmail.com/
>>
>> Changes since V3:
>> - Check before update in event_stop()
>> - Add Reviewed-by from Ian
>>
>>  kernel/events/core.c | 15 +++++++++++----
>>  1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index f34c99f8ce8f..cc77f127e11a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -2656,8 +2656,8 @@ static void perf_event_unthrottle(struct perf_event *event, bool start)
>>  
>>  static void perf_event_throttle(struct perf_event *event)
>>  {
>> -	event->pmu->stop(event, 0);
>>  	event->hw.interrupts = MAX_INTERRUPTS;
>> +	event->pmu->stop(event, 0);
>>  	if (event == event->group_leader)
>>  		perf_log_throttle(event, 0);
>>  }
>> @@ -11749,7 +11749,12 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>>  {
>>  	struct hw_perf_event *hwc = &event->hw;
>>  
>> -	if (is_sampling_event(event)) {
>> +	/*
>> +	 * The throttle can be triggered in the hrtimer handler.
>> +	 * The HRTIMER_NORESTART should be used to stop the timer,
>> +	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
>> +	 */
>> +	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
>>  		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
>>  		local64_set(&hwc->period_left, ktime_to_ns(remaining));
>>  
>> @@ -11804,7 +11809,8 @@ static void cpu_clock_event_start(struct perf_event *event, int flags)
>>  static void cpu_clock_event_stop(struct perf_event *event, int flags)
>>  {
>>  	perf_swevent_cancel_hrtimer(event);
>> -	cpu_clock_event_update(event);
>> +	if (flags & PERF_EF_UPDATE)
>> +		cpu_clock_event_update(event);
>>  }
>>  
>>  static int cpu_clock_event_add(struct perf_event *event, int flags)
>> @@ -11882,7 +11888,8 @@ static void task_clock_event_start(struct perf_event *event, int flags)
>>  static void task_clock_event_stop(struct perf_event *event, int flags)
>>  {
>>  	perf_swevent_cancel_hrtimer(event);
>> -	task_clock_event_update(event, event->ctx->time);
>> +	if (flags & PERF_EF_UPDATE)
>> +		task_clock_event_update(event, event->ctx->time);
>>  }
>>  
>>  static int task_clock_event_add(struct perf_event *event, int flags)
>> -- 
>> 2.38.1
>>
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-09 13:48   ` Liang, Kan
@ 2025-06-09 18:36     ` Leo Yan
  2025-06-09 19:59       ` Liang, Kan
  0 siblings, 1 reply; 7+ messages in thread
From: Leo Yan @ 2025-06-09 18:36 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver

On Mon, Jun 09, 2025 at 09:48:12AM -0400, Liang, Kan wrote:

[...]

> >> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
> >> the order the same as perf_event_unthrottle(). Except the patch, no one
> >> checks the hw.interrupts in the stop(). There is no impact from the
> >> order change.
> >>
> >> When stops in the throttle, the event should not be updated,
> >> stop(event, 0).
> > 
> > I am confused for this conclusion. When a CPU or task clock event is
> > stopped by throttling, should it also be updated? Otherwise, we will
> > lose accouting for the period prior to the throttling.
> > 
> > I saw you exchanged with Alexei for a soft lockup issue, the reply [1]
> > shows that skipping event update on throttling does not help to
> > resolve the lockup issue.
> > 
> > Could you elaberate why we don't need to update clock events when
> > throttling?
> > 
> 
> This is to follow the existing behavior before the throttling fix*.
>
> When throttling is triggered, the stop(event, 0); will be invoked.
> As my understanding, it's because the period is not changed with
> throttling. So we don't need to update the period.

> But if the period is changed, the update is required. You may find an
> example in the perf_adjust_freq_unthr_events(). In the freq mode,
> stop(event, PERF_EF_UPDATE) is actually invoked for the triggered event.

> For the clock event, the existing behavior before the throttling fix* is
> not to invoke the stop() in throttling. It relies on the
> HRTIMER_NORESTART instead. My previous throttling fix changes the
> behavior. It invokes both stop() and HRTIMER_NORESTART. Now, this patch
> change the behavior back.

Actually, the "event->count" has been updated in perf_swevent_hrtimer(),
this is why this patch does not cause big deviation if skip updating
count in the ->stop() callback:

  perf_swevent_hrtimer()
   ` event->pmu->read(event);               => Update count
   ` __perf_event_overflow()
      ` perf_event_throttle()
         ` event->pmu->stop(event, 0) / cpu_clock_event_stop()
            ` perf_swevent_cancel_hrtimer() => Skip to cancel timer
            ` task_clock_event_update()     => Skip to update count
   ` return HRTIMER_NORESTART;              => Stop timer

It is a bit urgly that we check the throttling separately in two
places: one is in perf_swevent_cancel_hrtime() for skipping cancel
timer, and then we skip updating event count in
cpu_clock_event_stop().

One solution is it would be fine to update count in ->stop() callback
for the throttling. This should not cause any issue (though it is a bit
redundant that the count is updated twice).

Or even more clear, we can define a flag PERF_EF_THROTTLING:

    #define PERF_EF_THROTTLING  0x20

    event->pmu->stop(event, PERF_EF_THROTTLING);

    cpu_clock_event_stop(struct perf_event *event, int flags)
    {
        if (flags == PERF_EF_THROTTLING)
            return;

        ....
    }

This might need to do a wider checking to ensure this new flags will not
cause any issues.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-09 18:36     ` Leo Yan
@ 2025-06-09 19:59       ` Liang, Kan
  2025-06-10 12:13         ` Leo Yan
  0 siblings, 1 reply; 7+ messages in thread
From: Liang, Kan @ 2025-06-09 19:59 UTC (permalink / raw)
  To: Leo Yan
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver



On 2025-06-09 2:36 p.m., Leo Yan wrote:
> On Mon, Jun 09, 2025 at 09:48:12AM -0400, Liang, Kan wrote:
> 
> [...]
> 
>>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). It makes
>>>> the order the same as perf_event_unthrottle(). Except the patch, no one
>>>> checks the hw.interrupts in the stop(). There is no impact from the
>>>> order change.
>>>>
>>>> When stops in the throttle, the event should not be updated,
>>>> stop(event, 0).
>>>
>>> I am confused for this conclusion. When a CPU or task clock event is
>>> stopped by throttling, should it also be updated? Otherwise, we will
>>> lose accouting for the period prior to the throttling.
>>>
>>> I saw you exchanged with Alexei for a soft lockup issue, the reply [1]
>>> shows that skipping event update on throttling does not help to
>>> resolve the lockup issue.
>>>
>>> Could you elaberate why we don't need to update clock events when
>>> throttling?
>>>
>>
>> This is to follow the existing behavior before the throttling fix*.
>>
>> When throttling is triggered, the stop(event, 0); will be invoked.
>> As my understanding, it's because the period is not changed with
>> throttling. So we don't need to update the period.
> 
>> But if the period is changed, the update is required. You may find an
>> example in the perf_adjust_freq_unthr_events(). In the freq mode,
>> stop(event, PERF_EF_UPDATE) is actually invoked for the triggered event.
> 
>> For the clock event, the existing behavior before the throttling fix* is
>> not to invoke the stop() in throttling. It relies on the
>> HRTIMER_NORESTART instead. My previous throttling fix changes the
>> behavior. It invokes both stop() and HRTIMER_NORESTART. Now, this patch
>> change the behavior back.
> 
> Actually, the "event->count" has been updated in perf_swevent_hrtimer(),
> this is why this patch does not cause big deviation if skip updating
> count in the ->stop() callback:
> >   perf_swevent_hrtimer()
>    ` event->pmu->read(event);               => Update count
>    ` __perf_event_overflow()
>       ` perf_event_throttle()
>          ` event->pmu->stop(event, 0) / cpu_clock_event_stop()
>             ` perf_swevent_cancel_hrtimer() => Skip to cancel timer
>             ` task_clock_event_update()     => Skip to update count
>    ` return HRTIMER_NORESTART;              => Stop timer
> 
> It is a bit urgly that we check the throttling separately in two
> places: one is in perf_swevent_cancel_hrtime() for skipping cancel
> timer, and then we skip updating event count in
> cpu_clock_event_stop().

The second check before cpu_clock_event_stop() is not a throttling
check. It's to implement the missed flag check.
Usually, the stop() should check PERF_EF_UPDATE before updating an
event. I think most of the ARCHs do so.
Some cases may ignore the flag. For the clock event, I think it's
because the stop(event, 0) is never invoked. So it doesn't matter if the
flag is checked. But now, there is a case which the flag matters.
So I think we should add the flag check.

> 
> One solution is it would be fine to update count in ->stop() callback
> for the throttling. This should not cause any issue (though it is a bit
> redundant that the count is updated twice).

The clock event relies on local_clock(), which never stops.
So it still counts between read() and stop().
It's not just redundant. The behavior is changed if the event is updated
in the stop() again.

> 
> Or even more clear, we can define a flag PERF_EF_THROTTLING:
> 
>     #define PERF_EF_THROTTLING  0x20
> 
>     event->pmu->stop(event, PERF_EF_THROTTLING);
> 

The if (hwc->interrupts != MAX_INTERRUPTS) should be good enough to
check the throttling case. I don't think we need a new flag here.

>     cpu_clock_event_stop(struct perf_event *event, int flags)
>     {
>         if (flags == PERF_EF_THROTTLING)
>             return;
> 
>         ....
>     }
> 
> This might need to do a wider checking to ensure this new flags will not
> cause any issues.

Right, it may brings more troubles.

I think we should properly utilize the existing flag rather than
introducing a new one.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V4] perf: Fix the throttle error of some clock events
  2025-06-09 19:59       ` Liang, Kan
@ 2025-06-10 12:13         ` Leo Yan
  0 siblings, 0 replies; 7+ messages in thread
From: Leo Yan @ 2025-06-10 12:13 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, namhyung, irogers, mark.rutland, linux-kernel,
	linux-perf-users, eranian, ctshao, tmricht, Aishwarya TCV,
	Alexei Starovoitov, Venkat Rao Bagalkote, Vince Weaver

On Mon, Jun 09, 2025 at 03:59:41PM -0400, Liang, Kan wrote:

[...]

> >> When throttling is triggered, the stop(event, 0); will be invoked.
> >> As my understanding, it's because the period is not changed with
> >> throttling. So we don't need to update the period.
> > 
> >> But if the period is changed, the update is required. You may find an
> >> example in the perf_adjust_freq_unthr_events(). In the freq mode,
> >> stop(event, PERF_EF_UPDATE) is actually invoked for the triggered event.
> > 
> >> For the clock event, the existing behavior before the throttling fix* is
> >> not to invoke the stop() in throttling. It relies on the
> >> HRTIMER_NORESTART instead. My previous throttling fix changes the
> >> behavior. It invokes both stop() and HRTIMER_NORESTART. Now, this patch
> >> change the behavior back.
> > 
> > Actually, the "event->count" has been updated in perf_swevent_hrtimer(),
> > this is why this patch does not cause big deviation if skip updating
> > count in the ->stop() callback:
> > >   perf_swevent_hrtimer()
> >    ` event->pmu->read(event);               => Update count
> >    ` __perf_event_overflow()
> >       ` perf_event_throttle()
> >          ` event->pmu->stop(event, 0) / cpu_clock_event_stop()
> >             ` perf_swevent_cancel_hrtimer() => Skip to cancel timer
> >             ` task_clock_event_update()     => Skip to update count
> >    ` return HRTIMER_NORESTART;              => Stop timer
> > 
> > It is a bit urgly that we check the throttling separately in two
> > places: one is in perf_swevent_cancel_hrtime() for skipping cancel
> > timer, and then we skip updating event count in
> > cpu_clock_event_stop().
> 
> The second check before cpu_clock_event_stop() is not a throttling
> check. It's to implement the missed flag check.
> Usually, the stop() should check PERF_EF_UPDATE before updating an
> event. I think most of the ARCHs do so.
> Some cases may ignore the flag. For the clock event, I think it's
> because the stop(event, 0) is never invoked. So it doesn't matter if the
> flag is checked. But now, there is a case which the flag matters.
> So I think we should add the flag check.
> 
> > 
> > One solution is it would be fine to update count in ->stop() callback
> > for the throttling. This should not cause any issue (though it is a bit
> > redundant that the count is updated twice).
> 
> The clock event relies on local_clock(), which never stops.

Ah, good point!

> So it still counts between read() and stop().
> It's not just redundant. The behavior is changed if the event is updated
> in the stop() again.

> > Or even more clear, we can define a flag PERF_EF_THROTTLING:
> > 
> >     #define PERF_EF_THROTTLING  0x20
> > 
> >     event->pmu->stop(event, PERF_EF_THROTTLING);
> > 
> 
> The if (hwc->interrupts != MAX_INTERRUPTS) should be good enough to
> check the throttling case. I don't think we need a new flag here.

Makes sense to me.

Thanks,
Leo

> >     cpu_clock_event_stop(struct perf_event *event, int flags)
> >     {
> >         if (flags == PERF_EF_THROTTLING)
> >             return;
> > 
> >         ....
> >     }
> > 
> > This might need to do a wider checking to ensure this new flags will not
> > cause any issues.
> 
> Right, it may brings more troubles.
> 
> I think we should properly utilize the existing flag rather than
> introducing a new one.
> 
> Thanks,
> Kan
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-06-10 12:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-06 19:25 [PATCH V4] perf: Fix the throttle error of some clock events kan.liang
2025-06-09  7:43 ` Venkat Rao Bagalkote
2025-06-09 12:34 ` Leo Yan
2025-06-09 13:48   ` Liang, Kan
2025-06-09 18:36     ` Leo Yan
2025-06-09 19:59       ` Liang, Kan
2025-06-10 12:13         ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).