* [PATCH] perf: Fix system hang caused by cpu-clock
@ 2025-10-15  5:18 Dapeng Mi
  2025-10-21 14:47 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Dapeng Mi @ 2025-10-15  5:18 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Andi Kleen, Eranian Stephane
  Cc: linux-kernel, linux-perf-users, Dapeng Mi, Dapeng Mi,
	Octavia Togami

A system hang caused by cpu-clock has been reported, and bisection
points to commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery
breakage") as the cause.

The root cause of the hang is that cpu-clock is a special SW event
which relies on an hrtimer. __perf_event_overflow() is invoked from
the hrtimer handler for cpu-clock events, and it tries to call the
event stop callback (cpu_clock_event_stop()) to stop the event.
cpu_clock_event_stop() then calls hrtimer_cancel() to cancel the
hrtimer. But the hrtimer callback is still executing at that point,
so hrtimer_cancel() ends up waiting for its own caller and deadlocks.

To avoid this deadlock, use hrtimer_try_to_cancel() instead of
hrtimer_cancel() to cancel the hrtimer, and set the PERF_HES_STOPPED
flag on events that are being stopped. perf_swevent_hrtimer() stops the
event hrtimer (by returning HRTIMER_NORESTART) once it detects the
PERF_HES_STOPPED flag.

Reported-by: Octavia Togami <octavia.togami@gmail.com>
Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
Tested-by: Octavia Togami <octavia.togami@gmail.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 kernel/events/core.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7541f6f85fcb..f90105d5f26a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
 
-	if (event->state != PERF_EVENT_STATE_ACTIVE)
+	if (event->state != PERF_EVENT_STATE_ACTIVE ||
+	    event->hw.state & PERF_HES_STOPPED)
 		return HRTIMER_NORESTART;
 
 	event->pmu->read(event);
@@ -11819,15 +11820,18 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 
 	/*
-	 * The throttle can be triggered in the hrtimer handler.
-	 * The HRTIMER_NORESTART should be used to stop the timer,
-	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 * The event stop can be triggered in the hrtimer handler.
+	 * So use hrtimer_try_to_cancel() instead of hrtimer_cancel()
+	 * to stop the hrtimer() to avoid trapping into a dead loop.
+	 * Simultaneously the event would be set PERF_HES_STOPPED flag,
+	 * perf_swevent_hrtimer() would stop the event hrtimer once it
+	 * detects the PERF_HES_STOPPED flag.
 	 */
 	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-		hrtimer_cancel(&hwc->hrtimer);
+		hrtimer_try_to_cancel(&hwc->hrtimer);
 	}
 }
 
@@ -11871,12 +11875,14 @@ static void cpu_clock_event_update(struct perf_event *event)
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
 }
 
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		cpu_clock_event_update(event);
@@ -11950,12 +11956,14 @@ static void task_clock_event_update(struct perf_event *event, u64 now)
 
 static void task_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
 }
 
 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		task_clock_event_update(event, event->ctx->time);

base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.34.1
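
The difference between the two cancel calls that the changelog relies on
can be sketched in a small userspace model. This is only an illustration
under stated assumptions: struct mock_timer, mock_try_to_cancel(),
mock_cancel() and the max_spins limit are hypothetical names invented
here, not kernel API. The model assumes that hrtimer_cancel() keeps
retrying hrtimer_try_to_cancel() until the callback has finished, which is
why calling it from inside that same callback cannot make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these names are not kernel API. */
struct mock_timer {
	bool callback_running;	/* the handler is currently executing */
	bool queued;		/* the timer is armed */
};

/* Mirrors the return convention of hrtimer_try_to_cancel(). */
static int mock_try_to_cancel(struct mock_timer *t)
{
	if (t->callback_running)
		return -1;	/* a running callback cannot be stopped */
	if (!t->queued)
		return 0;	/* timer was not active */
	t->queued = false;
	return 1;		/* timer was active and has been removed */
}

/*
 * Models a blocking cancel: retry until the callback is no longer
 * running. max_spins exists only so that this model terminates; it
 * returns false when the limit is reached without success.
 */
static bool mock_cancel(struct mock_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (mock_try_to_cancel(t) >= 0)
			return true;
	}
	return false;
}

static void demo_cancel_model(void)
{
	struct mock_timer t = { .callback_running = true, .queued = false };

	/* Called from inside the handler: the blocking cancel never succeeds. */
	assert(mock_try_to_cancel(&t) == -1);
	assert(!mock_cancel(&t, 1000));

	/* Called from outside the handler: an armed timer is removed at once. */
	t.callback_running = false;
	t.queued = true;
	assert(mock_cancel(&t, 1000));
	assert(!t.queued);
}
```

In the model, the non-blocking call simply reports -1 and returns, leaving
it to the handler itself to decline the restart.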



* Re: [PATCH] perf: Fix system hang caused by cpu-clock
  2025-10-15  5:18 [PATCH] perf: Fix system hang caused by cpu-clock Dapeng Mi
@ 2025-10-21 14:47 ` Peter Zijlstra
  2025-10-22  5:34   ` Mi, Dapeng
  2025-11-03  9:28 ` [tip: perf/urgent] " tip-bot2 for Dapeng Mi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2025-10-21 14:47 UTC
  To: Dapeng Mi
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
	linux-kernel, linux-perf-users, Dapeng Mi, Octavia Togami

On Wed, Oct 15, 2025 at 01:18:28PM +0800, Dapeng Mi wrote:
> A system hang issue caused by cpu-clock is reported and bisection
> indicates the commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery
>  breakage") causes this issue.
> 
> The root cause of the hang issue is that cpu-clock is a specific SW
> event which relies on the hrtimer. The __perf_event_overflow()
> is invoked from the hrtimer handler for cpu-clock event, and
> __perf_event_overflow() tries to call event stop callback
> (cpu_clock_event_stop()) to stop the event, and cpu_clock_event_stop()
> calls hrtimer_cancel() to cancel the hrtimer. But unfortunately the
> hrtimer callback is currently executing and then traps into deadlock.
> 
> To avoid this deadlock, use hrtimer_try_to_cancel() instead of
> hrtimer_cancel() to cancel the hrtimer, and set PERF_HES_STOPPED flag
> for the stopping events. perf_swevent_hrtimer() would stop the event
> hrtimer once it detects the PERF_HES_STOPPED flag.
> 
> Reported-by: Octavia Togami <octavia.togami@gmail.com>
> Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
> Tested-by: Octavia Togami <octavia.togami@gmail.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
>  kernel/events/core.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7541f6f85fcb..f90105d5f26a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>  
>  	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>  
> -	if (event->state != PERF_EVENT_STATE_ACTIVE)
> +	if (event->state != PERF_EVENT_STATE_ACTIVE ||
> +	    event->hw.state & PERF_HES_STOPPED)
>  		return HRTIMER_NORESTART;
>  
>  	event->pmu->read(event);

I was wondering if we need a HES_STOPPED check after calling
__perf_event_overflow(), but typically that will return 1 when it does
the stop itself, which then already does NORESTART.

So yeah, I suppose this works. Let me go queue this up.

Thanks!
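
The reasoning in the reply above can be sketched as a small userspace
model of the handler's two exit paths. It is a simplified illustration,
not the kernel code: struct mock_event, mock_overflow(), mock_handler()
and the events_until_limit counter are hypothetical names, and the real
handler does more work (reading the counter, forwarding the timer) than is
shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel definitions. */
enum mock_restart { MOCK_NORESTART, MOCK_RESTART };
#define MOCK_HES_STOPPED 0x01

struct mock_event {
	bool active;		/* models state == PERF_EVENT_STATE_ACTIVE */
	int hw_state;		/* models event->hw.state */
	int events_until_limit;	/* overflow stops the event at zero */
	int samples;		/* number of samples recorded */
};

/* Models the stop callback: only marks the event as stopped. */
static void mock_stop(struct mock_event *e)
{
	e->hw_state = MOCK_HES_STOPPED;
}

/* Models the overflow path: returns 1 when it stopped the event itself. */
static int mock_overflow(struct mock_event *e)
{
	e->samples++;
	if (--e->events_until_limit > 0)
		return 0;
	mock_stop(e);
	return 1;
}

static enum mock_restart mock_handler(struct mock_event *e)
{
	/* Entry check: a stop that happened before this firing. */
	if (!e->active || (e->hw_state & MOCK_HES_STOPPED))
		return MOCK_NORESTART;

	/* A stop issued by the overflow path is reported via its return value. */
	if (mock_overflow(e))
		return MOCK_NORESTART;

	return MOCK_RESTART;
}

static void demo_handler_paths(void)
{
	struct mock_event e = { .active = true, .events_until_limit = 2 };

	assert(mock_handler(&e) == MOCK_RESTART);	/* normal sample */
	assert(mock_handler(&e) == MOCK_NORESTART);	/* overflow stopped it */
	assert(mock_handler(&e) == MOCK_NORESTART);	/* entry check catches it */
	assert(e.samples == 2);
}
```

In this model, no second flag check is needed after the overflow call,
because the only stop that can happen there is already signalled through
the return value.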


* Re: [PATCH] perf: Fix system hang caused by cpu-clock
  2025-10-21 14:47 ` Peter Zijlstra
@ 2025-10-22  5:34   ` Mi, Dapeng
  0 siblings, 0 replies; 6+ messages in thread
From: Mi, Dapeng @ 2025-10-22  5:34 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Ian Rogers,
	Adrian Hunter, Alexander Shishkin, Andi Kleen, Eranian Stephane,
	linux-kernel, linux-perf-users, Dapeng Mi, Octavia Togami


On 10/21/2025 10:47 PM, Peter Zijlstra wrote:
> On Wed, Oct 15, 2025 at 01:18:28PM +0800, Dapeng Mi wrote:
>> A system hang issue caused by cpu-clock is reported and bisection
>> indicates the commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery
>>  breakage") causes this issue.
>>
>> The root cause of the hang issue is that cpu-clock is a specific SW
>> event which relies on the hrtimer. The __perf_event_overflow()
>> is invoked from the hrtimer handler for cpu-clock event, and
>> __perf_event_overflow() tries to call event stop callback
>> (cpu_clock_event_stop()) to stop the event, and cpu_clock_event_stop()
>> calls hrtimer_cancel() to cancel the hrtimer. But unfortunately the
>> hrtimer callback is currently executing and then traps into deadlock.
>>
>> To avoid this deadlock, use hrtimer_try_to_cancel() instead of
>> hrtimer_cancel() to cancel the hrtimer, and set PERF_HES_STOPPED flag
>> for the stopping events. perf_swevent_hrtimer() would stop the event
>> hrtimer once it detects the PERF_HES_STOPPED flag.
>>
>> Reported-by: Octavia Togami <octavia.togami@gmail.com>
>> Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
>> Tested-by: Octavia Togami <octavia.togami@gmail.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  kernel/events/core.c | 18 +++++++++++++-----
>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7541f6f85fcb..f90105d5f26a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>>  
>>  	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>>  
>> -	if (event->state != PERF_EVENT_STATE_ACTIVE)
>> +	if (event->state != PERF_EVENT_STATE_ACTIVE ||
>> +	    event->hw.state & PERF_HES_STOPPED)
>>  		return HRTIMER_NORESTART;
>>  
>>  	event->pmu->read(event);
> I was wondering if we need a HES_STOPPED check after calling
> __perf_event_overflow(), but typically that will return 1 when it does
> the stop itself, which then already does NORESTART.

Yes.


>
> So yeah, I suppose this works. Let me go queue this up.

Thanks for reviewing this patch.


>
> Thanks!
>


* [tip: perf/urgent] perf: Fix system hang caused by cpu-clock
  2025-10-15  5:18 [PATCH] perf: Fix system hang caused by cpu-clock Dapeng Mi
  2025-10-21 14:47 ` Peter Zijlstra
@ 2025-11-03  9:28 ` tip-bot2 for Dapeng Mi
  2025-11-03  9:58 ` [tip: perf/urgent] perf/core: Fix system hang caused by cpu-clock usage tip-bot2 for Dapeng Mi
  2025-11-03 10:10 ` tip-bot2 for Dapeng Mi
  3 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Dapeng Mi @ 2025-11-03  9:28 UTC
  To: linux-tip-commits
  Cc: Octavia Togami, Peter Zijlstra, Dapeng Mi, x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     e33076f34f7a449c0e1808f15d88b2dd9a85979a
Gitweb:        https://git.kernel.org/tip/e33076f34f7a449c0e1808f15d88b2dd9a85979a
Author:        Dapeng Mi <dapeng1.mi@linux.intel.com>
AuthorDate:    Wed, 15 Oct 2025 13:18:28 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 03 Nov 2025 10:26:09 +01:00

perf: Fix system hang caused by cpu-clock

A system hang issue caused by cpu-clock is reported and bisection
indicates the commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery
 breakage") causes this issue.

The root cause of the hang issue is that cpu-clock is a specific SW
event which relies on the hrtimer. The __perf_event_overflow()
is invoked from the hrtimer handler for cpu-clock event, and
__perf_event_overflow() tries to call event stop callback
(cpu_clock_event_stop()) to stop the event, and cpu_clock_event_stop()
calls hrtimer_cancel() to cancel the hrtimer. But unfortunately the
hrtimer callback is currently executing and then traps into deadlock.

To avoid this deadlock, use hrtimer_try_to_cancel() instead of
hrtimer_cancel() to cancel the hrtimer, and set PERF_HES_STOPPED flag
for the stopping events. perf_swevent_hrtimer() would stop the event
hrtimer once it detects the PERF_HES_STOPPED flag.

Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
Reported-by: Octavia Togami <octavia.togami@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Octavia Togami <octavia.togami@gmail.com>
Link: https://patch.msgid.link/20251015051828.12809-1-dapeng1.mi@linux.intel.com
---
 kernel/events/core.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 177e57c..6e4af97 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
 
-	if (event->state != PERF_EVENT_STATE_ACTIVE)
+	if (event->state != PERF_EVENT_STATE_ACTIVE ||
+	    event->hw.state & PERF_HES_STOPPED)
 		return HRTIMER_NORESTART;
 
 	event->pmu->read(event);
@@ -11819,15 +11820,18 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 
 	/*
-	 * The throttle can be triggered in the hrtimer handler.
-	 * The HRTIMER_NORESTART should be used to stop the timer,
-	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 * The event stop can be triggered in the hrtimer handler.
+	 * So use hrtimer_try_to_cancel() instead of hrtimer_cancel()
+	 * to stop the hrtimer() to avoid trapping into a dead loop.
+	 * Simultaneously the event would be set PERF_HES_STOPPED flag,
+	 * perf_swevent_hrtimer() would stop the event hrtimer once it
+	 * detects the PERF_HES_STOPPED flag.
 	 */
 	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-		hrtimer_cancel(&hwc->hrtimer);
+		hrtimer_try_to_cancel(&hwc->hrtimer);
 	}
 }
 
@@ -11871,12 +11875,14 @@ static void cpu_clock_event_update(struct perf_event *event)
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
 }
 
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		cpu_clock_event_update(event);
@@ -11950,12 +11956,14 @@ static void task_clock_event_update(struct perf_event *event, u64 now)
 
 static void task_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
 }
 
 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		task_clock_event_update(event, event->ctx->time);


* [tip: perf/urgent] perf/core: Fix system hang caused by cpu-clock usage
  2025-10-15  5:18 [PATCH] perf: Fix system hang caused by cpu-clock Dapeng Mi
  2025-10-21 14:47 ` Peter Zijlstra
  2025-11-03  9:28 ` [tip: perf/urgent] " tip-bot2 for Dapeng Mi
@ 2025-11-03  9:58 ` tip-bot2 for Dapeng Mi
  2025-11-03 10:10 ` tip-bot2 for Dapeng Mi
  3 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Dapeng Mi @ 2025-11-03  9:58 UTC
  To: linux-tip-commits
  Cc: Octavia Togami, Peter Zijlstra, Dapeng Mi, Ingo Molnar, x86,
	linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     e061eb22817cb15e65b91e46d3fa8cd5ae60f9f4
Gitweb:        https://git.kernel.org/tip/e061eb22817cb15e65b91e46d3fa8cd5ae60f9f4
Author:        Dapeng Mi <dapeng1.mi@linux.intel.com>
AuthorDate:    Wed, 15 Oct 2025 13:18:28 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 03 Nov 2025 10:52:10 +01:00

perf/core: Fix system hang caused by cpu-clock usage

cpu-clock usage by the async-profiler tool can trigger a system hang,
which got bisected back to the following commit by Octavia Togami:

  18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")

The root cause of the hang is that cpu-clock is a special type of SW
event which relies on hrtimers. The __perf_event_overflow() callback
is invoked from the hrtimer handler for cpu-clock events, and
__perf_event_overflow() tries to call cpu_clock_event_stop()
to stop the event, which calls hrtimer_cancel() to cancel the hrtimer.

But that's a recursion into the hrtimer code from a hrtimer handler,
which (unsurprisingly) deadlocks.

To fix this bug, use hrtimer_try_to_cancel() instead, and set
the PERF_HES_STOPPED flag, which causes perf_swevent_hrtimer()
to stop the event once it sees the PERF_HES_STOPPED flag.

[ mingo: Fixed the comments and improved the changelog. ]

Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
Reported-by: Octavia Togami <octavia.togami@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Octavia Togami <octavia.togami@gmail.com>
Link: https://github.com/lucko/spark/issues/530
Link: https://patch.msgid.link/20251015051828.12809-1-dapeng1.mi@linux.intel.com
---
 kernel/events/core.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 177e57c..1fd347d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
 
-	if (event->state != PERF_EVENT_STATE_ACTIVE)
+	if (event->state != PERF_EVENT_STATE_ACTIVE ||
+	    event->hw.state & PERF_HES_STOPPED)
 		return HRTIMER_NORESTART;
 
 	event->pmu->read(event);
@@ -11819,15 +11820,20 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 
 	/*
-	 * The throttle can be triggered in the hrtimer handler.
-	 * The HRTIMER_NORESTART should be used to stop the timer,
-	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 * Careful: this function can be triggered in the hrtimer handler,
+	 * for cpu-clock events, so hrtimer_cancel() would cause a
+	 * deadlock.
+	 *
+	 * So use hrtimer_try_to_cancel() to try to stop the hrtimer,
+	 * and the cpu-clock handler also sets the PERF_HES_STOPPED flag,
+	 * which guarantees that perf_swevent_hrtimer() will stop the
+	 * hrtimer once it sees the PERF_HES_STOPPED flag.
 	 */
 	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-		hrtimer_cancel(&hwc->hrtimer);
+		hrtimer_try_to_cancel(&hwc->hrtimer);
 	}
 }
 
@@ -11871,12 +11877,14 @@ static void cpu_clock_event_update(struct perf_event *event)
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
 }
 
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		cpu_clock_event_update(event);
@@ -11950,12 +11958,14 @@ static void task_clock_event_update(struct perf_event *event, u64 now)
 
 static void task_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
 }
 
 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		task_clock_event_update(event, event->ctx->time);


* [tip: perf/urgent] perf/core: Fix system hang caused by cpu-clock usage
  2025-10-15  5:18 [PATCH] perf: Fix system hang caused by cpu-clock Dapeng Mi
                   ` (2 preceding siblings ...)
  2025-11-03  9:58 ` [tip: perf/urgent] perf/core: Fix system hang caused by cpu-clock usage tip-bot2 for Dapeng Mi
@ 2025-11-03 10:10 ` tip-bot2 for Dapeng Mi
  3 siblings, 0 replies; 6+ messages in thread
From: tip-bot2 for Dapeng Mi @ 2025-11-03 10:10 UTC
  To: linux-tip-commits
  Cc: Octavia Togami, Peter Zijlstra, Dapeng Mi, Ingo Molnar, stable,
	x86, linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     eb3182ef0405ff2f6668fd3e5ff9883f60ce8801
Gitweb:        https://git.kernel.org/tip/eb3182ef0405ff2f6668fd3e5ff9883f60ce8801
Author:        Dapeng Mi <dapeng1.mi@linux.intel.com>
AuthorDate:    Wed, 15 Oct 2025 13:18:28 +08:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Mon, 03 Nov 2025 11:04:19 +01:00

perf/core: Fix system hang caused by cpu-clock usage

cpu-clock usage by the async-profiler tool can trigger a system hang,
which got bisected back to the following commit by Octavia Togami:

  18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")

The root cause of the hang is that cpu-clock is a special type of SW
event which relies on hrtimers. The __perf_event_overflow() callback
is invoked from the hrtimer handler for cpu-clock events, and
__perf_event_overflow() tries to call cpu_clock_event_stop()
to stop the event, which calls hrtimer_cancel() to cancel the hrtimer.

But that's a recursion into the hrtimer code from a hrtimer handler,
which (unsurprisingly) deadlocks.

To fix this bug, use hrtimer_try_to_cancel() instead, and set
the PERF_HES_STOPPED flag, which causes perf_swevent_hrtimer()
to stop the event once it sees the PERF_HES_STOPPED flag.

[ mingo: Fixed the comments and improved the changelog. ]

Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
Reported-by: Octavia Togami <octavia.togami@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Octavia Togami <octavia.togami@gmail.com>
Cc: stable@vger.kernel.org
Link: https://github.com/lucko/spark/issues/530
Link: https://patch.msgid.link/20251015051828.12809-1-dapeng1.mi@linux.intel.com
---
 kernel/events/core.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 177e57c..1fd347d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 
 	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
 
-	if (event->state != PERF_EVENT_STATE_ACTIVE)
+	if (event->state != PERF_EVENT_STATE_ACTIVE ||
+	    event->hw.state & PERF_HES_STOPPED)
 		return HRTIMER_NORESTART;
 
 	event->pmu->read(event);
@@ -11819,15 +11820,20 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 
 	/*
-	 * The throttle can be triggered in the hrtimer handler.
-	 * The HRTIMER_NORESTART should be used to stop the timer,
-	 * rather than hrtimer_cancel(). See perf_swevent_hrtimer()
+	 * Careful: this function can be triggered in the hrtimer handler,
+	 * for cpu-clock events, so hrtimer_cancel() would cause a
+	 * deadlock.
+	 *
+	 * So use hrtimer_try_to_cancel() to try to stop the hrtimer,
+	 * and the cpu-clock handler also sets the PERF_HES_STOPPED flag,
+	 * which guarantees that perf_swevent_hrtimer() will stop the
+	 * hrtimer once it sees the PERF_HES_STOPPED flag.
 	 */
 	if (is_sampling_event(event) && (hwc->interrupts != MAX_INTERRUPTS)) {
 		ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
 		local64_set(&hwc->period_left, ktime_to_ns(remaining));
 
-		hrtimer_cancel(&hwc->hrtimer);
+		hrtimer_try_to_cancel(&hwc->hrtimer);
 	}
 }
 
@@ -11871,12 +11877,14 @@ static void cpu_clock_event_update(struct perf_event *event)
 
 static void cpu_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, local_clock());
 	perf_swevent_start_hrtimer(event);
 }
 
 static void cpu_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		cpu_clock_event_update(event);
@@ -11950,12 +11958,14 @@ static void task_clock_event_update(struct perf_event *event, u64 now)
 
 static void task_clock_event_start(struct perf_event *event, int flags)
 {
+	event->hw.state = 0;
 	local64_set(&event->hw.prev_count, event->ctx->time);
 	perf_swevent_start_hrtimer(event);
 }
 
 static void task_clock_event_stop(struct perf_event *event, int flags)
 {
+	event->hw.state = PERF_HES_STOPPED;
 	perf_swevent_cancel_hrtimer(event);
 	if (flags & PERF_EF_UPDATE)
 		task_clock_event_update(event, event->ctx->time);
