From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Dapeng Mi <dapeng1.mi@intel.com>,
	Octavia Togami <octavia.togami@gmail.com>
Subject: Re: [PATCH] perf: Fix system hang caused by cpu-clock
Date: Wed, 22 Oct 2025 13:34:09 +0800
Message-ID: <9f45329d-e49b-4055-bfb0-458db6ae319d@linux.intel.com>
In-Reply-To: <20251021144751.GQ3245006@noisy.programming.kicks-ass.net>


On 10/21/2025 10:47 PM, Peter Zijlstra wrote:
> On Wed, Oct 15, 2025 at 01:18:28PM +0800, Dapeng Mi wrote:
>> A system hang issue caused by cpu-clock is reported and bisection
>> indicates the commit 18dbcbfabfff ("perf: Fix the POLL_HUP delivery
>>  breakage") causes this issue.
>>
>> The root cause of the hang issue is that cpu-clock is a specific SW
>> event which relies on the hrtimer. The __perf_event_overflow()
>> is invoked from the hrtimer handler for cpu-clock event, and
>> __perf_event_overflow() tries to call event stop callback
>> (cpu_clock_event_stop()) to stop the event, and cpu_clock_event_stop()
>> calls hrtimer_cancel() to cancel the hrtimer. But the hrtimer callback
>> is still executing at that point, so hrtimer_cancel() waits for a
>> callback that can never finish, and the CPU deadlocks.
>>
>> To avoid this deadlock, use hrtimer_try_to_cancel() instead of
>> hrtimer_cancel() to cancel the hrtimer, and set PERF_HES_STOPPED flag
>> for the stopping events. perf_swevent_hrtimer() would stop the event
>> hrtimer once it detects the PERF_HES_STOPPED flag.
>>
>> Reported-by: Octavia Togami <octavia.togami@gmail.com>
>> Closes: https://lore.kernel.org/all/CAHPNGSQpXEopYreir+uDDEbtXTBvBvi8c6fYXJvceqtgTPao3Q@mail.gmail.com/
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Fixes: 18dbcbfabfff ("perf: Fix the POLL_HUP delivery breakage")
>> Tested-by: Octavia Togami <octavia.togami@gmail.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>>  kernel/events/core.c | 18 +++++++++++++-----
>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 7541f6f85fcb..f90105d5f26a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -11773,7 +11773,8 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>>  
>>  	event = container_of(hrtimer, struct perf_event, hw.hrtimer);
>>  
>> -	if (event->state != PERF_EVENT_STATE_ACTIVE)
>> +	if (event->state != PERF_EVENT_STATE_ACTIVE ||
>> +	    event->hw.state & PERF_HES_STOPPED)
>>  		return HRTIMER_NORESTART;
>>  
>>  	event->pmu->read(event);
> I was wondering if we need a HES_STOPPED check after calling
> __perf_event_overflow(), but typically that will return 1 when it does
> the stop itself, which then already does NORESTART.

Yes.
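
To make the control flow concrete, here is a minimal user-space sketch of
the logic discussed above. It is not kernel code: every model_* name, the
struct layout, and the overflows_left counter are hypothetical stand-ins,
and the ordering inside model_event_stop() is an assumption rather than a
copy of the patch. It only illustrates why a non-blocking cancel plus a
stopped flag lets the callback wind the timer down through NORESTART.

```c
/* User-space model of the stop-from-callback flow; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

enum model_restart { MODEL_NORESTART = 0, MODEL_RESTART = 1 };

#define MODEL_HES_STOPPED 0x01	/* stands in for PERF_HES_STOPPED */

struct model_event {
	bool active;		/* stands in for PERF_EVENT_STATE_ACTIVE */
	unsigned int hw_state;	/* stands in for event->hw.state */
	int overflows_left;	/* hypothetical: overflows until self-stop */
	bool callback_running;	/* timer callback currently executing */
	bool armed;		/* timer is queued to fire again */
};

/*
 * Models the return convention of hrtimer_try_to_cancel(): never waits.
 * -1: callback is running, 0: timer was not armed, 1: timer was cancelled.
 */
static int model_try_to_cancel(struct model_event *ev)
{
	if (ev->callback_running)
		return -1;
	if (!ev->armed)
		return 0;
	ev->armed = false;
	return 1;
}

/* Assumed stop path: mark the event stopped, then try a non-blocking cancel. */
static int model_event_stop(struct model_event *ev)
{
	ev->hw_state |= MODEL_HES_STOPPED;
	return model_try_to_cancel(ev);
}

/* Models an overflow handler that returns 1 when it stopped the event itself. */
static int model_overflow(struct model_event *ev)
{
	if (--ev->overflows_left > 0)
		return 0;
	model_event_stop(ev);	/* invoked from inside the timer callback */
	return 1;
}

/* Models the check added to the timer handler by the patch. */
static enum model_restart model_handler(struct model_event *ev)
{
	if (!ev->active || (ev->hw_state & MODEL_HES_STOPPED))
		return MODEL_NORESTART;
	if (model_overflow(ev))
		return MODEL_NORESTART;	/* the stop already happened */
	return MODEL_RESTART;
}

/* Models the timer core firing the callback and honoring its return value. */
static enum model_restart model_fire(struct model_event *ev)
{
	enum model_restart ret;

	assert(!ev->callback_running);	/* the model does not re-enter */
	ev->callback_running = true;
	ret = model_handler(ev);
	ev->callback_running = false;
	ev->armed = (ret == MODEL_RESTART);
	return ret;
}
```

In this model, a stop issued from inside the callback sees
model_try_to_cancel() return -1 instead of waiting, and the handler's
MODEL_NORESTART return is what actually disarms the timer. A stop issued
from outside the callback cancels the timer directly, and any later firing
bails out early on the stopped flag.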


>
> So yeah, I suppose this works. Let me go queue this up.

Thanks for reviewing this patch.


>
> Thanks!
>


Thread overview: 6+ messages
2025-10-15  5:18 [PATCH] perf: Fix system hang caused by cpu-clock Dapeng Mi
2025-10-21 14:47 ` Peter Zijlstra
2025-10-22  5:34   ` Mi, Dapeng [this message]
2025-11-03  9:28 ` [tip: perf/urgent] " tip-bot2 for Dapeng Mi
2025-11-03  9:58 ` [tip: perf/urgent] perf/core: Fix system hang caused by cpu-clock usage tip-bot2 for Dapeng Mi
2025-11-03 10:10 ` tip-bot2 for Dapeng Mi
