From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-perf-use." <linux-perf-users@vger.kernel.org>,
	Stephane Eranian <eranian@google.com>,
	Chun-Tse Shao <ctshao@google.com>,
	Thomas Richter <tmricht@linux.ibm.com>, Leo Yan <leo.yan@arm.com>,
	Aishwarya TCV <aishwarya.tcv@arm.com>,
	Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Subject: Re: [PATCH V3] perf: Fix the throttle error of some clock events
Date: Thu, 5 Jun 2025 12:58:38 -0400
Message-ID: <b5c3cc21-26d6-4e87-84de-fa99909bdf1c@linux.intel.com>
In-Reply-To: <d5fcf34f-63fe-451b-89ad-621c38981709@linux.intel.com>



On 2025-06-05 9:46 a.m., Liang, Kan wrote:
> 
> 
> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> Both ARM and IBM CI report an RCU stall, which can be reproduced by the
>>> perf command below.
>>>   perf record -a -e cpu-clock -- sleep 2
>>>
>>> The issue is introduced by the generic throttle patch set, which
>>> unconditionally invokes event_stop() when the throttle is triggered.
>>>
>>> The cpu-clock and task-clock are two special SW events that rely on
>>> the hrtimer. The throttle is invoked in the hrtimer handler, so
>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>> a deadlock. Instead of invoking stop(), HRTIMER_NORESTART should
>>> be used to stop the timer.
>>>
>>> There are two possible ways to fix it.
>>> - Introduce a PMU flag to track the case. Avoid the event_stop() in
>>>   perf_event_throttle() if the flag is detected.
>>>   This has been implemented in
>>>   https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>>   The new flag was considered overkill for the issue.
>>> - Add a check in the event_stop(). Return immediately if the throttle is
>>>   invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>>   method to stop the timer.
>>>
>>> The latter is implemented here.
>>>
>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). This makes
>>> the order the same as in perf_event_unthrottle(). Apart from this patch,
>>> nothing checks hw.interrupts in the stop(), so the order change has no
>>> impact.
>>>
>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>
>> It seems the patch fixes one issue and introduces another?
>>
>> Looks like the throttle event is sticky.
>> Once it's reached, the perf_event no longer works?
> 
> No. It should still work even after the throttle is triggered.
> 
> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 > /proc/sys/kernel/perf_event_max_sample_rate'
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.564 MB perf.data (613 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.545 MB perf.data (502 samples) ]
> 
> 
>> Repro:
>> test_progs -t perf_branches/perf_branches_no_hw
>> #250/2   perf_branches/perf_branches_no_hw:OK
>>
>> test_progs -t stacktrace_build_id_nmi
>> #393     stacktrace_build_id_nmi:OK
>>
>> test_progs -t perf_branches/perf_branches_no_hw
>> perf_branches/perf_branches_no_hw:FAIL
>>
> 
> Do you have more logs showing where it failed?
> 
> Thanks,
> Kan
>> Maybe it's an unrelated bug.

I tried the tests on my machine. I don't see any issues.

sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t stacktrace_build_id_nmi
#376     stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t stacktrace_build_id_nmi
#376     stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2   perf_branches/perf_branches_no_hw:OK
#240     perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED


Thanks,
Kan
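
For readers following the quoted patch description, the deadlock and the
early-return fix can be illustrated with a small user-space model. This is
only a sketch, not the kernel code: ClockEvent, cancel_timer(), fire() and
the two stop_*() helpers are hypothetical names, MAX_INTERRUPTS is an
arbitrary sentinel here, and a real hrtimer_cancel() would spin rather than
raise an exception.

```python
from dataclasses import dataclass
from enum import Enum

MAX_INTERRUPTS = -1  # assumed sentinel for "throttled"; not the kernel's value


class Restart(Enum):
    NORESTART = 0
    RESTART = 1


class WouldDeadlock(RuntimeError):
    """Marks the point where a synchronous cancel would wait forever."""


@dataclass
class ClockEvent:
    interrupts: int = 0
    handler_running: bool = False
    timer_armed: bool = True


def cancel_timer(ev):
    # Model of a synchronous cancel: it waits for a running handler to
    # finish, so calling it from inside that handler can never complete.
    if ev.handler_running:
        raise WouldDeadlock("cancel called from the timer handler")
    ev.timer_armed = False


def stop_unconditional(ev):
    # Behaviour described as the bug: always cancel the timer.
    cancel_timer(ev)


def stop_with_check(ev):
    # Behaviour described as the fix: if the event was just throttled,
    # return and let the handler's NORESTART result stop the timer.
    if ev.interrupts == MAX_INTERRUPTS:
        return
    cancel_timer(ev)


def timer_handler(ev, max_per_tick, stop):
    ev.interrupts += 1
    if ev.interrupts > max_per_tick:
        ev.interrupts = MAX_INTERRUPTS  # set before stop(), as in the patch text
        stop(ev)
        return Restart.NORESTART
    return Restart.RESTART


def fire(ev, max_per_tick, stop):
    # Model of one timer expiry: run the handler, then honour its result.
    ev.handler_running = True
    try:
        result = timer_handler(ev, max_per_tick, stop)
    finally:
        ev.handler_running = False
    if result is Restart.NORESTART:
        ev.timer_armed = False
    return result
```

With stop_with_check, the second expiry in this model throttles the event
and the timer is left unarmed through the NORESTART return value; with
stop_unconditional, the same expiry reaches cancel_timer() from inside the
handler, which is the situation the patch description calls a deadlock.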

Thread overview: 16+ messages
2025-06-04 17:15 [PATCH V3] perf: Fix the throttle error of some clock events kan.liang
2025-06-04 20:08 ` Ian Rogers
2025-06-04 23:21 ` Alexei Starovoitov
2025-06-05 13:46   ` Liang, Kan
2025-06-05 16:58     ` Liang, Kan [this message]
2025-06-05 18:45     ` Alexei Starovoitov
2025-06-05 20:24       ` Liang, Kan
2025-06-05 20:46         ` Alexei Starovoitov
2025-06-05 23:50           ` Liang, Kan
2025-06-06  0:39             ` Alexei Starovoitov
2025-06-06 13:05               ` Liang, Kan
2025-06-06 17:42                 ` Alexei Starovoitov
2025-06-06 18:38                   ` Liang, Kan
2025-06-06 18:40                     ` Alexei Starovoitov
2025-06-05  6:39 ` Ingo Molnar
2025-06-05 13:51   ` Liang, Kan
