From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Namhyung Kim <namhyung@kernel.org>,
Ian Rogers <irogers@google.com>,
Mark Rutland <mark.rutland@arm.com>,
LKML <linux-kernel@vger.kernel.org>,
"linux-perf-use." <linux-perf-users@vger.kernel.org>,
Stephane Eranian <eranian@google.com>,
Chun-Tse Shao <ctshao@google.com>,
Thomas Richter <tmricht@linux.ibm.com>, Leo Yan <leo.yan@arm.com>,
Aishwarya TCV <aishwarya.tcv@arm.com>,
Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Subject: Re: [PATCH V3] perf: Fix the throttle error of some clock events
Date: Thu, 5 Jun 2025 12:58:38 -0400
Message-ID: <b5c3cc21-26d6-4e87-84de-fa99909bdf1c@linux.intel.com>
In-Reply-To: <d5fcf34f-63fe-451b-89ad-621c38981709@linux.intel.com>

On 2025-06-05 9:46 a.m., Liang, Kan wrote:
>
>
> On 2025-06-04 7:21 p.m., Alexei Starovoitov wrote:
>> On Wed, Jun 4, 2025 at 10:16 AM <kan.liang@linux.intel.com> wrote:
>>>
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> Both the ARM and IBM CI report an RCU stall, which can be reproduced
>>> with the perf command below.
>>> perf record -a -e cpu-clock -- sleep 2
>>>
>>> The issue was introduced by the generic throttle patch set, which
>>> unconditionally invokes event_stop() when the throttle is triggered.
>>>
>>> The cpu-clock and task-clock are two special SW events, which rely on
>>> the hrtimer. The throttle is invoked in the hrtimer handler. The
>>> event_stop()->hrtimer_cancel() waits for the handler to finish, which is
>>> a deadlock. Instead of invoking the stop(), the HRTIMER_NORESTART should
>>> be used to stop the timer.
>>>
>>> There may be two ways to fix it.
>>> - Introduce a PMU flag to track the case. Avoid the event_stop in
>>> perf_event_throttle() if the flag is detected.
>>> It has been implemented in
>>> https://lore.kernel.org/lkml/20250528175832.2999139-1-kan.liang@linux.intel.com/
>>> The new flag was considered overkill for the issue.
>>> - Add a check in the event_stop. Return immediately if the throttle is
>>> invoked in the hrtimer handler. Rely on the existing HRTIMER_NORESTART
>>> method to stop the timer.
>>>
>>> The latter is implemented here.
>>>
>>> Move event->hw.interrupts = MAX_INTERRUPTS before the stop(). This makes
>>> the order the same as in perf_event_unthrottle(). Apart from this patch,
>>> nothing checks hw.interrupts in the stop(), so the order change has no
>>> impact.
>>>
>>> Reported-by: Leo Yan <leo.yan@arm.com>
>>> Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
>>> Closes: https://lore.kernel.org/lkml/20250527161656.GJ2566836@e132581.arm.com/
>>> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>> Closes: https://lore.kernel.org/lkml/djxlh5fx326gcenwrr52ry3pk4wxmugu4jccdjysza7tlc5fef@ktp4rffawgcw/
>>> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
>>> Closes: https://lore.kernel.org/lkml/8e8f51d8-af64-4d9e-934b-c0ee9f131293@linux.ibm.com/
>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>
>> It seems the patch fixes one issue and introduces another?
>>
>> Looks like the throttle event is sticky.
>> Once it's reached, the perf_event no longer works?
>
> No. It should still work even when the throttle is triggered.
>
> sdp@d404e6bce080:~$ sudo bash -c 'echo 10 > /proc/sys/kernel/perf_event_max_sample_rate'
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.559 MB perf.data (584 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.564 MB perf.data (613 samples) ]
> sdp@d404e6bce080:~$ sudo perf record -a -e cpu-clock -c10000 -- sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.545 MB perf.data (502 samples) ]
>
>
>> Repro:
>> test_progs -t perf_branches/perf_branches_no_hw
>> #250/2 perf_branches/perf_branches_no_hw:OK
>>
>> test_progs -t stacktrace_build_id_nmi
>> #393 stacktrace_build_id_nmi:OK
>>
>> test_progs -t perf_branches/perf_branches_no_hw
>> perf_branches/perf_branches_no_hw:FAIL
>>
>
> Do you have more logs showing where it failed?
>
> Thanks,
> Kan
> Maybe it's an unrelated bug.
I tried the tests on my machine. I don't see any issues.
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2 perf_branches/perf_branches_no_hw:OK
#240 perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t stacktrace_build_id_nmi
#376 stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2 perf_branches/perf_branches_no_hw:OK
#240 perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t stacktrace_build_id_nmi
#376 stacktrace_build_id_nmi:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
sdp@d404e6bce080:~/tip/tools/testing/selftests/bpf$ sudo ./test_progs -t perf_branches/perf_branches_no_hw
#240/2 perf_branches/perf_branches_no_hw:OK
#240 perf_branches:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Thanks,
Kan
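
For readers following the thread, the second option in the quoted patch
description (return early from the stop path when the throttle fires inside
the hrtimer handler, and let HRTIMER_NORESTART end the timer) can be sketched
as a small user-space model. This is only an illustration under stated
assumptions, not the kernel code: every sim_* name, the struct layout, and the
"deadlocks" counter are hypothetical stand-ins, and the real check in the
patch may look different.

```c
#include <stdbool.h>

/* User-space model only. All sim_* names are hypothetical, not kernel APIs. */
#define MAX_INTERRUPTS (~0ULL)

enum sim_restart { SIM_NORESTART, SIM_RESTART };

struct sim_event {
	unsigned long long interrupts; /* models event->hw.interrupts */
	bool in_handler;               /* true while the timer handler runs */
	int cancel_calls;              /* how often the cancel path was taken */
	int deadlocks;                 /* cancel attempted from inside the handler */
};

/*
 * Models a cancel that waits for the handler to finish. Calling it from
 * the handler itself could never complete, so the model counts it.
 */
void sim_timer_cancel(struct sim_event *ev)
{
	ev->cancel_calls++;
	if (ev->in_handler)
		ev->deadlocks++;
}

/* Stop path with the early return: a throttled event skips the cancel. */
void sim_event_stop(struct sim_event *ev)
{
	if (ev->interrupts == MAX_INTERRUPTS)
		return; /* the handler's NORESTART return ends the timer */
	sim_timer_cancel(ev);
}

/* Throttle path: mark the event as throttled before calling stop. */
void sim_throttle(struct sim_event *ev)
{
	ev->interrupts = MAX_INTERRUPTS;
	sim_event_stop(ev);
}

/* Timer handler: throttles when over the limit and asks not to be re-armed. */
enum sim_restart sim_timer_handler(struct sim_event *ev, bool over_limit)
{
	enum sim_restart ret = SIM_RESTART;

	ev->in_handler = true;
	if (over_limit) {
		sim_throttle(ev);
		ret = SIM_NORESTART;
	}
	ev->in_handler = false;
	return ret;
}
```

In this model, setting interrupts before calling the stop function is what
lets the stop path tell a throttle apart from an ordinary stop, which is the
reason the patch description gives for moving the assignment.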