From: Namhyung Kim <namhyung@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>,
Thomas Gleixner <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
x86@kernel.org, LKML <linux-kernel@vger.kernel.org>,
Stephane Eranian <eranian@google.com>,
Kan Liang <kan.liang@linux.intel.com>,
John Sperbeck <jsperbeck@google.com>,
"Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Subject: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
Date: Sat, 21 Nov 2020 11:50:11 +0900
Message-ID: <20201121025011.227781-1-namhyung@kernel.org>

When large PEBS is enabled, the warning below is triggered:
[6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
...
[6070379.453831] Call Trace:
[6070379.453840] x86_pmu_del+0x50/0x150
[6070379.453845] event_sched_out.isra.0+0x95/0x200
[6070379.453848] group_sched_out.part.0+0x53/0xd0
[6070379.453851] __perf_event_disable+0xee/0x1e0
[6070379.453854] event_function+0x89/0xd0
[6070379.453859] remote_function+0x3e/0x50
[6070379.453866] generic_exec_single+0x91/0xd0
[6070379.453870] smp_call_function_single+0xd1/0x110
[6070379.453874] event_function_call+0x11c/0x130
[6070379.453877] ? task_ctx_sched_out+0x20/0x20
[6070379.453880] ? perf_mux_hrtimer_handler+0x370/0x370
[6070379.453882] ? event_function_call+0x130/0x130
[6070379.453886] perf_event_for_each_child+0x34/0x80
[6070379.453889] ? event_function_call+0x130/0x130
[6070379.453891] _perf_ioctl+0x24b/0x6a0
[6070379.453898] ? sched_setaffinity+0x1ad/0x2a0
[6070379.453904] ? _cond_resched+0x15/0x30
[6070379.453906] perf_ioctl+0x3d/0x60
[6070379.453912] ksys_ioctl+0x87/0xc0
[6070379.453917] __x64_sys_ioctl+0x16/0x20
[6070379.453923] do_syscall_64+0x52/0x180
[6070379.453928] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
bit in NMI handler") introduced this. It seems x86_pmu_stop() can be
called recursively (for example, when it loses some samples), like below:
  x86_pmu_stop
    intel_pmu_disable_event (x86_pmu_disable)
      intel_pmu_pebs_disable
        intel_pmu_drain_pebs_buffer
          x86_pmu_stop
The change made by that commit seems to be needed only for AMD, so I
added a new bit (late_nmi) that tells x86_pmu_stop() when it should
clear the active mask.
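
To illustrate the ordering problem, here is a minimal userspace sketch,
not kernel code. All toy_* names are hypothetical; active_mask and the
stopped[] flag only stand in for cpuc->active_mask and PERF_HES_STOPPED,
and the nested call models the drain path re-entering x86_pmu_stop() as
described above. It assumes the recursion happens at most once.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_COUNTERS 4

/* Hypothetical stand-in for the per-CPU PMU state (not a kernel struct). */
struct toy_pmu {
	unsigned long active_mask;	/* models cpuc->active_mask */
	bool stopped[NUM_COUNTERS];	/* models PERF_HES_STOPPED per counter */
	bool late_nmi;			/* mirrors the proposed x86_pmu.late_nmi bit */
	bool drain_on_disable;		/* models large PEBS draining in ->disable() */
	int warnings;			/* counts would-be WARN_ON_ONCE() hits */
};

void toy_stop(struct toy_pmu *pmu, int idx);

/* Models x86_pmu.disable(): the drain path may call stop again (assumed once). */
void toy_disable(struct toy_pmu *pmu, int idx)
{
	if (pmu->drain_on_disable) {
		pmu->drain_on_disable = false;
		toy_stop(pmu, idx);
	}
}

/* Models the patched x86_pmu_stop() with both orderings. */
void toy_stop(struct toy_pmu *pmu, int idx)
{
	assert(idx >= 0 && idx < NUM_COUNTERS);

	if (pmu->active_mask & (1UL << idx)) {
		if (pmu->late_nmi) {
			/* Bit still set during disable: a nested stop gets in. */
			toy_disable(pmu, idx);
			pmu->active_mask &= ~(1UL << idx);
		} else {
			/* Bit cleared first: a nested stop becomes a no-op. */
			pmu->active_mask &= ~(1UL << idx);
			toy_disable(pmu, idx);
		}
		if (pmu->stopped[idx])
			pmu->warnings++;
		pmu->stopped[idx] = true;
	}
}

/* Stop counter 0 once and report how many warnings would have fired. */
int toy_run(bool late_nmi)
{
	struct toy_pmu pmu = {
		.active_mask = 1UL,
		.late_nmi = late_nmi,
		.drain_on_disable = true,
	};

	toy_stop(&pmu, 0);
	return pmu.warnings;
}
```

In this model toy_run(true) hits the warning once because the outer call
finds the counter already marked stopped by the nested one, while
toy_run(false) does not, since the nested call sees the bit already
cleared and returns early.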
Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
Reported-by: John Sperbeck <jsperbeck@google.com>
Cc: "Lendacky, Thomas" <Thomas.Lendacky@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
arch/x86/events/amd/core.c | 1 +
arch/x86/events/core.c | 9 +++++++--
arch/x86/events/perf_event.h | 3 ++-
3 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 39eb276d0277..c545fbd423df 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -927,6 +927,7 @@ static __initconst const struct x86_pmu amd_pmu = {
 	.max_period		= (1ULL << 47) - 1,
 	.get_event_constraints	= amd_get_event_constraints,
 	.put_event_constraints	= amd_put_event_constraints,
+	.late_nmi		= 1,
 	.format_attrs		= amd_format_attr,
 	.events_sysfs_show	= amd_event_sysfs_show,
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7b802a778014..a6c12bd71e66 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1514,8 +1514,13 @@ void x86_pmu_stop(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 
 	if (test_bit(hwc->idx, cpuc->active_mask)) {
-		x86_pmu.disable(event);
-		__clear_bit(hwc->idx, cpuc->active_mask);
+		if (x86_pmu.late_nmi) {
+			x86_pmu.disable(event);
+			__clear_bit(hwc->idx, cpuc->active_mask);
+		} else {
+			__clear_bit(hwc->idx, cpuc->active_mask);
+			x86_pmu.disable(event);
+		}
 		cpuc->events[hwc->idx] = NULL;
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 10032f023fcc..1ffaa0fcd521 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -682,7 +682,8 @@ struct x86_pmu {
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
 			enabled_ack		:1,
-			counter_freezing	:1;
+			counter_freezing	:1,
+			late_nmi		:1;
 	/*
 	 * sysfs attrs
 	 */
--
2.29.2.454.gaff20da3a2-goog