From: Zeng Heng <zengheng4@huawei.com>
To: <alexander.shishkin@linux.intel.com>, <tglx@linutronix.de>,
<peterz@infradead.org>, <tiwai@suse.de>, <jolsa@kernel.org>,
<vbabka@suse.cz>, <keescook@chromium.org>, <mingo@redhat.com>,
<acme@kernel.org>, <namhyung@kernel.org>, <bp@alien8.de>,
<bhe@redhat.com>, <eric.devolder@oracle.com>, <hpa@zytor.com>,
<jroedel@suse.de>, <dave.hansen@linux.intel.com>
Cc: <linux-perf-users@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <liwei391@huawei.com>,
<x86@kernel.org>, <xiexiuqi@huawei.com>
Subject: [RFC PATCH v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes
Date: Fri, 17 Feb 2023 20:06:04 +0800
Message-ID: <20230217120604.435608-1-zengheng4@huawei.com>
If the CPU panics in NMI context, further NMIs may arrive in the
background. The processor blocks them until the next IRET instruction
executes; this blocking is what normally prevents nested NMI handler
execution.
If an IRET executes on the kdump reboot path while no proper NMI handler
is registered (for example, in the EFI loader), the pending NMI is
delivered and kdump crashes. We therefore need to make sure the watchdog
no longer fires, so call perf_event_exit_cpu() at the very last moment of
panic shutdown.
Note: I know perf_event_exit_cpu() must not be called from NMI context,
because it takes a mutex, calls smp_call_function(), and so on. Does
anyone know of a similar function that is safe to call from atomic
context? (In my testing, neither x86_pmu_disable() nor
x86_pmu_disable_all() worked.)
Thank you in advance.
The following test case reproduces the issue:
1. # cat uncorrected
   CPU 1 BANK 4
   STATUS uncorrected 0xc0
   MCGSTATUS EIPV MCIP
   ADDR 0x1234
   RIP 0xdeadbabe
   RAISINGCPU 0
   MCGCAP SER CMCI TES 0x6
2. # modprobe mce_inject
3. # mce-inject uncorrected
mce-inject triggers a kernel panic in NMI context. In addition, another
NMI (for example, from the watchdog) must be raised during the panic.
Set a suitable watchdog threshold and/or add an artificial delay so that
the watchdog NMI fires during the panic procedure; the issue then
reproduces.
Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
v1: add dummy NMI interrupt handler in EFI loader
v2: tidy up changelog, add comments (by Ingo Molnar)
v3: add iret_to_self() to deal with blocked NMIs in advance
v4: call perf_event_exit_cpu() to terminate watchdog in panic shutdown
arch/x86/kernel/crash.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 305514431f26..f46df94bbdad 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -25,6 +25,7 @@
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/memblock.h>
+#include <linux/perf_event.h>
#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -170,6 +171,15 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
#ifdef CONFIG_HPET_TIMER
hpet_disable();
#endif
+
+ /*
+ * If the CPU panics in NMI context, further NMIs may be
+ * latched by the processor until the next IRET. The kdump
+ * path may execute an IRET before a proper NMI handler is
+ * registered, so terminate the watchdog here, at the very
+ * end of panic shutdown.
+ */
+ perf_event_exit_cpu(smp_processor_id());
crash_save_cpu(regs, safe_smp_processor_id());
}
--
2.25.1
Thread overview: 5+ messages
2023-02-17 12:06 Zeng Heng [this message]
2023-02-22 17:08 ` [RFC PATCH v4] x86/kdump: terminate watchdog NMI interrupt to avoid kdump crashes Peter Zijlstra
2023-02-22 18:39 ` Eric W. Biederman
2023-02-23 2:29 ` Zeng Heng
2023-02-23 3:14 ` Zeng Heng