* [PATCH] x86/boot/compressed: Register NMI handler in EFI boot loader
@ 2023-01-10 10:27 Zeng Heng
From: Zeng Heng @ 2023-01-10 10:27 UTC
  To: michael.roth, bp, hpa, tglx, sathyanarayanan.kuppuswamy,
	kirill.shutemov, jroedel, keescook, mingo, dave.hansen,
	brijesh.singh
  Cc: linux-kernel, x86, liwei391

If kdump is enabled and mce_inject is used to inject errors, the EFI
boot loader decompresses and loads the second kernel in order to save
the vmcore file.

For normal errors that is fine. However, in the MCE case, the panic
CPU that first enters mce_panic() is running in NMI context, and the
processor blocks delivery of subsequent NMIs until the next execution
of the IRET instruction.

When the panic CPU spends a long time in the panic processing path
and the watchdog times out, the processor has by that point already
received an NMI in the background, which stays pending.

In the subsequent procedure, the panic CPU runs into the EFI loader
and raises a page fault exception (for example, when accessing the
`vidmem` variable while calling debug_putstr()). The CPU executes an
IRET instruction when it exits the page fault handler, which unblocks
NMIs and lets the pending NMI be delivered.

But the loader never registers a handler for the NMI vector in the
IDT. The missing vector handler causes a reboot, which interrupts the
kdump procedure, so the vmcore file is not saved.
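
The sequence above can be sketched as a small userspace model. This is
only an illustration, not kernel code: struct cpu_model, its fields and
the helper functions are hypothetical names, and the "reboot" is
modeled as a plain reset flag rather than the real fault escalation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's NMI state (hypothetical, not kernel code). */
struct cpu_model {
	bool nmi_blocked;	/* set while in NMI context, cleared by IRET */
	bool nmi_pending;	/* an NMI that arrived while blocked */
	bool nmi_handler;	/* is the NMI vector populated in the IDT? */
	bool reset;		/* stands in for the unexpected reboot */
};

/* Deliver an NMI right now; without a handler the model resets. */
static void deliver_nmi(struct cpu_model *cpu)
{
	assert(!cpu->nmi_blocked);
	if (!cpu->nmi_handler) {
		cpu->reset = true;
		return;
	}
	cpu->nmi_blocked = true;	/* enter the (empty) handler */
	cpu->nmi_blocked = false;	/* handler returns with IRET */
}

/* An NMI source fires: hold it back if NMIs are currently blocked. */
static void raise_nmi(struct cpu_model *cpu)
{
	if (cpu->nmi_blocked)
		cpu->nmi_pending = true;
	else
		deliver_nmi(cpu);
}

/* IRET unblocks NMIs; a pending NMI is delivered right afterwards. */
static void iret(struct cpu_model *cpu)
{
	cpu->nmi_blocked = false;
	if (cpu->nmi_pending) {
		cpu->nmi_pending = false;
		deliver_nmi(cpu);
	}
}

/* Replay the kdump path; returns true if the CPU survives it. */
bool replay_kdump_path(bool loader_has_nmi_handler)
{
	/* mce_panic() runs in NMI context, so NMIs start out blocked. */
	struct cpu_model cpu = {
		.nmi_blocked = true,
		.nmi_handler = loader_has_nmi_handler,
	};

	raise_nmi(&cpu);	/* watchdog timeout: NMI is held pending */
	iret(&cpu);		/* page fault handler in the loader exits */
	return !cpu.reset;
}
```

In this model, replay_kdump_path(false) ends in a reset, while
replay_kdump_path(true) returns normally, which is the effect this
patch is after.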

Here are the steps to reproduce the above issue (it occurs only with a certain probability):
1. # cat uncorrected
CPU 1 BANK 4
STATUS uncorrected 0xc0
MCGSTATUS  EIPV MCIP
ADDR 0x1234
RIP 0xdeadbabe
RAISINGCPU 0
MCGCAP SER CMCI TES 0x6
2. # modprobe mce_inject
3. # mce-inject uncorrected

To increase the probability of reproducing the issue, either of the
following modifications can be made:
1. modify the watchdog threshold;
2. add delays before panic() in mce_panic() and modify the PANIC_TIMEOUT macro.

Fixes: ca0e22d4f011 ("x86/boot/compressed/64: Always switch to own page table")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
---
 arch/x86/boot/compressed/ident_map_64.c    | 5 +++++
 arch/x86/boot/compressed/idt_64.c          | 1 +
 arch/x86/boot/compressed/idt_handlers_64.S | 1 +
 arch/x86/boot/compressed/misc.h            | 1 +
 4 files changed, 8 insertions(+)
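
As background for the set_idt_entry() call added below, here is a rough
userspace sketch of what registering an IDT vector amounts to. It is a
model under stated assumptions, not the loader's implementation: the
model_* names are hypothetical, the gate is assumed to be a present
64-bit trap gate, and the code segment selector value is a placeholder.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_IDT_ENTRIES	256
#define MODEL_TRAP_NMI		2	/* same value as X86_TRAP_NMI */
#define MODEL_TRAP_PF		14	/* same value as X86_TRAP_PF */
#define MODEL_GATE_TRAP		0xFu	/* 64-bit trap gate type (assumed) */
#define MODEL_GATE_PRESENT	0x8000u	/* P bit of the attribute word */
#define MODEL_KERNEL_CS		0x10u	/* placeholder selector (assumed) */

/* 16-byte x86-64 gate descriptor, modeled with fixed-width fields. */
struct model_gate {
	uint16_t offset_low;	/* handler address bits 0..15 */
	uint16_t segment;	/* code segment selector */
	uint16_t bits;		/* IST (0..2), type (8..11), DPL, P (15) */
	uint16_t offset_middle;	/* handler address bits 16..31 */
	uint32_t offset_high;	/* handler address bits 32..63 */
	uint32_t reserved;
};

/* Fill one vector with a present trap gate pointing at 'address'. */
void model_set_idt_entry(struct model_gate *idt, int vector, uint64_t address)
{
	struct model_gate entry;

	assert(vector >= 0 && vector < MODEL_IDT_ENTRIES);
	memset(&entry, 0, sizeof(entry));
	entry.offset_low    = (uint16_t)(address & 0xffff);
	entry.segment       = MODEL_KERNEL_CS;
	entry.bits          = (uint16_t)((MODEL_GATE_TRAP << 8) | MODEL_GATE_PRESENT);
	entry.offset_middle = (uint16_t)((address >> 16) & 0xffff);
	entry.offset_high   = (uint32_t)(address >> 32);
	memcpy(&idt[vector], &entry, sizeof(entry));
}

/* A vector that was never registered stays all-zero, i.e. not present. */
int model_gate_present(const struct model_gate *gate)
{
	return (gate->bits & MODEL_GATE_PRESENT) != 0;
}

/* Reassemble the handler address from the three offset fields. */
uint64_t model_gate_address(const struct model_gate *gate)
{
	return (uint64_t)gate->offset_low |
	       ((uint64_t)gate->offset_middle << 16) |
	       ((uint64_t)gate->offset_high << 32);
}
```

With only the page fault vector filled in, vector 2 stays not present
in this model, which corresponds to the state of the loader's IDT
before this change.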

diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c
index d4a314cc50d6..6893127f673f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -379,3 +379,8 @@ void do_boot_page_fault(struct pt_regs *regs, unsigned long error_code)
 	 */
 	kernel_add_identity_map(address, end);
 }
+
+void do_boot_nmi_fault(struct pt_regs *regs, unsigned long error_code)
+{
+	/* Nothing to do: ignore the NMI and let the entry stub return. */
+}
diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 6debb816e83d..b169c9728d52 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -60,6 +60,7 @@ void load_stage2_idt(void)
 {
 	boot_idt_desc.address = (unsigned long)boot_idt;
 
+	set_idt_entry(X86_TRAP_NMI, boot_nmi_fault);
 	set_idt_entry(X86_TRAP_PF, boot_page_fault);
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
diff --git a/arch/x86/boot/compressed/idt_handlers_64.S b/arch/x86/boot/compressed/idt_handlers_64.S
index 22890e199f5b..2aef8e1b515b 100644
--- a/arch/x86/boot/compressed/idt_handlers_64.S
+++ b/arch/x86/boot/compressed/idt_handlers_64.S
@@ -69,6 +69,7 @@ SYM_FUNC_END(\name)
 	.text
 	.code64
 
+EXCEPTION_HANDLER	boot_nmi_fault do_boot_nmi_fault error_code=0
 EXCEPTION_HANDLER	boot_page_fault do_boot_page_fault error_code=1
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 62208ec04ca4..d89d3f8417f6 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -187,6 +187,7 @@ static inline void cleanup_exception_handling(void) { }
 #endif
 
 /* IDT Entry Points */
+void boot_nmi_fault(void);
 void boot_page_fault(void);
 void boot_stage1_vc(void);
 void boot_stage2_vc(void);
-- 
2.25.1

