From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Zhiquan Li <zhiquan1.li@intel.com>,
	Youquan Song <youquan.song@intel.com>,
	Borislav Petkov <bp@alien8.de>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Sasha Levin <sashal@kernel.org>,
	tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, x86@kernel.org,
	linux-edac@vger.kernel.org
Subject: [PATCH AUTOSEL 5.15 12/12] x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Date: Mon, 15 Jan 2024 18:26:57 -0500
Message-ID: <20240115232718.209642-12-sashal@kernel.org>
In-Reply-To: <20240115232718.209642-1-sashal@kernel.org>

From: Zhiquan Li <zhiquan1.li@intel.com>

[ Upstream commit 9f3b130048bfa2e44a8cfb1b616f826d9d5d8188 ]

Memory errors don't happen very often, especially fatal ones. However,
in large-scale scenarios such as data centers, the probability of
hitting one increases with the number of machines present.

When a fatal machine check happens, mce_panic() is called based on the
severity grading of that error. The page containing the error is not
marked as poison.

However, when kexec is enabled, tools like makedumpfile understand when
pages are marked as poison and do not touch them so as not to cause
a fatal machine check exception again while dumping the previous
kernel's memory.

Therefore, mark the page containing the error as poisoned so that the
kexec'ed kernel can avoid accessing the page.

  [ bp: Rewrite commit message and comment. ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Link: https://lore.kernel.org/r/20231014051754.3759099-1-zhiquan1.li@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/kernel/cpu/mce/core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
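
The check this patch adds to mce_panic() can be modeled in user space
roughly as follows. This is only a sketch under stated assumptions, not
kernel code: mce_model, page_model, mem_map, NR_PAGES,
pfn_to_page_model(), poison_before_panic() and count_dumpable_pages()
are hypothetical stand-ins invented for illustration. Only PAGE_SHIFT,
the MCI_STATUS_ADDRV bit and the order of the checks follow the hunk
below. The dump-side filter is likewise a simplification of what a
poison-aware tool such as makedumpfile is described as doing, not its
actual implementation.

```c
/* User-space model of the poison-on-panic check; not kernel code. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT        12              /* 4 KiB pages, as on x86 */
#define MCI_STATUS_ADDRV  (1ULL << 58)    /* MCi_STATUS: MCi_ADDR is valid */
#define NR_PAGES          16              /* hypothetical tiny "RAM" */

struct mce_model  { uint64_t status; uint64_t addr; };
struct page_model { bool hwpoison; };

static struct page_model mem_map[NR_PAGES];

/* Stand-in for pfn_to_online_page(): NULL when the PFN has no page. */
static struct page_model *pfn_to_page_model(uint64_t pfn)
{
	return pfn < NR_PAGES ? &mem_map[pfn] : NULL;
}

/* Mirrors the checks in the added hunk; returns true if a page was marked. */
static bool poison_before_panic(const struct mce_model *final,
				bool crash_kernel_loaded)
{
	struct page_model *p;

	/* Without a loaded crash kernel there is no dump to protect. */
	if (!crash_kernel_loaded)
		return false;

	/* The reported address is only meaningful when ADDRV is set. */
	if (!final || !(final->status & MCI_STATUS_ADDRV))
		return false;

	p = pfn_to_page_model(final->addr >> PAGE_SHIFT);
	if (!p)
		return false;

	p->hwpoison = true;
	return true;
}

/* Dump side: count the pages a poison-aware dumper would still read. */
static size_t count_dumpable_pages(void)
{
	size_t i, n = 0;

	for (i = 0; i < NR_PAGES; i++)
		if (!mem_map[i].hwpoison)
			n++;
	return n;
}
```

In this model, an error at address 0x3abc with ADDRV set marks page
frame 3, and the dump-side loop then skips exactly that page. The flag
has to be set before panic() because the crash kernel can only learn
about the bad page from state the dying kernel leaves behind.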

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index a0727723676b..eb48729e220e 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -44,6 +44,7 @@
 #include <linux/sync_core.h>
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
+#include <linux/kexec.h>
 
 #include <asm/intel-family.h>
 #include <asm/processor.h>
@@ -274,6 +275,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -329,6 +331,20 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+
+		/*
+		 * Kdump skips the poisoned page in order to avoid
+		 * touching the error bits again. Poison the page even
+		 * if the error is fatal and the machine is about to
+		 * panic.
+		 */
+		if (kexec_crash_loaded()) {
+			if (final && (final->status & MCI_STATUS_ADDRV)) {
+				p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+				if (p)
+					SetPageHWPoison(p);
+			}
+		}
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
-- 
2.43.0

