Re: [PATCH RESEND v2] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic

From: Zhiquan Li <zhiquan1.li@intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: <x86@kernel.org>, <linux-edac@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <patches@lists.linux.dev>,
	<bp@alien8.de>, <tony.luck@intel.com>, <naoya.horiguchi@nec.com>
Subject: Re: [PATCH RESEND v2] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
Date: Tue, 10 Oct 2023 08:56:38 +0800
Message-ID: <5b6bdf6a-760c-4ba3-95ec-2d4482ad9bac@intel.com>
In-Reply-To: <ZRsUpM/XtPAE50Rm@gmail.com>

On 2023/10/3 03:06, Ingo Molnar wrote:
> The English in this commit is *atrocious*, both in the changelog and in
> the comments - how on Earth did 'Posion' typo and half a dozen other
> typos and bad grammar survive ~3 iterations and a Reviewed-by tag?? The
> version below fixes up the worst, but I suspect that's not the only
> problem with this patch...

Many thanks for your attention and the fix-ups, Ingo.

I’d like to provide more background on this patch.

Memory errors don’t happen very often, especially fatal ones.  However,
in large-scale deployments such as data centers, they still occur.  In
some fatal MCE cases, the kernel calls mce_panic() to terminate the
production kernel directly instead of trying to survive via
memory_failure() handling.  Unfortunately, the capture kernel will panic
for the same reason if it touches the faulty memory again.  As a result,
only an incomplete vmcore is left for sustaining engineers, which makes
it very hard for them to work out what happened.

We considered three solutions and finally chose the last one.

1. When the capture kernel boots, rescan the MCE banks to check for
   fatal errors and set the PG_hwpoison flag on each error page.
   We can foresee that this solution is heavy.  It needs to find the
   struct page of each error page in old memory and set the flag.  That
   looked like reinventing the wheel, so we gave it up.

2. Replace copy_to_iter() in __copy_oldmem_page() with
   _copy_mc_to_iter(), which is a #MC-safe version.
   This solution is lightweight but has the following drawbacks:

   1) Such issues are quite rare; we don’t want to use a #MC-safe copy
      just to accommodate them, especially if the problem can be dealt
      with in MCE handling rather than by touching the kdump code.

   2) The #MC-safe copy is conditional: whether it can recover from the
      #MC depends on whether MCE handling reaches fixup_exception() in
      do_machine_check().  However, in the fatal error case, MCE
      handling might invoke mce_panic() and crash the capture kernel
      before the fixup happens.

3. The solution in this patch overcomes all of the above drawbacks.  It
   sets the flag just before the production kernel calls panic(), which
   neither introduces additional overhead in the capture kernel nor
   conflicts with other hwpoison-related code in the production kernel.
   Furthermore, it leverages existing mechanisms as much as possible to
   fix the issue, and the code changes are lightweight.
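
To make the idea behind solution 3 concrete, here is a minimal userspace
model in C.  It is only a sketch, not the kernel code from the patch:
every name in it (model_page, MODEL_PG_HWPOISON, model_mem_map and the
two functions) is hypothetical, and it assumes for illustration that the
dump path in the capture kernel skips pages that carry the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_NR_PAGES    8
#define MODEL_PG_HWPOISON (1u << 0)

struct model_page {
	unsigned int flags; /* stands in for the struct page flags */
	bool hw_error;      /* touching this page would raise a fatal #MC */
};

static struct model_page model_mem_map[MODEL_NR_PAGES];

/* Production kernel side: flag the failing page just before panicking. */
static void model_mark_poison_before_panic(size_t pfn)
{
	assert(pfn < MODEL_NR_PAGES);
	model_mem_map[pfn].flags |= MODEL_PG_HWPOISON;
}

/*
 * Capture kernel side: walk old memory and "copy" every page that is not
 * flagged.  Returns the number of pages copied, or -1 if an unflagged
 * bad page was touched (this models the second panic).
 */
static int model_dump_old_memory(void)
{
	int copied = 0;

	for (size_t pfn = 0; pfn < MODEL_NR_PAGES; pfn++) {
		if (model_mem_map[pfn].flags & MODEL_PG_HWPOISON)
			continue; /* flagged page: never touched */
		if (model_mem_map[pfn].hw_error)
			return -1; /* unflagged bad page: dump dies here */
		copied++;
	}
	return copied;
}
```

In this model, a page with hw_error set makes model_dump_old_memory()
fail unless model_mark_poison_before_panic() was called for that page
first, in which case the dump completes with that one page left out.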

Verifying the fix is not difficult.  The issue can be reproduced with
the "copyout" test case from ras-tools
(https://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git).
It injects a fatal memory error in kernel space via the APEI EINJ
interface (hardware platform support is needed) and then touches the
error page to trigger the issue.  The patch has been validated with this
tool.

Any ideas are welcome!

Best Regards,
Zhiquan

Thread overview: 8+ messages
2023-09-14  3:05 [PATCH RESEND v2] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic Zhiquan Li
2023-10-02 19:06 ` Ingo Molnar
2023-10-10  0:56   ` Zhiquan Li [this message]
2023-10-10  8:25     ` Borislav Petkov
2023-10-10  8:28 ` Borislav Petkov
2023-10-11  3:00   ` Zhiquan Li
2023-10-12 14:57     ` Borislav Petkov
2023-10-13  0:26       ` Zhiquan Li
