public inbox for linux-acpi@vger.kernel.org
From: "Luck, Tony" <tony.luck@intel.com>
To: Aili Yao <yaoaili@kingsoft.com>, Borislav Petkov <bp@alien8.de>
Cc: "rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	"CHENGUOMIN@kingsoft.com" <CHENGUOMIN@kingsoft.com>
Subject: RE: [PATCH v2] Dump cper error table in mce_panic
Date: Thu, 28 Jan 2021 17:22:30 +0000
Message-ID: <e9645a3ff93e46d4aabdf7dd45bfc4d7@intel.com>
In-Reply-To: <20210128200128.6f022993.yaoaili@kingsoft.com>

> The even better way to detect this is to be able to check whether this
> is the kdump kernel and whether it got loaded due to a fatal MCE in the
> first kernel and then match that error address with the error address of
> the error which caused the first panic in the mce code. Then the second
> kernel won't need to panic but simply log.

The biggest problem with all of the logging (whether in machine check
banks, or in error records from BIOS) is the lack of a timestamp. If there
were a way to tell whether an error "just happened" or "happened a while
ago", then such "take action" or "just log" decisions would be simpler.

Maybe you don't need to do *all* those matching checks.  Just a flag
from the first kernel to say "I died from a fatal machine check" could
be used to tell the kdump kernel to "just log the cper stuff".

If the system is broken enough that more machine checks are still
firing in the kdump kernel ... then you would miss trying to recover.
But if more machine checks are happening, then the kdump kernel
is likely doomed anyway.
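
A minimal sketch of that flag idea, as a standalone C fragment rather than
kernel code. Every name here (struct crash_handoff, cper_boot_action, the
enum values) is hypothetical and made up for illustration; how the flag
would actually be passed from the first kernel to the kdump kernel is left
open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state handed from the crashed kernel to the kdump kernel. */
struct crash_handoff {
	/* Set by the first kernel on its fatal machine check panic path. */
	bool died_from_fatal_mce;
};

/* What to do with a CPER record found while the kernel is booting. */
enum cper_action {
	CPER_LOG_ONLY,	/* print the record, do not panic again */
	CPER_PANIC,	/* treat the fatal record as a live error */
};

/*
 * Decide how to handle a CPER record. A fatal record is only logged
 * when we are the kdump kernel and the first kernel flagged that it
 * died from a fatal machine check; otherwise it is treated as live.
 */
enum cper_action cper_boot_action(const struct crash_handoff *h,
				  bool in_kdump_kernel,
				  bool record_is_fatal)
{
	if (!record_is_fatal)
		return CPER_LOG_ONLY;

	if (in_kdump_kernel && h != NULL && h->died_from_fatal_mce)
		return CPER_LOG_ONLY;

	return CPER_PANIC;
}
```

With only a flag and no address matching, a new fatal record raised in the
kdump kernel would also be logged instead of acted on, which is the
trade-off described above.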

Getting a full memory dump after a machine check generally isn't
all that useful anyway. The problem was (almost certainly) h/w, so
not much benefit in decoding the dump to find which code was running
when the h/w signalled.

A second bite at getting the error logs from the death of the first
kernel is worth it though.

-Tony

Thread overview: 16+ messages
2020-11-04  6:50 [PATCH] Dump cper error table in mce_panic yaoaili126
2020-11-04 10:16 ` kernel test robot
2020-11-06 19:35 ` James Morse
2020-11-18  3:12   ` Aili Yao
2020-11-17  9:58 ` [PATCH v2] " Aili Yao
2020-11-18 12:45   ` Borislav Petkov
2020-11-19  5:40     ` Aili Yao
2020-11-19 17:45       ` Borislav Petkov
2020-11-20  3:40         ` Aili Yao
2020-11-20  9:22         ` Aili Yao
2020-11-20 10:24           ` Borislav Petkov
2021-01-28 12:01             ` Aili Yao
2021-01-28 17:22               ` Luck, Tony [this message]
2021-02-23  9:18                 ` Aili Yao
2021-02-23 19:32                   ` Luck, Tony
2021-02-24  9:56                     ` Aili Yao
