public inbox for linux-acpi@vger.kernel.org
From: James Morse <james.morse@arm.com>
To: yaoaili126@163.com
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com,
	bp@alien8.de, linux-acpi@vger.kernel.org,
	linux-edac@vger.kernel.org, yangfeng1@kingsoft.com,
	CHENGUOMIN@kingsoft.com, yaoaili@kingsoft.com
Subject: Re: [PATCH] Dump cper error table in mce_panic
Date: Fri, 6 Nov 2020 19:35:32 +0000
Message-ID: <112d8a04-4f0d-6705-4da1-e8d95a14dbaf@arm.com>
In-Reply-To: <20201104065057.40442-1-yaoaili126@163.com>

Hello!

On 04/11/2020 06:50, yaoaili126@163.com wrote:
> From: Aili Yao <yaoaili@kingsoft.com>
> 
> For X86_MCE, When there is a fatal ue error, BIOS will prepare one
> detailed cper error table before raising MCE,

Outside of GHES_ASSIST, it's not supposed to do this.

There is an example flow described in 18.4.1 "Example: Firmware First Handling Using NMI
Notification" of ACPI v6.3:
https://uefi.org/sites/default/files/resources/ACPI_Spec_6_3_A_Oct_6_2020.pdf


The machine-check is the notification from hardware, which in step 1 of the above should
go to firmware. You should only see an NMI, which is step 8.
Step 7 is to clear the error from hardware, so triggering a machine-check is pointless.
(but I agree no firmware ever follows this!)


You appear to have something that behaves as GHES_ASSIST. Can you post the decompiled dump
of your HEST table? (Decompiled, no binaries!) If it's large, you can send it to me
off-list and I'll copy the relevant bits here...


> this cper table is meant
> to supply addtional error information and not to race with mce handler
> to panic.

This is a description of GHES_ASSIST. See 18.7 "GHES_ASSIST Error Reporting" of the above pdf.


> Usually possible unexpected cper process from NMI watchdog race panic
> with MCE panic is not a problem, the panic process will coordinate with
> each core. But When the CPER is not processed in the first kernel and
> leave it to the second kernel, this is a problem, lead to a kdump fail.

> Now in this patch, the mce_panic will race with unexpected NMI to dump
> the cper error log and get it cleaned, this will prevent the cper table
> leak to the second kernel, which will fix the kdump fail problem, and
> also guarrante the cper log is collected which it's meant to.

> Anyway,For x86_mce platform, the ghes module is still needed not to
> panic for fatal memory UE as it's MCE handler's work.

If and only if those GHES are marked as GHES_ASSIST.

If they are not, then you have a fully fledged firmware-first system.
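
To make the distinction concrete, here is a rough sketch of the check, based on my reading
of ACPI v6.3 chapter 18. It assumes that a GHES_ASSIST entry is a GHES whose Related Source
Id names a native error source (e.g. the machine-check source) that has the GHES_ASSIST
flag set, and that 0xFFFF means "not an alternate source". The struct and function names
are made up for illustration; this is not the kernel's HEST parsing code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bits of the HEST error source "Flags" field (ACPI v6.3, chapter 18). */
#define HEST_FLAG_FIRMWARE_FIRST (1u << 0)
#define HEST_FLAG_GHES_ASSIST    (1u << 2)

/* Related Source Id value for a GHES that is not an alternate source. */
#define GHES_NO_RELATED_SOURCE   0xFFFFu

/* Hypothetical, trimmed-down view of a native source (e.g. machine check). */
struct native_source {
	uint16_t source_id;
	uint8_t  flags;
};

/* Hypothetical, trimmed-down view of a GHES entry. */
struct ghes_entry {
	uint16_t source_id;
	uint16_t related_source_id;
};

/*
 * Return true if this GHES only supplies additional information for a native
 * source marked GHES_ASSIST. Return false if it stands alone, in which case
 * it describes a firmware-first error source in its own right.
 */
static bool ghes_is_assist(const struct ghes_entry *ghes,
			   const struct native_source *srcs, size_t nr_srcs)
{
	size_t i;

	if (ghes->related_source_id == GHES_NO_RELATED_SOURCE)
		return false;

	for (i = 0; i < nr_srcs; i++) {
		if (srcs[i].source_id == ghes->related_source_id &&
		    (srcs[i].flags & HEST_FLAG_GHES_ASSIST))
			return true;
	}

	return false;
}
```

Only a GHES for which this returns true would be one the ghes module should leave to the
MCE handler; anything else would still be expected to handle (and panic for) fatal errors
itself.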

Could you share what your system is describing it as in the HEST so we can work out what
is going on here?!

We need to work this out first.


Thanks,

James

