From: Don Zickus <dzickus@redhat.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>,
linux-kernel@vger.kernel.org, Andi Kleen <andi@firstfloor.org>,
Robert Richter <robert.richter@amd.com>
Subject: Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error
Date: Mon, 11 Oct 2010 17:20:06 -0400
Message-ID: <20101011212006.GB23882@redhat.com>
In-Reply-To: <1286606987-19879-5-git-send-email-ying.huang@intel.com>

On Sat, Oct 09, 2010 at 02:49:46PM +0800, Huang Ying wrote:
> In general, unknown NMIs are used by hardware and firmware to notify
> the OS of fatal hardware errors. So Linux should treat an unknown NMI
> as a hardware error and panic on it, for better error containment.
>
> But there is some broken hardware that generates unknown NMIs for
> reasons other than hardware errors. To support such machines, a
> whitelist mechanism is provided, so that unknown NMIs are treated as
> hardware errors only on known-good systems.
>
> These systems are identified by the presence of an APEI HEST or by
> the PCI ID of the host bridge. The host bridge PCI ID is used instead
> of the DMI ID so that the check is based on the platform type rather
> than on the motherboard. This should be simpler and sufficient.
>
> The method for identifying the platforms was designed by Andi Kleen.

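For readers following along, the whitelist decision described above can be modeled roughly as below. This is a userspace sketch only, assuming the check reduces to "HEST present, or host bridge ID in a table"; the function name, the table and its device IDs are hypothetical placeholders, not the code or ID list from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical whitelist entry: PCI vendor/device ID of a host bridge. */
struct host_bridge_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder IDs for illustration only; NOT the list from the patch. */
static const struct host_bridge_id nmi_hwerr_whitelist[] = {
	{ 0x8086, 0x1111 },
	{ 0x8086, 0x2222 },
};

/*
 * Hypothetical helper: treat unknown NMIs as hardware errors if the
 * firmware provides an APEI HEST, or if the host bridge matches a
 * known-good platform in the table above.
 */
static bool unknown_nmi_as_hwerr_check(bool hest_present,
				       uint16_t vendor, uint16_t device)
{
	size_t i;

	if (hest_present)
		return true;

	for (i = 0; i < ARRAY_SIZE(nmi_hwerr_whitelist); i++) {
		if (nmi_hwerr_whitelist[i].vendor == vendor &&
		    nmi_hwerr_whitelist[i].device == device)
			return true;
	}
	return false;
}
```

With this model, a machine with a HEST always qualifies, a machine whose bridge is 0x8086:0x2222 qualifies via the table, and anything else keeps the old "unknown NMI is not fatal" behaviour. Keying on the host bridge rather than DMI strings is what lets one table entry cover every board built on that platform.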
I don't have any major problems with the other patches in the patch
series. In fact I would like to get them committed somewhere, so we can
continue building on them.
> @@ -366,6 +368,15 @@ unknown_nmi_error(unsigned char reason,
> if (notify_die(DIE_NMIUNKNOWN, "nmi", regs, reason, 2, SIGINT) ==
> NOTIFY_STOP)
> return;
> + /*
> + * On some platforms, hardware errors may be notified via
> + * unknown NMI
> + */
> + if (unknown_nmi_as_hwerr)
> + panic(
> + "NMI for hardware error without error record: Not continuing\n"
> + "Please check BIOS/BMC log for further information.");
> +
> #ifdef CONFIG_MCA
> /*
> * Might actually be able to figure out what the guilty party

The only remaining issue I have is the piece above, and it is basically
a philosophical difference between Robert and me on one side and you
and Andi on the other: we believe this check should be registered on the
die_chain, while Andi and you would like to see it called out explicitly
in unknown_nmi_error().
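To make the die_chain alternative concrete, here is a rough userspace model of registering the check as a lowest-priority notifier instead of calling it explicitly. It is a sketch under stated assumptions: the notifier types and NOTIFY_* values are simplified stand-ins loosely modeled on <linux/notifier.h>, the *_sketch names and the priority value are hypothetical, and a flag replaces the real panic().

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins loosely modeled on <linux/notifier.h>. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_STOP 0x8001

enum die_val_sketch { DIE_NMIUNKNOWN_SKETCH = 1 }; /* hypothetical */

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long val, void *data);
	struct notifier_block *next;
	int priority; /* higher priority runs first */
};

static struct notifier_block *die_chain_sketch;

/* Insert a handler, keeping the chain sorted by descending priority. */
static void register_die_notifier_sketch(struct notifier_block *nb)
{
	struct notifier_block **p = &die_chain_sketch;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
}

/* Call each handler in order until one returns NOTIFY_STOP. */
static int notify_die_sketch(unsigned long val, void *data)
{
	struct notifier_block *nb;
	int ret = NOTIFY_DONE;

	for (nb = die_chain_sketch; nb; nb = nb->next) {
		ret = nb->notifier_call(nb, val, data);
		if (ret == NOTIFY_STOP)
			break;
	}
	return ret;
}

/* Stands in for the patch's unknown_nmi_as_hwerr flag. */
static bool unknown_nmi_as_hwerr;
/* Set instead of calling panic(), so the model can be tested. */
static bool hwerr_panic_requested;

/* Runs last, so it only sees NMIs no other handler claimed. */
static int unknown_nmi_hwerr_notify(struct notifier_block *nb,
				    unsigned long val, void *data)
{
	(void)nb;
	(void)data;
	if (val != DIE_NMIUNKNOWN_SKETCH || !unknown_nmi_as_hwerr)
		return NOTIFY_DONE;
	hwerr_panic_requested = true; /* kernel code would panic() here */
	return NOTIFY_STOP;
}

static struct notifier_block unknown_nmi_hwerr_nb = {
	.notifier_call = unknown_nmi_hwerr_notify,
	.priority = -1000, /* hypothetical: after every other handler */
};
```

The design point of this approach is that the policy lives in a registered handler rather than in unknown_nmi_error() itself: platforms that do not set the flag never see it, and other handlers with higher priority still get the first chance to claim the NMI.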

If we move to a new notifier chain, as we discussed in another thread,
would you be willing to move this check into that new chain, or does
your argument still stand?

Cheers,
Don