From: Don Zickus <dzickus@redhat.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>, Ingo Molnar <mingo@elte.hu>,
"H. Peter Anvin" <hpa@zytor.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Robert Richter <robert.richter@amd.com>,
"peterz@infradead.org" <peterz@infradead.org>
Subject: Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error
Date: Thu, 21 Oct 2010 22:56:57 -0400
Message-ID: <20101022025657.GA10556@redhat.com>
In-Reply-To: <1287713110.2862.54.camel@yhuang-dev>

On Fri, Oct 22, 2010 at 10:05:10AM +0800, Huang Ying wrote:
> > > > Well, do you have an alternative way to handle broken hardware? Broken
> > > > hardware has generated NMIs, sometimes if I am lucky SERRs. The ones that
> > > > generate SERRs can be filtered through a different path, but what about
> > > > the ones that don't?
> > > >
> > >
> > > Don, AFAIK you're saying the same thing as Ying: an unknown NMI is
> > > a hardware error.
> > >
> > > The reason the hardware does that is that it wants to tell us:
> > >
> > > "I lost track of an error. There is corrupted data somewhere in the system.
> > > Please stop, don't do anything that could consume that data. S.O.S."
> > >
> > > The correct answer for that is panic.
> >
> > After re-reading Huang's patch, I am starting to understand what you mean
> > by broken hardware. Basically you are trying to distinguish between
> > legacy systems that were 'broken' in the sense they would randomly send
> > > unknown NMIs for no good reason, hence the 'Dazed and confused' messages
> > > and hardware errors on more modern systems that say, 'Hardware error,
> > > panicking, check your BIOS for more info' (or whatever).
>
> Yes.
>
> > So Huang's patch was sort of acting like a switch. On legacy systems use
> > 'Dazed and confused' for unknown NMIs. Whereas on whitelisted modern
> > systems use a more relevant 'Check BIOS for error' message. Is that
> > right?
>
> In fact we want to go panic and 'check BIOS for error, contact your
> hardware vendor' for all systems. But as you said, there are some
> 'broken hardware' randomly send unknown NMIs for no good reason. So a
> white list is used for them. And not all pre-Nehalem machines are
> 'broken' in fact.

Ok, I think I finally understand what you guys are trying to do, and I
can't see a problem with it. I do think the patch could use some cleanup
to make it clearer, though. Off the top of my head: perhaps a function
call that sets the variable unknown_nmi_as_hwerr instead of setting it
explicitly, and maybe structuring unknown_nmi() as an if-then
modern-message; else legacy-message; to make it obvious what the code is
trying to achieve.
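
To make that suggestion concrete, here is a rough sketch of the shape I
have in mind. It is a userspace model, not kernel code and not what the
patch does today: set_unknown_nmi_as_hwerr(), unknown_nmi_model(), the
enum and the message strings are all made up for illustration. Only the
variable name unknown_nmi_as_hwerr and the 'Dazed and confused' wording
come from this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Flag name taken from the thread; everything else here is hypothetical. */
static bool unknown_nmi_as_hwerr;

/* Hypothetical setter, so platform code never writes the flag directly. */
static void set_unknown_nmi_as_hwerr(bool enable)
{
	unknown_nmi_as_hwerr = enable;
}

/* What the model handler decided to do (stands in for panic vs. return). */
enum nmi_action { NMI_ACTION_CONTINUE, NMI_ACTION_PANIC };

/* Model of unknown_nmi(): one if/else, modern message vs. legacy message. */
static enum nmi_action unknown_nmi_model(unsigned char reason)
{
	if (unknown_nmi_as_hwerr) {
		/* Modern path: treat the unknown NMI as a hardware error. */
		printf("Unknown NMI (reason %02x) treated as hardware error\n",
		       (unsigned int)reason);
		printf("Check BIOS for error, contact your hardware vendor\n");
		return NMI_ACTION_PANIC;
	} else {
		/* Legacy path: log and keep going. */
		printf("Unknown NMI received, reason %02x\n",
		       (unsigned int)reason);
		printf("Dazed and confused, but trying to continue\n");
		return NMI_ACTION_CONTINUE;
	}
}
```

With the flag left at its default the model takes the legacy branch;
after set_unknown_nmi_as_hwerr(true) it takes the hardware-error branch.
In the real handler the second branch would end in panic() rather than
returning a value.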

And yeah, I know not all pre-Nehalem machines are broken. I am usually
being sarcastic when I mention that, just because at IDF last year I got
the impression that pre-Nehalem machines were considered the dark ages.
:-)

I am actually curious to know how many x86_64 machines would be considered
broken?

>
> > That's why you guys are complaining that registering a die_notifier would
> > be silly?
>
> I think whether going die_notifier or unknown_nmi_error() depends on it
> is general or specific for some hardware. Do you agree with that?

Well, I am hoping the only general case would be the one you want to use
now. Everything else would be specific and require a die_notifier. I
mean, how many different ways do we want to have a printk/panic in
unknown_nmi()?

Cheers,
Don