From: Don Zickus <dzickus@redhat.com>
To: Huang Ying <ying.huang@intel.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>,
huang ying <huang.ying.caritas@gmail.com>,
Ingo Molnar <mingo@elte.hu>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Robert Richter <robert.richter@amd.com>,
Andi Kleen <ak@linux.intel.com>
Subject: Re: [RFC] x86, NMI, Treat unknown NMI as hardware error
Date: Mon, 16 May 2011 15:03:10 -0400
Message-ID: <20110516190310.GH31888@redhat.com>
In-Reply-To: <4DD07959.4030608@intel.com>

On Mon, May 16, 2011 at 09:09:45AM +0800, Huang Ying wrote:
> > Ying, the concern is rather related to the code scheme in general. Since
> > we have notifiers, I think the better way is to be consistent here and
> > use a hwerr notifier too. But it's IMHO ;)
>
> As for whether to go through notifiers or not, IMHO a rule can be:
>
> - If it is something like a driver, then it should go through a notifier
> - If it is architectural/a PC de facto standard, it can sit outside of
> the notifier.

Hmm, then what do you do about perf? That is architectural and a de facto
standard, but I am not sure hardcoding that would be appropriate.

>
> I think that treating an unknown NMI as a hardware error should be part
> of the PC de facto standard. Do you think so?

Well, after thinking about it, I would say no. And my reason is, if
vendors are really serious about using NMIs as an indicator for hardware
errors, shouldn't they be setting a bit in the memory controller/north
bridge or south bridge/IOHC for an NMI handler to read? I mean, hardware
devices don't just get wired directly to the NMI pin on the cpu, right?
They generally have to go through some hub that acts as a multiplexer.
In those cases, why can't those hubs set a bit saying they detected an error
(don't PCIe bridges already do that?) and let the NMI handler read it to
confirm? This way we can leave 'unknown NMIs' as a way to say an
unclaimed NMI entered the system, and we can have users set policy about
what to do: panic, printk, whatever.
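
To make that concrete, here is a rough userspace sketch of the flow I have in mind: each handler checks a status bit in its own hub before claiming the NMI, and only an NMI that nobody claims falls through to the user-selected policy. Every name below (sketch_register_handler, HUB_STATUS_ERROR, the policy enum, and so on) is made up for illustration; none of this is existing kernel API, and the "status register" is just a plain variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical hub status register layout: bit 0 = "error latched". */
#define HUB_STATUS_ERROR (1u << 0)

enum nmi_result {
	NMI_CLAIMED,		/* some handler found its status bit set */
	NMI_UNKNOWN_IGNORED,
	NMI_UNKNOWN_LOGGED,
	NMI_UNKNOWN_PANIC,	/* caller would panic here */
};

/* User-selectable policy for NMIs that no handler claims. */
enum unknown_nmi_policy { POLICY_IGNORE, POLICY_PRINTK, POLICY_PANIC };

typedef bool (*nmi_handler_fn)(void *ctx);

#define MAX_HANDLERS 8
static struct {
	nmi_handler_fn fn;
	void *ctx;
} handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Hypothetical registration helper, standing in for a notifier chain. */
static void sketch_register_handler(nmi_handler_fn fn, void *ctx)
{
	assert(fn != NULL && nr_handlers < MAX_HANDLERS);
	handlers[nr_handlers].fn = fn;
	handlers[nr_handlers].ctx = ctx;
	nr_handlers++;
}

/* Claims the NMI only if the hub says it detected an error. */
static bool hub_error_handler(void *ctx)
{
	uint32_t *status = ctx;

	if (!(*status & HUB_STATUS_ERROR))
		return false;
	*status &= ~HUB_STATUS_ERROR;	/* acknowledge the latched error */
	return true;
}

static enum nmi_result handle_nmi(enum unknown_nmi_policy policy)
{
	for (size_t i = 0; i < nr_handlers; i++)
		if (handlers[i].fn(handlers[i].ctx))
			return NMI_CLAIMED;

	/* Nobody claimed it: this is the 'unknown NMI' case. */
	switch (policy) {
	case POLICY_PANIC:
		return NMI_UNKNOWN_PANIC;
	case POLICY_PRINTK:
		fprintf(stderr, "unknown NMI received\n");
		return NMI_UNKNOWN_LOGGED;
	default:
		return NMI_UNKNOWN_IGNORED;
	}
}
```

The point of the split is that a hardware error is reported by the handler that can actually confirm it, while the unknown-NMI path stays a pure policy decision.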

But for the HEST stuff, it should be smart enough by now to trap any
hardware error, no? How does a machine that supports HEST let a hardware
error get through without detecting it? Isn't that the point? Detect a
hardware error, grab as much info about it as possible, save the error
record and then panic?
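
As a minimal sketch of that ordering (detect, collect, save the record, and only then panic), assuming a made-up record layout and an in-memory array standing in for persistent storage. The struct fields, STORE_SLOTS and report_fatal_error are all hypothetical and are not taken from the HEST tables or any kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error record; real records carry far more detail. */
struct error_record {
	uint32_t source_id;	/* which error source reported it */
	uint64_t address;	/* faulting address, if known */
	char description[48];
};

/* Stand-in for persistent storage that survives the panic. */
#define STORE_SLOTS 4
static struct error_record store[STORE_SLOTS];
static size_t store_used;

static bool save_record(const struct error_record *rec)
{
	assert(rec != NULL);
	if (store_used == STORE_SLOTS)
		return false;	/* store full, record is lost */
	store[store_used++] = *rec;
	return true;
}

/*
 * Collect what is known about the error and save it first.
 * Returns true to tell the caller it should panic now; the
 * record has already been handed to the store at that point.
 */
static bool report_fatal_error(uint32_t source_id, uint64_t address,
			       const char *what)
{
	struct error_record rec = { .source_id = source_id, .address = address };

	/* rec.description is zero-filled, so it stays NUL-terminated. */
	strncpy(rec.description, what, sizeof(rec.description) - 1);
	save_record(&rec);
	return true;
}
```

The only thing this is meant to show is the order of operations: the record is written before the decision to panic is handed back.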

Otherwise, if you just panic, then you have no idea why the machine errored
in the first place. It might be the safe thing to do in some
circumstances, but then you have to wonder why the fancy HEST-enabled
server didn't catch it. Isn't that what people are spending extra money
on with those Intel servers with RAS features?

Cheers,
Don