From: Don Zickus <dzickus@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org, Andi Kleen <andi@firstfloor.org>,
gong.chen@linux.intel.com, LKML <linux-kernel@vger.kernel.org>,
Elliott@hp.com, thomas.mingarelli@hp.com
Subject: Re: [PATCH 5/6] x86, nmi: Move default external NMI handler to its own routine
Date: Wed, 21 May 2014 15:13:19 -0400
Message-ID: <20140521191319.GD50500@redhat.com>
In-Reply-To: <20140521181756.GJ2485@laptop.programming.kicks-ass.net>
On Wed, May 21, 2014 at 08:17:56PM +0200, Peter Zijlstra wrote:
> On Wed, May 21, 2014 at 12:48:48PM -0400, Don Zickus wrote:
> > On Wed, May 21, 2014 at 12:38:46PM +0200, Peter Zijlstra wrote:
> > > On Thu, May 15, 2014 at 03:25:48PM -0400, Don Zickus wrote:
> > > > Now that we have set up an NMI subtype called NMI_EXT, there is really
> > > > no need to hard code the default external NMI handler in the main
> > > > nmi handler routine.
> > > >
> > > > Move it to a proper function and register it on boot. This change is
> > > > just code movement.
> > > >
> > > > In addition, update the hpwdt to allow it to unregister the default
> > > > handler on its registration (and vice versa). This allows the driver
> > > > to take control of that io port (which it ultimately wanted to do
> > > > originally), but in a cleaner way.
> > >
> > > wanting that is one thing, but is it also a sane thing? You don't do
> > > things just because drivers want it.
> >
> > Heh. I understand.
> >
> > Today, I have hacked up the SERR and IOCHK handlers to give hpwdt the
> > chance to do its 'magic' bios call to collect information before
> > panic'ing.
> >
> > I was trying to clean things up by removing those hacks, but I guess I can
> > see your point, there is no guarantee they handle the hardware correctly.
> > :-/
>
> So while I'll leave the decision to the x86 people, I find the changelog
> entirely devoid of a good reason to do this.
>
> And in my personal opinion any hardware that triggers non-detectable NMIs
> is just plain broken.
I do agree. And I am not looking to argue against your opinion, but the
'broken' part is what is interesting to vendors. With firmware becoming
more prevalent these days, I have seen large upticks in unknown NMIs with
RHEL-X due to broken firmware implementing the latest bells and whistles.

With so much firmware on the system (various PCI cards, system firmware,
etc), no one knows which piece is broken. What hpwdt (and other vendors
too) is trying to do is, the moment an unknown NMI happens, jump into the
BIOS and start poking registers on various system bridges to figure out
who is causing the problem and log it somehow (on a BMC or its ilk).
Then the hardware guys know what to fix.

Of course, ACPI's APEI was supposed to create a framework to properly
deliver these errors to the OS for reliable reporting (using a properly
registered NMI handler with a detectable NMI). But I think it is still a
work in progress. :-/

So the problem is that the hardware _is_ broken, but communicating that is
difficult, and an unknown NMI appears to be the cheap and easy way to do it.

Cheers,
Don