From: Andi Kleen <ak@linux.intel.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Huang Ying <ying.huang@intel.com>,
Jin Dongming <jin.dongming@np.css.fujitsu.com>,
LKLM <linux-kernel@vger.kernel.org>,
Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Subject: Re: [Patch-next] Remove notify_die in do_machine_check functioin
Date: Thu, 27 May 2010 15:15:25 +0200
Message-ID: <4BFE706D.2070909@linux.intel.com>
In-Reply-To: <20100527120410.43a0e04c@lxorguk.ukuu.org.uk>
Hi Alan,
> That would be because you don't do driver work I suspect. If you are
> doing driver work then its extremely useful ending up in the debugger
> when you get an MCE because some random bit of hardware on the bus
> decided to throw a tantrum.
>
> This is particularly the case with AMD/ATI and AMD/Nvidia chipset systems
> which tend to throw this kind of error if you prod some of the chipset
> controllers (eg the Nvidia SATA) in them in just the wrong way.
>
> So NAK simply removing it. As a driver writer I want to end up in the
> debugger when this happens so I can work out what led up to the MCE.
Have you ever tried that? It does not sound like it to be honest :)
You have no chance to figure out why the MCE happened
either, unless you run through the handler first.
Unless you want to do all the work the MCE handler does manually
somehow in the debugger (reading all banks on all CPUs, parsing
all the bits, doing all the other work). I wrote the code
to do that and even I am a bit scared of doing all that manually.
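To give a rough idea of what "parsing all the bits" involves for just one
bank, here is a minimal user-space sketch. It assumes the architectural
MCi_STATUS flag layout of the x86 machine check architecture; the struct and
function names are hypothetical and are not the kernel's handler code, and the
"fatal" test is a deliberate simplification of the real severity grading.

```c
#include <stdint.h>

/* Architectural MCi_STATUS flag bits (x86 machine check architecture). */
#define MCI_STATUS_VAL    (1ULL << 63)  /* register holds a valid error */
#define MCI_STATUS_OVER   (1ULL << 62)  /* an earlier error was overwritten */
#define MCI_STATUS_UC     (1ULL << 61)  /* error was not corrected */
#define MCI_STATUS_EN     (1ULL << 60)  /* reporting enabled for this error */
#define MCI_STATUS_MISCV  (1ULL << 59)  /* MCi_MISC holds extra information */
#define MCI_STATUS_ADDRV  (1ULL << 58)  /* MCi_ADDR holds a valid address */
#define MCI_STATUS_PCC    (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_MCACOD 0xffffULL     /* bits 15:0, MCA error code */

/* Hypothetical summary of one bank, for illustration only. */
struct bank_summary {
	int valid;
	int overflow;
	int uncorrected;
	int addr_valid;
	int context_corrupt;
	unsigned int mca_code;
};

/* Split a raw MCi_STATUS value into its main flags. */
static struct bank_summary decode_bank_status(uint64_t status)
{
	struct bank_summary s;

	s.valid = (status & MCI_STATUS_VAL) != 0;
	s.overflow = (status & MCI_STATUS_OVER) != 0;
	s.uncorrected = (status & MCI_STATUS_UC) != 0;
	s.addr_valid = (status & MCI_STATUS_ADDRV) != 0;
	s.context_corrupt = (status & MCI_STATUS_PCC) != 0;
	s.mca_code = (unsigned int)(status & MCI_STATUS_MCACOD);
	return s;
}

/*
 * Simplified assumption: treat a valid, uncorrected error with corrupt
 * processor context as fatal. The real handler grades severity with many
 * more inputs than this.
 */
static int bank_looks_fatal(uint64_t status)
{
	struct bank_summary s = decode_bank_status(status);

	return s.valid && s.uncorrected && s.context_corrupt;
}
```

In a debugger this would have to be repeated by hand for every bank on every
CPU, before even looking at the ADDR and MISC registers.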
Also, if the MCE is recoverable you'll just get a log entry
with all the information, and if it's not recoverable you
get a panic, which ends up entering the debugger anyway.
In addition, you won't get a single debugger entry, but a parallel
entry on all CPUs, because an MCE is broadcast.
So overall I still think handling MCEs in debuggers does not
make sense.
-Andi