All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: Don Zickus <dzickus@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI
Date: Mon, 13 Sep 2010 17:24:38 +0200	[thread overview]
Message-ID: <20100913172438.37443bf7@basil.nowhere.org> (raw)
In-Reply-To: <20100913141140.GB27371@redhat.com>


Don,

> Unfortunately, most of the bugzillas I deal with, unkown NMIs are the
> result of SERRs.  While you can consider that hardware error
> reporting, the easiest way for me to debug those problems currently
> is to have reporters run 'lspci -vvv' after the NMI is displayed to
> figure out who caused the NMI.
> 
> My fear is that panic'ing the box on unknown NMIs on those platforms
> will hinder my ability to easily debug those NMIs.


The reason the NMI is sent is that there is a "lost" 
data corruption somewhere in the system and if you don't 
stop it the system the corruption may make it to disk,
become permanent etc. The hardware was designed
under the assumption that  the system is stopped by software
when this happens (same reason as why many machine
checks cause panics)

Then there isn't necessarily something to "debug": data corruption
can happen without any bugs being around (and in fact
that's the common case, assuming production systems)

So I'm not sure what you're debugging here. Are you being the support
technician for the system through bugzilla? That sounds
inefficient.

Anyways for hardware support we could probably dump some
more information at panic or better through error
serialization, but the word "debug" is IMHO an very wrong
way to think about that.

If this is about driver debugging it's entirely reasonable
to have a special setting (e.g. disable the panic), 
but the defaults should be set in a way to avoid
spreading data corruption,.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

  reply	other threads:[~2010-09-13 15:24 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-10  2:51 [RFC 1/6] x86, NMI, Add symbol definition for NMI magic constants Huang Ying
2010-09-10  2:51 ` [RFC 2/6] x86, NMI, Add touch_nmi_watchdog to io_check_error delay Huang Ying
2010-09-10  2:51 ` [RFC 3/6] x86, NMI, Rename memory parity error to PCI SERR error Huang Ying
2010-09-13  1:02   ` Robert Richter
2010-09-13  2:02     ` Huang Ying
2010-09-16  8:18       ` Robert Richter
2010-09-17  0:08         ` Huang Ying
2010-09-17  9:14           ` Robert Richter
2010-09-19  0:20             ` Huang Ying
2010-09-20  8:00               ` Robert Richter
2010-09-20 12:59                 ` Borislav Petkov
2010-09-21  0:22                   ` Huang Ying
2010-09-21  6:37                     ` Borislav Petkov
2010-09-21 14:08                       ` Doug Thompson
2010-09-21 23:04   ` Maciej W. Rozycki
2010-09-23  5:37     ` huang ying
2010-09-29  0:26       ` Maciej W. Rozycki
2010-09-10  2:51 ` [RFC 4/6] x86, NMI, Rewrite NMI handler Huang Ying
2010-09-10 15:56   ` Don Zickus
2010-09-10 16:03     ` Andi Kleen
2010-09-10 18:29       ` Don Zickus
2010-09-13  2:09         ` Huang Ying
2010-09-13 14:04           ` Don Zickus
2010-09-14  5:12             ` Huang Ying
2010-09-14 13:37               ` Don Zickus
2010-09-13  1:16   ` Robert Richter
2010-09-10  2:51 ` [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI Huang Ying
2010-09-10 16:02   ` Don Zickus
2010-09-10 16:19     ` Andi Kleen
2010-09-10 18:40       ` Don Zickus
2010-09-13  2:19         ` Huang Ying
2010-09-13 14:11           ` Don Zickus
2010-09-13 15:24             ` Andi Kleen [this message]
2010-09-13 15:47               ` Don Zickus
2010-09-13 16:57                 ` Andi Kleen
2010-09-13 17:53                   ` Don Zickus
2010-09-13 18:07                     ` Andi Kleen
2010-09-13 18:23                       ` Don Zickus
2010-09-13 18:36                         ` Andi Kleen
2010-09-13 19:36                           ` Don Zickus
2010-09-13 20:49                             ` Andi Kleen
2010-09-13 21:25                               ` Valdis.Kletnieks
2010-09-14  7:48                                 ` Andi Kleen
2010-09-14 17:54                                   ` Valdis.Kletnieks
2010-09-14 12:21                             ` Ingo Molnar
2010-09-14 13:45                               ` Don Zickus
2010-09-14 19:34                               ` Cyrill Gorcunov
2010-09-15  9:29                                 ` Ingo Molnar
2010-09-10  2:51 ` [RFC 6/6] x86, NMI, Remove do_nmi_callback logic Huang Ying
2010-09-10 16:13   ` Don Zickus
2010-09-13  2:27     ` Huang Ying
2010-09-13  6:24       ` Ingo Molnar
2010-09-10 20:37 ` [RFC 1/6] x86, NMI, Add symbol definition for NMI magic constants Peter Zijlstra
2010-09-10 22:58   ` H. Peter Anvin
2010-09-11  8:50   ` Andi Kleen
2010-09-13  1:30     ` Robert Richter
2010-09-21 21:48 ` Don Zickus
2010-09-21 22:19   ` Andi Kleen
2010-09-22 16:07     ` Don Zickus
2010-09-23  9:29       ` huang ying
2010-09-23 14:16         ` Don Zickus
2010-09-24 11:50           ` huang ying
2010-09-24 14:29             ` Don Zickus
2010-09-23  9:51   ` huang ying
  -- strict thread matches above, loose matches on Subject: below --
2010-09-14 14:31 [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI Andi Kleen
2010-09-14 15:17 ` Don Zickus
2010-09-14 17:40   ` Andi Kleen
2010-09-14 17:48 ` Ingo Molnar
2010-09-15  5:06   ` Huang Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100913172438.37443bf7@basil.nowhere.org \
    --to=andi@firstfloor.org \
    --cc=dzickus@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.