From: Russ Anderson <rja@sgi.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Borislav Petkov <bp@amd64.org>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Tony Luck <tony.luck@intel.com>,
	EDAC devel <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Prarit Bhargava <prarit@redhat.com>,
	Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com>,
	rja@americas.sgi.com
Subject: Re: [PATCH -v2 2/2] x86, MCE: Drop the default decoding notifier
Date: Tue, 26 Apr 2011 17:32:57 -0500
Message-ID: <20110426223257.GB27953@sgi.com>
In-Reply-To: <m1y62wk9i8.fsf@fess.ebiederm.org>

On Tue, Apr 26, 2011 at 02:06:39PM -0700, Eric W. Biederman wrote:
> Borislav Petkov <bp@amd64.org> writes:
> > On Mon, Apr 25, 2011 at 03:40:11PM -0400, Eric W. Biederman wrote:
> >> > From: Borislav Petkov <borislav.petkov@amd.com>
> >> > Date: Wed, 13 Apr 2011 14:32:06 +0200
> >> > Subject: [PATCH -v2.1 2/2] x86, MCE: Drop the default decoding notifier
> >> >
> >> > The default notifier doesn't make a lot of sense to call in the
> >> > correctable errors case. Drop it and emit the mcelog decoding hint only
> >> > in the uncorrectable errors case and when no notifier is registered.
> >> > Also, limit issuing the "mcelog --ascii" message in the rare case when
> >> > we dump unreported CEs before panicking.
> >> >
> >> > While at it, remove unused old x86_mce_decode_callback from the
> >> > header.
> >> 
> >> Can we please print or log something in the
> >> case of a correctable error, when we only report it via mcelog?
> >> 
> >> I have a stupid recent intel cpu here that hits that case and without
> >> the default x86_mce_decode_callback I wouldn't have even known that I am
> >> getting something like 50 correctable errors an hour on one of my
> >> machines.  In particular, it hits so often that I am seeing:
> >> "mce_notify_irq: 2 callbacks suppressed".  I need to get those dimms
> >> replaced soon because in a new product I simply can't imagine that many
> >> correctable errors.
> >
> > Isn't there a mcelog daemon or something that polls /dev/mcelog and
> > tells you about those DRAM ECCs in some log file where you're supposed
> > to look? :)
> 
> On fedora 14 there is a cron job that writes to /var/log/mcelog, and
> does not go through syslog.

Interesting.  I'm running Fedora 14 and don't have a /var/log/mcelog
file or see an mcelog package (not that I'd looked until just now).

>                              But you have to be proactive and look
> there.  If the people who work on this code can't even remember
> where to look I can't imagine how anyone else can remember.
> Which is why I object to the removal of the one printk that told
> me something was broken on my machine.

Historically, hardware error reporting has been very platform-
dependent.  Those differences made it difficult to agree on
standard ways to report errors.  You raise a good point that it
needs to work better.

> So far from what I have seen /dev/mcelog and the userspace mcelog is
> over complicated and near useless.

/dev/mcelog is extremely useful to SGI.  As you said, "you have to 
be proactive and look there" which we are and do.  :-)

>                                     It seems to be focused around the
> notion that "This is not a software problem, please do not bug
> Andi Kleen about it"
> 
> Well it is a hardware problem so I do need to RMA that hardware.
> Sigh.

You raise a good issue that users do need to know when their 
hardware is having issues.

> Eric

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com
