From: Borislav Petkov <bp@amd64.org>
To: Russ Anderson <rja@sgi.com>
Cc: Prarit Bhargava <prarit@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Luck, Tony" <tony.luck@intel.com>,
"dzickus@redhat.com" <dzickus@redhat.com>,
"mstowe@redhat.com" <mstowe@redhat.com>,
"dnelson@redhat.com" <dnelson@redhat.com>,
"rja@americas.sgi.com" <rja@americas.sgi.com>
Subject: Re: [PATCH -v3] x86, MCE: Drop the default decoding notifier
Date: Thu, 14 Apr 2011 17:49:34 +0200
Message-ID: <20110414154934.GL10080@aftab>
In-Reply-To: <20110414153318.GA13891@sgi.com>
On Thu, Apr 14, 2011 at 11:33:18AM -0400, Russ Anderson wrote:
> On Thu, Apr 14, 2011 at 05:16:21PM +0200, Borislav Petkov wrote:
> > On Thu, Apr 14, 2011 at 11:04:43AM -0400, Prarit Bhargava wrote:
> > > On 04/14/2011 11:00 AM, Borislav Petkov wrote:
> > > > On Wed, Apr 13, 2011 at 01:37:05PM -0400, Borislav Petkov wrote:
> > > >
> > > >> In the worst case, we will report 32 CEs before panicking. For that case
> > > >> we either do printk_once as Tony suggested or we ratelimit it. I'll
> > > >> update the patch.
> > > >>
> > > > Ok, how about the following, I ratelimit the printk to the default of 10
> > > > messages per 5 seconds. I've also got the hardware MCE injection patches
> > > > ready and will do some testing with them.
> > > >
> > >
> > > See my previous email ;) I think just putting in a printk_once after
> > > the CE call to print_mce() in mce_panic() might be better? At least
> > > that way we get the --ascii message for *EVERY* UC which IMO would be
> > > nice...
> >
> > Are you sure? printk_once() is, as its name says, a one-time thing and
> > it is implemented that way - a static bool counter which is once set and
> > that's it. I.e., the "--ascii" message will be printed only once for the
> > system's lifetime.
> >
> > The ratelimit-ed thing dumps it a strict number of times. In the end,
> > I don't have a strong opinion on how many times we issue it - I'm fine
> > with it either way.
> >
> > Maybe some other opinions. Tony?
>
> In general I think you and Prarit are headed in the right direction.
>
> As for when to throttle messages, differing people will have
> different opinions as to the right number. For example, some
> sites may want the threshold low because once they see a CE they
> will schedule to replace the DIMM. Manufacturing sometimes
> wants to see all the CEs to know how good/bad the DIMM is.
>
> My suggestion is to pick a default value and have a /sys (or
> other) way of changing the value. That way if someone has
> a need to change the value they can. In real life someone
> will have a legitimate need to change the value.
>
> Is the thresholding on a per DIMM, per Socket, or per system
> basis? SGI tends to have large systems with lots of DIMMs.
> Per DIMM or per Socket thresholds tend to scale better.
Nah, we're talking about the decoding hint only:

  "Run the above through 'mcelog --ascii'"

Issuing it makes no sense when no correctable-error info is in dmesg.

The CEs don't get reported in this case; I think on Intel you have to
run mcelog. On AMD, you generally use the amd64_edac driver, which
collects all CE info and decodes it to chip selects on the node, and
then you can do thresholding.

What you're talking about above is not yet fully ... hm ... implemented :).
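
For illustration only, here is a minimal userspace sketch of the
printk_once() vs. ratelimit difference discussed in the quoted text
above. It is not kernel code and not part of the patch. hint_once(),
hint_ratelimited() and struct ratelimit are made-up names for this
example; in the kernel the corresponding helpers are printk_once() and
printk_ratelimited(). The 10-messages-per-5-seconds window is the
default mentioned above, and timestamps are passed in as plain seconds
so the behaviour is deterministic.

```c
#include <stdbool.h>
#include <stdio.h>

#define HINT "Run the above through 'mcelog --ascii'\n"

/*
 * "Once" semantics: a static flag that is set on first use and never
 * cleared, so the hint appears a single time per process lifetime.
 * Returns true if the hint was printed by this call.
 */
static bool hint_once(void)
{
	static bool printed;

	if (printed)
		return false;
	printed = true;
	fputs(HINT, stdout);
	return true;
}

/* Hypothetical ratelimit state: at most 'burst' messages per 'interval'. */
struct ratelimit {
	long interval;	/* window length in seconds */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time 'now' (seconds). */
static bool ratelimit_ok(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		/* Window expired: open a new one and reset the budget. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed >= rs->burst)
		return false;
	rs->printed++;
	return true;
}

/* Ratelimited semantics: the hint can reappear once a new window opens. */
static bool hint_ratelimited(struct ratelimit *rs, long now)
{
	if (!ratelimit_ok(rs, now))
		return false;
	fputs(HINT, stdout);
	return true;
}
```

With a state of { .interval = 5, .burst = 10 }, 32 back-to-back calls
to hint_ratelimited() at the same timestamp print the hint 10 times,
and a call 5 seconds later prints it again. 32 calls to hint_once()
print it exactly once.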
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632