From: Borislav Petkov <bp@amd64.org>
To: Prarit Bhargava <prarit@redhat.com>
Cc: Borislav Petkov <bp@amd64.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Russ Anderson <rja@sgi.com>, "Luck, Tony" <tony.luck@intel.com>,
"dzickus@redhat.com" <dzickus@redhat.com>,
"mstowe@redhat.com" <mstowe@redhat.com>,
"dnelson@redhat.com" <dnelson@redhat.com>,
"rja@americas.sgi.com" <rja@americas.sgi.com>
Subject: Re: [PATCH -v3] x86, MCE: Drop the default decoding notifier
Date: Thu, 14 Apr 2011 21:02:53 +0200
Message-ID: <20110414190253.GQ10080@aftab>
In-Reply-To: <4DA71774.9020900@redhat.com>
On Thu, Apr 14, 2011 at 11:49:08AM -0400, Prarit Bhargava wrote:
> > And we don't want to fix that - we want to fix the case with the
> > occasional CE MCEs which get detected in the polling path but none of
> > their MCA regs get dumped for decoding so the decoding hint there is
> > out of place. And we fixed that at least partially so that it doesn't
> > flood the logs. If you're not fine with the default ratelimit of 10 msgs
> > per 5 seconds we can always raise the ratelimit but tweaking an almost
> > hypothetical case is just not worth it.
> >
> Okay -- I'm good then.
Ok, injecting MCEs with this patch looks like this. Never mind the
decoded MCEs: I simply uncommented the "if (ret != NOTIFY_DONE)" line
so that I can see the rate limiting. So the hints are still there. This
is with the default setting of 10 calls per 5 secs; we might want to
dial it down. After the 10th MCE, the hint doesn't appear anymore.
Hmm...
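For illustration only, the "10 calls per 5 secs" behaviour can be modelled in userspace roughly like this. It is a minimal sketch of an interval/burst limiter, not the kernel's ratelimit code; the class name, its fields and the explicit `now` argument are assumptions made for the example.

```python
class RateLimit:
    """Userspace model of an interval/burst rate limiter (hypothetical names)."""

    def __init__(self, interval=5.0, burst=10):
        self.interval = interval  # window length in seconds
        self.burst = burst        # messages allowed per window
        self.begin = None         # start time of the current window
        self.printed = 0          # messages let through in this window
        self.missed = 0           # messages suppressed in this window

    def allow(self, now):
        """Return True if a message may be printed at time `now` (seconds)."""
        if self.begin is None:
            self.begin = now
        # Window expired: report what was dropped and start a new window.
        if now - self.begin >= self.interval:
            if self.missed:
                print(f"{self.missed} callbacks suppressed")
            self.begin = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


rl = RateLimit(interval=5.0, burst=10)
# Twelve events inside one 5 second window: only the first ten pass.
results = [rl.allow(now=0.1 * i) for i in range(12)]
shown = sum(results)
```

With these assumed defaults, the eleventh and twelfth calls in the same window are dropped, and the first call after the window has expired is allowed again and reports how many were suppressed.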
Prarit, you said you have a machine which spits out a lot of CECCs on
boot. Does the final version help there? Did you get a chance to run it?
[ 312.983610] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: 9200400000010e3f
[ 312.987463] [Hardware Error]: TSC 7ca4633c24
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807306 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[-|CE|-|PCC|-|CECC]: 0x9200400000010e3f
[ 312.987463] [Hardware Error]: Data Cache Error:
[ 312.987463] [Hardware Error]: Corrupted DC MCE info?
[ 312.987463] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: DRD, part-proc: GEN (no timeout)
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: d400400000010016
[ 312.987463] [Hardware Error]: TSC 825c93b856
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807321 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd400400000010016
[ 312.987463] [Hardware Error]: Data Cache Error: L2 TLB multimatch.
[ 312.987463] [Hardware Error]: cache level: L2, tx: DATA
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: d40040000000081f
[ 312.987463] [Hardware Error]: TSC 852bb06a47
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807328 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040000000081f
[ 312.987463] [Hardware Error]: Data Cache Error:
[ 312.987463] [Hardware Error]: Corrupted DC MCE info?
[ 312.987463] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: RD, part-proc: SRC (no timeout)
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: dc03400008000f43
[ 312.987463] [Hardware Error]: TSC 9a9c3818f3
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807382 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc03400008000f43
[ 312.987463] [Hardware Error]: Data Cache Error:
[ 312.987463] [Hardware Error]: Corrupted DC MCE info?
[ 312.987463] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: DWR, part-proc: GEN (timed out)
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: f600210000000107
[ 312.987463] [Hardware Error]: TSC 88ce52d026
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807337 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[Over|UE|-|PCC|AddrV|UECC]: 0xf600210000000107
[ 312.987463] [Hardware Error]: Data Cache Error:
[ 312.987463] [Hardware Error]: Corrupted DC MCE info?
[ 312.987463] [Hardware Error]: cache level: L3/GEN, tx: DATA, mem-tx: GEN
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 0: dc03400008000f43
[ 312.987463] [Hardware Error]: TSC 9a9c3818f3
[ 312.987463] [Hardware Error]: PROCESSOR 2:100f91 TIME 1302807382 SOCKET 0 APIC 0
[ 312.987463] [Hardware Error]: MC0_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc03400008000f43
[ 312.987463] [Hardware Error]: Data Cache Error:
[ 312.987463] [Hardware Error]: Corrupted DC MCE info?
[ 312.987463] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: DWR, part-proc: GEN (timed out)
[ 312.987463] [Hardware Error]: Run the above through 'mcelog --ascii' <-----------------
[ 312.987463] [Hardware Error]: Machine check: MCIP not set in MCA handler
[ 312.987463] [Hardware Error]: Fake kernel panic: Fatal machine check on current CPU
[ 312.987463] mce_notify_irq: 2 callbacks suppressed
[ 312.987463] [Hardware Error]: Machine check events logged
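As a cross-check of the MC0_STATUS lines in the log above, here is a small userspace sketch that rebuilds the bracketed flag string from the raw status value. The bit positions (Over=62, UC=61, MiscV=59, AddrV=58, PCC=57, ECC type in bits 46:45) are assumptions taken from the x86 MCA status register layout and checked only against the values in this log; the function name is hypothetical and this is not the kernel decoder itself.

```python
def decode_status(bank, status):
    """Format an MCi_STATUS value the way the log lines above show it."""

    def flag(bit, name):
        # Print the flag name if the bit is set, "-" otherwise.
        return name if (status >> bit) & 1 else "-"

    fields = [
        flag(62, "Over"),
        "UE" if (status >> 61) & 1 else "CE",  # UC bit: uncorrected vs corrected
        flag(59, "MiscV"),
        flag(57, "PCC"),
        flag(58, "AddrV"),
    ]
    # Assumed: bits 46:45 give the ECC error type, 2 meaning correctable.
    ecc = (status >> 45) & 0x3
    if ecc:
        fields.append("CECC" if ecc == 2 else "UECC")
    return f"MC{bank}_STATUS[{'|'.join(fields)}]: 0x{status:016x}"


first = decode_status(0, 0x9200400000010E3F)
fatal = decode_status(0, 0xF600210000000107)
```

Feeding it the first injected value and the uncorrectable one yields the same strings as the corresponding MC0_STATUS lines in the log.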
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632