From: Borislav Petkov <bp@amd64.org>
To: "F. P. Beekhof" <fpbeekhof@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>, Borislav Petkov <bp@amd64.org>,
Jeff Garzik <jgarzik@pobox.com>,
Mikael Pettersson <mikpe@it.uu.se>,
"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>
Subject: Re: Machine check exception
Date: Sun, 31 Jul 2011 14:09:44 +0200
Message-ID: <20110731120944.GA17348@aftab>
In-Reply-To: <4E3510D1.8010703@gmail.com>
Hi,
On Sun, Jul 31, 2011 at 04:22:41AM -0400, F. P. Beekhof wrote:
> I've used the hooks to call a script, the value is 100008 after
> resume, and I'm booting the system by going onto 'recovery console',
> running the script to set msr 0xc001001f to 100008, then completing
> the normal boot procedure.
Hmm, there has to be a way to automate that. Maybe push
/etc/init.d/rc.local up in the call prio so that it gets run as early as
possible?
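A minimal sketch of what such an early-boot (or resume-hook) script could
look like. It assumes the rdmsr/wrmsr utilities from msr-tools are
installed and that the "100008" quoted above is hexadecimal; the function
names are made up for illustration. Nothing is written to the hardware
unless the script is called with "apply".

```shell
#!/bin/sh
# Sketch only: re-apply the MSR value discussed in this thread.
# Assumptions: msr-tools is installed; "100008" is a hex value.

MSR_REG=0xc001001f    # register named in the thread
MSR_VAL=0x100008      # value seen after resume, assumed hexadecimal

# Print the command that would be run, so it can be reviewed first.
msr_fix_command() {
    printf 'wrmsr -a %s %s\n' "$MSR_REG" "$MSR_VAL"
}

# Load the msr driver and write the value on all CPUs (needs root).
apply_msr_fix() {
    modprobe msr || return 1
    wrmsr -a "$MSR_REG" "$MSR_VAL"
}

# Only touch the hardware when explicitly asked to.
if [ "${1:-}" = "apply" ]; then
    apply_msr_fix
fi
```

Calling the script with "apply" from rc.local, or from whatever runs
earliest on the distribution in question, would replace the manual step
in the recovery console.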
> So far, it seems to have fixed the issue, in the sense that there have
> been no MCEs yet. There was some call trace after a suspend/resume
> (see below), but that's it.
Yeah, it's on resume. This warning fires because it took the system
17880 msecs to resume and the test was expecting something under 10000.
It could be an unstable RTC clock or something. You could disable it by
turning CONFIG_PM_TEST_SUSPEND off for your kernel if there are no other
issues with suspend/resume besides that warning firing.
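To see whether a given kernel was built with that option, something like
the following could be used. This is a sketch: the helper name is made
up, and the location of the config file (/boot/config-$(uname -r) on many
distributions) is an assumption that may not hold everywhere.

```shell
# Succeeds if the kernel config file passed as $1 has
# CONFIG_PM_TEST_SUSPEND built in.
pm_test_suspend_enabled() {
    grep -q '^CONFIG_PM_TEST_SUSPEND=y' "$1"
}

# Example (path is an assumption, adjust for your distribution):
#   pm_test_suspend_enabled "/boot/config-$(uname -r)" && echo "enabled"
```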
> I found that one can enable ECC on ram in the bios, which I did. As
> far as I know, this is non-ECC ram, so frankly I'm at a loss about
Maybe the BIOS is not properly detecting whether DRAM is ECC or not.
Normally, if it is not, it should simply remove the option to enable ECC
from the menu.
To check what the hw says, do
$ setpci -s 18.3 0x44.l
as root and send me the result pls.
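For reading the result yourself, here is a small sketch that checks one
bit of the value setpci prints. It assumes, as the Linux amd64_edac
driver's definitions do, that bit 22 of this register is the ECC-enable
flag; please check that against the BKDG for your CPU family. The helper
name is made up.

```shell
# Succeeds if bit 22 is set in the 32-bit hex value given as $1
# (with or without a leading "0x", as printed by setpci).
# Assumption: bit 22 of function 3, offset 0x44 is the ECC-enable bit.
ecc_enabled() {
    hex=${1#0x}
    val=$(( 0x$hex ))
    [ $(( (val >> 22) & 1 )) -eq 1 ]
}

# Example (as root):
#   ecc_enabled "$(setpci -s 18.3 0x44.l)" && echo "ECC bit is set"
```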
> To provoke MCEs, I've added a firewire card, that I had pulled out
> before. Removing that thing had reduced the number of MCEs, but not
> eliminated them. With a regular boot sequence (no msr setting), the
> radeon driver complained of something and the system froze within 5
> minutes. I then rebooted and followed your instructions, so far the
> system is working perfectly fine.
good.
> I've also switched two eSATA on and off a few times, they are detected
> fine now with no crash, and let banshee run. That has frequently
> proven to be too much, but now it is fine.
good.
> All of this is no definite proof that all is well, but it certainly
> seems more stable.
I'd suggest you run your system at full swing and watch it for signs of
trouble a couple of days longer just in case.
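One way to keep an eye on it is to count suspicious lines in a saved
kernel log from time to time. This is a rough sketch: the helper name is
made up, and the two patterns are an assumption about how MCE reports
show up in the log, so they may need adjusting.

```shell
# Print the number of lines in the log file $1 that look like
# machine check reports (case-insensitive match).
count_mce_lines() {
    grep -c -i -E 'machine check|\[Hardware Error\]' "$1"
}

# Example:
#   dmesg > /tmp/dmesg.txt && count_mce_lines /tmp/dmesg.txt
```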
> Are there any conclusions that can be drawn from this experiment ?
Yeah, it means that your BIOS doesn't seem to have the fix for erratum
#131: http://support.amd.com/us/Processor_TechDocs/25759.pdf, page 83.
I don't know whether there is a BIOS update for your ancient CPU :-) and,
if there were, whether upgrading it would break something else.
If I were you, I'd run the automated script hooks and not care about
the upgrade... provided we don't see any other hiccups, that is, and
provided we manage to automate them so that you don't have to boot into
the recovery console every time.
Let me know how it all plays out.
HTH.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551