From: Alexander Holler <holler@ahsoftware.de>
To: Borislav Petkov <bp@alien8.de>, linux-kernel@vger.kernel.org
Subject: Re: AMD A10: MCE Instruction Cache Error
Date: Tue, 06 Nov 2012 12:18:13 +0100
Message-ID: <5098F1F5.5060709@ahsoftware.de>
In-Reply-To: <20121106091057.GC2090@x1.osrc.amd.com>
On 06.11.2012 10:10, Borislav Petkov wrote:
> On Sun, Nov 04, 2012 at 06:19:32PM +0100, Alexander Holler wrote:
>> I was remotely logged in and there aren't that many faults which
>> lead to complete stand still of hw (no reset).
>
> Right, can you retry triggering the freeze without the fglrx driver?
> Simply remove it completely so that even the possibility to load it is
> not there.
Will do. But I don't think it is fglrx. I've been using it for several
years (previously with an external graphics card) and never had a
problem with it. Besides that, nothing happened on the display during
the hangs; I was logged out and only had a remote ssh session open.
>> But as you said I can't know, the only thing I know is that a box
>> with a new mb, memory and apu came to a complete standstill, and
>> that shortly after I received an emergency message which told me
>> that a bit inside the cpu had flipped unexpectedly. Adding to that,
>> the box was doing the same thing as when it received the MCE: a
>> backup from a SATA-attached SSD to a USB3 HD which includes
>> compression and encryption and keeps all cores busy most of the
>> time for several hours.
>
> So do you get that MCE each time you execute that same workload?
No, so far the MCE has shown up only once. But stressing the box
yesterday (with a load of 3 for several hours and such) revealed some
other serious failures which all look like what happens when the cache
(or memory) is broken. (I don't know how many bits of the cache can be
corrected before something else happens, or what happens then.) E.g. the
checksum of a backup was wrong, and bzip2 failed with an error which it
suggests is caused by a HW failure like bad RAM (I've never seen that
error from bzip2 before).
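For anyone who wants to reproduce this kind of check without a full
backup run, here is a rough sketch. It is not the backup job described
above; the function names roundtrip_ok and stress are made up for
illustration. It compresses and decompresses random data with bzip2 and
compares SHA-256 digests, so silent corruption shows up either as a
decompression error or as a digest mismatch.

```python
import bz2
import hashlib
import os


def roundtrip_ok(payload: bytes) -> bool:
    """Compress and decompress payload, then compare SHA-256 digests."""
    expected = hashlib.sha256(payload).hexdigest()
    try:
        restored = bz2.decompress(bz2.compress(payload))
    except (OSError, ValueError):
        # bz2 raises on an invalid or truncated compressed stream
        return False
    return hashlib.sha256(restored).hexdigest() == expected


def stress(rounds: int = 10, size: int = 1 << 20) -> int:
    """Run `rounds` round trips on fresh random data and count failures."""
    failures = 0
    for _ in range(rounds):
        if not roundtrip_ok(os.urandom(size)):
            failures += 1
    return failures


if __name__ == "__main__":
    print("failed round trips:", stress())
```

A single process like this only keeps one core busy, so to get close to
the load of the real backup one would have to start several instances in
parallel and let them run for hours. Any non-zero failure count on
otherwise idle hardware would point at CPU, cache or memory rather than
at the backup tooling.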
I've just run a memory test using memtest86+ 4.20 for about 7 hours (3
complete passes over all 16 GB) with no errors, so the new memory itself
seems to be OK.
I will now run tests with fglrx left off.
Regards,
Alexander