From: Adalbert Dawid <dawid@rinux.net>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Borislav Petkov <bp@amd64.org>,
	"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"x86@kernel.org" <x86@kernel.org>
Subject: RE: Kernel Panic with Rawtherapee (mce related)
Date: Wed, 14 Mar 2012 21:18:51 +0100
Message-ID: <1331756331.5315.19.camel@erde.fritz.box>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F15B64BF0@ORSMSX103.amr.corp.intel.com>

Thank you for the quick reply.

On Wed, 2012-03-14 at 17:51 +0000, Luck, Tony wrote:
> > You're getting a bunch of machine checks, the last one of them being
> > fatal (Process Context Corrupt bit is set) causing the machine to panic.
> 
> PCC is set in all of them
> 
> > Tony will probably be able to help you further in decoding what exactly
> > those MC0_STATUS and MC5_STATUS values mean
> 
> Bank 5 ends in 0400 - which means "Internal timer error". Bank 0 has 0800
> which is a bus/interconnect error where this processor was the source of
> a memory transaction.
> 
> That's where the facts end - speculation begins here ...
> 
> Since this is repeatable under load - it's possible that a page table got
> corrupted and you are trying to access some non-existent memory location?
> Do all traces for this panic involve *_tlb_* functions?

I don't know, since the screenshot I posted is the only one I have been
able to capture. I will try to provoke the crash by putting the machine
under load with rawtherapee and will post the results if I succeed.
Cpuburn did not manage to crash the machine in a (shortish) test I ran
a few days ago.

It would be very helpful to be able to disable the "reboot in 30
seconds" timeout. Is that possible somehow?

> Or perhaps you have a cooling problem - and when stressed your cpu or
> memory is getting too hot?

I do not believe this is the cause, as the cpu fan and two case fans
are running fine and the sensors report cpu temperatures below 60°C,
even under load.

Up to now, it has always been rawtherapee that crashed the machine. This
is why I thought some special cpu feature (an SSE instruction or
similar) might be broken in my cpu and be triggered only by rawtherapee
and not by any other software. What is your opinion on this theory?

> -Tony
> 
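
For reference, the decoding Tony describes above can be sketched in code.
This is only a minimal illustration, assuming the Intel architectural MCA
layout (error code in bits 15:0 of MCi_STATUS, PCC in bit 57). The helper
name decode_mci_status and the example status words are hypothetical: the
words are built from the quoted low 16 bits plus PCC and are not the
actual register values from the screenshot.

```python
# Minimal sketch: decode the MCA error code (bits 15:0) and the PCC flag
# (bit 57) of an MCi_STATUS value. Only the two code families mentioned
# in this thread are handled; everything else is left undecoded.

PCC_BIT = 1 << 57  # Processor Context Corrupt

# PP field (bits 10:9) of a bus/interconnect error code
PARTICIPATION = {
    0b00: "local processor originated the request",
    0b01: "local processor responded to the request",
    0b10: "local processor observed the error as a third party",
    0b11: "generic",
}

# II field (bits 3:2) of a bus/interconnect error code
TRANSACTION = {
    0b00: "memory access",
    0b01: "reserved",
    0b10: "I/O",
    0b11: "other transaction",
}


def decode_mci_status(status):
    """Return (description, pcc) for a 64-bit MCi_STATUS value (hypothetical helper)."""
    code = status & 0xFFFF
    pcc = bool(status & PCC_BIT)
    if code == 0x0400:
        desc = "internal timer error"
    elif (code & 0xE800) == 0x0800:
        # Bus/interconnect pattern: bit 11 set, bits 15:13 clear
        pp = (code >> 9) & 0b11
        ii = (code >> 2) & 0b11
        desc = f"bus/interconnect error: {PARTICIPATION[pp]}, {TRANSACTION[ii]}"
    else:
        desc = f"not decoded by this sketch (0x{code:04x})"
    return desc, pcc


# Hypothetical status words: only the low 16 bits and PCC come from the thread.
for bank, status in ((5, PCC_BIT | 0x0400), (0, PCC_BIT | 0x0800)):
    desc, pcc = decode_mci_status(status)
    print(f"bank {bank}: {desc}, PCC={'set' if pcc else 'clear'}")
```

With these inputs, bank 0 decodes as a request originated by the local
processor for a memory access, which matches the reading given in the
quoted reply.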

Thread overview:
2012-03-13 22:48 Kernel Panic with Rawtherapee Adalbert Dawid
2012-03-14 14:53 ` Kernel Panic with Rawtherapee (mce related) Srivatsa S. Bhat
2012-03-14 15:59   ` Borislav Petkov
2012-03-14 17:51     ` Luck, Tony
2012-03-14 20:18       ` Adalbert Dawid [this message]
2012-03-14 21:53         ` Luck, Tony
