From: AndrewL733 <AndrewL733@aol.com>
To: rdunlap@xenotime.net
Cc: Jim Paris <jim@jtan.com>,
	linas@austin.ibm.com, Alan Cox <alan@lxorguk.ukuu.org.uk>,
	samson yeung <fragmede@onepatchdown.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	bbermack@alum.mit.edu, Justin Mazzola Paluska <jmp@mit.edu>
Subject: Re: NMI error and Intel S5000PSL Motherboards
Date: Fri, 28 Sep 2007 11:13:51 -0400
Message-ID: <46FD1A2F.9050801@aol.com>
In-Reply-To: <20070926170330.ee8fb5b4.rdunlap@xenotime.net>

rdunlap@xenotime.net wrote:
> On Wed, 26 Sep 2007 19:48:14 -0400 Jim Paris wrote:
>
>   
>> Hello,
>>
>>     
>>> We have about 100 servers based on Intel S5000PSL-SATA motherboards. 
>>> They have been running for anywhere between 1 and 10 months. For the 
>>> past few months, after updating them all to the 2.6.20.15 kernel 
>>> (because of a bug in the 2.6.18 kernel), we are seeing some strange NMI 
>>> errors. For example:
>>>
>>> Aug 29 09:02:10 master kernel: Uhhuh. NMI received for unknown reason 30.
>>> Aug 29 09:02:10 master kernel: Do you have a strange power saving mode enabled?
>>> Aug 29 09:02:10 master kernel: Dazed and confused, but trying to continue
>>>       
>> I'm also working with Andrew and Samson.  It seems that the cause of
>> the problem is CONFIG_PCIEAER, which was introduced after 2.6.18 and
>> defaults to y.
>>
>> With CONFIG_PCIEAER=n, scanpci works fine with no errors.  This is the
>> workaround that they'll likely use for now.
>>     
>
> Glad that you found it.
>
>   
>> With CONFIG_PCIEAER=y, scanpci always triggers the NMI error.  The
>> option aerdriver.forceload=1 has no effect.
>>     
>
> The 'forceload' option only forces the driver to load even when the
> ACPI hardware initialization routine fails.
>
> It would be nice to be able to disable PCIEAER at boot time though.
> Shouldn't be difficult.
>
>   
So, looking for some closure here, what do we think is the "root cause"? 
Is it:

1)  a defect in Intel's S5000PSL motherboards that is exposed by an 
otherwise fine Linux kernel feature (new since 2.6.19)? In that case 
we and others should probably press Intel to recognize they have a 
problem. They only "officially support" distributions running 
2.6.16 or below, so maybe they don't even know about this issue.

2)  a problem with PCIEAER? And maybe "CONFIG_PCIEAER=y" should NOT be 
the default setting? (in which case the kernel maybe needs fixing)

3)  just a bad interaction between a good motherboard and a good Linux 
feature that don't play well together? In that case anybody compiling 
a kernel to run on the Intel S5000PSL motherboard should know not to 
enable this feature. Maybe a note is warranted so that, when 
configuring the kernel, people with S5000PSL motherboards don't make 
the same mistake.
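
Whichever of the three it turns out to be, it helps to confirm how a given 
kernel was actually built before blaming the board. The following is only a 
minimal sketch in Python: the helper name kconfig_value and the sample 
config text are made up for illustration, and it assumes you have the 
kernel's build configuration available as plain text (many distributions 
ship one under /boot, but that is an assumption about your system).

```python
import re

def kconfig_value(config_text, option):
    """Return the value assigned to a kernel config option, or None.

    None covers both "option absent" and the "# CONFIG_X is not set" form,
    since neither contains an OPTION=value line.
    """
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None

# Hypothetical config excerpt, for illustration only.
sample = """\
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
"""

if kconfig_value(sample, "CONFIG_PCIEAER") == "y":
    print("PCIEAER is built in; rebuild with CONFIG_PCIEAER=n for the workaround")
else:
    print("PCIEAER is not built in")
```

To use it on a real machine, read the config file into a string and pass it 
in place of sample.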



>   
>> The related dmesg output at boot is:
>>
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   aer_init: AER service init fails - Run ACPI _OSC fails
>>   aer: probe of 0000:00:02.0:pcie01 failed with error 2
>>   aer_init: AER service init fails - No ACPI _OSC support
>>   aer: probe of 0000:00:03.0:pcie01 failed with error 1
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   aer_init: AER service init fails - Run ACPI _OSC fails
>>   aer: probe of 0000:00:04.0:pcie01 failed with error 2
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   aer_init: AER service init fails - Run ACPI _OSC fails
>>   aer: probe of 0000:00:05.0:pcie01 failed with error 2
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   Evaluate _OSC Set fails. Status = 0x0005
>>   aer_init: AER service init fails - Run ACPI _OSC fails
>>   aer: probe of 0000:00:06.0:pcie01 failed with error 2
>>   aer_init: AER service init fails - No ACPI _OSC support
>>   aer: probe of 0000:00:07.0:pcie01 failed with error 1
>>
>> Full dmesg, lspci, and ACPI DSDT are available here:
>>   http://jim.sh/~jim/tmp/nmi/
>>
>> -jim
>>     
>
>
> ---
> ~Randy
> Phaedrus says that Quality is about caring.
>   
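
For anyone comparing boot logs across many machines, the AER lines quoted 
above are regular enough to summarize with a short script. This is a rough 
sketch, not a tested tool: the function name failed_aer_probes is invented 
here, the regular expression is written against the exact message format in 
Jim's dmesg excerpt, and other kernel versions may word these messages 
differently.

```python
import re
from collections import Counter

# The "aer:" and "aer_init:" lines from the dmesg excerpt quoted above.
DMESG = """\
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:02.0:pcie01 failed with error 2
aer_init: AER service init fails - No ACPI _OSC support
aer: probe of 0000:00:03.0:pcie01 failed with error 1
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:04.0:pcie01 failed with error 2
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:05.0:pcie01 failed with error 2
aer_init: AER service init fails - Run ACPI _OSC fails
aer: probe of 0000:00:06.0:pcie01 failed with error 2
aer_init: AER service init fails - No ACPI _OSC support
aer: probe of 0000:00:07.0:pcie01 failed with error 1
"""

# Captures the PCI address and the numeric error of each failed probe.
PROBE_RE = re.compile(r"aer: probe of (\S+?):pcie\d+ failed with error (\d+)")

def failed_aer_probes(log_text):
    """Return a list of (pci_address, error_code) for failed AER probes."""
    return [(m.group(1), int(m.group(2))) for m in PROBE_RE.finditer(log_text)]

probes = failed_aer_probes(DMESG)
by_error = Counter(code for _, code in probes)

for address, code in probes:
    print(address, "error", code)
print("failures per error code:", dict(by_error))
```

In this excerpt every error 2 follows a "Run ACPI _OSC fails" message and 
every error 1 follows "No ACPI _OSC support"; the script only counts them 
and does not try to interpret the codes.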

