public inbox for linux-kernel@vger.kernel.org
From: Keith Owens <kaos@sgi.com>
To: Andi Kleen <ak@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>,
	linux-kernel@vger.kernel.org,
	"Brendan Trotter" <btrotter@gmail.com>
Subject: Re: NMI problems with Dell SMP Xeons
Date: Wed, 07 Jun 2006 17:43:47 +1000
Message-ID: <8446.1149666227@kao2.melbourne.sgi.com>
In-Reply-To: Your message of "Wed, 07 Jun 2006 09:20:23 +0200." <200606070920.23436.ak@suse.de>

Andi Kleen (on Wed, 7 Jun 2006 09:20:23 +0200) wrote:
>On Wednesday 07 June 2006 06:49, Keith Owens wrote:
>> Following a suggestion by Brendan Trotter, I ran some more tests to
>> track down the problem with sending NMI IPI on Dell Xeons.
>>
>> BIOS Logical    OS ACPI     Cpus    IPI 2             NMI IPI
>>  Processor                BIOS  OS                 (APIC_DM_NMI)
>>
>> Enabled         Enabled    4    4  Not delivered   Delivered as NMI
>> Enabled         Disabled   4    2  Machine reset   Machine reset
>> Disabled        Enabled    2    2  Not delivered   Delivered as NMI
>> Disabled        Disabled   2    2  Not delivered   Delivered as NMI
>>
>> So the killer combination with this motherboard is when the BIOS knows
>> about logical processors but the OS does not.  Sending IPI 2 or NMI IPI
>> with that combination kills the machine.  Brendan suggested that the
>> BIOS is seeing the broadcast NMI on the logical processors which are
>> not under OS control and that the BIOS cannot cope.
>
>How did you manage that? Normally the OS should use all CPUs
>known to BIOS. Or did you boot with special boot options to limit it?

Two ways:

(1) Boot a kernel built with CONFIG_ACPI=n, so the OS only finds the 2
    cpus in the MP table instead of the 4 listed by ACPI.

(2) Boot a kernel built with CONFIG_ACPI=y, but with maxcpus=2 on the
    command line.

In both cases, send_IPI_allbutself() with IPI 2 or an NMI will result
in a hard reset.

>> Should we change the x86_64 send_IPI_allbutself() so it is only
>> delivered to cpus that the OS knows about, instead of doing a general
>> broadcast. 
>
>Hmm, we should be doing that already to avoid races for CPU hotplug.  But 
>maybe it's not working correctly for KDB.

This problem is not KDB specific, although that is where it was first
noticed.  Any code that sends a broadcast IPI 2 or an NMI IPI will
crash these Dell boxes when there is a mismatch between the cpus known
to the BIOS and the cpus known to the OS.

>Does it go away when you
>enable CPU hotplug?

HOTPLUG_CPU was already on in all of my test kernels.

>Anyways, should be a SMOP to force it. I wouldn't
>have a problem to use sequence ipis  always and get rid of the broadcasts.
>There were benchmarks at some point and there wasn't a noticeable
>difference. 

I will try forcing send_IPI_allbutself() to use the mask version rather
than the broadcast shortcut.  Later tonight ...
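
For anyone following along, here is a rough user-space sketch of the
difference between the two delivery styles discussed above.  It is not
the kernel code: every name in it (sketch_send_ipi, cpu_mask_t,
SKETCH_MAX_CPUS, the "delivered" bitmap) is made up for illustration,
and the APIC write is replaced by setting a bit.  The shortcut variant
models the hardware reaching every cpu the BIOS enabled; the mask
variant only targets cpus the OS has online.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 8          /* hypothetical limit for this sketch */

typedef uint32_t cpu_mask_t;       /* bit n set => cpu n */

/* Bitmap of cpus that "received" an IPI; stands in for real delivery. */
cpu_mask_t delivered;

/* Hypothetical stand-in for programming the APIC to hit one cpu. */
void sketch_send_ipi(unsigned int cpu, int vector)
{
	assert(cpu < SKETCH_MAX_CPUS);
	(void)vector;              /* the vector is irrelevant to the model */
	delivered |= (cpu_mask_t)1 << cpu;
}

/*
 * Broadcast shortcut: modelled as reaching every cpu the BIOS enabled
 * except the sender, whether or not the OS ever brought it up.
 */
void sketch_allbutself_shortcut(cpu_mask_t bios_cpus, unsigned int self,
				int vector)
{
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_MAX_CPUS; cpu++)
		if (cpu != self && (bios_cpus & ((cpu_mask_t)1 << cpu)))
			sketch_send_ipi(cpu, vector);
}

/*
 * Mask version: build the target set from the cpus the OS has online,
 * drop the sender, and send to each remaining cpu in turn.
 */
void sketch_allbutself_mask(cpu_mask_t os_online, unsigned int self,
			    int vector)
{
	cpu_mask_t targets = os_online & ~((cpu_mask_t)1 << self);
	unsigned int cpu;

	for (cpu = 0; cpu < SKETCH_MAX_CPUS; cpu++)
		if (targets & ((cpu_mask_t)1 << cpu))
			sketch_send_ipi(cpu, vector);
}
```

With 4 cpus enabled by the BIOS (mask 0xF) and 2 online in the OS (mask
0x3), the shortcut model touches cpus 1-3 while the mask model touches
only cpu 1, which is the behaviour I want to test on the Dell boxes.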

