public inbox for linux-kernel@vger.kernel.org
From: Simon Kirby <sim@hostway.ca>
To: Jesse Barnes <jbarnes@virtuousgeek.org>, Jon Mason <mason@myri.com>
Cc: Josh Boyer <jwboyer@gmail.com>,
	Sven Schnelle <svens@stackframe.org>,
	linux-kernel@vger.kernel.org, Jordan_Hargrave@dell.com
Subject: Re: [3.1-rc4] Bus Fatal Error caused by "PCI: Set PCI-E Max Payload Size on fabric"
Date: Wed, 7 Sep 2011 13:47:54 -0700
Message-ID: <20110907204754.GA21603@hostway.ca>
In-Reply-To: <20110907191859.GB14950@hostway.ca>

On Wed, Sep 07, 2011 at 12:18:59PM -0700, Simon Kirby wrote:

> On Wed, Sep 07, 2011 at 10:44:32AM -0700, Jesse Barnes wrote:
> 
> > On Wed, 7 Sep 2011 12:52:25 -0400
> > Josh Boyer <jwboyer@gmail.com> wrote:
> > 
> > > On Wed, Sep 7, 2011 at 12:22 PM, Sven Schnelle <svens@stackframe.org>
> > > wrote:
> > > > Simon Kirby <sim@hostway.ca> writes:
> > > >
> > > >> Hello!
> > > >>
> > > >> Since trying 3.1-rc4 on a few Dell servers, all of them have
> > > >> booted up with the amber error LED lit. "ipmitool sel list" shows:
> > > >>
> > > >> ?? ??1 | 09/06/2011 | 17:21:56 | Event Logging Disabled #0x72 | Log
> > > >> area reset/cleared | Asserted 2 | 09/06/2011 | 17:25:38 | Critical
> > > >> Interrupt #0x18 | Bus Fatal Error | Asserted 3 | 09/06/2011 |
> > > >> 17:25:38 | Unknown #0x1a | 4 | 09/06/2011 | 17:25:38 | Unknown
> > > >> #0x1a |
> > > >
> > > > > I'm seeing exactly the same issue on a Dell 1950 server. If anyone
> > > > > wants me to try additional debugging/patches, feel free to ask.
> > > > > Unfortunately I don't have the time/knowledge to debug that by
> > > > > myself.
> > > 
> > > I thought Jesse or Jon had a revert or partial fix queued up to send
> > > to Linus, but I don't see anything in or post -rc5 yet.  That was
> > > indicated in https://bugzilla.kernel.org/show_bug.cgi?id=42162
> > > 
> > > Jesse, Jon?
> > 
> > kernel.org is still down and I haven't pushed anything to github.  I
> > asked Jon to send his patch directly to Linus today instead.
> 
> FWIW, this patch didn't seem to fix it:
> https://bugzilla.kernel.org/attachment.cgi?id=71222
> 
> dmesg used to say:
> 
> pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
> pci 0000:00:02.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
> pci 0000:06:00.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
> pci 0000:07:00.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
> pci 0000:08:00.0: MPS configured higher than maximum supported by the device.  If a bus issue occurs, try running with pci=pcie_bus_safe.
> pci 0000:08:00.0: Dev MPS 256 MPSS 256 MRRS 128
> Uhhuh. NMI received for unknown reason 21 on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue

Ok, I commented out the "pcie_write_mps(dev, mps);" line and the error
stopped, but this made me realize that the pci=pcie_bus_safe option must
have been missing. It turns out I had hacked a custom grub entry to boot
the newest kernel instead of the one with the highest version number
(grumble), so the default kopt didn't apply there.

So, pci=pcie_bus_safe DOES fix this case, and I've confirmed that the
MRRS-disabling patch makes no difference in this case.
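
For anyone following along, here is a minimal user-space sketch (not the
kernel code) of what I understand the "safe" policy to be: take the
smallest MPSS found on any device and use that as the MPS for all of them.
The struct and function names are made up for illustration; the values fed
to it would be the "before" lines from the dmesg quoted above.

```c
#include <stdio.h>

/* Hypothetical record of one device's values, as printed in the dmesg lines. */
struct dev_mps {
    const char *name;
    int mps;   /* currently configured Max Payload Size, in bytes */
    int mpss;  /* Max Payload Size Supported, in bytes */
};

/*
 * Smallest MPSS across all devices, i.e. the largest payload that every
 * device can accept. Returns 4096 (the largest size PCIe defines) when
 * the list is empty.
 */
int safe_mps(const struct dev_mps *devs, int n)
{
    int min = 4096;

    for (int i = 0; i < n; i++)
        if (devs[i].mpss < min)
            min = devs[i].mpss;
    return min;
}

/* Print each device whose configured MPS exceeds what it advertises. */
void report_overruns(const struct dev_mps *devs, int n)
{
    for (int i = 0; i < n; i++)
        if (devs[i].mps > devs[i].mpss)
            printf("pci %s: MPS %d > MPSS %d\n",
                   devs[i].name, devs[i].mps, devs[i].mpss);
}
```

With the four devices above (MPSS 256, 256, 256 and 128), safe_mps()
returns 128, which is what 08:00.0 was set to before the write.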

Can we just make pci=pcie_bus_safe (as in previous behavior) the default,
or have it leave MPS unchanged where it would otherwise warn, or does that
basically make the thing useless?
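
To make the second option concrete, a rough sketch of the kind of guard I
mean, assuming the usual encoding (MPS lives in bits 7:5 of the PCIe Device
Control register, as 128 << n bytes). This is stand-alone illustration
code, not a patch; all function names here are hypothetical.

```c
#include <stdint.h>

#define DEVCTL_PAYLOAD_MASK 0x00e0  /* Max Payload Size field, bits 7:5 */

/* Encode a payload size in bytes (power of two, 128..4096) as the field value. */
int mps_to_field(int bytes)
{
    int field = 0;

    while ((128 << field) < bytes)
        field++;
    return field;
}

/* Hypothetical policy: never program more than the device advertises. */
int clamp_mps(int requested, int mpss)
{
    return requested > mpss ? mpss : requested;
}

/* Return the Device Control value with the MPS field replaced by the clamped size. */
uint16_t devctl_with_mps(uint16_t devctl, int requested, int mpss)
{
    int mps = clamp_mps(requested, mpss);

    return (uint16_t)((devctl & ~DEVCTL_PAYLOAD_MASK) |
                      (mps_to_field(mps) << 5));
}
```

For 08:00.0 (MPSS 128), a request for 256 would be clamped to 128 and the
field would stay at 0 instead of triggering the warning.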

Simon-

Thread overview: 11+ messages
2011-09-06 17:36 [3.1-rc4] Bus Fatal Error caused by "PCI: Set PCI-E Max Payload Size on fabric" Simon Kirby
2011-09-07 16:22 ` Sven Schnelle
2011-09-07 16:52   ` Josh Boyer
2011-09-07 17:44     ` Jesse Barnes
2011-09-07 19:18       ` Simon Kirby
2011-09-07 20:47         ` Simon Kirby [this message]
2011-09-07 20:57           ` Jon Mason
2011-09-07 20:58             ` Simon Kirby
2011-09-07 21:10               ` Jon Mason
2011-09-07 21:33                 ` Simon Kirby
2011-09-08  6:42                 ` Sven Schnelle
