From: Simon Kirby <sim@hostway.ca>
To: Jesse Barnes <jbarnes@virtuousgeek.org>, Jon Mason <mason@myri.com>
Cc: Josh Boyer <jwboyer@gmail.com>,
	Sven Schnelle <svens@stackframe.org>,
	linux-kernel@vger.kernel.org, Jordan_Hargrave@dell.com
Subject: Re: [3.1-rc4] Bus Fatal Error caused by "PCI: Set PCI-E Max Payload Size on fabric"
Date: Wed, 7 Sep 2011 13:47:54 -0700
Message-ID: <20110907204754.GA21603@hostway.ca>
In-Reply-To: <20110907191859.GB14950@hostway.ca>

On Wed, Sep 07, 2011 at 12:18:59PM -0700, Simon Kirby wrote:

> On Wed, Sep 07, 2011 at 10:44:32AM -0700, Jesse Barnes wrote:
> 
> > On Wed, 7 Sep 2011 12:52:25 -0400
> > Josh Boyer <jwboyer@gmail.com> wrote:
> > 
> > > On Wed, Sep 7, 2011 at 12:22 PM, Sven Schnelle <svens@stackframe.org>
> > > wrote:
> > > > Simon Kirby <sim@hostway.ca> writes:
> > > >
> > > >> Hello!
> > > >>
> > > >> Since trying 3.1-rc4 on a few Dell servers, all of them have
> > > >> booted up with the amber error LED lit. "ipmitool sel list" shows:
> > > >>
> > > >> ?? ??1 | 09/06/2011 | 17:21:56 | Event Logging Disabled #0x72 | Log
> > > >> area reset/cleared | Asserted 2 | 09/06/2011 | 17:25:38 | Critical
> > > >> Interrupt #0x18 | Bus Fatal Error | Asserted 3 | 09/06/2011 |
> > > >> 17:25:38 | Unknown #0x1a | 4 | 09/06/2011 | 17:25:38 | Unknown
> > > >> #0x1a |
> > > >
> > > > I'm seeing exact the same issue on a Dell 1950 Server. If anyone
> > > > wants me to try additional debugging/patches, feel free to do
> > > > so. Unfortunately i don't have the time/knowledge to debug that by
> > > > myself.
> > > 
> > > I thought Jesse or Jon had a revert or partial fix queued up to send
> > > to Linus, but I don't see anything in or post -rc5 yet.  That was
> > > indicated in https://bugzilla.kernel.org/show_bug.cgi?id=42162
> > > 
> > > Jesse, Jon?
> > 
> > kernel.org is still down and I haven't pushed anything to github.  I
> > asked Jon to send his patch directly to Linus today instead.
> 
> FWIW, this patch didn't seem to fix it:
> https://bugzilla.kernel.org/attachment.cgi?id=71222
> 
> dmesg used to say:
> 
> pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
> pci 0000:00:02.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
> pci 0000:06:00.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
> pci 0000:07:00.0: Dev MPS 256 MPSS 256 MRRS 128
> pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
> pci 0000:08:00.0: MPS configured higher than maximum supported by the device.  If a bus issue occurs, try running with pci=pcie_bus_safe.
> pci 0000:08:00.0: Dev MPS 256 MPSS 256 MRRS 128
> Uhhuh. NMI received for unknown reason 21 on CPU 0.
> Do you have a strange power saving mode enabled?
> Dazed and confused, but trying to continue

Ok, I commented out the "pcie_write_mps(dev, mps);" line and the error
stopped, but this made me realize that the pci=pcie_bus_safe option must
have been missing. It turns out I had hacked a custom grub entry to load
the newest kernel into grub instead of the one with the highest version
number (grumble), so the default kopt didn't apply there.

So, pci=pcie_bus_safe DOES fix this case, and I've confirmed that the
MRRS-disabling patch makes no difference in this case.

Can we just make pci=pcie_bus_safe (as in previous behavior) the default,
or have it leave MPS unchanged on devices where it would otherwise warn,
or does that basically make the thing useless?
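
To illustrate the "safe" idea, here is a minimal sketch (not the kernel's
code) that picks the largest MPS every device can accept, i.e. the smallest
MPSS seen. It uses the first "Dev MPS" line for each device from the dmesg
quoted above. The helper names (parse, safe_mps, overconfigured) are
hypothetical, and treating all four devices as one flat fabric is an
assumption made only for this example.

```python
import re

# First "Dev MPS" line per device from the dmesg quoted above,
# i.e. the values reported before MPS was rewritten.
DMESG = """\
pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
"""

LINE_RE = re.compile(r"pci (\S+): Dev MPS (\d+) MPSS (\d+) MRRS (\d+)")


def parse(text):
    """Return {device: (mps, mpss, mrrs)} parsed from 'Dev MPS' lines."""
    devices = {}
    for m in LINE_RE.finditer(text):
        mps, mpss, mrrs = (int(g) for g in m.groups()[1:])
        devices[m.group(1)] = (mps, mpss, mrrs)
    return devices


def safe_mps(devices):
    """Largest MPS supported by every device: the minimum MPSS.

    Assumption: all devices are treated as one flat fabric.
    """
    return min(mpss for _, mpss, _ in devices.values())


def overconfigured(devices, mps):
    """Devices whose supported maximum (MPSS) is below the proposed MPS."""
    return sorted(dev for dev, (_, mpss, _) in devices.items() if mps > mpss)


if __name__ == "__main__":
    devs = parse(DMESG)
    print("safe MPS:", safe_mps(devs))
    print("over-configured at 256:", overconfigured(devs, 256))
```

With these numbers, 0000:08:00.0 is the only device that reports MPSS 128,
which matches the device the kernel warned about when MPS was raised to 256.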

Simon-
