From: Sven Schnelle <svens@stackframe.org>
To: Jon Mason <mason@myri.com>
Cc: Simon Kirby <sim@hostway.ca>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	Josh Boyer <jwboyer@gmail.com>,
	linux-kernel@vger.kernel.org, Jordan_Hargrave@dell.com
Subject: Re: [3.1-rc4] Bus Fatal Error caused by "PCI: Set PCI-E Max Payload Size on fabric"
Date: Thu, 08 Sep 2011 08:42:06 +0200
Message-ID: <874o0na62p.fsf@begreifnix.stackframe.org>
In-Reply-To: <CAMaF-rOZ28UdUUMPjL_4v-_Tqk26otyhzVAmjvRXyE0tqc8odg@mail.gmail.com> (Jon Mason's message of "Wed, 7 Sep 2011 14:10:59 -0700")

Jon Mason <mason@myri.com> writes:

> On Wed, Sep 7, 2011 at 1:58 PM, Simon Kirby <sim@hostway.ca> wrote:
>> On Wed, Sep 07, 2011 at 01:57:28PM -0700, Jon Mason wrote:
>>
>>> On Wed, Sep 7, 2011 at 1:47 PM, Simon Kirby <sim@hostway.ca> wrote:
>>> > On Wed, Sep 07, 2011 at 12:18:59PM -0700, Simon Kirby wrote:
>>> >
>>> >> On Wed, Sep 07, 2011 at 10:44:32AM -0700, Jesse Barnes wrote:
>>> >>
>>> >> > On Wed, 7 Sep 2011 12:52:25 -0400
>>> >> > Josh Boyer <jwboyer@gmail.com> wrote:
>>> >> >
>>> >> > > On Wed, Sep 7, 2011 at 12:22 PM, Sven Schnelle <svens@stackframe.org>
>>> >> > > wrote:
>>> >> > > > Simon Kirby <sim@hostway.ca> writes:
>>> >> > > >
>>> >> > > >> Hello!
>>> >> > > >>
>>> >> > > >> Since trying 3.1-rc4 on a few Dell servers, all of them have
>>> >> > > >> booted up with the amber error LED lit. "ipmitool sel list" shows:
>>> >> > > >>
>>> >> > > >>    1 | 09/06/2011 | 17:21:56 | Event Logging Disabled #0x72 | Log area reset/cleared | Asserted
>>> >> > > >>    2 | 09/06/2011 | 17:25:38 | Critical Interrupt #0x18 | Bus Fatal Error | Asserted
>>> >> > > >>    3 | 09/06/2011 | 17:25:38 | Unknown #0x1a |
>>> >> > > >>    4 | 09/06/2011 | 17:25:38 | Unknown #0x1a |
>>> >> > > >
>>> >> > > > I'm seeing exactly the same issue on a Dell 1950 server. If anyone
>>> >> > > > wants me to try additional debugging/patches, feel free to ask.
>>> >> > > > Unfortunately I don't have the time/knowledge to debug that by
>>> >> > > > myself.
>>> >> > >
>>> >> > > I thought Jesse or Jon had a revert or partial fix queued up to send
>>> >> > > to Linus, but I don't see anything in or post -rc5 yet.  That was
>>> >> > > indicated in https://bugzilla.kernel.org/show_bug.cgi?id=42162
>>> >> > >
>>> >> > > Jesse, Jon?
>>> >> >
>>> >> > kernel.org is still down and I haven't pushed anything to github.  I
>>> >> > asked Jon to send his patch directly to Linus today instead.
>>> >>
>>> >> FWIW, this patch didn't seem to fix it:
>>> >> https://bugzilla.kernel.org/attachment.cgi?id=71222
>>> >>
>>> >> dmesg used to say:
>>> >>
>>> >> pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
>>> >> pci 0000:00:02.0: Dev MPS 256 MPSS 256 MRRS 128
>>> >> pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
>>> >> pci 0000:06:00.0: Dev MPS 256 MPSS 256 MRRS 128
>>> >> pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
>>> >> pci 0000:07:00.0: Dev MPS 256 MPSS 256 MRRS 128
>>> >> pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
>>> >> pci 0000:08:00.0: MPS configured higher than maximum supported by the device.  If a bus issue occurs, try running with pci=pcie_bus_safe.
>>> >> pci 0000:08:00.0: Dev MPS 256 MPSS 256 MRRS 128
>>> >> Uhhuh. NMI received for unknown reason 21 on CPU 0.
>>> >> Do you have a strange power saving mode enabled?
>>> >> Dazed and confused, but trying to continue
>>> >
>>> > Ok, I commented out the "pcie_write_mps(dev, mps);" line and the error
>>> > stopped, but this made me realize that the pci=pcie_bus_safe option must
>>> > have been missing. It turns out I had hacked a custom grub entry to load
>>> > the newest kernel into grub instead of the one with the highest version
>>> > number (grumble), so the default kopt didn't apply there.
>>> >
>>> > So, pci=pcie_bus_safe DOES fix this case, and I've confirmed that the
>>> > MRRS-disabling patch makes no difference in this case.
>>> >
>>> > Can we just make pci=pcie_bus_safe (as in previous behavior) the default,
>>> > or make it not change where it would otherwise warn, or does that
>>> > basically make the thing useless?
>>>
>>> I have a patch that makes pcie_bus_safe the default behavior
>>> and does not modify the MRRS.  Would you be willing to test this patch
>>> for me?
>>
>> Sure, of course. (It compiles, ship it. :))
>
> Great, thanks!  I've attached a patch file to this e-mail.

Thanks, Jon. Works on my system (Dell 1950).

Tested-by: Sven Schnelle <svens@stackframe.org>

Regards

Sven
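
For readers following the dmesg output quoted above, here is a minimal Python sketch (not kernel code) of the idea behind the "safe" setting: choose the largest Max Payload Size that every device supports, which is the minimum MPSS seen. It assumes that the first "Dev MPS ..." line printed for each device reflects its state before the kernel changed anything, and it looks at all devices in the log rather than one hierarchy at a time. The names `first_reports`, `safe_mps`, and `over_limit` are hypothetical helpers made up for this illustration.

```python
import re

# dmesg lines copied from the thread (warning and NMI lines omitted).
SAMPLE = """\
pci 0000:00:02.0: Dev MPS 128 MPSS 256 MRRS 128
pci 0000:00:02.0: Dev MPS 256 MPSS 256 MRRS 128
pci 0000:06:00.0: Dev MPS 128 MPSS 256 MRRS 4096
pci 0000:06:00.0: Dev MPS 256 MPSS 256 MRRS 128
pci 0000:07:00.0: Dev MPS 128 MPSS 256 MRRS 4096
pci 0000:07:00.0: Dev MPS 256 MPSS 256 MRRS 128
pci 0000:08:00.0: Dev MPS 128 MPSS 128 MRRS 128
pci 0000:08:00.0: Dev MPS 256 MPSS 256 MRRS 128
"""

# Matches the "Dev MPS ... MPSS ... MRRS ..." report format shown above.
LINE_RE = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Dev MPS (?P<mps>\d+) MPSS (?P<mpss>\d+) MRRS (?P<mrrs>\d+)"
)


def first_reports(dmesg_text):
    """Return {device: (mps, mpss, mrrs)} using the first report per device.

    Assumption: the first report is the pre-change state of the device.
    """
    seen = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.search(line)
        if m and m.group("dev") not in seen:
            seen[m.group("dev")] = (
                int(m.group("mps")),
                int(m.group("mpss")),
                int(m.group("mrrs")),
            )
    return seen


def safe_mps(reports):
    """Largest payload size all listed devices support: the minimum MPSS."""
    return min(mpss for _, mpss, _ in reports.values())


def over_limit(reports, target_mps):
    """Devices whose initially reported MPSS is below target_mps."""
    return sorted(dev for dev, (_, mpss, _) in reports.items() if mpss < target_mps)


if __name__ == "__main__":
    reports = first_reports(SAMPLE)
    print("safe MPS:", safe_mps(reports))
    print("cannot take 256:", over_limit(reports, 256))
```

On the sample, only 0000:08:00.0 initially reports an MPSS below 256, which is the device the kernel warning in the quoted log names.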
