From: Bjorn Helgaas <helgaas@kernel.org>
To: Alex_Gagniuc@Dellteam.com
Cc: oohall@gmail.com, gregkh@linuxfoundation.org,
	keith.busch@intel.com, mr.nuke.me@gmail.com,
	linux-pci@vger.kernel.org, Austin.Bolen@dell.com,
	Shyam.Iyer@dell.com, linux-kernel@vger.kernel.org,
	jonathan.derrick@intel.com, lukas@wunner.de, ruscur@russell.cc,
	sbobroff@linux.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Mon, 12 Nov 2018 23:02:40 -0600
Message-ID: <20181113050240.GA182139@google.com>
In-Reply-To: <df85813c9860463d85f6c302dfe07b12@ausx13mps321.AMER.DELL.COM>

[+cc Jon, for related VMD firmware-first error enable issue]

On Mon, Nov 12, 2018 at 08:05:41PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/11/2018 11:50 PM, Oliver O'Halloran wrote:
> > On Thu, 2018-11-08 at 23:06 +0000, Alex_Gagniuc@Dellteam.com wrote:

> >> But it's not the firmware that crashes. It's Linux as a result of a
> >> fatal error message from the firmware. And we can't fix that because FFS
> >> handling requires that the system reboots [1].
> > 
> > Do we know the exact circumstances that result in firmware requesting a
> > reboot? If it happens on any PCIe error I don't see what we can do to
> > prevent that beyond masking UEs entirely (are we even allowed to do
> > that on FFS systems?).
> 
> Pull a drive out at an angle, push two drives in at the same time, pull
> out a drive really slowly. Whether an error is even reported to the OS depends
> on PD state, and proprietary mechanisms and logic in the HW and FW. The OS
> is not supposed to mask errors (touch AER bits) on FFS.

PD?

Do you think Linux observes the rule about not touching AER bits on
FFS?  I'm not sure it does.  I'm not even sure what section of the
spec is relevant.

The whole issue of firmware-first, the mechanism by which firmware
gets control, the System Error enables in Root Port Root Control
registers, etc., is very murky to me.  Jon has a sort of similar issue
with VMD where he needs to leave System Errors enabled instead of
disabling them as we currently do.

Bjorn

[1] https://lore.kernel.org/linux-pci/20181029210651.GB13681@bhelgaas-glaptop.roam.corp.google.com


