From: Keith Busch <keith.busch@intel.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lukas Wunner <lukas@wunner.de>,
Bjorn Helgaas <helgaas@kernel.org>,
Alexandru Gagniuc <mr.nuke.me@gmail.com>,
linux-pci@vger.kernel.org, alex_gagniuc@dellteam.com,
austin_bolen@dell.com, shyam_iyer@dell.com,
linux-kernel@vger.kernel.org,
Jonathan Derrick <jonathan.derrick@intel.com>,
Russell Currey <ruscur@russell.cc>,
Sam Bobroff <sbobroff@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Fri, 9 Nov 2018 09:36:29 -0700
Message-ID: <20181109163629.GF2932@localhost.localdomain>
In-Reply-To: <20181109113257.GB29785@kroah.com>
On Fri, Nov 09, 2018 at 03:32:57AM -0800, Greg Kroah-Hartman wrote:
> On Fri, Nov 09, 2018 at 08:29:53AM +0100, Lukas Wunner wrote:
> > On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
> > > On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> > > > I'm having second thoughts about this. One thing I'm uncomfortable
> > > > with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> > >
> > > I think my stance always has been that this call is not good at all
> > > because once you call it you never really know if it is still true as
> > > the device could have been removed right afterward.
> > >
> > > So almost any code that relies on it is broken, there is no locking and
> > > it can and will race and you will lose.
> >
> > Hm, to be honest if that's your impression I think you must have missed a
> > large portion of the discussion we've been having over the past 2 years.
> >
> > Please consider reading this LWN article, particularly the "Surprise
> > removal" section, to get up to speed:
> >
> > https://lwn.net/Articles/767885/
> >
> > You seem to be assuming that all we care about is the *return value* of
> > an mmio read. However a transaction to a surprise removed device has
> > side effects beyond returning all ones, such as a Completion Timeout
> > which, with thousands of transactions in flight, added up to many seconds
> > to handle removal of an NVMe array and occasionally caused MCEs.
>
> Again, I still claim this is broken hardware/firmware :)
Indeed it is, but I don't want to abandon people with hardware in hand
if we can make it work despite being broken. Perfection is the enemy of
good. :)
> > It is not an option to just blindly carry out device accesses even though
> > it is known the device is gone, Completion Timeouts be damned.
>
> I don't disagree with you at all, and your other email is great with
> summarizing the issues here.
>
> What I do object to is somehow relying on that function call as knowing
> that the device really is present or not. It's a good hint, yes, but
> driver authors still have to be able to handle the bad data coming back
> from when the call races with the device being removed.
The function has always been a private interface. It is not available
for drivers to rely on.
The only thing we're trying to accomplish is to avoid starting a
transaction that software already knows will not succeed. There are
certainly times when a transaction will fail in ways software does not
foresee, but we're not suggesting this check is intended to handle those
cases either.
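To make the distinction concrete, here is a minimal userspace sketch of
the idea, not the actual kernel code. Every name in it (struct mock_dev,
mock_mmio_read, mock_guarded_read) is hypothetical and exists only for
illustration; the timeouts counter is an assumed stand-in for the cost of
a Completion Timeout. It shows that the flag only saves a transaction
when removal is already known, and that the caller still has to cope with
all-ones data when the flag is stale.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; NOT the kernel's struct pci_dev. */
struct mock_dev {
	bool disconnected;     /* set by hotplug code once removal is known */
	bool present;          /* physical truth, invisible to software */
	uint32_t reg;          /* one fake MMIO register */
	unsigned int timeouts; /* transactions issued to an absent device */
};

/* Models a raw MMIO read: an absent device costs a timeout and
 * returns all ones. */
static uint32_t mock_mmio_read(struct mock_dev *dev)
{
	if (!dev->present) {
		dev->timeouts++;
		return UINT32_MAX;
	}
	return dev->reg;
}

/* Guarded read: skip the transaction only when removal is already known. */
static uint32_t mock_guarded_read(struct mock_dev *dev)
{
	if (dev->disconnected)
		return UINT32_MAX; /* no transaction started, no timeout */

	/* The flag may be stale here, so all ones can still come back. */
	return mock_mmio_read(dev);
}
```

In this model, a read issued after the device vanishes but before the
flag is set still pays for one timeout. Reads issued after the flag is
set pay nothing. The guard is an optimization, not a presence guarantee.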