From: Greg KH <gregkh@linuxfoundation.org>
To: Alex_Gagniuc@dellteam.com
Cc: keith.busch@intel.com, helgaas@kernel.org, mr.nuke.me@gmail.com,
linux-pci@vger.kernel.org, Austin.Bolen@dell.com,
Shyam.Iyer@dell.com, linux-kernel@vger.kernel.org,
jonathan.derrick@intel.com, lukas@wunner.de, ruscur@russell.cc,
sbobroff@linux.ibm.com, oohall@gmail.com,
linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Thu, 8 Nov 2018 14:51:09 -0800 [thread overview]
Message-ID: <20181108225109.GA3023@kroah.com> (raw)
In-Reply-To: <20d68e586fff4dcca5616d5056f6fc21@ausx13mps321.AMER.DELL.COM>
On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/08/2018 04:43 PM, Greg Kroah-Hartman wrote:
> >
> > [EXTERNAL EMAIL]
> > Please report any suspicious attachments, links, or requests for sensitive information.
> >
> >
> > On Thu, Nov 08, 2018 at 03:32:58PM -0700, Keith Busch wrote:
> >> On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
> >>> On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> >>>> I'm having second thoughts about this. One thing I'm uncomfortable
> >>>> with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> >>>> instead of systematic, in the sense that I don't know how we convince
> >>>> ourselves that this (and only this) is the correct place to put it.
> >>>
> >>> I think my stance always has been that this call is not good at all
> >>> because once you call it you never really know if it is still true as
> >>> the device could have been removed right afterward.
> >>>
> >>> So almost any code that relies on it is broken, there is no locking and
> >>> it can and will race and you will loose.
> >>
> >> AIUI, we're not trying to create code to rely on this. This more about
> >> reducing reliance on hardware. If the software misses the race once and
> >> accesses disconnected device memory, that's usually not a big deal to
> >> let hardware sort it out, but the point is not to push our luck.
> >
> > Then why even care about this call at all? If you need to really know
> > if the read worked, you have to check the value. If the value is FF
> > then you have a huge hint that the hardware is now gone. And you can
> > rely on it being gone, you can never rely on making the call to the
> > function to check if the hardware is there to be still valid any point
> > in time after the call returns.
>
> In the case that we're trying to fix, this code executing is a result of
> the device being gone, so we can guarantee race-free operation. I agree
> that there is a race, in the general case. As far as checking the result
> for all F's, that's not an option when firmware crashes the system as a
> result of the mmio read/write. It's never pretty when firmware gets
> involved.
If you have firmware that crashes the system when you try to read from a
PCI device that was hot-removed, that is broken firmware and needs to be
fixed. The kernel can not work around that as again, you will never win
that race.
thanks,
greg k-h
WARNING: multiple messages have this Message-ID (diff)
From: Greg KH <gregkh@linuxfoundation.org>
To: Alex_Gagniuc@dellteam.com
Cc: Shyam.Iyer@dell.com, sbobroff@linux.ibm.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
keith.busch@intel.com, lukas@wunner.de, helgaas@kernel.org,
mr.nuke.me@gmail.com, Austin.Bolen@dell.com, oohall@gmail.com,
linuxppc-dev@lists.ozlabs.org, jonathan.derrick@intel.com
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Thu, 8 Nov 2018 14:51:09 -0800 [thread overview]
Message-ID: <20181108225109.GA3023@kroah.com> (raw)
In-Reply-To: <20d68e586fff4dcca5616d5056f6fc21@ausx13mps321.AMER.DELL.COM>
On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/08/2018 04:43 PM, Greg Kroah-Hartman wrote:
> >
> > [EXTERNAL EMAIL]
> > Please report any suspicious attachments, links, or requests for sensitive information.
> >
> >
> > On Thu, Nov 08, 2018 at 03:32:58PM -0700, Keith Busch wrote:
> >> On Thu, Nov 08, 2018 at 02:01:17PM -0800, Greg Kroah-Hartman wrote:
> >>> On Thu, Nov 08, 2018 at 02:09:17PM -0600, Bjorn Helgaas wrote:
> >>>> I'm having second thoughts about this. One thing I'm uncomfortable
> >>>> with is that sprinkling pci_dev_is_disconnected() around feels ad hoc
> >>>> instead of systematic, in the sense that I don't know how we convince
> >>>> ourselves that this (and only this) is the correct place to put it.
> >>>
> >>> I think my stance always has been that this call is not good at all
> >>> because once you call it you never really know if it is still true as
> >>> the device could have been removed right afterward.
> >>>
> >>> So almost any code that relies on it is broken, there is no locking and
> >>> it can and will race and you will loose.
> >>
> >> AIUI, we're not trying to create code to rely on this. This more about
> >> reducing reliance on hardware. If the software misses the race once and
> >> accesses disconnected device memory, that's usually not a big deal to
> >> let hardware sort it out, but the point is not to push our luck.
> >
> > Then why even care about this call at all? If you need to really know
> > if the read worked, you have to check the value. If the value is FF
> > then you have a huge hint that the hardware is now gone. And you can
> > rely on it being gone, you can never rely on making the call to the
> > function to check if the hardware is there to be still valid any point
> > in time after the call returns.
>
> In the case that we're trying to fix, this code executing is a result of
> the device being gone, so we can guarantee race-free operation. I agree
> that there is a race, in the general case. As far as checking the result
> for all F's, that's not an option when firmware crashes the system as a
> result of the mmio read/write. It's never pretty when firmware gets
> involved.
If you have firmware that crashes the system when you try to read from a
PCI device that was hot-removed, that is broken firmware and needs to be
fixed. The kernel can not work around that as again, you will never win
that race.
thanks,
greg k-h
next prev parent reply other threads:[~2018-11-08 22:51 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-09-18 22:15 [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected Alexandru Gagniuc
2018-11-06 0:32 ` Alex G.
2018-11-07 17:04 ` Derrick, Jonathan
2018-11-07 23:42 ` Bjorn Helgaas
2018-11-08 20:09 ` Bjorn Helgaas
2018-11-08 20:09 ` Bjorn Helgaas
2018-11-08 21:49 ` Keith Busch
2018-11-08 21:49 ` Keith Busch
2018-11-08 22:01 ` Greg Kroah-Hartman
2018-11-08 22:01 ` Greg Kroah-Hartman
2018-11-08 22:32 ` Keith Busch
2018-11-08 22:32 ` Keith Busch
2018-11-08 22:42 ` Greg Kroah-Hartman
2018-11-08 22:42 ` Greg Kroah-Hartman
2018-11-08 22:49 ` Alex_Gagniuc
2018-11-08 22:49 ` Alex_Gagniuc
2018-11-08 22:51 ` Greg KH [this message]
2018-11-08 22:51 ` Greg KH
2018-11-08 23:06 ` Alex_Gagniuc
2018-11-08 23:06 ` Alex_Gagniuc
2018-11-12 5:49 ` Oliver O'Halloran
2018-11-12 5:49 ` Oliver O'Halloran
2018-11-12 20:05 ` Alex_Gagniuc
2018-11-12 20:05 ` Alex_Gagniuc
2018-11-13 5:02 ` Bjorn Helgaas
2018-11-13 5:02 ` Bjorn Helgaas
2018-11-13 22:39 ` Alex_Gagniuc
2018-11-13 22:39 ` Alex_Gagniuc
2018-11-13 22:52 ` Keith Busch
2018-11-13 22:52 ` Keith Busch
2018-11-14 0:31 ` Alex_Gagniuc
2018-11-14 0:31 ` Alex_Gagniuc
2018-11-14 5:59 ` Bjorn Helgaas
2018-11-14 5:59 ` Bjorn Helgaas
2018-11-14 19:22 ` Alex_Gagniuc
2018-11-14 19:22 ` Alex_Gagniuc
2018-11-14 19:41 ` Derrick, Jonathan
2018-11-14 19:41 ` Derrick, Jonathan
2018-11-14 20:23 ` Keith Busch
2018-11-14 20:23 ` Keith Busch
2018-11-14 20:52 ` Alex_Gagniuc
2018-11-14 20:52 ` Alex_Gagniuc
2018-11-14 20:58 ` Keith Busch
2018-11-14 20:58 ` Keith Busch
2018-11-15 6:24 ` Bjorn Helgaas
2018-11-15 6:24 ` Bjorn Helgaas
2018-11-16 0:19 ` Alex_Gagniuc
2018-11-16 0:19 ` Alex_Gagniuc
2018-11-08 23:03 ` Keith Busch
2018-11-08 23:03 ` Keith Busch
2018-11-09 7:29 ` Lukas Wunner
2018-11-09 11:32 ` Greg Kroah-Hartman
2018-11-09 11:32 ` Greg Kroah-Hartman
2018-11-09 16:36 ` Keith Busch
2018-11-09 16:36 ` Keith Busch
2018-11-08 22:20 ` Alex_Gagniuc
2018-11-08 22:20 ` Alex_Gagniuc
2018-11-09 7:11 ` Lukas Wunner
2018-11-12 5:48 ` Oliver O'Halloran
2018-11-12 5:48 ` Oliver O'Halloran
2018-12-27 19:28 ` Alex_Gagniuc
2018-12-27 19:28 ` Alex_Gagniuc
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181108225109.GA3023@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=Alex_Gagniuc@dellteam.com \
--cc=Austin.Bolen@dell.com \
--cc=Shyam.Iyer@dell.com \
--cc=helgaas@kernel.org \
--cc=jonathan.derrick@intel.com \
--cc=keith.busch@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lukas@wunner.de \
--cc=mr.nuke.me@gmail.com \
--cc=oohall@gmail.com \
--cc=ruscur@russell.cc \
--cc=sbobroff@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.