* [PATCH v2 2/2] PCI: Do not treat EPROBE_DEFER as device attach failure
2016-04-20 14:44 [PATCH v2 1/2] PCI: Fix BUG on device attach failure Lukas Wunner
@ 2016-04-20 14:44 ` Lukas Wunner
2016-04-26 1:00 ` Rafael J. Wysocki
2016-04-26 1:00 ` [PATCH v2 1/2] PCI: Fix BUG on " Rafael J. Wysocki
2016-05-02 19:01 ` Bjorn Helgaas
2 siblings, 1 reply; 6+ messages in thread
From: Lukas Wunner @ 2016-04-20 14:44 UTC (permalink / raw)
To: linux-pci; +Cc: linux-pm, Bjorn Helgaas
Linux 4.5 introduced a behavioral change in device probing during the
suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
probing during suspend/hibernation"): It defers device probing during
the entire suspend process, starting from the prepare phase and ending
with the complete phase. A rule existed before that "we rely on sub-
systems not to do any probing once a device is suspended" but it is
enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
plugged in while the system is asleep: Upon waking up, pciehp_resume()
discovers new devices in the resume phase and immediately tries to bind
them to a driver. Since probing is now deferred, device_attach() returns
-EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
Check device_attach() return value always"): If device_attach() returns
a negative value, pci_bus_add_device() now removes the sysfs and procfs
entries for the device and pci_bus_add_devices() subsequently locks up
with a BUG. Even with the BUG fixed we're still in trouble because the
device remains on the deferred probing list even though its sysfs and
procfs entries are gone and its children won't be added.
Fix by not interpreting -EPROBE_DEFER as failure. The device will be
probed eventually (through device_unblock_probing() in dpm_complete())
and there is proper locking in place to avoid races (e.g. if devices are
unplugged again und thus deleted from the system before deferred probing
happens, I have tested this). Also, those functions which dereference
dev->driver (e.g. pci_pm_*()) do contain proper NULL pointer checks.
So it seems safe to ignore -EPROBE_DEFER.
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
v2: Split commit in two, explain when exactly deferred probing will
happen (Bjorn Helgaas).
drivers/pci/bus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 23a39fd..dd7cdbe 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
dev->match_driver = true;
retval = device_attach(&dev->dev);
- if (retval < 0) {
+ if (retval < 0 && retval != -EPROBE_DEFER) {
dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);
--
2.8.0.rc3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] PCI: Do not treat EPROBE_DEFER as device attach failure
2016-04-20 14:44 ` [PATCH v2 2/2] PCI: Do not treat EPROBE_DEFER as " Lukas Wunner
@ 2016-04-26 1:00 ` Rafael J. Wysocki
0 siblings, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2016-04-26 1:00 UTC (permalink / raw)
To: Lukas Wunner; +Cc: linux-pci, linux-pm, Bjorn Helgaas
On Wednesday, April 20, 2016 04:44:27 PM Lukas Wunner wrote:
> Linux 4.5 introduced a behavioral change in device probing during the
> suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> probing during suspend/hibernation"): It defers device probing during
> the entire suspend process, starting from the prepare phase and ending
> with the complete phase. A rule existed before that "we rely on sub-
> systems not to do any probing once a device is suspended" but it is
> enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
>
> This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> plugged in while the system is asleep: Upon waking up, pciehp_resume()
> discovers new devices in the resume phase and immediately tries to bind
> them to a driver. Since probing is now deferred, device_attach() returns
> -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
>
> Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> Check device_attach() return value always"): If device_attach() returns
> a negative value, pci_bus_add_device() now removes the sysfs and procfs
> entries for the device and pci_bus_add_devices() subsequently locks up
> with a BUG. Even with the BUG fixed we're still in trouble because the
> device remains on the deferred probing list even though its sysfs and
> procfs entries are gone and its children won't be added.
>
> Fix by not interpreting -EPROBE_DEFER as failure. The device will be
> probed eventually (through device_unblock_probing() in dpm_complete())
> and there is proper locking in place to avoid races (e.g. if devices are
> unplugged again und thus deleted from the system before deferred probing
> happens, I have tested this). Also, those functions which dereference
> dev->driver (e.g. pci_pm_*()) do contain proper NULL pointer checks.
> So it seems safe to ignore -EPROBE_DEFER.
>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> v2: Split commit in two, explain when exactly deferred probing will
> happen (Bjorn Helgaas).
>
> drivers/pci/bus.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 23a39fd..dd7cdbe 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
>
> dev->match_driver = true;
> retval = device_attach(&dev->dev);
> - if (retval < 0) {
> + if (retval < 0 && retval != -EPROBE_DEFER) {
> dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
> pci_proc_detach_device(dev);
> pci_remove_sysfs_dev_files(dev);
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] PCI: Fix BUG on device attach failure
2016-04-20 14:44 [PATCH v2 1/2] PCI: Fix BUG on device attach failure Lukas Wunner
2016-04-20 14:44 ` [PATCH v2 2/2] PCI: Do not treat EPROBE_DEFER as " Lukas Wunner
@ 2016-04-26 1:00 ` Rafael J. Wysocki
2016-05-02 19:01 ` Bjorn Helgaas
2 siblings, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2016-04-26 1:00 UTC (permalink / raw)
To: Lukas Wunner; +Cc: linux-pci, linux-pm, Bjorn Helgaas
On Wednesday, April 20, 2016 04:44:27 PM Lukas Wunner wrote:
> Previously when pci_bus_add_device() called device_attach() and it
> returned a negative value, we emitted a WARN but carried on.
>
> Commit ab1a187bba5c ("PCI: Check device_attach() return value always"),
> introduced in Linux 4.6-rc1, changed this to unwind all steps preceding
> device_attach() and to not set dev->is_added = 1.
>
> The latter leads to a BUG if pci_bus_add_device() was called from
> pci_bus_add_devices(). Fix by not recursing to a child bus if
> device_attach() failed for the bridge leading to it.
>
> This can be triggered by plugging in a PCI device (e.g. Thunderbolt)
> while the system is asleep. The system locks up when woken because
> device_attach() returns -EPROBE_DEFER.
>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Both this and the [2/2] look reasonable to me.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> v2: Split commit in two (Bjorn Helgaas).
>
> drivers/pci/bus.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6c9f546..23a39fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> }
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> - BUG_ON(!dev->is_added);
> + /* Skip if device attach failed */
> + if (!dev->is_added)
> + continue;
> child = dev->subordinate;
> if (child)
> pci_bus_add_devices(child);
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] PCI: Fix BUG on device attach failure
2016-04-20 14:44 [PATCH v2 1/2] PCI: Fix BUG on device attach failure Lukas Wunner
2016-04-20 14:44 ` [PATCH v2 2/2] PCI: Do not treat EPROBE_DEFER as " Lukas Wunner
2016-04-26 1:00 ` [PATCH v2 1/2] PCI: Fix BUG on " Rafael J. Wysocki
@ 2016-05-02 19:01 ` Bjorn Helgaas
2016-05-02 20:23 ` Lukas Wunner
2 siblings, 1 reply; 6+ messages in thread
From: Bjorn Helgaas @ 2016-05-02 19:01 UTC (permalink / raw)
To: Lukas Wunner; +Cc: linux-pci, linux-pm
On Wed, Apr 20, 2016 at 04:44:27PM +0200, Lukas Wunner wrote:
> Previously when pci_bus_add_device() called device_attach() and it
> returned a negative value, we emitted a WARN but carried on.
>
> Commit ab1a187bba5c ("PCI: Check device_attach() return value always"),
> introduced in Linux 4.6-rc1, changed this to unwind all steps preceding
> device_attach() and to not set dev->is_added = 1.
>
> The latter leads to a BUG if pci_bus_add_device() was called from
> pci_bus_add_devices(). Fix by not recursing to a child bus if
> device_attach() failed for the bridge leading to it.
>
> This can be triggered by plugging in a PCI device (e.g. Thunderbolt)
> while the system is asleep. The system locks up when woken because
> device_attach() returns -EPROBE_DEFER.
>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
Applied both with Rafael's acks to for-linus for v4.6.
The rationale is that ab1a187bba5c ("PCI: Check device_attach() return
value always") appeared in v4.6-rc1, and after that commit, plugging
in a Thunderbolt device while the system is asleep will cause a BUG()
when the system is awakened.
Lukas, you mentioned earlier that we might want to include Andreas'
patch "thunderbolt: Fix double free of drom buffer" at the same time.
I did apply that on pci/thunderbolt for v4.7. I can move that to
for-linus and put it in v4.6 if necessary, but that bug been there
since v3.17. Is there something in v4.6 that makes us more likely to
hit that double free?
> ---
> v2: Split commit in two (Bjorn Helgaas).
>
> drivers/pci/bus.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6c9f546..23a39fd 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> }
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> - BUG_ON(!dev->is_added);
> + /* Skip if device attach failed */
> + if (!dev->is_added)
> + continue;
> child = dev->subordinate;
> if (child)
> pci_bus_add_devices(child);
> --
> 2.8.0.rc3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/2] PCI: Fix BUG on device attach failure
2016-05-02 19:01 ` Bjorn Helgaas
@ 2016-05-02 20:23 ` Lukas Wunner
0 siblings, 0 replies; 6+ messages in thread
From: Lukas Wunner @ 2016-05-02 20:23 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: linux-pci, linux-pm
Hi Bjorn,
On Mon, May 02, 2016 at 02:01:33PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 20, 2016 at 04:44:27PM +0200, Lukas Wunner wrote:
> > Previously when pci_bus_add_device() called device_attach() and it
> > returned a negative value, we emitted a WARN but carried on.
> >
> > Commit ab1a187bba5c ("PCI: Check device_attach() return value always"),
> > introduced in Linux 4.6-rc1, changed this to unwind all steps preceding
> > device_attach() and to not set dev->is_added = 1.
> >
> > The latter leads to a BUG if pci_bus_add_device() was called from
> > pci_bus_add_devices(). Fix by not recursing to a child bus if
> > device_attach() failed for the bridge leading to it.
> >
> > This can be triggered by plugging in a PCI device (e.g. Thunderbolt)
> > while the system is asleep. The system locks up when woken because
> > device_attach() returns -EPROBE_DEFER.
> >
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
>
> Applied both with Rafael's acks to for-linus for v4.6.
>
> The rationale is that ab1a187bba5c ("PCI: Check device_attach() return
> value always") appeared in v4.6-rc1, and after that commit, plugging
> in a Thunderbolt device while the system is asleep will cause a BUG()
> when the system is awakened.
>
> Lukas, you mentioned earlier that we might want to include Andreas'
> patch "thunderbolt: Fix double free of drom buffer" at the same time.
> I did apply that on pci/thunderbolt for v4.7. I can move that to
> for-linus and put it in v4.6 if necessary, but that bug been there
> since v3.17. Is there something in v4.6 that makes us more likely to
> hit that double free?
No there's not, I meant that just as a reminder. The solution you've
settled on seems perfectly fine, thank you so much!
Lukas
>
> > ---
> > v2: Split commit in two (Bjorn Helgaas).
> >
> > drivers/pci/bus.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 6c9f546..23a39fd 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> > }
> >
> > list_for_each_entry(dev, &bus->devices, bus_list) {
> > - BUG_ON(!dev->is_added);
> > + /* Skip if device attach failed */
> > + if (!dev->is_added)
> > + continue;
> > child = dev->subordinate;
> > if (child)
> > pci_bus_add_devices(child);
> > --
> > 2.8.0.rc3
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread