linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] PCI: Fix device attach failure handling
@ 2016-03-31 12:57 Lukas Wunner
  2016-04-05 11:29 ` Grygorii Strashko
  2016-04-19 23:23 ` Bjorn Helgaas
  0 siblings, 2 replies; 8+ messages in thread
From: Lukas Wunner @ 2016-03-31 12:57 UTC (permalink / raw)
  To: linux-pci, linux-pm
  Cc: Grygorii Strashko, Alan Stern, Rafael J. Wysocki, Bjorn Helgaas

Linux 4.5 introduced a behavioral change in device probing during the
suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
probing during suspend/hibernation"): It defers device probing during
the entire suspend process, starting from the prepare phase and ending
with the complete phase. A rule existed before that "we rely on sub-
systems not to do any probing once a device is suspended" but it is
enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).

This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
plugged in while the system is asleep: Upon waking up, pciehp_resume()
discovers new devices in the resume phase and immediately tries to bind
them to a driver. Since probing is now deferred, device_attach() returns
-EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().

Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
Check device_attach() return value always"): pci_bus_add_device() no
longer sets dev->is_added = 1 if device_attach() returned a negative
value. This results in a BUG lockup in pci_bus_add_devices().

Fix the latter by not recursing to a child bus if device_attach() failed
for the bridge leading to it.

Fix the former by not interpreting -EPROBE_DEFER as failure. The device
will be probed eventually and there is proper locking in place to avoid
races (e.g. if devices are unplugged again und thus deleted from the
system before deferred probing happens, I have tested this). Also, those
functions which dereference dev->driver (e.g. pci_pm_*()) do contain
proper NULL pointer checks. So it seems safe to ignore -EPROBE_DEFER.

Note that even postponing the code in pciehp_resume() until the
complete phase wouldn't avoid these troubles because dpm_complete()
calls device_unblock_probing() only after ->complete has been
executed for all devices. We lack a pm hook from which it would
be safe to check a hotplug port and call device_attach() without
risking -EPROBE_DEFER.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/pci/bus.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6c9f546..dd7cdbe 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
 
 	dev->match_driver = true;
 	retval = device_attach(&dev->dev);
-	if (retval < 0) {
+	if (retval < 0 && retval != -EPROBE_DEFER) {
 		dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
@@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 	}
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		BUG_ON(!dev->is_added);
+		/* Skip if device attach failed */
+		if (!dev->is_added)
+			continue;
 		child = dev->subordinate;
 		if (child)
 			pci_bus_add_devices(child);
-- 
2.8.0.rc3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-03-31 12:57 [PATCH] PCI: Fix device attach failure handling Lukas Wunner
@ 2016-04-05 11:29 ` Grygorii Strashko
  2016-04-05 16:45   ` Lukas Wunner
  2016-04-19 23:23 ` Bjorn Helgaas
  1 sibling, 1 reply; 8+ messages in thread
From: Grygorii Strashko @ 2016-04-05 11:29 UTC (permalink / raw)
  To: Lukas Wunner, linux-pci, linux-pm
  Cc: Alan Stern, Rafael J. Wysocki, Bjorn Helgaas

On 03/31/2016 03:57 PM, Lukas Wunner wrote:
> Linux 4.5 introduced a behavioral change in device probing during the
> suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> probing during suspend/hibernation"): It defers device probing during
> the entire suspend process, starting from the prepare phase and ending
> with the complete phase. A rule existed before that "we rely on sub-
> systems not to do any probing once a device is suspended" but it is
> enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> 
> This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> plugged in while the system is asleep: Upon waking up, pciehp_resume()
> discovers new devices in the resume phase and immediately tries to bind
> them to a driver. Since probing is now deferred, device_attach() returns
> -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> 
> Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> Check device_attach() return value always"): pci_bus_add_device() no
> longer sets dev->is_added = 1 if device_attach() returned a negative
> value. This results in a BUG lockup in pci_bus_add_devices().
> 
> Fix the latter by not recursing to a child bus if device_attach() failed
> for the bridge leading to it.
> 
> Fix the former by not interpreting -EPROBE_DEFER as failure. The device
> will be probed eventually and there is proper locking in place to avoid
> races (e.g. if devices are unplugged again und thus deleted from the
> system before deferred probing happens, I have tested this). Also, those
> functions which dereference dev->driver (e.g. pci_pm_*()) do contain
> proper NULL pointer checks. So it seems safe to ignore -EPROBE_DEFER.
> 
> Note that even postponing the code in pciehp_resume() until the
> complete phase wouldn't avoid these troubles because dpm_complete()
> calls device_unblock_probing() only after ->complete has been
> executed for all devices. We lack a pm hook from which it would
> be safe to check a hotplug port and call device_attach() without
> risking -EPROBE_DEFER.

Unfortunately, I can't say too much about pci in general.
Regarding checking a hotplug port - Potentially,
PM notifiers can be used PM_SUSPEND_PREPARE/PM_POST_SUSPEND.
Smth. similar (more or less) was implemented for MMC

bbd4368 mmc: core: Signal wakeup event at card insert/removal
4c2ef25 mmc: fix all hangs related to mmc/sd card insert/removal during suspend/resume

> 
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>   drivers/pci/bus.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6c9f546..dd7cdbe 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
>   
>   	dev->match_driver = true;
>   	retval = device_attach(&dev->dev);
> -	if (retval < 0) {
> +	if (retval < 0 && retval != -EPROBE_DEFER) {
>   		dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
>   		pci_proc_detach_device(dev);
>   		pci_remove_sysfs_dev_files(dev);
> @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
>   	}
>   
>   	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		BUG_ON(!dev->is_added);
> +		/* Skip if device attach failed */
> +		if (!dev->is_added)
> +			continue;
>   		child = dev->subordinate;
>   		if (child)
>   			pci_bus_add_devices(child);
> 


-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-04-05 11:29 ` Grygorii Strashko
@ 2016-04-05 16:45   ` Lukas Wunner
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Wunner @ 2016-04-05 16:45 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: linux-pci, linux-pm, Alan Stern, Rafael J. Wysocki, Bjorn Helgaas

Hi Grygorii,

On Tue, Apr 05, 2016 at 02:29:38PM +0300, Grygorii Strashko wrote:
> On 03/31/2016 03:57 PM, Lukas Wunner wrote:
> > Linux 4.5 introduced a behavioral change in device probing during the
> > suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> > probing during suspend/hibernation"): It defers device probing during
> > the entire suspend process, starting from the prepare phase and ending
> > with the complete phase. A rule existed before that "we rely on sub-
> > systems not to do any probing once a device is suspended" but it is
> > enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> > 
> > This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> > plugged in while the system is asleep: Upon waking up, pciehp_resume()
> > discovers new devices in the resume phase and immediately tries to bind
> > them to a driver. Since probing is now deferred, device_attach() returns
> > -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> > 
> > Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> > Check device_attach() return value always"): pci_bus_add_device() no
> > longer sets dev->is_added = 1 if device_attach() returned a negative
> > value. This results in a BUG lockup in pci_bus_add_devices().
> > 
> > Fix the latter by not recursing to a child bus if device_attach() failed
> > for the bridge leading to it.
> > 
> > Fix the former by not interpreting -EPROBE_DEFER as failure. The device
> > will be probed eventually and there is proper locking in place to avoid
> > races (e.g. if devices are unplugged again und thus deleted from the
> > system before deferred probing happens, I have tested this). Also, those
> > functions which dereference dev->driver (e.g. pci_pm_*()) do contain
> > proper NULL pointer checks. So it seems safe to ignore -EPROBE_DEFER.
> > 
> > Note that even postponing the code in pciehp_resume() until the
> > complete phase wouldn't avoid these troubles because dpm_complete()
> > calls device_unblock_probing() only after ->complete has been
> > executed for all devices. We lack a pm hook from which it would
> > be safe to check a hotplug port and call device_attach() without
> > risking -EPROBE_DEFER.
> 
> Unfortunately, I can't say too much about pci in general.
> Regarding checking a hotplug port - Potentially,
> PM notifiers can be used PM_SUSPEND_PREPARE/PM_POST_SUSPEND.
> Smth. similar (more or less) was implemented for MMC
> 
> bbd4368 mmc: core: Signal wakeup event at card insert/removal
> 4c2ef25 mmc: fix all hangs related to mmc/sd card insert/removal during suspend/resume

Thank you, I hadn't thought of PM notifiers.

That would indeed be a viable alternative solution:
- Call pcie_disable_notification() on PM_SUSPEND_PREPARE /
  PM_HIBERNATION_PREPARE.
- Run the code in pciehp_resume() on PM_POST_SUSPEND / PM_POST_HIBERNATION.
  (This includes pcie_enable_notification(). The pciehp_suspend() and
  pciehp_resume() could be dropped completely.)

Bjorn, please let me know if you'd prefer that in favor of the patch below.
(The portion which replaces the BUG_ON would still be needed though.)

Thanks,

Lukas

> > 
> > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >   drivers/pci/bus.c | 6 ++++--
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 6c9f546..dd7cdbe 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
> >   
> >   	dev->match_driver = true;
> >   	retval = device_attach(&dev->dev);
> > -	if (retval < 0) {
> > +	if (retval < 0 && retval != -EPROBE_DEFER) {
> >   		dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
> >   		pci_proc_detach_device(dev);
> >   		pci_remove_sysfs_dev_files(dev);
> > @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> >   	}
> >   
> >   	list_for_each_entry(dev, &bus->devices, bus_list) {
> > -		BUG_ON(!dev->is_added);
> > +		/* Skip if device attach failed */
> > +		if (!dev->is_added)
> > +			continue;
> >   		child = dev->subordinate;
> >   		if (child)
> >   			pci_bus_add_devices(child);
> > 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-03-31 12:57 [PATCH] PCI: Fix device attach failure handling Lukas Wunner
  2016-04-05 11:29 ` Grygorii Strashko
@ 2016-04-19 23:23 ` Bjorn Helgaas
  2016-04-20 14:54   ` Lukas Wunner
  1 sibling, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2016-04-19 23:23 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Grygorii Strashko, Alan Stern,
	Rafael J. Wysocki, Bjorn Helgaas

Hi Lukas,

On Thu, Mar 31, 2016 at 02:57:48PM +0200, Lukas Wunner wrote:
> Linux 4.5 introduced a behavioral change in device probing during the
> suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> probing during suspend/hibernation"): It defers device probing during
> the entire suspend process, starting from the prepare phase and ending
> with the complete phase. A rule existed before that "we rely on sub-
> systems not to do any probing once a device is suspended" but it is
> enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> 
> This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> plugged in while the system is asleep: Upon waking up, pciehp_resume()
> discovers new devices in the resume phase and immediately tries to bind
> them to a driver. Since probing is now deferred, device_attach() returns
> -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> 
> Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> Check device_attach() return value always"): pci_bus_add_device() no
> longer sets dev->is_added = 1 if device_attach() returned a negative
> value. This results in a BUG lockup in pci_bus_add_devices().

If this fixes a BUG() that we introduced in v4.6-rc1, it sounds like we
should fix it before v4.6-final.  Please confirm.

> Fix the latter by not recursing to a child bus if device_attach() failed
> for the bridge leading to it.
> 
> Fix the former by not interpreting -EPROBE_DEFER as failure. The device
> will be probed eventually and there is proper locking in place to avoid
> races (e.g. if devices are unplugged again und thus deleted from the
> system before deferred probing happens, I have tested this). Also, those
> functions which dereference dev->driver (e.g. pci_pm_*()) do contain
> proper NULL pointer checks. So it seems safe to ignore -EPROBE_DEFER.

This looks like two different bug fixes.  Can you split them into
separate patches, or is there a reason to combine them?

> Note that even postponing the code in pciehp_resume() until the
> complete phase wouldn't avoid these troubles because dpm_complete()
> calls device_unblock_probing() only after ->complete has been
> executed for all devices. We lack a pm hook from which it would
> be safe to check a hotplug port and call device_attach() without
> risking -EPROBE_DEFER.
> 
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> ---
>  drivers/pci/bus.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6c9f546..dd7cdbe 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
>  
>  	dev->match_driver = true;
>  	retval = device_attach(&dev->dev);
> -	if (retval < 0) {
> +	if (retval < 0 && retval != -EPROBE_DEFER) {
>  		dev_warn(&dev->dev, "device attach failed (%d)\n", retval);

I would prefer if the dev_warn() made a distinction between
-EPROBE_DEFER and other failures.  It sounds like the -EPROBE_DEFER
case will happen in normal operation, and we probably shouldn't treat
it as a warning.

>  		pci_proc_detach_device(dev);
>  		pci_remove_sysfs_dev_files(dev);
> @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
>  	}
>  
>  	list_for_each_entry(dev, &bus->devices, bus_list) {
> -		BUG_ON(!dev->is_added);
> +		/* Skip if device attach failed */
> +		if (!dev->is_added)
> +			continue;

I assume the "pci_bus_add_devices(child)" will happen *eventually*?
Can you add a comment about when that is?

>  		child = dev->subordinate;
>  		if (child)
>  			pci_bus_add_devices(child);
> -- 
> 2.8.0.rc3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-04-19 23:23 ` Bjorn Helgaas
@ 2016-04-20 14:54   ` Lukas Wunner
  2016-04-20 19:48     ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2016-04-20 14:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-pm, Grygorii Strashko, Alan Stern,
	Rafael J. Wysocki, Bjorn Helgaas, Andreas Noever

Hi Bjorn,

On Tue, Apr 19, 2016 at 06:23:29PM -0500, Bjorn Helgaas wrote:
> On Thu, Mar 31, 2016 at 02:57:48PM +0200, Lukas Wunner wrote:
> > Linux 4.5 introduced a behavioral change in device probing during the
> > suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> > probing during suspend/hibernation"): It defers device probing during
> > the entire suspend process, starting from the prepare phase and ending
> > with the complete phase. A rule existed before that "we rely on sub-
> > systems not to do any probing once a device is suspended" but it is
> > enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> > 
> > This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> > plugged in while the system is asleep: Upon waking up, pciehp_resume()
> > discovers new devices in the resume phase and immediately tries to bind
> > them to a driver. Since probing is now deferred, device_attach() returns
> > -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> > 
> > Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> > Check device_attach() return value always"): pci_bus_add_device() no
> > longer sets dev->is_added = 1 if device_attach() returned a negative
> > value. This results in a BUG lockup in pci_bus_add_devices().
> 
> If this fixes a BUG() that we introduced in v4.6-rc1, it sounds like we
> should fix it before v4.6-final.  Please confirm.

Affirmative.


> > Fix the latter by not recursing to a child bus if device_attach() failed
> > for the bridge leading to it.
> > 
> > Fix the former by not interpreting -EPROBE_DEFER as failure. The device
> > will be probed eventually and there is proper locking in place to avoid
> > races (e.g. if devices are unplugged again und thus deleted from the
> > system before deferred probing happens, I have tested this). Also, those
> > functions which dereference dev->driver (e.g. pci_pm_*()) do contain
> > proper NULL pointer checks. So it seems safe to ignore -EPROBE_DEFER.
> 
> This looks like two different bug fixes.  Can you split them into
> separate patches, or is there a reason to combine them?

The two bugs are related but the fix can be split in two, which I've
just done and sent to the list as v2.


> > Note that even postponing the code in pciehp_resume() until the
> > complete phase wouldn't avoid these troubles because dpm_complete()
> > calls device_unblock_probing() only after ->complete has been
> > executed for all devices. We lack a pm hook from which it would
> > be safe to check a hotplug port and call device_attach() without
> > risking -EPROBE_DEFER.
> > 
> > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > Cc: Alan Stern <stern@rowland.harvard.edu>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>
> > Cc: Bjorn Helgaas <bhelgaas@google.com>
> > Signed-off-by: Lukas Wunner <lukas@wunner.de>
> > ---
> >  drivers/pci/bus.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> > index 6c9f546..dd7cdbe 100644
> > --- a/drivers/pci/bus.c
> > +++ b/drivers/pci/bus.c
> > @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  
> >  	dev->match_driver = true;
> >  	retval = device_attach(&dev->dev);
> > -	if (retval < 0) {
> > +	if (retval < 0 && retval != -EPROBE_DEFER) {
> >  		dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
> 
> I would prefer if the dev_warn() made a distinction between
> -EPROBE_DEFER and other failures.  It sounds like the -EPROBE_DEFER
> case will happen in normal operation, and we probably shouldn't treat
> it as a warning.

Both v1 and v2 of my fix do not dev_warn() at all on -EPROBE_DEFER since,
as you've correctly stated above, it happens in normal operation.

If you want I could add a dev_info() specifically for the -EPROBE_DEFER
case but personally I don't think it's necessary, it'll probably just
irritate users.


> >  		pci_proc_detach_device(dev);
> >  		pci_remove_sysfs_dev_files(dev);
> > @@ -324,7 +324,9 @@ void pci_bus_add_devices(const struct pci_bus *bus)
> >  	}
> >  
> >  	list_for_each_entry(dev, &bus->devices, bus_list) {
> > -		BUG_ON(!dev->is_added);
> > +		/* Skip if device attach failed */
> > +		if (!dev->is_added)
> > +			continue;
> 
> I assume the "pci_bus_add_devices(child)" will happen *eventually*?
> Can you add a comment about when that is?

At the end of dpm_complete(), device_unblock_probing() is called and
this will reprobe anything that's ended up on the deferred probing list.
In other words, deferred probing happens right after ->complete has
been called for all devices.

I've amended the commit message of patch [2/2] in v2 to clarify this.

Let me know if you want anything else changed. If you send a pull with
these to Linus, please consider including the thunderbolt double-free
patch posted by Andreas Noever with:

    Subject: [PATCH] thunderbolt: Fix double free of drom buffer
    Message-Id: <1460285307-3557-1-git-send-email-andreas.noever@gmail.com>
    Date: Sun, 10 Apr 2016 12:48:27 +0200

Thank you!

Lukas

> >  		child = dev->subordinate;
> >  		if (child)
> >  			pci_bus_add_devices(child);
> > -- 
> > 2.8.0.rc3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-04-20 14:54   ` Lukas Wunner
@ 2016-04-20 19:48     ` Bjorn Helgaas
  2016-04-20 20:06       ` Lukas Wunner
  0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2016-04-20 19:48 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: linux-pci, linux-pm, Grygorii Strashko, Alan Stern,
	Rafael J. Wysocki, Bjorn Helgaas, Andreas Noever

On Wed, Apr 20, 2016 at 04:54:55PM +0200, Lukas Wunner wrote:
> Hi Bjorn,
> 
> On Tue, Apr 19, 2016 at 06:23:29PM -0500, Bjorn Helgaas wrote:
> > On Thu, Mar 31, 2016 at 02:57:48PM +0200, Lukas Wunner wrote:
> > > Linux 4.5 introduced a behavioral change in device probing during the
> > > suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> > > probing during suspend/hibernation"): It defers device probing during
> > > the entire suspend process, starting from the prepare phase and ending
> > > with the complete phase. A rule existed before that "we rely on sub-
> > > systems not to do any probing once a device is suspended" but it is
> > > enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> > > 
> > > This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> > > plugged in while the system is asleep: Upon waking up, pciehp_resume()
> > > discovers new devices in the resume phase and immediately tries to bind
> > > them to a driver. Since probing is now deferred, device_attach() returns
> > > -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> > > 
> > > Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> > > Check device_attach() return value always"): pci_bus_add_device() no
> > > longer sets dev->is_added = 1 if device_attach() returned a negative
> > > value. This results in a BUG lockup in pci_bus_add_devices().

What is the connection here?  013c074f8642 causes device_attach() to
return -EPROBE_DEFER, and your patch makes pci_bus_add_device() treat
that as success, so it continues on and sets dev->is_added.

Since we will now set dev->is_added even for -EPROBE_DEFER, why do we
need to change the "BUG_ON(!dev->is_added)" in pci_bus_add_devices()?

I agree that device_attach() might fail for reasons other than
-EPROBE_DEFER, and it might be nice to avoid the BUG_ON then.  It just
seems like those scenarios are different from the -EPROBE_DEFER one.

Sorry I'm being so dense here.  I'm not very familiar with
suspend/resume, so I'm having a hard time following everything.

> > > @@ -294,7 +294,7 @@ void pci_bus_add_device(struct pci_dev *dev)
> > >  
> > >          dev->match_driver = true;
> > >          retval = device_attach(&dev->dev);
> > > -        if (retval < 0) {
> > > +        if (retval < 0 && retval != -EPROBE_DEFER) {
> > >                  dev_warn(&dev->dev, "device attach failed (%d)\n", retval);
> > 
> > I would prefer if the dev_warn() made a distinction between
> > -EPROBE_DEFER and other failures.  It sounds like the -EPROBE_DEFER
> > case will happen in normal operation, and we probably shouldn't treat
> > it as a warning.
> 
> Both v1 and v2 of my fix do not dev_warn() at all on -EPROBE_DEFER since,
> as you've correctly stated above, it happens in normal operation.
> 
> If you want I could add a dev_info() specifically for the -EPROBE_DEFER
> case but personally I don't think it's necessary, it'll probably just
> irritate users.

You're right; sorry, I just wasn't reading your code clearly.  I don't
think we need an extra message.  We should already have messages for
detecting the new device and its BARs, and those should be enough.

Bjorn

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-04-20 19:48     ` Bjorn Helgaas
@ 2016-04-20 20:06       ` Lukas Wunner
  2016-04-20 20:26         ` Rafael J. Wysocki
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Wunner @ 2016-04-20 20:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-pm, Grygorii Strashko, Alan Stern,
	Rafael J. Wysocki, Bjorn Helgaas, Andreas Noever

Hi Bjorn,

On Wed, Apr 20, 2016 at 02:48:08PM -0500, Bjorn Helgaas wrote:
> On Wed, Apr 20, 2016 at 04:54:55PM +0200, Lukas Wunner wrote:
> > On Tue, Apr 19, 2016 at 06:23:29PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Mar 31, 2016 at 02:57:48PM +0200, Lukas Wunner wrote:
> > > > Linux 4.5 introduced a behavioral change in device probing during the
> > > > suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
> > > > probing during suspend/hibernation"): It defers device probing during
> > > > the entire suspend process, starting from the prepare phase and ending
> > > > with the complete phase. A rule existed before that "we rely on sub-
> > > > systems not to do any probing once a device is suspended" but it is
> > > > enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
> > > > 
> > > > This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
> > > > plugged in while the system is asleep: Upon waking up, pciehp_resume()
> > > > discovers new devices in the resume phase and immediately tries to bind
> > > > them to a driver. Since probing is now deferred, device_attach() returns
> > > > -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
> > > > 
> > > > Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
> > > > Check device_attach() return value always"): pci_bus_add_device() no
> > > > longer sets dev->is_added = 1 if device_attach() returned a negative
> > > > value. This results in a BUG lockup in pci_bus_add_devices().
> 
> What is the connection here?  013c074f8642 causes device_attach() to
> return -EPROBE_DEFER, and your patch makes pci_bus_add_device() treat
> that as success, so it continues on and sets dev->is_added.
> 
> Since we will now set dev->is_added even for -EPROBE_DEFER, why do we
> need to change the "BUG_ON(!dev->is_added)" in pci_bus_add_devices()?
> 
> I agree that device_attach() might fail for reasons other than
> -EPROBE_DEFER, and it might be nice to avoid the BUG_ON then.  It just
> seems like those scenarios are different from the -EPROBE_DEFER one.

Correct. The idea is that your commit ab1a187bba5c already throws a
warning message and undoes the steps preceding device_attach(),
so there's no reason to BUG_ON at this point. If we want to BUG_ON
whenever device_attach() fails we should do that in pci_bus_add_devices()
right after the call to device_attach().

It just doesn't seem to make sense to BUG_ON, there's no bug here,
ever since ab1a187bba5c got introduced, dev->is_added == false is now
a legitimate and expected result if device_attach() failed.

The commit message above is written from a user's point of view:
In 4.5 users got a WARN, in 4.6 they got a BUG when doing the same thing.


> Sorry I'm being so dense here.  I'm not very familiar with
> suspend/resume, so I'm having a hard time following everything.

More like me writing incomprehensible / irritating stuff. Please ask
more questions if I failed to make everything clear.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] PCI: Fix device attach failure handling
  2016-04-20 20:06       ` Lukas Wunner
@ 2016-04-20 20:26         ` Rafael J. Wysocki
  0 siblings, 0 replies; 8+ messages in thread
From: Rafael J. Wysocki @ 2016-04-20 20:26 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Linux PCI, linux-pm@vger.kernel.org,
	Grygorii Strashko, Alan Stern, Rafael J. Wysocki, Bjorn Helgaas,
	Andreas Noever

On Wed, Apr 20, 2016 at 10:06 PM, Lukas Wunner <lukas@wunner.de> wrote:
> Hi Bjorn,
>
> On Wed, Apr 20, 2016 at 02:48:08PM -0500, Bjorn Helgaas wrote:
>> On Wed, Apr 20, 2016 at 04:54:55PM +0200, Lukas Wunner wrote:
>> > On Tue, Apr 19, 2016 at 06:23:29PM -0500, Bjorn Helgaas wrote:
>> > > On Thu, Mar 31, 2016 at 02:57:48PM +0200, Lukas Wunner wrote:
>> > > > Linux 4.5 introduced a behavioral change in device probing during the
>> > > > suspend process with commit 013c074f8642 ("PM / sleep: prohibit devices
>> > > > probing during suspend/hibernation"): It defers device probing during
>> > > > the entire suspend process, starting from the prepare phase and ending
>> > > > with the complete phase. A rule existed before that "we rely on sub-
>> > > > systems not to do any probing once a device is suspended" but it is
>> > > > enforced only now (Alan Stern, https://lkml.org/lkml/2015/9/15/908).
>> > > >
>> > > > This resulted in a WARN splat if a PCI device (e.g. Thunderbolt) is
>> > > > plugged in while the system is asleep: Upon waking up, pciehp_resume()
>> > > > discovers new devices in the resume phase and immediately tries to bind
>> > > > them to a driver. Since probing is now deferred, device_attach() returns
>> > > > -EPROBE_DEFER, which provoked a WARN in pci_bus_add_device().
>> > > >
>> > > > Linux 4.6-rc1 aggravates the situation with commit ab1a187bba5c ("PCI:
>> > > > Check device_attach() return value always"): pci_bus_add_device() no
>> > > > longer sets dev->is_added = 1 if device_attach() returned a negative
>> > > > value. This results in a BUG lockup in pci_bus_add_devices().
>>
>> What is the connection here?  013c074f8642 causes device_attach() to
>> return -EPROBE_DEFER, and your patch makes pci_bus_add_device() treat
>> that as success, so it continues on and sets dev->is_added.
>>
>> Since we will now set dev->is_added even for -EPROBE_DEFER, why do we
>> need to change the "BUG_ON(!dev->is_added)" in pci_bus_add_devices()?
>>
>> I agree that device_attach() might fail for reasons other than
>> -EPROBE_DEFER, and it might be nice to avoid the BUG_ON then.  It just
>> seems like those scenarios are different from the -EPROBE_DEFER one.
>
> Correct. The idea is that your commit ab1a187bba5c already throws a
> warning message and undoes the steps preceding device_attach(),
> so there's no reason to BUG_ON at this point. If we want to BUG_ON
> whenever device_attach() fails we should do that in pci_bus_add_devices()
> right after the call to device_attach().
>
> It just doesn't seem to make sense to BUG_ON, there's no bug here,
> ever since ab1a187bba5c got introduced, dev->is_added == false is now
> a legitimate and expected result if device_attach() failed.
>
> The commit message above is written from a user's point of view:
> In 4.5 users got a WARN, in 4.6 they got a BUG when doing the same thing.

Which is unfortunate.

>> Sorry I'm being so dense here.  I'm not very familiar with
>> suspend/resume, so I'm having a hard time following everything.
>
> More like me writing incomprehensible / irritating stuff. Please ask
> more questions if I failed to make everything clear.

I still need to have a deeper look at your patches, sorry for the
delay (travel & stuff).

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-04-20 20:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-31 12:57 [PATCH] PCI: Fix device attach failure handling Lukas Wunner
2016-04-05 11:29 ` Grygorii Strashko
2016-04-05 16:45   ` Lukas Wunner
2016-04-19 23:23 ` Bjorn Helgaas
2016-04-20 14:54   ` Lukas Wunner
2016-04-20 19:48     ` Bjorn Helgaas
2016-04-20 20:06       ` Lukas Wunner
2016-04-20 20:26         ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).