* [PATCH] PCI: Fix racing for pci device removing via sysfs
@ 2013-04-26 1:47 Yinghai Lu
2013-04-26 16:28 ` Bjorn Helgaas
0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-26 1:47 UTC (permalink / raw)
To: Bjorn Helgaas; +Cc: Gu Zheng, linux-pci, linux-kernel, Yinghai Lu
Gu found nested removing through
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
will cause kernel crash as bus get freed.
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
https://bugzilla.kernel.org/show_bug.cgi?id=54411
We have one patch that will let device hold bus ref to prevent it from
being freed, but that will still generate warning.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
We can just check if the device get removed from pci tree
already in the protection under pci_remove_rescan_mutex.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci-sysfs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -329,9 +329,16 @@ dev_rescan_store(struct device *dev, str
static void remove_callback(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
+ int domain = pci_domain_nr(pdev->bus);
+ u8 bus = pdev->bus->number;
+ u8 devfn = pdev->devfn;
mutex_lock(&pci_remove_rescan_mutex);
- pci_stop_and_remove_bus_device(pdev);
+ pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+ if (pdev) {
+ pci_dev_put(pdev);
+ pci_stop_and_remove_bus_device(pdev);
+ }
mutex_unlock(&pci_remove_rescan_mutex);
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-26 1:47 [PATCH] PCI: Fix racing for pci device removing via sysfs Yinghai Lu
@ 2013-04-26 16:28 ` Bjorn Helgaas
2013-04-26 20:20 ` Yinghai Lu
0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2013-04-26 16:28 UTC (permalink / raw)
To: Yinghai Lu
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
On Thu, Apr 25, 2013 at 7:47 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> Gu found nested removing through
> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>
> will cause kernel crash as bus get freed.
>
> [ 418.946462] CPU 4
> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
> FUJITSU-SV PRIMEQUEST 1800E/SB
> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
> pci_bus_read_config_word+0x5e/0x90
> [ 420.494137] Call Trace:
> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>
> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>
> We have one patch that will let device hold bus ref to prevent it from
> being freed, but that will still generate warning.
>
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> Hardware name: PRIMEQUEST 1800E
> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
> Call Trace:
> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
> [<ffffffff81280b91>] list_del+0x11/0x40
> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
> [<ffffffff8129fc89>] remove_callback+0x29/0x40
> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>
> We can just check if the device get removed from pci tree
> already in the protection under pci_remove_rescan_mutex.
>
> Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> drivers/pci/pci-sysfs.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/pci/pci-sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
> +++ linux-2.6/drivers/pci/pci-sysfs.c
> @@ -329,9 +329,16 @@ dev_rescan_store(struct device *dev, str
> static void remove_callback(struct device *dev)
> {
> struct pci_dev *pdev = to_pci_dev(dev);
> + int domain = pci_domain_nr(pdev->bus);
> + u8 bus = pdev->bus->number;
> + u8 devfn = pdev->devfn;
>
> mutex_lock(&pci_remove_rescan_mutex);
> - pci_stop_and_remove_bus_device(pdev);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
This is a gross hack. Iterating through all known pci_devs to see if
this one still exists?
I reproduced the original problem, applied this patch, and verified
that it avoids the original crash.
However, it's still incorrect because now you're looking at pdev after
it's been freed. With CONFIG_SLUB_DEBUG_ON=y, the removal still causes
a crash in remove_callback().
Bjorn
> + if (pdev) {
> + pci_dev_put(pdev);
> + pci_stop_and_remove_bus_device(pdev);
> + }
> mutex_unlock(&pci_remove_rescan_mutex);
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-26 16:28 ` Bjorn Helgaas
@ 2013-04-26 20:20 ` Yinghai Lu
2013-04-26 20:53 ` Bjorn Helgaas
0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-26 20:20 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 609 bytes --]
On Fri, Apr 26, 2013 at 9:28 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> This is a gross hack. Iterating through all known pci_devs to see if
> this one still exists?
>
> I reproduced the original problem, applied this patch, and verified
> that it avoids the original crash.
>
> However, it's still incorrect because now you're looking at pdev after
> it's been freed. With CONFIG_SLUB_DEBUG_ON=y, the removal still causes
> a crash in remove_callback().
>
Yes, there is small window, that could have bus and dev get freed...
Please check attached that should address your concerns.
Thanks
Yinghai
[-- Attachment #2: remove_callback_clean.patch --]
[-- Type: application/octet-stream, Size: 1113 bytes --]
Subject: [PATCH] PCI, sysfs: Clean up rescan with schedule_callback
Change to use three return according to Bjorn.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci-sysfs.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -339,19 +339,22 @@ static ssize_t
remove_store(struct device *dev, struct device_attribute *dummy,
const char *buf, size_t count)
{
- int ret = 0;
+ int err;
unsigned long val;
if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;
+ if (!val)
+ return count;
+
/* An attribute cannot be unregistered by one of its own methods,
* so we have to use this roundabout approach.
*/
- if (val)
- ret = device_schedule_callback(dev, remove_callback);
- if (ret)
- count = ret;
+ err = device_schedule_callback(dev, remove_callback);
+ if (err)
+ return err;
+
return count;
}
[-- Attachment #3: fix_racing_removing_2.patch --]
[-- Type: application/octet-stream, Size: 3528 bytes --]
Subject: [PATCH -v2] PCI: Fix racing for pci device removing via sysfs
From: Yinghai Lu <yinghai@kernel.org>
Gu found nested removing through
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
will cause kernel crash as bus get freed.
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
https://bugzilla.kernel.org/show_bug.cgi?id=54411
We have one patch that will let device hold bus ref to prevent it from
being freed, but that will still generate warning.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
We can just check if the device get removed from pci tree
already in the protection under pci_remove_rescan_mutex.
-v2: check if the dev->bus_list is poisoned instead to
find out if it is removed already.
Also add one extra ref to dev to make sure dev is not
get freed too early.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci-sysfs.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -329,9 +329,13 @@ dev_rescan_store(struct device *dev, str
static void remove_callback(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
+ bool found;
mutex_lock(&pci_remove_rescan_mutex);
- pci_stop_and_remove_bus_device(pdev);
+ found = pdev->bus_list.next != LIST_POISON1;
+ pci_dev_put(pdev);
+ if (found)
+ pci_stop_and_remove_bus_device(pdev);
mutex_unlock(&pci_remove_rescan_mutex);
}
@@ -341,6 +345,7 @@ remove_store(struct device *dev, struct
{
int err;
unsigned long val;
+ struct pci_dev *pdev;
if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;
@@ -351,9 +356,12 @@ remove_store(struct device *dev, struct
/* An attribute cannot be unregistered by one of its own methods,
* so we have to use this roundabout approach.
*/
+ pdev = pci_dev_get(to_pci_dev(dev));
err = device_schedule_callback(dev, remove_callback);
- if (err)
+ if (err) {
+ pci_dev_put(pdev);
return err;
+ }
return count;
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-26 20:20 ` Yinghai Lu
@ 2013-04-26 20:53 ` Bjorn Helgaas
2013-04-26 21:01 ` Yinghai Lu
0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2013-04-26 20:53 UTC (permalink / raw)
To: Yinghai Lu
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
On Fri, Apr 26, 2013 at 2:20 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Fri, Apr 26, 2013 at 9:28 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> This is a gross hack. Iterating through all known pci_devs to see if
>> this one still exists?
>>
>> I reproduced the original problem, applied this patch, and verified
>> that it avoids the original crash.
>>
>> However, it's still incorrect because now you're looking at pdev after
>> it's been freed. With CONFIG_SLUB_DEBUG_ON=y, the removal still causes
>> a crash in remove_callback().
>>
>
> Yes, there is small window, that could have bus and dev get freed...
>
> Please check attached that should address your concerns.
You can't be serious. This is a disgusting mess. Checking a list
pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
my time.
Bjorn
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-26 20:53 ` Bjorn Helgaas
@ 2013-04-26 21:01 ` Yinghai Lu
2013-04-29 10:04 ` Gu Zheng
0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-26 21:01 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 307 bytes --]
On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
> You can't be serious. This is a disgusting mess. Checking a list
> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
> my time.
Well, then need to hold the bus ref, and check bus->devices list instead.
[-- Attachment #2: fix_racing_removing_3.patch --]
[-- Type: application/octet-stream, Size: 3830 bytes --]
Subject: [PATCH -v3] PCI: Fix racing for pci device removing via sysfs
From: Yinghai Lu <yinghai@kernel.org>
Gu found nested removing through
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
will cause kernel crash as bus get freed.
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
https://bugzilla.kernel.org/show_bug.cgi?id=54411
We have one patch that will let device hold bus ref to prevent it from
being freed, but that will still generate warning.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
We can just check if the device get removed from pci tree
already in the protection under pci_remove_rescan_mutex.
-v2: check if the dev->bus_list is poisoned instead to
find out if it is removed already.
Also add one extra ref to dev to make sure dev is not
get freed too early.
-v3: hold bus ref and dev ref instead.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci-sysfs.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -328,10 +328,21 @@ dev_rescan_store(struct device *dev, str
static void remove_callback(struct device *dev)
{
- struct pci_dev *pdev = to_pci_dev(dev);
+ struct pci_dev *pdev = to_pci_dev(dev), *tmp;
+ struct pci_bus *bus = pdev->bus;
+ bool found = false;
mutex_lock(&pci_remove_rescan_mutex);
- pci_stop_and_remove_bus_device(pdev);
+ list_for_each_entry(tmp, &bus->devices, bus_list)
+ if (tmp == pdev) {
+ found = true;
+ break;
+ }
+
+ put_device(&pdev->bus->dev);
+ pci_dev_put(pdev);
+ if (found)
+ pci_stop_and_remove_bus_device(pdev);
mutex_unlock(&pci_remove_rescan_mutex);
}
@@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
{
int err;
unsigned long val;
+ struct pci_dev *pdev;
if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;
@@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
/* An attribute cannot be unregistered by one of its own methods,
* so we have to use this roundabout approach.
*/
+ pdev = pci_dev_get(to_pci_dev(dev));
+ get_device(&pdev->bus->dev);
err = device_schedule_callback(dev, remove_callback);
- if (err)
+ if (err) {
+ put_device(&pdev->bus->dev);
+ pci_dev_put(pdev);
return err;
+ }
return count;
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-26 21:01 ` Yinghai Lu
@ 2013-04-29 10:04 ` Gu Zheng
2013-04-29 15:19 ` Yinghai Lu
0 siblings, 1 reply; 15+ messages in thread
From: Gu Zheng @ 2013-04-29 10:04 UTC (permalink / raw)
To: Yinghai Lu
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
Hi Yinghai,
On 04/27/2013 05:01 AM, Yinghai Lu wrote:
> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>>
>> You can't be serious. This is a disgusting mess. Checking a list
>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
>> my time.
>
> Well, then need to hold the bus ref, and check bus->devices list instead.
@@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
{
int err;
unsigned long val;
+ struct pci_dev *pdev;
if (strict_strtoul(buf, 0, &val) < 0)
return -EINVAL;
@@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
/* An attribute cannot be unregistered by one of its own methods,
* so we have to use this roundabout approach.
*/
+ pdev = pci_dev_get(to_pci_dev(dev));
There is no need to increase pci_dev's ref here, because we'll increase it
in sysfs_schedule_callback.
+ get_device(&pdev->bus->dev);
So the pci_bus' ref management is still needed.
err = device_schedule_callback(dev, remove_callback);
- if (err)
+ if (err) {
+ put_device(&pdev->bus->dev);
+ pci_dev_put(pdev);
return err;
+ }
return count;
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 10:04 ` Gu Zheng
@ 2013-04-29 15:19 ` Yinghai Lu
2013-04-29 18:15 ` Bjorn Helgaas
0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-29 15:19 UTC (permalink / raw)
To: Gu Zheng
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Greg Kroah-Hartman, Sarah Sharp
[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]
On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> Hi Yinghai,
>
> On 04/27/2013 05:01 AM, Yinghai Lu wrote:
>
>> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>>>
>>> You can't be serious. This is a disgusting mess. Checking a list
>>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
>>> my time.
looks like xhci is using that LIST_POISON1 ...
>>
>> Well, then need to hold the bus ref, and check bus->devices list instead.
>
> @@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
> {
> int err;
> unsigned long val;
> + struct pci_dev *pdev;
>
> if (strict_strtoul(buf, 0, &val) < 0)
> return -EINVAL;
> @@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
> /* An attribute cannot be unregistered by one of its own methods,
> * so we have to use this roundabout approach.
> */
> + pdev = pci_dev_get(to_pci_dev(dev));
>
> There is no need to increase pci_dev's ref here, because we'll increase it
> in sysfs_schedule_callback.
ok, i missed that. if we can use LIST_POISON, then could be more simple.
like -v4.
>
> + get_device(&pdev->bus->dev);
>
> So the pci_bus' ref management is still needed.
No, we don't need that as first pci_stop_and_remove_bus_device()
already drop that reference.
Yinghai
[-- Attachment #2: fix_racing_removing_4.patch --]
[-- Type: application/octet-stream, Size: 2923 bytes --]
Subject: [PATCH -v4] PCI: Fix racing for pci device removing via sysfs
From: Yinghai Lu <yinghai@kernel.org>
Gu found nested removing through
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
will cause kernel crash as bus get freed.
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
https://bugzilla.kernel.org/show_bug.cgi?id=54411
We have one patch that will let device hold bus ref to prevent it from
being freed, but that will still generate warning.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
We can just check if the device get removed from pci tree
already in the protection under pci_remove_rescan_mutex.
-v2: check if the dev->bus_list is poisoned instead to
find out if it is removed already.
Also add one extra ref to dev to make sure dev is not
get freed too early.
-v4: remove not needed ref holding pointed by Gu Zheng.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/pci-sysfs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -331,7 +331,8 @@ static void remove_callback(struct devic
struct pci_dev *pdev = to_pci_dev(dev);
mutex_lock(&pci_remove_rescan_mutex);
- pci_stop_and_remove_bus_device(pdev);
+ if (pdev->bus_list.next != LIST_POISON1)
+ pci_stop_and_remove_bus_device(pdev);
mutex_unlock(&pci_remove_rescan_mutex);
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 15:19 ` Yinghai Lu
@ 2013-04-29 18:15 ` Bjorn Helgaas
2013-04-29 18:21 ` Greg Kroah-Hartman
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2013-04-29 18:15 UTC (permalink / raw)
To: Yinghai Lu
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Greg Kroah-Hartman, Sarah Sharp
On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
> On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> > On 04/27/2013 05:01 AM, Yinghai Lu wrote:
> >> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>>
> >>> You can't be serious. This is a disgusting mess. Checking a list
> >>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
> >>> my time.
>
> looks like xhci is using that LIST_POISON1 ...
>
> >> Well, then need to hold the bus ref, and check bus->devices list instead.
> >
> > @@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
> > {
> > int err;
> > unsigned long val;
> > + struct pci_dev *pdev;
> >
> > if (strict_strtoul(buf, 0, &val) < 0)
> > return -EINVAL;
> > @@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
> > /* An attribute cannot be unregistered by one of its own methods,
> > * so we have to use this roundabout approach.
> > */
> > + pdev = pci_dev_get(to_pci_dev(dev));
> >
> > There is no need to increase pci_dev's ref here, because we'll increase it
> > in sysfs_schedule_callback.
>
> ok, i missed that. if we can use LIST_POISON, then could be more simple.
> like -v4.
I inlined your v4 patch below for convenience.
Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
but I am dubious about the idea that xhci was the only place that needed
it before now, and we just happened to find one more place in PCI that
needs it. That doesn't make sense because good design patterns are used
many times, not just once or twice.
I thought the whole point of the get/put scheme was that if we had a
pointer to a correctly reference-counted object, we didn't need to check
whether the object was still valid because the object remains valid until
all the references are released.
Gu's "[v2 2/2] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)"
patch essentially did this:
pci_destroy_dev(struct pci_dev *dev) {
...
+ pci_bus_put(dev->bus)
pci_free_resources(dev)
put_device(&dev->dev)
}
I think this is the wrong place to do the pci_bus_put() because the
pci_dev is reference-counted, and there may be other users that still
have valid references to it.
In this case, 10:00.0 is a bridge leading to [bus 11-1e], and 1a:01.0 is
part of that subtree. The user removed both 10:00.0 and 1a:01.0 almost
simultaneously via sysfs and we scheduled a callback for each.
Each callback acquires a pci_dev reference, and removal of 10:00.0 and the
subtree below it, including pci_destroy_dev(1a:01.0), is done first. The
callback to remove 1a:01.0 is still pending and has a valid reference to
the 1a:01.0 pci_dev.
Since the 1a:01.0 callback is still pending, the put_device in that first
pci_destroy_dev(1a:01.0) call decrements the ref count but doesn't release
the pci_dev.
I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
for as long as the pci_dev exists, so the pci_bus_put() should go in
pci_release_dev() instead.
Bjorn
> Subject: [PATCH -v4] PCI: Fix racing for pci device removing via sysfs
> From: Yinghai Lu <yinghai@kernel.org>
>
> Gu found nested removing through
> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>
> will cause kernel crash as bus get freed.
>
> [ 418.946462] CPU 4
> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
> FUJITSU-SV PRIMEQUEST 1800E/SB
> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
> pci_bus_read_config_word+0x5e/0x90
> [ 420.494137] Call Trace:
> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>
> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>
> We have one patch that will let device hold bus ref to prevent it from
> being freed, but that will still generate warning.
>
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> Hardware name: PRIMEQUEST 1800E
> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
> Call Trace:
> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
> [<ffffffff81280b91>] list_del+0x11/0x40
> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
> [<ffffffff8129fc89>] remove_callback+0x29/0x40
> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>
> We can just check if the device get removed from pci tree
> already in the protection under pci_remove_rescan_mutex.
>
> -v2: check if the dev->bus_list is poisoned instead to
> find out if it is removed already.
> Also add one extra ref to dev to make sure dev is not
> get freed too early.
> -v4: remove not needed ref holding pointed by Gu Zheng.
>
> Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>
> ---
> drivers/pci/pci-sysfs.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/pci/pci-sysfs.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
> +++ linux-2.6/drivers/pci/pci-sysfs.c
> @@ -331,7 +331,8 @@ static void remove_callback(struct devic
> struct pci_dev *pdev = to_pci_dev(dev);
>
> mutex_lock(&pci_remove_rescan_mutex);
> - pci_stop_and_remove_bus_device(pdev);
> + if (pdev->bus_list.next != LIST_POISON1)
> + pci_stop_and_remove_bus_device(pdev);
> mutex_unlock(&pci_remove_rescan_mutex);
> }
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 18:15 ` Bjorn Helgaas
@ 2013-04-29 18:21 ` Greg Kroah-Hartman
2013-04-29 21:23 ` Sarah Sharp
2013-04-29 22:17 ` Yinghai Lu
2013-04-30 9:17 ` Gu Zheng
2 siblings, 1 reply; 15+ messages in thread
From: Greg Kroah-Hartman @ 2013-04-29 18:21 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Yinghai Lu, Gu Zheng, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Sarah Sharp
On Mon, Apr 29, 2013 at 11:15:50AM -0700, Bjorn Helgaas wrote:
> On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
> > On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> > > On 04/27/2013 05:01 AM, Yinghai Lu wrote:
> > >> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > >>>
> > >>> You can't be serious. This is a disgusting mess. Checking a list
> > >>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
> > >>> my time.
> >
> > looks like xhci is using that LIST_POISON1 ...
> >
> > >> Well, then need to hold the bus ref, and check bus->devices list instead.
> > >
> > > @@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
> > > {
> > > int err;
> > > unsigned long val;
> > > + struct pci_dev *pdev;
> > >
> > > if (strict_strtoul(buf, 0, &val) < 0)
> > > return -EINVAL;
> > > @@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
> > > /* An attribute cannot be unregistered by one of its own methods,
> > > * so we have to use this roundabout approach.
> > > */
> > > + pdev = pci_dev_get(to_pci_dev(dev));
> > >
> > > There is no need to increase pci_dev's ref here, because we'll increase it
> > > in sysfs_schedule_callback.
> >
> > ok, i missed that. if we can use LIST_POISON, then could be more simple.
> > like -v4.
>
> I inlined your v4 patch below for convenience.
>
> Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
> but I am dubious about the idea that xhci was the only place that needed
> it before now, and we just happened to find one more place in PCI that
> needs it. That doesn't make sense because good design patterns are used
> many times, not just once or twice.
>
> I thought the whole point of the get/put scheme was that if we had a
> pointer to a correctly reference-counted object, we didn't need to check
> whether the object was still valid because the object remains valid until
> all the references are released.
You are correct, you shouldn't have to worry about that. If you have to
do something like the LIST_POISON test, something is really wrong.
> Gu's "[v2 2/2] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)"
> patch essentially did this:
>
> pci_destroy_dev(struct pci_dev *dev) {
> ...
> + pci_bus_put(dev->bus)
> pci_free_resources(dev)
> put_device(&dev->dev)
> }
>
> I think this is the wrong place to do the pci_bus_put() because the
> pci_dev is reference-counted, and there may be other users that still
> have valid references to it.
It should happen in the release function for the pci device, which will
handle other users of the device.
> In this case, 10:00.0 is a bridge leading to [bus 11-1e], and 1a:01.0 is
> part of that subtree. The user removed both 10:00.0 and 1a:01.0 almost
> simultaneously via sysfs and we scheduled a callback for each.
>
> Each callback acquires a pci_dev reference, and removal of 10:00.0 and the
> subtree below it, including pci_destroy_dev(1a:01.0), is done first. The
> callback to remove 1a:01.0 is still pending and has a valid reference to
> the 1a:01.0 pci_dev.
>
> Since the 1a:01.0 callback is still pending, the put_device in that first
> pci_destroy_dev(1a:01.0) call decrements the ref count but doesn't release
> the pci_dev.
>
> I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
> for as long as the pci_dev exists, so the pci_bus_put() should go in
> pci_release_dev() instead.
I agree, that should be the correct fix for this.
greg k-h
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 18:21 ` Greg Kroah-Hartman
@ 2013-04-29 21:23 ` Sarah Sharp
2013-04-29 21:32 ` Greg Kroah-Hartman
0 siblings, 1 reply; 15+ messages in thread
From: Sarah Sharp @ 2013-04-29 21:23 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Bjorn Helgaas, Yinghai Lu, Gu Zheng, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
On Mon, Apr 29, 2013 at 11:21:42AM -0700, Greg Kroah-Hartman wrote:
> On Mon, Apr 29, 2013 at 11:15:50AM -0700, Bjorn Helgaas wrote:
> > On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
> > > On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> > > > On 04/27/2013 05:01 AM, Yinghai Lu wrote:
> > > >> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > > >>>
> > > >>> You can't be serious. This is a disgusting mess. Checking a list
> > > >>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
> > > >>> my time.
> > >
> > > looks like xhci is using that LIST_POISON1 ...
> > >
> > Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
> > but I am dubious about the idea that xhci was the only place that needed
> > it before now, and we just happened to find one more place in PCI that
> > needs it. That doesn't make sense because good design patterns are used
> > many times, not just once or twice.
> >
> > I thought the whole point of the get/put scheme was that if we had a
> > pointer to a correctly reference-counted object, we didn't need to check
> > whether the object was still valid because the object remains valid until
> > all the references are released.
>
> You are correct, you shouldn't have to worry about that. If you have to
> do something like the LIST_POISON test, something is really wrong.
All right, I'll take a look at the xHCI code. From a brief glance, both
places that use LIST_POISON are handling a timed-out command. The
command handling in xHCI needs to get completely reworked anyway, due to
other race conditions.
Were you suggesting I use the get/put scheme in the xHCI driver, or was
that for Yinghai?
Sarah Sharp
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 21:23 ` Sarah Sharp
@ 2013-04-29 21:32 ` Greg Kroah-Hartman
0 siblings, 0 replies; 15+ messages in thread
From: Greg Kroah-Hartman @ 2013-04-29 21:32 UTC (permalink / raw)
To: Sarah Sharp
Cc: Bjorn Helgaas, Yinghai Lu, Gu Zheng, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
On Mon, Apr 29, 2013 at 02:23:50PM -0700, Sarah Sharp wrote:
> On Mon, Apr 29, 2013 at 11:21:42AM -0700, Greg Kroah-Hartman wrote:
> > On Mon, Apr 29, 2013 at 11:15:50AM -0700, Bjorn Helgaas wrote:
> > > On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
> > > > On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> > > > > On 04/27/2013 05:01 AM, Yinghai Lu wrote:
> > > > >> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > > > >>>
> > > > >>> You can't be serious. This is a disgusting mess. Checking a list
> > > > >>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
> > > > >>> my time.
> > > >
> > > > looks like xhci is using that LIST_POISON1 ...
> > > >
> > > Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
> > > but I am dubious about the idea that xhci was the only place that needed
> > > it before now, and we just happened to find one more place in PCI that
> > > needs it. That doesn't make sense because good design patterns are used
> > > many times, not just once or twice.
> > >
> > > I thought the whole point of the get/put scheme was that if we had a
> > > pointer to a correctly reference-counted object, we didn't need to check
> > > whether the object was still valid because the object remains valid until
> > > all the references are released.
> >
> > You are correct, you shouldn't have to worry about that. If you have to
> > do something like the LIST_POISON test, something is really wrong.
>
> All right, I'll take a look at the xHCI code. From a brief glance, both
> places that use LIST_POISON are handling a timed-out command. The
> command handling in xHCI needs to get completely reworked anyway, due to
> other race conditions.
>
> Were you suggesting I use the get/put scheme in the xHCI driver, or was
> that for Yinghai?
I thought for Yinghai, but really, no code should have to check that
type of thing, so if the xhci driver is, it shouldn't :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 18:15 ` Bjorn Helgaas
2013-04-29 18:21 ` Greg Kroah-Hartman
@ 2013-04-29 22:17 ` Yinghai Lu
2013-04-30 21:29 ` Yinghai Lu
2013-04-30 9:17 ` Gu Zheng
2 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-29 22:17 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Greg Kroah-Hartman, Sarah Sharp
On Mon, Apr 29, 2013 at 11:15 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
>> ok, i missed that. if we can use LIST_POISON, then could be more simple.
>> like -v4.
>
> I inlined your v4 patch below for convenience.
>
> Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
> but I am dubious about the idea that xhci was the only place that needed
> it before now, and we just happened to find one more place in PCI that
> needs it. That doesn't make sense because good design patterns are used
> many times, not just once or twice.
>
> I thought the whole point of the get/put scheme was that if we had a
> pointer to a correctly reference-counted object, we didn't need to check
> whether the object was still valid because the object remains valid until
> all the references are released.
>
> Gu's "[v2 2/2] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)"
> patch essentially did this:
>
> pci_destroy_dev(struct pci_dev *dev) {
> ...
> + pci_bus_put(dev->bus)
> pci_free_resources(dev)
> put_device(&dev->dev)
> }
>
> I think this is the wrong place to do the pci_bus_put() because the
> pci_dev is reference-counted, and there may be other users that still
> have valid references to it.
>
> In this case, 10:00.0 is a bridge leading to [bus 11-1e], and 1a:01.0 is
> part of that subtree. The user removed both 10:00.0 and 1a:01.0 almost
> simultaneously via sysfs and we scheduled a callback for each.
>
> Each callback acquires a pci_dev reference, and removal of 10:00.0 and the
> subtree below it, including pci_destroy_dev(1a:01.0), is done first. The
> callback to remove 1a:01.0 is still pending and has a valid reference to
> the 1a:01.0 pci_dev.
>
> Since the 1a:01.0 callback is still pending, the put_device in that first
> pci_destroy_dev(1a:01.0) call decrements the ref count but doesn't release
> the pci_dev.
>
> I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
> for as long as the pci_dev exists, so the pci_bus_put() should go in
> pci_release_dev() instead.
Good point.
will rework pci remove sequence.
Thanks
Yinghai
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 18:15 ` Bjorn Helgaas
2013-04-29 18:21 ` Greg Kroah-Hartman
2013-04-29 22:17 ` Yinghai Lu
@ 2013-04-30 9:17 ` Gu Zheng
2 siblings, 0 replies; 15+ messages in thread
From: Gu Zheng @ 2013-04-30 9:17 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Yinghai Lu, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, Greg Kroah-Hartman, Sarah Sharp
On 04/30/2013 02:15 AM, Bjorn Helgaas wrote:
> On Mon, Apr 29, 2013 at 08:19:10AM -0700, Yinghai Lu wrote:
>> On Mon, Apr 29, 2013 at 3:04 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>> On 04/27/2013 05:01 AM, Yinghai Lu wrote:
>>>> On Fri, Apr 26, 2013 at 1:53 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>>>
>>>>> You can't be serious. This is a disgusting mess. Checking a list
>>>>> pointer for LIST_POISON1? As far as I'm concerned, this is a waste of
>>>>> my time.
>>
>> looks like xhci is using that LIST_POISON1 ...
>>
>>>> Well, then need to hold the bus ref, and check bus->devices list instead.
>>>
>>> @@ -341,6 +352,7 @@ remove_store(struct device *dev, struct
>>> {
>>> int err;
>>> unsigned long val;
>>> + struct pci_dev *pdev;
>>>
>>> if (strict_strtoul(buf, 0, &val) < 0)
>>> return -EINVAL;
>>> @@ -351,9 +363,14 @@ remove_store(struct device *dev, struct
>>> /* An attribute cannot be unregistered by one of its own methods,
>>> * so we have to use this roundabout approach.
>>> */
>>> + pdev = pci_dev_get(to_pci_dev(dev));
>>>
>>> There is no need to increase pci_dev's ref here, because we'll increase it
>>> in sysfs_schedule_callback.
>>
>> ok, i missed that. if we can use LIST_POISON, then could be more simple.
>> like -v4.
>
> I inlined your v4 patch below for convenience.
>
> Maybe my allergic reaction to your use of LIST_POISON1 is unjustified,
> but I am dubious about the idea that xhci was the only place that needed
> it before now, and we just happened to find one more place in PCI that
> needs it. That doesn't make sense because good design patterns are used
> many times, not just once or twice.
>
> I thought the whole point of the get/put scheme was that if we had a
> pointer to a correctly reference-counted object, we didn't need to check
> whether the object was still valid because the object remains valid until
> all the references are released.
Agree.
>
> Gu's "[v2 2/2] PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)"
> patch essentially did this:
>
> pci_destroy_dev(struct pci_dev *dev) {
> ...
> + pci_bus_put(dev->bus)
> pci_free_resources(dev)
> put_device(&dev->dev)
> }
>
> I think this is the wrong place to do the pci_bus_put() because the
> pci_dev is reference-counted, and there may be other users that still
> have valid references to it.
Thanks for your correction.
>
> In this case, 10:00.0 is a bridge leading to [bus 11-1e], and 1a:01.0 is
> part of that subtree. The user removed both 10:00.0 and 1a:01.0 almost
> simultaneously via sysfs and we scheduled a callback for each.
>
> Each callback acquires a pci_dev reference, and removal of 10:00.0 and the
> subtree below it, including pci_destroy_dev(1a:01.0), is done first. The
> callback to remove 1a:01.0 is still pending and has a valid reference to
> the 1a:01.0 pci_dev.
>
> Since the 1a:01.0 callback is still pending, the put_device in that first
> pci_destroy_dev(1a:01.0) call decrements the ref count but doesn't release
> the pci_dev.
>
> I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
> for as long as the pci_dev exists, so the pci_bus_put() should go in
> pci_release_dev() instead.
Yes, it's the correct way.
Best regards,
Gu
>
> Bjorn
>
>> Subject: [PATCH -v4] PCI: Fix racing for pci device removing via sysfs
>> From: Yinghai Lu <yinghai@kernel.org>
>>
>> Gu found nested removing through
>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
>> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>
>> will cause kernel crash as bus get freed.
>>
>> [ 418.946462] CPU 4
>> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
>> FUJITSU-SV PRIMEQUEST 1800E/SB
>> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
>> pci_bus_read_config_word+0x5e/0x90
>> [ 420.494137] Call Trace:
>> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
>> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
>> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
>> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
>> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
>> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>>
>> We have one patch that will let device hold bus ref to prevent it from
>> being freed, but that will still generate warning.
>>
>> ------------[ cut here ]------------
>> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
>> Hardware name: PRIMEQUEST 1800E
>> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
>> Call Trace:
>> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
>> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
>> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
>> [<ffffffff81280b91>] list_del+0x11/0x40
>> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
>> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
>> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
>> [<ffffffff8129fc89>] remove_callback+0x29/0x40
>> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>>
>> We can just check if the device get removed from pci tree
>> already in the protection under pci_remove_rescan_mutex.
>>
>> -v2: check if the dev->bus_list is poisoned instead to
>> find out if it is removed already.
>> Also add one extra ref to dev to make sure dev is not
>> get freed too early.
>> -v4: remove not needed ref holding pointed by Gu Zheng.
>>
>> Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>>
>> ---
>> drivers/pci/pci-sysfs.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/drivers/pci/pci-sysfs.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/pci/pci-sysfs.c
>> +++ linux-2.6/drivers/pci/pci-sysfs.c
>> @@ -331,7 +331,8 @@ static void remove_callback(struct devic
>> struct pci_dev *pdev = to_pci_dev(dev);
>>
>> mutex_lock(&pci_remove_rescan_mutex);
>> - pci_stop_and_remove_bus_device(pdev);
>> + if (pdev->bus_list.next != LIST_POISON1)
>> + pci_stop_and_remove_bus_device(pdev);
>> mutex_unlock(&pci_remove_rescan_mutex);
>> }
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-29 22:17 ` Yinghai Lu
@ 2013-04-30 21:29 ` Yinghai Lu
2013-05-08 23:43 ` Bjorn Helgaas
0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2013-04-30 21:29 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Greg Kroah-Hartman, Sarah Sharp
[-- Attachment #1: Type: text/plain, Size: 475 bytes --]
On Mon, Apr 29, 2013 at 3:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Mon, Apr 29, 2013 at 11:15 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>
>> I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
>> for as long as the pci_dev exists, so the pci_bus_put() should go in
>> pci_release_dev() instead.
>
> Good point.
>
> will rework pci remove sequence.
Please check attached version that will not need to touch pci sysfs bits.
Thanks
Yinghai
[-- Attachment #2: fix_racing_removing_5.patch --]
[-- Type: application/octet-stream, Size: 5634 bytes --]
Subject: [PATCH -v5] PCI: Fix racing for pci device removing via sysfs
From: Yinghai Lu <yinghai@kernel.org>
Gu found nested removing through
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
will cause kernel crash as bus get freed.
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
https://bugzilla.kernel.org/show_bug.cgi?id=54411
We have one patch that will let device hold bus ref to prevent it from
being freed, but that will still generate warning.
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
Bjorn pointed out that pci_dev should retain its reference to the pci_bus
for as long as the pci_dev exists, so the release reference should go in
pci_release_dev() instead.
Here we move other bus_list change to pci_release_dev too to make
it consistent with bus reference change.
Also make stop device actually detach driver only, and remove
device will do device_del instead. Then also could make hostbridge
to use device_unregister to be pair with device_register before.
At last we will not need to touch pci-sysfs bits.
Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
---
drivers/pci/probe.c | 22 ++++++++++++++++++++++
drivers/pci/remove.c | 32 ++++++++------------------------
2 files changed, 30 insertions(+), 24 deletions(-)
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1119,6 +1119,20 @@ static void pci_release_capabilities(str
pci_free_cap_save_buffers(dev);
}
+static void pci_free_resources(struct pci_dev *dev)
+{
+ int i;
+
+ msi_remove_pci_irq_vectors(dev);
+
+ pci_cleanup_rom(dev);
+ for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+ struct resource *res = dev->resource + i;
+ if (res->parent)
+ release_resource(res);
+ }
+}
+
/**
* pci_release_dev - free a pci device structure when all users of it are finished.
* @dev: device that's been disconnected
@@ -1131,6 +1145,13 @@ static void pci_release_dev(struct devic
struct pci_dev *pci_dev;
pci_dev = to_pci_dev(dev);
+
+ down_write(&pci_bus_sem);
+ list_del(&pci_dev->bus_list);
+ up_write(&pci_bus_sem);
+ pci_free_resources(pci_dev);
+ put_device(&pci_dev->bus->dev);
+
pci_release_capabilities(pci_dev);
pci_release_of_node(pci_dev);
kfree(pci_dev);
@@ -1340,6 +1361,7 @@ void pci_device_add(struct pci_dev *dev,
down_write(&pci_bus_sem);
list_add_tail(&dev->bus_list, &bus->devices);
up_write(&pci_bus_sem);
+ get_device(&bus->dev);
ret = pcibios_add_device(dev);
WARN_ON(ret < 0);
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -3,20 +3,6 @@
#include <linux/pci-aspm.h>
#include "pci.h"
-static void pci_free_resources(struct pci_dev *dev)
-{
- int i;
-
- msi_remove_pci_irq_vectors(dev);
-
- pci_cleanup_rom(dev);
- for (i = 0; i < PCI_NUM_RESOURCES; i++) {
- struct resource *res = dev->resource + i;
- if (res->parent)
- release_resource(res);
- }
-}
-
static void pci_stop_dev(struct pci_dev *dev)
{
pci_pme_active(dev, false);
@@ -24,8 +10,7 @@ static void pci_stop_dev(struct pci_dev
if (dev->is_added) {
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);
- device_del(&dev->dev);
- dev->is_added = 0;
+ device_release_driver(&dev->dev);
}
if (dev->bus->self)
@@ -34,12 +19,11 @@ static void pci_stop_dev(struct pci_dev
static void pci_destroy_dev(struct pci_dev *dev)
{
- down_write(&pci_bus_sem);
- list_del(&dev->bus_list);
- up_write(&pci_bus_sem);
-
- pci_free_resources(dev);
- put_device(&dev->dev);
+ if (dev->is_added) {
+ device_del(&dev->dev);
+ put_device(&dev->dev);
+ dev->is_added = 0;
+ }
}
void pci_remove_bus(struct pci_bus *bus)
@@ -126,7 +110,7 @@ void pci_stop_root_bus(struct pci_bus *b
pci_stop_bus_device(child);
/* stop the host bridge */
- device_del(&host_bridge->dev);
+ device_release_driver(&host_bridge->dev);
}
void pci_remove_root_bus(struct pci_bus *bus)
@@ -145,5 +129,5 @@ void pci_remove_root_bus(struct pci_bus
host_bridge->bus = NULL;
/* remove the host bridge */
- put_device(&host_bridge->dev);
+ device_unregister(&host_bridge->dev);
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] PCI: Fix racing for pci device removing via sysfs
2013-04-30 21:29 ` Yinghai Lu
@ 2013-05-08 23:43 ` Bjorn Helgaas
0 siblings, 0 replies; 15+ messages in thread
From: Bjorn Helgaas @ 2013-05-08 23:43 UTC (permalink / raw)
To: Yinghai Lu
Cc: Gu Zheng, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Greg Kroah-Hartman, Sarah Sharp
On Tue, Apr 30, 2013 at 02:29:35PM -0700, Yinghai Lu wrote:
> On Mon, Apr 29, 2013 at 3:17 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> > On Mon, Apr 29, 2013 at 11:15 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >>
> >> I think the 1a:01.0 pci_dev should retain its reference to the pci_bus
> >> for as long as the pci_dev exists, so the pci_bus_put() should go in
> >> pci_release_dev() instead.
> >
> > Good point.
> >
> > will rework pci remove sequence.
>
> Please check attached version that will not need to touch pci sysfs bits.
I use patchwork to keep track of things I need to look at, and I don't
think patchwork looks at attachments. Just FYI in case I seem to be
ignoring things; it might be that they just didn't appear on my patchwork
"to-do" list.
I completely agree that gmail makes it impossible to send patches in-line.
On the other hand, sending them as attachments is easy for you but makes it
difficult for others to review and reply to them. I'm using mutt to
comment on your patch, but eventually I'll get tired of doing the extra
work on my end :)
I tried to apply this on top of 96a3e8af5a (Linus' merge of the v3.10 PCI
changes), but it didn't apply cleanly. I assume you'll rebase it to
v3.10-rc1 when it comes out.
> Subject: [PATCH -v5] PCI: Fix racing for pci device removing via sysfs
> From: Yinghai Lu <yinghai@kernel.org>
> ...
> Index: linux-2.6/drivers/pci/probe.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/probe.c
> +++ linux-2.6/drivers/pci/probe.c
> @@ -1119,6 +1119,20 @@ static void pci_release_capabilities(str
> pci_free_cap_save_buffers(dev);
> }
>
> +static void pci_free_resources(struct pci_dev *dev)
> +{
> + int i;
> +
> + msi_remove_pci_irq_vectors(dev);
> +
> + pci_cleanup_rom(dev);
> + for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> + struct resource *res = dev->resource + i;
> + if (res->parent)
> + release_resource(res);
> + }
> +}
> +
> /**
> * pci_release_dev - free a pci device structure when all users of it are finished.
> * @dev: device that's been disconnected
> @@ -1131,6 +1145,13 @@ static void pci_release_dev(struct devic
> struct pci_dev *pci_dev;
>
> pci_dev = to_pci_dev(dev);
> +
> + down_write(&pci_bus_sem);
> + list_del(&pci_dev->bus_list);
> + up_write(&pci_bus_sem);
> + pci_free_resources(pci_dev);
> + put_device(&pci_dev->bus->dev);
Is there any reason to drop the pci_bus reference here, as opposed to
doing it after the "kfree(pci_dev)"? We call a couple more things below,
and it's possible that they will still reference pci_dev->bus.
> +
> pci_release_capabilities(pci_dev);
> pci_release_of_node(pci_dev);
> kfree(pci_dev);
> @@ -1340,6 +1361,7 @@ void pci_device_add(struct pci_dev *dev,
> down_write(&pci_bus_sem);
> list_add_tail(&dev->bus_list, &bus->devices);
> up_write(&pci_bus_sem);
> + get_device(&bus->dev);
>
> ret = pcibios_add_device(dev);
> WARN_ON(ret < 0);
> Index: linux-2.6/drivers/pci/remove.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/remove.c
> +++ linux-2.6/drivers/pci/remove.c
> @@ -3,20 +3,6 @@
> #include <linux/pci-aspm.h>
> #include "pci.h"
>
> -static void pci_free_resources(struct pci_dev *dev)
> -{
> - int i;
> -
> - msi_remove_pci_irq_vectors(dev);
> -
> - pci_cleanup_rom(dev);
> - for (i = 0; i < PCI_NUM_RESOURCES; i++) {
> - struct resource *res = dev->resource + i;
> - if (res->parent)
> - release_resource(res);
> - }
> -}
> -
> static void pci_stop_dev(struct pci_dev *dev)
> {
> pci_pme_active(dev, false);
> @@ -24,8 +10,7 @@ static void pci_stop_dev(struct pci_dev
> if (dev->is_added) {
> pci_proc_detach_device(dev);
> pci_remove_sysfs_dev_files(dev);
> - device_del(&dev->dev);
> - dev->is_added = 0;
> + device_release_driver(&dev->dev);
> }
>
> if (dev->bus->self)
> @@ -34,12 +19,11 @@ static void pci_stop_dev(struct pci_dev
>
> static void pci_destroy_dev(struct pci_dev *dev)
> {
> - down_write(&pci_bus_sem);
> - list_del(&dev->bus_list);
> - up_write(&pci_bus_sem);
> -
> - pci_free_resources(dev);
> - put_device(&dev->dev);
> + if (dev->is_added) {
If it's possible that "dev->is_added == 0" here, doesn't that mean
we leaked a struct pci_dev?
For example, if we're hot-adding a device, dev->is_added is zero
between points A and B here:
pciehp_configure_device
pci_scan_slot
pci_scan_single_device
pci_scan_device
dev = alloc_pci_dev # A) dev->is_added == 0 here
pci_device_add
device_initialize
device_add
pci_bus_add_devices
pci_bus_add_device
device_attach
dev->is_added = 1 # B) dev->is_added == 1 here
If we can get to pci_destroy_dev() for that device during the
interval between A and B, dev->is_added will be zero, and I don't
know where we will ever clean up the device.
If we *can't* get here during that interval, there shouldn't be any
need to test dev->is_added.
> + device_del(&dev->dev);
> + put_device(&dev->dev);
> + dev->is_added = 0;
> + }
> }
>
> void pci_remove_bus(struct pci_bus *bus)
> @@ -126,7 +110,7 @@ void pci_stop_root_bus(struct pci_bus *b
> pci_stop_bus_device(child);
>
> /* stop the host bridge */
> - device_del(&host_bridge->dev);
> + device_release_driver(&host_bridge->dev);
> }
>
> void pci_remove_root_bus(struct pci_bus *bus)
> @@ -145,5 +129,5 @@ void pci_remove_root_bus(struct pci_bus
> host_bridge->bus = NULL;
>
> /* remove the host bridge */
> - put_device(&host_bridge->dev);
> + device_unregister(&host_bridge->dev);
> }
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-05-08 23:43 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-26 1:47 [PATCH] PCI: Fix racing for pci device removing via sysfs Yinghai Lu
2013-04-26 16:28 ` Bjorn Helgaas
2013-04-26 20:20 ` Yinghai Lu
2013-04-26 20:53 ` Bjorn Helgaas
2013-04-26 21:01 ` Yinghai Lu
2013-04-29 10:04 ` Gu Zheng
2013-04-29 15:19 ` Yinghai Lu
2013-04-29 18:15 ` Bjorn Helgaas
2013-04-29 18:21 ` Greg Kroah-Hartman
2013-04-29 21:23 ` Sarah Sharp
2013-04-29 21:32 ` Greg Kroah-Hartman
2013-04-29 22:17 ` Yinghai Lu
2013-04-30 21:29 ` Yinghai Lu
2013-05-08 23:43 ` Bjorn Helgaas
2013-04-30 9:17 ` Gu Zheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).