* [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
@ 2013-04-18 8:53 Gu Zheng
2013-04-18 19:43 ` Yinghai Lu
0 siblings, 1 reply; 8+ messages in thread
From: Gu Zheng @ 2013-04-18 8:53 UTC (permalink / raw)
To: linux-pci
Cc: Bjorn Helgaas, Yasuaki Ishimatsu, Taku Izumi, Yinghai Lu,
Jiang Liu, tangchen, 'Lin Feng'
>From 0c45a7fca2276123d0b926a22ea69158dad8ab9c Mon Sep 17 00:00:00 2001
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
Date: Thu, 18 Apr 2013 17:43:53 +0900
Subject: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
This patch is used to fix the panic issue of parallel device removal on different pci hierarchy,
refer to https://bugzilla.kernel.org/show_bug.cgi?id=54411.
[ 418.775140] ioatdma i7core_edac edac_core sg e1000e igb dca ptp pps_core
sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas
scsi_mod
[ 418.946462] CPU 4
[ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
FUJITSU-SV PRIMEQUEST 1800E/SB
[ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
pci_bus_read_config_word+0x5e/0x90
[ 419.189965] RSP: 0018:ffff8807b0a37c08 EFLAGS: 00010046
[ 419.253409] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8807bb4a1290 RCX:
0000000000000002
[ 419.338658] RDX: 00000000000000c4 RSI: 0000000000000008 RDI:
ffff8807bb4a1290
[ 419.423925] RBP: ffff8807b0a37c48 R08: ffff8807b0a37c24 R09:
6db5c22da55960d0
[ 419.509175] R10: 0000000000000000 R11: 000000000003ecd0 R12:
ffff8807b0a37c66
[ 419.594425] R13: 0000000000000282 R14: ffffffff82126d40 R15:
0000000000000000
[ 419.679675] FS: 0000000000000000(0000) GS:ffff8807c2200000(0000)
knlGS:0000000000000000
[ 419.776343] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 419.844981] CR2: 00007ffa898a54f8 CR3: 0000000001c0c000 CR4:
00000000000007e0
[ 419.930236] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 420.015484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 420.100736] Process kworker/u:2 (pid: 512, threadinfo ffff8807b0a36000, task
ffff8807b30bcd00)
[ 420.203632] Stack:
[ 420.227623] ffff8807000000c4 ffffffff00000008 ffffffff813851ef
0000000000992000
[ 420.316421] ffff8807b0a37c98 ffff8807bb49b3d8 0000000000000000
0000000000000000
[ 420.405233] ffff8807b0a37c88 ffffffff8138044b ffff8807b0a37c88
0000000000000246
[ 420.494137] Call Trace:
[ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
[ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
[ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
[ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
[ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
[ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
[ 420.958017] [<ffffffff810919ae>] process_one_work+0x20e/0x5c0
[ 421.027691] [<ffffffff8109193f>] ? process_one_work+0x19f/0x5c0
[ 421.099441] [<ffffffff81257a30>] ? sysfs_schedule_callback+0x210/0x210
[ 421.178461] [<ffffffff81093a4e>] worker_thread+0x12e/0x370
[ 421.245020] [<ffffffff81093920>] ? manage_workers+0x180/0x180
[ 421.314697] [<ffffffff81099b8e>] kthread+0xee/0x100
[ 421.373992] [<ffffffff810e0f09>] ? __lock_release+0x129/0x190
[ 421.443671] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
[ 421.518544] [<ffffffff816b2dac>] ret_from_fork+0x7c/0xb0
[ 421.583031] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
[ 421.657894] Code: 89 75 c8 c7 45 dc 00 00 00 00 e8 4e ef 32 00 49 89 c5 48
8b 83 b8 00 00 00 4c 8d 45 dc b9 02 00 00 00 8b 55 c0 8b 75 c8 48 89 df <ff> 10
8b 55 dc 4c 89 ee 48 c7 c7 c0 67 cb 81 89 45 c8 66 41 89
[ 421.890306] RIP [<ffffffff8137972e>] pci_bus_read_config_word+0x5e/0x90
[ 421.970475] RSP <ffff8807b0a37c08>
[ 422.012121] ---[ end trace 403f76cf31f1bcb1 ]---
It is easy to reproduce with the following script:
echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
/sys/bus/pci/devices/0000\:1a\:01.0/remove
The 1a:01.0 device is downstream from the 10:00.0 bridge.
The sysfs interface remove_store() uses device_schedule_callback() to schedule
the remove for later. What's happening is that we schedule
remove_callback() for both devices before 10:00.0 has been removed,
like this:
# echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
remove_store # for 10:00.0
device_schedule_callback(10:00.0, remove_callback)
sysfs_schedule_callback
kobject_get
queue_work
# echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
remove_store # for 1a:01.0
device_schedule_callback(1a:01.0, remove_callback)
sysfs_schedule_callback
kobject_get
queue_work
Later, we run the callbacks, starting with 10:00.0. This calls
remove_callback() to perform the remove:
remove_callback(10:00.0)
mutex_lock(&pci_remove_rescan_mutex)
pci_stop_and_remove_bus_device(pdev)
mutex_unlock(&pci_remove_rescan_mutex)
This will stop and remove the subtree below 10:00.0, but it does not
actually free the pci_dev for 1a:01.0 because we increased its ref
count in sysfs_schedule_callback. So after completing
remove_callback(10:00.0), we run the second callback for 1a:01.0.
But the PCI core did this removal wrong. It deallocated the struct pci_bus
for bus 0000:1a too soon.
So we take a reference on the bus object when capturing the struct pci_bus pointer,
in order to keep it valid before its downstream devices' removal routines complete.
Gu Zheng (3):
PCI: take a reference on the bus object when we capture the struct
pci_bus pointer
PCI: rename alloc_pci_dev() to pci_alloc_dev()
PCI: Move the acquiring the reference of pci bus inside
pci_alloc_bus()
arch/powerpc/kernel/pci_of_scan.c | 3 +--
drivers/char/agp/alpha-agp.c | 2 +-
drivers/char/agp/parisc-agp.c | 2 +-
drivers/pci/bus.c | 14 ++++++++++++++
drivers/pci/iov.c | 6 ++++--
drivers/pci/probe.c | 10 ++++++----
drivers/scsi/megaraid.c | 2 +-
include/linux/pci.h | 5 ++++-
8 files changed, 32 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-18 8:53 [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy Gu Zheng
@ 2013-04-18 19:43 ` Yinghai Lu
2013-04-22 3:02 ` Gu Zheng
2013-04-24 9:36 ` Gu Zheng
0 siblings, 2 replies; 8+ messages in thread
From: Yinghai Lu @ 2013-04-18 19:43 UTC (permalink / raw)
To: Gu Zheng
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
[-- Attachment #1: Type: text/plain, Size: 6466 bytes --]
On Thu, Apr 18, 2013 at 1:53 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> From 0c45a7fca2276123d0b926a22ea69158dad8ab9c Mon Sep 17 00:00:00 2001
> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Date: Thu, 18 Apr 2013 17:43:53 +0900
> Subject: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
>
> This patch is used to fix the panic issue of parallel device removal on different pci hierarchy,
> refer to https://bugzilla.kernel.org/show_bug.cgi?id=54411.
>
> [ 418.775140] ioatdma i7core_edac edac_core sg e1000e igb dca ptp pps_core
> sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod
> [ 418.946462] CPU 4
> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
> FUJITSU-SV PRIMEQUEST 1800E/SB
> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
> pci_bus_read_config_word+0x5e/0x90
> [ 419.189965] RSP: 0018:ffff8807b0a37c08 EFLAGS: 00010046
> [ 419.253409] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8807bb4a1290 RCX:
> 0000000000000002
> [ 419.338658] RDX: 00000000000000c4 RSI: 0000000000000008 RDI:
> ffff8807bb4a1290
> [ 419.423925] RBP: ffff8807b0a37c48 R08: ffff8807b0a37c24 R09:
> 6db5c22da55960d0
> [ 419.509175] R10: 0000000000000000 R11: 000000000003ecd0 R12:
> ffff8807b0a37c66
> [ 419.594425] R13: 0000000000000282 R14: ffffffff82126d40 R15:
> 0000000000000000
> [ 419.679675] FS: 0000000000000000(0000) GS:ffff8807c2200000(0000)
> knlGS:0000000000000000
> [ 419.776343] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 419.844981] CR2: 00007ffa898a54f8 CR3: 0000000001c0c000 CR4:
> 00000000000007e0
> [ 419.930236] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 420.015484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [ 420.100736] Process kworker/u:2 (pid: 512, threadinfo ffff8807b0a36000, task
> ffff8807b30bcd00)
> [ 420.203632] Stack:
> [ 420.227623] ffff8807000000c4 ffffffff00000008 ffffffff813851ef
> 0000000000992000
> [ 420.316421] ffff8807b0a37c98 ffff8807bb49b3d8 0000000000000000
> 0000000000000000
> [ 420.405233] ffff8807b0a37c88 ffffffff8138044b ffff8807b0a37c88
> 0000000000000246
> [ 420.494137] Call Trace:
> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
> [ 420.958017] [<ffffffff810919ae>] process_one_work+0x20e/0x5c0
> [ 421.027691] [<ffffffff8109193f>] ? process_one_work+0x19f/0x5c0
> [ 421.099441] [<ffffffff81257a30>] ? sysfs_schedule_callback+0x210/0x210
> [ 421.178461] [<ffffffff81093a4e>] worker_thread+0x12e/0x370
> [ 421.245020] [<ffffffff81093920>] ? manage_workers+0x180/0x180
> [ 421.314697] [<ffffffff81099b8e>] kthread+0xee/0x100
> [ 421.373992] [<ffffffff810e0f09>] ? __lock_release+0x129/0x190
> [ 421.443671] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
> [ 421.518544] [<ffffffff816b2dac>] ret_from_fork+0x7c/0xb0
> [ 421.583031] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
> [ 421.657894] Code: 89 75 c8 c7 45 dc 00 00 00 00 e8 4e ef 32 00 49 89 c5 48
> 8b 83 b8 00 00 00 4c 8d 45 dc b9 02 00 00 00 8b 55 c0 8b 75 c8 48 89 df <ff> 10
> 8b 55 dc 4c 89 ee 48 c7 c7 c0 67 cb 81 89 45 c8 66 41 89
> [ 421.890306] RIP [<ffffffff8137972e>] pci_bus_read_config_word+0x5e/0x90
> [ 421.970475] RSP <ffff8807b0a37c08>
> [ 422.012121] ---[ end trace 403f76cf31f1bcb1 ]---
>
> It is easy to reproduce with the following script:
> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>
> The 1a:01.0 device is downstream from the 10:00.0 bridge.
>
> The sysfs interface remove_store() uses device_schedule_callback() to schedule
> the remove for later. What's happening is that we schedule
> remove_callback() for both devices before 10:00.0 has been removed,
> like this:
>
> # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
> remove_store # for 10:00.0
> device_schedule_callback(10:00.0, remove_callback)
> sysfs_schedule_callback
> kobject_get
> queue_work
> # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
> remove_store # for 1a:01.0
> device_schedule_callback(1a:01.0, remove_callback)
> sysfs_schedule_callback
> kobject_get
> queue_work
>
> Later, we run the callbacks, starting with 10:00.0. This calls
> remove_callback() to perform the remove:
>
> remove_callback(10:00.0)
> mutex_lock(&pci_remove_rescan_mutex)
> pci_stop_and_remove_bus_device(pdev)
> mutex_unlock(&pci_remove_rescan_mutex)
>
> This will stop and remove the subtree below 10:00.0, but it does not
> actually free the pci_dev for 1a:01.0 because we increased its ref
> count in sysfs_schedule_callback. So after completing
> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>
> But the PCI core did this removal wrong. It deallocated the struct pci_bus
> for bus 0000:1a too soon.
>
> So we take a reference on the bus object when capturing the struct pci_bus pointer,
> in order to keep it valid before its downstream devices' removal routines complete.
>
> Gu Zheng (3):
> PCI: take a reference on the bus object when we capture the struct
> pci_bus pointer
> PCI: rename alloc_pci_dev() to pci_alloc_dev()
> PCI: Move the acquiring the reference of pci bus inside
> pci_alloc_bus()
>
> arch/powerpc/kernel/pci_of_scan.c | 3 +--
> drivers/char/agp/alpha-agp.c | 2 +-
> drivers/char/agp/parisc-agp.c | 2 +-
> drivers/pci/bus.c | 14 ++++++++++++++
> drivers/pci/iov.c | 6 ++++--
> drivers/pci/probe.c | 10 ++++++----
> drivers/scsi/megaraid.c | 2 +-
> include/linux/pci.h | 5 ++++-
> 8 files changed, 32 insertions(+), 12 deletions(-)
No. that is wrong.
I can not find where that reference count get reduced!
in the my test patches that I sent, it does have have that in remove path.
What is reason for you to remove that line?
After put the line back, your test still can pass?
Yinghai
[-- Attachment #2: pci_dev_bus_ref.patch --]
[-- Type: application/octet-stream, Size: 903 bytes --]
---
drivers/pci/probe.c | 1 +
drivers/pci/remove.c | 1 +
2 files changed, 2 insertions(+)
Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1337,6 +1337,7 @@ void pci_device_add(struct pci_dev *dev,
*/
down_write(&pci_bus_sem);
list_add_tail(&dev->bus_list, &bus->devices);
+ get_device(&bus->dev);
up_write(&pci_bus_sem);
pci_fixup_device(pci_fixup_final, dev);
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -36,6 +36,7 @@ static void pci_destroy_dev(struct pci_d
{
down_write(&pci_bus_sem);
list_del(&dev->bus_list);
+ put_device(&dev->bus->dev);
up_write(&pci_bus_sem);
pci_free_resources(dev);
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-18 19:43 ` Yinghai Lu
@ 2013-04-22 3:02 ` Gu Zheng
2013-04-24 9:36 ` Gu Zheng
1 sibling, 0 replies; 8+ messages in thread
From: Gu Zheng @ 2013-04-22 3:02 UTC (permalink / raw)
To: Yinghai Lu
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
On 04/19/2013 03:43 AM, Yinghai Lu wrote:
> On Thu, Apr 18, 2013 at 1:53 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> From 0c45a7fca2276123d0b926a22ea69158dad8ab9c Mon Sep 17 00:00:00 2001
>> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Date: Thu, 18 Apr 2013 17:43:53 +0900
>> Subject: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
>>
>> This patch is used to fix the panic issue of parallel device removal on different pci hierarchy,
>> refer to https://bugzilla.kernel.org/show_bug.cgi?id=54411.
>>
>> [ 418.775140] ioatdma i7core_edac edac_core sg e1000e igb dca ptp pps_core
>> sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas
>> scsi_mod
>> [ 418.946462] CPU 4
>> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
>> FUJITSU-SV PRIMEQUEST 1800E/SB
>> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
>> pci_bus_read_config_word+0x5e/0x90
>> [ 419.189965] RSP: 0018:ffff8807b0a37c08 EFLAGS: 00010046
>> [ 419.253409] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8807bb4a1290 RCX:
>> 0000000000000002
>> [ 419.338658] RDX: 00000000000000c4 RSI: 0000000000000008 RDI:
>> ffff8807bb4a1290
>> [ 419.423925] RBP: ffff8807b0a37c48 R08: ffff8807b0a37c24 R09:
>> 6db5c22da55960d0
>> [ 419.509175] R10: 0000000000000000 R11: 000000000003ecd0 R12:
>> ffff8807b0a37c66
>> [ 419.594425] R13: 0000000000000282 R14: ffffffff82126d40 R15:
>> 0000000000000000
>> [ 419.679675] FS: 0000000000000000(0000) GS:ffff8807c2200000(0000)
>> knlGS:0000000000000000
>> [ 419.776343] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 419.844981] CR2: 00007ffa898a54f8 CR3: 0000000001c0c000 CR4:
>> 00000000000007e0
>> [ 419.930236] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [ 420.015484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [ 420.100736] Process kworker/u:2 (pid: 512, threadinfo ffff8807b0a36000, task
>> ffff8807b30bcd00)
>> [ 420.203632] Stack:
>> [ 420.227623] ffff8807000000c4 ffffffff00000008 ffffffff813851ef
>> 0000000000992000
>> [ 420.316421] ffff8807b0a37c98 ffff8807bb49b3d8 0000000000000000
>> 0000000000000000
>> [ 420.405233] ffff8807b0a37c88 ffffffff8138044b ffff8807b0a37c88
>> 0000000000000246
>> [ 420.494137] Call Trace:
>> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
>> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
>> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
>> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
>> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
>> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>> [ 420.958017] [<ffffffff810919ae>] process_one_work+0x20e/0x5c0
>> [ 421.027691] [<ffffffff8109193f>] ? process_one_work+0x19f/0x5c0
>> [ 421.099441] [<ffffffff81257a30>] ? sysfs_schedule_callback+0x210/0x210
>> [ 421.178461] [<ffffffff81093a4e>] worker_thread+0x12e/0x370
>> [ 421.245020] [<ffffffff81093920>] ? manage_workers+0x180/0x180
>> [ 421.314697] [<ffffffff81099b8e>] kthread+0xee/0x100
>> [ 421.373992] [<ffffffff810e0f09>] ? __lock_release+0x129/0x190
>> [ 421.443671] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [ 421.518544] [<ffffffff816b2dac>] ret_from_fork+0x7c/0xb0
>> [ 421.583031] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [ 421.657894] Code: 89 75 c8 c7 45 dc 00 00 00 00 e8 4e ef 32 00 49 89 c5 48
>> 8b 83 b8 00 00 00 4c 8d 45 dc b9 02 00 00 00 8b 55 c0 8b 75 c8 48 89 df <ff> 10
>> 8b 55 dc 4c 89 ee 48 c7 c7 c0 67 cb 81 89 45 c8 66 41 89
>> [ 421.890306] RIP [<ffffffff8137972e>] pci_bus_read_config_word+0x5e/0x90
>> [ 421.970475] RSP <ffff8807b0a37c08>
>> [ 422.012121] ---[ end trace 403f76cf31f1bcb1 ]---
>>
>> It is easy to reproduce with the following script:
>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
>> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>
>> The 1a:01.0 device is downstream from the 10:00.0 bridge.
>>
>> The sysfs interface remove_store() uses device_schedule_callback() to schedule
>> the remove for later. What's happening is that we schedule
>> remove_callback() for both devices before 10:00.0 has been removed,
>> like this:
>>
>> # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>> remove_store # for 10:00.0
>> device_schedule_callback(10:00.0, remove_callback)
>> sysfs_schedule_callback
>> kobject_get
>> queue_work
>> # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
>> remove_store # for 1a:01.0
>> device_schedule_callback(1a:01.0, remove_callback)
>> sysfs_schedule_callback
>> kobject_get
>> queue_work
>>
>> Later, we run the callbacks, starting with 10:00.0. This calls
>> remove_callback() to perform the remove:
>>
>> remove_callback(10:00.0)
>> mutex_lock(&pci_remove_rescan_mutex)
>> pci_stop_and_remove_bus_device(pdev)
>> mutex_unlock(&pci_remove_rescan_mutex)
>>
>> This will stop and remove the subtree below 10:00.0, but it does not
>> actually free the pci_dev for 1a:01.0 because we increased its ref
>> count in sysfs_schedule_callback. So after completing
>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>
>> But the PCI core did this removal wrong. It deallocated the struct pci_bus
>> for bus 0000:1a too soon.
>>
>> So we take a reference on the bus object when capturing the struct pci_bus pointer,
>> in order to keep it valid before its downstream devices' removal routines complete.
>>
>> Gu Zheng (3):
>> PCI: take a reference on the bus object when we capture the struct
>> pci_bus pointer
>> PCI: rename alloc_pci_dev() to pci_alloc_dev()
>> PCI: Move the acquiring the reference of pci bus inside
>> pci_alloc_bus()
>>
>> arch/powerpc/kernel/pci_of_scan.c | 3 +--
>> drivers/char/agp/alpha-agp.c | 2 +-
>> drivers/char/agp/parisc-agp.c | 2 +-
>> drivers/pci/bus.c | 14 ++++++++++++++
>> drivers/pci/iov.c | 6 ++++--
>> drivers/pci/probe.c | 10 ++++++----
>> drivers/scsi/megaraid.c | 2 +-
>> include/linux/pci.h | 5 ++++-
>> 8 files changed, 32 insertions(+), 12 deletions(-)
>
> No. that is wrong.
>
> I can not find where that reference count get reduced!
> in the my test patches that I sent, it does have have that in remove path.
Hi Yinghai,
Sorry for the later reply and the missing reading of your mail thread!
>
> What is reason for you to remove that line?
It's just a mistake here.
>
> After put the line back, your test still can pass?
Our test machine is not ready now, I'll test it soon. Maybe the result is the same as your patch,
there are the same in the logic.
In the original test of your patch, one doubt puzzles me, although we have increased the pci_bus' reference,
but the pci_bus is still released too soon before its downstream devices' removal routines complete.
Can you explain why?
Is not there a reference detection mechanism in pci core when releasing the pci object?
Thanks,
Gu
>
> Yinghai
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-18 19:43 ` Yinghai Lu
2013-04-22 3:02 ` Gu Zheng
@ 2013-04-24 9:36 ` Gu Zheng
2013-04-24 17:11 ` Yinghai Lu
1 sibling, 1 reply; 8+ messages in thread
From: Gu Zheng @ 2013-04-24 9:36 UTC (permalink / raw)
To: Yinghai Lu
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
On 04/19/2013 03:43 AM, Yinghai Lu wrote:
> On Thu, Apr 18, 2013 at 1:53 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> From 0c45a7fca2276123d0b926a22ea69158dad8ab9c Mon Sep 17 00:00:00 2001
>> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
>> Date: Thu, 18 Apr 2013 17:43:53 +0900
>> Subject: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
>>
>> This patch is used to fix the panic issue of parallel device removal on different pci hierarchy,
>> refer to https://bugzilla.kernel.org/show_bug.cgi?id=54411.
>>
>> [ 418.775140] ioatdma i7core_edac edac_core sg e1000e igb dca ptp pps_core
>> sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas
>> scsi_mod
>> [ 418.946462] CPU 4
>> [ 418.968377] Pid: 512, comm: kworker/u:2 Tainted: G W 3.8.0 #2
>> FUJITSU-SV PRIMEQUEST 1800E/SB
>> [ 419.081763] RIP: 0010:[<ffffffff8137972e>] [<ffffffff8137972e>]
>> pci_bus_read_config_word+0x5e/0x90
>> [ 419.189965] RSP: 0018:ffff8807b0a37c08 EFLAGS: 00010046
>> [ 419.253409] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8807bb4a1290 RCX:
>> 0000000000000002
>> [ 419.338658] RDX: 00000000000000c4 RSI: 0000000000000008 RDI:
>> ffff8807bb4a1290
>> [ 419.423925] RBP: ffff8807b0a37c48 R08: ffff8807b0a37c24 R09:
>> 6db5c22da55960d0
>> [ 419.509175] R10: 0000000000000000 R11: 000000000003ecd0 R12:
>> ffff8807b0a37c66
>> [ 419.594425] R13: 0000000000000282 R14: ffffffff82126d40 R15:
>> 0000000000000000
>> [ 419.679675] FS: 0000000000000000(0000) GS:ffff8807c2200000(0000)
>> knlGS:0000000000000000
>> [ 419.776343] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 419.844981] CR2: 00007ffa898a54f8 CR3: 0000000001c0c000 CR4:
>> 00000000000007e0
>> [ 419.930236] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [ 420.015484] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [ 420.100736] Process kworker/u:2 (pid: 512, threadinfo ffff8807b0a36000, task
>> ffff8807b30bcd00)
>> [ 420.203632] Stack:
>> [ 420.227623] ffff8807000000c4 ffffffff00000008 ffffffff813851ef
>> 0000000000992000
>> [ 420.316421] ffff8807b0a37c98 ffff8807bb49b3d8 0000000000000000
>> 0000000000000000
>> [ 420.405233] ffff8807b0a37c88 ffffffff8138044b ffff8807b0a37c88
>> 0000000000000246
>> [ 420.494137] Call Trace:
>> [ 420.523326] [<ffffffff813851ef>] ? remove_callback+0x1f/0x40
>> [ 420.591984] [<ffffffff8138044b>] pci_pme_active+0x4b/0x1c0
>> [ 420.658545] [<ffffffff8137d8e7>] pci_stop_bus_device+0x57/0xb0
>> [ 420.729259] [<ffffffff8137dab6>] pci_stop_and_remove_bus_device+0x16/0x30
>> [ 420.811392] [<ffffffff813851fb>] remove_callback+0x2b/0x40
>> [ 420.877955] [<ffffffff81257a56>] sysfs_schedule_callback_work+0x26/0x70
>> [ 420.958017] [<ffffffff810919ae>] process_one_work+0x20e/0x5c0
>> [ 421.027691] [<ffffffff8109193f>] ? process_one_work+0x19f/0x5c0
>> [ 421.099441] [<ffffffff81257a30>] ? sysfs_schedule_callback+0x210/0x210
>> [ 421.178461] [<ffffffff81093a4e>] worker_thread+0x12e/0x370
>> [ 421.245020] [<ffffffff81093920>] ? manage_workers+0x180/0x180
>> [ 421.314697] [<ffffffff81099b8e>] kthread+0xee/0x100
>> [ 421.373992] [<ffffffff810e0f09>] ? __lock_release+0x129/0x190
>> [ 421.443671] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [ 421.518544] [<ffffffff816b2dac>] ret_from_fork+0x7c/0xb0
>> [ 421.583031] [<ffffffff81099aa0>] ? __init_kthread_worker+0x70/0x70
>> [ 421.657894] Code: 89 75 c8 c7 45 dc 00 00 00 00 e8 4e ef 32 00 49 89 c5 48
>> 8b 83 b8 00 00 00 4c 8d 45 dc b9 02 00 00 00 8b 55 c0 8b 75 c8 48 89 df <ff> 10
>> 8b 55 dc 4c 89 ee 48 c7 c7 c0 67 cb 81 89 45 c8 66 41 89
>> [ 421.890306] RIP [<ffffffff8137972e>] pci_bus_read_config_word+0x5e/0x90
>> [ 421.970475] RSP <ffff8807b0a37c08>
>> [ 422.012121] ---[ end trace 403f76cf31f1bcb1 ]---
>>
>> It is easy to reproduce with the following script:
>> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
>> /sys/bus/pci/devices/0000\:1a\:01.0/remove
>>
>> The 1a:01.0 device is downstream from the 10:00.0 bridge.
>>
>> The sysfs interface remove_store() uses device_schedule_callback() to schedule
>> the remove for later. What's happening is that we schedule
>> remove_callback() for both devices before 10:00.0 has been removed,
>> like this:
>>
>> # echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove
>> remove_store # for 10:00.0
>> device_schedule_callback(10:00.0, remove_callback)
>> sysfs_schedule_callback
>> kobject_get
>> queue_work
>> # echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
>> remove_store # for 1a:01.0
>> device_schedule_callback(1a:01.0, remove_callback)
>> sysfs_schedule_callback
>> kobject_get
>> queue_work
>>
>> Later, we run the callbacks, starting with 10:00.0. This calls
>> remove_callback() to perform the remove:
>>
>> remove_callback(10:00.0)
>> mutex_lock(&pci_remove_rescan_mutex)
>> pci_stop_and_remove_bus_device(pdev)
>> mutex_unlock(&pci_remove_rescan_mutex)
>>
>> This will stop and remove the subtree below 10:00.0, but it does not
>> actually free the pci_dev for 1a:01.0 because we increased its ref
>> count in sysfs_schedule_callback. So after completing
>> remove_callback(10:00.0), we run the second callback for 1a:01.0.
>>
>> But the PCI core did this removal wrong. It deallocated the struct pci_bus
>> for bus 0000:1a too soon.
>>
>> So we take a reference on the bus object when capturing the struct pci_bus pointer,
>> in order to keep it valid before its downstream devices' removal routines complete.
>>
>> Gu Zheng (3):
>> PCI: take a reference on the bus object when we capture the struct
>> pci_bus pointer
>> PCI: rename alloc_pci_dev() to pci_alloc_dev()
>> PCI: Move the acquiring the reference of pci bus inside
>> pci_alloc_bus()
>>
>> arch/powerpc/kernel/pci_of_scan.c | 3 +--
>> drivers/char/agp/alpha-agp.c | 2 +-
>> drivers/char/agp/parisc-agp.c | 2 +-
>> drivers/pci/bus.c | 14 ++++++++++++++
>> drivers/pci/iov.c | 6 ++++--
>> drivers/pci/probe.c | 10 ++++++----
>> drivers/scsi/megaraid.c | 2 +-
>> include/linux/pci.h | 5 ++++-
>> 8 files changed, 32 insertions(+), 12 deletions(-)
>
> No. that is wrong.
>
> I can not find where that reference count get reduced!
> in the my test patches that I sent, it does have have that in remove path.
>
> What is reason for you to remove that line?
>
> After put the line back, your test still can pass?
Hi Yinghai,
The remove test script still can pass if we reduce the reference of pci_bus when destroying
the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
------------[ cut here ]------------
WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Hardware name: PRIMEQUEST 1800E
list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
Pid: 6, comm: kworker/u:0 Tainted: GF W 3.9.0-rc7-pci-remove-test+ #59
Call Trace:
[<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81280b13>] __list_del_entry+0x63/0xd0
[<ffffffff81280b91>] list_del+0x11/0x40
[<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
[<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
[<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
[<ffffffff8129fc89>] remove_callback+0x29/0x40
[<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
[<ffffffff81073d85>] process_one_work+0x185/0x3f0
[<ffffffff810763e9>] worker_thread+0x119/0x380
[<ffffffff810762d0>] ? manage_workers+0x180/0x180
[<ffffffff8107b6ae>] kthread+0xce/0xe0
[<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
[<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace 9c05e382f933a515 ]---
Thanks,
Gu
>
> Yinghai
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-24 9:36 ` Gu Zheng
@ 2013-04-24 17:11 ` Yinghai Lu
2013-04-24 23:48 ` Yinghai Lu
0 siblings, 1 reply; 8+ messages in thread
From: Yinghai Lu @ 2013-04-24 17:11 UTC (permalink / raw)
To: Gu Zheng
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
On Wed, Apr 24, 2013 at 2:36 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> On 04/19/2013 03:43 AM, Yinghai Lu wrote:
[...]
>> I can not find where that reference count get reduced!
>> in the my test patches that I sent, it does have have that in remove path.
>>
>> What is reason for you to remove that line?
>>
>> After put the line back, your test still can pass?
>
> Hi Yinghai,
> The remove test script still can pass if we reduce the reference of pci_bus when destroying
> the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
That is not clear.
your patchset does not have list_del corruption warning?
> ------------[ cut here ]------------
> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> Hardware name: PRIMEQUEST 1800E
> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
> Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
> Pid: 6, comm: kworker/u:0 Tainted: GF W 3.9.0-rc7-pci-remove-test+ #59
> Call Trace:
> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
> [<ffffffff81280b91>] list_del+0x11/0x40
> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
> [<ffffffff8129fc89>] remove_callback+0x29/0x40
> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
> [<ffffffff81073d85>] process_one_work+0x185/0x3f0
> [<ffffffff810763e9>] worker_thread+0x119/0x380
> [<ffffffff810762d0>] ? manage_workers+0x180/0x180
> [<ffffffff8107b6ae>] kthread+0xce/0xe0
> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
> [<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
> ---[ end trace 9c05e382f933a515 ]---
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-24 17:11 ` Yinghai Lu
@ 2013-04-24 23:48 ` Yinghai Lu
2013-04-25 1:10 ` Gu Zheng
2013-04-25 8:52 ` Gu Zheng
0 siblings, 2 replies; 8+ messages in thread
From: Yinghai Lu @ 2013-04-24 23:48 UTC (permalink / raw)
To: Gu Zheng
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
[-- Attachment #1: Type: text/plain, Size: 2939 bytes --]
On Wed, Apr 24, 2013 at 10:11 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Wed, Apr 24, 2013 at 2:36 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>> On 04/19/2013 03:43 AM, Yinghai Lu wrote:
> [...]
>>> I can not find where that reference count get reduced!
>>> in the my test patches that I sent, it does have have that in remove path.
>>>
>>> What is reason for you to remove that line?
>>>
>>> After put the line back, your test still can pass?
>>
>> Hi Yinghai,
>> The remove test script still can pass if we reduce the reference of pci_bus when destroying
>> the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
>
> That is not clear.
>
> your patchset does not have list_del corruption warning?
>
>> ------------[ cut here ]------------
>> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
>> Hardware name: PRIMEQUEST 1800E
>> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
>> Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
>> Pid: 6, comm: kworker/u:0 Tainted: GF W 3.9.0-rc7-pci-remove-test+ #59
>> Call Trace:
>> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
>> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
>> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
>> [<ffffffff81280b91>] list_del+0x11/0x40
>> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
>> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
>> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
>> [<ffffffff8129fc89>] remove_callback+0x29/0x40
>> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>> [<ffffffff81073d85>] process_one_work+0x185/0x3f0
>> [<ffffffff810763e9>] worker_thread+0x119/0x380
>> [<ffffffff810762d0>] ? manage_workers+0x180/0x180
>> [<ffffffff8107b6ae>] kthread+0xce/0xe0
>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>> [<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>> ---[ end trace 9c05e382f933a515 ]---
Attached patch on top of current linus tree should fix the racing problem.
We don't need to bother to add get bus ref. etc.
Let us know your test result.
Yinghai
[-- Attachment #2: fix_racing_removing.patch --]
[-- Type: application/octet-stream, Size: 806 bytes --]
---
drivers/pci/pci-sysfs.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
Index: linux-2.6/drivers/pci/pci-sysfs.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-sysfs.c
+++ linux-2.6/drivers/pci/pci-sysfs.c
@@ -310,9 +310,16 @@ dev_rescan_store(struct device *dev, str
static void remove_callback(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
+ int domain = pci_domain_nr(pdev->bus);
+ u8 bus = pdev->bus->number;
+ u8 devfn = pdev->devfn;
mutex_lock(&pci_remove_rescan_mutex);
- pci_stop_and_remove_bus_device(pdev);
+ pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+ if (pdev) {
+ pci_dev_put(pdev);
+ pci_stop_and_remove_bus_device(pdev);
+ }
mutex_unlock(&pci_remove_rescan_mutex);
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-24 23:48 ` Yinghai Lu
@ 2013-04-25 1:10 ` Gu Zheng
2013-04-25 8:52 ` Gu Zheng
1 sibling, 0 replies; 8+ messages in thread
From: Gu Zheng @ 2013-04-25 1:10 UTC (permalink / raw)
To: Yinghai Lu
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
On 04/25/2013 07:48 AM, Yinghai Lu wrote:
> On Wed, Apr 24, 2013 at 10:11 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, Apr 24, 2013 at 2:36 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>> On 04/19/2013 03:43 AM, Yinghai Lu wrote:
>> [...]
>>>> I can not find where that reference count get reduced!
>>>> in the my test patches that I sent, it does have have that in remove path.
>>>>
>>>> What is reason for you to remove that line?
>>>>
>>>> After put the line back, your test still can pass?
>>>
>>> Hi Yinghai,
>>> The remove test script still can pass if we reduce the reference of pci_bus when destroying
>>> the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
>>
>> That is not clear.
>>
>> your patchset does not have list_del corruption warning?
Hi Yinghai,
Sorry for my unclear description, my patchset has the same warning.
>>
>>> ------------[ cut here ]------------
>>> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
>>> Hardware name: PRIMEQUEST 1800E
>>> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
>>> Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
>>> Pid: 6, comm: kworker/u:0 Tainted: GF W 3.9.0-rc7-pci-remove-test+ #59
>>> Call Trace:
>>> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
>>> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
>>> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
>>> [<ffffffff81280b91>] list_del+0x11/0x40
>>> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
>>> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
>>> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
>>> [<ffffffff8129fc89>] remove_callback+0x29/0x40
>>> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>>> [<ffffffff81073d85>] process_one_work+0x185/0x3f0
>>> [<ffffffff810763e9>] worker_thread+0x119/0x380
>>> [<ffffffff810762d0>] ? manage_workers+0x180/0x180
>>> [<ffffffff8107b6ae>] kthread+0xce/0xe0
>>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>>> [<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
>>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>>> ---[ end trace 9c05e382f933a515 ]---
>
> Attached patch on top of current linus tree should fix the racing problem.
>
> We don't need to bother to add get bus ref. etc.
>
> Let us know your test result.
Thanks, I'll do the test soon, and send out the report later!
Best regards,
Gu
>
> Yinghai
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy
2013-04-24 23:48 ` Yinghai Lu
2013-04-25 1:10 ` Gu Zheng
@ 2013-04-25 8:52 ` Gu Zheng
1 sibling, 0 replies; 8+ messages in thread
From: Gu Zheng @ 2013-04-25 8:52 UTC (permalink / raw)
To: Yinghai Lu
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas, Yasuaki Ishimatsu,
Taku Izumi, Jiang Liu, tangchen, Lin Feng
On 04/25/2013 07:48 AM, Yinghai Lu wrote:
> On Wed, Apr 24, 2013 at 10:11 AM, Yinghai Lu <yinghai@kernel.org> wrote:
>> On Wed, Apr 24, 2013 at 2:36 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
>>> On 04/19/2013 03:43 AM, Yinghai Lu wrote:
>> [...]
>>>> I can not find where that reference count get reduced!
>>>> in the my test patches that I sent, it does have have that in remove path.
>>>>
>>>> What is reason for you to remove that line?
>>>>
>>>> After put the line back, your test still can pass?
>>>
>>> Hi Yinghai,
>>> The remove test script still can pass if we reduce the reference of pci_bus when destroying
>>> the pci_dev, so does your patch, but a list_del corruption warning occurs in the bottom half routine:
>>
>> That is not clear.
>>
>> your patchset does not have list_del corruption warning?
>>
>>> ------------[ cut here ]------------
>>> WARNING: at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
>>> Hardware name: PRIMEQUEST 1800E
>>> list_del corruption, ffff8807d1b6c000->next is LIST_POISON1 (dead000000100100)
>>> Modules linked in: shpchp ebtable_nat ebtables xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc cpufreq_ondemand ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vfat fat dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel microcode pcspkr sg i2c_i801 lpc_ich mfd_core i7core_edac edac_core ioatdma e1000e igb dca i2c_algo_bit i2c_core ptp pps_core ext4(F) mbcache(F) jbd2(F) sd_mod(F) crc_t10dif(F) megaraid_sas(F) mptsas(F) mptscsih(F) mptbase(F) scsi_transport_sas(F)
>>> Pid: 6, comm: kworker/u:0 Tainted: GF W 3.9.0-rc7-pci-remove-test+ #59
>>> Call Trace:
>>> [<ffffffff81056d4f>] warn_slowpath_common+0x7f/0xc0
>>> [<ffffffff81056e46>] warn_slowpath_fmt+0x46/0x50
>>> [<ffffffff81280b13>] __list_del_entry+0x63/0xd0
>>> [<ffffffff81280b91>] list_del+0x11/0x40
>>> [<ffffffff81298331>] pci_destroy_dev+0x31/0xc0
>>> [<ffffffff812985bb>] pci_remove_bus_device+0x5b/0x70
>>> [<ffffffff812985ee>] pci_stop_and_remove_bus_device+0x1e/0x30
>>> [<ffffffff8129fc89>] remove_callback+0x29/0x40
>>> [<ffffffff811f3b84>] sysfs_schedule_callback_work+0x24/0x70
>>> [<ffffffff81073d85>] process_one_work+0x185/0x3f0
>>> [<ffffffff810763e9>] worker_thread+0x119/0x380
>>> [<ffffffff810762d0>] ? manage_workers+0x180/0x180
>>> [<ffffffff8107b6ae>] kthread+0xce/0xe0
>>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>>> [<ffffffff815724ac>] ret_from_fork+0x7c/0xb0
>>> [<ffffffff8107b5e0>] ? kthread_freezable_should_stop+0x70/0x70
>>> ---[ end trace 9c05e382f933a515 ]---
>
> Attached patch on top of current linus tree should fix the racing problem.
>
> We don't need to bother to add get bus ref. etc.
>
> Let us know your test result.
Hi Yinghai,
I tested your new patch on latest linus tree(824282ca7d250), it works well, no list_del corruption warning.
And the dmesg info seems that every thing goes well.
[root@DP guzheng]# echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 > /sys/bus/pci/devices/0000\:1a\:01.0/remove
[root@DP guzheng]# dmesg
pciehp 0000:1a:01.0:pcie24: unloading service driver pciehp
pciehp 0000:1a:03.0:pcie24: unloading service driver pciehp
pciehp 0000:1a:02.0:pcie24: unloading service driver pciehp
pciehp 0000:1a:00.0:pcie24: unloading service driver pciehp
igb 0000:15:00.1: removed PHC on eth7
igb 0000:15:00.0: removed PHC on eth6
igb 0000:14:00.1: removed PHC on eth5
igb 0000:14:00.0: removed PHC on eth4
pci_bus 0000:14: busn_res: [bus 14] is released
pci_bus 0000:15: busn_res: [bus 15] is released
pci_bus 0000:16: busn_res: [bus 16] is released
pci_bus 0000:17: busn_res: [bus 17] is released
pci_bus 0000:18: busn_res: [bus 18] is released
pci_bus 0000:13: busn_res: [bus 13-18] is released
pci_bus 0000:12: busn_res: [bus 12-18] is released
pci_bus 0000:1b: busn_res: [bus 1b] is released
pci_bus 0000:1d: busn_res: [bus 1d] is released
pci_bus 0000:1e: busn_res: [bus 1e] is released
pci_bus 0000:1c: busn_res: [bus 1c] is released
pci_bus 0000:1a: busn_res: [bus 1a-1e] is released
pci_bus 0000:19: busn_res: [bus 19-1e] is released
pci_bus 0000:11: busn_res: [bus 11-1e] is released
Best regards,
Gu
>
> Yinghai
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-04-25 8:58 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-18 8:53 [PATCH 0/3] PCI: fix the object lifetime issue of parallel device removal on different pci hierarchy Gu Zheng
2013-04-18 19:43 ` Yinghai Lu
2013-04-22 3:02 ` Gu Zheng
2013-04-24 9:36 ` Gu Zheng
2013-04-24 17:11 ` Yinghai Lu
2013-04-24 23:48 ` Yinghai Lu
2013-04-25 1:10 ` Gu Zheng
2013-04-25 8:52 ` Gu Zheng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).