* 3-way deadlock during fc_remove_host
@ 2009-07-25 22:26 Joe Eykholt
From: Joe Eykholt @ 2009-07-25 22:26 UTC
To: linux-scsi@vger.kernel.org, devel@open-fcoe.org
I could use some insight about this problem. I have a hang involving
three threads.
The first is in fc_remove_host() doing flush_cpu_workqueue().
It cannot proceed because of the second:
A worker thread (PID 7519, fc_wq_5) is trying to remove an fc_rport and is
calling async_synchronize_full(), which waits until all posted async events
are complete. One such event keeping it from completing is the third
thread (PID 7624, async/1), which is in sd_probe_async() waiting on I/O
completion.
I suspect that last one won't finish because its HBA is being removed,
although the I/O could be for a different HBA. In any case, all I/O for
the HBA being removed should already have been canceled (via
fc_rport_terminate_io() doing an exch_mgr_reset), and no new I/O should
have been started.
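To make the dependency chain concrete, here is a small user-space C sketch of the wait-for relationships read off the stack dumps. The PIDs and wchan names come from the dumps; `struct waiter`, `root_blocker()` and the table itself are hypothetical illustrations, not kernel code. Following the edges from the fcoeadm task ends at the async thread, whose wait on I/O lies outside the model. That is consistent with nothing unwinding until that I/O completes or is canceled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one blocked task: its PID, where it sleeps,
 * and which task it waits on (0 = an external event such as I/O). */
struct waiter {
    int pid;
    const char *wchan;
    int waits_for;
};

/* Wait-for edges as read from the stack dumps in this report. */
const struct waiter hang[] = {
    { 7635, "flush_cpu_workqueue",             7519 }, /* fcoeadm */
    { 7519, "async_synchronize_cookie_domain", 7624 }, /* fc_wq_5 */
    { 7624, "blk_execute_rq",                  0    }, /* async/1 */
};
#define HANG_LEN (sizeof(hang) / sizeof(hang[0]))

static const struct waiter *find_waiter(const struct waiter *tbl,
                                        size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].pid == pid)
            return &tbl[i];
    return NULL;
}

/* Follow wait-for edges from 'start'. Returns the PID of the task at the
 * end of the chain (the one waiting on something outside the table) and
 * stores the number of tasks visited in *depth. Returns -1 if the edges
 * form a cycle, or 0 if 'start' is not in the table. */
int root_blocker(const struct waiter *tbl, size_t n, int start, int *depth)
{
    const struct waiter *w = find_waiter(tbl, n, start);
    size_t hops = 0;

    assert(depth != NULL);
    if (w == NULL)
        return 0;
    for (;;) {
        if (++hops > n)
            return -1;              /* more hops than tasks: a cycle */
        const struct waiter *next =
            w->waits_for ? find_waiter(tbl, n, w->waits_for) : NULL;
        if (next == NULL) {
            *depth = (int)hops;
            return w->pid;
        }
        w = next;
    }
}
```

With this table, `root_blocker(hang, HANG_LEN, 7635, &depth)` returns 7624 and sets `depth` to 3: a straight chain rather than a cycle, so the hang hinges on the outstanding I/O.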
This is with the current fcoe-next.git tree plus local fixes,
but it may apply to other trees.
Here are the stacks I got from /proc/*/stack:
7635
Name: fcoeadm
State: D (disk sleep)
cmd: /sbin/fcoeadm -d eth4
wchan: flush_cpu_workqueue
--- presumably waiting for the workqueue to flush
--- probably waiting for 7519
[<ffffffff810532e1>] flush_cpu_workqueue+0x7b/0x87
[<ffffffff81053357>] cleanup_workqueue_thread+0x6a/0xb8
[<ffffffff8105343c>] destroy_workqueue+0x63/0x9e
[<ffffffffa004a5a7>] fc_remove_host+0x148/0x171 [scsi_transport_fc]
[<ffffffffa00b612c>] fcoe_if_destroy+0x123/0x15b [fcoe]
[<ffffffffa00b620c>] fcoe_destroy+0x72/0xa0 [fcoe]
[<ffffffff81055558>] param_attr_store+0x25/0x35
[<ffffffff810555ad>] module_attr_store+0x21/0x25
[<ffffffff81126c5a>] sysfs_write_file+0xe4/0x119
[<ffffffff810d76b4>] vfs_write+0xab/0x105
[<ffffffff810d77d2>] sys_write+0x47/0x6e
[<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
7519
Name: fc_wq_5
State: D (disk sleep)
cmd:
wchan: async_synchronize_cookie_domain
--- waiting for async thread 7624?
[<ffffffff8105cdca>] async_synchronize_cookie_domain+0xb4/0x110
[<ffffffff8105ce36>] async_synchronize_cookie+0x10/0x12
[<ffffffff8105ce48>] async_synchronize_full+0x10/0x2c
[<ffffffff812aa7a7>] sd_remove+0x15/0x8a
[<ffffffff81291b76>] __device_release_driver+0x80/0xc9
[<ffffffff81291c8a>] device_release_driver+0x1e/0x2b
[<ffffffff8129122f>] bus_remove_device+0xa8/0xc9
[<ffffffff8128f92e>] device_del+0x138/0x1a1
[<ffffffff812a502c>] __scsi_remove_device+0x44/0x81
[<ffffffff812a508f>] scsi_remove_device+0x26/0x33
[<ffffffff812a5141>] __scsi_remove_target+0x93/0xd7
[<ffffffff812a51eb>] __remove_child+0x1e/0x25
[<ffffffff8128f18a>] device_for_each_child+0x38/0x6f
[<ffffffff812a51c0>] scsi_remove_target+0x3b/0x48
[<ffffffffa0049db7>] fc_starget_delete+0x21/0x25 [scsi_transport_fc]
[<ffffffffa0049eb1>] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
[<ffffffff81052d10>] worker_thread+0x1fa/0x30a
[<ffffffff81057151>] kthread+0x88/0x90
[<ffffffff8100cbfa>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
7624
Name: async/1
State: D (disk sleep)
cmd:
wchan: blk_execute_rq
--- waiting for completion of I/O
--- since this is an async thread, presumably 7519 is waiting for it
[<ffffffff811c5b51>] blk_execute_rq+0xb6/0xd9
[<ffffffff812a1b9f>] scsi_execute+0xe0/0x132
[<ffffffff812a1c71>] scsi_execute_req+0x80/0xb2
[<ffffffff812aa912>] read_capacity_10+0x7d/0x1a0
[<ffffffff812ac80f>] sd_revalidate_disk+0x14c2/0x1561
[<ffffffff811242db>] rescan_partitions+0x8c/0x3a3
[<ffffffff810fb991>] __blkdev_get+0x264/0x333
[<ffffffff810fba6b>] blkdev_get+0xb/0xd
[<ffffffff81123971>] register_disk+0xe2/0x144
[<ffffffff811c7f80>] add_disk+0xc0/0x11e
[<ffffffff812ac9ca>] sd_probe_async+0x11c/0x1cd
[<ffffffff8105cbbc>] async_thread+0x114/0x205
[<ffffffff81057151>] kthread+0x88/0x90
[<ffffffff8100cbfa>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
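When comparing dumps like these across runs, it can help to split each /proc/<pid>/stack line into address, symbol, offset, size and module. The sketch below assumes exactly the line format shown above; `struct frame` and `parse_frame()` are hypothetical helpers, and the 64-byte name limits are arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form of one stack frame line. */
struct frame {
    unsigned long long addr;
    char sym[64];
    unsigned long off, size;
    char module[64];            /* empty for built-in kernel code */
};

/* Parse a line such as
 *   "[<ffffffffa004a5a7>] fc_remove_host+0x148/0x171 [scsi_transport_fc]".
 * Returns 1 on success, 0 for lines without symbol+offset/size (such as
 * the trailing 0xffffffffffffffff end marker). */
int parse_frame(const char *line, struct frame *f)
{
    int used = 0;

    assert(line != NULL && f != NULL);
    memset(f, 0, sizeof(*f));
    /* %n records how many characters were consumed; it is not counted
     * in sscanf's return value. */
    if (sscanf(line, "[<%llx>] %63[^+ ]+0x%lx/0x%lx%n",
               &f->addr, f->sym, &f->off, &f->size, &used) != 4)
        return 0;
    /* Optional " [module]" suffix for symbols in loadable modules;
     * if absent, f->module stays empty from the memset above. */
    sscanf(line + used, " [%63[^]]]", f->module);
    return 1;
}
```

Frames from loadable modules (scsi_transport_fc, fcoe) carry the module name, which makes it easy to separate transport-class frames from SCSI midlayer and block-layer ones.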
If you've seen a hang like this or might know what's going on,
please let me know.
Thanks,
Joe