From mboxrd@z Thu Jan 1 00:00:00 1970 From: Joe Eykholt Subject: Re: fc_remove_host / scsi_host_dev_release deadlock Date: Fri, 07 Aug 2009 13:33:03 -0700 Message-ID: <4A7C8F7F.2080706@cisco.com> References: <4A7C6D56.6080602@cisco.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: Received: from sj-iport-1.cisco.com ([171.71.176.70]:10787 "EHLO sj-iport-1.cisco.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932775AbZHGUdN (ORCPT ); Fri, 7 Aug 2009 16:33:13 -0400 Received: from sj-core-5.cisco.com (sj-core-5.cisco.com [171.71.177.238]) by sj-dkim-2.cisco.com (8.12.11/8.12.11) with ESMTP id n77KX7qS025391 for ; Fri, 7 Aug 2009 13:33:07 -0700 Received: from jeykholt-mac.nuovasystems.com ([172.30.136.66]) by sj-core-5.cisco.com (8.13.8/8.14.3) with ESMTP id n77KX74G028352 for ; Fri, 7 Aug 2009 20:33:07 GMT In-Reply-To: <4A7C6D56.6080602@cisco.com> Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: "linux-scsi@vger.kernel.org" Joe Eykholt wrote: > This is different from the 3-way deadlock I posted on 7/25 > http://marc.info/?l=linux-scsi&m=124856078500527&w=2 > > A user process is closing an sd block dev file. > It holds the sd_ref_mutex and is in scsi_host_dev_release() > doing a destroy_workqueue() which needs the cpu_add_remove_lock. > > The process that holds cpu_add_remove_lock is doing > fcoe_destroy (deleting an FCoE instance) which calls > fc_remove_host() which does destroy_workqueue() and waiting in > flush_cpu_workqueue() for a work item to finish. > > The stuck work item is doing fc_rport_final_delete(), and > that's in device_release_driver() in sd_remove() which > is waiting for the sd_ref_mutex. > > I'd like to hear some discussion about the right approach > to fix this. There should be some clear rules about how > work items can destroy or flush other workqueues. > I'd like to be able to state the clear rule that's being > violated here. Now it seems to me that that the rule is: A workqueue item may not destroy a workqueue if it is running on a workqueue that itself may be destroyed. That's because destroy_workqueue takes the cpu_add_remove lock and, while holding it, does flush_cpu_workqueue(). If a work item on workqueue A destroy workqueue B, and thread C destroys workqueue A, then thread C may get the cpu_add_remove lock first, work item on A can wait forever for the lock, and thread C can hang waiting for workqueue A to flush. So we're violating that rule if we let fc_rport_final_delete() do the destroy of the device's workqueue. I still don't have a good solution. Maybe fc_host_remove() could flush the workqueue before destroying it? Maybe destroy_workqueue() should flush before grabbing the cpu_add_remove_lock, but maybe that doesn't completely fix it. Joe > Perhaps the way cpu_add_remove_lock is used to in > destroy_workqueue() should be changed, or maybe > fc_rport_final_delete and fc_remove_host need to > be better about using workqueues somehow. > > I do see a lock dependency warning about this, so that > system works. I saw this once on a system based > on 2.6.31-rc4 under the current fcoe-next.git tree. > > The following are the stacks I collected via /proc/*/stacks > and the lockdep warnings. > > Thanks, > Joe > > > Name: hald-probe-volu > State: D (disk sleep) > cmd: /usr/libexec/hald-probe-volume > wchan: cpu_maps_update_begin > > --- waiting cpu_add_remove_lock > --- has sd_ref_mutex in scsi_disk_put > > [] cpu_maps_update_begin+0x12/0x14 > [] destroy_workqueue+0x2b/0x9e > [] scsi_host_dev_release+0x5a/0xbd > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc] > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_target_dev_release+0x1d/0x21 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_device_dev_release_usercontext+0x118/0x124 > [] execute_in_process_context+0x2a/0x70 > [] scsi_device_dev_release+0x17/0x19 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_device_put+0x3d/0x42 > [] scsi_disk_put+0x30/0x41 > [] sd_release+0x4d/0x54 > [] __blkdev_put+0xa7/0x16e > [] __blkdev_put+0x15e/0x16e > [] blkdev_put+0xb/0xd > [] blkdev_close+0x37/0x3c > [] __fput+0x10d/0x1bb > [] fput+0x18/0x1a > [] filp_close+0x67/0x72 > [] sys_close+0xa5/0xe4 > [] system_call_fastpath+0x16/0x1b > [] 0xffffffffffffffff > > > Name: fcoeadm > State: D (disk sleep) > cmd: fcoeadm -d eth4 > wchan: flush_cpu_workqueue > > --- waits on cpu work to finish. -- fc_workqueue 28596 > --- holds cpu_add_remove_lock > > [] flush_cpu_workqueue+0x7b/0x87 > [] cleanup_workqueue_thread+0x6a/0xb8 > [] destroy_workqueue+0x63/0x9e > [] fc_remove_host+0x148/0x171 [scsi_transport_fc] > [] fcoe_if_destroy+0x1a6/0x1db [fcoe] > [] fcoe_destroy+0x86/0xad [fcoe] > [] param_attr_store+0x25/0x35 > [] module_attr_store+0x21/0x25 > [] sysfs_write_file+0xe4/0x119 > [] vfs_write+0xab/0x105 > [] sys_write+0x47/0x6f > [] system_call_fastpath+0x16/0x1b > [] 0xffffffffffffffff > > > Name: fc_wq_137 > State: D (disk sleep) > cmd: > wchan: scsi_disk_get_from_dev > > -- waits on sd_ref_mutex > > [] scsi_disk_get_from_dev+0x1a/0x49 > [] sd_shutdown+0x12/0x117 > [] sd_remove+0x51/0x8a > [] __device_release_driver+0x80/0xc9 > [] device_release_driver+0x1e/0x2b > [] bus_remove_device+0xa8/0xc9 > [] device_del+0x13f/0x1ac > [] __scsi_remove_device+0x44/0x81 > [] scsi_remove_device+0x26/0x33 > [] __scsi_remove_target+0x93/0xd7 > [] __remove_child+0x1e/0x25 > [] device_for_each_child+0x38/0x6f > [] scsi_remove_target+0x3b/0x48 > [] fc_starget_delete+0x21/0x25 [scsi_transport_fc] > [] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc] > [] worker_thread+0x1fa/0x30a > [] kthread+0x88/0x90 > [] child_rip+0xa/0x20 > [] 0xffffffffffffffff > > [ INFO: possible circular locking dependency detected ] > 2.6.31-rc4-rp9 #5 > ------------------------------------------------------- > hald-probe-volu/23829 is trying to acquire lock: > (cpu_add_remove_lock){+.+.+.}, at: [] > cpu_maps_update_begin+0x12/0x14 > > but task is already holding lock: > (sd_ref_mutex){+.+.+.}, at: [] scsi_disk_put+0x20/0x41 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #4 (sd_ref_mutex){+.+.+.}: > [] __lock_acquire+0xa48/0xbd0 > [] lock_acquire+0xb9/0xdd > [] __mutex_lock_common+0x48/0x328 > [] mutex_lock_nested+0x24/0x29 > [] scsi_disk_get_from_dev+0x1a/0x49 > [] sd_shutdown+0x12/0x117 > [] sd_remove+0x51/0x8a > [] __device_release_driver+0x80/0xc9 > [] device_release_driver+0x1e/0x2b > [] bus_remove_device+0xa8/0xc9 > [] device_del+0x13f/0x1ac > [] __scsi_remove_device+0x44/0x81 > [] scsi_remove_device+0x26/0x33 > [] __scsi_remove_target+0x93/0xd7 > [] __remove_child+0x1e/0x25 > [] device_for_each_child+0x38/0x6f > [] scsi_remove_target+0x3b/0x48 > [] fc_starget_delete+0x21/0x25 > [scsi_transport_fc] > [] fc_rport_final_delete+0xf6/0x188 > [scsi_transport_fc] > [] worker_thread+0x1fa/0x30a > [] kthread+0x88/0x90 > [] child_rip+0xa/0x20 > [] 0xffffffffffffffff > > -> #3 (&shost->scan_mutex){+.+.+.}: > [] __lock_acquire+0xa48/0xbd0 > [] lock_acquire+0xb9/0xdd > [] __mutex_lock_common+0x48/0x328 > [] mutex_lock_nested+0x24/0x29 > [] scsi_remove_device+0x1e/0x33 > [] __scsi_remove_target+0x93/0xd7 > [] __remove_child+0x1e/0x25 > [] device_for_each_child+0x38/0x6f > [] scsi_remove_target+0x3b/0x48 > [] fc_starget_delete+0x21/0x25 > [scsi_transport_fc] > [] fc_rport_final_delete+0xf6/0x188 > [scsi_transport_fc] > [] worker_thread+0x1fa/0x30a > [] kthread+0x88/0x90 > [] child_rip+0xa/0x20 > [] 0xffffffffffffffff > > -> #2 (&rport->rport_delete_work){+.+.+.}: > [] __lock_acquire+0xa48/0xbd0 > [] lock_acquire+0xb9/0xdd > [] worker_thread+0x1f1/0x30a > [] kthread+0x88/0x90 > [] child_rip+0xa/0x20 > [] 0xffffffffffffffff > > -> #1 ((fc_host->work_q_name)){+.+.+.}: > [] __lock_acquire+0xa48/0xbd0 > [] lock_acquire+0xb9/0xdd > [] cleanup_workqueue_thread+0x45/0xb8 > [] destroy_workqueue+0x63/0x9e > [] fc_remove_host+0x148/0x171 [scsi_transport_fc] > [] 0xffffffffa00738b2 > [] 0xffffffffa0075001 > [] param_attr_store+0x25/0x35 > [] module_attr_store+0x21/0x25 > [] sysfs_write_file+0xe4/0x119 > [] vfs_write+0xab/0x105 > [] sys_write+0x47/0x6f > [] system_call_fastpath+0x16/0x1b > [] 0xffffffffffffffff > > -> #0 (cpu_add_remove_lock){+.+.+.}: > [] __lock_acquire+0x93c/0xbd0 > [] lock_acquire+0xb9/0xdd > [] __mutex_lock_common+0x48/0x328 > [] mutex_lock_nested+0x24/0x29 > [] cpu_maps_update_begin+0x12/0x14 > [] destroy_workqueue+0x2b/0x9e > [] scsi_host_dev_release+0x5a/0xbd > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] fc_rport_dev_release+0x18/0x24 > [scsi_transport_fc] > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_target_dev_release+0x1d/0x21 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] > scsi_device_dev_release_usercontext+0x118/0x124 > [] execute_in_process_context+0x2a/0x70 > [] scsi_device_dev_release+0x17/0x19 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_device_put+0x3d/0x42 > [] scsi_disk_put+0x30/0x41 > [] sd_release+0x4d/0x54 > [] __blkdev_put+0xa7/0x16e > [] __blkdev_put+0x15e/0x16e > [] blkdev_put+0xb/0xd > [] blkdev_close+0x37/0x3c > [] __fput+0x10d/0x1bb > [] fput+0x18/0x1a > [] filp_close+0x67/0x72 > [] sys_close+0xa5/0xe4 > [] system_call_fastpath+0x16/0x1b > [] 0xffffffffffffffff > > other info that might help us debug this: > > 2 locks held by hald-probe-volu/23829: > #0: (&bdev->bd_mutex/1){+.+.+.}, at: [] > __blkdev_put+0x2e/0x16e > #1: (sd_ref_mutex){+.+.+.}, at: [] > scsi_disk_put+0x20/0x41 > > stack backtrace: > Pid: 23829, comm: hald-probe-volu Not tainted 2.6.31-rc4-rp9 #5 > Call Trace: > [] print_circular_bug_tail+0x71/0x7c > [] __lock_acquire+0x93c/0xbd0 > [] lock_acquire+0xb9/0xdd > [] ? cpu_maps_update_begin+0x12/0x14 > [] __mutex_lock_common+0x48/0x328 > [] ? cpu_maps_update_begin+0x12/0x14 > [] ? trace_hardirqs_on+0xd/0xf > [] ? _spin_unlock_irq+0x2b/0x30 > [] ? cpu_maps_update_begin+0x12/0x14 > [] ? wait_for_common+0xf7/0x112 > [] ? default_wake_function+0x0/0xf > [] mutex_lock_nested+0x24/0x29 > [] cpu_maps_update_begin+0x12/0x14 > [] destroy_workqueue+0x2b/0x9e > [] scsi_host_dev_release+0x5a/0xbd > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] ? kobject_release+0x0/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc] > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] ? kobject_release+0x0/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_target_dev_release+0x1d/0x21 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] ? kobject_release+0x0/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_device_dev_release_usercontext+0x118/0x124 > [] ? scsi_device_dev_release_usercontext+0x0/0x124 > [] execute_in_process_context+0x2a/0x70 > [] scsi_device_dev_release+0x17/0x19 > [] device_release+0x49/0x75 > [] kobject_release+0x51/0x67 > [] ? kobject_release+0x0/0x67 > [] kref_put+0x43/0x4f > [] kobject_put+0x47/0x4b > [] put_device+0x12/0x14 > [] scsi_device_put+0x3d/0x42 > [] scsi_disk_put+0x30/0x41 > [] sd_release+0x4d/0x54 > [] __blkdev_put+0xa7/0x16e > [] __blkdev_put+0x15e/0x16e > [] blkdev_put+0xb/0xd > [] blkdev_close+0x37/0x3c > [] __fput+0x10d/0x1bb > [] fput+0x18/0x1a > [] filp_close+0x67/0x72 > [] sys_close+0xa5/0xe4 > [] system_call_fastpath+0x16/0x1b > > -- end > -- > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html