From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joe Eykholt
Subject: Re: fc_remove_host / scsi_host_dev_release deadlock
Date: Fri, 07 Aug 2009 14:42:02 -0700
Message-ID: <4A7C9FAA.6070506@cisco.com>
References: <4A7C6D56.6080602@cisco.com> <4A7C8F7F.2080706@cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
Received: from sj-iport-3.cisco.com ([171.71.176.72]:3522 "EHLO sj-iport-3.cisco.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753216AbZHGVmR
	(ORCPT ); Fri, 7 Aug 2009 17:42:17 -0400
Received: from sj-core-2.cisco.com (sj-core-2.cisco.com [171.71.177.254])
	by sj-dkim-1.cisco.com (8.12.11/8.12.11) with ESMTP id n77Lg6Kv020727
	for ; Fri, 7 Aug 2009 14:42:06 -0700
Received: from jeykholt-mac.nuovasystems.com ([172.30.136.66])
	by sj-core-2.cisco.com (8.13.8/8.14.3) with ESMTP id n77Lg6Lu003348
	for ; Fri, 7 Aug 2009 21:42:06 GMT
In-Reply-To: <4A7C8F7F.2080706@cisco.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: "linux-scsi@vger.kernel.org"

Joe Eykholt wrote:
> Joe Eykholt wrote:
>> This is different from the 3-way deadlock I posted on 7/25:
>> http://marc.info/?l=linux-scsi&m=124856078500527&w=2
>>
>> A user process is closing an sd block dev file.
>> It holds sd_ref_mutex and is in scsi_host_dev_release(),
>> doing a destroy_workqueue(), which needs cpu_add_remove_lock.
>>
>> The process that holds cpu_add_remove_lock is doing
>> fcoe_destroy (deleting an FCoE instance), which calls
>> fc_remove_host(); that does destroy_workqueue() and is waiting
>> in flush_cpu_workqueue() for a work item to finish.
>>
>> The stuck work item is doing fc_rport_final_delete(); it is in
>> device_release_driver() via sd_remove() and is waiting for
>> sd_ref_mutex.
>>
>> I'd like to hear some discussion about the right approach
>> to fix this. There should be some clear rules about how
>> work items can destroy or flush other workqueues.
>> I'd like to be able to state the clear rule that's being
>> violated here.
>
> Now it seems to me that the rule is:
>
> A work item may not destroy a workqueue if it is running
> on a workqueue that may itself be destroyed.
>
> Rephrasing: if a workqueue may be destroyed, its handlers
> may not destroy any workqueue.
>
> That's because destroy_workqueue() takes cpu_add_remove_lock
> and, while holding it, does flush_cpu_workqueue().
>
> If a work item on workqueue A destroys workqueue B,
> and thread C destroys workqueue A, then thread C
> may get cpu_add_remove_lock first; the work item on A
> then waits forever for the lock, and thread C hangs
> waiting for workqueue A to flush.
>
> So we're violating that rule if we let fc_rport_final_delete()
> do the destroy of the device's workqueue. I still don't
> have a good solution. Maybe fc_remove_host() could flush
> the workqueue before destroying it?
>
> Maybe destroy_workqueue() should flush before grabbing
> cpu_add_remove_lock, but maybe that doesn't completely fix it.
>
> Joe

Sorry for the noise. The rule above is true, but it's not really
what's causing the problem here, which is:

sd_ref_mutex is held during a destroy_workqueue() of shost->work_q.
That means sd_release() must have been the last one to release the
scsi_host's shost_gendev. fc_remove_host() is also running, but it
must be for the other FCoE instance, not the same host.

sd_ref_mutex is also grabbed from a work item handler on the
fc_host's fc_wq. Since both of these workqueues can be destroyed,
it's wrong to hold sd_ref_mutex while destroying shost->work_q,
because sd_ref_mutex may also be grabbed by a work item that must
finish before its own workqueue can be destroyed.

So my second rule is:

If a workqueue may be destroyed, its handlers may not take any
lock that is ever held while destroying a workqueue.

Both rules stem from the use of cpu_add_remove_lock in
destroy_workqueue(). Maybe that's the issue to work on?
The first rule is violated in fc_rport_final_delete(), and the
second rule, for sd_ref_mutex, is also violated in
fc_rport_final_delete().

Would it be OK to run fc_rport_final_delete() on the general event
workqueue (which is non-destroyable)? I'm not sure that fixes it.

Is there a way to fix the violations, or should we manage to change
the rules somehow? Does anyone find a problem with the rules I've
stated?

Thanks,
Joe

>> Perhaps the way cpu_add_remove_lock is used in
>> destroy_workqueue() should be changed, or maybe
>> fc_rport_final_delete() and fc_remove_host() need to
>> be better about using workqueues somehow.
>>
>> I do see a lock dependency warning about this, so that
>> system works. I saw this once on a system based
>> on 2.6.31-rc4 under the current fcoe-next.git tree.
>>
>> The following are the stacks I collected via /proc/*/stacks
>> and the lockdep warnings.
>>
>> Thanks,
>> Joe
>>
>>
>> Name: hald-probe-volu
>> State: D (disk sleep)
>> cmd: /usr/libexec/hald-probe-volume
>> wchan: cpu_maps_update_begin
>>
>> --- waiting for cpu_add_remove_lock
>> --- has sd_ref_mutex in scsi_disk_put
>>
>> [] cpu_maps_update_begin+0x12/0x14
>> [] destroy_workqueue+0x2b/0x9e
>> [] scsi_host_dev_release+0x5a/0xbd
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_target_dev_release+0x1d/0x21
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_dev_release_usercontext+0x118/0x124
>> [] execute_in_process_context+0x2a/0x70
>> [] scsi_device_dev_release+0x17/0x19
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_put+0x3d/0x42
>> [] scsi_disk_put+0x30/0x41
>> [] sd_release+0x4d/0x54
>> [] __blkdev_put+0xa7/0x16e
>> [] __blkdev_put+0x15e/0x16e
>> [] blkdev_put+0xb/0xd
>> [] blkdev_close+0x37/0x3c
>> [] __fput+0x10d/0x1bb
>> [] fput+0x18/0x1a
>> [] filp_close+0x67/0x72
>> [] sys_close+0xa5/0xe4
>> [] system_call_fastpath+0x16/0x1b
>> [] 0xffffffffffffffff
>>
>>
>> Name: fcoeadm
>> State: D (disk sleep)
>> cmd: fcoeadm -d eth4
>> wchan: flush_cpu_workqueue
>>
>> --- waits on cpu work to finish -- fc_workqueue 28596
>> --- holds cpu_add_remove_lock
>>
>> [] flush_cpu_workqueue+0x7b/0x87
>> [] cleanup_workqueue_thread+0x6a/0xb8
>> [] destroy_workqueue+0x63/0x9e
>> [] fc_remove_host+0x148/0x171 [scsi_transport_fc]
>> [] fcoe_if_destroy+0x1a6/0x1db [fcoe]
>> [] fcoe_destroy+0x86/0xad [fcoe]
>> [] param_attr_store+0x25/0x35
>> [] module_attr_store+0x21/0x25
>> [] sysfs_write_file+0xe4/0x119
>> [] vfs_write+0xab/0x105
>> [] sys_write+0x47/0x6f
>> [] system_call_fastpath+0x16/0x1b
>> [] 0xffffffffffffffff
>>
>>
>> Name: fc_wq_137
>> State: D (disk sleep)
>> cmd:
>> wchan: scsi_disk_get_from_dev
>>
>> -- waits on sd_ref_mutex
>>
>> [] scsi_disk_get_from_dev+0x1a/0x49
>> [] sd_shutdown+0x12/0x117
>> [] sd_remove+0x51/0x8a
>> [] __device_release_driver+0x80/0xc9
>> [] device_release_driver+0x1e/0x2b
>> [] bus_remove_device+0xa8/0xc9
>> [] device_del+0x13f/0x1ac
>> [] __scsi_remove_device+0x44/0x81
>> [] scsi_remove_device+0x26/0x33
>> [] __scsi_remove_target+0x93/0xd7
>> [] __remove_child+0x1e/0x25
>> [] device_for_each_child+0x38/0x6f
>> [] scsi_remove_target+0x3b/0x48
>> [] fc_starget_delete+0x21/0x25 [scsi_transport_fc]
>> [] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
>> [] worker_thread+0x1fa/0x30a
>> [] kthread+0x88/0x90
>> [] child_rip+0xa/0x20
>> [] 0xffffffffffffffff
>>
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.31-rc4-rp9 #5
>> -------------------------------------------------------
>> hald-probe-volu/23829 is trying to acquire lock:
>> (cpu_add_remove_lock){+.+.+.}, at: []
>> cpu_maps_update_begin+0x12/0x14
>>
>> but task is already holding lock:
>> (sd_ref_mutex){+.+.+.}, at: []
>> scsi_disk_put+0x20/0x41
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #4 (sd_ref_mutex){+.+.+.}:
>> [] __lock_acquire+0xa48/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] __mutex_lock_common+0x48/0x328
>> [] mutex_lock_nested+0x24/0x29
>> [] scsi_disk_get_from_dev+0x1a/0x49
>> [] sd_shutdown+0x12/0x117
>> [] sd_remove+0x51/0x8a
>> [] __device_release_driver+0x80/0xc9
>> [] device_release_driver+0x1e/0x2b
>> [] bus_remove_device+0xa8/0xc9
>> [] device_del+0x13f/0x1ac
>> [] __scsi_remove_device+0x44/0x81
>> [] scsi_remove_device+0x26/0x33
>> [] __scsi_remove_target+0x93/0xd7
>> [] __remove_child+0x1e/0x25
>> [] device_for_each_child+0x38/0x6f
>> [] scsi_remove_target+0x3b/0x48
>> [] fc_starget_delete+0x21/0x25 [scsi_transport_fc]
>> [] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
>> [] worker_thread+0x1fa/0x30a
>> [] kthread+0x88/0x90
>> [] child_rip+0xa/0x20
>> [] 0xffffffffffffffff
>>
>> -> #3 (&shost->scan_mutex){+.+.+.}:
>> [] __lock_acquire+0xa48/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] __mutex_lock_common+0x48/0x328
>> [] mutex_lock_nested+0x24/0x29
>> [] scsi_remove_device+0x1e/0x33
>> [] __scsi_remove_target+0x93/0xd7
>> [] __remove_child+0x1e/0x25
>> [] device_for_each_child+0x38/0x6f
>> [] scsi_remove_target+0x3b/0x48
>> [] fc_starget_delete+0x21/0x25 [scsi_transport_fc]
>> [] fc_rport_final_delete+0xf6/0x188 [scsi_transport_fc]
>> [] worker_thread+0x1fa/0x30a
>> [] kthread+0x88/0x90
>> [] child_rip+0xa/0x20
>> [] 0xffffffffffffffff
>>
>> -> #2 (&rport->rport_delete_work){+.+.+.}:
>> [] __lock_acquire+0xa48/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] worker_thread+0x1f1/0x30a
>> [] kthread+0x88/0x90
>> [] child_rip+0xa/0x20
>> [] 0xffffffffffffffff
>>
>> -> #1 ((fc_host->work_q_name)){+.+.+.}:
>> [] __lock_acquire+0xa48/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] cleanup_workqueue_thread+0x45/0xb8
>> [] destroy_workqueue+0x63/0x9e
>> [] fc_remove_host+0x148/0x171 [scsi_transport_fc]
>> [] 0xffffffffa00738b2
>> [] 0xffffffffa0075001
>> [] param_attr_store+0x25/0x35
>> [] module_attr_store+0x21/0x25
>> [] sysfs_write_file+0xe4/0x119
>> [] vfs_write+0xab/0x105
>> [] sys_write+0x47/0x6f
>> [] system_call_fastpath+0x16/0x1b
>> [] 0xffffffffffffffff
>>
>> -> #0 (cpu_add_remove_lock){+.+.+.}:
>> [] __lock_acquire+0x93c/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] __mutex_lock_common+0x48/0x328
>> [] mutex_lock_nested+0x24/0x29
>> [] cpu_maps_update_begin+0x12/0x14
>> [] destroy_workqueue+0x2b/0x9e
>> [] scsi_host_dev_release+0x5a/0xbd
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_target_dev_release+0x1d/0x21
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_dev_release_usercontext+0x118/0x124
>> [] execute_in_process_context+0x2a/0x70
>> [] scsi_device_dev_release+0x17/0x19
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_put+0x3d/0x42
>> [] scsi_disk_put+0x30/0x41
>> [] sd_release+0x4d/0x54
>> [] __blkdev_put+0xa7/0x16e
>> [] __blkdev_put+0x15e/0x16e
>> [] blkdev_put+0xb/0xd
>> [] blkdev_close+0x37/0x3c
>> [] __fput+0x10d/0x1bb
>> [] fput+0x18/0x1a
>> [] filp_close+0x67/0x72
>> [] sys_close+0xa5/0xe4
>> [] system_call_fastpath+0x16/0x1b
>> [] 0xffffffffffffffff
>>
>> other info that might help us debug
>> this:
>>
>> 2 locks held by hald-probe-volu/23829:
>> #0: (&bdev->bd_mutex/1){+.+.+.}, at: []
>> __blkdev_put+0x2e/0x16e
>> #1: (sd_ref_mutex){+.+.+.}, at: []
>> scsi_disk_put+0x20/0x41
>>
>> stack backtrace:
>> Pid: 23829, comm: hald-probe-volu Not tainted 2.6.31-rc4-rp9 #5
>> Call Trace:
>> [] print_circular_bug_tail+0x71/0x7c
>> [] __lock_acquire+0x93c/0xbd0
>> [] lock_acquire+0xb9/0xdd
>> [] ? cpu_maps_update_begin+0x12/0x14
>> [] __mutex_lock_common+0x48/0x328
>> [] ? cpu_maps_update_begin+0x12/0x14
>> [] ? trace_hardirqs_on+0xd/0xf
>> [] ? _spin_unlock_irq+0x2b/0x30
>> [] ? cpu_maps_update_begin+0x12/0x14
>> [] ? wait_for_common+0xf7/0x112
>> [] ? default_wake_function+0x0/0xf
>> [] mutex_lock_nested+0x24/0x29
>> [] cpu_maps_update_begin+0x12/0x14
>> [] destroy_workqueue+0x2b/0x9e
>> [] scsi_host_dev_release+0x5a/0xbd
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] ? kobject_release+0x0/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] fc_rport_dev_release+0x18/0x24 [scsi_transport_fc]
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] ? kobject_release+0x0/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_target_dev_release+0x1d/0x21
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] ? kobject_release+0x0/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_dev_release_usercontext+0x118/0x124
>> [] ? scsi_device_dev_release_usercontext+0x0/0x124
>> [] execute_in_process_context+0x2a/0x70
>> [] scsi_device_dev_release+0x17/0x19
>> [] device_release+0x49/0x75
>> [] kobject_release+0x51/0x67
>> [] ? kobject_release+0x0/0x67
>> [] kref_put+0x43/0x4f
>> [] kobject_put+0x47/0x4b
>> [] put_device+0x12/0x14
>> [] scsi_device_put+0x3d/0x42
>> [] scsi_disk_put+0x30/0x41
>> [] sd_release+0x4d/0x54
>> [] __blkdev_put+0xa7/0x16e
>> [] __blkdev_put+0x15e/0x16e
>> [] blkdev_put+0xb/0xd
>> [] blkdev_close+0x37/0x3c
>> [] __fput+0x10d/0x1bb
>> [] fput+0x18/0x1a
>> [] filp_close+0x67/0x72
>> [] sys_close+0xa5/0xe4
>> [] system_call_fastpath+0x16/0x1b
>>
>> -- end
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html