From mboxrd@z Thu Jan 1 00:00:00 1970 From: Robert Love Subject: [PATCH] fcoe: Drop the rtnl_mutex before calling fcoe_ctlr_link_up Date: Tue, 13 Mar 2012 18:22:12 -0700 Message-ID: <20120314012212.7143.52793.stgit@localhost6.localdomain6> References: Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Return-path: Received: from mga01.intel.com ([192.55.52.88]:16507 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1760077Ab2CNBWM (ORCPT ); Tue, 13 Mar 2012 21:22:12 -0400 In-Reply-To: Sender: linux-scsi-owner@vger.kernel.org List-Id: linux-scsi@vger.kernel.org To: bvanassche@acm.org, linux-scsi@vger.kernel.org Cc: devel@open-fcoe.org, yi.zou@intel.com The rtnl_lock is primarily used to serialize networking driver changes as well as to ensure that a networking driver is not removed when making changes to it. fcoe also uses the rtnl_lock to protect the fcoe hostlist. fcoe_create holds the rtnl_lock over the entirity of the routine including a the call to fcoe_ctlr_link_up. This causes the below deadlock because fcoe_ctlr_link_up acquires the fcoe_ctlr ctlr_mutex and this deadlocks with a libfcoe thread that acquires the fcoe_ctlr ctlr_mutex and then the rtnl_lock (to update a MAC address). This patch drops the rtnl_lock before calling fcoe_ctlr_link_up and therefore the deadlock is prevented. https://bugzilla.kernel.org/show_bug.cgi?id=42918 the existing dependency chain (in reverse order) is: -> #1 (&fip->ctlr_mutex){+.+...}: [] lock_acquire+0x80/0x1b0 [] mutex_lock_nested+0x6d/0x340 [] fcoe_ctlr_link_up+0x22/0x180 [libfcoe] [] fcoe_create+0x47e/0x6e0 [fcoe] [] fcoe_transport_create+0x143/0x250 [libfcoe] [] param_attr_store+0x30/0x60 [] module_attr_store+0x26/0x40 [] sysfs_write_file+0xae/0x100 [] vfs_write+0x8f/0x160 [] sys_write+0x3d/0x70 [] syscall_call+0x7/0xb -> #0 (rtnl_mutex){+.+.+.}: [] __lock_acquire+0x140b/0x1720 [] lock_acquire+0x80/0x1b0 [] mutex_lock_nested+0x6d/0x340 [] rtnl_lock+0x14/0x20 [] fcoe_update_src_mac+0x2c/0xb0 [fcoe] [] fcoe_ctlr_timer_work+0x712/0xb60 [libfcoe] [] process_one_work+0x179/0x5d0 [] worker_thread+0x121/0x2d0 [] kthread+0x7d/0x90 [] kernel_thread_helper+0x6/0x10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&fip->ctlr_mutex); lock(rtnl_mutex); lock(&fip->ctlr_mutex); lock(rtnl_mutex); *** DEADLOCK *** Signed-off-by: Robert Love --- drivers/scsi/fcoe/fcoe.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c index e959960..408ca0e 100644 --- a/drivers/scsi/fcoe/fcoe.c +++ b/drivers/scsi/fcoe/fcoe.c @@ -2133,8 +2133,12 @@ static int fcoe_create(struct net_device *netdev, enum fip_state fip_mode) /* start FIP Discovery and FLOGI */ lport->boot_time = jiffies; fc_fabric_login(lport); - if (!fcoe_link_ok(lport)) + if (!fcoe_link_ok(lport)) { + rtnl_unlock(); fcoe_ctlr_link_up(&fcoe->ctlr); + mutex_unlock(&fcoe_config_mutex); + return rc; + } out_nodev: rtnl_unlock();