All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] zfcp: Invalid locking order
@ 2007-02-07 12:17 Swen Schillig
  2007-02-07 16:06 ` Andreas Herrmann
  0 siblings, 1 reply; 5+ messages in thread
From: Swen Schillig @ 2007-02-07 12:17 UTC (permalink / raw)
  To: James Bottomley, linux-scsi

From: Swen Schillig <swen@vnet.ibm.com>

Invalid locking order. Kernel hangs after trying to take two locks
which are dependend on each other. Introducing temporary variable
to free requests. Free lock after requests are copied.
    
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
---
 drivers/s390/scsi/zfcp_ext.h |    2 +-
 drivers/s390/scsi/zfcp_fsf.c |   23 ++++++++++-------------
 2 files changed, 11 insertions(+), 14 deletions(-)

Index: linux-2.6/drivers/s390/scsi/zfcp_ext.h
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_ext.h
+++ linux-2.6/drivers/s390/scsi/zfcp_ext.h
@@ -89,7 +89,7 @@ extern int  zfcp_fsf_control_file(struct
 				  u32, u32, struct zfcp_sg_list *);
 extern void zfcp_fsf_start_timer(struct zfcp_fsf_req *, unsigned long);
 extern void zfcp_erp_start_timer(struct zfcp_fsf_req *);
-extern int  zfcp_fsf_req_dismiss_all(struct zfcp_adapter *);
+extern void zfcp_fsf_req_dismiss_all(struct zfcp_adapter *);
 extern int  zfcp_fsf_status_read(struct zfcp_adapter *, int);
 extern int zfcp_fsf_req_create(struct zfcp_adapter *, u32, int, mempool_t *,
 			       unsigned long *, struct zfcp_fsf_req **);
Index: linux-2.6/drivers/s390/scsi/zfcp_fsf.c
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_fsf.c
+++ linux-2.6/drivers/s390/scsi/zfcp_fsf.c
@@ -176,28 +176,25 @@ static void zfcp_fsf_req_dismiss(struct 
 /**
  * zfcp_fsf_req_dismiss_all - dismiss all remaining fsf requests
  */
-int zfcp_fsf_req_dismiss_all(struct zfcp_adapter *adapter)
+void zfcp_fsf_req_dismiss_all(struct zfcp_adapter *adapter)
 {
 	struct zfcp_fsf_req *request, *tmp;
 	unsigned long flags;
+	LIST_HEAD(remove_queue);
 	unsigned int i, counter;
 
 	spin_lock_irqsave(&adapter->req_list_lock, flags);
 	atomic_set(&adapter->reqs_active, 0);
-	for (i=0; i<REQUEST_LIST_SIZE; i++) {
-		if (list_empty(&adapter->req_list[i]))
-			continue;
-
-		counter = 0;
-		list_for_each_entry_safe(request, tmp,
-					 &adapter->req_list[i], list) {
-			zfcp_fsf_req_dismiss(adapter, request, counter);
-			counter++;
-		}
-	}
+	for (i=0; i<REQUEST_LIST_SIZE; i++)
+		list_splice_init(&adapter->req_list[i], &remove_queue);
+
 	spin_unlock_irqrestore(&adapter->req_list_lock, flags);
 
-	return 0;
+	counter = 0;
+	list_for_each_entry_safe(request, tmp, &remove_queue, list) {
+		zfcp_fsf_req_dismiss(adapter, request, counter);
+		counter++;
+	}
 }
 
 /*

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] zfcp: Invalid locking order
  2007-02-07 12:17 [PATCH] zfcp: Invalid locking order Swen Schillig
@ 2007-02-07 16:06 ` Andreas Herrmann
  2007-02-07 16:19   ` Swen Schillig
  2007-02-08 14:40   ` Heiko Carstens
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Herrmann @ 2007-02-07 16:06 UTC (permalink / raw)
  To: Swen Schillig; +Cc: linux-scsi

On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> From: Swen Schillig <swen@vnet.ibm.com>
> 
> Invalid locking order. Kernel hangs after trying to take two locks
> which are dependend on each other. Introducing temporary variable
> to free requests. Free lock after requests are copied.
>     

I am just curious. You didn't mention which locks are causing the dead
lock.

I've glanced through the code and it seems that locking order
of abort_lock and req_list_lock for adapters is inconsistent.
Is that the bug you try to fix?


Regards,

Andreas




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] zfcp: Invalid locking order
  2007-02-07 16:06 ` Andreas Herrmann
@ 2007-02-07 16:19   ` Swen Schillig
  2007-02-08 14:40   ` Heiko Carstens
  1 sibling, 0 replies; 5+ messages in thread
From: Swen Schillig @ 2007-02-07 16:19 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: linux-scsi

Hi Andreas

No, not those. 
We got a possible recursive locking on the adapter->request_queue.queue_lock

Cheers Swen

On Wednesday 07 February 2007 17:06, Andreas Herrmann wrote:
> On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > From: Swen Schillig <swen@vnet.ibm.com>
> > 
> > Invalid locking order. Kernel hangs after trying to take two locks
> > which are dependend on each other. Introducing temporary variable
> > to free requests. Free lock after requests are copied.
> >     
> 
> I am just curious. You didn't mention which locks are causing the dead
> lock.
> 
> I've glanced through the code and it seems that locking order
> of abort_lock and req_list_lock for adapters is inconsistent.
> Is that the bug you try to fix?
> 
> 
> Regards,
> 
> Andreas
> 
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] zfcp: Invalid locking order
  2007-02-07 16:06 ` Andreas Herrmann
  2007-02-07 16:19   ` Swen Schillig
@ 2007-02-08 14:40   ` Heiko Carstens
  2007-02-08 16:18     ` Andreas Herrmann
  1 sibling, 1 reply; 5+ messages in thread
From: Heiko Carstens @ 2007-02-08 14:40 UTC (permalink / raw)
  To: Andreas Herrmann; +Cc: Swen Schillig, linux-scsi

On Wed, Feb 07, 2007 at 05:06:43PM +0100, Andreas Herrmann wrote:
> On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > From: Swen Schillig <swen@vnet.ibm.com>
> > 
> > Invalid locking order. Kernel hangs after trying to take two locks
> > which are dependend on each other. Introducing temporary variable
> > to free requests. Free lock after requests are copied.
> >     
> 
> I am just curious. You didn't mention which locks are causing the dead
> lock.
> 
> I've glanced through the code and it seems that locking order
> of abort_lock and req_list_lock for adapters is inconsistent.
> Is that the bug you try to fix?

It's a possible A-B-B-A deadlock on the erp_lock and req_list_lock
(see output below).  The bug was introduced with
fea9d6c7bcd8ff1d60ff74f27ba483b3820b18a3 and this patch reverts
parts of it.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc6-mm3 #2
-------------------------------------------------------
zfcperp0.0.5207/165 is trying to acquire lock:
 (&adapter->erp_lock){++..}, at: [<802110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68

but task is already holding lock:
 (&adapter->req_list_lock){++..}, at: [<8021bec2>] zfcp_fsf_req_dismiss_all+0x2e/0x124

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&adapter->req_list_lock){++..}:
       [<000522e0>] lock_acquire+0x6c/0x94
       [<002d7dc2>] _spin_lock+0x46/0x58
       [<0020c0ec>] zfcp_erp_strategy_check_action+0x130/0x328
       [<00210008>] zfcp_erp_thread+0x1cc/0x1274
       [<00018dfa>] kernel_thread_starter+0x6/0xc
       [<00018df4>] kernel_thread_starter+0x0/0xc

-> #0 (&adapter->erp_lock){++..}:
       [<000522e0>] lock_acquire+0x6c/0x94
       [<002d819e>] _write_lock+0x46/0x58
       [<002110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68
       [<0021159e>] zfcp_erp_adisc_handler+0x1a6/0x258
       [<002138ac>] zfcp_fsf_send_els_handler+0x140/0x548
       [<00216afe>] zfcp_fsf_req_dispatch+0x1a6/0x1c44
       [<0021b266>] zfcp_fsf_req_complete+0xfa/0xd28
       [<0021bf6c>] zfcp_fsf_req_dismiss_all+0xd8/0x124
       [<0020c91e>] zfcp_erp_adapter_strategy_generic+0x49e/0xb5c
       [<0020e37e>] zfcp_erp_strategy_do_action+0x44a/0x17c4
       [<0020ffe4>] zfcp_erp_thread+0x1a8/0x1274
       [<00018dfa>] kernel_thread_starter+0x6/0xc
       [<00018df4>] kernel_thread_starter+0x0/0xc

other info that might help us debug this:

2 locks held by zfcperp0.0.5207/165:
 #0:  (&adapter->req_list_lock){++..}, at: [<8021bec2>] zfcp_fsf_req_dismiss_all+0x2e/0x124
 #1:  (&zfcp_data.config_lock){..++}, at: [<802110dc>] zfcp_erp_port_forced_reopen+0x2c/0x68

stack backtrace:
048d1af0 00000000 0f8d1bb8 0f8d1b6c 00000000 00000000 00000000 00000000 
       00000002 00367cd6 00367cd6 8001504e 
Call Trace:
([<0000000000014f9c>] show_trace+0x70/0xb4)
 [<000000000001506c>] show_stack+0x8c/0xb0
 [<00000000000150b6>] dump_stack+0x26/0x30
 [<000000000004f0e0>] print_circular_bug_tail+0x8c/0x90
 [<00000000000516a4>] __lock_acquire+0x838/0x1018
 [<00000000000522e0>] lock_acquire+0x6c/0x94
 [<00000000002d819e>] _write_lock+0x46/0x58
 [<00000000002110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68
 [<000000000021159e>] zfcp_erp_adisc_handler+0x1a6/0x258
 [<00000000002138ac>] zfcp_fsf_send_els_handler+0x140/0x548
 [<0000000000216afe>] zfcp_fsf_req_dispatch+0x1a6/0x1c44
 [<000000000021b266>] zfcp_fsf_req_complete+0xfa/0xd28
 [<000000000021bf6c>] zfcp_fsf_req_dismiss_all+0xd8/0x124
 [<000000000020c91e>] zfcp_erp_adapter_strategy_generic+0x49e/0xb5c
 [<000000000020e37e>] zfcp_erp_strategy_do_action+0x44a/0x17c4
 [<000000000020ffe4>] zfcp_erp_thread+0x1a8/0x1274
 [<0000000000018dfa>] kernel_thread_starter+0x6/0xc
 [<0000000000018df4>] kernel_thread_starter+0x0/0xc

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] zfcp: Invalid locking order
  2007-02-08 14:40   ` Heiko Carstens
@ 2007-02-08 16:18     ` Andreas Herrmann
  0 siblings, 0 replies; 5+ messages in thread
From: Andreas Herrmann @ 2007-02-08 16:18 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: Swen Schillig, linux-scsi

On Thu, Feb 08, 2007 at 03:40:03PM +0100, Heiko Carstens wrote:
> On Wed, Feb 07, 2007 at 05:06:43PM +0100, Andreas Herrmann wrote:
> > On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > > From: Swen Schillig <swen@vnet.ibm.com>
> > > 
> > > Invalid locking order. Kernel hangs after trying to take two locks
> > > which are dependend on each other. Introducing temporary variable
> > > to free requests. Free lock after requests are copied.
> > >     
> > 
> > I am just curious. You didn't mention which locks are causing the dead
> > lock.
> > 
> > I've glanced through the code and it seems that locking order
> > of abort_lock and req_list_lock for adapters is inconsistent.
> > Is that the bug you try to fix?
> 
> It's a possible A-B-B-A deadlock on the erp_lock and req_list_lock
> (see output below).  The bug was introduced with
> fea9d6c7bcd8ff1d60ff74f27ba483b3820b18a3 and this patch reverts
> parts of it.
> 

Pretty good catch.

I might be wrong but the patch seems to fix another potential deadlock:


CPU#0
0: zfcp_fsf_req_dimiss_all()
	spin_lock_irqsave(&adapter->req_list_lock, flags);
1: zfcp_fsf_req_dismiss()
2: zfcp_fsf_protstatus_eval()
3: zfcp_fsf_fsfstatus_eval()
4: zfcp_fsf_req_dispatch()
5: zfcp_fsf_send_fcp_command_handler()
6: zfcp_fsf_send_fcp_command_task_handler()
	read_lock_irqsave(&fsf_req->adapter->abort_lock, flags);

CPU#1
0: zfcp_scsi_eh_abort_handler()
	write_lock_irqsave(&adapter->abort_lock, flags);
	spin_lock(&adapter->req_list_lock);

But I currently do not overlook whether zfcp_fsf_req_dimiss_all()
and zfcp_scsi_eh_abort_handler() can run simultaneously for the
same adapter.

However that may be, with Swen's patch this case won't happen.


Regards,

Andreas

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-02-08 16:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-07 12:17 [PATCH] zfcp: Invalid locking order Swen Schillig
2007-02-07 16:06 ` Andreas Herrmann
2007-02-07 16:19   ` Swen Schillig
2007-02-08 14:40   ` Heiko Carstens
2007-02-08 16:18     ` Andreas Herrmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.