* [PATCH] zfcp: Invalid locking order
@ 2007-02-07 12:17 Swen Schillig
2007-02-07 16:06 ` Andreas Herrmann
0 siblings, 1 reply; 5+ messages in thread
From: Swen Schillig @ 2007-02-07 12:17 UTC (permalink / raw)
To: James Bottomley, linux-scsi
From: Swen Schillig <swen@vnet.ibm.com>
Invalid locking order. Kernel hangs after trying to take two locks
which are dependend on each other. Introducing temporary variable
to free requests. Free lock after requests are copied.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
---
drivers/s390/scsi/zfcp_ext.h | 2 +-
drivers/s390/scsi/zfcp_fsf.c | 23 ++++++++++-------------
2 files changed, 11 insertions(+), 14 deletions(-)
Index: linux-2.6/drivers/s390/scsi/zfcp_ext.h
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_ext.h
+++ linux-2.6/drivers/s390/scsi/zfcp_ext.h
@@ -89,7 +89,7 @@ extern int zfcp_fsf_control_file(struct
u32, u32, struct zfcp_sg_list *);
extern void zfcp_fsf_start_timer(struct zfcp_fsf_req *, unsigned long);
extern void zfcp_erp_start_timer(struct zfcp_fsf_req *);
-extern int zfcp_fsf_req_dismiss_all(struct zfcp_adapter *);
+extern void zfcp_fsf_req_dismiss_all(struct zfcp_adapter *);
extern int zfcp_fsf_status_read(struct zfcp_adapter *, int);
extern int zfcp_fsf_req_create(struct zfcp_adapter *, u32, int, mempool_t *,
unsigned long *, struct zfcp_fsf_req **);
Index: linux-2.6/drivers/s390/scsi/zfcp_fsf.c
===================================================================
--- linux-2.6.orig/drivers/s390/scsi/zfcp_fsf.c
+++ linux-2.6/drivers/s390/scsi/zfcp_fsf.c
@@ -176,28 +176,25 @@ static void zfcp_fsf_req_dismiss(struct
/**
* zfcp_fsf_req_dismiss_all - dismiss all remaining fsf requests
*/
-int zfcp_fsf_req_dismiss_all(struct zfcp_adapter *adapter)
+void zfcp_fsf_req_dismiss_all(struct zfcp_adapter *adapter)
{
struct zfcp_fsf_req *request, *tmp;
unsigned long flags;
+ LIST_HEAD(remove_queue);
unsigned int i, counter;
spin_lock_irqsave(&adapter->req_list_lock, flags);
atomic_set(&adapter->reqs_active, 0);
- for (i=0; i<REQUEST_LIST_SIZE; i++) {
- if (list_empty(&adapter->req_list[i]))
- continue;
-
- counter = 0;
- list_for_each_entry_safe(request, tmp,
- &adapter->req_list[i], list) {
- zfcp_fsf_req_dismiss(adapter, request, counter);
- counter++;
- }
- }
+ for (i=0; i<REQUEST_LIST_SIZE; i++)
+ list_splice_init(&adapter->req_list[i], &remove_queue);
+
spin_unlock_irqrestore(&adapter->req_list_lock, flags);
- return 0;
+ counter = 0;
+ list_for_each_entry_safe(request, tmp, &remove_queue, list) {
+ zfcp_fsf_req_dismiss(adapter, request, counter);
+ counter++;
+ }
}
/*
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] zfcp: Invalid locking order
2007-02-07 12:17 [PATCH] zfcp: Invalid locking order Swen Schillig
@ 2007-02-07 16:06 ` Andreas Herrmann
2007-02-07 16:19 ` Swen Schillig
2007-02-08 14:40 ` Heiko Carstens
0 siblings, 2 replies; 5+ messages in thread
From: Andreas Herrmann @ 2007-02-07 16:06 UTC (permalink / raw)
To: Swen Schillig; +Cc: linux-scsi
On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> From: Swen Schillig <swen@vnet.ibm.com>
>
> Invalid locking order. Kernel hangs after trying to take two locks
> which are dependend on each other. Introducing temporary variable
> to free requests. Free lock after requests are copied.
>
I am just curious. You didn't mention which locks are causing the dead
lock.
I've glanced through the code and it seems that locking order
of abort_lock and req_list_lock for adapters is inconsistent.
Is that the bug you try to fix?
Regards,
Andreas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] zfcp: Invalid locking order
2007-02-07 16:06 ` Andreas Herrmann
@ 2007-02-07 16:19 ` Swen Schillig
2007-02-08 14:40 ` Heiko Carstens
1 sibling, 0 replies; 5+ messages in thread
From: Swen Schillig @ 2007-02-07 16:19 UTC (permalink / raw)
To: Andreas Herrmann; +Cc: linux-scsi
Hi Andreas
No, not those.
We got a possible recursive locking on the adapter->request_queue.queue_lock
Cheers Swen
On Wednesday 07 February 2007 17:06, Andreas Herrmann wrote:
> On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > From: Swen Schillig <swen@vnet.ibm.com>
> >
> > Invalid locking order. Kernel hangs after trying to take two locks
> > which are dependend on each other. Introducing temporary variable
> > to free requests. Free lock after requests are copied.
> >
>
> I am just curious. You didn't mention which locks are causing the dead
> lock.
>
> I've glanced through the code and it seems that locking order
> of abort_lock and req_list_lock for adapters is inconsistent.
> Is that the bug you try to fix?
>
>
> Regards,
>
> Andreas
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] zfcp: Invalid locking order
2007-02-07 16:06 ` Andreas Herrmann
2007-02-07 16:19 ` Swen Schillig
@ 2007-02-08 14:40 ` Heiko Carstens
2007-02-08 16:18 ` Andreas Herrmann
1 sibling, 1 reply; 5+ messages in thread
From: Heiko Carstens @ 2007-02-08 14:40 UTC (permalink / raw)
To: Andreas Herrmann; +Cc: Swen Schillig, linux-scsi
On Wed, Feb 07, 2007 at 05:06:43PM +0100, Andreas Herrmann wrote:
> On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > From: Swen Schillig <swen@vnet.ibm.com>
> >
> > Invalid locking order. Kernel hangs after trying to take two locks
> > which are dependend on each other. Introducing temporary variable
> > to free requests. Free lock after requests are copied.
> >
>
> I am just curious. You didn't mention which locks are causing the dead
> lock.
>
> I've glanced through the code and it seems that locking order
> of abort_lock and req_list_lock for adapters is inconsistent.
> Is that the bug you try to fix?
It's a possible A-B-B-A deadlock on the erp_lock and req_list_lock
(see output below). The bug was introduced with
fea9d6c7bcd8ff1d60ff74f27ba483b3820b18a3 and this patch reverts
parts of it.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.20-rc6-mm3 #2
-------------------------------------------------------
zfcperp0.0.5207/165 is trying to acquire lock:
(&adapter->erp_lock){++..}, at: [<802110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68
but task is already holding lock:
(&adapter->req_list_lock){++..}, at: [<8021bec2>] zfcp_fsf_req_dismiss_all+0x2e/0x124
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&adapter->req_list_lock){++..}:
[<000522e0>] lock_acquire+0x6c/0x94
[<002d7dc2>] _spin_lock+0x46/0x58
[<0020c0ec>] zfcp_erp_strategy_check_action+0x130/0x328
[<00210008>] zfcp_erp_thread+0x1cc/0x1274
[<00018dfa>] kernel_thread_starter+0x6/0xc
[<00018df4>] kernel_thread_starter+0x0/0xc
-> #0 (&adapter->erp_lock){++..}:
[<000522e0>] lock_acquire+0x6c/0x94
[<002d819e>] _write_lock+0x46/0x58
[<002110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68
[<0021159e>] zfcp_erp_adisc_handler+0x1a6/0x258
[<002138ac>] zfcp_fsf_send_els_handler+0x140/0x548
[<00216afe>] zfcp_fsf_req_dispatch+0x1a6/0x1c44
[<0021b266>] zfcp_fsf_req_complete+0xfa/0xd28
[<0021bf6c>] zfcp_fsf_req_dismiss_all+0xd8/0x124
[<0020c91e>] zfcp_erp_adapter_strategy_generic+0x49e/0xb5c
[<0020e37e>] zfcp_erp_strategy_do_action+0x44a/0x17c4
[<0020ffe4>] zfcp_erp_thread+0x1a8/0x1274
[<00018dfa>] kernel_thread_starter+0x6/0xc
[<00018df4>] kernel_thread_starter+0x0/0xc
other info that might help us debug this:
2 locks held by zfcperp0.0.5207/165:
#0: (&adapter->req_list_lock){++..}, at: [<8021bec2>] zfcp_fsf_req_dismiss_all+0x2e/0x124
#1: (&zfcp_data.config_lock){..++}, at: [<802110dc>] zfcp_erp_port_forced_reopen+0x2c/0x68
stack backtrace:
048d1af0 00000000 0f8d1bb8 0f8d1b6c 00000000 00000000 00000000 00000000
00000002 00367cd6 00367cd6 8001504e
Call Trace:
([<0000000000014f9c>] show_trace+0x70/0xb4)
[<000000000001506c>] show_stack+0x8c/0xb0
[<00000000000150b6>] dump_stack+0x26/0x30
[<000000000004f0e0>] print_circular_bug_tail+0x8c/0x90
[<00000000000516a4>] __lock_acquire+0x838/0x1018
[<00000000000522e0>] lock_acquire+0x6c/0x94
[<00000000002d819e>] _write_lock+0x46/0x58
[<00000000002110ea>] zfcp_erp_port_forced_reopen+0x3a/0x68
[<000000000021159e>] zfcp_erp_adisc_handler+0x1a6/0x258
[<00000000002138ac>] zfcp_fsf_send_els_handler+0x140/0x548
[<0000000000216afe>] zfcp_fsf_req_dispatch+0x1a6/0x1c44
[<000000000021b266>] zfcp_fsf_req_complete+0xfa/0xd28
[<000000000021bf6c>] zfcp_fsf_req_dismiss_all+0xd8/0x124
[<000000000020c91e>] zfcp_erp_adapter_strategy_generic+0x49e/0xb5c
[<000000000020e37e>] zfcp_erp_strategy_do_action+0x44a/0x17c4
[<000000000020ffe4>] zfcp_erp_thread+0x1a8/0x1274
[<0000000000018dfa>] kernel_thread_starter+0x6/0xc
[<0000000000018df4>] kernel_thread_starter+0x0/0xc
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] zfcp: Invalid locking order
2007-02-08 14:40 ` Heiko Carstens
@ 2007-02-08 16:18 ` Andreas Herrmann
0 siblings, 0 replies; 5+ messages in thread
From: Andreas Herrmann @ 2007-02-08 16:18 UTC (permalink / raw)
To: Heiko Carstens; +Cc: Swen Schillig, linux-scsi
On Thu, Feb 08, 2007 at 03:40:03PM +0100, Heiko Carstens wrote:
> On Wed, Feb 07, 2007 at 05:06:43PM +0100, Andreas Herrmann wrote:
> > On Wed, Feb 07, 2007 at 01:17:57PM +0100, Swen Schillig wrote:
> > > From: Swen Schillig <swen@vnet.ibm.com>
> > >
> > > Invalid locking order. Kernel hangs after trying to take two locks
> > > which are dependend on each other. Introducing temporary variable
> > > to free requests. Free lock after requests are copied.
> > >
> >
> > I am just curious. You didn't mention which locks are causing the dead
> > lock.
> >
> > I've glanced through the code and it seems that locking order
> > of abort_lock and req_list_lock for adapters is inconsistent.
> > Is that the bug you try to fix?
>
> It's a possible A-B-B-A deadlock on the erp_lock and req_list_lock
> (see output below). The bug was introduced with
> fea9d6c7bcd8ff1d60ff74f27ba483b3820b18a3 and this patch reverts
> parts of it.
>
Pretty good catch.
I might be wrong but the patch seems to fix another potential deadlock:
CPU#0
0: zfcp_fsf_req_dimiss_all()
spin_lock_irqsave(&adapter->req_list_lock, flags);
1: zfcp_fsf_req_dismiss()
2: zfcp_fsf_protstatus_eval()
3: zfcp_fsf_fsfstatus_eval()
4: zfcp_fsf_req_dispatch()
5: zfcp_fsf_send_fcp_command_handler()
6: zfcp_fsf_send_fcp_command_task_handler()
read_lock_irqsave(&fsf_req->adapter->abort_lock, flags);
CPU#1
0: zfcp_scsi_eh_abort_handler()
write_lock_irqsave(&adapter->abort_lock, flags);
spin_lock(&adapter->req_list_lock);
But I currently do not overlook whether zfcp_fsf_req_dimiss_all()
and zfcp_scsi_eh_abort_handler() can run simultaneously for the
same adapter.
However that may be, with Swen's patch this case won't happen.
Regards,
Andreas
--
AMD Saxony, Dresden, Germany
Operating System Research Center
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2007-02-08 16:19 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-02-07 12:17 [PATCH] zfcp: Invalid locking order Swen Schillig
2007-02-07 16:06 ` Andreas Herrmann
2007-02-07 16:19 ` Swen Schillig
2007-02-08 14:40 ` Heiko Carstens
2007-02-08 16:18 ` Andreas Herrmann
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.