linux-s390.vger.kernel.org archive mirror
* [patch 0/5] zfcp fixes for 2.6.37-rc
@ 2010-11-17 13:23 Christof Schmitt
  2010-11-17 13:23 ` [patch 1/5] zfcp: Fix common FCP request reception Christof Schmitt
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens

James,

here is the series of zfcp fixes for the current 2.6.37 development
cycle. The patches apply on the current scsi-rc-fixes git tree.

--
Christof

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [patch 1/5] zfcp: Fix common FCP request reception
  2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
@ 2010-11-17 13:23 ` Christof Schmitt
  2010-11-17 13:23 ` [patch 2/5] zfcp: Correct false abort data assignment Christof Schmitt
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Swen Schillig, Christof Schmitt

[-- Attachment #1: 700-zfcp-request-reception.diff --]
[-- Type: text/plain, Size: 3885 bytes --]

From: Swen Schillig <swen@vnet.ibm.com>

The reception of a common FCP request should only be evaluated if the
corresponding SCSI request data is available. Therefore put the
information under the lock protection and verify the existence before
processing.  This fixes the following kernel panic.

Unable to handle kernel pointer dereference at virtual kernel address 0000000180000000
Oops: 003b [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 0 Not tainted 2.6.35.7-45.x.20101007-s390xdefault #1
Process blast (pid: 9711, task: 00000000a3be8e40, ksp: 00000000b221bac0)
Krnl PSW : 0704300180000000 0000000000489878 (zfcp_fsf_fcp_handler_common+0x4c/0x3a0)
           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
Krnl GPRS: 00000000b663c1b8 0000000180000000 000000007ab5bdf0 0000000000000000
           00000000b0ccd800 0000000000000018 07000000a3be8e78 00000000b5d3e600
           000000007ab5bdf0 0000000000000066 00000000b72137f0 00000000b72137f0
           0000000000000000 00000000005a8178 00000000bdf37a60 00000000bdf379f0
Krnl Code: 0000000000489866: e3c030000004       lg      %r12,0(%r3)
           000000000048986c: e310c0000004       lg      %r1,0(%r12)
           0000000000489872: e31011e00004       lg      %r1,480(%r1)
          >0000000000489878: 581011ec           l       %r1,492(%r1)
           000000000048987c: a774001c           brc     7,4898b4
           0000000000489880: b91400b1           lgfr    %r11,%r1
           0000000000489884: 5810405c           l       %r1,92(%r4)
           0000000000489888: 5510d00c           cl      %r1,12(%r13)
Call Trace:
([<000000000010d344>] debug_event_common+0x22c/0x244)
 [<000000000048a0b4>] zfcp_fsf_fcp_cmnd_handler+0x2c/0x3b4
 [<000000000048b5b6>] zfcp_fsf_req_complete+0x1b6/0x9dc
 [<000000000048bede>] zfcp_fsf_reqid_check+0x102/0x138
 [<000000000048e478>] zfcp_qdio_int_resp+0x70/0x110
 [<000000000044a1ec>] qdio_kick_handler+0xb0/0x19c
 [<000000000044c228>] __tiqdio_inbound_processing+0x30c/0xebc
 [<000000000014a5fc>] tasklet_action+0x1b4/0x1e8
 [<000000000014b676>] __do_softirq+0x106/0x1cc
 [<000000000010d91a>] do_softirq+0xe6/0xec
 [<000000000014b0c8>] irq_exit+0xd4/0xd8
 [<00000000004307ec>] do_IRQ+0x7c0/0xf54
 [<0000000000114d28>] io_return+0x0/0x16
 [<000000000055fef0>] sub_preempt_count+0x50/0xe4
([<00000000b1f873c0>] 0xb1f873c0)
 [<000000000055e25a>] _raw_spin_unlock+0x46/0x74
 [<0000000000241c40>] __d_lookup+0x288/0x2c8
 [<000000000023502c>] do_lookup+0x7c/0x25c
 [<0000000000237fa8>] link_path_walk+0x5e4/0xe2c
 [<0000000000238a00>] path_walk+0x98/0x148
 [<0000000000238c98>] do_path_lookup+0x74/0xc0
 [<000000000023989c>] user_path_at+0x64/0xa4
 [<000000000022e366>] vfs_fstatat+0x4e/0xb0
 [<000000000022e4d6>] SyS_newstat+0x2e/0x54
 [<00000000001146de>] sysc_noemu+0x10/0x16
 [<0000020000153456>] 0x20000153456
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<000000000048a0ae>] zfcp_fsf_fcp_cmnd_handler+0x26/0x3b4

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_fsf.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_fsf.c linux-2.6-patched/drivers/s390/scsi/zfcp_fsf.c
--- linux-2.6/drivers/s390/scsi/zfcp_fsf.c	2010-11-17 10:03:42.000000000 +0100
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_fsf.c	2010-11-17 10:04:18.000000000 +0100
@@ -2069,8 +2069,6 @@ static void zfcp_fsf_fcp_cmnd_handler(st
 	struct fcp_resp_with_ext *fcp_rsp;
 	unsigned long flags;
 
-	zfcp_fsf_fcp_handler_common(req);
-
 	read_lock_irqsave(&req->adapter->abort_lock, flags);
 
 	scpnt = req->data;
@@ -2079,6 +2077,8 @@ static void zfcp_fsf_fcp_cmnd_handler(st
 		return;
 	}
 
+	zfcp_fsf_fcp_handler_common(req);
+
 	if (unlikely(req->status & ZFCP_STATUS_FSFREQ_ERROR)) {
 		set_host_byte(scpnt, DID_TRANSPORT_DISRUPTED);
 		goto skip_fsfstatus;

^ permalink raw reply	[flat|nested] 11+ messages in thread
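
The reordering in patch 1/5 can be pictured with a small user-space model. This is only a sketch under assumptions: the model_* types and functions are hypothetical stand-ins, not the zfcp structures, and the lock is reduced to a counter. It shows the common handler running only after the request data has been checked under the lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures; not the real zfcp types. */
struct model_cmd {
	int result;
};

struct model_req {
	struct model_cmd *data; /* NULL once the command is no longer available */
	int lock_depth;         /* stand-in for abort_lock: >0 means read-held */
	int common_calls;       /* counts runs of the common handler */
};

/* Dereferences req->data, so it may only run while data is known to be valid. */
static void model_handler_common(struct model_req *req)
{
	assert(req->lock_depth > 0); /* caller must hold the lock */
	req->data->result = 1;
	req->common_calls++;
}

/* Returns 1 if the response was processed, 0 if it was dropped. */
static int model_cmnd_handler(struct model_req *req)
{
	int handled = 0;

	req->lock_depth++; /* read_lock_irqsave() in the driver */
	if (req->data != NULL) {
		/* Only now is it safe to evaluate the common part. */
		model_handler_common(req);
		handled = 1;
	}
	req->lock_depth--; /* read_unlock_irqrestore() in the driver */
	return handled;
}
```

Calling model_cmnd_handler() on a request whose data pointer is NULL returns 0 without touching the command, which is the case the original ordering did not guard against.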

* [patch 2/5] zfcp: Correct false abort data assignment.
  2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
  2010-11-17 13:23 ` [patch 1/5] zfcp: Fix common FCP request reception Christof Schmitt
@ 2010-11-17 13:23 ` Christof Schmitt
  2010-11-17 13:23 ` [patch 3/5] zfcp: No ERP escalation on gpn_ft eval Christof Schmitt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Swen Schillig, Christof Schmitt

[-- Attachment #1: 706-zfcp-false-abort.diff --]
[-- Type: text/plain, Size: 872 bytes --]

From: Swen Schillig <swen@vnet.ibm.com>

The request data assignment between the FSF abort initiator and its
corresponding handler is inconsistent, which leads to unpredictable
behaviour, e.g. a kernel panic.  Fix this by assigning sdev instead of
zfcp_sdev to req->data, which is the value the handler expects.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_fsf.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -851,7 +851,7 @@ struct zfcp_fsf_req *zfcp_fsf_abort_fcp_
 
 	zfcp_qdio_set_sbale_last(qdio, &req->qdio_req);
 
-	req->data = zfcp_sdev;
+	req->data = sdev;
 	req->handler = zfcp_fsf_abort_fcp_command_handler;
 	req->qtcb->header.lun_handle = zfcp_sdev->lun_handle;
 	req->qtcb->header.port_handle = zfcp_sdev->port->handle;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [patch 3/5] zfcp: No ERP escalation on gpn_ft eval
  2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
  2010-11-17 13:23 ` [patch 1/5] zfcp: Fix common FCP request reception Christof Schmitt
  2010-11-17 13:23 ` [patch 2/5] zfcp: Correct false abort data assignment Christof Schmitt
@ 2010-11-17 13:23 ` Christof Schmitt
  2010-11-17 13:23 ` [patch 4/5] zfcp: Prevent usage w/o holding a reference Christof Schmitt
  2010-11-17 13:23 ` [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock Christof Schmitt
  4 siblings, 0 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Swen Schillig, Christof Schmitt

[-- Attachment #1: 708-zfcp-no-escalation.diff --]
[-- Type: text/plain, Size: 1257 bytes --]

From: Swen Schillig <swen@vnet.ibm.com>

If the evaluation of a GPN_FT request wants to remove an invalid port
from the system, the zfcp_erp_port_shutdown function is triggered.
Depending on the system status, a superior action (e.g. adapter reopen)
may be required instead. This can lead to an invalid memory access to
the port struct, which might already be freed, since the superior action
does not hold a reference to the port that triggered this ERP action.
Do not escalate to the superior action when the port status has
ZFCP_STATUS_COMMON_NOESC set.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_erp.c |    2 ++
 1 file changed, 2 insertions(+)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_erp.c linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c
--- linux-2.6/drivers/s390/scsi/zfcp_erp.c	2010-11-17 10:04:26.000000000 +0100
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c	2010-11-17 10:04:26.000000000 +0100
@@ -156,6 +156,8 @@ static int zfcp_erp_required_act(int wan
 		if (!(a_status & ZFCP_STATUS_COMMON_RUNNING) ||
 		      a_status & ZFCP_STATUS_COMMON_ERP_FAILED)
 			return 0;
+		if (p_status & ZFCP_STATUS_COMMON_NOESC)
+			return need;
 		if (!(a_status & ZFCP_STATUS_COMMON_UNBLOCKED))
 			need = ZFCP_ERP_ACTION_REOPEN_ADAPTER;
 		/* fall through */

^ permalink raw reply	[flat|nested] 11+ messages in thread
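
The decision added by patch 3/5 can be sketched as a stand-alone function. The flag and action values below are hypothetical and chosen only for this sketch (the real ZFCP_STATUS_* and ZFCP_ERP_ACTION_* constants differ), and only the port branch visible in the hunk is modelled.

```c
#include <assert.h>

/* Hypothetical status flags; the real ZFCP_STATUS_COMMON_* values differ. */
enum {
	ST_RUNNING    = 1,
	ST_ERP_FAILED = 2,
	ST_UNBLOCKED  = 4,
	ST_NOESC      = 8
};

/* Hypothetical action codes for this sketch. */
enum {
	ACT_NONE           = 0,
	ACT_REOPEN_PORT    = 1,
	ACT_REOPEN_ADAPTER = 2
};

/*
 * Model of the port branch: decide which action is needed, given the
 * wanted action and the adapter (a_status) and port (p_status) flags.
 */
static int model_required_act(int want, unsigned int a_status,
			      unsigned int p_status)
{
	int need = want;

	assert(want != ACT_NONE);

	/* Nothing to do if the adapter is stopped or its recovery failed. */
	if (!(a_status & ST_RUNNING) || (a_status & ST_ERP_FAILED))
		return ACT_NONE;
	/* New check: a port marked "no escalation" keeps the wanted action. */
	if (p_status & ST_NOESC)
		return need;
	/* Otherwise a blocked adapter escalates to an adapter reopen. */
	if (!(a_status & ST_UNBLOCKED))
		need = ACT_REOPEN_ADAPTER;
	return need;
}
```

With a running but blocked adapter, a port without ST_NOESC is escalated to ACT_REOPEN_ADAPTER, while a port with ST_NOESC keeps the action that was requested.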

* [patch 4/5] zfcp: Prevent usage w/o holding a reference
  2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
                   ` (2 preceding siblings ...)
  2010-11-17 13:23 ` [patch 3/5] zfcp: No ERP escalation on gpn_ft eval Christof Schmitt
@ 2010-11-17 13:23 ` Christof Schmitt
  2010-11-17 13:23 ` [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock Christof Schmitt
  4 siblings, 0 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Swen Schillig, Christof Schmitt

[-- Attachment #1: 709-zfcp-usage-reference.diff --]
[-- Type: text/plain, Size: 2264 bytes --]

From: Swen Schillig <swen@vnet.ibm.com>

The ERP action got values assigned for which no reference was taken.
This can lead to an unpredictable race condition.  Fix this by only
assigning the values which are required and for which a reference was
taken or is held implicitly.

Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_erp.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff -urpN linux-2.6/drivers/s390/scsi/zfcp_erp.c linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c
--- linux-2.6/drivers/s390/scsi/zfcp_erp.c	2010-11-17 10:04:27.000000000 +0100
+++ linux-2.6-patched/drivers/s390/scsi/zfcp_erp.c	2010-11-17 10:04:27.000000000 +0100
@@ -190,6 +190,9 @@ static struct zfcp_erp_action *zfcp_erp_
 		atomic_set_mask(ZFCP_STATUS_COMMON_ERP_INUSE,
 				&zfcp_sdev->status);
 		erp_action = &zfcp_sdev->erp_action;
+		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
+		erp_action->port = port;
+		erp_action->sdev = sdev;
 		if (!(atomic_read(&zfcp_sdev->status) &
 		      ZFCP_STATUS_COMMON_RUNNING))
 			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
@@ -202,6 +205,8 @@ static struct zfcp_erp_action *zfcp_erp_
 		zfcp_erp_action_dismiss_port(port);
 		atomic_set_mask(ZFCP_STATUS_COMMON_ERP_INUSE, &port->status);
 		erp_action = &port->erp_action;
+		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
+		erp_action->port = port;
 		if (!(atomic_read(&port->status) & ZFCP_STATUS_COMMON_RUNNING))
 			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
 		break;
@@ -211,6 +216,7 @@ static struct zfcp_erp_action *zfcp_erp_
 		zfcp_erp_action_dismiss_adapter(adapter);
 		atomic_set_mask(ZFCP_STATUS_COMMON_ERP_INUSE, &adapter->status);
 		erp_action = &adapter->erp_action;
+		memset(erp_action, 0, sizeof(struct zfcp_erp_action));
 		if (!(atomic_read(&adapter->status) &
 		      ZFCP_STATUS_COMMON_RUNNING))
 			act_status |= ZFCP_STATUS_ERP_CLOSE_ONLY;
@@ -220,10 +226,7 @@ static struct zfcp_erp_action *zfcp_erp_
 		return NULL;
 	}
 
-	memset(erp_action, 0, sizeof(struct zfcp_erp_action));
 	erp_action->adapter = adapter;
-	erp_action->port = port;
-	erp_action->sdev = sdev;
 	erp_action->action = need;
 	erp_action->status = act_status;
 

^ permalink raw reply	[flat|nested] 11+ messages in thread
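
The effect of patch 4/5 on the ERP action fields can be modelled in user space. This is a simplified sketch: the model_* types and the NEED_* names are hypothetical, and the action struct is passed in directly rather than being embedded in the sdev, port or adapter as in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the adapter, port and SCSI device objects. */
struct model_adapter { int id; };
struct model_port { int id; };
struct model_sdev { int id; };

/* Hypothetical names for the three recovery levels. */
enum model_need { NEED_REOPEN_LUN, NEED_REOPEN_PORT, NEED_REOPEN_ADAPTER };

struct model_erp_action {
	struct model_adapter *adapter;
	struct model_port *port;
	struct model_sdev *sdev;
	enum model_need action;
};

/*
 * Fill in only the pointers that the chosen recovery level needs; the
 * others stay NULL, so the action never carries a pointer to an object
 * it holds no reference for.
 */
static void model_setup_action(struct model_erp_action *act,
			       enum model_need need,
			       struct model_adapter *adapter,
			       struct model_port *port,
			       struct model_sdev *sdev)
{
	assert(act != NULL);
	memset(act, 0, sizeof(*act));

	switch (need) {
	case NEED_REOPEN_LUN:
		act->port = port;
		act->sdev = sdev;
		break;
	case NEED_REOPEN_PORT:
		act->port = port;
		break;
	case NEED_REOPEN_ADAPTER:
		break;
	}

	act->adapter = adapter;
	act->action = need;
}
```

Even if the caller passes port and sdev pointers, an adapter-level action ends up with both fields NULL, which mirrors the per-case assignments in the hunks above.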

* [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
                   ` (3 preceding siblings ...)
  2010-11-17 13:23 ` [patch 4/5] zfcp: Prevent usage w/o holding a reference Christof Schmitt
@ 2010-11-17 13:23 ` Christof Schmitt
  2010-11-17 13:43   ` Boaz Harrosh
  4 siblings, 1 reply; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 13:23 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Christof Schmitt

[-- Attachment #1: linux-2.6.36-zfcp-fcp-cmd-lock.patch --]
[-- Type: text/plain, Size: 4301 bytes --]

From: Christof Schmitt <christof.schmitt@de.ibm.com>

Interrupting the connection to the FCP channel while I/O requests are
being issued can lead to the following deadlock: scsi_dispatch_cmd
already holds the host_lock when the recovery trigger tries to acquire
the host_lock again while iterating through the scsi_devices.

 INFO: lockdep is turned off.
 BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
 CPU: 1 Not tainted 2.6.35.7SWEN2 #2
 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
        0000000074393640 00000000743935c0 0000000000000002 0000000000000000
        0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
        0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
        000000000000000d 040000000000000c 0000000074393628 0000000000000000
        0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
 Call Trace:
 ([<0000000000100a32>] show_trace+0xee/0x144)
  [<00000000003be202>] do_raw_spin_lock+0x112/0x178
  [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
  [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
  [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
  [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
  [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
  [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
  [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
  [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
  [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
  [<00000000003828ae>] __blk_run_queue+0x86/0x140
  [<000000000037f742>] elv_insert+0x11a/0x208
  [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
  [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
  [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
  [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
  [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
  [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
  [<00000000001d1272>] sync_page+0x76/0x9c
  [<00000000001d12ba>] sync_page_killable+0x22/0x60
  [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
  [<00000000001d1140>] __lock_page_killable+0x78/0x84
  [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
  [<0000000000228ec0>] do_sync_read+0xc8/0x12c
  [<0000000000229edc>] vfs_read+0xac/0x1ac
  [<000000000022a0d8>] SyS_read+0x58/0xa8
  [<00000000001146de>] sysc_noemu+0x10/0x16
  [<00000200000493c4>] 0x200000493c4
 INFO: lockdep is turned off.

Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
interrupts when acquiring the req_q_lock.

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---

 drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
 drivers/s390/scsi/zfcp_scsi.c |    3 +++
 2 files changed, 6 insertions(+), 2 deletions(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
 	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
 	struct zfcp_qdio *qdio = adapter->qdio;
 	struct fsf_qtcb_bottom_io *io;
+	unsigned long flags;
 
 	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
 		       ZFCP_STATUS_COMMON_UNBLOCKED)))
 		return -EBUSY;
 
-	spin_lock(&qdio->req_q_lock);
+	spin_lock_irqsave(&qdio->req_q_lock, flags);
 	if (atomic_read(&qdio->req_q_free) <= 0) {
 		atomic_inc(&qdio->req_q_full);
 		goto out;
@@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
 	zfcp_fsf_req_free(req);
 	scsi_cmnd->host_scribble = NULL;
 out:
-	spin_unlock(&qdio->req_q_lock);
+	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
 	return retval;
 }
 
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st
 	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
 	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
 	int    status, scsi_result, ret;
+	struct scsi_device *sdev = scpnt->device;
 
 	/* reset the status for this request */
 	scpnt->result = 0;
@@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st
 		return 0;
 	}
 
+	spin_unlock_irq(sdev->host->host_lock);
 	ret = zfcp_fsf_fcp_cmnd(scpnt);
+	spin_lock_irq(sdev->host->host_lock);
 	if (unlikely(ret == -EBUSY))
 		return SCSI_MLQUEUE_DEVICE_BUSY;
 	else if (unlikely(ret < 0))

^ permalink raw reply	[flat|nested] 11+ messages in thread
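
The self-deadlock described in patch 5/5, and the effect of dropping the lock around the send, can be modelled with a toy non-recursive lock. This is a sketch under assumptions: all model_* names are hypothetical, and where a real spinlock would spin forever the model returns -1 instead.

```c
#include <assert.h>

/* Toy non-recursive lock: a second acquire by the holder can never succeed. */
struct model_lock {
	int held;
};

/* Returns 0 on success, -1 where a real spinlock would spin forever. */
static int model_lock_acquire(struct model_lock *l)
{
	if (l->held)
		return -1;
	l->held = 1;
	return 0;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Stand-in for the recovery trigger that iterates devices under host_lock. */
static int model_recovery_trigger(struct model_lock *host_lock)
{
	if (model_lock_acquire(host_lock) != 0)
		return -1;
	model_lock_release(host_lock);
	return 0;
}

/* Old behaviour: the send path runs with host_lock still held by the caller. */
static int model_queue_locked(struct model_lock *host_lock)
{
	return model_recovery_trigger(host_lock);
}

/* Patched behaviour: drop host_lock around the send and re-take it afterwards. */
static int model_queue_unlocked(struct model_lock *host_lock)
{
	int ret;

	model_lock_release(host_lock);
	ret = model_recovery_trigger(host_lock);
	if (model_lock_acquire(host_lock) != 0)
		return -1;
	return ret;
}
```

With the lock held on entry, as the midlayer did before calling the locked queuecommand variant, model_queue_locked() reports the lockup while model_queue_unlocked() completes and returns with the lock held again.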

* Re: [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-17 13:23 ` [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock Christof Schmitt
@ 2010-11-17 13:43   ` Boaz Harrosh
  2010-11-17 14:10     ` Christof Schmitt
  0 siblings, 1 reply; 11+ messages in thread
From: Boaz Harrosh @ 2010-11-17 13:43 UTC (permalink / raw)
  To: Christof Schmitt, Jeff Garzik
  Cc: James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens

On 11/17/2010 03:23 PM, Christof Schmitt wrote:
> From: Christof Schmitt <christof.schmitt@de.ibm.com>
> 
> Interrupting the connection to the FCP channel while I/O requests are
> being issued can lead to the following deadlock: scsi_dispatch_cmd
> already holds the host_lock when the recovery trigger tries to acquire
> the host_lock again while iterating through the scsi_devices.
> 
>  INFO: lockdep is turned off.
>  BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
>  CPU: 1 Not tainted 2.6.35.7SWEN2 #2
>  Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
>         0000000074393640 00000000743935c0 0000000000000002 0000000000000000
>         0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
>         0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
>         000000000000000d 040000000000000c 0000000074393628 0000000000000000
>         0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
>  Call Trace:
>  ([<0000000000100a32>] show_trace+0xee/0x144)
>   [<00000000003be202>] do_raw_spin_lock+0x112/0x178
>   [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
>   [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
>   [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
>   [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
>   [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
>   [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
>   [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
>   [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
>   [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
>   [<00000000003828ae>] __blk_run_queue+0x86/0x140
>   [<000000000037f742>] elv_insert+0x11a/0x208
>   [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
>   [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
>   [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
>   [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
>   [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
>   [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
>   [<00000000001d1272>] sync_page+0x76/0x9c
>   [<00000000001d12ba>] sync_page_killable+0x22/0x60
>   [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
>   [<00000000001d1140>] __lock_page_killable+0x78/0x84
>   [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
>   [<0000000000228ec0>] do_sync_read+0xc8/0x12c
>   [<0000000000229edc>] vfs_read+0xac/0x1ac
>   [<000000000022a0d8>] SyS_read+0x58/0xa8
>   [<00000000001146de>] sysc_noemu+0x10/0x16
>   [<00000200000493c4>] 0x200000493c4
>  INFO: lockdep is turned off.
> 
> Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
> interrupts when acquiring the req_q_lock.
> 
> Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
> ---
> 
>  drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
>  drivers/s390/scsi/zfcp_scsi.c |    3 +++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> --- a/drivers/s390/scsi/zfcp_fsf.c
> +++ b/drivers/s390/scsi/zfcp_fsf.c
> @@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>  	struct zfcp_qdio *qdio = adapter->qdio;
>  	struct fsf_qtcb_bottom_io *io;
> +	unsigned long flags;
>  
>  	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
>  		       ZFCP_STATUS_COMMON_UNBLOCKED)))
>  		return -EBUSY;
>  
> -	spin_lock(&qdio->req_q_lock);
> +	spin_lock_irqsave(&qdio->req_q_lock, flags);
>  	if (atomic_read(&qdio->req_q_free) <= 0) {
>  		atomic_inc(&qdio->req_q_full);
>  		goto out;
> @@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
>  	zfcp_fsf_req_free(req);
>  	scsi_cmnd->host_scribble = NULL;
>  out:
> -	spin_unlock(&qdio->req_q_lock);
> +	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
>  	return retval;
>  }
>  
> --- a/drivers/s390/scsi/zfcp_scsi.c
> +++ b/drivers/s390/scsi/zfcp_scsi.c
> @@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st
>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>  	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
>  	int    status, scsi_result, ret;
> +	struct scsi_device *sdev = scpnt->device;
>  
>  	/* reset the status for this request */
>  	scpnt->result = 0;
> @@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st
>  		return 0;
>  	}
>  
> +	spin_unlock_irq(sdev->host->host_lock);
>  	ret = zfcp_fsf_fcp_cmnd(scpnt);
> +	spin_lock_irq(sdev->host->host_lock);

CCing Jeff

That lock is taken in your own driver, three lines below, at the
DEF_SCSI_QCMD macro invocation.

Please do the proper host-lock removal the first time you are
touching this code. (See the example patch to libata by Jeff Garzik.)

Boaz

>  	if (unlikely(ret == -EBUSY))
>  		return SCSI_MLQUEUE_DEVICE_BUSY;
>  	else if (unlikely(ret < 0))

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-17 13:43   ` Boaz Harrosh
@ 2010-11-17 14:10     ` Christof Schmitt
  2010-11-17 14:35       ` Boaz Harrosh
  0 siblings, 1 reply; 11+ messages in thread
From: Christof Schmitt @ 2010-11-17 14:10 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jeff Garzik, James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens

On Wed, Nov 17, 2010 at 03:43:34PM +0200, Boaz Harrosh wrote:
> On 11/17/2010 03:23 PM, Christof Schmitt wrote:
> > From: Christof Schmitt <christof.schmitt@de.ibm.com>
> > 
> > Interrupting the connection to the FCP channel while I/O requests are
> > being issued can lead to the following deadlock: scsi_dispatch_cmd
> > already holds the host_lock when the recovery trigger tries to acquire
> > the host_lock again while iterating through the scsi_devices.
> > 
> >  INFO: lockdep is turned off.
> >  BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
> >  CPU: 1 Not tainted 2.6.35.7SWEN2 #2
> >  Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
> >         0000000074393640 00000000743935c0 0000000000000002 0000000000000000
> >         0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
> >         0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
> >         000000000000000d 040000000000000c 0000000074393628 0000000000000000
> >         0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
> >  Call Trace:
> >  ([<0000000000100a32>] show_trace+0xee/0x144)
> >   [<00000000003be202>] do_raw_spin_lock+0x112/0x178
> >   [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
> >   [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
> >   [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
> >   [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
> >   [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
> >   [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
> >   [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
> >   [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
> >   [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
> >   [<00000000003828ae>] __blk_run_queue+0x86/0x140
> >   [<000000000037f742>] elv_insert+0x11a/0x208
> >   [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
> >   [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
> >   [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
> >   [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
> >   [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
> >   [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
> >   [<00000000001d1272>] sync_page+0x76/0x9c
> >   [<00000000001d12ba>] sync_page_killable+0x22/0x60
> >   [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
> >   [<00000000001d1140>] __lock_page_killable+0x78/0x84
> >   [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
> >   [<0000000000228ec0>] do_sync_read+0xc8/0x12c
> >   [<0000000000229edc>] vfs_read+0xac/0x1ac
> >   [<000000000022a0d8>] SyS_read+0x58/0xa8
> >   [<00000000001146de>] sysc_noemu+0x10/0x16
> >   [<00000200000493c4>] 0x200000493c4
> >  INFO: lockdep is turned off.
> > 
> > Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
> > interrupts when acquiring the req_q_lock.
> > 
> > Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
> > Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
> > ---
> > 
> >  drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
> >  drivers/s390/scsi/zfcp_scsi.c |    3 +++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > --- a/drivers/s390/scsi/zfcp_fsf.c
> > +++ b/drivers/s390/scsi/zfcp_fsf.c
> > @@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
> >  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> >  	struct zfcp_qdio *qdio = adapter->qdio;
> >  	struct fsf_qtcb_bottom_io *io;
> > +	unsigned long flags;
> >  
> >  	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
> >  		       ZFCP_STATUS_COMMON_UNBLOCKED)))
> >  		return -EBUSY;
> >  
> > -	spin_lock(&qdio->req_q_lock);
> > +	spin_lock_irqsave(&qdio->req_q_lock, flags);
> >  	if (atomic_read(&qdio->req_q_free) <= 0) {
> >  		atomic_inc(&qdio->req_q_full);
> >  		goto out;
> > @@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
> >  	zfcp_fsf_req_free(req);
> >  	scsi_cmnd->host_scribble = NULL;
> >  out:
> > -	spin_unlock(&qdio->req_q_lock);
> > +	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
> >  	return retval;
> >  }
> >  
> > --- a/drivers/s390/scsi/zfcp_scsi.c
> > +++ b/drivers/s390/scsi/zfcp_scsi.c
> > @@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st
> >  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> >  	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
> >  	int    status, scsi_result, ret;
> > +	struct scsi_device *sdev = scpnt->device;
> >  
> >  	/* reset the status for this request */
> >  	scpnt->result = 0;
> > @@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st
> >  		return 0;
> >  	}
> >  
> > +	spin_unlock_irq(sdev->host->host_lock);
> >  	ret = zfcp_fsf_fcp_cmnd(scpnt);
> > +	spin_lock_irq(sdev->host->host_lock);
> 
> CCing Jeff
> 
> That lock is taken in your own driver, three lines below, at the
> DEF_SCSI_QCMD macro invocation.
> 
> Please do the proper host-lock removal the first time you are
> touching this code. (See the example patch to libata by Jeff Garzik.)

With the current code, the serial_number has to be updated for each
command, since scsi_error still checks the serial_number in
scsi_try_to_abort_cmd.  The change above is a bug fix for the zfcp
changes introduced in 2.6.37-rc1, and I would like to fix this now. I
remember from the host_lock discussion that the serial_number changes
will happen in the 2.6.38 kernel. To me, it looks like two changes are
needed anyway.

Either
 - make the above change now (as bug fix)
 - remove the host_lock from zfcp's queuecommand function when the
   serial_number becomes optional
or
 - change the queuecommand function now to include:
   take host_lock, call scsi_cmd_get_serial, release host_lock
 - remove the sequence again when the serial_number becomes optional

I opted for the first approach, to have a smaller patch now. If the
second approach is preferred, I can send an updated patch.

Christof

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-17 14:10     ` Christof Schmitt
@ 2010-11-17 14:35       ` Boaz Harrosh
  2010-11-18  8:34         ` Christof Schmitt
  0 siblings, 1 reply; 11+ messages in thread
From: Boaz Harrosh @ 2010-11-17 14:35 UTC (permalink / raw)
  To: Christof Schmitt
  Cc: Jeff Garzik, James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens, Nicholas A. Bellinger

On 11/17/2010 04:10 PM, Christof Schmitt wrote:
> On Wed, Nov 17, 2010 at 03:43:34PM +0200, Boaz Harrosh wrote:
>> On 11/17/2010 03:23 PM, Christof Schmitt wrote:
>>> From: Christof Schmitt <christof.schmitt@de.ibm.com>
>>>
>>> Interrupting the connection to the FCP channel while I/O requests are
>>> being issued can lead to the following deadlock: scsi_dispatch_cmd
>>> already holds the host_lock when the recovery trigger tries to acquire
>>> the host_lock again while iterating through the scsi_devices.
>>>
>>>  INFO: lockdep is turned off.
>>>  BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
>>>  CPU: 1 Not tainted 2.6.35.7SWEN2 #2
>>>  Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
>>>         0000000074393640 00000000743935c0 0000000000000002 0000000000000000
>>>         0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
>>>         0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
>>>         000000000000000d 040000000000000c 0000000074393628 0000000000000000
>>>         0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
>>>  Call Trace:
>>>  ([<0000000000100a32>] show_trace+0xee/0x144)
>>>   [<00000000003be202>] do_raw_spin_lock+0x112/0x178
>>>   [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
>>>   [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
>>>   [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
>>>   [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
>>>   [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
>>>   [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
>>>   [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
>>>   [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
>>>   [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
>>>   [<00000000003828ae>] __blk_run_queue+0x86/0x140
>>>   [<000000000037f742>] elv_insert+0x11a/0x208
>>>   [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
>>>   [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
>>>   [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
>>>   [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
>>>   [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
>>>   [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
>>>   [<00000000001d1272>] sync_page+0x76/0x9c
>>>   [<00000000001d12ba>] sync_page_killable+0x22/0x60
>>>   [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
>>>   [<00000000001d1140>] __lock_page_killable+0x78/0x84
>>>   [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
>>>   [<0000000000228ec0>] do_sync_read+0xc8/0x12c
>>>   [<0000000000229edc>] vfs_read+0xac/0x1ac
>>>   [<000000000022a0d8>] SyS_read+0x58/0xa8
>>>   [<00000000001146de>] sysc_noemu+0x10/0x16
>>>   [<00000200000493c4>] 0x200000493c4
>>>  INFO: lockdep is turned off.
>>>
>>> Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
>>> interrupts when acquiring the req_q_lock.
>>>
>>> Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
>>> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
>>> ---
>>>
>>>  drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
>>>  drivers/s390/scsi/zfcp_scsi.c |    3 +++
>>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> --- a/drivers/s390/scsi/zfcp_fsf.c
>>> +++ b/drivers/s390/scsi/zfcp_fsf.c
>>> @@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
>>>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>>>  	struct zfcp_qdio *qdio = adapter->qdio;
>>>  	struct fsf_qtcb_bottom_io *io;
>>> +	unsigned long flags;
>>>  
>>>  	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
>>>  		       ZFCP_STATUS_COMMON_UNBLOCKED)))
>>>  		return -EBUSY;
>>>  
>>> -	spin_lock(&qdio->req_q_lock);
>>> +	spin_lock_irqsave(&qdio->req_q_lock, flags);
>>>  	if (atomic_read(&qdio->req_q_free) <= 0) {
>>>  		atomic_inc(&qdio->req_q_full);
>>>  		goto out;
>>> @@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
>>>  	zfcp_fsf_req_free(req);
>>>  	scsi_cmnd->host_scribble = NULL;
>>>  out:
>>> -	spin_unlock(&qdio->req_q_lock);
>>> +	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
>>>  	return retval;
>>>  }
>>>  
>>> --- a/drivers/s390/scsi/zfcp_scsi.c
>>> +++ b/drivers/s390/scsi/zfcp_scsi.c
>>> @@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st
>>>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
>>>  	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
>>>  	int    status, scsi_result, ret;
>>> +	struct scsi_device *sdev = scpnt->device;
>>>  
>>>  	/* reset the status for this request */
>>>  	scpnt->result = 0;
>>> @@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st
>>>  		return 0;
>>>  	}
>>>  
>>> +	spin_unlock_irq(sdev->host->host_lock);
>>>  	ret = zfcp_fsf_fcp_cmnd(scpnt);
>>> +	spin_lock_irq(sdev->host->host_lock);
>>
>> CCing Jeff
>>
>> That lock is taken in your own driver, three lines below, at the
>> DEF_SCSI_QCMD macro invocation.
>>
>> Please do the proper host-lock removal the first time you are
>> touching this code. (See the example patch to libata by Jeff Garzik.)
> 
> With the current code, the serial_number has to be updated for each
> command since scsi_error still has the check for the serial_number in
> scsi_try_to_abort_cmd.  The change above is a bug fix for the zfcp
> changes introduced in 2.6.37-rc1, and I would like to fix this now. I
> remember from the host_lock discussion that the serial_number changes
> will happen in the 2.6.38 kernel. To me, it looks like there are two
> changes needed anyway.
> 

Sigh! Yes, this is a deficiency I don't like about the current
push-down. I think we could do Nic's atomic serial_number as an
intermediate solution, or the pending fix to scsi_eh, and not force
a double change to every driver like we have now.

But I still think it's better to open code DEF_SCSI_QCMD than
to have that ugly
unlock
...
lock

That looks like an ugly typo (and is actually slower).

Jeff ?
Boaz

> Either
>  - make the above change now (as bug fix)
>  - remove the host_lock from zfcp's queuecommand function when the
>    serial_number becomes optional
> or
>  - change the queuecommand function now to include:
>    take host_lock, call scsi_cmd_get_serial, release host_lock
>  - remove the sequence again when the serial_number becomes optional
> 
> I opted for the first approach, to have a smaller patch now. If the
> second approach is preferred, I can send an updated patch.
> 
> Christof
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
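
The suggestion to open code DEF_SCSI_QCMD, which matches the second
approach in the list above, can be sketched in userspace as follows.
This is only a rough model under stated assumptions: the qc_* names are
hypothetical, an atomic flag stands in for the host_lock, and a plain
counter increment stands in for scsi_cmd_get_serial. The point it shows
is that the lock covers only the serial number update, so the command
is issued with the host_lock already released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are kernel types (hypothetical). */
struct qc_host {
	atomic_bool host_lock;		/* models shost->host_lock */
	unsigned long next_serial;	/* models the per-host serial counter */
};

struct qc_cmd {
	unsigned long serial_number;
	bool issued_without_host_lock;
};

static void qc_lock(atomic_bool *lock)
{
	/* Spin until the previous value was "unlocked". */
	while (atomic_exchange(lock, true))
		;
}

static void qc_unlock(atomic_bool *lock)
{
	assert(atomic_load(lock));	/* unlocking a free lock is a bug */
	atomic_store(lock, false);
}

/* Models zfcp_fsf_fcp_cmnd(): records whether host_lock was free on entry. */
static int qc_issue(struct qc_host *host, struct qc_cmd *cmd)
{
	cmd->issued_without_host_lock = !atomic_load(&host->host_lock);
	return 0;
}

/* Open-coded wrapper: the lock covers only the serial number update. */
int qc_queuecommand(struct qc_host *host, struct qc_cmd *cmd)
{
	assert(host != NULL && cmd != NULL);

	qc_lock(&host->host_lock);
	cmd->serial_number = host->next_serial++;
	qc_unlock(&host->host_lock);

	return qc_issue(host, cmd);
}

/* Returns the assigned serial number if issued lock-free, else -1. */
long qc_demo(void)
{
	struct qc_host host = { .host_lock = false, .next_serial = 7 };
	struct qc_cmd cmd = { 0 };

	if (qc_queuecommand(&host, &cmd) != 0)
		return -1;
	return cmd.issued_without_host_lock ? (long)cmd.serial_number : -1;
}
```

Compared with dropping and re-taking the lock inside a wrapper that
already holds it, this shape takes the lock once per command.
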

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-17 14:35       ` Boaz Harrosh
@ 2010-11-18  8:34         ` Christof Schmitt
  2010-11-18 13:53           ` [PATCH 5/5 v2] " Christof Schmitt
  0 siblings, 1 reply; 11+ messages in thread
From: Christof Schmitt @ 2010-11-18  8:34 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jeff Garzik, James Bottomley, linux-scsi, linux-s390, schwidefsky,
	heiko.carstens, Nicholas A. Bellinger

On Wed, Nov 17, 2010 at 04:35:52PM +0200, Boaz Harrosh wrote:
> On 11/17/2010 04:10 PM, Christof Schmitt wrote:
> > On Wed, Nov 17, 2010 at 03:43:34PM +0200, Boaz Harrosh wrote:
> >> On 11/17/2010 03:23 PM, Christof Schmitt wrote:
> >>> From: Christof Schmitt <christof.schmitt@de.ibm.com>
> >>>
> >>> Interrupting the connection to the FCP channel while I/O requests are
> >>> being issued can lead to this deadlock. scsi_dispatch_cmd already
> >>> holds the host_lock while the recovery trigger tries to acquire the
> >>> host_lock again when iterating through the scsi_devices.
> >>>
> >>>  INFO: lockdep is turned off.
> >>>  BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
> >>>  CPU: 1 Not tainted 2.6.35.7SWEN2 #2
> >>>  Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
> >>>         0000000074393640 00000000743935c0 0000000000000002 0000000000000000
> >>>         0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
> >>>         0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
> >>>         000000000000000d 040000000000000c 0000000074393628 0000000000000000
> >>>         0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
> >>>  Call Trace:
> >>>  ([<0000000000100a32>] show_trace+0xee/0x144)
> >>>   [<00000000003be202>] do_raw_spin_lock+0x112/0x178
> >>>   [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
> >>>   [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
> >>>   [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
> >>>   [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
> >>>   [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
> >>>   [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
> >>>   [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
> >>>   [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
> >>>   [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
> >>>   [<00000000003828ae>] __blk_run_queue+0x86/0x140
> >>>   [<000000000037f742>] elv_insert+0x11a/0x208
> >>>   [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
> >>>   [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
> >>>   [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
> >>>   [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
> >>>   [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
> >>>   [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
> >>>   [<00000000001d1272>] sync_page+0x76/0x9c
> >>>   [<00000000001d12ba>] sync_page_killable+0x22/0x60
> >>>   [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
> >>>   [<00000000001d1140>] __lock_page_killable+0x78/0x84
> >>>   [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
> >>>   [<0000000000228ec0>] do_sync_read+0xc8/0x12c
> >>>   [<0000000000229edc>] vfs_read+0xac/0x1ac
> >>>   [<000000000022a0d8>] SyS_read+0x58/0xa8
> >>>   [<00000000001146de>] sysc_noemu+0x10/0x16
> >>>   [<00000200000493c4>] 0x200000493c4
> >>>  INFO: lockdep is turned off.
> >>>
> >>> Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
> >>> interrupts when acquiring the req_q_lock.
> >>>
> >>> Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
> >>> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
> >>> ---
> >>>
> >>>  drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
> >>>  drivers/s390/scsi/zfcp_scsi.c |    3 +++
> >>>  2 files changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> --- a/drivers/s390/scsi/zfcp_fsf.c
> >>> +++ b/drivers/s390/scsi/zfcp_fsf.c
> >>> @@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
> >>>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> >>>  	struct zfcp_qdio *qdio = adapter->qdio;
> >>>  	struct fsf_qtcb_bottom_io *io;
> >>> +	unsigned long flags;
> >>>  
> >>>  	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
> >>>  		       ZFCP_STATUS_COMMON_UNBLOCKED)))
> >>>  		return -EBUSY;
> >>>  
> >>> -	spin_lock(&qdio->req_q_lock);
> >>> +	spin_lock_irqsave(&qdio->req_q_lock, flags);
> >>>  	if (atomic_read(&qdio->req_q_free) <= 0) {
> >>>  		atomic_inc(&qdio->req_q_full);
> >>>  		goto out;
> >>> @@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
> >>>  	zfcp_fsf_req_free(req);
> >>>  	scsi_cmnd->host_scribble = NULL;
> >>>  out:
> >>> -	spin_unlock(&qdio->req_q_lock);
> >>> +	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
> >>>  	return retval;
> >>>  }
> >>>  
> >>> --- a/drivers/s390/scsi/zfcp_scsi.c
> >>> +++ b/drivers/s390/scsi/zfcp_scsi.c
> >>> @@ -83,6 +83,7 @@ static int zfcp_scsi_queuecommand_lck(st
> >>>  	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
> >>>  	struct fc_rport *rport = starget_to_rport(scsi_target(scpnt->device));
> >>>  	int    status, scsi_result, ret;
> >>> +	struct scsi_device *sdev = scpnt->device;
> >>>  
> >>>  	/* reset the status for this request */
> >>>  	scpnt->result = 0;
> >>> @@ -118,7 +119,9 @@ static int zfcp_scsi_queuecommand_lck(st
> >>>  		return 0;
> >>>  	}
> >>>  
> >>> +	spin_unlock_irq(sdev->host->host_lock);
> >>>  	ret = zfcp_fsf_fcp_cmnd(scpnt);
> >>> +	spin_lock_irq(sdev->host->host_lock);
> >>
> >> CCing Jeff
> >>
> >> That lock is taken in your own driver, three lines below, at the
> >> DEF_SCSI_QCMD macro invocation.
> >>
> >> Please do the proper host-lock removal the first time you are
> >> touching this code. (See the example patch to libata by Jeff Garzik.)
> > 
> > With the current code, the serial_number has to be updated for each
> > command since scsi_error still has the check for the serial_number in
> > scsi_try_to_abort_cmd.  The change above is a bug fix for the zfcp
> > > changes introduced in 2.6.37-rc1, and I would like to fix this now. I
> > > remember from the host_lock discussion that the serial_number changes
> > > will happen in the 2.6.38 kernel. To me, it looks like there are two
> > > changes needed anyway.
> > 
> 
> Sigh! Yes, this is a deficiency I don't like about the current
> push-down. I think we could do Nic's atomic serial_number as an
> intermediate solution, or the pending fix to scsi_eh, and not force
> a double change to every driver like we have now.
> 
> But I still think it's better to open code DEF_SCSI_QCMD than
> to have that ugly
> unlock
> ...
> lock
> 
> That looks like an ugly typo (and is actually slower).
> 
> Jeff ?
> Boaz

I have just seen the patch "[PATCH] Eliminate error handler overload
of the SCSI serial number" from James. This means I don't have to
worry about the serial_number at all. I will send a new patch that
removes the host_lock from zfcp to fix the locking issue in zfcp.

Christof

> 
> > Either
> >  - make the above change now (as bug fix)
> >  - remove the host_lock from zfcp's queuecommand function when the
> >    serial_number becomes optional
> > or
> >  - change the queuecommand function now to include:
> >    take host_lock, call scsi_cmd_get_serial, release host_lock
> >  - remove the sequence again when the serial_number becomes optional
> > 
> > I opted for the first approach, to have a smaller patch now. If the
> > second approach is preferred, I can send an updated patch.
> > 
> > Christof

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 5/5 v2] zfcp: Issue FCP command without holding SCSI host_lock
  2010-11-18  8:34         ` Christof Schmitt
@ 2010-11-18 13:53           ` Christof Schmitt
  0 siblings, 0 replies; 11+ messages in thread
From: Christof Schmitt @ 2010-11-18 13:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeff Garzik, linux-scsi, linux-s390, schwidefsky, heiko.carstens,
	Nicholas A. Bellinger, Boaz Harrosh

From: Christof Schmitt <christof.schmitt@de.ibm.com>

Interrupting the connection to the FCP channel while I/O requests are
being issued can lead to this deadlock. scsi_dispatch_cmd already
holds the host_lock while the recovery trigger tries to acquire the
host_lock again when iterating through the scsi_devices.

 INFO: lockdep is turned off.
 BUG: spinlock lockup on CPU#1, blast/9660, 0000000078f38878
 CPU: 1 Not tainted 2.6.35.7SWEN2 #2
 Process blast (pid: 9660, task: 0000000071f75940, ksp: 0000000074393ac0)
        0000000074393640 00000000743935c0 0000000000000002 0000000000000000
        0000000074393660 00000000743935d8 00000000743935d8 00000000005590c2
        0000000000000000 0000000078f38878 0000000026ede800 0000000078f38878
        000000000000000d 040000000000000c 0000000074393628 0000000000000000
        0000000000000000 0000000000100b2a 00000000743935c0 0000000074393600
 Call Trace:
 ([<0000000000100a32>] show_trace+0xee/0x144)
  [<00000000003be202>] do_raw_spin_lock+0x112/0x178
  [<000000000055d408>] _raw_spin_lock_irqsave+0x90/0xb0
  [<00000000003f1514>] __scsi_iterate_devices+0x38/0xbc
  [<00000000004849b0>] zfcp_erp_clear_adapter_status+0xd0/0x16c
  [<000000000048587a>] zfcp_erp_adapter_reopen+0x3a/0xb4
  [<0000000000489812>] zfcp_fsf_req_send+0x166/0x180
  [<000000000048c8d6>] zfcp_fsf_fcp_cmnd+0x272/0x408
  [<000000000048f864>] zfcp_scsi_queuecommand+0x11c/0x1e0
  [<00000000003f1f2a>] scsi_dispatch_cmd+0x1d6/0x324
  [<00000000003f9910>] scsi_request_fn+0x42c/0x56c
  [<00000000003828ae>] __blk_run_queue+0x86/0x140
  [<000000000037f742>] elv_insert+0x11a/0x208
  [<000000000038104c>] blk_insert_cloned_request+0x84/0xe4
  [<000003c0032b7c64>] dm_dispatch_request+0x6c/0x94 [dm_mod]
  [<000003c0032b7d5c>] map_request+0xd0/0x100 [dm_mod]
  [<000003c0032b9a78>] dm_request_fn+0xec/0x1bc [dm_mod]
  [<0000000000382c0e>] generic_unplug_device+0x5a/0x6c
  [<000003c0032b7f98>] dm_unplug_all+0x74/0x9c [dm_mod]
  [<00000000001d1272>] sync_page+0x76/0x9c
  [<00000000001d12ba>] sync_page_killable+0x22/0x60
  [<000000000055a768>] __wait_on_bit_lock+0xc0/0x124
  [<00000000001d1140>] __lock_page_killable+0x78/0x84
  [<00000000001d351c>] generic_file_aio_read+0x5a4/0x7e8
  [<0000000000228ec0>] do_sync_read+0xc8/0x12c
  [<0000000000229edc>] vfs_read+0xac/0x1ac
  [<000000000022a0d8>] SyS_read+0x58/0xa8
  [<00000000001146de>] sysc_noemu+0x10/0x16
  [<00000200000493c4>] 0x200000493c4
 INFO: lockdep is turned off.

Call zfcp_fsf_fcp_cmnd without the host_lock and disable the
interrupts when acquiring the req_q_lock. According to the patch
description in "[PATCH] Eliminate error handler overload of the SCSI
serial number", the serial_number is not used, so simply drop the
queuecommand wrapper function and run zfcp_scsi_queuecommand without
holding the host_lock.
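
The lockup described above can be modelled in userspace with a
non-recursive lock. This is a minimal sketch, not driver code: the dl_*
names are hypothetical, an atomic flag stands in for the host_lock, and
a failed trylock marks the point where a real spin_lock would spin
forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a non-recursive spinlock such as host_lock. */
struct dl_lock {
	atomic_bool held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool dl_trylock(struct dl_lock *lock)
{
	return !atomic_exchange(&lock->held, true);
}

static void dl_unlock(struct dl_lock *lock)
{
	assert(atomic_load(&lock->held));	/* must be held to unlock */
	atomic_store(&lock->held, false);
}

/*
 * Models the recovery trigger, which needs host_lock to iterate over the
 * devices. A real spin_lock() would spin forever where trylock fails.
 */
static bool dl_recovery_trigger(struct dl_lock *host_lock)
{
	if (!dl_trylock(host_lock))
		return false;
	dl_unlock(host_lock);
	return true;
}

/*
 * Models sending a command whose send fails and triggers recovery.
 * Returns true if recovery could take host_lock, false if it would lock up.
 */
bool dl_send_with_failure(bool caller_holds_host_lock)
{
	struct dl_lock host_lock = { .held = false };
	bool recovered;

	if (caller_holds_host_lock && !dl_trylock(&host_lock))
		return false;

	recovered = dl_recovery_trigger(&host_lock);

	if (caller_holds_host_lock)
		dl_unlock(&host_lock);
	return recovered;
}
```

In this model, dl_send_with_failure(true) corresponds to the call trace
above and reports the lockup, while dl_send_with_failure(false)
corresponds to issuing the command without the host_lock.
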

Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
---
 drivers/s390/scsi/zfcp_fsf.c  |    5 +++--
 drivers/s390/scsi/zfcp_scsi.c |    7 ++-----
 2 files changed, 5 insertions(+), 7 deletions(-)

--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -2170,12 +2170,13 @@ int zfcp_fsf_fcp_cmnd(struct scsi_cmnd *
 	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
 	struct zfcp_qdio *qdio = adapter->qdio;
 	struct fsf_qtcb_bottom_io *io;
+	unsigned long flags;
 
 	if (unlikely(!(atomic_read(&zfcp_sdev->status) &
 		       ZFCP_STATUS_COMMON_UNBLOCKED)))
 		return -EBUSY;
 
-	spin_lock(&qdio->req_q_lock);
+	spin_lock_irqsave(&qdio->req_q_lock, flags);
 	if (atomic_read(&qdio->req_q_free) <= 0) {
 		atomic_inc(&qdio->req_q_full);
 		goto out;
@@ -2239,7 +2240,7 @@ failed_scsi_cmnd:
 	zfcp_fsf_req_free(req);
 	scsi_cmnd->host_scribble = NULL;
 out:
-	spin_unlock(&qdio->req_q_lock);
+	spin_unlock_irqrestore(&qdio->req_q_lock, flags);
 	return retval;
 }
 
--- a/drivers/s390/scsi/zfcp_scsi.c
+++ b/drivers/s390/scsi/zfcp_scsi.c
@@ -76,8 +76,8 @@ static void zfcp_scsi_command_fail(struc
 	scpnt->scsi_done(scpnt);
 }
 
-static int zfcp_scsi_queuecommand_lck(struct scsi_cmnd *scpnt,
-				  void (*done) (struct scsi_cmnd *))
+static
+int zfcp_scsi_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scpnt)
 {
 	struct zfcp_scsi_dev *zfcp_sdev = sdev_to_zfcp(scpnt->device);
 	struct zfcp_adapter *adapter = zfcp_sdev->port->adapter;
@@ -87,7 +87,6 @@ static int zfcp_scsi_queuecommand_lck(st
 	/* reset the status for this request */
 	scpnt->result = 0;
 	scpnt->host_scribble = NULL;
-	scpnt->scsi_done = done;
 
 	scsi_result = fc_remote_port_chkready(rport);
 	if (unlikely(scsi_result)) {
@@ -127,8 +126,6 @@ static int zfcp_scsi_queuecommand_lck(st
 	return ret;
 }
 
-static DEF_SCSI_QCMD(zfcp_scsi_queuecommand)
-
 static int zfcp_scsi_slave_alloc(struct scsi_device *sdev)
 {
 	struct fc_rport *rport = starget_to_rport(scsi_target(sdev));
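
The switch to spin_lock_irqsave for the req_q_lock in the patch above
can be illustrated with a small userspace model. One plausible reading,
which the thread does not state explicitly, is that once queuecommand
no longer runs under the host_lock, the function cannot assume its
caller has already disabled interrupts. The sketch below uses
hypothetical irq_* names and plain booleans in place of the CPU
interrupt mask and the lock. It shows only the save/restore semantics:
interrupts are off inside the critical section, and the caller's
previous state is restored afterwards.

```c
#include <assert.h>
#include <stdbool.h>

static bool irq_enabled = true;	/* models the CPU interrupt mask */
static bool irq_req_q_locked;	/* models qdio->req_q_lock */

/* The kernel macro takes flags by name; this model takes a pointer. */
static void irq_lock_irqsave(bool *flags)
{
	*flags = irq_enabled;	/* remember the caller's state */
	irq_enabled = false;	/* interrupts off while the lock is held */
	assert(!irq_req_q_locked);
	irq_req_q_locked = true;
}

static void irq_unlock_irqrestore(bool flags)
{
	assert(irq_req_q_locked);
	irq_req_q_locked = false;
	irq_enabled = flags;	/* restore, do not blindly re-enable */
}

/* Returns true if interrupts were off inside the critical section. */
bool irq_fcp_cmnd(void)
{
	bool flags, irqs_off_inside;

	irq_lock_irqsave(&flags);
	irqs_off_inside = !irq_enabled;
	irq_unlock_irqrestore(flags);
	return irqs_off_inside;
}

/* Runs the model from a given interrupt state; returns the state after. */
bool irq_state_after_cmnd(bool enabled_before)
{
	irq_enabled = enabled_before;
	(void)irq_fcp_cmnd();
	return irq_enabled;
}
```

The model behaves the same whether it is entered with interrupts
enabled or disabled, which is the property the irqsave/irqrestore pair
provides.
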

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-11-18 13:53 UTC | newest]

Thread overview: 11+ messages
2010-11-17 13:23 [patch 0/5] zfcp fixes for 2.6.37-rc Christof Schmitt
2010-11-17 13:23 ` [patch 1/5] zfcp: Fix common FCP request reception Christof Schmitt
2010-11-17 13:23 ` [patch 2/5] zfcp: Correct false abort data assignment Christof Schmitt
2010-11-17 13:23 ` [patch 3/5] zfcp: No ERP escalation on gpn_ft eval Christof Schmitt
2010-11-17 13:23 ` [patch 4/5] zfcp: Prevent usage w/o holding a reference Christof Schmitt
2010-11-17 13:23 ` [patch 5/5] zfcp: Issue FCP command without holding SCSI host_lock Christof Schmitt
2010-11-17 13:43   ` Boaz Harrosh
2010-11-17 14:10     ` Christof Schmitt
2010-11-17 14:35       ` Boaz Harrosh
2010-11-18  8:34         ` Christof Schmitt
2010-11-18 13:53           ` [PATCH 5/5 v2] " Christof Schmitt
