[PATCHv5 0/9] New EH command timeout handler

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHv5 0/9] New EH command timeout handler
@ 2013-09-02  7:12 Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 1/9] scsi: Fix erratic device offline during EH Hannes Reinecke
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Hi all,

this patchset implements a new SCSI EH command timeout handler
which will be sending command aborts inline without actually
engaging SCSI EH.
SCSI EH will only be invoked if command abort fails.

In addition the commands will be returned directly
if the command abort succeeded, cutting down recovery
times dramatically.

With the original SCSI EH I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 142.652 s, 118 kB/s

real	2m22.657s
user	0m0.013s
sys	0m0.145s

With this patchset I got:
# time dd if=/dev/zero of=/dev/dm-2 bs=4k count=4k oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 52.1579 s, 322 kB/s

real	0m52.163s
user	0m0.012s
sys	0m0.145s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.

Changes to the original version:
- Use a private list in scsi_eh_abort_handler to avoid
  list starvation (pointed out by Joern Engel)
- Terminate command aborts when the first abort fails
- Do not attempt command aborts if the host is already in recovery
  or if the device is removed.
- Flush abort workqueue if the device is removed.

Changes to v2:
- Removed eh_entry initialisation
- Convert to per-command workqueue

Changes to v3:
- Use delayed_work
- Enable new eh timeout handler for virtio, SAS, and FC
- Modify logging messages to include scmd pointer

Changes to v4:
- Remove stubs when enabling new eh timeout handler
  for other drivers

Hannes Reinecke (9):
  scsi: Fix erratic device offline during EH
  blk-timeout: add BLK_EH_SCHEDULED return code
  scsi: improved eh timeout handler
  virtio_scsi: Enable new EH timeout handler
  libsas: Enable new EH timeout handler
  mptsas: Enable new EH timeout handler
  mpt2sas: Enable new EH timeout handler
  mpt3sas: Enable new EH timeout handler
  scsi_transport_fc: Enable new EH timeout handler

 drivers/message/fusion/mptsas.c      |   3 +-
 drivers/scsi/libsas/sas_scsi_host.c  |   2 +-
 drivers/scsi/mpt2sas/mpt2sas_scsih.c |   3 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |   1 +
 drivers/scsi/scsi.c                  |   3 +
 drivers/scsi/scsi_error.c            | 178 ++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_priv.h             |   2 +
 drivers/scsi/scsi_transport_fc.c     |   2 +-
 drivers/scsi/virtio_scsi.c           |   2 +
 include/linux/blkdev.h               |   1 +
 include/scsi/scsi_cmnd.h             |   2 +
 11 files changed, 173 insertions(+), 26 deletions(-)

-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/9] scsi: Fix erratic device offline during EH
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
@ 2013-09-02  7:12 ` Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 2/9] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke,
	Martin K. Petersen

Commit 18a4d0a22ed6c54b67af7718c305cd010f09ddf8
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de48ff84a5d98ca533685df8a3beaeb8
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.

So instead of checking the EH command in the ->eh_action
callback we should rather call ->eh_action when we're
about to finish the command _and_ have sent a TUR previously.
This should then set the device offline as advertised.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ewan Milne <emilne@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index abf0916..c88cb7e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -941,12 +941,6 @@ retry:
 
 	scsi_eh_restore_cmnd(scmd, &ses);
 
-	if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
-		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
-		if (sdrv->eh_action)
-			rtn = sdrv->eh_action(scmd, cmnd, cmnd_size, rtn);
-	}
-
 	return rtn;
 }
 
@@ -964,6 +958,18 @@ static int scsi_request_sense(struct scsi_cmnd *scmd)
 	return scsi_send_eh_cmnd(scmd, NULL, 0, scmd->device->eh_timeout, ~0);
 }
 
+static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
+{
+	static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0};
+
+	if (scmd->request->cmd_type != REQ_TYPE_BLOCK_PC) {
+		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+		if (sdrv->eh_action)
+			rtn = sdrv->eh_action(scmd, tur_command, 6, rtn);
+	}
+	return rtn;
+}
+
 /**
  * scsi_eh_finish_cmd - Handle a cmd that eh is finished with.
  * @scmd:	Original SCSI cmd that eh has finished.
@@ -1142,7 +1148,9 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
 
 		list_for_each_entry_safe(scmd, next, cmd_list, eh_entry)
 			if (scmd->device == sdev) {
-				if (finish_cmds)
+				if (finish_cmds &&
+				    (try_stu ||
+				     scsi_eh_action(scmd, SUCCESS) == SUCCESS))
 					scsi_eh_finish_cmd(scmd, done_q);
 				else
 					list_move_tail(&scmd->eh_entry, work_q);
@@ -1281,7 +1289,8 @@ static int scsi_eh_stu(struct Scsi_Host *shost,
 			    !scsi_eh_tur(stu_scmd)) {
 				list_for_each_entry_safe(scmd, next,
 							  work_q, eh_entry) {
-					if (scmd->device == sdev)
+					if (scmd->device == sdev &&
+					    scsi_eh_action(scmd, SUCCESS) == SUCCESS)
 						scsi_eh_finish_cmd(scmd, done_q);
 				}
 			}
@@ -1348,7 +1357,8 @@ static int scsi_eh_bus_device_reset(struct Scsi_Host *shost,
 			    !scsi_eh_tur(bdr_scmd)) {
 				list_for_each_entry_safe(scmd, next,
 							 work_q, eh_entry) {
-					if (scmd->device == sdev)
+					if (scmd->device == sdev &&
+					    scsi_eh_action(scmd, rtn) != FAILED)
 						scsi_eh_finish_cmd(scmd,
 								   done_q);
 				}
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/9] blk-timeout: add BLK_EH_SCHEDULED return code
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 1/9] scsi: Fix erratic device offline during EH Hannes Reinecke
@ 2013-09-02  7:12 ` Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 3/9] scsi: improved eh timeout handler Hannes Reinecke
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate
that a delayed error recovery has been initiated.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 4 ++++
 include/linux/blkdev.h    | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c88cb7e..b7dd774 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -161,6 +161,10 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 	else if (host->hostt->eh_timed_out)
 		rtn = host->hostt->eh_timed_out(scmd);
 
+	/* Check for asynchronous command aborts */
+	if (rtn == BLK_EH_SCHEDULED)
+		return BLK_EH_NOT_HANDLED;
+
 	scmd->result |= DID_TIME_OUT << 16;
 
 	if (unlikely(rtn == BLK_EH_NOT_HANDLED &&
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2fdb4a4..d846e2b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -238,6 +238,7 @@ enum blk_eh_timer_return {
 	BLK_EH_NOT_HANDLED,
 	BLK_EH_HANDLED,
 	BLK_EH_RESET_TIMER,
+	BLK_EH_SCHEDULED,
 };
 
 typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 1/9] scsi: Fix erratic device offline during EH Hannes Reinecke
  2013-09-02  7:12 ` [PATCH 2/9] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
@ 2013-09-02  7:12 ` Hannes Reinecke
  2013-09-02 12:45   ` Bart Van Assche
  2013-09-02  7:13 ` [PATCH 4/9] virtio_scsi: Enable new EH " Hannes Reinecke
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.

If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Ren Mingxin <renmx@cn.fujitsu.com>
---
 drivers/scsi/scsi.c       |   3 +
 drivers/scsi/scsi_error.c | 146 +++++++++++++++++++++++++++++++++++++++++-----
 drivers/scsi/scsi_priv.h  |   2 +
 include/scsi/scsi_cmnd.h  |   2 +
 4 files changed, 140 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index fe0bcb1..2b04a57 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask)
 
 		cmd->device = dev;
 		INIT_LIST_HEAD(&cmd->list);
+		INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
 		spin_lock_irqsave(&dev->list_lock, flags);
 		list_add_tail(&cmd->list, &dev->cmd_list);
 		spin_unlock_irqrestore(&dev->list_lock, flags);
@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 	list_del_init(&cmd->list);
 	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
 
+	cancel_delayed_work(&cmd->abort_work);
+
 	__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index b7dd774..21c2465 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -53,6 +53,7 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
 #define HOST_RESET_SETTLE_TIME  (10)
 
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+static int scsi_try_to_abort_cmd(struct scsi_host_template *, struct scsi_cmnd *);
 
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -100,6 +101,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 }
 
 /**
+ * scmd_eh_abort_handler - Handle command aborts
+ * @work:	command to be aborted.
+ */
+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+	struct scsi_cmnd *scmd =
+		container_of(work, struct scsi_cmnd, abort_work.work);
+	struct scsi_device *sdev = scmd->device;
+	unsigned long flags;
+	int rtn;
+
+	spin_lock_irqsave(sdev->host->host_lock, flags);
+	if (scsi_host_eh_past_deadline(sdev->host)) {
+		spin_unlock_irqrestore(sdev->host->host_lock, flags);
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "scmd %p eh timeout, not aborting\n", scmd));
+	} else {
+		spin_unlock_irqrestore(sdev->host->host_lock, flags);
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "aborting command %p\n", scmd));
+		rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+		if (rtn == SUCCESS) {
+			scmd->result |= DID_TIME_OUT << 16;
+			if (!scsi_noretry_cmd(scmd) &&
+			    (++scmd->retries <= scmd->allowed)) {
+				SCSI_LOG_ERROR_RECOVERY(3,
+					scmd_printk(KERN_WARNING, scmd,
+						    "scmd %p retry "
+						    "aborted command\n", scmd));
+				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+			} else {
+				SCSI_LOG_ERROR_RECOVERY(3,
+					scmd_printk(KERN_WARNING, scmd,
+						    "scmd %p finish "
+						    "aborted command\n", scmd));
+				scsi_finish_command(scmd);
+			}
+			return;
+		}
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "scmd %p abort failed, rtn %d\n",
+				    scmd, rtn));
+	}
+
+	if (scsi_eh_scmd_add(scmd, 0)) {
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_WARNING, scmd,
+				    "scmd %p terminate "
+				    "aborted command\n", scmd));
+		scmd->result |= DID_TIME_OUT << 16;
+		scsi_finish_command(scmd);
+	}
+}
+
+/**
+ * scsi_abort_command - schedule a command abort
+ * @scmd:	scmd to abort.
+ *
+ * We only need to abort commands after a command timeout
+ */
+enum blk_eh_timer_return
+scsi_abort_command(struct scsi_cmnd *scmd)
+{
+	struct scsi_device *sdev = scmd->device;
+	struct Scsi_Host *shost = sdev->host;
+	unsigned long flags;
+
+	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) {
+		/*
+		 * command abort timed out.
+		 */
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "scmd %p abort timeout\n", scmd));
+		cancel_delayed_work(&scmd->abort_work);
+		return BLK_EH_NOT_HANDLED;
+	}
+
+	/*
+	 * Do not try a command abort if
+	 * SCSI EH has already started.
+	 */
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (scsi_host_in_recovery(shost)) {
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "scmd %p not aborting, host in recovery\n",
+				    scmd));
+		return BLK_EH_NOT_HANDLED;
+	}
+
+	if (shost->eh_deadline && !shost->last_reset)
+		shost->last_reset = jiffies;
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	scmd->eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
+	SCSI_LOG_ERROR_RECOVERY(3,
+		scmd_printk(KERN_INFO, scmd,
+			    "scmd %p abort scheduled\n", scmd));
+	schedule_delayed_work(&scmd->abort_work, HZ / 100);
+	return BLK_EH_SCHEDULED;
+}
+EXPORT_SYMBOL_GPL(scsi_abort_command);
+
+/**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
@@ -125,6 +236,8 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 		shost->last_reset = jiffies;
 
 	ret = 1;
+	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
+		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
@@ -1581,7 +1694,7 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 }
 
 /**
- * scsi_noretry_cmd - determinte if command should be failed fast
+ * scsi_noretry_cmd - determine if command should be failed fast
  * @scmd:	SCSI cmd to examine.
  */
 int scsi_noretry_cmd(struct scsi_cmnd *scmd)
@@ -1589,6 +1702,8 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 	switch (host_byte(scmd->result)) {
 	case DID_OK:
 		break;
+	case DID_TIME_OUT:
+		goto check_type;
 	case DID_BUS_BUSY:
 		return (scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT);
 	case DID_PARITY:
@@ -1602,18 +1717,19 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return (scmd->request->cmd_flags & REQ_FAILFAST_DRIVER);
 	}
 
-	switch (status_byte(scmd->result)) {
-	case CHECK_CONDITION:
-		/*
-		 * assume caller has checked sense and determinted
-		 * the check condition was retryable.
-		 */
-		if (scmd->request->cmd_flags & REQ_FAILFAST_DEV ||
-		    scmd->request->cmd_type == REQ_TYPE_BLOCK_PC)
-			return 1;
-	}
+	switch (status_byte(scmd->result) != CHECK_CONDITION)
+		return 0;
 
-	return 0;
+check_type:
+	/*
+	 * assume caller has checked sense and determined
+	 * the check condition was retryable.
+	 */
+	if (scmd->request->cmd_flags & REQ_FAILFAST_DEV ||
+	    scmd->request->cmd_type == REQ_TYPE_BLOCK_PC)
+		return 1;
+	else
+		return 0;
 }
 
 /**
@@ -1663,9 +1779,13 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		 * looks good.  drop through, and check the next byte.
 		 */
 		break;
+	case DID_ABORT:
+		if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) {
+			scmd->result |= DID_TIME_OUT << 16;
+			return SUCCESS;
+		}
 	case DID_NO_CONNECT:
 	case DID_BAD_TARGET:
-	case DID_ABORT:
 		/*
 		 * note - this means that we just report the status back
 		 * to the top level driver, not that we actually think
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 8f9a0ca..f079a59 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -19,6 +19,7 @@ struct scsi_nl_hdr;
  * Scsi Error Handler Flags
  */
 #define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
+#define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
 
 #define SCSI_SENSE_VALID(scmd) \
 	(((scmd)->sense_buffer[0] & 0x70) == 0x70)
@@ -66,6 +67,7 @@ extern int __init scsi_init_devinfo(void);
 extern void scsi_exit_devinfo(void);
 
 /* scsi_error.c */
+extern void scmd_eh_abort_handler(struct work_struct *work);
 extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index de5f5d8..e3137e3 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -55,6 +55,7 @@ struct scsi_cmnd {
 	struct scsi_device *device;
 	struct list_head list;  /* scsi_cmnd participates in queue lists */
 	struct list_head eh_entry; /* entry for the host eh_cmd_q */
+	struct delayed_work abort_work;
 	int eh_eflags;		/* Used by error handlr */
 
 	/*
@@ -144,6 +145,7 @@ extern void scsi_put_command(struct scsi_cmnd *);
 extern void __scsi_put_command(struct Scsi_Host *, struct scsi_cmnd *,
 			       struct device *);
 extern void scsi_finish_command(struct scsi_cmnd *cmd);
+extern enum blk_eh_timer_return scsi_abort_command(struct scsi_cmnd *cmd);
 
 extern void *scsi_kmap_atomic_sg(struct scatterlist *sg, int sg_count,
 				 size_t *offset, size_t *len);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-02  7:12 ` [PATCH 3/9] scsi: improved eh timeout handler Hannes Reinecke
@ 2013-09-02 12:45   ` Bart Van Assche
  2013-09-02 13:11     ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2013-09-02 12:45 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Roland Dreier

On 09/02/13 09:12, Hannes Reinecke wrote:
> @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>   	list_del_init(&cmd->list);
>   	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>
> +	cancel_delayed_work(&cmd->abort_work);
> +
>   	__scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
>   }
>   EXPORT_SYMBOL(scsi_put_command);

Is this approach safe ? Is it e.g. possible that the abort work starts 
just before the cancel_delayed_work() call and continues to run after 
scsi_put_command() has finished ? In drivers/scsi/libfc/fc_exch.c a 
similar issue is solved by holding an additional reference as long as 
delayed work (fc_exch.timeout_work) is queued.

> +void
> +scmd_eh_abort_handler(struct work_struct *work)
> +{
> +	struct scsi_cmnd *scmd =
> +		container_of(work, struct scsi_cmnd, abort_work.work);
> +	struct scsi_device *sdev = scmd->device;
> +	unsigned long flags;
> +	int rtn;
> +
> +	spin_lock_irqsave(sdev->host->host_lock, flags);
> +	if (scsi_host_eh_past_deadline(sdev->host)) {
> +		spin_unlock_irqrestore(sdev->host->host_lock, flags);
> +		SCSI_LOG_ERROR_RECOVERY(3,
> +			scmd_printk(KERN_INFO, scmd,
> +				    "scmd %p eh timeout, not aborting\n", scmd));
> +	} else {
> +		spin_unlock_irqrestore(sdev->host->host_lock, flags);
> +		SCSI_LOG_ERROR_RECOVERY(3,
> +			scmd_printk(KERN_INFO, scmd,
> +				    "aborting command %p\n", scmd));
> +		rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
> +		if (rtn == SUCCESS) {
> +			scmd->result |= DID_TIME_OUT << 16;
> +			if (!scsi_noretry_cmd(scmd) &&
> +			    (++scmd->retries <= scmd->allowed)) {
> +				SCSI_LOG_ERROR_RECOVERY(3,
> +					scmd_printk(KERN_WARNING, scmd,
> +						    "scmd %p retry "
> +						    "aborted command\n", scmd));
> +				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
> +			} else {
> +				SCSI_LOG_ERROR_RECOVERY(3,
> +					scmd_printk(KERN_WARNING, scmd,
> +						    "scmd %p finish "
> +						    "aborted command\n", scmd));
> +				scsi_finish_command(scmd);
> +			}
> +			return;
> +		}
> +		SCSI_LOG_ERROR_RECOVERY(3,
> +			scmd_printk(KERN_INFO, scmd,
> +				    "scmd %p abort failed, rtn %d\n",
> +				    scmd, rtn));
> +	}
> +
> +	if (scsi_eh_scmd_add(scmd, 0)) {
> +		SCSI_LOG_ERROR_RECOVERY(3,
> +			scmd_printk(KERN_WARNING, scmd,
> +				    "scmd %p terminate "
> +				    "aborted command\n", scmd));
> +		scmd->result |= DID_TIME_OUT << 16;
> +		scsi_finish_command(scmd);
> +	}
> +}

This patch adds several new calls to LLD EH handlers. Is it guaranteed 
that these will only be invoked before scsi_remove_host() has finished ? 
For more background information, see also "[PATCH] Make 
scsi_remove_host() wait until error handling finished" 
(http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-02 12:45   ` Bart Van Assche
@ 2013-09-02 13:11     ` Hannes Reinecke
  2013-09-02 16:38       ` Bart Van Assche
  2013-09-03  9:13       ` Bart Van Assche
  0 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02 13:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Roland Dreier

On 09/02/2013 02:45 PM, Bart Van Assche wrote:
> On 09/02/13 09:12, Hannes Reinecke wrote:
>> @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>>       list_del_init(&cmd->list);
>>       spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>>
>> +    cancel_delayed_work(&cmd->abort_work);
>> +
>>       __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
>>   }
>>   EXPORT_SYMBOL(scsi_put_command);
> 
> Is this approach safe ? Is it e.g. possible that the abort work
> starts just before the cancel_delayed_work() call and continues to
> run after scsi_put_command() has finished ? In
> drivers/scsi/libfc/fc_exch.c a similar issue is solved by holding an
> additional reference as long as delayed work (fc_exch.timeout_work)
> is queued.
> 
I have been thinking of this, and in fact my original approach had
'cancel_delayed_work_sync' here. However, this didn't work as
scsi_put_command() might end up being called from an softirq context.

From my understanding the workqueue stuff guarantees that either
a) the workqueue item is still queued
   -> cancel_delayed_work will be in fact synchronous, as it'll
      cancel queue item from the queue
b) the workqueue item is running
   -> cancel_delayed_work is essentially a no-op, as the function
      is running and will be terminated anyway.

IE from the API perspective the transition between 'queued' and
'running' is atomic, and no in-between states are visible.

So case a) is obviously safe, and for case b) the abort function is
already running. But then the abort function has been called from
the block timeout handler, which did a blk_mark_rq_complete() prior
to calling the handler. So any completion coming in after that will
be ignored, and scsi_put_command() won't be called.

Hence we should be safe here.

>> +void
>> +scmd_eh_abort_handler(struct work_struct *work)
>> +{
>> +    struct scsi_cmnd *scmd =
>> +        container_of(work, struct scsi_cmnd, abort_work.work);
>> +    struct scsi_device *sdev = scmd->device;
>> +    unsigned long flags;
>> +    int rtn;
>> +
>> +    spin_lock_irqsave(sdev->host->host_lock, flags);
>> +    if (scsi_host_eh_past_deadline(sdev->host)) {
>> +        spin_unlock_irqrestore(sdev->host->host_lock, flags);
>> +        SCSI_LOG_ERROR_RECOVERY(3,
>> +            scmd_printk(KERN_INFO, scmd,
>> +                    "scmd %p eh timeout, not aborting\n", scmd));
>> +    } else {
>> +        spin_unlock_irqrestore(sdev->host->host_lock, flags);
>> +        SCSI_LOG_ERROR_RECOVERY(3,
>> +            scmd_printk(KERN_INFO, scmd,
>> +                    "aborting command %p\n", scmd));
>> +        rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
>> +        if (rtn == SUCCESS) {
>> +            scmd->result |= DID_TIME_OUT << 16;
>> +            if (!scsi_noretry_cmd(scmd) &&
>> +                (++scmd->retries <= scmd->allowed)) {
>> +                SCSI_LOG_ERROR_RECOVERY(3,
>> +                    scmd_printk(KERN_WARNING, scmd,
>> +                            "scmd %p retry "
>> +                            "aborted command\n", scmd));
>> +                scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
>> +            } else {
>> +                SCSI_LOG_ERROR_RECOVERY(3,
>> +                    scmd_printk(KERN_WARNING, scmd,
>> +                            "scmd %p finish "
>> +                            "aborted command\n", scmd));
>> +                scsi_finish_command(scmd);
>> +            }
>> +            return;
>> +        }
>> +        SCSI_LOG_ERROR_RECOVERY(3,
>> +            scmd_printk(KERN_INFO, scmd,
>> +                    "scmd %p abort failed, rtn %d\n",
>> +                    scmd, rtn));
>> +    }
>> +
>> +    if (scsi_eh_scmd_add(scmd, 0)) {
>> +        SCSI_LOG_ERROR_RECOVERY(3,
>> +            scmd_printk(KERN_WARNING, scmd,
>> +                    "scmd %p terminate "
>> +                    "aborted command\n", scmd));
>> +        scmd->result |= DID_TIME_OUT << 16;
>> +        scsi_finish_command(scmd);
>> +    }
>> +}
> 
> This patch adds several new calls to LLD EH handlers. Is it
> guaranteed that these will only be invoked before scsi_remove_host()
> has finished ? For more background information, see also "[PATCH]
> Make scsi_remove_host() wait until error handling finished"
> (http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).
> 
Well, that depends how scsi_remove_host() handles outstanding
commands. What happens if you call scsi_remove_host() and there is
still I/O in flight? I would assume that any HBA would have to kill
any outstanding I/O prior to calling scsi_remove_host() (FC most
certainly does this).
Which would mean that it'll have to wait for scsi_put_command() to
be called for all outstanding commands. And as scsi_put_command()
will be called only _after_ our routine runs (see the reasoning
above) we should be safe.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-02 13:11     ` Hannes Reinecke
@ 2013-09-02 16:38       ` Bart Van Assche
  2013-09-03  9:13       ` Bart Van Assche
  1 sibling, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2013-09-02 16:38 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Roland Dreier

On 09/02/13 15:11, Hannes Reinecke wrote:
> On 09/02/2013 02:45 PM, Bart Van Assche wrote:
>> On 09/02/13 09:12, Hannes Reinecke wrote:
>>> @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
>>>        list_del_init(&cmd->list);
>>>        spin_unlock_irqrestore(&cmd->device->list_lock, flags);
>>>
>>> +    cancel_delayed_work(&cmd->abort_work);
>>> +
>>>        __scsi_put_command(cmd->device->host, cmd, &sdev->sdev_gendev);
>>>    }
>>>    EXPORT_SYMBOL(scsi_put_command);
>>
>> Is this approach safe ? Is it e.g. possible that the abort work
>> starts just before the cancel_delayed_work() call and continues to
>> run after scsi_put_command() has finished ? In
>> drivers/scsi/libfc/fc_exch.c a similar issue is solved by holding an
>> additional reference as long as delayed work (fc_exch.timeout_work)
>> is queued.
>>
> I have been thinking of this, and in fact my original approach had
> 'cancel_delayed_work_sync' here. However, this didn't work as
> scsi_put_command() might end up being called from an softirq context.
>
>>From my understanding the workqueue stuff guarantees that either
> a) the workqueue item is still queued
>     -> cancel_delayed_work will be in fact synchronous, as it'll
>        cancel queue item from the queue
> b) the workqueue item is running
>     -> cancel_delayed_work is essentially a no-op, as the function
>        is running and will be terminated anyway.
>
> IE from the API perspective the transition between 'queued' and
> 'running' is atomic, and no in-between states are visible.
>
> So case a) is obviously safe, and for case b) the abort function is
> already running. But then the abort function has been called from
> the block timeout handler, which did a blk_mark_rq_complete() prior
> to calling the handler. So any completion coming in after that will
> be ignored, and scsi_put_command() won't be called.
>
> Hence we should be safe here.

Hello Hannes,

I think that you have just explained why that cancel_delayed_work() call 
in scsi_put_command() is not necessary at all ... In case of normal 
command completion, scsi_put_command() will be reached with no 
scmd_eh_abort_handler() call queued since there was no timeout. If the 
command timed out scsi_put_command() will be called after the abort_work 
has already been dequeued since work items are dequeued before being 
executed.

Bart.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-02 13:11     ` Hannes Reinecke
  2013-09-02 16:38       ` Bart Van Assche
@ 2013-09-03  9:13       ` Bart Van Assche
  2013-09-04  7:02         ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2013-09-03  9:13 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Roland Dreier, Mike Christie

On 09/02/13 15:11, Hannes Reinecke wrote:
> On 09/02/2013 02:45 PM, Bart Van Assche wrote:
>> This patch adds several new calls to LLD EH handlers. Is it
>> guaranteed that these will only be invoked before scsi_remove_host()
>> has finished ? For more background information, see also "[PATCH]
>> Make scsi_remove_host() wait until error handling finished"
>> (http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).
>>
> Well, that depends how scsi_remove_host() handles outstanding
> commands. What happens if you call scsi_remove_host() and there is
> still I/O in flight? I would assume that any HBA would have to kill
> any outstanding I/O prior to calling scsi_remove_host() (FC most
> certainly does this).
> Which would mean that it'll have to wait for scsi_put_command() to
> be called for all outstanding commands. And as scsi_put_command()
> will be called only _after_ our routine runs (see the reasoning
> above) we should be safe.

Hello Hannes,

Since fc_remove_host() has to be invoked before scsi_remove_host() and 
since fc_remove_host() changes the port state into FC_PORTSTATE_DELETED 
this means that fc_remote_port_chkready() will return DID_NO_CONNECT 
while scsi_remove_host() is in progress. I think this prevents that the 
SYNCHRONIZE CACHE command submitted by sd_remove() reaches SCSI disks 
over FC since sd_remove() is invoked from inside scsi_remove_host(). The 
SRP transport patch I had posted recently makes sure that the 
SYNCHRONIZE CACHE command submitted by sd_remove() reaches the target 
SCSI disk. This makes me wonder what the correct behavior is for SCSI 
transport drivers - disabling I/O before scsi_remove_host() starts or 
ensuring that I/O is still possible while scsi_remove_host() is in 
progress ? I think the call chain is as follows: scsi_remove_host() -> 
device_del() -> bus_remove_device() -> device_release_driver() -> 
__device_release_driver() -> sd_remove() -> sd_shutdown() -> 
sd_sync_cache().

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/9] scsi: improved eh timeout handler
  2013-09-03  9:13       ` Bart Van Assche
@ 2013-09-04  7:02         ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-04  7:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Roland Dreier, Mike Christie

On 09/03/2013 11:13 AM, Bart Van Assche wrote:
> On 09/02/13 15:11, Hannes Reinecke wrote:
>> On 09/02/2013 02:45 PM, Bart Van Assche wrote:
>>> This patch adds several new calls to LLD EH handlers. Is it
>>> guaranteed that these will only be invoked before scsi_remove_host()
>>> has finished ? For more background information, see also "[PATCH]
>>> Make scsi_remove_host() wait until error handling finished"
>>> (http://thread.gmane.org/gmane.linux.scsi/82572/focus=82779).
>>>
>> Well, that depends how scsi_remove_host() handles outstanding
>> commands. What happens if you call scsi_remove_host() and there is
>> still I/O in flight? I would assume that any HBA would have to kill
>> any outstanding I/O prior to calling scsi_remove_host() (FC most
>> certainly does this).
>> Which would mean that it'll have to wait for scsi_put_command() to
>> be called for all outstanding commands. And as scsi_put_command()
>> will be called only _after_ our routine runs (see the reasoning
>> above) we should be safe.
> 
> Hello Hannes,
> 
> Since fc_remove_host() has to be invoked before scsi_remove_host()
> and since fc_remove_host() changes the port state into
> FC_PORTSTATE_DELETED this means that fc_remote_port_chkready() will
> return DID_NO_CONNECT while scsi_remove_host() is in progress. I
> think this prevents that the SYNCHRONIZE CACHE command submitted by
> sd_remove() reaches SCSI disks over FC since sd_remove() is invoked
> from inside scsi_remove_host(). The SRP transport patch I had posted
> recently makes sure that the SYNCHRONIZE CACHE command submitted by
> sd_remove() reaches the target SCSI disk. This makes me wonder what
> the correct behavior is for SCSI transport drivers - disabling I/O
> before scsi_remove_host() starts or ensuring that I/O is still
> possible while scsi_remove_host() is in progress ? I think the call
> chain is as follows: scsi_remove_host() -> device_del() ->
> bus_remove_device() -> device_release_driver() ->
> __device_release_driver() -> sd_remove() -> sd_shutdown() ->
> sd_sync_cache().
> 

The actual call chain is
scsi_remove_host() -> scsi_forget_host() -> __scsi_remove_device()
-> device_del() etc.

What's important here is that __scsi_remove_device() sets the state
'SDEV_DEL' and calls blk_cleanup_queue().

So after __scsi_remove_device() no further I/O is possible.
However, prior to setting SDEV_DEL I/O should be perfectly okay, so
one would assume that the SYNCHRONIZE CACHE command should be sent
to the device. USB most certainly does it.

A safe practice, however, would be to ensure that no _other_ I/O can
be sent to the device, ie that all I/O coming in via the request
queue or ioctl should be short-circuited and never hit the device at
all. That's what fc_remote_port_chkready() does.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/9] virtio_scsi: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (2 preceding siblings ...)
  2013-09-02  7:12 ` [PATCH 3/9] scsi: improved eh timeout handler Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  7:13 ` [PATCH 5/9] libsas: " Hannes Reinecke
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/virtio_scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 74b88ef..5fca1ca 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -683,6 +683,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
 	.queuecommand = virtscsi_queuecommand_single,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,
+	.eh_timed_out = scsi_abort_command,
 
 	.can_queue = 1024,
 	.dma_boundary = UINT_MAX,
@@ -699,6 +700,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 	.queuecommand = virtscsi_queuecommand_multi,
 	.eh_abort_handler = virtscsi_abort,
 	.eh_device_reset_handler = virtscsi_device_reset,
+	.eh_timed_out = scsi_abort_command,
 
 	.can_queue = 1024,
 	.dma_boundary = UINT_MAX,
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/9] libsas: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (3 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 4/9] virtio_scsi: Enable new EH " Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  7:13 ` [PATCH 6/9] mptsas: " Hannes Reinecke
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/libsas/sas_scsi_host.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index da3aee1..6eb2f0c 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -864,7 +864,7 @@ enum blk_eh_timer_return sas_scsi_timed_out(struct scsi_cmnd *cmd)
 {
 	scmd_printk(KERN_DEBUG, cmd, "command %p timed out\n", cmd);
 
-	return BLK_EH_NOT_HANDLED;
+	return scsi_abort_command(cmd);
 }
 
 int sas_ioctl(struct scsi_device *sdev, int cmd, void __user *arg)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/9] mptsas: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (4 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 5/9] libsas: " Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  7:13 ` [PATCH 7/9] mpt2sas: " Hannes Reinecke
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/message/fusion/mptsas.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/message/fusion/mptsas.c b/drivers/message/fusion/mptsas.c
index dd239bd..d0aed77 100644
--- a/drivers/message/fusion/mptsas.c
+++ b/drivers/message/fusion/mptsas.c
@@ -1986,7 +1986,8 @@ static struct scsi_host_template mptsas_driver_template = {
 	.slave_configure		= mptsas_slave_configure,
 	.target_destroy			= mptsas_target_destroy,
 	.slave_destroy			= mptscsih_slave_destroy,
-	.change_queue_depth 		= mptscsih_change_queue_depth,
+	.change_queue_depth		= mptscsih_change_queue_depth,
+	.eh_timed_out			= scsi_abort_command,
 	.eh_abort_handler		= mptscsih_abort,
 	.eh_device_reset_handler	= mptscsih_dev_reset,
 	.eh_host_reset_handler		= mptscsih_host_reset,
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/9] mpt2sas: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (5 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 6/9] mptsas: " Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  7:13 ` [PATCH 8/9] mpt3sas: " Hannes Reinecke
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/mpt2sas/mpt2sas_scsih.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 389d792..9ede2c2 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -7628,8 +7628,9 @@ static struct scsi_host_template scsih_driver_template = {
 	.slave_destroy			= _scsih_slave_destroy,
 	.scan_finished			= _scsih_scan_finished,
 	.scan_start			= _scsih_scan_start,
-	.change_queue_depth 		= _scsih_change_queue_depth,
+	.change_queue_depth		= _scsih_change_queue_depth,
 	.change_queue_type		= _scsih_change_queue_type,
+	.eh_timed_out			= scsi_abort_command,
 	.eh_abort_handler		= _scsih_abort,
 	.eh_device_reset_handler	= _scsih_dev_reset,
 	.eh_target_reset_handler	= _scsih_target_reset,
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/9] mpt3sas: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (6 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 7/9] mpt2sas: " Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  7:13 ` [PATCH 9/9] scsi_transport_fc: " Hannes Reinecke
  2013-09-02  8:27 ` [PATCHv5 0/9] New EH command " Christoph Hellwig
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index a961fe1..b893383 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -7237,6 +7237,7 @@ static struct scsi_host_template scsih_driver_template = {
 	.scan_start			= _scsih_scan_start,
 	.change_queue_depth		= _scsih_change_queue_depth,
 	.change_queue_type		= _scsih_change_queue_type,
+	.eh_timed_out			= scsi_abort_command,
 	.eh_abort_handler		= _scsih_abort,
 	.eh_device_reset_handler	= _scsih_dev_reset,
 	.eh_target_reset_handler	= _scsih_target_reset,
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 9/9] scsi_transport_fc: Enable new EH timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (7 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 8/9] mpt3sas: " Hannes Reinecke
@ 2013-09-02  7:13 ` Hannes Reinecke
  2013-09-02  8:27 ` [PATCHv5 0/9] New EH command " Christoph Hellwig
  9 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  7:13 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel, James Smart,
	Bart Van Assche, Roland Dreier, Hannes Reinecke

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_transport_fc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 4628fd5..012c801 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2079,7 +2079,7 @@ fc_timed_out(struct scsi_cmnd *scmd)
 	if (rport->port_state == FC_PORTSTATE_BLOCKED)
 		return BLK_EH_RESET_TIMER;
 
-	return BLK_EH_NOT_HANDLED;
+	return scsi_abort_command(scmd);
 }
 
 /*
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCHv5 0/9] New EH command timeout handler
  2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
                   ` (8 preceding siblings ...)
  2013-09-02  7:13 ` [PATCH 9/9] scsi_transport_fc: " Hannes Reinecke
@ 2013-09-02  8:27 ` Christoph Hellwig
  2013-09-02  8:54   ` Hannes Reinecke
  9 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2013-09-02  8:27 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Bart Van Assche, Roland Dreier

One thing I'm still wondering is why we can't enable this globally,
and if there is a reason it should be documented.

As far as I can tell the actual calling context of the eh_abort_handler
doesn't change, so an LLDD would have to rely on some very specific
side effects of being in EH to break, and we never guaranteed such
specifics.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHv5 0/9] New EH command timeout handler
  2013-09-02  8:27 ` [PATCHv5 0/9] New EH command " Christoph Hellwig
@ 2013-09-02  8:54   ` Hannes Reinecke
  2013-09-02  9:31     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  8:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Bart Van Assche, Roland Dreier

On 09/02/2013 10:27 AM, Christoph Hellwig wrote:
> One thing I'm still wondering is why we can't enable this globally,
> and if there is a reason it should be documented.
> 
> As far as I can tell the actual calling context of the eh_abort_handler
> doesn't change, so an LLDD would have to rely on some very specific
> side effects of being in EH to break, and we never guaranteed such
> specifics.
> 
I don't mind. Having talked to the various SCSI folks everyone
agreed that calling abort asynchronously shouldn't do any harm.
At least as far as the SCSI spec goes.

As for documentation: I didn't document it it currently as with my
implementation it's pretty much an optional thing.
But if we were to enable it globally it surely should be documented.

So if there is a consensus I surely can enable it globally and
update the documentation.

James?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHv5 0/9] New EH command timeout handler
  2013-09-02  8:54   ` Hannes Reinecke
@ 2013-09-02  9:31     ` Christoph Hellwig
  2013-09-02  9:59       ` Hannes Reinecke
  2013-09-03 16:36       ` Jörn Engel
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2013-09-02  9:31 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Ewan Milne,
	Ren Mingxin, Joern Engel, James Smart, Bart Van Assche,
	Roland Dreier

On Mon, Sep 02, 2013 at 10:54:13AM +0200, Hannes Reinecke wrote:
> I don't mind. Having talked to the various SCSI folks everyone
> agreed that calling abort asynchronously shouldn't do any harm.
> At least as far as the SCSI spec goes.
> 
> As for documentation: I didn't document it it currently as with my
> implementation it's pretty much an optional thing.
> But if we were to enable it globally it surely should be documented.
> 
> So if there is a consensus I surely can enable it globally and
> update the documentation.

I would defintively love enabling it globally.  As a fallback we should
probably keep the old code for a while and prepare a way to disable
the async code by a flag in the host template.  If no reason to enable
it shows up after a year ore two we can remove the old code entirely.

I.e change the tail of scsi_times_out into something like:


	if (rtn == BLK_EH_NOT_HANDLED && !shost->hostt->no_async_abort) {
		if (scsi_abort_command(scmd))
			return BLK_EH_NOT_HANDLED;
	}

	scmd->result |= DID_TIME_OUT << 16;
	...

That way we don't even need the new BLK_EH_SCHEDULED return value and
have all the handling in scsi_error.c

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHv5 0/9] New EH command timeout handler
  2013-09-02  9:31     ` Christoph Hellwig
@ 2013-09-02  9:59       ` Hannes Reinecke
  2013-09-03 16:36       ` Jörn Engel
  1 sibling, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2013-09-02  9:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, linux-scsi, Ewan Milne, Ren Mingxin, Joern Engel,
	James Smart, Bart Van Assche, Roland Dreier

On 09/02/2013 11:31 AM, Christoph Hellwig wrote:
> On Mon, Sep 02, 2013 at 10:54:13AM +0200, Hannes Reinecke wrote:
>> I don't mind. Having talked to the various SCSI folks everyone
>> agreed that calling abort asynchronously shouldn't do any harm.
>> At least as far as the SCSI spec goes.
>>
>> As for documentation: I didn't document it it currently as with my
>> implementation it's pretty much an optional thing.
>> But if we were to enable it globally it surely should be documented.
>>
>> So if there is a consensus I surely can enable it globally and
>> update the documentation.
> 
> I would defintively love enabling it globally.  As a fallback we should
> probably keep the old code for a while and prepare a way to disable
> the async code by a flag in the host template.  If no reason to enable
> it shows up after a year ore two we can remove the old code entirely.
> 
> I.e change the tail of scsi_times_out into something like:
> 
> 
> 	if (rtn == BLK_EH_NOT_HANDLED && !shost->hostt->no_async_abort) {
> 		if (scsi_abort_command(scmd))
> 			return BLK_EH_NOT_HANDLED;
> 	}
> 
> 	scmd->result |= DID_TIME_OUT << 16;
> 	...
> 
> That way we don't even need the new BLK_EH_SCHEDULED return value and
> have all the handling in scsi_error.c
> 
Right. Will be doing it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCHv5 0/9] New EH command timeout handler
  2013-09-02  9:31     ` Christoph Hellwig
  2013-09-02  9:59       ` Hannes Reinecke
@ 2013-09-03 16:36       ` Jörn Engel
  1 sibling, 0 replies; 20+ messages in thread
From: Jörn Engel @ 2013-09-03 16:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Hannes Reinecke, James Bottomley, linux-scsi, Ewan Milne,
	Ren Mingxin, James Smart, Bart Van Assche, Roland Dreier

On Mon, 2 September 2013 02:31:29 -0700, Christoph Hellwig wrote:
> 
> I would defintively love enabling it globally.

Agreed.  I carry a private patch based on an older version of Hannes'
patchset doing just that.  Istr there was some problem exactly once on
a VM with a virtualized hba.  We hit problems with virtualized
ethernet devices fairly regularly, so a single case on scsi may simply
be some other latent bug.

On real hardware there have never been problems other than the scsi
aborts themselves.

Jörn

--
Do not stop an army on its way home.
-- Sun Tzu
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-09-04  7:02 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-02  7:12 [PATCHv5 0/9] New EH command timeout handler Hannes Reinecke
2013-09-02  7:12 ` [PATCH 1/9] scsi: Fix erratic device offline during EH Hannes Reinecke
2013-09-02  7:12 ` [PATCH 2/9] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
2013-09-02  7:12 ` [PATCH 3/9] scsi: improved eh timeout handler Hannes Reinecke
2013-09-02 12:45   ` Bart Van Assche
2013-09-02 13:11     ` Hannes Reinecke
2013-09-02 16:38       ` Bart Van Assche
2013-09-03  9:13       ` Bart Van Assche
2013-09-04  7:02         ` Hannes Reinecke
2013-09-02  7:13 ` [PATCH 4/9] virtio_scsi: Enable new EH " Hannes Reinecke
2013-09-02  7:13 ` [PATCH 5/9] libsas: " Hannes Reinecke
2013-09-02  7:13 ` [PATCH 6/9] mptsas: " Hannes Reinecke
2013-09-02  7:13 ` [PATCH 7/9] mpt2sas: " Hannes Reinecke
2013-09-02  7:13 ` [PATCH 8/9] mpt3sas: " Hannes Reinecke
2013-09-02  7:13 ` [PATCH 9/9] scsi_transport_fc: " Hannes Reinecke
2013-09-02  8:27 ` [PATCHv5 0/9] New EH command " Christoph Hellwig
2013-09-02  8:54   ` Hannes Reinecke
2013-09-02  9:31     ` Christoph Hellwig
2013-09-02  9:59       ` Hannes Reinecke
2013-09-03 16:36       ` Jörn Engel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).