[PATCH] SCSI: implement scsi_eh

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] SCSI: implement scsi_eh_schedule()
@ 2006-04-01 10:38 Tejun Heo
  2006-04-01 20:14 ` Jeff Garzik
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-01 10:38 UTC (permalink / raw)
  To: hch, James.Bottomley, Jeff Garzik, alan, albertcc, linux-ide,
	linux-scsi

[PATCH] SCSI: implement scsi_eh_schedule()

This patch implements scsi_eh_schedule() which provides a way to
directly invoke SCSI EH from drivers which implement EH using
->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
gives such drivers complete control over when and how to invoke EH and
handle failed commands.

scsi_eh_schedule() can also be invoked without a scmd.  This is useful
for handling exception conditions occurring while no command is in
progress.  New Scsi_Host field host_eh_invoked is added for this.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

Hello, all.

This patch is in preparation for new libata EH implementation.  As
described above, this is to allow LLDD's direct access to SCSI EH
rather than invoking it indirectly with CHECK SENSE without sense data
hack.

This patch applies cleanly to both scsi-misc-2.6 and libata-dev.  If
this gets ACKed, it would be better to merge via libata-dev as this
change is necessary for new libata EH.

Thanks.

 drivers/scsi/scsi_error.c |   40 +++++++++++++++++++++++++++++++++++++++-
 include/scsi/scsi_eh.h    |    1 +
 include/scsi/scsi_host.h  |    1 +
 3 files changed, 41 insertions(+), 1 deletion(-)

Index: work/drivers/scsi/scsi_error.c
===================================================================
--- work.orig/drivers/scsi/scsi_error.c	2006-04-01 16:42:51.000000000 +0900
+++ work/drivers/scsi/scsi_error.c	2006-04-01 16:42:58.000000000 +0900
@@ -90,6 +90,44 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
 }
 
 /**
+ * scsi_eh_schedule - schedule error handling.
+ * @shost:	SCSI host to invoke error handling on.
+ * @scmd:	scmd to run eh on (can be NULL)
+ *
+ * Return value:
+ *	0 on failure.
+ **/
+int scsi_eh_schedule(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	WARN_ON(!shost->hostt->eh_strategy_handler);
+
+	if (!shost->ehandler)
+		return 0;
+
+	if (scmd && !scsi_delete_timer(scmd))
+		return 0;	/* timeout won */
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (scsi_host_set_state(shost, SHOST_RECOVERY))
+		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
+			goto out_unlock;
+	ret = 1;
+	if (scmd) {
+		list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
+		shost->host_failed++;
+	} else
+		shost->host_eh_invoked++;
+	scsi_eh_wakeup(shost);
+ out_unlock:
+	spin_unlock_irqrestore(shost->host_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL(scsi_eh_schedule);
+
+/**
  * scsi_add_timer - Start timeout timer for a single scsi command.
  * @scmd:	scsi command that is about to start running.
  * @timeout:	amount of time to allow this command to run.
@@ -1517,7 +1555,7 @@ int scsi_error_handler(void *data)
 	 */
 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
-		if (shost->host_failed == 0 ||
+		if ((shost->host_failed == 0 && shost->host_eh_invoked == 0) ||
 		    shost->host_failed != shost->host_busy) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				printk("Error handler scsi_eh_%d sleeping\n",
Index: work/include/scsi/scsi_eh.h
===================================================================
--- work.orig/include/scsi/scsi_eh.h	2006-04-01 16:42:51.000000000 +0900
+++ work/include/scsi/scsi_eh.h	2006-04-01 16:42:58.000000000 +0900
@@ -35,6 +35,7 @@ static inline int scsi_sense_valid(struc
 }
 
 
+extern int scsi_eh_schedule(struct Scsi_Host *shost, struct scsi_cmnd *scmd);
 extern void scsi_eh_finish_cmd(struct scsi_cmnd *scmd,
 			       struct list_head *done_q);
 extern void scsi_eh_flush_done_q(struct list_head *done_q);
Index: work/include/scsi/scsi_host.h
===================================================================
--- work.orig/include/scsi/scsi_host.h	2006-04-01 16:42:51.000000000 +0900
+++ work/include/scsi/scsi_host.h	2006-04-01 16:42:58.000000000 +0900
@@ -473,6 +473,7 @@ struct Scsi_Host {
 	 */
 	unsigned int host_busy;		   /* commands actually active on low-level */
 	unsigned int host_failed;	   /* commands that failed. */
+	unsigned int host_eh_invoked;	   /* EH invocations without command */
     
 	unsigned short host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
 	int resetting; /* if set, it means that last_reset is a valid value */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] SCSI: implement scsi_eh_schedule()
  2006-04-01 10:38 [PATCH] SCSI: implement scsi_eh_schedule() Tejun Heo
@ 2006-04-01 20:14 ` Jeff Garzik
  2006-04-02  1:15   ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Garzik @ 2006-04-01 20:14 UTC (permalink / raw)
  To: Tejun Heo; +Cc: hch, James.Bottomley, alan, albertcc, linux-ide, linux-scsi

Tejun Heo wrote:
> [PATCH] SCSI: implement scsi_eh_schedule()
> 
> This patch implements scsi_eh_schedule() which provides a way to
> directly invoke SCSI EH from drivers which implement EH using
> ->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
> gives such drivers complete control over when and how to invoke EH and
> handle failed commands.
> 
> scsi_eh_schedule() can also be invoked without a scmd.  This is useful
> for handling exception conditions occurring while no command is in
> progress.  New Scsi_Host field host_eh_invoked is added for this.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>

Not so sure about this one:

1) Rather than creating the scmd==NULL special case path, just create a 
new function scsi_kick_eh() or somesuch.  I presume such an 
implementation would be quite tiny:  satisfy the condition 
'shost->host_busy == shost->host_failed', then call scsi_eh_wakeup().

2) Looks almost exactly like scsi_eh_scmd_add().  Minus the code deleted 
via suggestion #1, you really just add a timer check.  Why not add a 
SCSI EH flag which says "check timer", then just call scsi_eh_scmd_add()?

3) I would think that host_eh_invoked could be deduced.

	Jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] SCSI: implement scsi_eh_schedule()
  2006-04-01 20:14 ` Jeff Garzik
@ 2006-04-02  1:15   ` Tejun Heo
  2006-04-02 16:04     ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-02  1:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: hch, James.Bottomley, alan, albertcc, linux-ide, linux-scsi

Jeff Garzik wrote:
> Tejun Heo wrote:
>> [PATCH] SCSI: implement scsi_eh_schedule()
>>
>> This patch implements scsi_eh_schedule() which provides a way to
>> directly invoke SCSI EH from drivers which implement EH using
>> ->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
>> gives such drivers complete control over when and how to invoke EH and
>> handle failed commands.
>>
>> scsi_eh_schedule() can also be invoked without a scmd.  This is useful
>> for handling exception conditions occurring while no command is in
>> progress.  New Scsi_Host field host_eh_invoked is added for this.
>>
>> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> Not so sure about this one:
> 
> 1) Rather than creating the scmd==NULL special case path, just create a 
> new function scsi_kick_eh() or somesuch.  I presume such an 
> implementation would be quite tiny:  satisfy the condition 
> 'shost->host_busy == shost->host_failed', then call scsi_eh_wakeup().

I see.

> 2) Looks almost exactly like scsi_eh_scmd_add().  Minus the code deleted 
> via suggestion #1, you really just add a timer check.  Why not add a 
> SCSI EH flag which says "check timer", then just call scsi_eh_scmd_add()?

I didn't want to add stuff to SCSI main EH path which isn't necessary to 
main EH. But you're right, they are almost identical. Maybe a wrapper 
around scsi_eh_scmd_add() should work.

> 3) I would think that host_eh_invoked could be deduced.

Okay, I'll try to cook something prettier.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-02  1:15   ` Tejun Heo
@ 2006-04-02 16:04     ` Tejun Heo
  2006-04-02 16:06       ` [PATCH 2/2] SCSI: implement scsi_eh_schedule_host() Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Tejun Heo @ 2006-04-02 16:04 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

This patch implements scsi_eh_schedule_cmd() which provides a way to
directly invoke SCSI EH from drivers implementing
->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
gives such drivers complete control over when and how to invoke EH and
handle failed commands.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

 drivers/scsi/scsi_error.c |   24 ++++++++++++++++++++++++
 include/scsi/scsi_eh.h    |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

caaed524eab41fab88be717d06627bd50cf66ee2
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5f0fdfb..c314095 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -90,6 +90,30 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
 }
 
 /**
+ * scsi_eh_schedule_cmd - schedule scsi cmd for error handling.
+ * @scmd:	scmd to run eh on.
+ *
+ * Description:
+ *	This function is used by LLDDs which don't use standard SCSI
+ *	EH to schedule scmd for EH.
+ *
+ * Return value:
+ *	0 on failure.
+ **/
+int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
+{
+	struct Scsi_Host *shost = scmd->device->host;
+
+	WARN_ON(!shost->hostt->eh_strategy_handler);
+
+	if (!scsi_delete_timer(scmd))
+		return 0;	/* timeout won */
+
+	return scsi_eh_scmd_add(scmd, 0);
+}
+EXPORT_SYMBOL_GPL(scsi_eh_schedule_cmd);
+
+/**
  * scsi_add_timer - Start timeout timer for a single scsi command.
  * @scmd:	scsi command that is about to start running.
  * @timeout:	amount of time to allow this command to run.
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index d160880..dd0a50e 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -35,6 +35,7 @@ static inline int scsi_sense_valid(struc
 }
 
 
+extern int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd);
 extern void scsi_eh_finish_cmd(struct scsi_cmnd *scmd,
 			       struct list_head *done_q);
 extern void scsi_eh_flush_done_q(struct list_head *done_q);
-- 
1.2.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] SCSI: implement scsi_eh_schedule_host()
  2006-04-02 16:04     ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Tejun Heo
@ 2006-04-02 16:06       ` Tejun Heo
  2006-04-11 17:43         ` Jeff Garzik
  2006-04-02 23:49       ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Luben Tuikov
  2006-04-11 17:41       ` Jeff Garzik
  2 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-02 16:06 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

This patch implements scsi_eh_schedule_host() which provides a way to
directly invoke SCSI EH without SCSI command from drivers implementing
->eh_strategy_handler.  This allows such drivers to use EH to handle
exception conditions which are not associated with particular SCSI
command.

Signed-off-by: Tejun Heo <htejun@gmail.com>

---

Jeff, I think using separate variable (->host_eh_scheduled) is the
simplest way to implement this.  Adding more semantics to ->host_busy
and failed doesn't sound very attractive to me.

 drivers/scsi/scsi_error.c |   38 +++++++++++++++++++++++++++++++++++++-
 include/scsi/scsi_eh.h    |    1 +
 include/scsi/scsi_host.h  |    1 +
 3 files changed, 39 insertions(+), 1 deletions(-)

544d293f756e7041f676ac30152445339111a9c8
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c314095..bc8a253 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -114,6 +114,42 @@ int scsi_eh_schedule_cmd(struct scsi_cmn
 EXPORT_SYMBOL_GPL(scsi_eh_schedule_cmd);
 
 /**
+ * scsi_eh_schedule_host - schedule scsi host for error handling
+ * @shost:	SCSI host to invoke error handling on.
+ *
+ * Description:
+ *	This function is used by LLDDs which don't use standard SCSI
+ *	EH to schedule @shost for EH.  This allows such LLDDs to use
+ *	EH for handling exceptions not associated with command.
+ *
+ * Return value:
+ *	0 on failure.
+ **/
+int scsi_eh_schedule_host(struct Scsi_Host *shost)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	WARN_ON(!shost->hostt->eh_strategy_handler);
+
+	if (!shost->ehandler)
+		return 0;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (scsi_host_set_state(shost, SHOST_RECOVERY))
+		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
+			goto out_unlock;
+
+	ret = 1;
+	shost->host_eh_scheduled++;
+	scsi_eh_wakeup(shost);
+ out_unlock:
+	spin_unlock_irqrestore(shost->host_lock, flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(scsi_eh_schedule_host);
+
+/**
  * scsi_add_timer - Start timeout timer for a single scsi command.
  * @scmd:	scsi command that is about to start running.
  * @timeout:	amount of time to allow this command to run.
@@ -1541,7 +1577,7 @@ int scsi_error_handler(void *data)
 	 */
 	set_current_state(TASK_INTERRUPTIBLE);
 	while (!kthread_should_stop()) {
-		if (shost->host_failed == 0 ||
+		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
 		    shost->host_failed != shost->host_busy) {
 			SCSI_LOG_ERROR_RECOVERY(1,
 				printk("Error handler scsi_eh_%d sleeping\n",
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index dd0a50e..b1c09f4 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -36,6 +36,7 @@ static inline int scsi_sense_valid(struc
 
 
 extern int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd);
+extern int scsi_eh_schedule_host(struct Scsi_Host *shost);
 extern void scsi_eh_finish_cmd(struct scsi_cmnd *scmd,
 			       struct list_head *done_q);
 extern void scsi_eh_flush_done_q(struct list_head *done_q);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index dc6862d..a991b9c 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -473,6 +473,7 @@ struct Scsi_Host {
 	 */
 	unsigned int host_busy;		   /* commands actually active on low-level */
 	unsigned int host_failed;	   /* commands that failed. */
+	unsigned int host_eh_scheduled;	   /* EH scheduled without command */
     
 	unsigned short host_no;  /* Used for IOCTL_GET_IDLUN, /proc/scsi et al. */
 	int resetting; /* if set, it means that last_reset is a valid value */
-- 
1.2.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-02 16:04     ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Tejun Heo
  2006-04-02 16:06       ` [PATCH 2/2] SCSI: implement scsi_eh_schedule_host() Tejun Heo
@ 2006-04-02 23:49       ` Luben Tuikov
  2006-04-03  1:24         ` Tejun Heo
  2006-04-11 17:41       ` Jeff Garzik
  2 siblings, 1 reply; 22+ messages in thread
From: Luben Tuikov @ 2006-04-02 23:49 UTC (permalink / raw)
  To: Tejun Heo, Jeff Garzik
  Cc: hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

--- Tejun Heo <htejun@gmail.com> wrote:
> This patch implements scsi_eh_schedule_cmd() which provides a way to
> directly invoke SCSI EH from drivers implementing
> ->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
> gives such drivers complete control over when and how to invoke EH and
> handle failed commands.

Hi Tejun,

I sent an almost identical patch, albeit much more general, see this:
http://marc.theaimsgroup.com/?l=linux-scsi&m=113833937421677&w=2

Here is the core of it:
--start-quote--
This function requests that SCSI Core start recovery for the
command by deleting the timer and adding the command to the eh
queue.  It can be called by either LLDDs or SCSI Core.  LLDDs who
implement their own error recovery MAY ignore the timeout event if
they generated scsi_req_abort_cmd.

void scsi_req_abort_cmd(struct scsi_cmnd *cmd)
{
	if (!scsi_delete_timer(cmd))
		return;
	scsi_times_out(cmd);
}
--end-quote---

To have a consistent and complete error recovery solution, given
the current SCSI Core infrastructure, as is requred for your SATA effort,
and other (protocol) work, this is what one needs to implement:

1. eh_timed_out(),
2. eh_strategy_handler(),
3. scsi_req_abort_cmd()  (as shown above), and finally,
4. scsi_req_dev_reset().

I keep those in my own git trees. (Had 3 been accepted, I would've sent 4.)

Beware, Bottomley will most likely have his own ideas on error recovery,
as he did when I posted my patch above.  I didn't bother to explain, from a
protocol point of view, why it is needed.

Good luck!
     Luben


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-02 23:49       ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Luben Tuikov
@ 2006-04-03  1:24         ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2006-04-03  1:24 UTC (permalink / raw)
  To: ltuikov
  Cc: Jeff Garzik, hch, James.Bottomley, alan, albertcc, arjan,
	linux-ide, linux-scsi

Hello, Luben.

Luben Tuikov wrote:
> --- Tejun Heo <htejun@gmail.com> wrote:
>> This patch implements scsi_eh_schedule_cmd() which provides a way to
>> directly invoke SCSI EH from drivers implementing
>> ->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
>> gives such drivers complete control over when and how to invoke EH and
>> handle failed commands.
> 
> Hi Tejun,
> 
> I sent an almost identical patch, albeit much more general, see this:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=113833937421677&w=2
> 
> Here is the core of it:
> --start-quote--
> This function requests that SCSI Core start recovery for the
> command by deleting the timer and adding the command to the eh
> queue.  It can be called by either LLDDs or SCSI Core.  LLDDs who
> implement their own error recovery MAY ignore the timeout event if
> they generated scsi_req_abort_cmd.
> 
> void scsi_req_abort_cmd(struct scsi_cmnd *cmd)
> {
> 	if (!scsi_delete_timer(cmd))
> 		return;
> 	scsi_times_out(cmd);
> }
> --end-quote---
> 
> To have a consistent and complete error recovery solution, given
> the current SCSI Core infrastructure, as is requred for your SATA effort,
> and other (protocol) work, this is what one needs to implement:
> 
> 1. eh_timed_out(),
> 2. eh_strategy_handler(),
> 3. scsi_req_abort_cmd()  (as shown above), and finally,
> 4. scsi_req_dev_reset().
> 
> I keep those in my own git trees. (Had 3 been accepted, I would've sent 4.)

Hmmm... Is scsi_req_dev_reset() similar to scsi_eh_schedule_host()?  ie. 
Does it request EH without scmd?  If so, can you post the patch?  I'm 
okay with anything that works.  All that I need are...

* Non-hackish way to invoke EH on scmd from drivers implementing its own 
EH.  DID_TIME_OUT solution James talked about in the thread isn't 
enough.  It needs to invoke EH whatever the command is.

* Invoking EH without scmd.

IMHO, it's better to allow such driver to tap directly into EH rather 
than making it pass through normal SCSI EH decision making process 
(scsi_device_disposition).  It's much simpler that way.

James, can you tell us what you're thinking about this matter?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-02 16:04     ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Tejun Heo
  2006-04-02 16:06       ` [PATCH 2/2] SCSI: implement scsi_eh_schedule_host() Tejun Heo
  2006-04-02 23:49       ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Luben Tuikov
@ 2006-04-11 17:41       ` Jeff Garzik
  2006-04-11 21:28         ` Patrick Mansfield
  2 siblings, 1 reply; 22+ messages in thread
From: Jeff Garzik @ 2006-04-11 17:41 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

Tejun Heo wrote:
> This patch implements scsi_eh_schedule_cmd() which provides a way to
> directly invoke SCSI EH from drivers implementing
> ->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
> gives such drivers complete control over when and how to invoke EH and
> handle failed commands.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> ---
> 
>  drivers/scsi/scsi_error.c |   24 ++++++++++++++++++++++++
>  include/scsi/scsi_eh.h    |    1 +
>  2 files changed, 25 insertions(+), 0 deletions(-)
> 
> caaed524eab41fab88be717d06627bd50cf66ee2
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 5f0fdfb..c314095 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -90,6 +90,30 @@ int scsi_eh_scmd_add(struct scsi_cmnd *s
>  }
>  
>  /**
> + * scsi_eh_schedule_cmd - schedule scsi cmd for error handling.
> + * @scmd:	scmd to run eh on.
> + *
> + * Description:
> + *	This function is used by LLDDs which don't use standard SCSI
> + *	EH to schedule scmd for EH.
> + *
> + * Return value:
> + *	0 on failure.
> + **/
> +int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
> +{
> +	struct Scsi_Host *shost = scmd->device->host;
> +
> +	WARN_ON(!shost->hostt->eh_strategy_handler);
> +
> +	if (!scsi_delete_timer(scmd))
> +		return 0;	/* timeout won */
> +
> +	return scsi_eh_scmd_add(scmd, 0);
> +}
> +EXPORT_SYMBOL_GPL(scsi_eh_schedule_cmd);


ACK, though this patch makes me think that there are some non-libata 
pieces of code that could use this function.

Unfortunately for libata, we are stuck waiting on linux-scsi people to 
return and give their opinion.  If nobody complains, I'll apply it, if 
it doesn't appear in scsi-misc-2.6 sometime vaguely soon.

	Jeff



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] SCSI: implement scsi_eh_schedule_host()
  2006-04-02 16:06       ` [PATCH 2/2] SCSI: implement scsi_eh_schedule_host() Tejun Heo
@ 2006-04-11 17:43         ` Jeff Garzik
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Garzik @ 2006-04-11 17:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

Tejun Heo wrote:
> This patch implements scsi_eh_schedule_host() which provides a way to
> directly invoke SCSI EH without SCSI command from drivers implementing
> ->eh_strategy_handler.  This allows such drivers to use EH to handle
> exception conditions which are not associated with particular SCSI
> command.
> 
> Signed-off-by: Tejun Heo <htejun@gmail.com>
> 
> ---
> 
> Jeff, I think using separate variable (->host_eh_scheduled) is the
> simplest way to implement this.  Adding more semantics to ->host_busy
> and failed doesn't sound very attractive to me.

ACK, I reviewed the existing uses of host_busy and host_error (not too 
many), and this should be OK.

However, I continue to be nervous being in the error handler when 
host_busy and host_error are zero.

	Jeff




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-11 17:41       ` Jeff Garzik
@ 2006-04-11 21:28         ` Patrick Mansfield
  2006-04-12  2:21           ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick Mansfield @ 2006-04-11 21:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Tejun Heo, hch, James.Bottomley, alan, albertcc, arjan, linux-ide,
	linux-scsi

On Tue, Apr 11, 2006 at 01:41:57PM -0400, Jeff Garzik wrote:
> Tejun Heo wrote:
> >This patch implements scsi_eh_schedule_cmd() which provides a way to
> >directly invoke SCSI EH from drivers implementing
> >->eh_strategy_handler.  Combined with scsi_eh_flush_done_q(), this
> >gives such drivers complete control over when and how to invoke EH and
> >handle failed commands.


> >+ * scsi_eh_schedule_cmd - schedule scsi cmd for error handling.
> >+ * @scmd:	scmd to run eh on.
> >+ *
> >+ * Description:
> >+ *	This function is used by LLDDs which don't use standard SCSI
> >+ *	EH to schedule scmd for EH.
> >+ *
> >+ * Return value:
> >+ *	0 on failure.
> >+ **/
> >+int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
> >+{
> >+	struct Scsi_Host *shost = scmd->device->host;
> >+
> >+	WARN_ON(!shost->hostt->eh_strategy_handler);
> >+
> >+	if (!scsi_delete_timer(scmd))
> >+		return 0;	/* timeout won */
> >+
> >+	return scsi_eh_scmd_add(scmd, 0);
> >+}
> >+EXPORT_SYMBOL_GPL(scsi_eh_schedule_cmd);
> 
> 
> ACK, though this patch makes me think that there are some non-libata 
> pieces of code that could use this function.
> 
> Unfortunately for libata, we are stuck waiting on linux-scsi people to 
> return and give their opinion.  If nobody complains, I'll apply it, if 
> it doesn't appear in scsi-misc-2.6 sometime vaguely soon.

Per other email it should be called scsi_req_abort_cmd() or such, as that
is the only reason to call it, correct?

Any other handling can be completed by calling the ->done function.

Even the abort/cancel could be done in the driver without this, I assume
it is avaiable so the driver can use the eh process and existing code
paths rather than duplicate similar code.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-11 21:28         ` Patrick Mansfield
@ 2006-04-12  2:21           ` Tejun Heo
  2006-04-12  8:24             ` Luben Tuikov
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-12  2:21 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: Jeff Garzik, hch, James.Bottomley, alan, albertcc, arjan,
	linux-ide, linux-scsi

Patrick Mansfield wrote:
> On Tue, Apr 11, 2006 at 01:41:57PM -0400, Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> + * scsi_eh_schedule_cmd - schedule scsi cmd for error handling.
>>> + * @scmd:	scmd to run eh on.
>>> + *
>>> + * Description:
>>> + *	This function is used by LLDDs which don't use standard SCSI
>>> + *	EH to schedule scmd for EH.
>>> + *
>>> + * Return value:
>>> + *	0 on failure.
>>> + **/
>>> +int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
[--snip--]
>>
>> ACK, though this patch makes me think that there are some non-libata 
>> pieces of code that could use this function.
>>
>> Unfortunately for libata, we are stuck waiting on linux-scsi people to 
>> return and give their opinion.  If nobody complains, I'll apply it, if 
>> it doesn't appear in scsi-misc-2.6 sometime vaguely soon.
> 
> Per other email it should be called scsi_req_abort_cmd() or such, as that
> is the only reason to call it, correct?

Well, it's named this way is to keep it consistent with 
scsi_eh_schedule_host().  Either name is okay with me but driver which 
use this function probably have interest in schedule_host() but not in 
other SCSI EH functions.  So, considering that, I think the current 
naming is okay.

> Any other handling can be completed by calling the ->done function.
> 
> Even the abort/cancel could be done in the driver without this, I assume
> it is avaiable so the driver can use the eh process and existing code
> paths rather than duplicate similar code.

Yeap, as I noted earlier, passing scmds to EH is possible without this 
function but it has to be done in a quite hackish way.  My earlier 
libata EH implementation did this by completing the scmd with CHECK 
CONDITION but without sense data which makes SCSI midlayer invoke EH to 
acquire sense data, at which point libata EH can take over and do EH. 
It's just ugly.  Again, I'm okay with whatever that works, but things 
are much simpler this way.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-12  2:21           ` Tejun Heo
@ 2006-04-12  8:24             ` Luben Tuikov
  2006-04-12 16:18               ` Patrick Mansfield
  0 siblings, 1 reply; 22+ messages in thread
From: Luben Tuikov @ 2006-04-12  8:24 UTC (permalink / raw)
  To: Tejun Heo, Patrick Mansfield
  Cc: Jeff Garzik, hch, James.Bottomley, alan, albertcc, arjan,
	linux-ide, linux-scsi

--- Tejun Heo <htejun@gmail.com> wrote:
> Patrick Mansfield wrote:
> > On Tue, Apr 11, 2006 at 01:41:57PM -0400, Jeff Garzik wrote:
> >> Tejun Heo wrote:
> >>> + * scsi_eh_schedule_cmd - schedule scsi cmd for error handling.
> >>> + * @scmd:	scmd to run eh on.
> >>> + *
> >>> + * Description:
> >>> + *	This function is used by LLDDs which don't use standard SCSI
> >>> + *	EH to schedule scmd for EH.
> >>> + *
> >>> + * Return value:
> >>> + *	0 on failure.
> >>> + **/
> >>> +int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
> [--snip--]
> >>
> >> ACK, though this patch makes me think that there are some non-libata 
> >> pieces of code that could use this function.
> >>
> >> Unfortunately for libata, we are stuck waiting on linux-scsi people to 
> >> return and give their opinion.  If nobody complains, I'll apply it, if 
> >> it doesn't appear in scsi-misc-2.6 sometime vaguely soon.
> > 
> > Per other email it should be called scsi_req_abort_cmd() or such, as that
> > is the only reason to call it, correct?
> 
> Well, it's named this way is to keep it consistent with 
> scsi_eh_schedule_host().  Either name is okay with me but driver which 
> use this function probably have interest in schedule_host() but not in 
> other SCSI EH functions.  So, considering that, I think the current 
> naming is okay.

Tejun, note that scsi_req_abort_cmd() absolves your patch.
I think this is what Pat is trying to say.  I.e. it is much
more general and thus applies to more applications (as in uses).
Plus it is shorter, simpler (3 lines) and more straightforward.

So, scsi_eh_reschedule_host() is equivalent to simply scsi_req_abort_cmd(cmd)
xor scsi_req_dev_reset(dev), the latter in case you want to notify without
piggybacking on a command.

Also, your routine calls more specific eh routines and you should try
to be more general.

> > Any other handling can be completed by calling the ->done function.
> > 
> > Even the abort/cancel could be done in the driver without this, I assume
> > it is avaiable so the driver can use the eh process and existing code
> > paths rather than duplicate similar code.
> 
> Yeap, as I noted earlier, passing scmds to EH is possible without this 
> function but it has to be done in a quite hackish way.  My earlier 

Indeed.

> libata EH implementation did this by completing the scmd with CHECK 
> CONDITION but without sense data which makes SCSI midlayer invoke EH to 
> acquire sense data, at which point libata EH can take over and do EH. 

Ouch!

Good thing you rewrought your solution:
1. you naturally arive at a better solution,
2. you are trying to bring some sense into "libata" and indirectly SCSI.

> It's just ugly.  Again, I'm okay with whatever that works, but things 
> are much simpler this way.

Or maybe even simpler this way:

void scsi_req_abort_cmd(struct scsi_cmnd *cmd)
{
	if (!scsi_delete_timer(cmd))
		return;
	scsi_times_out(cmd);
}
EXPORT_SYMBOL(scsi_req_abort_cmd);
as posted here: http://marc.theaimsgroup.com/?l=linux-scsi&m=113833937421677&w=2

(Had one page on error recovery discussion here, but deleted it
as it wasn't relevant.)

Good luck!
      Luben


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-12  8:24             ` Luben Tuikov
@ 2006-04-12 16:18               ` Patrick Mansfield
  2006-04-13  5:32                 ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick Mansfield @ 2006-04-12 16:18 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Tejun Heo, Jeff Garzik, hch, James.Bottomley, alan, albertcc,
	arjan, linux-ide, linux-scsi

On Wed, Apr 12, 2006 at 01:24:42AM -0700, Luben Tuikov wrote:
> --- Tejun Heo <htejun@gmail.com> wrote:
> > Patrick Mansfield wrote:
> > > On Tue, Apr 11, 2006 at 01:41:57PM -0400, Jeff Garzik wrote:
> > >> Tejun Heo wrote:

> > >>> +int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)

> > > Per other email it should be called scsi_req_abort_cmd() or such, as that
> > > is the only reason to call it, correct?
> > 
> > Well, it's named this way is to keep it consistent with 
> > scsi_eh_schedule_host().  Either name is okay with me but driver which 
> > use this function probably have interest in schedule_host() but not in 
> > other SCSI EH functions.  So, considering that, I think the current 
> > naming is okay.
> 
> Tejun, note that scsi_req_abort_cmd() absolves your patch.
> I think this is what Pat is trying to say.  I.e. it is much
> more general and thus applies to more applications (as in uses).
> Plus it is shorter, simpler (3 lines) and more straightforward.

Both implementations work ... really I'd prefer a scsi_times_out() and
scsi_abort_cmd() that call a __scsi_abort_cmd().

If we implement them right, in the future those simple interfaces can stay
the same even if we no longer have to invoke eh to abort a command.

> So, scsi_eh_reschedule_host() is equivalent to simply scsi_req_abort_cmd(cmd)
> xor scsi_req_dev_reset(dev), the latter in case you want to notify without
> piggybacking on a command.
> 
> Also, your routine calls more specific eh routines and you should try
> to be more general.
> 
> > > Any other handling can be completed by calling the ->done function.
> > > 
> > > Even the abort/cancel could be done in the driver without this, I assume
> > > it is avaiable so the driver can use the eh process and existing code
> > > paths rather than duplicate similar code.
> > 
> > Yeap, as I noted earlier, passing scmds to EH is possible without this 
> > function but it has to be done in a quite hackish way.  My earlier 

I didn't mean pass them to eh, I mean the driver or transport could have
its own work queue (or not ... we don't need a work queue or process
context to send commands, scsi core sends commands without a work queue or
process context), and issue the abort. Then on completion set error to
DID_TIME_OUT (or whatever makes sense) and call the ->done function.

We should move towards driver/transport supplied eh, that is invoked by
the driver/transport specific code, not (possibly) by a single timeout. I
thought this was the direction everyone wanted to go.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-12 16:18               ` Patrick Mansfield
@ 2006-04-13  5:32                 ` Tejun Heo
  2006-04-14  8:49                   ` Luben Tuikov
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-13  5:32 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: Luben Tuikov, Jeff Garzik, hch, James.Bottomley, alan, albertcc,
	arjan, linux-ide, linux-scsi

Hello, Luben, Patrick.

Patrick Mansfield wrote:
> On Wed, Apr 12, 2006 at 01:24:42AM -0700, Luben Tuikov wrote:
>> --- Tejun Heo <htejun@gmail.com> wrote:
>>> Patrick Mansfield wrote:
>>>> On Tue, Apr 11, 2006 at 01:41:57PM -0400, Jeff Garzik wrote:
>>>>> Tejun Heo wrote:
> 
>>>>>> +int scsi_eh_schedule_cmd(struct scsi_cmnd *scmd)
> 
>>>> Per other email it should be called scsi_req_abort_cmd() or such, as that
>>>> is the only reason to call it, correct?
>>> Well, it's named this way is to keep it consistent with 
>>> scsi_eh_schedule_host().  Either name is okay with me but driver which 
>>> use this function probably have interest in schedule_host() but not in 
>>> other SCSI EH functions.  So, considering that, I think the current 
>>> naming is okay.
>> Tejun, note that scsi_req_abort_cmd() absolves your patch.
>> I think this is what Pat is trying to say.  I.e. it is much
>> more general and thus applies to more applications (as in uses).
>> Plus it is shorter, simpler (3 lines) and more straightforward.

Yeap, I kind of misunderstood Patrick's mail.  scsi_req_abort_cmd() is 
good but...

* The event is not really a timeout.  It shouldn't be logged as 
TIMEOUT_ERROR or completed with DID_TIME_OUT when add_scmd fails.

* Similarly, I don't think it's a good idea to call ->eh_timed_out() on 
explicit command abort.  If something common has to be done, it's better 
to make a function for that can calling the function from both 
->eh_timed_out() and after scsi_req_abort_cmd() succeeds.  Otherwise, 
libata has to decide whether ->eh_timed_out() is called from real 
timeout or explicit abortion and skip timeout processing for explicit 
abortions.

* I don't really understand what you mean by saying scsi_req_abort_cmd() 
is more generic.  If it means the WARN_ON(!->eh_strategy_handler) part. 
  That's just a safety net as standard SCSI doesn't know what to do with 
commands directly aborted by LLDD.  If it's something else, please 
elaborate.

> Both implementations work ... really I'd prefer a scsi_times_out() and
> scsi_abort_cmd() that call a __scsi_abort_cmd().

__scsi_abort_cmd() would just contain scmd_add, I think.

> If we implement them right, in the future those simple interfaces can stay
> the same even if we no longer have to invoke eh to abort a command.
> 
>> So, scsi_eh_reschedule_host() is equivalent to simply scsi_req_abort_cmd(cmd)
>> xor scsi_req_dev_reset(dev), the latter in case you want to notify without
>> piggybacking on a command.

Can you please post the patch for scsi_req_dev_reset()?  One thing to 
note is that libata might not have sdev to call that function with when 
it wants to invoke EH for hotplug.

>> Also, your routine calls more specific eh routines and you should try
>> to be more general.

Please, elaborate.

>>>> Any other handling can be completed by calling the ->done function.
>>>>
>>>> Even the abort/cancel could be done in the driver without this, I assume
>>>> it is avaiable so the driver can use the eh process and existing code
>>>> paths rather than duplicate similar code.
>>> Yeap, as I noted earlier, passing scmds to EH is possible without this 
>>> function but it has to be done in a quite hackish way.  My earlier 
> 
> I didn't mean pass them to eh, I mean the driver or transport could have
> its own work queue (or not ... we don't need a work queue or process
> context to send commands, scsi core sends commands without a work queue or
> process context), and issue the abort. Then on completion set error to
> DID_TIME_OUT (or whatever makes sense) and call the ->done function.
> 
> We should move towards driver/transport supplied eh, that is invoked by
> the driver/transport specific code, not (possibly) by a single timeout. I
> thought this was the direction everyone wanted to go.

I think it's good have some infrastructure in SCSI.  e.g. libata can do 
everything itself but it's just nice to have SCSI EH infrastructure to 
build upon (EH thread, scmd draining & plugging...).

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-13  5:32                 ` Tejun Heo
@ 2006-04-14  8:49                   ` Luben Tuikov
  2006-04-14 12:02                     ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Luben Tuikov @ 2006-04-14  8:49 UTC (permalink / raw)
  To: Tejun Heo, Patrick Mansfield
  Cc: Luben Tuikov, Jeff Garzik, hch, James.Bottomley, alan, albertcc,
	arjan, linux-ide, linux-scsi

Hi Tejun,

--- Tejun Heo <htejun@gmail.com> wrote:
> Yeap, I kind of misunderstood Patrick's mail.  scsi_req_abort_cmd() is 
> good but...
> 
> * The event is not really a timeout.

You are absolutely correct.

> It shouldn't be logged as 
> TIMEOUT_ERROR or completed with DID_TIME_OUT when add_scmd fails.
> 
> * Similarly, I don't think it's a good idea to call ->eh_timed_out() on 
> explicit command abort.  If something common has to be done, it's better 
> to make a function for that can calling the function from both 
> ->eh_timed_out() and after scsi_req_abort_cmd() succeeds.  Otherwise, 
> libata has to decide whether ->eh_timed_out() is called from real 
> timeout or explicit abortion and skip timeout processing for explicit 
> abortions.

You've been looking too much into libata code.
You are absolutely correct here as well.
"Special case begat special case."

Now, if you are absolutely correct, and we (I at least) recognize that,
then why are we (I at least) contending that you should use scsi_req_abort_cmd()?

> * I don't really understand what you mean by saying scsi_req_abort_cmd() 
> is more generic.  If it means the WARN_ON(!->eh_strategy_handler) part. 

No, not that part.

>   That's just a safety net as standard SCSI doesn't know what to do with 
> commands directly aborted by LLDD.

It does know.  Look at the 3 line implementation of scsi_req_abort_cmd().
Notice that "scsi_times_out()" is the beauty of generality: follow the same path.
That is, the call to the general "scsi_times_out()" gives the driver/layer another chance,
or notification thereof, that this command is going to go to the eh queue.

Now thereon, the eh_strategy is called with the list of commands to be recovered.
Where the eh_strategy points to if anywhere is another completely different story.

> __scsi_abort_cmd() would just contain scmd_add, I think.

Yeah, you see, try to stay away of those special cases with abcd_fn() and
__abcd_fn().  (Too much libata code?)

> Can you please post the patch for scsi_req_dev_reset()?  One thing to

First, that patch isn't ready for lsml posting.  Also, take a look at the
elaborate threads about literally a three (3) line function "scsi_req_abort_cmd()".
Imagine the book that would result from posting "scsi_req_dev_reset()".

> note is that libata might not have sdev to call that function with when 
> it wants to invoke EH for hotplug.

Let's separate the domains.  You are doing a good thing in separating
your SATA code into a "layer", and then you have LLDD which actually drive
the HW by which you access the interconnect.  (Sounds familiar? ;-) )

Now enter SCSI (as in SAM).  How can you tell SCSI "do eh for me, but
neither a device nor command has failed and I cannot give you either one of them"
as you're saying you'd like to do above?  See?  It is a protocol thing!  That is,
you want to handle such things in your layer.

But since the device abstraction and the command abstraction is _shared_ with
SCSI Core, you have to call "scsi_req_abort_cmd()" and "scsi_req_dev_reset()"
in order to request SCSI Core to call you back with that type of request when
it feels that is is comfortable in calling you to abort the task or
reset the device.

> >> Also, your routine calls more specific eh routines and you should try
> >> to be more general.
> 
> Please, elaborate.

"scsi_times_out()"

> I think it's good have some infrastructure in SCSI.  e.g. libata can do 
> everything itself but it's just nice to have SCSI EH infrastructure to 
> build upon (EH thread, scmd draining & plugging...).

You have to admit, SCSI is a lot more than SATA.  For this reason,
deriving an abstraction from your SATA code that would work for SCSI
isn't an easy feat.

For example, why do you absolutely have to do anything in your eh_timed_out()
callback?  Just atomicly mark your task abstraction as "aborting/aborted" and
return EH_NOT_HANDLED so that you can get called back in your eh_strategy with
a list of commands that need error recovery (ER, from now on).  This is _all_ that
you're going to do in your eh_timed_out() callback.

By also having everything go through eh_timed_out() you can inspect at that instant
if the command has completed and if not, mark it as aborted/aborting, else it has
completed, give it to SCSI Core to complete it for you.

When your ER strategy gets called with a list of commands to be recovered,
it is not necessarily the case that they ended up there because all of them timed
out.  But one thing is for sure, they are all marked aborted/aborting and they
all went through eh_timed_out() and were not done at that time.

Maybe some of them completed ok, and you'd want to "return" them, but cannot since
they were marked "aborted/aborting"... it is this dis-syncrhonization or late-completion,
which you can achieve.

Also consider that the "device failed" you can get from any of the commands on the
er list when your er strategy gets called.  Pick the first command, take a look at the
device, device dead, search the rest of the list for any commands also going to that
device and "recover" them and the device, then go to the next command.

Consider, the SATA layer's task/device abstraction is shared with the LLDD and this
is why you want to use things like eh_timed_out().  For commands and devices it is
most likely the LLDD which will call them and you would want to get notified
somehow of this (via the eh_timed_out()).

Also you want ER to always flow in the same direction from the same starting point
going to the same ending point.

This is the reason to have scsi_req_abort_cmd() and scsi_req_device_reset(), callable
from anywhere by anyone.

Good luck!
    Luben

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-14  8:49                   ` Luben Tuikov
@ 2006-04-14 12:02                     ` Tejun Heo
  2006-04-19 18:49                       ` Luben Tuikov
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2006-04-14 12:02 UTC (permalink / raw)
  To: ltuikov
  Cc: Patrick Mansfield, Jeff Garzik, hch, James.Bottomley, alan,
	albertcc, arjan, linux-ide, linux-scsi

Hello, Luben.

Luben Tuikov wrote:
[--snip--]
>> note is that libata might not have sdev to call that function with when 
>> it wants to invoke EH for hotplug.
> 
> Let's separate the domains.  You are doing a good thing in separating
> your SATA code into a "layer", and then you have LLDD which actually drive
> the HW by which you access the interconnect.  (Sounds familiar? ;-) )
> 
> Now enter SCSI (as in SAM).  How can you tell SCSI "do eh for me, but
> neither a device nor command has failed and I cannot give you either one of them"
> as you're saying you'd like to do above?  See?  It is a protocol thing!  That is,
> you want to handle such things in your layer.
> 
> But since the device abstraction and the command abstraction is _shared_ with
> SCSI Core, you have to call "scsi_req_abort_cmd()" and "scsi_req_dev_reset()"
> in order to request SCSI Core to call you back with that type of request when
> it feels that is is comfortable in calling you to abort the task or
> reset the device.

So, what's your suggestion here?  Do you think libata should do such 
things with its own mechanism?

>>>> Also, your routine calls more specific eh routines and you should try
>>>> to be more general.
>> Please, elaborate.
> 
> "scsi_times_out()"
> 
>> I think it's good have some infrastructure in SCSI.  e.g. libata can do 
>> everything itself but it's just nice to have SCSI EH infrastructure to 
>> build upon (EH thread, scmd draining & plugging...).
> 
> You have to admit, SCSI is a lot more than SATA.  For this reason,
> deriving an abstraction from your SATA code that would work for SCSI
> isn't an easy feat.
> 
> For example, why do you absolutely have to do anything in your eh_timed_out()
> callback?  Just atomicly mark your task abstraction as "aborting/aborted" and
> return EH_NOT_HANDLED so that you can get called back in your eh_strategy with
> a list of commands that need error recovery (ER, from now on).  This is _all_ that
> you're going to do in your eh_timed_out() callback.
> 
> By also having everything go through eh_timed_out() you can inspect at that instant
> if the command has completed and if not, mark it as aborted/aborting, else it has
> completed, give it to SCSI Core to complete it for you.
> 
> When your ER strategy gets called with a list of commands to be recovered,
> it is not necessarily the case that they ended up there because all of them timed
> out.  But one thing is for sure, they are all marked aborted/aborting and they
> all went through eh_timed_out() and were not done at that time.
> 
> Maybe some of them completed ok, and you'd want to "return" them, but cannot since
> they were marked "aborted/aborting"... it is this dis-syncrhonization or late-completion,
> which you can achieve.
> 
> Also consider that the "device failed" you can get from any of the commands on the
> er list when your er strategy gets called.  Pick the first command, take a look at the
> device, device dead, search the rest of the list for any commands also going to that
> device and "recover" them and the device, then go to the next command.
> 
> Consider, the SATA layer's task/device abstraction is shared with the LLDD and this
> is why you want to use things like eh_timed_out().  For commands and devices it is
> most likely the LLDD which will call them and you would want to get notified
> somehow of this (via the eh_timed_out()).
> 
> Also you want ER to always flow in the same direction from the same starting point
> going to the same ending point.
> 
> This is the reason to have scsi_req_abort_cmd() and scsi_req_device_reset(), callable
> from anywhere by anyone.

Point taken about scsi_req_abort_cmd().  scsi_req_abort_cmd() it is, 
then.  To proceed from here....

* sort out things about scsi_eh_schedule_port()/scsi_req_dev_reset()

* re-post patch for scsi_req_abort_cmd() and push it through either 
scsi-misc or libata-dev.  Luben, can you please re-post the patch?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-14 12:02                     ` Tejun Heo
@ 2006-04-19 18:49                       ` Luben Tuikov
  2006-04-20  2:07                         ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Luben Tuikov @ 2006-04-19 18:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Patrick Mansfield, Jeff Garzik, hch, James.Bottomley, alan,
	albertcc, arjan, linux-ide, linux-scsi

Hi Tejun,

--- Tejun Heo <htejun@gmail.com> wrote:
> So, what's your suggestion here?  Do you think libata should do such 
> things with its own mechanism?

Your instinct is correct.  Anything between I and T, as in I_T, is SDS
domain.  That is, SAM doesn't have a _context_ for anything between the I and T.
(In SATA's case I_T_L, of course.)

So this is why you shouldn't call SCSI's ER without context, as in "Hey,
neither command, nor device is broken, but do some ER for me."  Such work belongs
in your SATA Layer, unless either a device or command is involved, as we discussed.

Handle protocol ER in your SATA layer.  Note however that most of your ER could
be done on behalf of a command or device, since of course, a device or command(s)
end up being affected.  Basically "do as little as possible but no less" technique.

Good luck!
     Luben

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-19 18:49                       ` Luben Tuikov
@ 2006-04-20  2:07                         ` Tejun Heo
  2006-04-20 13:01                           ` Christoph Hellwig
  2006-04-20 19:23                           ` Luben Tuikov
  0 siblings, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2006-04-20  2:07 UTC (permalink / raw)
  To: ltuikov
  Cc: Patrick Mansfield, Jeff Garzik, hch, James.Bottomley, alan,
	albertcc, arjan, linux-ide, linux-scsi

Luben Tuikov wrote:
> Hi Tejun,
> 
> --- Tejun Heo <htejun@gmail.com> wrote:
>> So, what's your suggestion here?  Do you think libata should do such 
>> things with its own mechanism?
> 
> Your instinct is correct.  Anything between I and T, as in I_T, is SDS
> domain.  That is, SAM doesn't have a _context_ for anything between the I and T.
> (In SATA's case I_T_L, of course.)
> 
> So this is why you shouldn't call SCSI's ER without context, as in "Hey,
> neither command, nor device is broken, but do some ER for me."  Such work belongs
> in your SATA Layer, unless either a device or command is involved, as we discussed.
> 
> Handle protocol ER in your SATA layer.  Note however that most of your ER could
> be done on behalf of a command or device, since of course, a device or command(s)
> end up being affected.  Basically "do as little as possible but no less" technique.
> 

Hello, Luben.

I see your point regarding SAM.  If this is the consensus among SCSI 
developers, I guess eh_schedule_host() shouldn't be added.

Hmmm.. libata isn't SCSI.  It's just hitch-hiking on the back of SCSI at 
the moment.  And, being able to invoke EH on the host directly makes 
things a lot simpler for libata.  So, what do you think about just 
adding shost->host_eh_scheduled, adding a condition to 
scsi_error_handler() and export scsi_eh_wakeup()?  The changes will be 
like three lines.  And, also, ->host_eh_scheduled can be marked as hack 
for libata.  libata can handle the rest.

This may not be the most graceful thing to do but...

1. SCSI change is minimal

2. it's a temporary measure (moving libata out of SCSI is one of the 
items on my agenda and if no one else does it I'm gonna give a shot at 
it in not-so-distant future.  Once libata is gone of SCSI, the three 
lines can be removed.)

3. It will take a lot more work to achieve the same thing in libata 
proper without the three line change.

Luben, what do you think?

James and Jeff, any opinion on this matter?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-20  2:07                         ` Tejun Heo
@ 2006-04-20 13:01                           ` Christoph Hellwig
  2006-04-21  2:22                             ` Tejun Heo
  2006-04-20 19:23                           ` Luben Tuikov
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2006-04-20 13:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ltuikov, Patrick Mansfield, Jeff Garzik, hch, James.Bottomley,
	alan, albertcc, arjan, linux-ide, linux-scsi

On Thu, Apr 20, 2006 at 11:07:11AM +0900, Tejun Heo wrote:
> James and Jeff, any opinion on this matter?

I think you hack looks okayish.  But I want to see the whole libata
->eh_strategy_handler first.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-20  2:07                         ` Tejun Heo
  2006-04-20 13:01                           ` Christoph Hellwig
@ 2006-04-20 19:23                           ` Luben Tuikov
  2006-04-21  2:39                             ` Tejun Heo
  1 sibling, 1 reply; 22+ messages in thread
From: Luben Tuikov @ 2006-04-20 19:23 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Patrick Mansfield, Jeff Garzik, hch, James.Bottomley, alan,
	albertcc, arjan, linux-ide, linux-scsi

--- Tejun Heo <htejun@gmail.com> wrote:
> Hmmm.. libata isn't SCSI.  It's just hitch-hiking on the back of SCSI at 
> the moment.  And, being able to invoke EH on the host directly makes 
> things a lot simpler for libata.  So, what do you think about just 
> adding shost->host_eh_scheduled, adding a condition to 
> scsi_error_handler() and export scsi_eh_wakeup()?  The changes will be 
> like three lines.  And, also, ->host_eh_scheduled can be marked as hack 
> for libata.  libata can handle the rest.

Hi Tejun,

I don't mind such a patch.  It is well defined -- all you want is
process (ER) context to do some hw/proto ER work.  Clearly the architecture
of libata doesn't allow for protocol/hw work in a process as we'd discussed,
and so I think such a small patch would be completely ok to make the rest
of your work go through.

> This may not be the most graceful thing to do but...
> 
> 1. SCSI change is minimal

I think this is a good thing, and for clearly understood and well defined
solutions we want patches to be small.  Your architecural SATA layer work
can be as big as it needs to be and I think you're doing an excellent
job -- FWIW, if it hadn't been for your patches, a lot of things would
be pretty stagnant in libata, SCSI and the block layer.

> 2. it's a temporary measure (moving libata out of SCSI is one of the 
> items on my agenda and if no one else does it I'm gonna give a shot at 
> it in not-so-distant future.  Once libata is gone of SCSI, the three 
> lines can be removed.)

You are a very, very smart person to want to move libata out of SCSI.
It is very unfortunate, but I clearly understand why you'd want to do
it.

> 3. It will take a lot more work to achieve the same thing in libata 
> proper without the three line change.
> 
> Luben, what do you think?

I think it sounds good.

Good luck!
       Luben

P.S.  Remember what we'd discussed earlier, if you can offload something
to command ER or device ER, do it!  It will save you a lot of special
cases, IRQ/process context jungle, and differing paths and races.

Don't you mind that if a command fails, you know that all of them
queued commands would have to get zapped and that the device would probably
have to be reset.  Just call scsi_req_abort_cmd() and do that abstraction
later in process context when it calls your ER routine.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-20 13:01                           ` Christoph Hellwig
@ 2006-04-21  2:22                             ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2006-04-21  2:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: ltuikov, Patrick Mansfield, Jeff Garzik, hch, James.Bottomley,
	alan, albertcc, arjan, linux-ide, linux-scsi

On Thu, Apr 20, 2006 at 02:01:13PM +0100, Christoph Hellwig wrote:
> On Thu, Apr 20, 2006 at 11:07:11AM +0900, Tejun Heo wrote:
> > James and Jeff, any opinion on this matter?
> 
> I think you hack looks okayish.  But I want to see the whole libata
> ->eh_strategy_handler first.

Thanks, Christoph.

I'll CC you on the next post of libata EH patches.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd()
  2006-04-20 19:23                           ` Luben Tuikov
@ 2006-04-21  2:39                             ` Tejun Heo
  0 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2006-04-21  2:39 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Patrick Mansfield, Jeff Garzik, hch, James.Bottomley, alan,
	albertcc, arjan, linux-ide, linux-scsi

Hello, Luben.

On Thu, Apr 20, 2006 at 12:23:57PM -0700, Luben Tuikov wrote:
> --- Tejun Heo <htejun@gmail.com> wrote:
> > Hmmm.. libata isn't SCSI.  It's just hitch-hiking on the back of SCSI at 
> > the moment.  And, being able to invoke EH on the host directly makes 
> > things a lot simpler for libata.  So, what do you think about just 
> > adding shost->host_eh_scheduled, adding a condition to 
> > scsi_error_handler() and export scsi_eh_wakeup()?  The changes will be 
> > like three lines.  And, also, ->host_eh_scheduled can be marked as hack 
> > for libata.  libata can handle the rest.
> 
> Hi Tejun,
> 
> I don't mind such a patch.  It is well defined -- all you want is
> process (ER) context to do some hw/proto ER work.  Clearly the architecture
> of libata doesn't allow for protocol/hw work in a process as we'd discussed,
> and so I think such a small patch would be completely ok to make the rest
> of your work go through.

Good.

> > This may not be the most graceful thing to do but...
> > 
> > 1. SCSI change is minimal
> 
> I think this is a good thing, and for clearly understood and well defined
> solutions we want patches to be small.  Your architecural SATA layer work
> can be as big as it needs to be and I think you're doing an excellent
> job -- FWIW, if it hadn't been for your patches, a lot of things would
> be pretty stagnant in libata, SCSI and the block layer.
> 
> > 2. it's a temporary measure (moving libata out of SCSI is one of the 
> > items on my agenda and if no one else does it I'm gonna give a shot at 
> > it in not-so-distant future.  Once libata is gone of SCSI, the three 
> > lines can be removed.)
> 
> You are a very, very smart person to want to move libata out of SCSI.
> It is very unfortunate, but I clearly understand why you'd want to do
> it.

libata piggy-backed on SCSI mainly for convenience reasons.  It just
basically wanted to use the SCSI high level drivers, command queue
handling, EH and etc.  libata is scheduled to be detached from SCSI
almost from the beginning, IIRC.  Well, ATA pass-through and other
changes somewhat decreased the need, but still...

> > 3. It will take a lot more work to achieve the same thing in libata 
> > proper without the three line change.
> > 
> > Luben, what do you think?
> 
> I think it sounds good.
> 
> Good luck!
>        Luben
> 
> P.S.  Remember what we'd discussed earlier, if you can offload something
> to command ER or device ER, do it!  It will save you a lot of special
> cases, IRQ/process context jungle, and differing paths and races.
> 
> Don't you mind that if a command fails, you know that all of them
> queued commands would have to get zapped and that the device would probably
> have to be reset.  Just call scsi_req_abort_cmd() and do that abstraction
> later in process context when it calls your ER routine.

Thank you for the advice.  I'll think about it.

So, if no one else objects, let's merge stuff.  If there is no
immediate need in SCSI, I think both Luben's scsi_req_abort_cmd() and
my hack are better off being merged via libata, where the current need
is.  Any opinions?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-04-21  2:39 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-01 10:38 [PATCH] SCSI: implement scsi_eh_schedule() Tejun Heo
2006-04-01 20:14 ` Jeff Garzik
2006-04-02  1:15   ` Tejun Heo
2006-04-02 16:04     ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Tejun Heo
2006-04-02 16:06       ` [PATCH 2/2] SCSI: implement scsi_eh_schedule_host() Tejun Heo
2006-04-11 17:43         ` Jeff Garzik
2006-04-02 23:49       ` [PATCH 1/2] SCSI: implement scsi_eh_schedule_cmd() Luben Tuikov
2006-04-03  1:24         ` Tejun Heo
2006-04-11 17:41       ` Jeff Garzik
2006-04-11 21:28         ` Patrick Mansfield
2006-04-12  2:21           ` Tejun Heo
2006-04-12  8:24             ` Luben Tuikov
2006-04-12 16:18               ` Patrick Mansfield
2006-04-13  5:32                 ` Tejun Heo
2006-04-14  8:49                   ` Luben Tuikov
2006-04-14 12:02                     ` Tejun Heo
2006-04-19 18:49                       ` Luben Tuikov
2006-04-20  2:07                         ` Tejun Heo
2006-04-20 13:01                           ` Christoph Hellwig
2006-04-21  2:22                             ` Tejun Heo
2006-04-20 19:23                           ` Luben Tuikov
2006-04-21  2:39                             ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).