public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* RE: [PATCH]: Flexible timeout infrastructure
@ 2004-06-16 17:33 Smart, James
  2004-06-16 17:38 ` James Bottomley
  0 siblings, 1 reply; 38+ messages in thread
From: Smart, James @ 2004-06-16 17:33 UTC (permalink / raw)
  To: 'James Bottomley'; +Cc: Luben Tuikov, Mike Anderson, SCSI Mailing List

Sorry, guess I wasn't clear.

reset - meant timer reset. And I completely follow that the command is not
being retried, etc, just the timeout restarted.

I'll try to restate. The patch proposed to allow only 1 restart of the timer
for a command. My comment was - why not 2, or 3?  I would think - the number
of restarts needed is a function of how long the timeout value is vs when
the LLDD event occurs and what the duration of the LLDD event is. I can see
capping the number of timer restarts, but am not sure 1 is the best choice. 

-- james

^ permalink raw reply	[flat|nested] 38+ messages in thread
* RE: [PATCH]: Flexible timeout infrastructure
@ 2004-06-16 18:05 Smart, James
  0 siblings, 0 replies; 38+ messages in thread
From: Smart, James @ 2004-06-16 18:05 UTC (permalink / raw)
  To: 'James Bottomley'; +Cc: Luben Tuikov, Mike Anderson, SCSI Mailing List

So the jist isn't to get the command to survive the LLDD event, even if it
could. Rather, the user has set aside X amount of time (where X = #retries *
timeout length), and you want to return the command to the user as close to
X as possible. And there are 2 further expectations: you are willing to
suffer the failure of non-retryable commands that may be forced to fail; and
you expect the LLDDs to call the midlayer to block further commands if the
LLDD event persists. (Note: this is where I think more granularity could be
used - instead of host-global events, there may be target-specific events as
well).

Ok.  Just trying to get on the same page...

-- james

> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Wednesday, June 16, 2004 1:38 PM
> To: Smart, James
> Cc: Luben Tuikov; Mike Anderson; SCSI Mailing List
> Subject: RE: [PATCH]: Flexible timeout infrastructure
> 
> 
> On Wed, 2004-06-16 at 12:33, Smart, James wrote:
> > I'll try to restate. The patch proposed to allow only 1 
> restart of the timer
> > for a command. My comment was - why not 2, or 3?  I would 
> think - the number
> > of restarts needed is a function of how long the timeout 
> value is vs when
> > the LLDD event occurs and what the duration of the LLDD 
> event is. I can see
> > capping the number of timer restarts, but am not sure 1 is 
> the best choice. 
> 
> It does.  But the number of allowed restarts is linked to the command
> retries (actually allowed retries + 1).
> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread
* RE: [PATCH]: Flexible timeout infrastructure
@ 2004-06-16 17:10 Smart, James
  2004-06-16 17:21 ` James Bottomley
  0 siblings, 1 reply; 38+ messages in thread
From: Smart, James @ 2004-06-16 17:10 UTC (permalink / raw)
  To: 'James Bottomley'; +Cc: Luben Tuikov, Mike Anderson, SCSI Mailing List

I can appreciate the protectionist point of view. My only contention is -
how do you really derive that 1 reset is any less valid than 2 ? it totally
depends on what the original timeout was relative to whatever the duration
is for the LLDD event that is requesting the reset. 

-- James

> -----Original Message-----
> From: James Bottomley [mailto:James.Bottomley@SteelEye.com]
> Sent: Wednesday, June 16, 2004 1:04 PM
> To: Smart, James
> Cc: Luben Tuikov; Mike Anderson; SCSI Mailing List
> Subject: RE: [PATCH]: Flexible timeout infrastructure
> 
> 
> On Wed, 2004-06-16 at 11:58, Smart, James wrote:
> > So why increment the retries counter at all ? 
> 
> So drivers can't return EH_RESET_TIMER forever. This really ties the
> handling into the behaviour the user has requested.  The only wrinkle
> was the no retry streaming commands.  The assumption is that a retry
> doesn't actually happen under eh_timed_out, but only an attempt to
> collect an existing command which may be delayed because of 
> transport or
> other problems the host knows about.
> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread
* RE: [PATCH]: Flexible timeout infrastructure
@ 2004-06-16 16:58 Smart, James
  2004-06-16 17:04 ` James Bottomley
  0 siblings, 1 reply; 38+ messages in thread
From: Smart, James @ 2004-06-16 16:58 UTC (permalink / raw)
  To: 'James Bottomley', Luben Tuikov; +Cc: Mike Anderson, SCSI Mailing List

> +		case EH_RESET_TIMER:
> +			/* This allows a single retry even of a command
> +			 * with allowed == 0 */
> +			if (scmd->retries++ > scmd->allowed)
> +				break;
> +			scsi_add_timer(scmd, scmd->timeout_per_command,
> +				       scsi_times_out);
> +			return;

So why increment the retries counter at all ? 

^ permalink raw reply	[flat|nested] 38+ messages in thread
* [PATCH]: Flexible timeout infrastructure
@ 2004-06-15 15:02 Luben Tuikov
  2004-06-15 15:27 ` Arjan van de Ven
  2004-06-15 15:31 ` James Bottomley
  0 siblings, 2 replies; 38+ messages in thread
From: Luben Tuikov @ 2004-06-15 15:02 UTC (permalink / raw)
  To: SCSI Mailing List

Hello,

This patch introduces a flexible command timeout infrastructure
accomodating completely current behaviour SCSI Core command timeout,
but also offering the ability to hand command timeout handling to
a LLDD, _yet_ still have it all go through *SCSI Core*.

This is somewhat similar to Christoph's proposal patch, but I wasn't
aware of his patch when I thought of this. Interestingly enough it can be
viewed as an extension to his patch.

The patch is very short (minimal chages indeed), but the text a bit
lenghty since it outlines the many different a usage of such an
infrastructure.

New method: 
  struct scsi_host_template :: int (*eh_cmd_timed_out)(struct scsi_cmnd *);
This method is marked as OPTIONAL status.

If the driver does not define eh_cmd_timed_out() method in its
host template, then current behaviour is assumed.

If the driver does define it, then SCSI Core will give the command
to LLDD without a timer running.  Then when the LLDD notifies the
transport and before it sends it to the interconnect it start
the timer *by calling SCSI Core*, scsi_add_timer(<cmd>,<timeout>,scsi_times_out).

Timeline:
---+-------------------------+-------------> t
   t0                        t1           
SCSI Core::queuecommand(), LLDD::scsi_add_timer(,,scsi_times_out)

NB: |t1-t0| is *always* bounded! (hint)

SUCCESS: If the command completes, LLDD calls
scsi_delete_timer(<cmd>), and then scsi_done().  This completes
the SCSI command transaction.

TO: If the command times out, then *SCSI Core* is called,
scsi_times_out(), which calls the defined, eh_cmd_timed_out()
method.

RETURNS: eh_cmd_timed_out() returns:
  EH_HANDLED:
          - LLDD has called scsi_done() having set the status/response
            codes appropriately,
       XOR
          - resent the command back to the LU.
    Either way, there's nothing to do at this point and the timer
    routine completes.

  EH_NONE:
          - SCSI Core should do error handling as usual.

Note that after eh_cmd_timed_out() returns, we are in _consistent_
state:
- either the command is back out to the LU: timer
has been rerun by the LLDD, the command belongs to LLDD,
SCSI Core should do nothing -- just the same as if it never fired,
XOR
- scsi_done() is called and SCSI Core will
be processing the command shortly, no timer is running
since it fired (since we're executing in this method).
Consistent as if the command completed.
Or we returned EH_NONE, and SCSI Core adds it to the
eh list for recovery.

Let me know your comments, USB, FC, SAS, iSCSI device driver
writers.  The main point of this patch is complicated
_recovery_ of protocols for which SCSI Core cannot have
internal knowlege, but will get notified *at every step*.
I.e. SAM like.

Note that the driver may also decide to do some processing
in the eh_cmd_timed_out() method (interrupt context) and still
return EH_NONE, to get back at that command in process context
from the eh tread (i.e. if TMF ABORT TASK takes some time to return).

Against scsi-misc-2.6:

===== drivers/scsi/scsi.c 1.143 vs edited =====
--- 1.143/drivers/scsi/scsi.c	2004-04-28 12:32:09 -04:00
+++ edited/drivers/scsi/scsi.c	2004-06-14 15:13:12 -04:00
@@ -549,7 +549,12 @@
 		host->resetting = 0;
 	}
 
-	scsi_add_timer(cmd, cmd->timeout_per_command, scsi_times_out);
+	/* If the LLDD claims to be able to handle its own command timeout,
+	 * don't start the timer.  It will start it before sending the command
+	 * to the LU.
+	 */
+	if (!host->hostt->eh_cmd_timed_out)
+		scsi_add_timer(cmd, cmd->timeout_per_command, scsi_times_out);
 
 	scsi_log_send(cmd);
 
@@ -699,8 +704,9 @@
 	 * that function could really be.  It might be on another processor,
 	 * etc, etc.
 	 */
-	if (!scsi_delete_timer(cmd))
-		return;
+	if (!cmd->device->host->hostt->eh_cmd_timed_out)
+		if (!scsi_delete_timer(cmd))
+			return;
 
 	/*
 	 * Set the serial numbers back to zero
===== drivers/scsi/scsi_error.c 1.76 vs edited =====
--- 1.76/drivers/scsi/scsi_error.c	2004-06-04 12:54:06 -04:00
+++ edited/drivers/scsi/scsi_error.c	2004-06-14 15:13:53 -04:00
@@ -155,8 +155,8 @@
 }
 
 /**
- * scsi_times_out - Timeout function for normal scsi commands.
- * @scmd:	Cmd that is timing out.
+ * scsi_times_out - Timeout function for normal SCSI commands.
+ * @scmd:	Command that is timing out.
  *
  * Notes:
  *     We do not need to lock this.  There is the potential for a race
@@ -166,7 +166,16 @@
  **/
 void scsi_times_out(struct scsi_cmnd *scmd)
 {
+	int ret = EH_NONE;
+	
 	scsi_log_completion(scmd, TIMEOUT_ERROR);
+
+	if (scmd->device->host->hostt->eh_cmd_timed_out)
+		ret = scmd->device->host->hostt->eh_cmd_timed_out(scmd);
+
+	if (ret == EH_HANDLED)
+		return;
+	
 	if (unlikely(!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
 		panic("Error handler thread not present at %p %p %s %d",
 		      scmd, scmd->device->host, __FILE__, __LINE__);
===== drivers/scsi/scsi_syms.c 1.47 vs edited =====
--- 1.47/drivers/scsi/scsi_syms.c	2004-05-19 13:46:14 -04:00
+++ edited/drivers/scsi/scsi_syms.c	2004-06-14 11:09:29 -04:00
@@ -107,3 +107,4 @@
  */
 EXPORT_SYMBOL(scsi_add_timer);
 EXPORT_SYMBOL(scsi_delete_timer);
+EXPORT_SYMBOL(scsi_times_out);
===== include/scsi/scsi.h 1.21 vs edited =====
--- 1.21/include/scsi/scsi.h	2004-04-21 12:54:43 -04:00
+++ edited/include/scsi/scsi.h	2004-06-14 14:33:15 -04:00
@@ -327,6 +327,12 @@
 #define SCSI_MLQUEUE_EH_RETRY    0x1057
 
 /*
+ * LLDD timeout handler return values
+ */
+#define EH_NONE        (0)
+#define EH_HANDLED     (1)
+
+/*
  *  Use these to separate status msg and our bytes
  *
  *  These are set by:
===== include/scsi/scsi_eh.h 1.2 vs edited =====
--- 1.2/include/scsi/scsi_eh.h	2004-06-04 07:45:01 -04:00
+++ edited/include/scsi/scsi_eh.h	2004-06-14 11:12:13 -04:00
@@ -7,10 +7,11 @@
 
 extern void scsi_add_timer(struct scsi_cmnd *, int,
 			   void (*)(struct scsi_cmnd *));
-extern int scsi_delete_timer(struct scsi_cmnd *);
+extern int  scsi_delete_timer(struct scsi_cmnd *);
+extern void scsi_times_out(struct scsi_cmnd *);
 extern void scsi_report_bus_reset(struct Scsi_Host *, int);
 extern void scsi_report_device_reset(struct Scsi_Host *, int, int);
-extern int scsi_block_when_processing_errors(struct scsi_device *);
+extern int  scsi_block_when_processing_errors(struct scsi_device *);
 extern void scsi_sleep(int);
 
 /*
===== include/scsi/scsi_host.h 1.17 vs edited =====
--- 1.17/include/scsi/scsi_host.h	2004-06-04 12:51:31 -04:00
+++ edited/include/scsi/scsi_host.h	2004-06-14 15:12:23 -04:00
@@ -125,6 +125,40 @@
 	int (* eh_bus_reset_handler)(struct scsi_cmnd *);
 	int (* eh_host_reset_handler)(struct scsi_cmnd *);
 
+
+	/*
+	 * If defined, SCSI Core will leave _all_ command timeout
+	 * handling to the LLDD.  The LLDD should call
+	 * scsi_add_timer(<cmd>,<timeout>,scsi_times_out) when
+	 * appropriate, possibly before sending the command to the LU,
+	 * and scsi_delete_timer(<cmd>) before returning it to SCSI
+	 * Core. That is, a SCSI command is passed to the LLDD (via
+	 * queuecommand()) _without_ a timer running and is expected
+	 * to be returned to SCSI Core (via scsi_done()) just the
+	 * same, without one running.
+	 *
+	 * This method is called whenever a SCSI command times out, if
+	 * the driver had called
+	 * scsi_add_timer(<cmd>,<timeout>,scsi_times_out) of course.
+	 *
+	 * Returns: EH_HANDLED or EH_NONE.
+	 * EH_HANDLED -- means that the LLDD has taken all steps
+	 * to ensure proper command recovery handling and had
+	 * A) either called scsi_done() and set the response/result
+	 * properly in the SCSI command structure,
+	 * XOR
+	 * B) resent the command to the LU.
+	 * EH_NONE -- SCSI Core should try to recover the command as
+	 * usual (eh recovery thread, etc).
+	 *
+	 * If this method is not defined, current/old behaviour is assumed.
+	 *
+	 * NOTE: This runs in interrupt context, so it cannot sleep.
+	 * 
+	 * STATUS: OPTIONAL
+	 */
+	int (* eh_cmd_timed_out)(struct scsi_cmnd *);
+
 	/*
 	 * Old EH handlers, no longer used. Make them warn the user of old
 	 * drivers by using a wrong type


-- 
Luben



^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2004-06-16 19:17 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-06-16 17:33 [PATCH]: Flexible timeout infrastructure Smart, James
2004-06-16 17:38 ` James Bottomley
  -- strict thread matches above, loose matches on Subject: below --
2004-06-16 18:05 Smart, James
2004-06-16 17:10 Smart, James
2004-06-16 17:21 ` James Bottomley
2004-06-16 16:58 Smart, James
2004-06-16 17:04 ` James Bottomley
2004-06-16 18:58   ` Luben Tuikov
2004-06-16 19:17     ` James Bottomley
2004-06-15 15:02 Luben Tuikov
2004-06-15 15:27 ` Arjan van de Ven
2004-06-15 15:40   ` Luben Tuikov
2004-06-15 15:42     ` Christoph Hellwig
2004-06-15 15:46       ` Luben Tuikov
2004-06-15 15:49         ` Christoph Hellwig
2004-06-15 15:43     ` Arjan van de Ven
2004-06-15 15:48       ` Luben Tuikov
2004-06-15 15:57         ` Christoph Hellwig
2004-06-15 16:07           ` Arjan van de Ven
2004-06-15 16:24           ` Doug Ledford
2004-06-15 16:27           ` Luben Tuikov
2004-06-15 16:33             ` Arjan van de Ven
2004-06-15 18:07               ` Luben Tuikov
2004-06-15 15:31 ` James Bottomley
2004-06-15 18:15   ` Mike Anderson
2004-06-15 18:37     ` Luben Tuikov
2004-06-15 19:20       ` Mike Anderson
2004-06-15 19:52         ` Luben Tuikov
2004-06-15 20:57           ` Mike Anderson
2004-06-15 22:00             ` Luben Tuikov
2004-06-15 22:31               ` Luben Tuikov
2004-06-15 22:13             ` Doug Ledford
2004-06-15 19:12   ` Luben Tuikov
2004-06-15 19:54     ` James Bottomley
2004-06-16 15:27       ` Mike Anderson
2004-06-16 15:37         ` James Bottomley
2004-06-16 15:48           ` Luben Tuikov
2004-06-16 15:58             ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox