[PATCH] allow drivers to hook into watchdog timeout

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] allow drivers to hook into watchdog timeout
@ 2004-01-20 13:20 Christoph Hellwig
  2004-01-20 15:53 ` Mike Anderson
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Christoph Hellwig @ 2004-01-20 13:20 UTC (permalink / raw)
  To: James.Bottomley, gibbs; +Cc: linux-scsi

We all know talk is cheap, so here's a first draft patch to allow LLDDs
to get control first after a command timeout.  Justin, does this look
okay for you?  BTW, your drivers are the last ones using scsi_add_timer
from outside the midlayer, if we could get rid of that we'd be able to
keep the interface private.


===== drivers/scsi/scsi.c 1.134 vs edited =====
--- 1.134/drivers/scsi/scsi.c	Sat Jan 10 21:37:37 2004
+++ edited/drivers/scsi/scsi.c	Tue Jan 20 14:18:51 2004
@@ -484,6 +484,28 @@
 }
 #endif
 
+/**
+ * scsi_times_out - Timeout handler for normal scsi commands.
+ * @scmd:	Cmd that is timing out.
+ *
+ * Notes:
+ *     We do not need to lock this.  There is the potential for a race
+ *     only in that the normal completion handling might run, but if the
+ *     normal completion function determines that the timer has already
+ *     fired, then it mustn't do anything.
+ */
+static void scsi_times_out(struct scsi_cmnd *scmd)
+{
+	scsi_log_completion(scmd, TIMEOUT_ERROR);
+
+	if (scmd->device->host->hostt->eh_timeout) {
+		scmd->device->host->hostt->eh_timeout(scmd);
+	} else if (unlikely(!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
+		panic("Error handler thread not present at %p %p %s %d",
+		      scmd, scmd->device->host, __FILE__, __LINE__);
+	}
+}
+
 /*
  * Function:    scsi_dispatch_command
  *
===== drivers/scsi/scsi_error.c 1.69 vs edited =====
--- 1.69/drivers/scsi/scsi_error.c	Thu Jan 15 23:14:40 2004
+++ edited/drivers/scsi/scsi_error.c	Tue Jan 20 14:18:56 2004
@@ -153,25 +153,6 @@
 }
 
 /**
- * scsi_times_out - Timeout function for normal scsi commands.
- * @scmd:	Cmd that is timing out.
- *
- * Notes:
- *     We do not need to lock this.  There is the potential for a race
- *     only in that the normal completion handling might run, but if the
- *     normal completion function determines that the timer has already
- *     fired, then it mustn't do anything.
- **/
-void scsi_times_out(struct scsi_cmnd *scmd)
-{
-	scsi_log_completion(scmd, TIMEOUT_ERROR);
-	if (unlikely(!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))) {
-		panic("Error handler thread not present at %p %p %s %d",
-		      scmd, scmd->device->host, __FILE__, __LINE__);
-	}
-}
-
-/**
  * scsi_block_when_processing_errors - Prevent cmds from being queued.
  * @sdev:	Device on which we are performing recovery.
  *
===== drivers/scsi/scsi_priv.h 1.30 vs edited =====
--- 1.30/drivers/scsi/scsi_priv.h	Mon Sep 29 14:37:28 2003
+++ edited/drivers/scsi/scsi_priv.h	Tue Jan 20 14:17:30 2004
@@ -100,11 +100,9 @@
 extern void scsi_exit_devinfo(void);
 
 /* scsi_error.c */
-extern void scsi_times_out(struct scsi_cmnd *cmd);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
 
 /* scsi_lib.c */
 extern int scsi_maybe_unblock_host(struct scsi_device *sdev);
===== include/scsi/scsi_eh.h 1.1 vs edited =====
--- 1.1/include/scsi/scsi_eh.h	Mon Jun  9 00:03:00 2003
+++ edited/include/scsi/scsi_eh.h	Tue Jan 20 14:17:27 2004
@@ -4,6 +4,7 @@
 extern void scsi_add_timer(struct scsi_cmnd *, int,
 			   void (*)(struct scsi_cmnd *));
 extern int scsi_delete_timer(struct scsi_cmnd *);
+extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
 extern void scsi_report_bus_reset(struct Scsi_Host *, int);
 extern void scsi_report_device_reset(struct Scsi_Host *, int, int);
 extern int scsi_block_when_processing_errors(struct scsi_device *);
===== include/scsi/scsi_host.h 1.13 vs edited =====
--- 1.13/include/scsi/scsi_host.h	Wed Nov 12 15:15:46 2003
+++ edited/include/scsi/scsi_host.h	Tue Jan 20 14:11:46 2004
@@ -125,6 +125,11 @@
 	int (* eh_host_reset_handler)(struct scsi_cmnd *);
 
 	/*
+	 * Timeout handler.  Optional.
+	 */
+	void (* eh_timeout)(struct scsi_cmnd *);
+
+	/*
 	 * Old EH handlers, no longer used. Make them warn the user of old
 	 * drivers by using a wrong type
 	 *

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 13:20 [PATCH] allow drivers to hook into watchdog timeout Christoph Hellwig
@ 2004-01-20 15:53 ` Mike Anderson
  2004-01-20 16:00   ` Christoph Hellwig
  2004-01-20 17:00 ` Brian King
  2004-02-10 16:34 ` Justin T. Gibbs
  2 siblings, 1 reply; 19+ messages in thread
From: Mike Anderson @ 2004-01-20 15:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James.Bottomley, gibbs, linux-scsi

Christoph Hellwig [hch@lst.de] wrote:
> to get control first after a command timeout.  Justin, does this look
> okay for you?  BTW, your drivers are the last ones using scsi_add_timer
> from outside the midlayer, if we could get rid of that we'd be able to
> keep the interface private.
> 

If a LLDD uses this interface how does the scsi_cmnd get returned back
to the mid-layer? If one uses scsi_done without a timer set it will
believe that the error handler is running and not pass it on to
scsi_softirq. It would seem a hack to add a timer just to let scsi_done
pass the cmd back as this appears to be one thing we are trying to avoid
( scsi_errror used at hack like this in the past :-( ). Having everything
always come through scsi_done even under error recovery (timeouts, aborts)
has been discussed on the list in the past.

-andmike
--
Michael Anderson
andmike@us.ibm.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 15:53 ` Mike Anderson
@ 2004-01-20 16:00   ` Christoph Hellwig
  2004-01-20 16:47     ` Mike Anderson
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2004-01-20 16:00 UTC (permalink / raw)
  To: James.Bottomley, gibbs, linux-scsi

On Tue, Jan 20, 2004 at 07:53:20AM -0800, Mike Anderson wrote:
> Christoph Hellwig [hch@lst.de] wrote:
> > to get control first after a command timeout.  Justin, does this look
> > okay for you?  BTW, your drivers are the last ones using scsi_add_timer
> > from outside the midlayer, if we could get rid of that we'd be able to
> > keep the interface private.
> > 
> 
> If a LLDD uses this interface how does the scsi_cmnd get returned back
> to the mid-layer?

My thoughts about this interface would be that the driver performs
internals actions of whatever means and then calls scsi_eh_scmd_add
anyway - that's why it moved out of scsi_priv.h

> If one uses scsi_done without a timer set it will
> believe that the error handler is running and not pass it on to
> scsi_softirq.

We could change that by moving the guts of scsi_done to a new
__scsi_done and then make scsi_done a tiny wrapper around it, e.g:


--- 1.134/drivers/scsi/scsi.c	Sat Jan 10 21:37:37 2004
+++ edited/drivers/scsi/scsi.c	Tue Jan 20 17:56:56 2004
@@ -672,6 +694,29 @@
  */
 static DEFINE_PER_CPU(struct list_head, scsi_done_q);
 
+void __scsi_done(struct scsi_cmnd *cmd)
+{
+	unsigned long flags;
+
+	/*
+	 * Set the serial numbers back to zero
+	 */
+	cmd->serial_number = 0;
+	cmd->serial_number_at_timeout = 0;
+	cmd->state = SCSI_STATE_BHQUEUE;
+	cmd->owner = SCSI_OWNER_BH_HANDLER;
+
+	/*
+	 * Next, enqueue the command into the done queue.
+	 * It is a per-CPU queue, so we just disable local interrupts
+	 * and need no spinlock.
+	 */
+	local_irq_save(flags);
+	list_add_tail(&cmd->eh_entry, &__get_cpu_var(scsi_done_q));
+	raise_softirq_irqoff(SCSI_SOFTIRQ);
+	local_irq_restore(flags);
+}
+
 /**
  * scsi_done - Enqueue the finished SCSI command into the done queue.
  * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
@@ -687,8 +732,6 @@
  */
 void scsi_done(struct scsi_cmnd *cmd)
 {
-	unsigned long flags;
-
 	/*
 	 * We don't have to worry about this one timing out any more.
 	 * If we are unable to remove the timer, then the command
@@ -697,26 +740,8 @@
 	 * that function could really be.  It might be on another processor,
 	 * etc, etc.
 	 */
-	if (!scsi_delete_timer(cmd))
-		return;
-
-	/*
-	 * Set the serial numbers back to zero
-	 */
-	cmd->serial_number = 0;
-	cmd->serial_number_at_timeout = 0;
-	cmd->state = SCSI_STATE_BHQUEUE;
-	cmd->owner = SCSI_OWNER_BH_HANDLER;
-
-	/*
-	 * Next, enqueue the command into the done queue.
-	 * It is a per-CPU queue, so we just disable local interrupts
-	 * and need no spinlock.
-	 */
-	local_irq_save(flags);
-	list_add_tail(&cmd->eh_entry, &__get_cpu_var(scsi_done_q));
-	raise_softirq_irqoff(SCSI_SOFTIRQ);
-	local_irq_restore(flags);
+	if (scsi_delete_timer(cmd))
+		__scsi_done(cmd);
 }
 
 /**

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 16:00   ` Christoph Hellwig
@ 2004-01-20 16:47     ` Mike Anderson
  2004-01-22 13:59       ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Anderson @ 2004-01-20 16:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James.Bottomley, gibbs, linux-scsi

Christoph Hellwig [hch@lst.de] wrote:
> On Tue, Jan 20, 2004 at 07:53:20AM -0800, Mike Anderson wrote:
> > Christoph Hellwig [hch@lst.de] wrote:
> > > to get control first after a command timeout.  Justin, does this look
> > > okay for you?  BTW, your drivers are the last ones using scsi_add_timer
> > > from outside the midlayer, if we could get rid of that we'd be able to
> > > keep the interface private.
> > > 
> > 
> > If a LLDD uses this interface how does the scsi_cmnd get returned back
> > to the mid-layer?
> 
> My thoughts about this interface would be that the driver performs
> internals actions of whatever means and then calls scsi_eh_scmd_add
> anyway - that's why it moved out of scsi_priv.h
> 

Sorry I read the patch to fast and missed the move.

> > If one uses scsi_done without a timer set it will
> > believe that the error handler is running and not pass it on to
> > scsi_softirq.
> 
> We could change that by moving the guts of scsi_done to a new
> __scsi_done and then make scsi_done a tiny wrapper around it, e.g:
> 

I assume the wrapper would only help if we went the direction of having
the LLDD call __scsi_done directly under some recovery case?

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 13:20 [PATCH] allow drivers to hook into watchdog timeout Christoph Hellwig
  2004-01-20 15:53 ` Mike Anderson
@ 2004-01-20 17:00 ` Brian King
  2004-01-20 18:06   ` Christoph Hellwig
  2004-02-10 16:34 ` Justin T. Gibbs
  2 siblings, 1 reply; 19+ messages in thread
From: Brian King @ 2004-01-20 17:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: James.Bottomley, gibbs, linux-scsi

Christoph Hellwig wrote:
> We all know talk is cheap, so here's a first draft patch to allow LLDDs
> to get control first after a command timeout.  Justin, does this look
> okay for you?  BTW, your drivers are the last ones using scsi_add_timer
> from outside the midlayer, if we could get rid of that we'd be able to
> keep the interface private.

If we get rid of scsi_add_timer, could we have a scsi_mod_timer? In the 
ipr driver I submitted, I would like to be able to simply double the 
timeout value, I don't necessarily want to change the timeout policy. 
The adapter firmware is already timing each command, so I would like to 
use that as the primary timing mechanism if possible.

For normal SCSI devices under this adapter I could change the philosophy 
and not let the adapter time the commands, but for disk array devices, 
the timeout value for reads/writes is too short.

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 17:00 ` Brian King
@ 2004-01-20 18:06   ` Christoph Hellwig
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2004-01-20 18:06 UTC (permalink / raw)
  To: Brian King; +Cc: James.Bottomley, gibbs, linux-scsi

On Tue, Jan 20, 2004 at 11:00:15AM -0600, Brian King wrote:
> If we get rid of scsi_add_timer, could we have a scsi_mod_timer? In the 
> ipr driver I submitted, I would like to be able to simply double the 
> timeout value, I don't necessarily want to change the timeout policy. 
> The adapter firmware is already timing each command, so I would like to 
> use that as the primary timing mechanism if possible.
> 
> For normal SCSI devices under this adapter I could change the philosophy 
> and not let the adapter time the commands, but for disk array devices, 
> the timeout value for reads/writes is too short.

messing with the timer after it's setup sounds like the wrong way to do
that to me.  Overriding the timeout in the host template or the device
sounds like the better idea.  Given that there seem to be other adapters
that need longer timeouts, I wonder whether we should look for a generic
solution to this

> 
> 
> -- 
> Brian King
> eServer Storage I/O
> IBM Linux Technology Center
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
---end quoted text---

-- 
Christoph Hellwig <hch@lst.de>		-	Freelance Hacker
Contact me for driver hacking and kernel development consulting

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 16:47     ` Mike Anderson
@ 2004-01-22 13:59       ` Christoph Hellwig
  2004-01-22 14:27         ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2004-01-22 13:59 UTC (permalink / raw)
  To: Christoph Hellwig, James.Bottomley, gibbs, linux-scsi

On Tue, Jan 20, 2004 at 08:47:11AM -0800, Mike Anderson wrote:
> I assume the wrapper would only help if we went the direction of having
> the LLDD call __scsi_done directly under some recovery case?

That's the case we'd use it for.  Whether it's a good thing is a different
question.  At least with the current SCSI code it probably creates more
problems than it could solve.  This whole thread is really just trying to
fish for the requirement from the driver writers.  And the maintainers of
the driver that could benefit from those unfortunately didn't answer yet..

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-22 13:59       ` Christoph Hellwig
@ 2004-01-22 14:27         ` Justin T. Gibbs
  0 siblings, 0 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2004-01-22 14:27 UTC (permalink / raw)
  To: Christoph Hellwig, Christoph Hellwig, James.Bottomley, linux-scsi

> problems than it could solve.  This whole thread is really just trying to
> fish for the requirement from the driver writers.  And the maintainers of
> the driver that could benefit from those unfortunately didn't answer yet..

I'm pretty much heads down until the weekend - I'm working up to a Friday
deadline on another project.  The problem is complex, and I want to reserve
some time so that I can answer thoughtfully and fully to your proposal.
Please don't take my delayed response as indication that I'm not interested.
8-)

--
Justin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-01-20 13:20 [PATCH] allow drivers to hook into watchdog timeout Christoph Hellwig
  2004-01-20 15:53 ` Mike Anderson
  2004-01-20 17:00 ` Brian King
@ 2004-02-10 16:34 ` Justin T. Gibbs
  2004-02-10 16:42   ` James Bottomley
  2 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2004-02-10 16:34 UTC (permalink / raw)
  To: Christoph Hellwig, James.Bottomley; +Cc: linux-scsi

> We all know talk is cheap, so here's a first draft patch to allow LLDDs
> to get control first after a command timeout.  Justin, does this look
> okay for you?  BTW, your drivers are the last ones using scsi_add_timer
> from outside the midlayer, if we could get rid of that we'd be able to
> keep the interface private.

[ Sorry for taking so long to get back to you.  Things are still very
  hectic here... ]

I would rather not lose the ability for LLDs to setup/modify/etc.
timers.  This is because in an ideal world, the mid-layer and peripheral
drivers would specify the timeout value and let the LLD start and stop
the timer as it sees fit.   I say this because the mid-layer just can't
know all of the information that the HBA driver does:

o When should the timer start?  If the HBA controls the timer, the timer
  can be started only once the command is actually issued to the end device.
  The watchdog is supposed to ensure that the transport/device doesn't
  lockup, so the timer should only cover this period.  The mid-layer
  can't have this precision.  This is even more crucial for drivers that
  must temporarily hold back I/O to handle a topology change, LIP, or
  other transport specific event that the mid-layer is unaware of.

o When should the timer be stopped?  On command completion, of course,
  but there are other, transport specific, times when the timer may need
  to be stopped prematurely or given a completely different value than
  what was originally given.  For example, on transports that do not
  provide error/sense data with every completion, the HBA may have to
  issue another command, without the knowledge of the mid-layer, to
  retrieve this data.  Since the original command has already completed,
  the original timer should not be running.  A new timer, tailored to
  the characteristics of retrieving sense data should be running instead.
  If the old timer is left running and expires, which command timed out?
  The original command or the request sense command?  The HBA knows,
  but the mid-layer does not.

If you just allow the LLD drivers to claim responsibility for setting
and tearing down timers, there is no need to redirect the timer
action.  All that would be required is a check of a flag in the host
structure in scsi_add_timer to avoid starting the timer at all and
another check in scsi_done that doesn't enforce that a timer be
active on completion.

--
Justin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 16:34 ` Justin T. Gibbs
@ 2004-02-10 16:42   ` James Bottomley
  2004-02-10 17:47     ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-02-10 16:42 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: Christoph Hellwig, SCSI Mailing List

On Tue, 2004-02-10 at 11:34, Justin T. Gibbs wrote:
> o When should the timer start?  If the HBA controls the timer, the timer
>   can be started only once the command is actually issued to the end device.
>   The watchdog is supposed to ensure that the transport/device doesn't
>   lockup, so the timer should only cover this period.  The mid-layer
>   can't have this precision.  This is even more crucial for drivers that
>   must temporarily hold back I/O to handle a topology change, LIP, or
>   other transport specific event that the mid-layer is unaware of.

There was a change between 2.4 and 2.6 to address this.  It was the
concept of eliminating driver queues entirely and using the block layer
queueing facilities.  Thus, a 2.6 queuecommand should either issue the
command or return it to the mid-layer for requeueing.

I know that "issue the command" means queue in the device internal issue
queue for quite a few devices.  However, the time to traverse this queue
should be as short as possible.  The driver is free to watchdog the HW
issue queue to make sure it operates.

If you're spending seconds in the HW issue queue, there's something
wrong in the way the queue is working.

The bottom line is that the elimination of software queueing in drivers
should have dispensed with the need to modify the mid-layer timers.

> o When should the timer be stopped?  On command completion, of course,
>   but there are other, transport specific, times when the timer may need
>   to be stopped prematurely or given a completely different value than
>   what was originally given.  For example, on transports that do not
>   provide error/sense data with every completion, the HBA may have to
>   issue another command, without the knowledge of the mid-layer, to
>   retrieve this data.  Since the original command has already completed,
>   the original timer should not be running.  A new timer, tailored to
>   the characteristics of retrieving sense data should be running instead.
>   If the old timer is left running and expires, which command timed out?
>   The original command or the request sense command?  The HBA knows,
>   but the mid-layer does not.

commands are stopped as soon as the device acks.  scsi_done() is
designed to be called from interrupt level.  This can be done with or
without the lock since it uses a per cpu queue.

scsi_done stops the timer immediately.  Command processing is deferred
until the scsi softirq.

As far as ACA emulation goes, perhaps you're right, it is time to remove
that from the drivers as well and place it up in the mid-layer.  That
way we'd have better knowledge of the request sense timings.

James 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 16:42   ` James Bottomley
@ 2004-02-10 17:47     ` Justin T. Gibbs
  2004-02-10 18:41       ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2004-02-10 17:47 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List

>> o When should the timer start?  If the HBA controls the timer, the timer
>>   can be started only once the command is actually issued to the end device.
>>   The watchdog is supposed to ensure that the transport/device doesn't
>>   lockup, so the timer should only cover this period.  The mid-layer
>>   can't have this precision.  This is even more crucial for drivers that
>>   must temporarily hold back I/O to handle a topology change, LIP, or
>>   other transport specific event that the mid-layer is unaware of.
> 
> There was a change between 2.4 and 2.6 to address this.  It was the
> concept of eliminating driver queues entirely and using the block layer
> queueing facilities.  Thus, a 2.6 queuecommand should either issue the
> command or return it to the mid-layer for requeueing.

You can't get rid of driver or controller queues entirely.  The command
will always live on some queue for some period of time before
it goes out on the transport.  This implies that there will always
be situations where I do not know in my queuecommand routine that the
command needs to be stalled.  In general, for this to work, the mid-layer
must provide:

1) A counting "device frozen semaphore" that the LLD or the mid-layer
   can decrement when I/O to this device needs to be halted.

2) An explicit scsi cmd code indicating "requeue this request - don't
   attempt recovery" for commands that are in internal queues that were
   innocently affected by a recovery or transport event.

FreeBSD has had this concept since '97, and I'd be more than happy to
use it if it were available and worked in Linux.

> If you're spending seconds in the HW issue queue, there's something
> wrong in the way the queue is working.

Who's to say that the timeout duration is seconds?  I've worked on several
products where the largest timeout for disk I/O was perhaps 100ms.  My
drivers are pretty good about not holding locks for long periods, but it is
not hard for me to believe that certain recovery operations may end up
holding off the queue of a new transaction for 10s of milliseconds in the
worst case.

>> o When should the timer be stopped?  On command completion, of course,
>>   but there are other, transport specific, times when the timer may need
>>   to be stopped prematurely or given a completely different value than
>>   what was originally given.  For example, on transports that do not
>>   provide error/sense data with every completion, the HBA may have to
>>   issue another command, without the knowledge of the mid-layer, to
>>   retrieve this data.  Since the original command has already completed,
>>   the original timer should not be running.  A new timer, tailored to
>>   the characteristics of retrieving sense data should be running instead.
>>   If the old timer is left running and expires, which command timed out?
>>   The original command or the request sense command?  The HBA knows,
>>   but the mid-layer does not.
> 
> commands are stopped as soon as the device acks.  scsi_done() is
> designed to be called from interrupt level.  This can be done with or
> without the lock since it uses a per cpu queue.

I'm well aware of how scsi_done() works.  The argument above has nothing to
do with normal completions.

> As far as ACA emulation goes, perhaps you're right, it is time to remove
> that from the drivers as well and place it up in the mid-layer.  That
> way we'd have better knowledge of the request sense timings.

Please don't confuse the issue by using the term "ACA".  Linux doesn't
take advantage of ACA so it has no place in this discussion.  Perhaps you
meant "auto-sense" emulation?  (SAM3 5.9.4.2).

For auto-sense to work, the driver will have to:

1) Abort all I/O waiting for issuance to the device.  On the aic7xxx and
   aic79xx hardware, there can be 10s of I/Os waiting to be issued to the
   device on the controller queue.  These would have to be aborted to the
   mid-layer in order to not clobber the sense information.

2) The mid-layer would have to queue the request sense operation at the
   head of the device's transaction queue to ensure it is the next command
   sent to the device.

To do this safely, and to allow the controller to abort pending transactions
in whatever order it finds convenient, the mid-layer should allow the LLD
to decrement the device queue frozen semaphore for each aborted command while
setting a flag in the command structure indicating it has done so.  The 
mid-layer can then up the semaphore after successfully processing each
command.  This ensures that the mid-layer cannot proceed until it has "seen"
the command that has the check condition.

Of course, not all controllers have this kind of ability, and the controller
drivers can often guarantee ordering in a controller specific manner without
all these complication.  You should also consider that any additional latency
added to this path may adversly effect tape (consider ILI reporting) and other
peripheral type performance where check conditions are more routine.
However, if you really want to go this route and you make the mid-layer handle
it correctly, I won't get in your way.

--
Justin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 17:47     ` Justin T. Gibbs
@ 2004-02-10 18:41       ` James Bottomley
  2004-02-10 19:44         ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-02-10 18:41 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: Christoph Hellwig, SCSI Mailing List

On Tue, 2004-02-10 at 12:47, Justin T. Gibbs wrote:
> >> o When should the timer start?  If the HBA controls the timer, the timer
> >>   can be started only once the command is actually issued to the end device.
> >>   The watchdog is supposed to ensure that the transport/device doesn't
> >>   lockup, so the timer should only cover this period.  The mid-layer
> >>   can't have this precision.  This is even more crucial for drivers that
> >>   must temporarily hold back I/O to handle a topology change, LIP, or
> >>   other transport specific event that the mid-layer is unaware of.
> > 
> > There was a change between 2.4 and 2.6 to address this.  It was the
> > concept of eliminating driver queues entirely and using the block layer
> > queueing facilities.  Thus, a 2.6 queuecommand should either issue the
> > command or return it to the mid-layer for requeueing.
> 
> You can't get rid of driver or controller queues entirely.  The command
> will always live on some queue for some period of time before
> it goes out on the transport.  This implies that there will always
> be situations where I do not know in my queuecommand routine that the
> command needs to be stalled.  In general, for this to work, the mid-layer
> must provide:

I assert you can get rid of driver queueing.  I agree about the
controller issue queue.

If you need to stall a command after you've accepted it by returning
zero from queuecommand, you return it to the mid-layer with status
either BUSY or QUEUE_FULL.

> 1) A counting "device frozen semaphore" that the LLD or the mid-layer
>    can decrement when I/O to this device needs to be halted.

If you really need to halt everything after returning BUSY, then the
scsi_block_requests()/scsi_unblock_requests() can be used for this.

> 2) An explicit scsi cmd code indicating "requeue this request - don't
>    attempt recovery" for commands that are in internal queues that were
>    innocently affected by a recovery or transport event.

commands in the controller issue queue innocently affected by recovery
should be returned to the mid-layer with DID_RESET, where they will be
reissued.

> > As far as ACA emulation goes, perhaps you're right, it is time to remove
> > that from the drivers as well and place it up in the mid-layer.  That
> > way we'd have better knowledge of the request sense timings.
> 
> Please don't confuse the issue by using the term "ACA".  Linux doesn't
> take advantage of ACA so it has no place in this discussion.  Perhaps you
> meant "auto-sense" emulation?  (SAM3 5.9.4.2).

OK, I mean the clearing of the contingent allegiance condition in the
driver by issuing the REQUEST SENSE directly.  That's the one case where
two commands are issued under one timer.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 18:41       ` James Bottomley
@ 2004-02-10 19:44         ` Justin T. Gibbs
  2004-02-10 20:05           ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2004-02-10 19:44 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List

> If you need to stall a command after you've accepted it by returning
> zero from queuecommand, you return it to the mid-layer with status
> either BUSY or QUEUE_FULL.

BUSY and QUEUE_FULL status have particular meanings when associated
with an SCSI peripheral.  Using them for this purpose will only confuse
the mid-layer into taking unwarranted action, like trying to throttle
the queue depth.  The times that I want to use this have nothing to
do with BUSY or QUEUE_FULL in their SCSI sense.

>> 1) A counting "device frozen semaphore" that the LLD or the mid-layer
>>    can decrement when I/O to this device needs to be halted.
> 
> If you really need to halt everything after returning BUSY, then the
> scsi_block_requests()/scsi_unblock_requests() can be used for this.

scsi_block_requests() blocks the whole controller up.  I only want
and need to block the transactions going to a particular lun.

>> 2) An explicit scsi cmd code indicating "requeue this request - don't
>>    attempt recovery" for commands that are in internal queues that were
>>    innocently affected by a recovery or transport event.
> 
> commands in the controller issue queue innocently affected by recovery
> should be returned to the mid-layer with DID_RESET, where they will be
> reissued.

They will only be issued up to their command retry count which may be zero
for certain commands.  This may also confuse the peripheral or mid-layer
drivers into believing that a unit attention condition is expected and
should be ignored.  The commands that were affected by the recovery action
should be marked accordingly, but marking the commands that are still waiting
on the sidelines is the equivalent of a drive-by-shooting.

You've complained in the past about how the aic7xxx and aic79xx drivers
work around the mid-layer to get stuff to work correctly.  Above you are
asking for the LLD to continue to "lie" to the mid-layer instead of actually
fixing the problem.  What you are proposing here *loses* critical
information for the mid-layer and the perihperal drivers.  How is this
different or better than the other "lies" the current aic7xxx and aic79xx
drivers tell to convince the mid-layer to do the right thing?  I thought
the intention was to fix the mid-layer so that drivers could avoid these
kinds of things.

--
Justin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 19:44         ` Justin T. Gibbs
@ 2004-02-10 20:05           ` James Bottomley
  2004-02-10 20:26             ` Justin T. Gibbs
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-02-10 20:05 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: Christoph Hellwig, SCSI Mailing List

On Tue, 2004-02-10 at 14:44, Justin T. Gibbs wrote:
> > If you need to stall a command after you've accepted it by returning
> > zero from queuecommand, you return it to the mid-layer with status
> > either BUSY or QUEUE_FULL.
> 
> BUSY and QUEUE_FULL status have particular meanings when associated
> with an SCSI peripheral.  Using them for this purpose will only confuse
> the mid-layer into taking unwarranted action, like trying to throttle
> the queue depth.  The times that I want to use this have nothing to
> do with BUSY or QUEUE_FULL in their SCSI sense.

So if I give you an error code for this, like DID_REQEUEUE, you'll
eliminate the driver queueing from your queucommand and from your done
processing?

> >> 1) A counting "device frozen semaphore" that the LLD or the mid-layer
> >>    can decrement when I/O to this device needs to be halted.
> > 
> > If you really need to halt everything after returning BUSY, then the
> > scsi_block_requests()/scsi_unblock_requests() can be used for this.
> 
> scsi_block_requests() blocks the whole controller up.  I only want
> and need to block the transactions going to a particular lun.
> 
> >> 2) An explicit scsi cmd code indicating "requeue this request - don't
> >>    attempt recovery" for commands that are in internal queues that were
> >>    innocently affected by a recovery or transport event.
> > 
> > commands in the controller issue queue innocently affected by recovery
> > should be returned to the mid-layer with DID_RESET, where they will be
> > reissued.
> 
> They will only be issued up to their command retry count which may be zero
> for certain commands.  This may also confuse the peripheral or mid-layer
> drivers into believing that a unit attention condition is expected and
> should be ignored.  The commands that were affected by the recovery action
> should be marked accordingly, but marking the commands that are still waiting
> on the sidelines is the equivalent of a drive-by-shooting.

No, they won't.  DID_RESET doesn't count against the retry count (the
only things that affect the retry count are conditions that go through
the maybe_retry label in scsi_device_disposition()).

DID_RESET is designed for returning an uncompleted command where
recovery in one command affected another, which is what you say you
need.

It will not cause any unit attention exception processing.  That only
happens if the error handler knows it reset something, or the driver
reports that it is resetting something.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 20:05           ` James Bottomley
@ 2004-02-10 20:26             ` Justin T. Gibbs
  2004-02-10 22:47               ` Clay Haapala
  2004-02-11 20:05               ` James Bottomley
  0 siblings, 2 replies; 19+ messages in thread
From: Justin T. Gibbs @ 2004-02-10 20:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List

> On Tue, 2004-02-10 at 14:44, Justin T. Gibbs wrote:
>> > If you need to stall a command after you've accepted it by returning
>> > zero from queuecommand, you return it to the mid-layer with status
>> > either BUSY or QUEUE_FULL.
>> 
>> BUSY and QUEUE_FULL status have particular meanings when associated
>> with an SCSI peripheral.  Using them for this purpose will only confuse
>> the mid-layer into taking unwarranted action, like trying to throttle
>> the queue depth.  The times that I want to use this have nothing to
>> do with BUSY or QUEUE_FULL in their SCSI sense.
> 
> So if I give you an error code for this, like DID_REQEUEUE, you'll
> eliminate the driver queueing from your queucommand and from your done
> processing?

If I can freeze at per-device granularity and testing of the BUSY and
QUEUE_FULL paths in the mid-layer pan out, I believe the answer is yes.

>> They will only be issued up to their command retry count which may be zero
>> for certain commands.  This may also confuse the peripheral or mid-layer
>> drivers into believing that a unit attention condition is expected and
>> should be ignored.  The commands that were affected by the recovery action
>> should be marked accordingly, but marking the commands that are still waiting
>> on the sidelines is the equivalent of a drive-by-shooting.
> 
> No, they won't.  DID_RESET doesn't count against the retry count (the
> only things that affect the retry count are conditions that go through
> the maybe_retry label in scsi_device_disposition()).

This is only true if the peripheral driver calls scsi_io_completion().
The SG driver, for instance, does not.

> It will not cause any unit attention exception processing.  That only
> happens if the error handler knows it reset something, or the driver
> reports that it is resetting something.

That reminds me.  Reported bus/target resets do not cause a
bus/device-settle delay.  This is another one of the workarounds
in my driver.

--
Justin


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 20:26             ` Justin T. Gibbs
@ 2004-02-10 22:47               ` Clay Haapala
  2004-02-11 20:05               ` James Bottomley
  1 sibling, 0 replies; 19+ messages in thread
From: Clay Haapala @ 2004-02-10 22:47 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: James Bottomley, Christoph Hellwig, SCSI Mailing List

Some of this queuing/retry conversation sounds similar to the recent
discussion we had on the linux-iscsi driver.  I think that the
linux-iscsi ended up OK on it, but it is something to keep in mind for
the reviewers of the 4.0.0.4 version to see if we got it right.

On a related topic:
On Tue, 10 Feb 2004, Justin T. Gibbs told this:
> [ queuing/retry discussion between Justin and James omitted ]
> That reminds me.  Reported bus/target resets do not cause a
> bus/device-settle delay.  This is another one of the workarounds
> in my driver.

Is this the scenario where the device resets and issues a couple of
"Unit Attentions", etc.?
-- 
Clay Haapala (chaapala@cisco.com) Cisco Systems SRBU +1 763-398-1056
   6450 Wedgwood Rd, Suite 130 Maple Grove MN 55311 PGP: C89240AD
"Honor thy prototypes, that it may be well with thee, and thou mayest
   code long upon the Earth, else thy iniquity will be visited unto
      the third and fourth generation of those that revise it."

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-10 20:26             ` Justin T. Gibbs
  2004-02-10 22:47               ` Clay Haapala
@ 2004-02-11 20:05               ` James Bottomley
  2004-02-12  0:15                 ` Justin T. Gibbs
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2004-02-11 20:05 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: Christoph Hellwig, SCSI Mailing List

On Tue, 2004-02-10 at 15:26, Justin T. Gibbs wrote:
> > So if I give you an error code for this, like DID_REQEUEUE, you'll
> > eliminate the driver queueing from your queucommand and from your done
> > processing?
> 
> If I can freeze at per-device granularity and testing of the BUSY and
> QUEUE_FULL paths in the mid-layer pan out, I believe the answer is yes.

I believe we have everything that you need with regard to BUSY and
QUEUE_FULL.

It looks like a device_blocked interface won't be too much work, I'll
look at coding one up.

> > No, they won't.  DID_RESET doesn't count against the retry count (the
> > only things that affect the retry count are conditions that go through
> > the maybe_retry label in scsi_device_disposition()).
> 
> This is only true if the peripheral driver calls scsi_io_completion().
> The SG driver, for instance, does not.

But that's by design.  The application using SG_IO receives the error
code directly and is in control of deciding to retry.

> That reminds me.  Reported bus/target resets do not cause a
> bus/device-settle delay.  This is another one of the workarounds
> in my driver.

That's correct.  The interface is really designed to report that
something happened.  Since we may not know the actual timing of the
reported event, the philosophy is that is't better queue and retry
(while correctly processing the NOT_READY) than to impose an arbitrary
delay.

The rule is that the mid-layer only delays for events it initiated.

James

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-11 20:05               ` James Bottomley
@ 2004-02-12  0:15                 ` Justin T. Gibbs
  2004-02-12 14:42                   ` James Bottomley
  0 siblings, 1 reply; 19+ messages in thread
From: Justin T. Gibbs @ 2004-02-12  0:15 UTC (permalink / raw)
  To: James Bottomley; +Cc: Christoph Hellwig, SCSI Mailing List

> On Tue, 2004-02-10 at 15:26, Justin T. Gibbs wrote:
>> > So if I give you an error code for this, like DID_REQEUEUE, you'll
>> > eliminate the driver queueing from your queucommand and from your done
>> > processing?
>> 
>> If I can freeze at per-device granularity and testing of the BUSY and
>> QUEUE_FULL paths in the mid-layer pan out, I believe the answer is yes.
> 
> I believe we have everything that you need with regard to BUSY and
> QUEUE_FULL.
> 
> It looks like a device_blocked interface won't be too much work, I'll
> look at coding one up.

Okay.

>> > No, they won't.  DID_RESET doesn't count against the retry count (the
>> > only things that affect the retry count are conditions that go through
>> > the maybe_retry label in scsi_device_disposition()).
>> 
>> This is only true if the peripheral driver calls scsi_io_completion().
>> The SG driver, for instance, does not.
> 
> But that's by design.  The application using SG_IO receives the error
> code directly and is in control of deciding to retry.

That's fine if the application has good information to go on.  Most
of the comments in the SCSI layer indicate that DID_RESET means that
a bus reset event happened, not that the LLD wanted to unconditionally
retry a command that has never seen the transport.

>> That reminds me.  Reported bus/target resets do not cause a
>> bus/device-settle delay.  This is another one of the workarounds
>> in my driver.
> 
> That's correct.  The interface is really designed to report that
> something happened.  Since we may not know the actual timing of the
> reported event, the philosophy is that is't better queue and retry
> (while correctly processing the NOT_READY) than to impose an arbitrary
> delay.
> 
> The rule is that the mid-layer only delays for events it initiated.

Well, this breaks lots of devices like external RAID controllers that
need at least a few hundred ms bus settle delay before they will handle
a new command.  This controllers often initiate the bus reset when they
do a module failover or shutdown (e.g. upgrading one of the two controllers
in a redundant controller).  Without a bus settle delay, these devices
are taken offline by the mid-layer.

You have to enforce the delay regardless of where the reset event comes
from.  The devices on the bus don't care who reset the bus, and their
behavior doesn't differ when Linux does the reset or some third party
does.

--
Justin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] allow drivers to hook into watchdog timeout
  2004-02-12  0:15                 ` Justin T. Gibbs
@ 2004-02-12 14:42                   ` James Bottomley
  0 siblings, 0 replies; 19+ messages in thread
From: James Bottomley @ 2004-02-12 14:42 UTC (permalink / raw)
  To: Justin T. Gibbs; +Cc: Christoph Hellwig, SCSI Mailing List

On Wed, 2004-02-11 at 19:15, Justin T. Gibbs wrote: 
> > But that's by design.  The application using SG_IO receives the error
> > code directly and is in control of deciding to retry.
> 
> That's fine if the application has good information to go on.  Most
> of the comments in the SCSI layer indicate that DID_RESET means that
> a bus reset event happened, not that the LLD wanted to unconditionally
> retry a command that has never seen the transport.

DID_RESET doesn't mean the LLD wants to retry the command
unconditionally.  It means that the LLD is reporting that the command
was affected by error recovery actions and should be retried. 

In the normal course of events, the mid-layer will do the retry without
incrementing the retry count. 

applications issuing direct commands get to decide what the policy
should be. 

> > The rule is that the mid-layer only delays for events it initiated.
> 
> Well, this breaks lots of devices like external RAID controllers that
> need at least a few hundred ms bus settle delay before they will handle
> a new command.  This controllers often initiate the bus reset when they
> do a module failover or shutdown (e.g. upgrading one of the two controllers
> in a redundant controller).  Without a bus settle delay, these devices
> are taken offline by the mid-layer.
> 
> You have to enforce the delay regardless of where the reset event comes
> from.  The devices on the bus don't care who reset the bus, and their
> behavior doesn't differ when Linux does the reset or some third party
> does.

Well, there are two delays, aren't there: the bus settle delay and the
device ready delay.  The latter isn't really determinable, but the
device is supposed to return NOT_READY while in it.

For the former, it only applies to a bus reset on SPI, so I'd like to
handle it in the transport class.

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2004-02-12 14:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-01-20 13:20 [PATCH] allow drivers to hook into watchdog timeout Christoph Hellwig
2004-01-20 15:53 ` Mike Anderson
2004-01-20 16:00   ` Christoph Hellwig
2004-01-20 16:47     ` Mike Anderson
2004-01-22 13:59       ` Christoph Hellwig
2004-01-22 14:27         ` Justin T. Gibbs
2004-01-20 17:00 ` Brian King
2004-01-20 18:06   ` Christoph Hellwig
2004-02-10 16:34 ` Justin T. Gibbs
2004-02-10 16:42   ` James Bottomley
2004-02-10 17:47     ` Justin T. Gibbs
2004-02-10 18:41       ` James Bottomley
2004-02-10 19:44         ` Justin T. Gibbs
2004-02-10 20:05           ` James Bottomley
2004-02-10 20:26             ` Justin T. Gibbs
2004-02-10 22:47               ` Clay Haapala
2004-02-11 20:05               ` James Bottomley
2004-02-12  0:15                 ` Justin T. Gibbs
2004-02-12 14:42                   ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox