public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked)
@ 2005-04-19 14:31 Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout Tejun Heo
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Tejun Heo @ 2005-04-19 14:31 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, linux-kernel

 Hello, James.

 This patchset contains the following patches from the previous timer
update patchset.

 02_scsi_timer_eh_timer_fix.patch
 03_scsi_timer_dispatch_race_fix.patch
 04_scsi_timer_remove_delete_timer_from_reset_provider.patch

 eh_timer_fix is reworked as you suggested and split into two
patches. (#01 and #02)

 In the current bk repository, aic7xxx_osm.c has been updated to not
use scsi_add_timer() but aic79xx_osm.c hasn't been yet.  I guess it's
gonna be updated sometime soon, so I've dropped the aic7xxx update
patch.

 As aic79xx_osm.c still uses scsi_add_timer(), timer API update
patches are omitted.  I'll repost them once aic79xx_osm.c is
converted.

 dispatch_race_fix and remove_delete_timer_from_reset_provider patches
are the same as in the previous posting.  If you've already applied
them, just ignore those two (#03 and #04).

 The following bugs are fixed.

 * Race condition between eh and normal completion path for eh_timer
 * scsi_delete_timer() race in scsi_queue_insert()

[ Start of patch descriptions ]

01_scsi_timer_eh_timer_fix.patch
	: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout

	scmd->eh_timeout is used to resolve the race between command
	completion and timeout.  However, during error handling,
	scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
	condition between eh and normal completion for a request which
	has timed out and in the process of error handling.  If the
	request completes while scmd->eh_timeout is being used by eh,
	eh timeout is lost and the command will be handled by both eh
	and completion path.  This patch fixes the race by making
	scsi_send_eh_cmnd() use its own timer.

	This patch adds shost->eh_timeout field.  The name of the
	field equals scmd->eh_timeout which is used for normal command
	timeout.  As this can be confusing, renaming scmd->eh_timeout
	to something like scmd->cmd_timeout would be good.

	Reworked such that timeout race window is kept at minimal
	level as pointed out by James Bottomley.

02_scsi_timer_eh_timer_remove_spurious_if.patch
	: remove spurious if tests from scsi_eh_{times_out|done}

	If tests which check if eh_action isn't NULL in both functions
	are always true.  Remove the if's.

03_scsi_timer_dispatch_race_fix.patch
	: remove a timer race in scsi_queue_insert()

	scsi_queue_insert() has four callers.  Three callers call with
	timer disabled and one (the second invocation in
	scsi_dispatch_cmd()) calls with timer activated.
	scsi_queue_insert() used to always call scsi_delete_timer()
	and ignore the return value.  This results in race with timer
	expiration.  Remove scsi_delete_timer() call from
	scsi_queue_insert() and make the caller delete timer and check
	the return value.

04_scsi_timer_remove_delete_timer_from_reset_provider.patch
	: remove unnecessary scsi_delete_timer() call in scsi_reset_provider()

	scsi_reset_provider() calls scsi_delete_timer() on exit which
	isn't necessary.  Remove it.

[ End of patch descriptions ]

 Thanks.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-19 14:31 [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked) Tejun Heo
@ 2005-04-19 14:31 ` Tejun Heo
  2005-04-24 22:22   ` James Bottomley
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 02/04] scsi: remove spurious if tests from scsi_eh_{times_out|done} Tejun Heo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2005-04-19 14:31 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, linux-kernel

01_scsi_timer_eh_timer_fix.patch

	scmd->eh_timeout is used to resolve the race between command
	completion and timeout.  However, during error handling,
	scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
	condition between eh and normal completion for a request which
	has timed out and in the process of error handling.  If the
	request completes while scmd->eh_timeout is being used by eh,
	eh timeout is lost and the command will be handled by both eh
	and completion path.  This patch fixes the race by making
	scsi_send_eh_cmnd() use its own timer.

	This patch adds shost->eh_timeout field.  The name of the
	field equals scmd->eh_timeout which is used for normal command
	timeout.  As this can be confusing, renaming scmd->eh_timeout
	to something like scmd->cmd_timeout would be good.

	Reworked such that timeout race window is kept at minimal
	level as pointed out by James Bottomley.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 drivers/scsi/scsi_error.c |   26 ++++++++++++++++++++------
 include/scsi/scsi_host.h  |    1 +
 2 files changed, 21 insertions(+), 6 deletions(-)

Index: scsi-reqfn-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-reqfn-export.orig/drivers/scsi/scsi_error.c	2005-04-19 23:28:33.000000000 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_error.c	2005-04-19 23:30:57.000000000 +0900
@@ -428,8 +428,10 @@ static int scsi_eh_completed_normally(st
  *    for some action to complete on the device.  our only job is to
  *    record that it timed out, and to wake up the thread.
  **/
-static void scsi_eh_times_out(struct scsi_cmnd *scmd)
+static void scsi_eh_times_out(unsigned long arg)
 {
+	struct scsi_cmnd *scmd = (void *)arg;
+
 	scsi_eh_eflags_set(scmd, SCSI_EH_REC_TIMEOUT);
 	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd:%p\n", __FUNCTION__,
 					  scmd));
@@ -448,9 +450,11 @@ static void scsi_eh_done(struct scsi_cmn
 	 * if the timeout handler is already running, then just set the
 	 * flag which says we finished late, and return.  we have no
 	 * way of stopping the timeout handler from running, so we must
-	 * always defer to it.
+	 * always defer to it.  note that by doing timer removal here
+	 * the window in which normally completed commands are treated
+	 * as timed out is kept at minimal level.
 	 */
-	if (del_timer(&scmd->eh_timeout)) {
+	if (del_timer(scmd->device->host->eh_timeout)) {
 		scmd->request->rq_status = RQ_SCSI_DONE;
 		scmd->owner = SCSI_OWNER_ERROR_HANDLER;
 
@@ -478,6 +482,7 @@ static int scsi_send_eh_cmnd(struct scsi
 {
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
+	struct timer_list timer;
 	DECLARE_MUTEX_LOCKED(sem);
 	unsigned long flags;
 	int rtn = SUCCESS;
@@ -492,7 +497,15 @@ static int scsi_send_eh_cmnd(struct scsi
 		scmd->cmnd[1] = (scmd->cmnd[1] & 0x1f) |
 			(sdev->lun << 5 & 0xe0);
 
-	scsi_add_timer(scmd, timeout, scsi_eh_times_out);
+	/*
+	 * set up eh timer.
+	 */
+	shost->eh_timeout = &timer;
+	init_timer(&timer);
+	timer.expires = jiffies + timeout;
+	timer.function = scsi_eh_times_out;
+	timer.data = (unsigned long)scmd;
+	add_timer(&timer);
 
 	/*
 	 * set up the semaphore so we wait for the command to complete.
@@ -508,8 +521,6 @@ static int scsi_send_eh_cmnd(struct scsi
 	down(&sem);
 	scsi_log_completion(scmd, SUCCESS);
 
-	shost->eh_action = NULL;
-
 	/*
 	 * see if timeout.  if so, tell the host to forget about it.
 	 * in other words, we don't want a callback any more.
@@ -539,6 +550,9 @@ static int scsi_send_eh_cmnd(struct scsi
 		rtn = FAILED;
 	}
 
+	shost->eh_action = NULL;
+	shost->eh_timeout = NULL;
+
 	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd: %p, rtn:%x\n",
 					  __FUNCTION__, scmd, rtn));
 
Index: scsi-reqfn-export/include/scsi/scsi_host.h
===================================================================
--- scsi-reqfn-export.orig/include/scsi/scsi_host.h	2005-04-19 23:28:33.000000000 +0900
+++ scsi-reqfn-export/include/scsi/scsi_host.h	2005-04-19 23:30:57.000000000 +0900
@@ -442,6 +442,7 @@ struct Scsi_Host {
 	struct completion     * eh_notify; /* wait for eh to begin or end */
 	struct semaphore      * eh_action; /* Wait for specific actions on the
                                           host. */
+	struct timer_list     * eh_timeout;  /* Used to timeout eh commands */
 	unsigned int            eh_active:1; /* Indicates the eh thread is awake and active if
                                           this is true. */
 	unsigned int            eh_kill:1; /* set when killing the eh thread */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 02/04] scsi: remove spurious if tests from scsi_eh_{times_out|done}
  2005-04-19 14:31 [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked) Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout Tejun Heo
@ 2005-04-19 14:31 ` Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 03/04] scsi: remove a timer race in scsi_queue_insert() Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 04/04] scsi: remove unnecessary scsi_delete_timer() call in scsi_reset_provider() Tejun Heo
  3 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2005-04-19 14:31 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, linux-kernel

02_scsi_timer_eh_timer_remove_spurious_if.patch

	If tests which check if eh_action isn't NULL in both functions
	are always true.  Remove the if's.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 scsi_error.c |    6 ++----
 1 files changed, 2 insertions(+), 4 deletions(-)

Index: scsi-reqfn-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-reqfn-export.orig/drivers/scsi/scsi_error.c	2005-04-19 23:30:57.000000000 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_error.c	2005-04-19 23:30:58.000000000 +0900
@@ -436,8 +436,7 @@ static void scsi_eh_times_out(unsigned l
 	SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd:%p\n", __FUNCTION__,
 					  scmd));
 
-	if (scmd->device->host->eh_action)
-		up(scmd->device->host->eh_action);
+	up(scmd->device->host->eh_action);
 }
 
 /**
@@ -461,8 +460,7 @@ static void scsi_eh_done(struct scsi_cmn
 		SCSI_LOG_ERROR_RECOVERY(3, printk("%s scmd: %p result: %x\n",
 					   __FUNCTION__, scmd, scmd->result));
 
-		if (scmd->device->host->eh_action)
-			up(scmd->device->host->eh_action);
+		up(scmd->device->host->eh_action);
 	}
 }
 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 03/04] scsi: remove a timer race in scsi_queue_insert()
  2005-04-19 14:31 [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked) Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 02/04] scsi: remove spurious if tests from scsi_eh_{times_out|done} Tejun Heo
@ 2005-04-19 14:31 ` Tejun Heo
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 04/04] scsi: remove unnecessary scsi_delete_timer() call in scsi_reset_provider() Tejun Heo
  3 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2005-04-19 14:31 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, linux-kernel

03_scsi_timer_dispatch_race_fix.patch

	scsi_queue_insert() has four callers.  Three callers call with
	timer disabled and one (the second invocation in
	scsi_dispatch_cmd()) calls with timer activated.
	scsi_queue_insert() used to always call scsi_delete_timer()
	and ignore the return value.  This results in race with timer
	expiration.  Remove scsi_delete_timer() call from
	scsi_queue_insert() and make the caller delete timer and check
	the return value.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 scsi.c     |   10 ++++++----
 scsi_lib.c |    8 +-------
 2 files changed, 7 insertions(+), 11 deletions(-)

Index: scsi-reqfn-export/drivers/scsi/scsi.c
===================================================================
--- scsi-reqfn-export.orig/drivers/scsi/scsi.c	2005-04-19 23:28:33.000000000 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi.c	2005-04-19 23:30:58.000000000 +0900
@@ -638,10 +638,12 @@ int scsi_dispatch_cmd(struct scsi_cmnd *
 	}
 	spin_unlock_irqrestore(host->host_lock, flags);
 	if (rtn) {
-		atomic_inc(&cmd->device->iodone_cnt);
-		scsi_queue_insert(cmd,
-				(rtn == SCSI_MLQUEUE_DEVICE_BUSY) ?
-				 rtn : SCSI_MLQUEUE_HOST_BUSY);
+		if (scsi_delete_timer(cmd)) {
+			atomic_inc(&cmd->device->iodone_cnt);
+			scsi_queue_insert(cmd,
+					  (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ?
+					  rtn : SCSI_MLQUEUE_HOST_BUSY);
+		}
 		SCSI_LOG_MLQUEUE(3,
 		    printk("queuecommand : request rejected\n"));
 	}
Index: scsi-reqfn-export/drivers/scsi/scsi_lib.c
===================================================================
--- scsi-reqfn-export.orig/drivers/scsi/scsi_lib.c	2005-04-19 23:28:33.000000000 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_lib.c	2005-04-19 23:30:58.000000000 +0900
@@ -124,13 +124,7 @@ int scsi_queue_insert(struct scsi_cmnd *
 		 printk("Inserting command %p into mlqueue\n", cmd));
 
 	/*
-	 * We are inserting the command into the ml queue.  First, we
-	 * cancel the timer, so it doesn't time out.
-	 */
-	scsi_delete_timer(cmd);
-
-	/*
-	 * Next, set the appropriate busy bit for the device/host.
+	 * Set the appropriate busy bit for the device/host.
 	 *
 	 * If the host/device isn't busy, assume that something actually
 	 * completed, and that we should be able to queue a command now.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 04/04] scsi: remove unnecessary scsi_delete_timer() call in scsi_reset_provider()
  2005-04-19 14:31 [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked) Tejun Heo
                   ` (2 preceding siblings ...)
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 03/04] scsi: remove a timer race in scsi_queue_insert() Tejun Heo
@ 2005-04-19 14:31 ` Tejun Heo
  3 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2005-04-19 14:31 UTC (permalink / raw)
  To: James.Bottomley; +Cc: linux-scsi, linux-kernel

04_scsi_timer_remove_delete_timer_from_reset_provider.patch

	scsi_reset_provider() calls scsi_delete_timer() on exit which
	isn't necessary.  Remove it.

Signed-off-by: Tejun Heo <htejun@gmail.com>

 scsi_error.c |    1 -
 1 files changed, 1 deletion(-)

Index: scsi-reqfn-export/drivers/scsi/scsi_error.c
===================================================================
--- scsi-reqfn-export.orig/drivers/scsi/scsi_error.c	2005-04-19 23:30:58.000000000 +0900
+++ scsi-reqfn-export/drivers/scsi/scsi_error.c	2005-04-19 23:30:58.000000000 +0900
@@ -1882,7 +1882,6 @@ scsi_reset_provider(struct scsi_device *
 		rtn = FAILED;
 	}
 
-	scsi_delete_timer(scmd);
 	scsi_next_command(scmd);
 	return rtn;
 }


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-19 14:31 ` [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout Tejun Heo
@ 2005-04-24 22:22   ` James Bottomley
  2005-04-24 23:46     ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2005-04-24 22:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: SCSI Mailing List, Linux Kernel

On Tue, 2005-04-19 at 23:31 +0900, Tejun Heo wrote:
> 	scmd->eh_timeout is used to resolve the race between command
> 	completion and timeout.  However, during error handling,
> 	scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
> 	condition between eh and normal completion for a request which
> 	has timed out and in the process of error handling.  If the
> 	request completes while scmd->eh_timeout is being used by eh,
> 	eh timeout is lost and the command will be handled by both eh
> 	and completion path.  This patch fixes the race by making
> 	scsi_send_eh_cmnd() use its own timer.
> 
> 	This patch adds shost->eh_timeout field.  The name of the
> 	field equals scmd->eh_timeout which is used for normal command
> 	timeout.  As this can be confusing, renaming scmd->eh_timeout
> 	to something like scmd->cmd_timeout would be good.
> 
> 	Reworked such that timeout race window is kept at minimal
> 	level as pointed out by James Bottomley.

This looks fine in principle.  However, three comments

1. If you're doing this, there's no further use for eh_timeout, so
remove it (and preferably fix gdth_proc.c; however, it's better to break
the compile of that driver than have it rely on a now defunct field).
2. Use of eh_action is private to scsi_error.c, so you don't need to add
a new field to the host, just make eh_action a pointer to a private
eh_action structure which contains the timer and the semaphore.
3. To close a really tiny window where the running timer could race with
the del_timer, it should probably be del_timer_sync().  The practical
effect of this is nil, but it would be correct programming.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-24 22:22   ` James Bottomley
@ 2005-04-24 23:46     ` Tejun Heo
  2005-04-25 18:09       ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2005-04-24 23:46 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, Linux Kernel


  Hi, James.

James Bottomley wrote:
> On Tue, 2005-04-19 at 23:31 +0900, Tejun Heo wrote:
> 
>>	scmd->eh_timeout is used to resolve the race between command
>>	completion and timeout.  However, during error handling,
>>	scsi_send_eh_cmnd uses scmd->eh_timeout.  This creates a race
>>	condition between eh and normal completion for a request which
>>	has timed out and in the process of error handling.  If the
>>	request completes while scmd->eh_timeout is being used by eh,
>>	eh timeout is lost and the command will be handled by both eh
>>	and completion path.  This patch fixes the race by making
>>	scsi_send_eh_cmnd() use its own timer.
>>
>>	This patch adds shost->eh_timeout field.  The name of the
>>	field equals scmd->eh_timeout which is used for normal command
>>	timeout.  As this can be confusing, renaming scmd->eh_timeout
>>	to something like scmd->cmd_timeout would be good.
>>
>>	Reworked such that timeout race window is kept at minimal
>>	level as pointed out by James Bottomley.
> 
> 
> This looks fine in principle.  However, three comments
> 
> 1. If you're doing this, there's no further use for eh_timeout, so
> remove it (and preferably fix gdth_proc.c; however, it's better to break
> the compile of that driver than have it rely on a now defunct field).

  If you're talking about scmd->eh_timeout, it's our main timer for 
normal command timeouts.  If you're suggesting renaming it to something 
more apparant, I agree.  Maybe just scmd->timeout will do.

> 2. Use of eh_action is private to scsi_error.c, so you don't need to add
> a new field to the host, just make eh_action a pointer to a private
> eh_action structure which contains the timer and the semaphore.

  Sure.

> 3. To close a really tiny window where the running timer could race with
> the del_timer, it should probably be del_timer_sync().  The practical
> effect of this is nil, but it would be correct programming.

  Sorry, but, AFAICT, that wouldn't close any window.  We use timer 
pending for tie-breaker.  When scsi_eh_done() wins, timer never gets to 
run, and if scsi_eh_times_out() wins, the eh thread is woken up only 
after the last reference to the timer/eh is finished (up operation).  If 
I'm missing something, please point out.

  BTW, are you still keeping the bk tree up-to-date?  And, if so, until 
when are you gonna keep the bk tree?  I'm painfully trying to follow and 
convert all my trees to git, but I _really_ miss changeset browsing of 
bk.  Call me lazy but it was just too nice browsing the changesets only 
with mouse.  One way or the other, it's a shame.

  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-24 23:46     ` Tejun Heo
@ 2005-04-25 18:09       ` James Bottomley
  2005-04-27  2:22         ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2005-04-25 18:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: SCSI Mailing List, Linux Kernel

On Mon, 2005-04-25 at 08:46 +0900, Tejun Heo wrote:
>   If you're talking about scmd->eh_timeout, it's our main timer for 
> normal command timeouts.  If you're suggesting renaming it to something 
> more apparant, I agree.  Maybe just scmd->timeout will do.

Sorry ... actually on the ball now; I was assuming you simply wanted not
to use the field for efficiency.  

So, actually having read the description, you think that reusing the
eh_timeout in the error handler command submission path could confuse
the normal done routine if the host still has the command pending and
completes it?

Jmaes



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-25 18:09       ` James Bottomley
@ 2005-04-27  2:22         ` Tejun Heo
  2005-04-27  5:34           ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2005-04-27  2:22 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, Linux Kernel

James Bottomley wrote:
> On Mon, 2005-04-25 at 08:46 +0900, Tejun Heo wrote:
> 
>>  If you're talking about scmd->eh_timeout, it's our main timer for 
>>normal command timeouts.  If you're suggesting renaming it to something 
>>more apparant, I agree.  Maybe just scmd->timeout will do.
> 
> 
> Sorry ... actually on the ball now; I was assuming you simply wanted not
> to use the field for efficiency.  
> 
> So, actually having read the description, you think that reusing the
> eh_timeout in the error handler command submission path could confuse
> the normal done routine if the host still has the command pending and
> completes it?

 Hi, James.

 Sorry about late reply.  Been busy and currently on the run, so please
excuse me for being brief.

 * A command is passed to lldd and starts execution
 * It times out.
 * eh runs
 * abort isn't implemented or fails
 * eh issues eh cmd (TUL, STU...)
 * The command miraculously & stupidly completes just now.
 * The lldd succeeds to delete timer and normal completion path runs.
 * We're fucked up now.

 If anything is wrong, please point out.

 Thanks.  Gotta go.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-27  2:22         ` Tejun Heo
@ 2005-04-27  5:34           ` James Bottomley
  2005-04-27 11:50             ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2005-04-27  5:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: SCSI Mailing List, Linux Kernel

On Wed, 2005-04-27 at 11:22 +0900, Tejun Heo wrote:
>  * A command is passed to lldd and starts execution
>  * It times out.
>  * eh runs
>  * abort isn't implemented or fails
>  * eh issues eh cmd (TUL, STU...)
>  * The command miraculously & stupidly completes just now.

This should be impossible.  The error handler API requirement is that
the driver relinquish a command once it returns success from any error
handling callback ... and if it never returns success, we simply offline
the device and never use it again.  This is the principle behind the
command reuse: we only try an additional command *after* error handling
succeeds, so the error handler now owns the command absolutely.

>  * The lldd succeeds to delete timer and normal completion path runs.
>  * We're fucked up now.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout
  2005-04-27  5:34           ` James Bottomley
@ 2005-04-27 11:50             ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2005-04-27 11:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: SCSI Mailing List, Linux Kernel

  Hello, James.

James Bottomley wrote:
> On Wed, 2005-04-27 at 11:22 +0900, Tejun Heo wrote:
> 
>> * A command is passed to lldd and starts execution
>> * It times out.
>> * eh runs
>> * abort isn't implemented or fails
>> * eh issues eh cmd (TUL, STU...)
>> * The command miraculously & stupidly completes just now.
> 
> 
> This should be impossible.  The error handler API requirement is that
> the driver relinquish a command once it returns success from any error
> handling callback ... and if it never returns success, we simply offline
> the device and never use it again.  This is the principle behind the
> command reuse: we only try an additional command *after* error handling
> succeeds, so the error handler now owns the command absolutely.
> 

  Hmmm, yeah, it currently cannot happen, and if what you're describing 
is a requirement, everything should be okay.  But, I still think that 
using separate timer will be better as it won't add any overhead (with 
the change you proposed) and it makes the somewhat unobivous requirement 
go away.  Or at least add BUG_ON() test or something to make the 
requirement clear.

  What do you think?

  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2005-04-27 11:50 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-19 14:31 [PATCH scsi-misc-2.6 00/04] scsi: misc timer fixes (reworked) Tejun Heo
2005-04-19 14:31 ` [PATCH scsi-misc-2.6 01/04] scsi: make scsi_send_eh_cmnd use its own timer instead of scmd->eh_timeout Tejun Heo
2005-04-24 22:22   ` James Bottomley
2005-04-24 23:46     ` Tejun Heo
2005-04-25 18:09       ` James Bottomley
2005-04-27  2:22         ` Tejun Heo
2005-04-27  5:34           ` James Bottomley
2005-04-27 11:50             ` Tejun Heo
2005-04-19 14:31 ` [PATCH scsi-misc-2.6 02/04] scsi: remove spurious if tests from scsi_eh_{times_out|done} Tejun Heo
2005-04-19 14:31 ` [PATCH scsi-misc-2.6 03/04] scsi: remove a timer race in scsi_queue_insert() Tejun Heo
2005-04-19 14:31 ` [PATCH scsi-misc-2.6 04/04] scsi: remove unnecessary scsi_delete_timer() call in scsi_reset_provider() Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox