sg driver and the error handler

public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed

* sg driver and the error handler
@ 2005-05-10 18:51 Alan Stern
  2005-05-10 22:58 ` Patrick Mansfield
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Stern @ 2005-05-10 18:51 UTC (permalink / raw)
  To: James Bottomley, Doug Gilbert; +Cc: SCSI development list

I don't know who the right person is to handle this, but maybe someone can 
help...

When a command injected through the sg driver encounters any sort of
error, the usual retry mechanism and error handler are brought into play.  
Since sg sets the number of retries to 0 by default, the retry mechanism
shouldn't cause any difficulty.  But the error handler does.  IMO it
should never get involved with requests coming through sg -- sg should
provide access that is as transparent as possible so that userspace
programs will have a clean shot at managing their device.

Consider a case that just came up.  A USB CD drive causes a phase error
when it receives a certain READ BUFFER command (buggy firmware on the
drive, never mind that now).  You would expect the user program to receive
from sg a host_status value indicating DID_ERROR or something of the sort.

Instead the error handler takes charge and sends out one or two TEST UNIT 
READY commands.  The status information finally received by the user 
program is the status from the TEST UNIT READY, not from the failed READ 
BUFFER!  How's a program supposed to cope with that sort of obfuscation?

Something in the SCSI stack (scsi_io_completion ?) should check for 
requests coming in via sg and should know to complete the requests 
immediately.  No requeuing and no error handling.

Does this sound like a feasible thing to implement?

Alan Stern

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-10 18:51 sg driver and the error handler Alan Stern
@ 2005-05-10 22:58 ` Patrick Mansfield
  2005-05-11 11:59   ` Douglas Gilbert
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Patrick Mansfield @ 2005-05-10 22:58 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list

Hi Alan -

On Tue, May 10, 2005 at 02:51:44PM -0400, Alan Stern wrote:
> When a command injected through the sg driver encounters any sort of
> error, the usual retry mechanism and error handler are brought into play.  
> Since sg sets the number of retries to 0 by default, the retry mechanism
> shouldn't cause any difficulty.  But the error handler does.  IMO it
> should never get involved with requests coming through sg -- sg should
> provide access that is as transparent as possible so that userspace
> programs will have a clean shot at managing their device.

The error handler is mainly a timeout handler, so it has to run to cancel
the timed out command, we can't complete the timed out command back to the
upper level until we know the HBA is no longer using it.

> Consider a case that just came up.  A USB CD drive causes a phase error
> when it receives a certain READ BUFFER command (buggy firmware on the
> drive, never mind that now).  You would expect the user program to receive
> from sg a host_status value indicating DID_ERROR or something of the sort.
> 
> Instead the error handler takes charge and sends out one or two TEST UNIT 
> READY commands.  The status information finally received by the user 
> program is the status from the TEST UNIT READY, not from the failed READ 
> BUFFER!  How's a program supposed to cope with that sort of obfuscation?

:-(

> Something in the SCSI stack (scsi_io_completion ?) should check for 
> requests coming in via sg and should know to complete the requests 
> immediately.  No requeuing and no error handling.

> Does this sound like a feasible thing to implement?

No per above. I think there are still some cases where sg or SG_IO
commands are not immediately returned, I think for some certain ASC codes,
some of those probably should be retried, as should cases like QUEUE FULL.

As you say, what you really want is the correct result going back to the
user, not the result of the TUR.

So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu.
The request sense one already saves and restores it.

And always set result |= (DRIVER_TIMEOUT << 24) if we are not requeueing
the IO.

Like (only compile tested) this, against scsi-misc git tree of a few days
ago. Hopefully, sg or SG_IO programs can handle DRIVER_TIMEOUT :-/ but it
is certainly better than returning as if no error occurred.

What do you think?

diff -uprN -X /home/patman/dontdiff scsi-misc-2.6.git/drivers/scsi/scsi_error.c to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c
--- scsi-misc-2.6.git/drivers/scsi/scsi_error.c	2005-05-09 14:19:41.000000000 -0700
+++ to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c	2005-05-10 15:49:01.000000000 -0700
@@ -770,6 +770,7 @@ static int scsi_eh_tur(struct scsi_cmnd 
 {
 	static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0};
 	int retry_cnt = 1, rtn;
+	int saved_result;
 
 retry_tur:
 	memcpy(scmd->cmnd, tur_command, sizeof(tur_command));
@@ -780,6 +781,7 @@ retry_tur:
 	 */
 	memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer));
 
+	saved_result = scmd->result;
 	scmd->request_buffer = NULL;
 	scmd->request_bufflen = 0;
 	scmd->use_sg = 0;
@@ -794,6 +796,7 @@ retry_tur:
 	 * the original request, so let's restore the original data. (db)
 	 */
 	scsi_setup_cmd_retry(scmd);
+	scmd->result = saved_result;
 
 	/*
 	 * hey, we are done.  let's look to see what happened.
@@ -896,6 +899,7 @@ static int scsi_eh_try_stu(struct scsi_c
 {
 	static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
 	int rtn;
+	int saved_result;
 
 	if (!scmd->device->allow_restart)
 		return 1;
@@ -908,6 +912,7 @@ static int scsi_eh_try_stu(struct scsi_c
 	 */
 	memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer));
 
+	saved_result = scmd->result;
 	scmd->request_buffer = NULL;
 	scmd->request_bufflen = 0;
 	scmd->use_sg = 0;
@@ -922,6 +927,7 @@ static int scsi_eh_try_stu(struct scsi_c
 	 * the original request, so let's restore the original data. (db)
 	 */
 	scsi_setup_cmd_retry(scmd);
+	scmd->result = saved_result;
 
 	/*
 	 * hey, we are done.  let's look to see what happened.
@@ -1561,8 +1567,7 @@ static void scsi_eh_flush_done_q(struct 
 							  scmd));
 				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
 		} else {
-			if (!scmd->result)
-				scmd->result |= (DRIVER_TIMEOUT << 24);
+			scmd->result |= (DRIVER_TIMEOUT << 24);
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish"
 							" cmd: %p\n",
 							current->comm, scmd));


-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-10 22:58 ` Patrick Mansfield
@ 2005-05-11 11:59   ` Douglas Gilbert
  2005-05-11 16:35     ` Luben Tuikov
  2005-05-11 16:40   ` Luben Tuikov
  2005-05-16 17:42   ` [PATCH] saved and restore result for timed out commands Patrick Mansfield
  2 siblings, 1 reply; 19+ messages in thread
From: Douglas Gilbert @ 2005-05-11 11:59 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Alan Stern, James Bottomley, SCSI development list

Patrick Mansfield wrote:
> Hi Alan -
> 
> On Tue, May 10, 2005 at 02:51:44PM -0400, Alan Stern wrote:
> 
>>When a command injected through the sg driver encounters any sort of
>>error, the usual retry mechanism and error handler are brought into play.  
>>Since sg sets the number of retries to 0 by default, the retry mechanism
>>shouldn't cause any difficulty.  But the error handler does.  IMO it
>>should never get involved with requests coming through sg -- sg should
>>provide access that is as transparent as possible so that userspace
>>programs will have a clean shot at managing their device.
> 
> 
> The error handler is mainly a timeout handler, so it has to run to cancel
> the timed out command, we can't complete the timed out command back to the
> upper level until we know the HBA is no longer using it.
> 
> 
>>Consider a case that just came up.  A USB CD drive causes a phase error
>>when it receives a certain READ BUFFER command (buggy firmware on the
>>drive, never mind that now).  You would expect the user program to receive
>>from sg a host_status value indicating DID_ERROR or something of the sort.
>>
>>Instead the error handler takes charge and sends out one or two TEST UNIT 
>>READY commands.  The status information finally received by the user 
>>program is the status from the TEST UNIT READY, not from the failed READ 
>>BUFFER!  How's a program supposed to cope with that sort of obfuscation?
> 
> 
> :-(

Writing and maintaining a clean SCSI command pass-through in a
block-centric environment has not been easy. It often feels
like a one step forward and two steps back process :-)

In my recent "sense descriptor format" changes I did
unclutter some SG_IO error return paths, mainly for usages
via block device nodes (e.g. /dev/sda).

As ever, handling errors sanely is not easy. It also runs the
risk of tripping up some hardware.

>>Something in the SCSI stack (scsi_io_completion ?) should check for 
>>requests coming in via sg and should know to complete the requests 
>>immediately.  No requeuing and no error handling.

The sg driver does not currently call scsi_io_completion().
In the mid level you can see code like:
   if (blk_pc_request(scsi_cmnd->request))
      pass-through
   else
      block-injected [almost always a READ or WRITE]

I think the "_pc_" stands for packet command. Maybe we need more
instances of this.

>>Does this sound like a feasible thing to implement?
> 
> 
> No per above. I think there are still some cases where sg or SG_IO
> commands are not immediately returned, I think for some certain ASC codes,
> some of those probably should be retried, as should cases like QUEUE FULL.
> 
> As you say, what you really want is the correct result going back to the
> user, not the result of the TUR.
> 
> So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu.
> The request sense one already saves and restores it.
> 
> And always set result |= (DRIVER_TIMEOUT << 24) if we are not requeueing
> the IO.
> 
> Like (only compile tested) this, against scsi-misc git tree of a few days
> ago. Hopefully, sg or SG_IO programs can handle DRIVER_TIMEOUT :-/ but it
> is certainly better than returning as if no error occurred.

sg3_utils error processing first decodes SCSI status and sense data.
Then if any of the other bits in "result" are set, then it comments
on them.

<snip>
Patch looks fine. Perhaps Alan could try some utility from sg3_utils
in the problem environment once Patrick's patch has been applied.
The sg_dd utility now has a "verbose=<n>" option where <n>=1 should
decode any SCSI status (including DID and DRIVER) errors received.

Doug Gilbert

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 11:59   ` Douglas Gilbert
@ 2005-05-11 16:35     ` Luben Tuikov
  2005-05-11 17:45       ` Patrick Mansfield
  2005-05-11 19:43       ` Alan Stern
  0 siblings, 2 replies; 19+ messages in thread
From: Luben Tuikov @ 2005-05-11 16:35 UTC (permalink / raw)
  To: dougg; +Cc: Patrick Mansfield, Alan Stern, James Bottomley,
	SCSI development list

>>The error handler is mainly a timeout handler, so it has to run to cancel
>>the timed out command, we can't complete the timed out command back to the
>>upper level until we know the HBA is no longer using it.
>>
>>
>>
>>>Consider a case that just came up.  A USB CD drive causes a phase error
>>>when it receives a certain READ BUFFER command (buggy firmware on the
>>>drive, never mind that now).  You would expect the user program to receive
>>
>>>from sg a host_status value indicating DID_ERROR or something of the sort.
>>
>>>Instead the error handler takes charge and sends out one or two TEST UNIT 
>>>READY commands.  The status information finally received by the user 
>>>program is the status from the TEST UNIT READY, not from the failed READ 
>>>BUFFER!  How's a program supposed to cope with that sort of obfuscation?
>>
>>
>>:-(

I think Alan even wants to see the QUEUEFULL status not being
retried.

That is, other than reclaiming the command from the LLDD, the
eh code should not retry or do anything with the command, just
pass it back to the user process who sent it via sg.

Maybe the application client has other tricks up its sleeve
on QUEUEFULL coming from the device.

I think scsi generic as a passthrough, where minimal logic
from SCSI Core is applied and most is left to the application
client.

	Luben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-10 22:58 ` Patrick Mansfield
  2005-05-11 11:59   ` Douglas Gilbert
@ 2005-05-11 16:40   ` Luben Tuikov
  2005-05-11 17:14     ` Patrick Mansfield
  2005-05-16 17:42   ` [PATCH] saved and restore result for timed out commands Patrick Mansfield
  2 siblings, 1 reply; 19+ messages in thread
From: Luben Tuikov @ 2005-05-11 16:40 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: Alan Stern, James Bottomley, Doug Gilbert, SCSI development list

On 05/10/05 18:58, Patrick Mansfield wrote:
> As you say, what you really want is the correct result going back to the
> user, not the result of the TUR.
> 
> So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu.
> The request sense one already saves and restores it.

I think he even wants the stronger condition -- after the command is reclaimed
from the LLDD, just send back the result to the application client.  And let
the application client decide whether they want to send TUR or something else.

	Luben
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 16:40   ` Luben Tuikov
@ 2005-05-11 17:14     ` Patrick Mansfield
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick Mansfield @ 2005-05-11 17:14 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: Alan Stern, James Bottomley, Doug Gilbert, SCSI development list

On Wed, May 11, 2005 at 12:40:05PM -0400, Luben Tuikov wrote:
> On 05/10/05 18:58, Patrick Mansfield wrote:
> > As you say, what you really want is the correct result going back to the
> > user, not the result of the TUR.
> > 
> > So save and restore the result in scsi_eh_tur, and also in scsi_eh_try_stu.
> > The request sense one already saves and restores it.
> 
> I think he even wants the stronger condition -- after the command is reclaimed
> from the LLDD, just send back the result to the application client.  And let
> the application client decide whether they want to send TUR or something else.

The TUR is part of the error recovery, and so it should be sent
independent of the timed out command.

I know we want to have transport specific error (or timeout) handling and
recovery. But for current usage, we should return the result of the
timeout to the upper level driver (and implicitly to the application).

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 16:35     ` Luben Tuikov
@ 2005-05-11 17:45       ` Patrick Mansfield
  2005-05-11 17:55         ` Luben Tuikov
  2005-05-11 19:43       ` Alan Stern
  1 sibling, 1 reply; 19+ messages in thread
From: Patrick Mansfield @ 2005-05-11 17:45 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: dougg, Alan Stern, James Bottomley, SCSI development list

On Wed, May 11, 2005 at 12:35:39PM -0400, Luben Tuikov wrote:

> I think Alan even wants to see the QUEUEFULL status not being
> retried.
> 
> That is, other than reclaiming the command from the LLDD, the
> eh code should not retry or do anything with the command, just
> pass it back to the user process who sent it via sg.
> 
> Maybe the application client has other tricks up its sleeve
> on QUEUEFULL coming from the device.
> 
> I think scsi generic as a passthrough, where minimal logic
> from SCSI Core is applied and most is left to the application
> client.

IMO, normal sg/SG_IO usage wants scsi core to requeue the request, they
want the command to be processed by the LUN.

And user space does not have the information available in scsi core
(status of other outstanding IO to the same LUN), and can't retry in a
timely manner, possibly avoid starvation issues.

Maybe fast fail should be used to modify QUEUE FULL re-queue behaviours.

It might be a bit odd in some cases and maybe hardware specific (for multi
path io, does QUEUE FULL mean a storage port/controller is full, or the
LUN is full? if port, then you want to send back to the upper level, but
if LUN, you might want to let scsi core requeue it). 

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 17:45       ` Patrick Mansfield
@ 2005-05-11 17:55         ` Luben Tuikov
  2005-05-11 18:02           ` Patrick Mansfield
  0 siblings, 1 reply; 19+ messages in thread
From: Luben Tuikov @ 2005-05-11 17:55 UTC (permalink / raw)
  To: Patrick Mansfield
  Cc: dougg, Alan Stern, James Bottomley, SCSI development list

On 05/11/05 13:45, Patrick Mansfield wrote:
> IMO, normal sg/SG_IO usage wants scsi core to requeue the request, they
> want the command to be processed by the LUN.
> 
> And user space does not have the information available in scsi core
> (status of other outstanding IO to the same LUN), and can't retry in a
> timely manner, possibly avoid starvation issues.
> 
> Maybe fast fail should be used to modify QUEUE FULL re-queue behaviours.

Yes, if this is controllable through sg, that would avoid
the problems Alan mentioned.
 
> It might be a bit odd in some cases and maybe hardware specific (for multi
> path io, does QUEUE FULL mean a storage port/controller is full, or the
> LUN is full? if port, then you want to send back to the upper level, but
> if LUN, you might want to let scsi core requeue it). 

If it is coming from SDS (TASK SET FULL) then it is the LUN.  If it
is coming from the LLDD, then it is the other (host queue full).

	Luben


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 17:55         ` Luben Tuikov
@ 2005-05-11 18:02           ` Patrick Mansfield
  0 siblings, 0 replies; 19+ messages in thread
From: Patrick Mansfield @ 2005-05-11 18:02 UTC (permalink / raw)
  To: Luben Tuikov; +Cc: dougg, Alan Stern, James Bottomley, SCSI development list

On Wed, May 11, 2005 at 01:55:50PM -0400, Luben Tuikov wrote:
> On 05/11/05 13:45, Patrick Mansfield wrote:

> > It might be a bit odd in some cases and maybe hardware specific (for multi
> > path io, does QUEUE FULL mean a storage port/controller is full, or the
> > LUN is full? if port, then you want to send back to the upper level, but
> > if LUN, you might want to let scsi core requeue it). 
> 
> If it is coming from SDS (TASK SET FULL) then it is the LUN.  If it
> is coming from the LLDD, then it is the other (host queue full).

I mean that if it is from the LUN, and there are multiple ports or
controllers on the LUN, you don't know if the QUEUE FULL means the port is
busy, or the LUN (or target) is busy.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: sg driver and the error handler
  2005-05-11 16:35     ` Luben Tuikov
  2005-05-11 17:45       ` Patrick Mansfield
@ 2005-05-11 19:43       ` Alan Stern
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Stern @ 2005-05-11 19:43 UTC (permalink / raw)
  To: Luben Tuikov
  Cc: dougg, Patrick Mansfield, James Bottomley, SCSI development list

On Wed, 11 May 2005, Luben Tuikov wrote:

> I think Alan even wants to see the QUEUEFULL status not being
> retried.
> 
> That is, other than reclaiming the command from the LLDD, the
> eh code should not retry or do anything with the command, just
> pass it back to the user process who sent it via sg.

Yes.

> Maybe the application client has other tricks up its sleeve
> on QUEUEFULL coming from the device.
> 
> I think scsi generic as a passthrough, where minimal logic
> from SCSI Core is applied and most is left to the application
> client.

That's how I think of it too.

> I think he even wants the stronger condition -- after the command is reclaimed
> from the LLDD, just send back the result to the application client.  And let
> the application client decide whether they want to send TUR or something else.

Absolutely correct.

Now, I have no idea how applications will handle exotic errors like 
QUEUEFULL.  And it might make sense to have the core retry errors like 
that when they come from the LLDD and not from the device.  But when an 
error does come from the device, it certainly should be passed directly 
back to the sg client.  The error handler shouldn't do anything more than 
abort and clean up after a timed-out command.

On Wed, 11 May 2005, Patrick Mansfield wrote:

> The TUR is part of the error recovery, and so it should be sent
> independent of the timed out command.

Or shouldn't be sent, as the case may be.  If the error handler isn't
going to do error recovery and is going to leave it up to the application,
then there's no reason to send the TUR.

> I know we want to have transport specific error (or timeout) handling and
> recovery. But for current usage, we should return the result of the
> timeout to the upper level driver (and implicitly to the application).

Right.  Ultimately I don't know how transport-specific issues need to be
handled.  It would be best if they can be limited to minimal cleanup so
that the application can do device-related error recovery.

Alan Stern

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] saved and restore result for timed out commands
  2005-05-10 22:58 ` Patrick Mansfield
  2005-05-11 11:59   ` Douglas Gilbert
  2005-05-11 16:40   ` Luben Tuikov
@ 2005-05-16 17:42   ` Patrick Mansfield
  2005-05-16 19:42     ` Alan Stern
  2005-06-01 18:45     ` Alan Stern
  2 siblings, 2 replies; 19+ messages in thread
From: Patrick Mansfield @ 2005-05-16 17:42 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list

Save and restore the scmd->result, so that timed out commands do not
return the result of the TEST UNIT READY or the start/stop commands. Code
is already in place to save and restore the result for the request sense
case.

The previous version of this patch erroneously removed the "if" check,
instead add a comment as to why the "if" is needed.

Signed-off-by: Patrick Mansfield <patmans@us.ibm.com>

diff -uprN -X /home/patman/dontdiff scsi-misc-2.6.git/drivers/scsi/scsi_error.c to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c
--- scsi-misc-2.6.git/drivers/scsi/scsi_error.c	2005-05-09 14:19:41.000000000 -0700
+++ to-res-scsi-misc-2.6.git/drivers/scsi/scsi_error.c	2005-05-16 10:01:11.000000000 -0700
@@ -770,6 +770,7 @@ static int scsi_eh_tur(struct scsi_cmnd 
 {
 	static unsigned char tur_command[6] = {TEST_UNIT_READY, 0, 0, 0, 0, 0};
 	int retry_cnt = 1, rtn;
+	int saved_result;
 
 retry_tur:
 	memcpy(scmd->cmnd, tur_command, sizeof(tur_command));
@@ -780,6 +781,7 @@ retry_tur:
 	 */
 	memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer));
 
+	saved_result = scmd->result;
 	scmd->request_buffer = NULL;
 	scmd->request_bufflen = 0;
 	scmd->use_sg = 0;
@@ -794,6 +796,7 @@ retry_tur:
 	 * the original request, so let's restore the original data. (db)
 	 */
 	scsi_setup_cmd_retry(scmd);
+	scmd->result = saved_result;
 
 	/*
 	 * hey, we are done.  let's look to see what happened.
@@ -896,6 +899,7 @@ static int scsi_eh_try_stu(struct scsi_c
 {
 	static unsigned char stu_command[6] = {START_STOP, 0, 0, 0, 1, 0};
 	int rtn;
+	int saved_result;
 
 	if (!scmd->device->allow_restart)
 		return 1;
@@ -908,6 +912,7 @@ static int scsi_eh_try_stu(struct scsi_c
 	 */
 	memset(scmd->sense_buffer, 0, sizeof(scmd->sense_buffer));
 
+	saved_result = scmd->result;
 	scmd->request_buffer = NULL;
 	scmd->request_bufflen = 0;
 	scmd->use_sg = 0;
@@ -922,6 +927,7 @@ static int scsi_eh_try_stu(struct scsi_c
 	 * the original request, so let's restore the original data. (db)
 	 */
 	scsi_setup_cmd_retry(scmd);
+	scmd->result = saved_result;
 
 	/*
 	 * hey, we are done.  let's look to see what happened.
@@ -1561,6 +1567,11 @@ static void scsi_eh_flush_done_q(struct 
 							  scmd));
 				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
 		} else {
+			/*
+			 * If just we got sense for the device (called
+			 * scsi_eh_get_sense), scmd->result is already
+			 * set, do not set DRIVER_TIMEOUT.
+			 */
 			if (!scmd->result)
 				scmd->result |= (DRIVER_TIMEOUT << 24);
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: flush finish"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-05-16 17:42   ` [PATCH] saved and restore result for timed out commands Patrick Mansfield
@ 2005-05-16 19:42     ` Alan Stern
  2005-06-01 18:45     ` Alan Stern
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Stern @ 2005-05-16 19:42 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, Doug Gilbert, SCSI development list

On Mon, 16 May 2005, Patrick Mansfield wrote:

> Save and restore the scmd->result, so that timed out commands do not
> return the result of the TEST UNIT READY or the start/stop commands. Code
> is already in place to save and restore the result for the request sense
> case.
> 
> The previous version of this patch erroneously removed the "if" check,
> instead add a comment as to why the "if" is needed.

I can confirm that this patch does help; the correct status is now
returned to the sg user.  However there are still a few things that should
be fixed up:

First, this doesn't erase the sense data after error recovery.  I tried it 
with two different scenarios, one where the command returned a phase error 
and one where the command timed out.  In both cases the sg user got back a 
buffer with valid sense data, from the TUR commands issued by the error 
handler.  This could be highly misleading.

Second, as Patrick has noted, this doesn't prevent the error handler from 
issuing those TURs in the first place.  If some sort of Unit Attention 
came up, the sg user wouldn't be able to handle it correctly because the 
error handler would have swallowed it.  Of course, once this is fixed the 
first problem will no longer be an issue (no TUR, hence no unsolicited 
sense data).

Alan Stern

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-05-16 17:42   ` [PATCH] saved and restore result for timed out commands Patrick Mansfield
  2005-05-16 19:42     ` Alan Stern
@ 2005-06-01 18:45     ` Alan Stern
  2005-06-01 21:00       ` James Bottomley
  1 sibling, 1 reply; 19+ messages in thread
From: Alan Stern @ 2005-06-01 18:45 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: James Bottomley, Doug Gilbert, SCSI development list

On Mon, 16 May 2005, Patrick Mansfield wrote:

> Save and restore the scmd->result, so that timed out commands do not
> return the result of the TEST UNIT READY or the start/stop commands. Code
> is already in place to save and restore the result for the request sense
> case.
> 
> The previous version of this patch erroneously removed the "if" check,
> instead add a comment as to why the "if" is needed.

Has there been any more progress on this issue?

I spoke with Jörg Schilling (author of the cdrecord package, a widely-used
sg client) and he said that failed operations should be retried
automatically if the command had not gotten as far as being sent on the
bus -- e.g., if the host adapter driver's queue was full.  But if the
command did get sent to the device then any subsequent failure should be
reflected back to the client with no retrys.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-06-01 18:45     ` Alan Stern
@ 2005-06-01 21:00       ` James Bottomley
  2005-06-01 21:26         ` Alan Stern
  0 siblings, 1 reply; 19+ messages in thread
From: James Bottomley @ 2005-06-01 21:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list

On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote:
> Has there been any more progress on this issue?

Well, the patch is in scsi-misc, if that's what you're asking.

> I spoke with Jörg Schilling (author of the cdrecord package, a widely-used
> sg client) and he said that failed operations should be retried
> automatically if the command had not gotten as far as being sent on the
> bus -- e.g., if the host adapter driver's queue was full.  But if the
> command did get sent to the device then any subsequent failure should be
> reflected back to the client with no retrys.

That's pretty much what we do now ... with the exception of QUEUE_FULL
and BUSY handling.  What's the actual problem you were discussing?

James


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-06-01 21:00       ` James Bottomley
@ 2005-06-01 21:26         ` Alan Stern
  2005-06-01 21:57           ` Patrick Mansfield
  2005-06-03 15:21           ` James Bottomley
  0 siblings, 2 replies; 19+ messages in thread
From: Alan Stern @ 2005-06-01 21:26 UTC (permalink / raw)
  To: James Bottomley; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list

On Wed, 1 Jun 2005, James Bottomley wrote:

> On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote:
> > Has there been any more progress on this issue?
> 
> Well, the patch is in scsi-misc, if that's what you're asking.

That's good.  Is it too much to ask for an additional patch to zero out
the sense data buffer after the error handler has finished with a command?  
After a transport error or a timeout there shouldn't be any sense data
(races aside), but the error handler often leaves apparently valid data in
the buffer, unrelated to the original command.

By the way, is scsi-misc available anywhere as a patch (or set of
patches) for those of us not on the bleeding edge, or does it exist only 
as a git repository?

> > I spoke with Jörg Schilling (author of the cdrecord package, a widely-used
> > sg client) and he said that failed operations should be retried
> > automatically if the command had not gotten as far as being sent on the
> > bus -- e.g., if the host adapter driver's queue was full.  But if the
> > command did get sent to the device then any subsequent failure should be
> > reflected back to the client with no retrys.
> 
> That's pretty much what we do now ... with the exception of QUEUE_FULL
> and BUSY handling.  What's the actual problem you were discussing?

This isn't so much in response to a particular problem as it is an attempt 
to make the Linux sg driver more useful.  (The discussion got started 
because of an actual problem, but that problem was at the USB level, not 
the SCSI level.)

You may not have run across Schilling very much.  He's done a lot of 
work on SCSI application programming on a variety of OS's.  His opinion of 
the Linux SCSI stack isn't very high, although admittedly many of the 
problems he complains about only existed in much earlier versions of 
Linux.  (Try reading through the some of the documentation or man pages 
for cdrecord and you'll see what I mean.)

Anyway, he still talks about how Linux's sg interface is so much worse
than Solaris's.  Considering this issue of proper return codes and client
control of the device, you can agree that he does have a point.  He must
think of himself purely as an applications programmer, not a kernel
programmer, because he hasn't done anything to _fix_ these problems -- he
prefers to complain about them.  So I decided to see if something could be 
done to fix this particular issue, which does appear to be genuine.

Overall, the motivation is simply to improve sg by making it more 
transparent, giving clients more control over the device, and returning 
valid and useful status.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-06-01 21:26         ` Alan Stern
@ 2005-06-01 21:57           ` Patrick Mansfield
  2005-06-03 15:21           ` James Bottomley
  1 sibling, 0 replies; 19+ messages in thread
From: Patrick Mansfield @ 2005-06-01 21:57 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Bottomley, Doug Gilbert, SCSI development list

On Wed, Jun 01, 2005 at 05:26:34PM -0400, Alan Stern wrote:
> On Wed, 1 Jun 2005, James Bottomley wrote:
> 
> > On Wed, 2005-06-01 at 14:45 -0400, Alan Stern wrote:
> > > Has there been any more progress on this issue?
> > 
> > Well, the patch is in scsi-misc, if that's what you're asking.
> 
> That's good.  Is it too much to ask for an additional patch to zero out
> the sense data buffer after the error handler has finished with a command?  
> After a transport error or a timeout there shouldn't be any sense data
> (races aside), but the error handler often leaves apparently valid data in
> the buffer, unrelated to the original command.

I was hoping someone else might comment on that, I am not certain if sense
data is valid only when CHECK_CONDITION is set. 

But, we can zero/clear it after the TUR without hurting anything.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-06-01 21:26         ` Alan Stern
  2005-06-01 21:57           ` Patrick Mansfield
@ 2005-06-03 15:21           ` James Bottomley
  2005-06-03 15:35             ` Alan Stern
  1 sibling, 1 reply; 19+ messages in thread
From: James Bottomley @ 2005-06-03 15:21 UTC (permalink / raw)
  To: Alan Stern; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list

On Wed, 2005-06-01 at 17:26 -0400, Alan Stern wrote:
> By the way, is scsi-misc available anywhere as a patch (or set of
> patches) for those of us not on the bleeding edge, or does it exist only 
> as a git repository?

I suppose I should have a cron job sending this information to linux-
scsi weekly ...

The git repositories are:

rsync://www.parisc-linux.org/~jejb/git/scsi-misc-2.6.git
rsync://www.parisc-linux.org/~jejb/git/scsi-rc-fixes-2.6.git

There's a browser for the git tree at

http://www.parisc-linux.org/cgi-bin/gitweb.pl

And the patches and change logs are here

http://www.parisc-linux.org/~jejb/scsi_diffs

I'm also in the process of setting up SCSI git trees on www.kernel.org

James



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
  2005-06-03 15:21           ` James Bottomley
@ 2005-06-03 15:35             ` Alan Stern
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Stern @ 2005-06-03 15:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: Patrick Mansfield, Doug Gilbert, SCSI development list

On Fri, 3 Jun 2005, James Bottomley wrote:

> On Wed, 2005-06-01 at 17:26 -0400, Alan Stern wrote:
> > By the way, is scsi-misc available anywhere as a patch (or set of
> > patches) for those of us not on the bleeding edge, or does it exist only 
> > as a git repository?
> 
> I suppose I should have a cron job sending this information to linux-
> scsi weekly ...
> 
> The git repositories are:
> 
> rsync://www.parisc-linux.org/~jejb/git/scsi-misc-2.6.git
> rsync://www.parisc-linux.org/~jejb/git/scsi-rc-fixes-2.6.git
> 
> There's a browser for the git tree at
> 
> http://www.parisc-linux.org/cgi-bin/gitweb.pl
> 
> And the patches and change logs are here
> 
> http://www.parisc-linux.org/~jejb/scsi_diffs
> 
> I'm also in the process of setting up SCSI git trees on www.kernel.org

Sorry -- it's my fault for not subscribing to linux-scsi (I don't post to 
it very often).  Thanks for the pointers.

Alan Stern


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] saved and restore result for timed out commands
       [not found] <20050603160921.GA32212@us.ibm.com>
@ 2005-06-03 16:37 ` Alan Stern
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Stern @ 2005-06-03 16:37 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: SCSI development list

[Now on-list]

On Fri, 3 Jun 2005, Patrick Mansfield wrote:

> On Fri, Jun 03, 2005 at 11:04:28AM -0400, Alan Stern wrote:
> 
> > As for lost Unit Attentions, what about events like media changes?  I can 
> > easily imagine those sorts of notifications getting swallowed by the error 
> > handler.
> 
> Yeh ... that sounds bad.
> 
> And the scsi_eh_tur() losing the media change looks broken! See
> drivers/scsi/scsi_lib.c, search for "->changed". I am thinking there are
> other similar cases, probably search and compare handling of other UNIT
> ATTENTIONs and checking of sense result.
> 
> scsi_eh_tur() should do the same things as scsi_test_unit_ready(). We need
> some helper function(s).
> 
> And we have to handle sd and sg both getting the media change, again
> might be true for other UNIT ATTENTIONs. 

Actually it looks like media change is the worst case.  The use_10_for_rw 
stuff in scsi_io_completion() and other various retry pathways shouldn't 
be invoked for sg.

> We might be able to pass back the data to sg by conditionally (just UNIT
> ATTENTION, or all cases???) not clearing the sense data (as per current
> code) and marking the sense data as present, prehaps just "scmd->result &=
> CHECK_CONDITION".
> 
> This seems a bit tricky. I have not thought this through, and don't have
> any patches :-)

I don't have any patches either, and it's not clear that simply passing
the sense data through would work.  Other status bits would lead the
client to believe that the original command aborted without returning
Check Condition status (although the error handler's TUR did get CC), so
the client might not bother to look at the sense data.

> So we should verify all command completions in scsi_error.c are OK, and
> add a helper function(s) called for all scsi IO completions to check UNIT
> ATTENTION etc. And then maybe keep sense data from the TUR for use with
> the timed out command.

Maybe.  It might work if you can fool the client into thinking the
timed-out command failed with the appropriate status.  Making up fake
return values like this doesn't seem to be a good idea... but perhaps for 
Unit Attention it's acceptable.

Come to think of it, the error handler doesn't stop when it sees Unit
Attention, does it?  It sends another TUR, looking for valid status.  (As
a side issue, this behavior needs to be fixed as well.  TUR will always
fail when no medium is present; the error handler should accept such a
failure and not try to do any further error recovery.)

Alan Stern

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2005-06-03 16:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-05-10 18:51 sg driver and the error handler Alan Stern
2005-05-10 22:58 ` Patrick Mansfield
2005-05-11 11:59   ` Douglas Gilbert
2005-05-11 16:35     ` Luben Tuikov
2005-05-11 17:45       ` Patrick Mansfield
2005-05-11 17:55         ` Luben Tuikov
2005-05-11 18:02           ` Patrick Mansfield
2005-05-11 19:43       ` Alan Stern
2005-05-11 16:40   ` Luben Tuikov
2005-05-11 17:14     ` Patrick Mansfield
2005-05-16 17:42   ` [PATCH] saved and restore result for timed out commands Patrick Mansfield
2005-05-16 19:42     ` Alan Stern
2005-06-01 18:45     ` Alan Stern
2005-06-01 21:00       ` James Bottomley
2005-06-01 21:26         ` Alan Stern
2005-06-01 21:57           ` Patrick Mansfield
2005-06-03 15:21           ` James Bottomley
2005-06-03 15:35             ` Alan Stern
     [not found] <20050603160921.GA32212@us.ibm.com>
2005-06-03 16:37 ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox