linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Device flags: use_10_for_rw and use_10_for_ms
@ 2005-11-28 21:43 Alan Stern
  2005-11-29 20:03 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Alan Stern @ 2005-11-28 21:43 UTC (permalink / raw)
  To: Patrick Mansfield, James Bottomley; +Cc: SCSI development list

Patrick and James:

There are a couple of problems with the implementation of the 
use_10_for_rw and use_10_for_ms flags.

The easy problem is that use_10_for_rw is implemented twice: once in 
sd.c:sd_rw_intr and once in scsi_lib.c:scsi_io_completion.  The obvious 
fix is to remove of them.  Since scsi_io_completion is more general, it 
makes sense to leave the code there and remove it from sd.c.

The hard problem is that sometimes devices return an ILLEGAL_REQUEST sense
key when they shouldn't.  I posted an example a week or two ago; what
happened was the use_10_for_rw flag got turned off and from then on only
6-byte commands were used (and of course the device failed to recognize
them).

Clearly we need a mechanism for going the other way: when a 6-byte command 
gets ILLEGAL_REQUEST sense, turn the use_10_for_xxx flag back on.  The 
difficulty is that this will cause an infinite retry loop if the device 
doesn't like either form of the command.

Is there a standard way to limit the number of retries for these cases in 
scsi_io_completion?  The code pathway involves getting rid of the 
scsi_cmnd and keeping only the struct request, so I don't know where an 
appropriate place would be to store the retry counter.

Any ideas?

Alan Stern


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-28 21:43 Device flags: use_10_for_rw and use_10_for_ms Alan Stern
@ 2005-11-29 20:03 ` Jens Axboe
  2005-11-29 21:23   ` Alan Stern
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2005-11-29 20:03 UTC (permalink / raw)
  To: Alan Stern; +Cc: Patrick Mansfield, James Bottomley, SCSI development list

On Mon, Nov 28 2005, Alan Stern wrote:
> Patrick and James:
> 
> There are a couple of problems with the implementation of the 
> use_10_for_rw and use_10_for_ms flags.
> 
> The easy problem is that use_10_for_rw is implemented twice: once in 
> sd.c:sd_rw_intr and once in scsi_lib.c:scsi_io_completion.  The obvious 
> fix is to remove of them.  Since scsi_io_completion is more general, it 
> makes sense to leave the code there and remove it from sd.c.
> 
> The hard problem is that sometimes devices return an ILLEGAL_REQUEST sense
> key when they shouldn't.  I posted an example a week or two ago; what
> happened was the use_10_for_rw flag got turned off and from then on only
> 6-byte commands were used (and of course the device failed to recognize
> them).
> 
> Clearly we need a mechanism for going the other way: when a 6-byte command 
> gets ILLEGAL_REQUEST sense, turn the use_10_for_xxx flag back on.  The 
> difficulty is that this will cause an infinite retry loop if the device 
> doesn't like either form of the command.
> 
> Is there a standard way to limit the number of retries for these cases in 
> scsi_io_completion?  The code pathway involves getting rid of the 
> scsi_cmnd and keeping only the struct request, so I don't know where an 
> appropriate place would be to store the retry counter.
> 
> Any ideas?

How about just improving the check for correctly failed 10-byte command,
so we only clear the flag for the right reason? I think your problem is
likely due to the check being too relaxed right now.

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ce9d73a..1f27827 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -884,7 +884,8 @@ void scsi_io_completion(struct scsi_cmnd
 			* system where READ CAPACITY failed, we may have read
 			* past the end of the disk.
 		 	*/
-			if (cmd->device->use_10_for_rw &&
+			if ((cmd->device->use_10_for_rw &&
+			    sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
 			    (cmd->cmnd[0] == READ_10 ||
 			     cmd->cmnd[0] == WRITE_10)) {
 				cmd->device->use_10_for_rw = 0;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-29 20:03 ` Jens Axboe
@ 2005-11-29 21:23   ` Alan Stern
  2005-11-29 22:48     ` Patrick Mansfield
  2005-11-30  8:08     ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Stern @ 2005-11-29 21:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Patrick Mansfield, James Bottomley, SCSI development list

On Tue, 29 Nov 2005, Jens Axboe wrote:

> > The hard problem is that sometimes devices return an ILLEGAL_REQUEST sense
> > key when they shouldn't.  I posted an example a week or two ago; what
> > happened was the use_10_for_rw flag got turned off and from then on only
> > 6-byte commands were used (and of course the device failed to recognize
> > them).
> > 
> > Clearly we need a mechanism for going the other way: when a 6-byte command 
> > gets ILLEGAL_REQUEST sense, turn the use_10_for_xxx flag back on.  The 
> > difficulty is that this will cause an infinite retry loop if the device 
> > doesn't like either form of the command.
> > 
> > Is there a standard way to limit the number of retries for these cases in 
> > scsi_io_completion?  The code pathway involves getting rid of the 
> > scsi_cmnd and keeping only the struct request, so I don't know where an 
> > appropriate place would be to store the retry counter.
> > 
> > Any ideas?
> 
> How about just improving the check for correctly failed 10-byte command,
> so we only clear the flag for the right reason? I think your problem is
> likely due to the check being too relaxed right now.
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index ce9d73a..1f27827 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -884,7 +884,8 @@ void scsi_io_completion(struct scsi_cmnd
>  			* system where READ CAPACITY failed, we may have read
>  			* past the end of the disk.
>  		 	*/
> -			if (cmd->device->use_10_for_rw &&
> +			if ((cmd->device->use_10_for_rw &&
> +			    sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
>  			    (cmd->cmnd[0] == READ_10 ||
>  			     cmd->cmnd[0] == WRITE_10)) {
>  				cmd->device->use_10_for_rw = 0;

This will not help.  There's a log available at

http://www.cs.pdx.edu/~cklin/usb-error/0-original-kern.log

showing an example.  Here's the relevant part:

Nov 19 00:31:13 rho kernel: usb-storage: Command READ_10 (10 bytes)
Nov 19 00:31:13 rho kernel: usb-storage:  28 00 02 54 29 78 00 00 08 00
Nov 19 00:31:13 rho kernel: usb-storage: Bulk Command S 0x43425355 T 0x8 L 4096 F 128 Trg 0 LUN 0 CL 12
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 31/31
Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
Nov 19 00:31:13 rho kernel: usb-storage: Bulk command transfer result=0
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer 4096 bytes, 1 entries
Nov 19 00:31:13 rho kernel: usb-storage: Status code -32; transferred 4032/4096
Nov 19 00:31:13 rho kernel: usb-storage: clearing endpoint halt for pipe 0xc0020380
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_control_msg: rq=01 rqtype=02 value=0000 index=84 len=0
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_clear_halt: result = 0
Nov 19 00:31:13 rho kernel: usb-storage: Bulk data transfer result 0x2
Nov 19 00:31:13 rho kernel: usb-storage: Attempting to get CSW...
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 13/13
Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
Nov 19 00:31:13 rho kernel: usb-storage: Bulk status result = 0
Nov 19 00:31:13 rho kernel: usb-storage: Bulk Status S 0x53425355 T 0x8 R 64 Stat 0x1
Nov 19 00:31:13 rho kernel: usb-storage: -- transport indicates command failure
Nov 19 00:31:13 rho kernel: usb-storage: -- unexpectedly short transfer
Nov 19 00:31:13 rho kernel: usb-storage: Issuing auto-REQUEST_SENSE
Nov 19 00:31:13 rho kernel: usb-storage: Bulk Command S 0x43425355 T 0x9 L 18 F 128 Trg 0 LUN 0 CL 12
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 31/31
Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
Nov 19 00:31:13 rho kernel: usb-storage: Bulk command transfer result=0
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 18 bytes
Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 18/18
Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
Nov 19 00:31:13 rho kernel: usb-storage: Bulk data transfer result 0x0
Nov 19 00:31:13 rho kernel: usb-storage: Attempting to get CSW...
Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 13/13
Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
Nov 19 00:31:13 rho kernel: usb-storage: Bulk status result = 0
Nov 19 00:31:13 rho kernel: usb-storage: Bulk Status S 0x53425355 T 0x9 R 0 Stat 0x0
Nov 19 00:31:13 rho kernel: usb-storage: -- Result from auto-sense is 0
Nov 19 00:31:13 rho kernel: usb-storage: -- code: 0x70, key: 0x5, ASC: 0x20, ASCQ: 0x0

In this case the drive transmitted all but the last 64 bytes of a 
4096-byte READ(10), and the sense data was exactly what you want to test 
for.

Alan Stern


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-29 21:23   ` Alan Stern
@ 2005-11-29 22:48     ` Patrick Mansfield
  2005-11-30 15:46       ` Alan Stern
  2005-11-30  8:08     ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Patrick Mansfield @ 2005-11-29 22:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Jens Axboe, James Bottomley, SCSI development list

On Tue, Nov 29, 2005 at 04:23:16PM -0500, Alan Stern wrote:

Alan -

Looking at the full log (with no patches) and your earlier patch, I'm
confused by two things.

First, the READ_10 command being sent after we get the ILLEGAL REQUEST.

Do you know how that READ_10 is sent? Is the upper layer add_disk() check
partition code retrying? Then I'd expect a READ_6. If it were already
prepped, I guess we can get another READ_10, but I thought that the
partition read/check code only issued one read at a time.

The scsi_check_sense() should not be allowing the retry, we have a "return
SUCCESS" there. If we did want to retry via scmd->retries, we might be
able to add code into scis_check_sense().

And:

How can the current scsi_lib.c scsi_io_completion() *ever* requeue after
an sd.c read/write has turned off use_10_for_rw in sd_rw_intr()?

That is, sd.c sd_rw_intr() gets ILLEGAL REQUEST, and clears use_10_for_rw,
this snippet:

		case ILLEGAL_REQUEST:
			if (SCpnt->device->use_10_for_rw &&
			    (SCpnt->cmnd[0] == READ_10 ||
			     SCpnt->cmnd[0] == WRITE_10))
				SCpnt->device->use_10_for_rw = 0;

Then it calls scsi_io_completion(), where we only requeue if use_10_for_rw
is still set, so we never requeue via this code path. That is why without
your patch we get an error output. Correct?!

So with the patch you posted earlier, all it does (in scsi_io_completion())
is set use_10_for_rw back to 1, and then requeue the IO.

If you remove (or move) that sd.c code, it seems you'll hit this, and
figure that you do not want to switch from 6 to 10 byte commands (and
perhaps back) at all.

Then we need a method so we don't switch: another bit or state set after
a successful read or write command (and I guess one for succesful mode
sense command). 

We need only check the do-not-switch flag if we get an ILLEGAL REQUEST. We
already have to check use_10_for_rw before every read or write request.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-29 21:23   ` Alan Stern
  2005-11-29 22:48     ` Patrick Mansfield
@ 2005-11-30  8:08     ` Jens Axboe
  2005-11-30 14:46       ` James Bottomley
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2005-11-30  8:08 UTC (permalink / raw)
  To: Alan Stern; +Cc: Patrick Mansfield, James Bottomley, SCSI development list

On Tue, Nov 29 2005, Alan Stern wrote:
> On Tue, 29 Nov 2005, Jens Axboe wrote:
> 
> > > The hard problem is that sometimes devices return an ILLEGAL_REQUEST sense
> > > key when they shouldn't.  I posted an example a week or two ago; what
> > > happened was the use_10_for_rw flag got turned off and from then on only
> > > 6-byte commands were used (and of course the device failed to recognize
> > > them).
> > > 
> > > Clearly we need a mechanism for going the other way: when a 6-byte command 
> > > gets ILLEGAL_REQUEST sense, turn the use_10_for_xxx flag back on.  The 
> > > difficulty is that this will cause an infinite retry loop if the device 
> > > doesn't like either form of the command.
> > > 
> > > Is there a standard way to limit the number of retries for these cases in 
> > > scsi_io_completion?  The code pathway involves getting rid of the 
> > > scsi_cmnd and keeping only the struct request, so I don't know where an 
> > > appropriate place would be to store the retry counter.
> > > 
> > > Any ideas?
> > 
> > How about just improving the check for correctly failed 10-byte command,
> > so we only clear the flag for the right reason? I think your problem is
> > likely due to the check being too relaxed right now.
> > 
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index ce9d73a..1f27827 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -884,7 +884,8 @@ void scsi_io_completion(struct scsi_cmnd
> >  			* system where READ CAPACITY failed, we may have read
> >  			* past the end of the disk.
> >  		 	*/
> > -			if (cmd->device->use_10_for_rw &&
> > +			if ((cmd->device->use_10_for_rw &&
> > +			    sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
> >  			    (cmd->cmnd[0] == READ_10 ||
> >  			     cmd->cmnd[0] == WRITE_10)) {
> >  				cmd->device->use_10_for_rw = 0;
> 
> This will not help.  There's a log available at
> 
> http://www.cs.pdx.edu/~cklin/usb-error/0-original-kern.log
> 
> showing an example.  Here's the relevant part:
> 
> Nov 19 00:31:13 rho kernel: usb-storage: Command READ_10 (10 bytes)
> Nov 19 00:31:13 rho kernel: usb-storage:  28 00 02 54 29 78 00 00 08 00
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk Command S 0x43425355 T 0x8 L 4096 F 128 Trg 0 LUN 0 CL 12
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
> Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 31/31
> Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk command transfer result=0
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_sglist: xfer 4096 bytes, 1 entries
> Nov 19 00:31:13 rho kernel: usb-storage: Status code -32; transferred 4032/4096
> Nov 19 00:31:13 rho kernel: usb-storage: clearing endpoint halt for pipe 0xc0020380
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_control_msg: rq=01 rqtype=02 value=0000 index=84 len=0
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_clear_halt: result = 0
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk data transfer result 0x2
> Nov 19 00:31:13 rho kernel: usb-storage: Attempting to get CSW...
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
> Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 13/13
> Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk status result = 0
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk Status S 0x53425355 T 0x8 R 64 Stat 0x1
> Nov 19 00:31:13 rho kernel: usb-storage: -- transport indicates command failure
> Nov 19 00:31:13 rho kernel: usb-storage: -- unexpectedly short transfer
> Nov 19 00:31:13 rho kernel: usb-storage: Issuing auto-REQUEST_SENSE
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk Command S 0x43425355 T 0x9 L 18 F 128 Trg 0 LUN 0 CL 12
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 31 bytes
> Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 31/31
> Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk command transfer result=0
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 18 bytes
> Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 18/18
> Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk data transfer result 0x0
> Nov 19 00:31:13 rho kernel: usb-storage: Attempting to get CSW...
> Nov 19 00:31:13 rho kernel: usb-storage: usb_stor_bulk_transfer_buf: xfer 13 bytes
> Nov 19 00:31:13 rho kernel: usb-storage: Status code 0; transferred 13/13
> Nov 19 00:31:13 rho kernel: usb-storage: -- transfer complete
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk status result = 0
> Nov 19 00:31:13 rho kernel: usb-storage: Bulk Status S 0x53425355 T 0x9 R 0 Stat 0x0
> Nov 19 00:31:13 rho kernel: usb-storage: -- Result from auto-sense is 0
> Nov 19 00:31:13 rho kernel: usb-storage: -- code: 0x70, key: 0x5, ASC: 0x20, ASCQ: 0x0
> 
> In this case the drive transmitted all but the last 64 bytes of a 
> 4096-byte READ(10), and the sense data was exactly what you want to test 
> for.

Oh well, so much for expecting than even the most basic SCSI behaviour
is always implemented correctly. A device must only return
0x05/0x20/0x00 for an illegal opcode. It's a little suspicious I'd say,
are you sure it isn't something else affecting this? Even though there
are crappy devices out there, this seems a little too odd to me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-30  8:08     ` Jens Axboe
@ 2005-11-30 14:46       ` James Bottomley
  2005-11-30 14:51         ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: James Bottomley @ 2005-11-30 14:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Alan Stern, Patrick Mansfield, SCSI development list

On Wed, 2005-11-30 at 09:08 +0100, Jens Axboe wrote:
> Oh well, so much for expecting than even the most basic SCSI behaviour
> is always implemented correctly. A device must only return
> 0x05/0x20/0x00 for an illegal opcode. It's a little suspicious I'd say,
> are you sure it isn't something else affecting this? Even though there
> are crappy devices out there, this seems a little too odd to me.

And anyway, just because it doesn't actually work in the failing case
doesn't mean this isn't a good patch ... it's certainly unsafe to switch
back to 6 byte commands without checking for the correct illegal opcode
sense, so we should put it in anyway.

James



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-30 14:46       ` James Bottomley
@ 2005-11-30 14:51         ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2005-11-30 14:51 UTC (permalink / raw)
  To: James Bottomley; +Cc: Alan Stern, Patrick Mansfield, SCSI development list

On Wed, Nov 30 2005, James Bottomley wrote:
> On Wed, 2005-11-30 at 09:08 +0100, Jens Axboe wrote:
> > Oh well, so much for expecting than even the most basic SCSI behaviour
> > is always implemented correctly. A device must only return
> > 0x05/0x20/0x00 for an illegal opcode. It's a little suspicious I'd say,
> > are you sure it isn't something else affecting this? Even though there
> > are crappy devices out there, this seems a little too odd to me.
> 
> And anyway, just because it doesn't actually work in the failing case
> doesn't mean this isn't a good patch ... it's certainly unsafe to switch
> back to 6 byte commands without checking for the correct illegal opcode
> sense, so we should put it in anyway.

Yup, I think so too.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Device flags: use_10_for_rw and use_10_for_ms
  2005-11-29 22:48     ` Patrick Mansfield
@ 2005-11-30 15:46       ` Alan Stern
  0 siblings, 0 replies; 8+ messages in thread
From: Alan Stern @ 2005-11-30 15:46 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Jens Axboe, James Bottomley, SCSI development list

On Tue, 29 Nov 2005, Patrick Mansfield wrote:

> On Tue, Nov 29, 2005 at 04:23:16PM -0500, Alan Stern wrote:
> 
> Alan -
> 
> Looking at the full log (with no patches) and your earlier patch, I'm
> confused by two things.
> 
> First, the READ_10 command being sent after we get the ILLEGAL REQUEST.
> 
> Do you know how that READ_10 is sent? Is the upper layer add_disk() check
> partition code retrying? Then I'd expect a READ_6. If it were already
> prepped, I guess we can get another READ_10, but I thought that the
> partition read/check code only issued one read at a time.

I didn't try to track it down in detail, but I think you're right that the
upper layer partition code is doing the retry.  In the log it appeared as
another READ_10 because the sector number was too large to fit in a READ_6
(sd automatically uses the larger command in that situation).  If you look
farther down in the log you'll see an attempt to read sector 0 using
READ_6.

> The scsi_check_sense() should not be allowing the retry, we have a "return
> SUCCESS" there. If we did want to retry via scmd->retries, we might be
> able to add code into scis_check_sense().

Maybe that would be a better way of doing it.  There are more levels of 
retrying in the SCSI core than I fully understand.  :-(

> And:
> 
> How can the current scsi_lib.c scsi_io_completion() *ever* requeue after
> an sd.c read/write has turned off use_10_for_rw in sd_rw_intr()?
> 
> That is, sd.c sd_rw_intr() gets ILLEGAL REQUEST, and clears use_10_for_rw,
> this snippet:
> 
> 		case ILLEGAL_REQUEST:
> 			if (SCpnt->device->use_10_for_rw &&
> 			    (SCpnt->cmnd[0] == READ_10 ||
> 			     SCpnt->cmnd[0] == WRITE_10))
> 				SCpnt->device->use_10_for_rw = 0;
> 
> Then it calls scsi_io_completion(), where we only requeue if use_10_for_rw
> is still set, so we never requeue via this code path. That is why without
> your patch we get an error output. Correct?!

Partly orrect.  That's why this command gives an error output.  But then
later commands use READ_6, which has no hope of succeeding.

> So with the patch you posted earlier, all it does (in scsi_io_completion())
> is set use_10_for_rw back to 1, and then requeue the IO.

Yes.  That was a preliminary, see-if-this-fixes-your-problem sort of 
patch.  I never intended it to go into the kernel.  It did kinda work; if 
you look at the second log posted on that web site you'll see where the 
core did switch to READ_6 for one command and then switched back to 
READ_10.

> If you remove (or move) that sd.c code, it seems you'll hit this, and
> figure that you do not want to switch from 6 to 10 byte commands (and
> perhaps back) at all.
> 
> Then we need a method so we don't switch: another bit or state set after
> a successful read or write command (and I guess one for succesful mode
> sense command). 
> 
> We need only check the do-not-switch flag if we get an ILLEGAL REQUEST. We
> already have to check use_10_for_rw before every read or write request.

To put it another way, we need two flags.  One to indicate when we
definitely know which sort of commands the device will accept, and one to
indicate which type to use.  This sounds like a very good idea.  (And yes,
we should assume that some devices will want MODE_SENSE_6 along with 
READ_10!)

Another factor is important here.  The use_10_for_rw flag is used only in 
sd.c, not in sr.c.  (I haven't checked up on use_10_for_ms.)  Given that 
most of the flaky behavior seems to be associated with disk drives, should 
we keep this switching behavior only in sd.c or should it move to 
scsi_io_completion?  Or somewhere else?


On Wed, 30 Nov 2005, Jens Axboe wrote:

> Oh well, so much for expecting than even the most basic SCSI behaviour
> is always implemented correctly. A device must only return
> 0x05/0x20/0x00 for an illegal opcode. It's a little suspicious I'd say,
> are you sure it isn't something else affecting this? Even though there 
> are crappy devices out there, this seems a little too odd to me.

I'm quite sure there was nothing else affecting it.  This is just a rather 
strange device.


On Wed, 30 Nov 2005, James Bottomley wrote:

> And anyway, just because it doesn't actually work in the failing case
> doesn't mean this isn't a good patch ... it's certainly unsafe to switch
> back to 6 byte commands without checking for the correct illegal opcode 
> sense, so we should put it in anyway.

Below is an updated version of the patch.  I'm not sure it's really the 
right approach, but at least it's a starting point for further discussion.
>From testing I know that it _will_ cause an infinite retry loop under 
certain conditions, so it should not be accepted as is.  It also doesn't 
check the ASC and ASCQ -- but even if it did, it still wouldn't be good.

Alan Stern



Index: usb-2.6/drivers/scsi/scsi_lib.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_lib.c
+++ usb-2.6/drivers/scsi/scsi_lib.c
@@ -894,6 +894,36 @@ void scsi_io_completion(struct scsi_cmnd
 				 */
 				scsi_requeue_command(q, cmd);
 				result = 0;
+			} else if (!cmd->device->use_10_for_rw &&
+			    (cmd->cmnd[0] == READ_6 ||
+			     cmd->cmnd[0] == WRITE_6)) {
+				cmd->device->use_10_for_rw = 1;
+				/*
+				 * This will cause a retry with a 10-byte
+				 * command.
+				 */
+				scsi_requeue_command(q, cmd);
+				result = 0;
+			} else if (cmd->device->use_10_for_ms &&
+			    (cmd->cmnd[0] == MODE_SENSE_10 ||
+			     cmd->cmnd[0] == MODE_SELECT_10)) {
+				cmd->device->use_10_for_ms = 0;
+				/*
+				 * This will cause a retry with a 6-byte
+				 * command.
+				 */
+				scsi_requeue_command(q, cmd);
+				result = 0;
+			} else if (!cmd->device->use_10_for_ms &&
+			    (cmd->cmnd[0] == MODE_SENSE ||
+			     cmd->cmnd[0] == MODE_SELECT)) {
+				cmd->device->use_10_for_ms = 1;
+				/*
+				 * This will cause a retry with a 10-byte
+				 * command.
+				 */
+				scsi_requeue_command(q, cmd);
+				result = 0;
 			} else {
 				scsi_end_request(cmd, 0, this_count, 1);
 				return;
Index: usb-2.6/drivers/scsi/sd.c
===================================================================
--- usb-2.6.orig/drivers/scsi/sd.c
+++ usb-2.6/drivers/scsi/sd.c
@@ -944,17 +944,6 @@ static void sd_rw_intr(struct scsi_cmnd 
 			good_bytes = this_count;
 			break;
 
-		case ILLEGAL_REQUEST:
-			if (SCpnt->device->use_10_for_rw &&
-			    (SCpnt->cmnd[0] == READ_10 ||
-			     SCpnt->cmnd[0] == WRITE_10))
-				SCpnt->device->use_10_for_rw = 0;
-			if (SCpnt->device->use_10_for_ms &&
-			    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
-			     SCpnt->cmnd[0] == MODE_SELECT_10))
-				SCpnt->device->use_10_for_ms = 0;
-			break;
-
 		default:
 			break;
 		}


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2005-11-30 15:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-28 21:43 Device flags: use_10_for_rw and use_10_for_ms Alan Stern
2005-11-29 20:03 ` Jens Axboe
2005-11-29 21:23   ` Alan Stern
2005-11-29 22:48     ` Patrick Mansfield
2005-11-30 15:46       ` Alan Stern
2005-11-30  8:08     ` Jens Axboe
2005-11-30 14:46       ` James Bottomley
2005-11-30 14:51         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).