public inbox for linux-scsi@vger.kernel.org
* [PATCH as494] Add a scsi_device flag for RETRY_HWERROR
From: Alan Stern @ 2005-03-24 16:45 UTC
  To: James Bottomley; +Cc: SCSI development list

James:

It turns out that a bunch of USB-IDE converters make the mistake of
returning SK = 04 (Hardware Error) whenever the IDE device signals any
sort of error, without bothering to distinguish recoverable from
non-recoverable errors.  The best way to handle this is for usb-storage to
set a per-device flag indicating that these errors should always be
retried.  The current scheme (blacklist flag but no per-device flag) isn't
well suited for this situation.

This patch adds the per-device flag and sets it initially based on the
blacklist setting.  Once this has been merged, a separate patch will be
submitted to Matt Dharm adding the corresponding support to usb-storage.
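
For readers who want the idea without the kernel context, here is a
minimal standalone sketch of the two halves of the change: the flag is
copied from the blacklist bits once at scan time, and the error path
then tests the flag instead of repeating the blacklist lookup.  This is
a model, not kernel code; every model_* and MODEL_* name is a
hypothetical stand-in, and the blacklist bit value is arbitrary.

```c
/* Standalone model of the patch's logic; not kernel code. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's definitions. */
#define MODEL_BLIST_RETRY_HWERROR 0x400u	/* arbitrary bit, for illustration */
#define MODEL_HARDWARE_ERROR      0x04	/* sense key 04h */

enum model_disposition { MODEL_SUCCESS, MODEL_NEEDS_RETRY };

struct model_sdev {
	unsigned retry_hwerror:1;	/* Retry HARDWARE_ERROR */
};

/* Scan time: copy the blacklist bit into the per-device flag once. */
void model_configure(struct model_sdev *sdev, unsigned int bflags)
{
	assert(sdev != NULL);
	if (bflags & MODEL_BLIST_RETRY_HWERROR)
		sdev->retry_hwerror = 1;
}

/*
 * Error path: test the flag rather than looking up the blacklist again.
 * Simplification: every other sense key is reported as MODEL_SUCCESS
 * here, which is not how the real sense-checking code treats them.
 */
enum model_disposition
model_check_sense_key(const struct model_sdev *sdev, unsigned char sense_key)
{
	assert(sdev != NULL);
	if (sense_key == MODEL_HARDWARE_ERROR && sdev->retry_hwerror)
		return MODEL_NEEDS_RETRY;
	return MODEL_SUCCESS;
}
```

Because the flag lives in the device structure, a low-level driver could
presumably also set it directly for devices it knows to be affected,
without needing a blacklist entry.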

Alan Stern



Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

===== include/scsi/scsi_device.h 1.18 vs edited =====
--- 1.18/include/scsi/scsi_device.h	2005-03-22 01:44:56 -05:00
+++ edited/include/scsi/scsi_device.h	2005-03-23 16:58:05 -05:00
@@ -112,6 +112,7 @@
 	unsigned no_uld_attach:1; /* disable connecting to upper level drivers */
 	unsigned select_no_atn:1;
 	unsigned fix_capacity:1;	/* READ_CAPACITY is too high by 1 */
+	unsigned retry_hwerror:1;	/* Retry HARDWARE_ERROR */
 
 	unsigned int device_blocked;	/* Device returned QUEUE_FULL. */
 
===== drivers/scsi/scsi_error.c 1.48 vs edited =====
--- 1.48/drivers/scsi/scsi_error.c	2005-03-22 01:44:55 -05:00
+++ edited/drivers/scsi/scsi_error.c	2005-03-24 11:41:29 -05:00
@@ -31,7 +31,6 @@
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_ioctl.h>
 #include <scsi/scsi_request.h>
-#include <scsi/scsi_devinfo.h>
 
 #include "scsi_priv.h"
 #include "scsi_logging.h"
@@ -352,10 +351,7 @@
 		return NEEDS_RETRY;
 
 	case HARDWARE_ERROR:
-		if (scsi_get_device_flags(scmd->device,
-					scmd->device->vendor,
-					scmd->device->model)
-				& BLIST_RETRY_HWERROR)
+		if (scmd->device->retry_hwerror)
 			return NEEDS_RETRY;
 		else
 			return SUCCESS;
===== drivers/scsi/scsi_scan.c 1.68 vs edited =====
--- 1.68/drivers/scsi/scsi_scan.c	2005-03-22 01:48:14 -05:00
+++ edited/drivers/scsi/scsi_scan.c	2005-03-23 16:58:13 -05:00
@@ -725,6 +725,9 @@
 	if (*bflags & BLIST_NOT_LOCKABLE)
 		sdev->lockable = 0;
 
+	if (*bflags & BLIST_RETRY_HWERROR)
+		sdev->retry_hwerror = 1;
+
 	transport_configure_device(&sdev->sdev_gendev);
 
 	if (sdev->host->hostt->slave_configure)



* Re: [PATCH as494] Add a scsi_device flag for RETRY_HWERROR
From: Douglas Gilbert @ 2005-03-24 23:13 UTC
  To: Alan Stern; +Cc: James Bottomley, SCSI development list

Alan Stern wrote:
> James:
> 
> It turns out that a bunch of USB-IDE converters make the mistake of
> returning SK = 04 (Hardware Error) whenever the IDE device signals any
> sort of error, without bothering to distinguish recoverable from
> non-recoverable errors. 

Alan,
The sense key of HARDWARE ERROR is a superset of MEDIUM ERROR.
SBC-2 treats them as synonymous. If it is a "real" medium
error then the LBA of the first (i.e. lowest address) bad
block should be placed in the "info" field and the "valid"
bit should be set. If this is done the block layer does
its job well.

Also for both hardware and medium errors the "sense key
specific" field (if SKSV=1) reports the actual retry
count. The read/write retry count is set in the "read
write error recovery" mode page. Anyways if the disk
has already retried reading the bad block 64 times (say)
what is the point of the mid level retrying??
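
To make the fields mentioned above concrete, here is a rough sketch of
pulling them out of fixed-format sense data (response code 70h or 71h).
It assumes the usual fixed-format layout: VALID in bit 7 of byte 0, the
sense key in the low nibble of byte 2, the information field in bytes
3..6, SKSV in bit 7 of byte 15 and the actual retry count in bytes
16..17.  The struct and function names are made up for illustration and
are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the fields discussed above. */
struct sense_summary {
	uint8_t  sense_key;	/* byte 2, bits 3..0 */
	int      info_valid;	/* byte 0, bit 7 (VALID) */
	uint32_t info_lba;	/* bytes 3..6, meaningful only if info_valid */
	int      sksv;		/* byte 15, bit 7 */
	uint16_t retry_count;	/* bytes 16..17, meaningful only if sksv */
};

/* Returns 0 on success, -1 if the buffer is too short or not fixed format. */
int parse_fixed_sense(const uint8_t *buf, size_t len, struct sense_summary *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 18)
		return -1;
	if ((buf[0] & 0x7f) != 0x70 && (buf[0] & 0x7f) != 0x71)
		return -1;

	out->sense_key = buf[2] & 0x0f;
	out->info_valid = (buf[0] & 0x80) != 0;
	out->info_lba = ((uint32_t)buf[3] << 24) | ((uint32_t)buf[4] << 16) |
			((uint32_t)buf[5] << 8) | (uint32_t)buf[6];
	out->sksv = (buf[15] & 0x80) != 0;
	out->retry_count = (uint16_t)(((unsigned)buf[16] << 8) | buf[17]);
	return 0;
}
```

A caller would look at info_lba only when info_valid is set, and at
retry_count only when sksv is set and the sense key is one for which
the field carries a retry count.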

<aside>
Another interesting quirk I noticed recently was a disk
yielding "recovered error, failure prediction threshold
exceeded" even though the PER bit was clear (which is the
normal case). I had expected recovered errors to be
bad block specific but this error was from SMART saying
"your disk hasn't got long to live". BTW That recovered
error was repeated every 10 minutes or so. Probably a
useful alert to put in the log and on the console.

Doug Gilbert


* Re: [PATCH as494] Add a scsi_device flag for RETRY_HWERROR
From: Alan Stern @ 2005-03-25 15:37 UTC
  To: Douglas Gilbert; +Cc: James Bottomley, SCSI development list

On Fri, 25 Mar 2005, Douglas Gilbert wrote:

> Alan Stern wrote:
> > James:
> > 
> > It turns out that a bunch of USB-IDE converters make the mistake of
> > returning SK = 04 (Hardware Error) whenever the IDE device signals any
> > sort of error, without bothering to distinguish recoverable from
> > non-recoverable errors. 
> 
> Alan,
> The sense key of HARDWARE ERROR is a superset of MEDIUM ERROR.
> SBC-2 treats them as synonymous. If it is a "real" medium
> error then the LBA of the first (i.e. lowest address) bad
> block should be placed in the "info" field and the "valid"
> bit should be set. If this is done the block layer does
> its job well.

I don't have a copy of SBC-2.  The (very old) document I do have and the 
code in scsi_error.c both treat HARDWARE_ERROR as nonrecoverable and 
MEDIUM_ERROR as recoverable.  So in a sense the patch does go some way 
towards treating one as a superset of the other.

Were the errors in question "real" medium errors?  I don't know -- maybe 
not.  There wasn't much information available.

> Also for both hardware and medium errors the "sense key
> specific" field (if SKSV=1) reports the actual retry
> count. The read/write retry count is set in the "read
> write error recovery" mode page. Anyways if the disk
> has already retried reading the bad block 64 times (say)
> what is the point of the mid level retrying??

For the devices I've had to deal with, it's not obvious that the disk has
already retried 64 times.  (And the error-recovery mode page may not exist
either; remember these are slightly buggy USB adaptors trying to pretend
that an IDE drive is a SCSI device.)  All I know for certain is that the
first time the command was issued we got back HARDWARE_ERROR and then
on a retry the command succeeded.
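
The behaviour described above can be modelled with a toy device that
fails its first command and succeeds afterwards.  This is only a sketch
under assumed names (flaky_dev, flaky_issue and issue_with_retry are
all hypothetical); it is not the mid-layer's actual retry logic, just a
bounded retry loop gated on a per-device flag.

```c
#include <assert.h>
#include <stddef.h>

enum cmd_result { CMD_OK, CMD_HARDWARE_ERROR };

/* Hypothetical device that reports HARDWARE_ERROR for its first few commands. */
struct flaky_dev {
	int failures_left;	/* commands that will still fail */
	int attempts;		/* commands issued so far */
};

enum cmd_result flaky_issue(struct flaky_dev *dev)
{
	assert(dev != NULL);
	dev->attempts++;
	if (dev->failures_left > 0) {
		dev->failures_left--;
		return CMD_HARDWARE_ERROR;
	}
	return CMD_OK;
}

/*
 * Issue the command once, then retry at most `allowed` more times,
 * but only when the per-device retry flag is set.
 */
enum cmd_result issue_with_retry(struct flaky_dev *dev, int retry_hwerror,
				 int allowed)
{
	enum cmd_result res = flaky_issue(dev);
	int retries = 0;

	while (res == CMD_HARDWARE_ERROR && retry_hwerror && retries < allowed) {
		retries++;
		res = flaky_issue(dev);
	}
	return res;
}
```

With the flag clear the first HARDWARE_ERROR is final; with it set, a
device that fails once recovers on the second attempt, and a device
that keeps failing still gives up after the allowed number of retries.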

(The IBM drive that started this whole thing is a different story.  But 
regardless, it's clear that having these patches helps with some devices 
and doesn't hurt others.)

Alan Stern

