public inbox for linux-scsi@vger.kernel.org
From: James Bottomley <James.Bottomley@suse.de>
To: Hannes Reinecke <hare@suse.de>
Cc: linux-scsi@vger.kernel.org
Subject: Re: [PATCH] sd: retry read_capacity on UNIT_ATTENTION
Date: Thu, 01 Apr 2010 10:30:01 -0400
Message-ID: <1270132201.4439.14.camel@mulgrave.site>
In-Reply-To: <20100401134428.7E4D1337C5@ochil.suse.de>

On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> Hazard testing uncovered yet another bug in sd. Under heavy
> reset activity the retry counter might be exhausted and
> the command will be returned with sense UNIT_ATTENTION/0x29/00
> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> cases we should just increase the retry counter again,
> retrying one more to clear up this Unit Attention state.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 1962bea..7d75a21 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>  		if (media_not_present(sdkp, &sshdr))
>  			return -ENODEV;
>  
> -		if (the_result)
> +		if (the_result) {
>  			sense_valid = scsi_sense_valid(&sshdr);
> +			if (sense_valid &&
> +			    sshdr.sense_key == UNIT_ATTENTION &&
> +			    sshdr.asc = 0x29 && sshdr.asq == 0x00)
                                      ^^^^
should be ==

> +			    /* Device reset might occur several times,
> +			     * give it one more chance */
> +			    retries++;
> +		}

Firstly, not even compile checked:

drivers/scsi/sd.c: In function ‘read_capacity_10’:
drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’

Secondly, we can't quite do this.  Some devices (only broken ones in my
experience) will reply UNIT_ATTENTION "I was RESET" forever, leading to
an infinite loop here.  Additionally, a massive reset storm on a shared
bus would DoS the code here, so there must be a give-up point after a
reasonable number of retries.

The third problem is that if this happens to a large device, we only
catch it in RC10, not in RC16 ... so we'll report the device as
undersized if it is > SPC-2.

How about this instead?

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7b75c8a..cdb8ed6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1432,6 +1432,8 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
 #error RC16_LEN must not be more than SD_BUF_SIZE
 #endif
 
+#define READ_CAPACITY_RETRIES_ON_RESET	10
+
 static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 						unsigned char *buffer)
 {
@@ -1439,7 +1441,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3;
+	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
@@ -1468,6 +1470,13 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
 				 * Invalid Field in CDB, just retry
 				 * silently with RC10 */
 				return -EINVAL;
+			if (sense_valid &&
+			    sshdr.sense_key == UNIT_ATTENTION &&
+			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+				/* Device reset might occur several times,
+				 * give it one more chance */
+				if (--reset_retries > 0)
+					continue;
 		}
 		retries--;
 
@@ -1526,7 +1535,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int the_result;
-	int retries = 3;
+	int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
 	sector_t lba;
 	unsigned sector_size;
 
@@ -1542,8 +1551,16 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
 		if (media_not_present(sdkp, &sshdr))
 			return -ENODEV;
 
-		if (the_result)
+		if (the_result) {
 			sense_valid = scsi_sense_valid(&sshdr);
+			if (sense_valid &&
+			    sshdr.sense_key == UNIT_ATTENTION &&
+			    sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+				/* Device reset might occur several times,
+				 * give it one more chance */
+				if (--reset_retries > 0)
+					continue;
+		}
 		retries--;
 
 	} while (the_result && retries);



