[PATCH] [SCSI] sd: make error handling more robust (v2)

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] [SCSI] sd: make error handling more robust (v2)
@ 2008-02-01 17:03 Tony Battersby
  2008-02-01 20:13 ` Luben Tuikov
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Tony Battersby @ 2008-02-01 17:03 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, James Bottomley; +Cc: Luben Tuikov, Salyzyn, Mark

This patch fixes a problem with some out-of-spec SCSI disks that report
hardware or medium errors incorrectly.  Without the patch, the kernel
may silently ignore a failed write command or return corrupted data on a
failed read command.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
---

This is a simplified version of the original patch that fixes just the
problem at hand, without trying to handle other theoretical out-of-spec
cases.

Applies to kernels 2.6.18 - 2.6.24-git10+.

--- linux-2.6.24-git10/drivers/scsi/sd.c.orig	2008-02-01 11:24:37.000000000 -0500
+++ linux-2.6.24-git10/drivers/scsi/sd.c	2008-02-01 11:26:12.000000000 -0500
@@ -990,6 +990,8 @@ static int sd_done(struct scsi_cmnd *SCp
 		/* This computation should always be done in terms of
 		 * the resolution of the device's medium.
 		 */
+		if (bad_lba < start_lba)
+			goto out;
 		good_bytes = (bad_lba - start_lba)*SCpnt->device->sector_size;
 		break;
 	case RECOVERED_ERROR:



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-01 17:03 [PATCH] [SCSI] sd: make error handling more robust (v2) Tony Battersby
@ 2008-02-01 20:13 ` Luben Tuikov
  2008-02-01 20:47 ` Salyzyn, Mark
  2008-02-02 22:06 ` James Bottomley
  2 siblings, 0 replies; 15+ messages in thread
From: Luben Tuikov @ 2008-02-01 20:13 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, James Bottomley, Tony Battersby; +Cc: Salyzyn, Mark

--- On Fri, 2/1/08, Tony Battersby <tonyb@cybernetics.com> wrote:

> From: Tony Battersby <tonyb@cybernetics.com>
> Subject: [PATCH] [SCSI] sd: make error handling more robust (v2)
> To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, "James Bottomley" <James.Bottomley@HansenPartnership.com>
> Cc: "Luben Tuikov" <ltuikov@yahoo.com>, "Salyzyn, Mark" <Mark_Salyzyn@adaptec.com>
> Date: Friday, February 1, 2008, 9:03 AM
> This patch fixes a problem with some out-of-spec SCSI disks
> that report
> hardware or medium errors incorrectly.  Without the patch,
> the kernel
> may silently ignore a failed write command or return
> corrupted data on a
> failed read command.
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>

Acked-by: Luben Tuikov <ltuikov@yahoo.com>

> ---
> 
> This is a simplified version of the original patch that

Very simplified. ;-)

> fixes just the
> problem at hand, without trying to handle other theoretical
> out-of-spec
> cases.

Yes, it is chunk #2 of the patch I sent to you to
try on your hardware.

> 
> Applies to kernels 2.6.18 - 2.6.24-git10+.
> 
> --- linux-2.6.24-git10/drivers/scsi/sd.c.orig	2008-02-01
> 11:24:37.000000000 -0500
> +++ linux-2.6.24-git10/drivers/scsi/sd.c	2008-02-01
> 11:26:12.000000000 -0500
> @@ -990,6 +990,8 @@ static int sd_done(struct scsi_cmnd
> *SCp
>  		/* This computation should always be done in terms of
>  		 * the resolution of the device's medium.
>  		 */
> +		if (bad_lba < start_lba)
> +			goto out;
>  		good_bytes = (bad_lba -
> start_lba)*SCpnt->device->sector_size;
>  		break;
>  	case RECOVERED_ERROR:
> 
> 
> -
> To unsubscribe from this list: send the line
> "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at 
> http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-01 17:03 [PATCH] [SCSI] sd: make error handling more robust (v2) Tony Battersby
  2008-02-01 20:13 ` Luben Tuikov
@ 2008-02-01 20:47 ` Salyzyn, Mark
  2008-02-02  0:43   ` Mike Snitzer
  2008-02-02 22:06 ` James Bottomley
  2 siblings, 1 reply; 15+ messages in thread
From: Salyzyn, Mark @ 2008-02-01 20:47 UTC (permalink / raw)
  To: Tony Battersby, linux-scsi@vger.kernel.org, James Bottomley
  Cc: Luben Tuikov, Mike Snitzer, Greg KH

ACK

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

Mike Snitzer tested this also in conjunction with the aacraid driver and it resolved the error propagation problem he experienced with MD on a 2.6.22.16 kernel, recommend that this also be added to any stabilization for these other kernels. By his permission:

Signed-off-by: Mike Snitzer <snitzer@gmail.com>

Sincerely -- Mark Salyzyn

> -----Original Message-----
> From: Tony Battersby [mailto:tonyb@cybernetics.com]
> Sent: Friday, February 01, 2008 12:03 PM
> To: linux-scsi@vger.kernel.org; James Bottomley
> Cc: Luben Tuikov; Salyzyn, Mark
> Subject: [PATCH] [SCSI] sd: make error handling more robust (v2)
>
> This patch fixes a problem with some out-of-spec SCSI disks
> that report
> hardware or medium errors incorrectly.  Without the patch, the kernel
> may silently ignore a failed write command or return
> corrupted data on a
> failed read command.
>
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
>
> This is a simplified version of the original patch that fixes just the
> problem at hand, without trying to handle other theoretical
> out-of-spec
> cases.
>
> Applies to kernels 2.6.18 - 2.6.24-git10+.
>
> --- linux-2.6.24-git10/drivers/scsi/sd.c.orig   2008-02-01
> 11:24:37.000000000 -0500
> +++ linux-2.6.24-git10/drivers/scsi/sd.c        2008-02-01
> 11:26:12.000000000 -0500
> @@ -990,6 +990,8 @@ static int sd_done(struct scsi_cmnd *SCp
>                 /* This computation should always be done in terms of
>                  * the resolution of the device's medium.
>                  */
> +               if (bad_lba < start_lba)
> +                       goto out;
>                 good_bytes = (bad_lba -
> start_lba)*SCpnt->device->sector_size;
>                 break;
>         case RECOVERED_ERROR:
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-01 20:47 ` Salyzyn, Mark
@ 2008-02-02  0:43   ` Mike Snitzer
  2008-02-02 20:24     ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2008-02-02  0:43 UTC (permalink / raw)
  To: Greg KH, stable
  Cc: Salyzyn, Mark, Tony Battersby, linux-scsi@vger.kernel.org,
	James Bottomley, Luben Tuikov

Hi Greg,

Given your recent call for review of the next 2.6.22-stable I'd
appreciate it if you and the rest of the stable team would strongly
consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
(as well as the other stable trees).

please advise, thanks.
Mike


On Feb 1, 2008 3:47 PM, Salyzyn, Mark <Mark_Salyzyn@adaptec.com> wrote:
> ACK
>
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
>
> Mike Snitzer tested this also in conjunction with the aacraid driver and it resolved the error propagation problem he experienced with MD on a 2.6.22.16 kernel, recommend that this also be added to any stabilization for these other kernels. By his permission:
>
> Signed-off-by: Mike Snitzer <snitzer@gmail.com>
>
> Sincerely -- Mark Salyzyn
>
> > -----Original Message-----
> > From: Tony Battersby [mailto:tonyb@cybernetics.com]
> > Sent: Friday, February 01, 2008 12:03 PM
> > To: linux-scsi@vger.kernel.org; James Bottomley
> > Cc: Luben Tuikov; Salyzyn, Mark
> > Subject: [PATCH] [SCSI] sd: make error handling more robust (v2)
> >
> > This patch fixes a problem with some out-of-spec SCSI disks
> > that report
> > hardware or medium errors incorrectly.  Without the patch, the kernel
> > may silently ignore a failed write command or return
> > corrupted data on a
> > failed read command.
> >
> > Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> > ---
> >
> > This is a simplified version of the original patch that fixes just the
> > problem at hand, without trying to handle other theoretical
> > out-of-spec
> > cases.
> >
> > Applies to kernels 2.6.18 - 2.6.24-git10+.
> >
> > --- linux-2.6.24-git10/drivers/scsi/sd.c.orig   2008-02-01
> > 11:24:37.000000000 -0500
> > +++ linux-2.6.24-git10/drivers/scsi/sd.c        2008-02-01
> > 11:26:12.000000000 -0500
> > @@ -990,6 +990,8 @@ static int sd_done(struct scsi_cmnd *SCp
> >                 /* This computation should always be done in terms of
> >                  * the resolution of the device's medium.
> >                  */
> > +               if (bad_lba < start_lba)
> > +                       goto out;
> >                 good_bytes = (bad_lba -
> > start_lba)*SCpnt->device->sector_size;
> >                 break;
> >         case RECOVERED_ERROR:
> >
> >
> >
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-02  0:43   ` Mike Snitzer
@ 2008-02-02 20:24     ` Greg KH
  2008-02-02 20:56       ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-02-02 20:24 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: stable, Salyzyn, Mark, Tony Battersby, linux-scsi@vger.kernel.org,
	James Bottomley, Luben Tuikov

On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote:
> Hi Greg,
> 
> Given your recent call for review of the next 2.6.22-stable I'd
> appreciate it if you and the rest of the stable team would strongly
> consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
> (as well as the other stable trees).

Is the patch in Linus's tree already?

If so, what git commit id?  That's a requirement to go into the stable
tree...

If not, we need to wait until it is.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-02 20:24     ` Greg KH
@ 2008-02-02 20:56       ` Mike Snitzer
  2008-02-02 21:08         ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2008-02-02 20:56 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, Salyzyn, Mark, Tony Battersby, linux-scsi@vger.kernel.org,
	James Bottomley, Luben Tuikov

On Feb 2, 2008 3:24 PM, Greg KH <greg@kroah.com> wrote:
> On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote:
> > Hi Greg,
> >
> > Given your recent call for review of the next 2.6.22-stable I'd
> > appreciate it if you and the rest of the stable team would strongly
> > consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
> > (as well as the other stable trees).
>
> Is the patch in Linus's tree already?
>
> If so, what git commit id?  That's a requirement to go into the stable
> tree...
>
> If not, we need to wait until it is.

Fair enough.  Thanks for the clarification.

That likely leaves us with having James add it to his git tree for
Linus (provided he agrees with the change).  Which he and Luben seemed
to agree was the reasonable thin layer of protection that would be
allowed for such out of spec SCSI devices.

James, how would you like to proceed?  The patch fixes the good_bytes
!= 0 on HARDWARE_ERROR issue I have been chasing with Mark Salyzyn.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-02 20:56       ` Mike Snitzer
@ 2008-02-02 21:08         ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2008-02-02 21:08 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: stable, Salyzyn, Mark, Tony Battersby, linux-scsi@vger.kernel.org,
	James Bottomley, Luben Tuikov

On Sat, Feb 02, 2008 at 03:56:30PM -0500, Mike Snitzer wrote:
> On Feb 2, 2008 3:24 PM, Greg KH <greg@kroah.com> wrote:
> > On Fri, Feb 01, 2008 at 07:43:54PM -0500, Mike Snitzer wrote:
> > > Hi Greg,
> > >
> > > Given your recent call for review of the next 2.6.22-stable I'd
> > > appreciate it if you and the rest of the stable team would strongly
> > > consider this SCSI IO error propagation fix for inclusion in 2.6.22.17
> > > (as well as the other stable trees).
> >
> > Is the patch in Linus's tree already?
> >
> > If so, what git commit id?  That's a requirement to go into the stable
> > tree...
> >
> > If not, we need to wait until it is.
> 
> Fair enough.  Thanks for the clarification.
> 
> That likely leaves us with having James add it to his git tree for
> Linus (provided he agrees with the change).  Which he and Luben seemed
> to agree was the reasonable thin layer of protection that would be
> allowed for such out of spec SCSI devices.
> 
> James, how would you like to proceed?  The patch fixes the good_bytes
> != 0 on HARDWARE_ERROR issue I have been chasing with Mark Salyzyn.

After it goes in, can someone notify stable@kernel.org that it went in,
the git id, and that it should be applied to all older kernel trees?

thanks,

greg "drowning in -stable patches" k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-01 17:03 [PATCH] [SCSI] sd: make error handling more robust (v2) Tony Battersby
  2008-02-01 20:13 ` Luben Tuikov
  2008-02-01 20:47 ` Salyzyn, Mark
@ 2008-02-02 22:06 ` James Bottomley
  2008-02-03 15:14   ` Mike Snitzer
  2008-02-04  9:11   ` [PATCH] [SCSI] sd: make error handling more robust (v2) Luben Tuikov
  2 siblings, 2 replies; 15+ messages in thread
From: James Bottomley @ 2008-02-02 22:06 UTC (permalink / raw)
  To: Tony Battersby; +Cc: linux-scsi@vger.kernel.org, Luben Tuikov, Salyzyn, Mark


On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby wrote:
> This patch fixes a problem with some out-of-spec SCSI disks that report
> hardware or medium errors incorrectly.  Without the patch, the kernel
> may silently ignore a failed write command or return corrupted data on a
> failed read command.
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> ---
> 
> This is a simplified version of the original patch that fixes just the
> problem at hand, without trying to handle other theoretical out-of-spec
> cases.

Actually, to restore the original check, this is what we want, isn't it?
Ok, so I also made the sector division logic futureproof for the day we
have > 4096 sector devices ...

James

---

>From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon Sep 17 00:00:00 2001
From: Tony Battersby <tonyb@cybernetics.com>
Date: Fri, 1 Feb 2008 12:03:27 -0500
Subject: [SCSI] sd: make error handling more robust

This patch fixes a problem with some out-of-spec SCSI disks that report
hardware or medium errors incorrectly.  Without the patch, the kernel
may silently ignore a failed write command or return corrupted data on a
failed read command.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
 drivers/scsi/sd.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

Index: BUILD-2.6/drivers/scsi/sd.c
===================================================================
--- BUILD-2.6.orig/drivers/scsi/sd.c	2008-02-02 15:43:20.000000000 -0600
+++ BUILD-2.6/drivers/scsi/sd.c	2008-02-02 16:01:24.000000000 -0600
@@ -929,6 +929,7 @@
 	unsigned int xfer_size = scsi_bufflen(SCpnt);
  	unsigned int good_bytes = result ? 0 : xfer_size;
  	u64 start_lba = SCpnt->request->sector;
+	u64 end_lba = SCpnt->request->sector + (xfer_size / 512);
  	u64 bad_lba;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
@@ -967,26 +968,23 @@
 			goto out;
 		if (xfer_size <= SCpnt->device->sector_size)
 			goto out;
-		switch (SCpnt->device->sector_size) {
-		case 256:
+		if (SCpnt->device->sector_size < 512) {
+			/* only legitimate sector_size here is 256 */
 			start_lba <<= 1;
-			break;
-		case 512:
-			break;
-		case 1024:
-			start_lba >>= 1;
-			break;
-		case 2048:
-			start_lba >>= 2;
-			break;
-		case 4096:
-			start_lba >>= 3;
-			break;
-		default:
-			/* Print something here with limiting frequency. */
-			goto out;
-			break;
+			end_lba <<= 1;
+		} else {
+			/* be careful ... don't want any overflows */
+			u64 factor = SCpnt->device->sector_size / 512;
+			do_div(start_lba, factor);
+			do_div(end_lba, factor);
 		}
+
+		if (bad_lba < start_lba  || bad_lba >= end_lba)
+			/* the bad lba was reported incorrectly, we have
+			 * no idea where the error is
+			 */
+			goto out;
+
 		/* This computation should always be done in terms of
 		 * the resolution of the device's medium.
 		 */




^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-02 22:06 ` James Bottomley
@ 2008-02-03 15:14   ` Mike Snitzer
  2008-02-04  9:14     ` Luben Tuikov
  2008-02-04  9:11   ` [PATCH] [SCSI] sd: make error handling more robust (v2) Luben Tuikov
  1 sibling, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2008-02-03 15:14 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tony Battersby, linux-scsi@vger.kernel.org, Luben Tuikov,
	Salyzyn, Mark

On Feb 2, 2008 5:06 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby wrote:
> > This patch fixes a problem with some out-of-spec SCSI disks that report
> > hardware or medium errors incorrectly.  Without the patch, the kernel
> > may silently ignore a failed write command or return corrupted data on a
> > failed read command.
> >
> > Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> > ---
> >
> > This is a simplified version of the original patch that fixes just the
> > problem at hand, without trying to handle other theoretical out-of-spec
> > cases.
>
> Actually, to restore the original check, this is what we want, isn't it?
> Ok, so I also made the sector division logic futureproof for the day we
> have > 4096 sector devices ...
>
> James
>
> ---
>
> >From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon Sep 17 00:00:00 2001
> From: Tony Battersby <tonyb@cybernetics.com>
> Date: Fri, 1 Feb 2008 12:03:27 -0500
> Subject: [SCSI] sd: make error handling more robust
>
> This patch fixes a problem with some out-of-spec SCSI disks that report
> hardware or medium errors incorrectly.  Without the patch, the kernel
> may silently ignore a failed write command or return corrupted data on a
> failed read command.
>
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> Cc: Stable Tree <stable@kernel.org>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>

I've verified that this patch fixes the 2.6.22.16 SCSI IO error
propagation issue I had when physically pulling a drive from an
enclosure connected to an aacraid controller.

Tested-by: Mike Snitzer <snitzer@gmail.com>

Mike

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-02 22:06 ` James Bottomley
  2008-02-03 15:14   ` Mike Snitzer
@ 2008-02-04  9:11   ` Luben Tuikov
  2008-02-04 21:22     ` James Bottomley
  1 sibling, 1 reply; 15+ messages in thread
From: Luben Tuikov @ 2008-02-04  9:11 UTC (permalink / raw)
  To: Tony Battersby, James Bottomley; +Cc: linux-scsi@vger.kernel.org, Salyzyn, Mark

--- On Sat, 2/2/08, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> From: James Bottomley <James.Bottomley@HansenPartnership.com>
> Subject: Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
> To: "Tony Battersby" <tonyb@cybernetics.com>
> Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, "Luben Tuikov" <ltuikov@yahoo.com>, "Salyzyn, Mark" <Mark_Salyzyn@adaptec.com>
> Date: Saturday, February 2, 2008, 2:06 PM
> On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby wrote:
> > This patch fixes a problem with some out-of-spec SCSI
> disks that report
> > hardware or medium errors incorrectly.  Without the
> patch, the kernel
> > may silently ignore a failed write command or return
> corrupted data on a
> > failed read command.
> > 
> > Signed-off-by: Tony Battersby
> <tonyb@cybernetics.com>
> > ---
> > 
> > This is a simplified version of the original patch
> that fixes just the
> > problem at hand, without trying to handle other
> theoretical out-of-spec
> > cases.
> 
> Actually, to restore the original check, this is what we
> want, isn't it?
> Ok, so I also made the sector division logic futureproof
> for the day we
> have > 4096 sector devices ...
> 
> James
> 
> ---
> 
> >From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon Sep
> 17 00:00:00 2001
> From: Tony Battersby <tonyb@cybernetics.com>
> Date: Fri, 1 Feb 2008 12:03:27 -0500
> Subject: [SCSI] sd: make error handling more robust
> 
> This patch fixes a problem with some out-of-spec SCSI disks
> that report
> hardware or medium errors incorrectly.  Without the patch,
> the kernel
> may silently ignore a failed write command or return
> corrupted data on a
> failed read command.
> 
> Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
> Cc: Stable Tree <stable@kernel.org>
> Signed-off-by: James Bottomley
> <James.Bottomley@HansenPartnership.com>
> ---
>  drivers/scsi/sd.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> Index: BUILD-2.6/drivers/scsi/sd.c
> ===================================================================
> --- BUILD-2.6.orig/drivers/scsi/sd.c	2008-02-02
> 15:43:20.000000000 -0600
> +++ BUILD-2.6/drivers/scsi/sd.c	2008-02-02
> 16:01:24.000000000 -0600
> @@ -929,6 +929,7 @@
>  	unsigned int xfer_size = scsi_bufflen(SCpnt);
>   	unsigned int good_bytes = result ? 0 : xfer_size;
>   	u64 start_lba = SCpnt->request->sector;
> +	u64 end_lba = SCpnt->request->sector + (xfer_size /
> 512);
>   	u64 bad_lba;
>  	struct scsi_sense_hdr sshdr;
>  	int sense_valid = 0;
> @@ -967,26 +968,23 @@
>  			goto out;
>  		if (xfer_size <= SCpnt->device->sector_size)
>  			goto out;
> -		switch (SCpnt->device->sector_size) {
> -		case 256:
> +		if (SCpnt->device->sector_size < 512) {
> +			/* only legitimate sector_size here is 256 */
>  			start_lba <<= 1;
> -			break;
> -		case 512:
> -			break;
> -		case 1024:
> -			start_lba >>= 1;
> -			break;
> -		case 2048:
> -			start_lba >>= 2;
> -			break;
> -		case 4096:
> -			start_lba >>= 3;
> -			break;
> -		default:
> -			/* Print something here with limiting frequency. */
> -			goto out;
> -			break;
> +			end_lba <<= 1;
> +		} else {
> +			/* be careful ... don't want any overflows */
> +			u64 factor = SCpnt->device->sector_size / 512;
> +			do_div(start_lba, factor);
> +			do_div(end_lba, factor);
>  		}
> +
> +		if (bad_lba < start_lba  || bad_lba >= end_lba)
> +			/* the bad lba was reported incorrectly, we have
> +			 * no idea where the error is
> +			 */
> +			goto out;
> +
>  		/* This computation should always be done in terms of
>  		 * the resolution of the device's medium.
>  		 */

Looks good except that "End LBA" is usually defined
to be something of the sort of "the LBA of the last
logical block accessed by the command" or "the LBA
of the logical block on which the command failed".

A spec savvy editor of this code would be
"pleasantly" surprised if they had to use "end_lba",
and didn't pay attention that it was actually
"End LBA" + 1.

   Luben


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-03 15:14   ` Mike Snitzer
@ 2008-02-04  9:14     ` Luben Tuikov
  2008-02-06 21:54       ` [PATCH 1/1] aacraid: do not set valid bit in sense information Salyzyn, Mark
  0 siblings, 1 reply; 15+ messages in thread
From: Luben Tuikov @ 2008-02-04  9:14 UTC (permalink / raw)
  To: James Bottomley, Mike Snitzer
  Cc: Tony Battersby, linux-scsi@vger.kernel.org, Salyzyn, Mark

--- On Sun, 2/3/08, Mike Snitzer <snitzer@gmail.com> wrote:

> From: Mike Snitzer <snitzer@gmail.com>
> Subject: Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
> To: "James Bottomley" <James.Bottomley@hansenpartnership.com>
> Cc: "Tony Battersby" <tonyb@cybernetics.com>, "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>, "Luben Tuikov" <ltuikov@yahoo.com>, "Salyzyn, Mark" <Mark_Salyzyn@adaptec.com>
> Date: Sunday, February 3, 2008, 7:14 AM
> On Feb 2, 2008 5:06 PM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > On Fri, 2008-02-01 at 12:03 -0500, Tony Battersby
> wrote:
> > > This patch fixes a problem with some out-of-spec
> SCSI disks that report
> > > hardware or medium errors incorrectly.  Without
> the patch, the kernel
> > > may silently ignore a failed write command or
> return corrupted data on a
> > > failed read command.
> > >
> > > Signed-off-by: Tony Battersby
> <tonyb@cybernetics.com>
> > > ---
> > >
> > > This is a simplified version of the original
> patch that fixes just the
> > > problem at hand, without trying to handle other
> theoretical out-of-spec
> > > cases.
> >
> > Actually, to restore the original check, this is what
> we want, isn't it?
> > Ok, so I also made the sector division logic
> futureproof for the day we
> > have > 4096 sector devices ...
> >
> > James
> >
> > ---
> >
> > >From 5ae2e4a8ff095aab5997f17068d3e4212c33f039 Mon
> Sep 17 00:00:00 2001
> > From: Tony Battersby <tonyb@cybernetics.com>
> > Date: Fri, 1 Feb 2008 12:03:27 -0500
> > Subject: [SCSI] sd: make error handling more robust
> >
> > This patch fixes a problem with some out-of-spec SCSI
> disks that report
> > hardware or medium errors incorrectly.  Without the
> patch, the kernel
> > may silently ignore a failed write command or return
> corrupted data on a
> > failed read command.
> >
> > Signed-off-by: Tony Battersby
> <tonyb@cybernetics.com>
> > Cc: Stable Tree <stable@kernel.org>
> > Signed-off-by: James Bottomley
> <James.Bottomley@HansenPartnership.com>
> 
> I've verified that this patch fixes the 2.6.22.16 SCSI
> IO error
> propagation issue I had when physically pulling a drive
> from an
> enclosure connected to an aacraid controller.

Just as in your case and Tony's case, which I presume
uses the same RAID firmware vendor, it would've
probably been better if the RAID firmware vendor
fixed the firmware to not set the VALID bit if the
INFORMATION field is not valid.

   Luben


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-04  9:11   ` [PATCH] [SCSI] sd: make error handling more robust (v2) Luben Tuikov
@ 2008-02-04 21:22     ` James Bottomley
  2008-02-04 23:23       ` Luben Tuikov
  0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2008-02-04 21:22 UTC (permalink / raw)
  To: ltuikov; +Cc: Tony Battersby, linux-scsi@vger.kernel.org, Salyzyn, Mark

On Mon, 2008-02-04 at 01:11 -0800, Luben Tuikov wrote:
> Looks good except that "End LBA" is usually defined
> to be something of the sort of "the LBA of the last
> logical block accessed by the command" or "the LBA
> of the logical block on which the command failed".
> 
> A spec savvy editor of this code would be
> "pleasantly" surprised if they had to use "end_lba",
> and didn't pay attention that it was actually
> "End LBA" + 1.

Heh, well, that's where spec people and programmers part company.  The
universal expectation of a programmer in looping is

for (a = beginning; a < end; a++)

rather than <= if end were actually to point to last rather than last +
1.

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [SCSI] sd: make error handling more robust (v2)
  2008-02-04 21:22     ` James Bottomley
@ 2008-02-04 23:23       ` Luben Tuikov
  0 siblings, 0 replies; 15+ messages in thread
From: Luben Tuikov @ 2008-02-04 23:23 UTC (permalink / raw)
  To: James Bottomley; +Cc: Tony Battersby, linux-scsi@vger.kernel.org, Salyzyn, Mark

--- On Mon, 2/4/08, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Mon, 2008-02-04 at 01:11 -0800, Luben Tuikov wrote:
> > Looks good except that "End LBA" is usually
> defined
> > to be something of the sort of "the LBA of the
> last
> > logical block accessed by the command" or
> "the LBA
> > of the logical block on which the command
> failed".
> > 
> > A spec savvy editor of this code would be
> > "pleasantly" surprised if they had to use
> "end_lba",
> > and didn't pay attention that it was actually
> > "End LBA" + 1.
> 
> Heh, well, that's where spec people and programmers
> part company.  The
> universal expectation of a programmer in looping is
> 
> for (a = beginning; a < end; a++)
> 
> rather than <= if end were actually to point to last
> rather than last +
> 1.

For loop invariants that's true, although I didn't see
a loop in sd_done().

   Luben


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/1] aacraid: do not set valid bit in sense information
  2008-02-04  9:14     ` Luben Tuikov
@ 2008-02-06 21:54       ` Salyzyn, Mark
  2008-02-06 22:19         ` Luben Tuikov
  0 siblings, 1 reply; 15+ messages in thread
From: Salyzyn, Mark @ 2008-02-06 21:54 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org
  Cc: Tony Battersby, ltuikov@yahoo.com, Mike Snitzer

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

Luben Tuikov [mailto:ltuikov@yahoo.com] sez:
> Just as in your case and Tony's case, which I presume
> uses the same RAID firmware vendor, it would've
> probably been better if the RAID firmware vendor
> fixed the firmware to not set the VALID bit if the
> INFORMATION field is not valid.

Point taken regarding the aacraid driver. Dropped the VALID bit, and then did some cleanup/simplification of the set_sense procedure and the associated parameters. Mike did some preliminary tests when the VALID bit was dropped before the 'Re: [PATCH] [SCSI] sd: make error handling more robust' patches came on the scene. The change in the SCSI subsystem does make this enclosed aacraid patch unnecessary, so this aacraid patch is merely post battle ground cleanup. If the simplification is an issue, repugnant, too much for a back-port to the stable trees or clouds the point, this patch could be happily distilled down to:

diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c     2008-02-06 16:26:45.834938955 -0500
+++ b/drivers/scsi/aacraid/aachba.c     2008-02-06 16:32:01.109035329 -0500
@@ -865,7 +865,7 @@
                         u32 residue)
 {
-        sense_buf[0] = 0xF0;    /* Sense data valid, err code 70h (current error) */
+        sense_buf[0] = 0x70;    /* Sense data invalid, err code 70h (current error) */
         sense_buf[1] = 0;       /* Segment number, always zero */

         if (incorrect_length) {

This patch is against current scsi-misc-2.6

ObligatoryDisclaimer: Please accept my condolences regarding Outlook's handling of patch attachments. Please use the attached patch and not any inline content.

Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>

 drivers/scsi/aacraid/aachba.c |   81 +++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 51 deletions(-)

Sincerely -- Mark Salyzyn

[-- Attachment #2: aacraid_set_sense.patch --]
[-- Type: application/octet-stream, Size: 5979 bytes --]

diff -ru a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
--- a/drivers/scsi/aacraid/aachba.c	2008-02-06 16:26:45.834938955 -0500
+++ b/drivers/scsi/aacraid/aachba.c	2008-02-06 16:32:01.109035329 -0500
@@ -859,44 +859,31 @@
 			le32_to_cpu(dev->adapter_info.serial[0]), cid);
 }
 
-static void set_sense(u8 *sense_buf, u8 sense_key, u8 sense_code,
-		      u8 a_sense_code, u8 incorrect_length,
-		      u8 bit_pointer, u16 field_pointer,
-		      u32 residue)
+static inline void set_sense(struct sense_data *sense_data, u8 sense_key,
+	u8 sense_code, u8 a_sense_code, u8 bit_pointer, u16 field_pointer)
 {
-	sense_buf[0] = 0xF0;	/* Sense data valid, err code 70h (current error) */
+	u8 *sense_buf = (u8 *)sense_data;
+	/* Sense data valid, err code 70h */
+	sense_buf[0] = 0x70; /* No info field */
 	sense_buf[1] = 0;	/* Segment number, always zero */
 
-	if (incorrect_length) {
-		sense_buf[2] = sense_key | 0x20;/* Set ILI bit | sense key */
-		sense_buf[3] = BYTE3(residue);
-		sense_buf[4] = BYTE2(residue);
-		sense_buf[5] = BYTE1(residue);
-		sense_buf[6] = BYTE0(residue);
-	} else
-		sense_buf[2] = sense_key;	/* Sense key */
-
-	if (sense_key == ILLEGAL_REQUEST)
-		sense_buf[7] = 10;	/* Additional sense length */
-	else
-		sense_buf[7] = 6;	/* Additional sense length */
+	sense_buf[2] = sense_key;	/* Sense key */
 
 	sense_buf[12] = sense_code;	/* Additional sense code */
 	sense_buf[13] = a_sense_code;	/* Additional sense code qualifier */
+
 	if (sense_key == ILLEGAL_REQUEST) {
-		sense_buf[15] = 0;
+		sense_buf[7] = 10;	/* Additional sense length */
 
-		if (sense_code == SENCODE_INVALID_PARAM_FIELD)
-			sense_buf[15] = 0x80;/* Std sense key specific field */
+		sense_buf[15] = bit_pointer;
 		/* Illegal parameter is in the parameter block */
-
 		if (sense_code == SENCODE_INVALID_CDB_FIELD)
-			sense_buf[15] = 0xc0;/* Std sense key specific field */
+			sense_buf[15] |= 0xc0;/* Std sense key specific field */
 		/* Illegal parameter is in the CDB block */
-		sense_buf[15] |= bit_pointer;
 		sense_buf[16] = field_pointer >> 8;	/* MSB */
 		sense_buf[17] = field_pointer;		/* LSB */
-	}
+	} else
+		sense_buf[7] = 6;	/* Additional sense length */
 }
 
 static int aac_bounds_32(struct aac_dev * dev, struct scsi_cmnd * cmd, u64 lba)
@@ -906,11 +893,9 @@
 		dprintk((KERN_DEBUG "aacraid: Illegal lba\n"));
 		cmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 |
 			SAM_STAT_CHECK_CONDITION;
-		set_sense((u8 *) &dev->fsa_dev[cid].sense_data,
-			    HARDWARE_ERROR,
-			    SENCODE_INTERNAL_TARGET_FAILURE,
-			    ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0,
-			    0, 0);
+		set_sense(&dev->fsa_dev[cid].sense_data,
+		  HARDWARE_ERROR, SENCODE_INTERNAL_TARGET_FAILURE,
+		  ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0);
 		memcpy(cmd->sense_buffer, &dev->fsa_dev[cid].sense_data,
 		       min_t(size_t, sizeof(dev->fsa_dev[cid].sense_data),
 			     SCSI_SENSE_BUFFERSIZE));
@@ -1520,11 +1505,9 @@
 		  le32_to_cpu(readreply->status));
 #endif
 		scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_CHECK_CONDITION;
-		set_sense((u8 *) &dev->fsa_dev[cid].sense_data,
-				    HARDWARE_ERROR,
-				    SENCODE_INTERNAL_TARGET_FAILURE,
-				    ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0,
-				    0, 0);
+		set_sense(&dev->fsa_dev[cid].sense_data,
+		  HARDWARE_ERROR, SENCODE_INTERNAL_TARGET_FAILURE,
+		  ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0);
 		memcpy(scsicmd->sense_buffer, &dev->fsa_dev[cid].sense_data,
 		       min_t(size_t, sizeof(dev->fsa_dev[cid].sense_data),
 			     SCSI_SENSE_BUFFERSIZE));
@@ -1733,11 +1716,9 @@
 		     le32_to_cpu(synchronizereply->status));
 		cmd->result = DID_OK << 16 |
 			COMMAND_COMPLETE << 8 | SAM_STAT_CHECK_CONDITION;
-		set_sense((u8 *)&dev->fsa_dev[cid].sense_data,
-				    HARDWARE_ERROR,
-				    SENCODE_INTERNAL_TARGET_FAILURE,
-				    ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0,
-				    0, 0);
+		set_sense(&dev->fsa_dev[cid].sense_data,
+		  HARDWARE_ERROR, SENCODE_INTERNAL_TARGET_FAILURE,
+		  ASENCODE_INTERNAL_TARGET_FAILURE, 0, 0);
 		memcpy(cmd->sense_buffer, &dev->fsa_dev[cid].sense_data,
 		       min_t(size_t, sizeof(dev->fsa_dev[cid].sense_data),
 			     SCSI_SENSE_BUFFERSIZE));
@@ -1945,10 +1926,9 @@
 	{
 		dprintk((KERN_WARNING "Only INQUIRY & TUR command supported for controller, rcvd = 0x%x.\n", scsicmd->cmnd[0]));
 		scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_CHECK_CONDITION;
-		set_sense((u8 *) &dev->fsa_dev[cid].sense_data,
-			    ILLEGAL_REQUEST,
-			    SENCODE_INVALID_COMMAND,
-			    ASENCODE_INVALID_COMMAND, 0, 0, 0, 0);
+		set_sense(&dev->fsa_dev[cid].sense_data,
+		  ILLEGAL_REQUEST, SENCODE_INVALID_COMMAND,
+		  ASENCODE_INVALID_COMMAND, 0, 0);
 		memcpy(scsicmd->sense_buffer, &dev->fsa_dev[cid].sense_data,
 		       min_t(size_t, sizeof(dev->fsa_dev[cid].sense_data),
 			     SCSI_SENSE_BUFFERSIZE));
@@ -1995,10 +1975,9 @@
 				scsicmd->result = DID_OK << 16 |
 				  COMMAND_COMPLETE << 8 |
 				  SAM_STAT_CHECK_CONDITION;
-				set_sense((u8 *) &dev->fsa_dev[cid].sense_data,
-				  ILLEGAL_REQUEST,
-				  SENCODE_INVALID_CDB_FIELD,
-				  ASENCODE_NO_SENSE, 0, 7, 2, 0);
+				set_sense(&dev->fsa_dev[cid].sense_data,
+				  ILLEGAL_REQUEST, SENCODE_INVALID_CDB_FIELD,
+				  ASENCODE_NO_SENSE, 7, 2);
 				memcpy(scsicmd->sense_buffer,
 				  &dev->fsa_dev[cid].sense_data,
 				  min_t(size_t,
@@ -2254,9 +2233,9 @@
 			 */
 			dprintk((KERN_WARNING "Unhandled SCSI Command: 0x%x.\n", scsicmd->cmnd[0]));
 			scsicmd->result = DID_OK << 16 | COMMAND_COMPLETE << 8 | SAM_STAT_CHECK_CONDITION;
-			set_sense((u8 *) &dev->fsa_dev[cid].sense_data,
-				ILLEGAL_REQUEST, SENCODE_INVALID_COMMAND,
-				ASENCODE_INVALID_COMMAND, 0, 0, 0, 0);
+			set_sense(&dev->fsa_dev[cid].sense_data,
+			  ILLEGAL_REQUEST, SENCODE_INVALID_COMMAND,
+			  ASENCODE_INVALID_COMMAND, 0, 0);
 			memcpy(scsicmd->sense_buffer, &dev->fsa_dev[cid].sense_data,
 				min_t(size_t,
 				      sizeof(dev->fsa_dev[cid].sense_data),

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/1] aacraid: do not set valid bit in sense information
  2008-02-06 21:54       ` [PATCH 1/1] aacraid: do not set valid bit in sense information Salyzyn, Mark
@ 2008-02-06 22:19         ` Luben Tuikov
  0 siblings, 0 replies; 15+ messages in thread
From: Luben Tuikov @ 2008-02-06 22:19 UTC (permalink / raw)
  To: linux-scsi@vger.kernel.org, Salyzyn, Mark; +Cc: Tony Battersby, Mike Snitzer

--- On Wed, 2/6/08, Salyzyn, Mark <Mark_Salyzyn@adaptec.com> wrote:

> From: Salyzyn, Mark <Mark_Salyzyn@adaptec.com>
> Subject: [PATCH 1/1] aacraid: do not set valid bit in sense information
> To: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
> Cc: "Tony Battersby" <tonyb@cybernetics.com>, "ltuikov@yahoo.com" <ltuikov@yahoo.com>, "Mike Snitzer" <snitzer@gmail.com>
> Date: Wednesday, February 6, 2008, 1:54 PM
> Luben Tuikov [mailto:ltuikov@yahoo.com] sez:
> > Just as in your case and Tony's case, which I
> presume
> > uses the same RAID firmware vendor, it would've
> > probably been better if the RAID firmware vendor
> > fixed the firmware to not set the VALID bit if the
> > INFORMATION field is not valid.
> 
> Point taken regarding the aacraid driver. Dropped the VALID
> bit, and then did some cleanup/simplification of the
> set_sense procedure and the associated parameters. Mike did
> some preliminary tests when the VALID bit was dropped before
> the 'Re: [PATCH] [SCSI] sd: make error handling more
> robust' patches came on the scene. The change in the
> SCSI subsystem does make this enclosed aacraid patch
> unnecessary, so this aacraid patch is merely post battle
> ground cleanup. If the simplification is an issue,
> repugnant, too much for a back-port to the stable trees or
> clouds the point, this patch could be happily distilled
> down to:

Sounds good.

   Luben

> 
> diff -ru a/drivers/scsi/aacraid/aachba.c
> b/drivers/scsi/aacraid/aachba.c
> --- a/drivers/scsi/aacraid/aachba.c     2008-02-06
> 16:26:45.834938955 -0500
> +++ b/drivers/scsi/aacraid/aachba.c     2008-02-06
> 16:32:01.109035329 -0500
> @@ -865,7 +865,7 @@
>                          u32 residue)
>  {
> -        sense_buf[0] = 0xF0;    /* Sense data valid, err
> code 70h (current error) */
> +        sense_buf[0] = 0x70;    /* Sense data invalid, err
> code 70h (current error) */
>          sense_buf[1] = 0;       /* Segment number, always
> zero */
> 
>          if (incorrect_length) {
> 
> This patch is against current scsi-misc-2.6
> 
> ObligatoryDisclaimer: Please accept my condolences
> regarding Outlook's handling of patch attachments.
> Please use the attached patch and not any inline content.
> 
> Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
> 
>  drivers/scsi/aacraid/aachba.c |   81
> +++++++++++++++---------------------------
>  1 file changed, 30 insertions(+), 51 deletions(-)
> 
> Sincerely -- Mark Salyzyn

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-02-06 22:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-01 17:03 [PATCH] [SCSI] sd: make error handling more robust (v2) Tony Battersby
2008-02-01 20:13 ` Luben Tuikov
2008-02-01 20:47 ` Salyzyn, Mark
2008-02-02  0:43   ` Mike Snitzer
2008-02-02 20:24     ` Greg KH
2008-02-02 20:56       ` Mike Snitzer
2008-02-02 21:08         ` Greg KH
2008-02-02 22:06 ` James Bottomley
2008-02-03 15:14   ` Mike Snitzer
2008-02-04  9:14     ` Luben Tuikov
2008-02-06 21:54       ` [PATCH 1/1] aacraid: do not set valid bit in sense information Salyzyn, Mark
2008-02-06 22:19         ` Luben Tuikov
2008-02-04  9:11   ` [PATCH] [SCSI] sd: make error handling more robust (v2) Luben Tuikov
2008-02-04 21:22     ` James Bottomley
2008-02-04 23:23       ` Luben Tuikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).