[PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
@ 2006-04-26 20:27 Mark Lord
  2006-04-26 20:52 ` Mark Lord
  2006-04-26 22:56 ` James Bottomley
  0 siblings, 2 replies; 14+ messages in thread
From: Mark Lord @ 2006-04-26 20:27 UTC (permalink / raw)
  To: linux-scsi, linux-kernel, Andrew Morton

From: Mark Lord <lkml@rtr.ca>

I am looking into how SCSI/SATA handle medium (disk) errors,
and the observed behaviour is a little more random than expected,
due to a bug in sd.c.

When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
value of first_err_block.  But sd_rw_intr() merrily continues to use that
variable regardless, possibly making incorrect decisions about retries and the like.

This patch removes the randomness there, by using the first sector of the
request (SCpnt->request->sector) in such cases, instead of first_err_block.

The patch shows more context than usual, to help see what's going on.

Signed-off-by: Mark Lord <lkml@rtr.ca>
---
--- linux-2.6.17-rc2-git8/drivers/scsi/sd.c	2006-04-26 15:36:12.000000000 -0400
+++ linux/drivers/scsi/sd.c	2006-04-26 16:07:39.000000000 -0400
@@ -930,64 +930,66 @@
 			info_valid = scsi_get_sense_info_fld(
 				SCpnt->sense_buffer, SCSI_SENSE_BUFFERSIZE,
 				&first_err_block);
 			/*
 			 * May want to warn and skip if following cast results
 			 * in actual truncation (if sector_t < 64 bits)
 			 */
 			error_sector = (sector_t)first_err_block;
 			if (SCpnt->request->bio != NULL)
 				block_sectors = bio_sectors(SCpnt->request->bio);
 			switch (SCpnt->device->sector_size) {
 			case 1024:
 				error_sector <<= 1;
 				if (block_sectors < 2)
 					block_sectors = 2;
 				break;
 			case 2048:
 				error_sector <<= 2;
 				if (block_sectors < 4)
 					block_sectors = 4;
 				break;
 			case 4096:
 				error_sector <<=3;
 				if (block_sectors < 8)
 					block_sectors = 8;
 				break;
 			case 256:
 				error_sector >>= 1;
 				break;
 			default:
 				break;
 			}
+			if (!info_valid)
+				error_sector = SCpnt->request->sector;

 			error_sector &= ~(block_sectors - 1);
 			good_bytes = (error_sector - SCpnt->request->sector) << 9;
 			if (good_bytes < 0 || good_bytes >= this_count)
 				good_bytes = 0;
 			break;

 		case RECOVERED_ERROR: /* an error occurred, but it recovered */
 		case NO_SENSE: /* LLDD got sense data */
 			/*
 			 * Inform the user, but make sure that it's not treated
 			 * as a hard error.
 			 */
 			scsi_print_sense("sd", SCpnt);
 			SCpnt->result = 0;
 			memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
 			good_bytes = this_count;
 			break;

 		case ILLEGAL_REQUEST:
 			if (SCpnt->device->use_10_for_rw &&
 			    (SCpnt->cmnd[0] == READ_10 ||
 			     SCpnt->cmnd[0] == WRITE_10))
 				SCpnt->device->use_10_for_rw = 0;
 			if (SCpnt->device->use_10_for_ms &&
 			    (SCpnt->cmnd[0] == MODE_SENSE_10 ||
 			     SCpnt->cmnd[0] == MODE_SELECT_10))
 				SCpnt->device->use_10_for_ms = 0;
 			break;

 		default:
 			break;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 20:27 [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors Mark Lord
@ 2006-04-26 20:52 ` Mark Lord
  2006-04-26 22:56 ` James Bottomley
  1 sibling, 0 replies; 14+ messages in thread
From: Mark Lord @ 2006-04-26 20:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-scsi, linux-kernel

Mark Lord wrote:
>
>When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
>value of first_err_block.  But sd_rw_intr() merrily continues to use that
>variable regardless, possibly making incorrect decisions about retries and the like.
>
>This patch removes the randomness there, by using the first sector of the
>request (SCpnt->request->sector) in such cases, instead of first_err_block.

Note that this bug has been around for a while, and is also present in 2.6.16.xx.
This same patch applies there too.

Cheers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 20:27 [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors Mark Lord
  2006-04-26 20:52 ` Mark Lord
@ 2006-04-26 22:56 ` James Bottomley
  2006-04-26 23:04   ` Mark Lord
  2006-04-26 23:14   ` Andrew Morton
  1 sibling, 2 replies; 14+ messages in thread
From: James Bottomley @ 2006-04-26 22:56 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-scsi, linux-kernel, Andrew Morton

On Wed, 2006-04-26 at 16:27 -0400, Mark Lord wrote:
> From: Mark Lord <lkml@rtr.ca>
> 
> I am looking into how SCSI/SATA handle medium (disk) errors,
> and the observed behaviour is a little more random than expected,
> due to a bug in sd.c.
> 
> When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
> value of first_err_block.  But sd_rw_intr() merrily continues to use that
> variable regardless, possibly making incorrect decisions about retries and the like.
> 
> This patch removes the randomness there, by using the first sector of the
> request (SCpnt->request->sector) in such cases, instead of first_err_block.
> 
> The patch shows more context than usual, to help see what's going on.

Thanks for finding the bug.  Your solution is a bit, um, convoluted.
What it should really be doing if we find no valid information field is
a break so we go out with the default good_sectors of zero (rather than
arriving at that value via a circuitous route).

And, of course, I couldn't resist eliminating the superfluous info_valid
variable and tidying the logic to be programmatic instead of a switch
case.  How does this work?

James

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c647d85..fff423e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -894,12 +894,10 @@ static void sd_rw_intr(struct scsi_cmnd 
 	int this_count = SCpnt->bufflen;
 	int good_bytes = (result == 0 ? this_count : 0);
 	sector_t block_sectors = 1;
-	u64 first_err_block;
 	sector_t error_sector;
 	struct scsi_sense_hdr sshdr;
 	int sense_valid = 0;
 	int sense_deferred = 0;
-	int info_valid;
 
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(SCpnt, &sshdr);
@@ -923,43 +921,44 @@ #endif
 	 */
 	if (driver_byte(result) != 0 &&
 		 sense_valid && !sense_deferred) {
+		u64 first_err_block;
+		int sector_size;
+
 		switch (sshdr.sense_key) {
 		case MEDIUM_ERROR:
 			if (!blk_fs_request(SCpnt->request))
 				break;
-			info_valid = scsi_get_sense_info_fld(
-				SCpnt->sense_buffer, SCSI_SENSE_BUFFERSIZE,
-				&first_err_block);
-			/*
-			 * May want to warn and skip if following cast results
-			 * in actual truncation (if sector_t < 64 bits)
-			 */
-			error_sector = (sector_t)first_err_block;
-			if (SCpnt->request->bio != NULL)
-				block_sectors = bio_sectors(SCpnt->request->bio);
-			switch (SCpnt->device->sector_size) {
-			case 1024:
-				error_sector <<= 1;
-				if (block_sectors < 2)
-					block_sectors = 2;
-				break;
-			case 2048:
-				error_sector <<= 2;
-				if (block_sectors < 4)
-					block_sectors = 4;
-				break;
-			case 4096:
-				error_sector <<=3;
-				if (block_sectors < 8)
-					block_sectors = 8;
-				break;
-			case 256:
-				error_sector >>= 1;
-				break;
-			default:
+			if (!scsi_get_sense_info_fld(SCpnt->sense_buffer,
+						    SCSI_SENSE_BUFFERSIZE,
+						    &first_err_block))
 				break;
+
+			sector_size = SCpnt->device->sector_size / 512;
+
+			if (SCpnt->request->bio != NULL)
+				block_sectors = 
+					bio_sectors(SCpnt->request->bio);
+			if (block_sectors < sector_size)
+				block_sectors = sector_size;
+
+			error_sector = (sector_t)first_err_block;
+			/* 
+			 * convert from physical sector size to
+			 * standard 512 byte sized sectors
+			 */
+			if (sector_size)
+				error_sector *= sector_size;
+			else {
+				int sector_size_div =
+					512 / SCpnt->device->sector_size;
+				error_sector /= sector_size_div;
 			}
 
+			/*
+			 * if the device has a larger logical than
+			 * physical sector size, round down to the
+			 * beginning of a logical sector
+			 */
 			error_sector &= ~(block_sectors - 1);
 			good_bytes = (error_sector - SCpnt->request->sector) << 9;
 			if (good_bytes < 0 || good_bytes >= this_count)



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 22:56 ` James Bottomley
@ 2006-04-26 23:04   ` Mark Lord
  2006-04-26 23:18     ` James Bottomley
  2006-04-26 23:14   ` Andrew Morton
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Lord @ 2006-04-26 23:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel, Andrew Morton

James Bottomley wrote:
> On Wed, 2006-04-26 at 16:27 -0400, Mark Lord wrote:
>> From: Mark Lord <lkml@rtr.ca>
..
>> When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
>> value of first_err_block.  But sd_rw_intr() merrily continues to use that
>> variable regardless, possibly making incorrect decisions about retries and the like.
>>
>> This patch removes the randomness there, by using the first sector of the
>> request (SCpnt->request->sector) in such cases, instead of first_err_block.
..
> Thanks for finding the bug.  Your solution is a bit, um, convoluted.

Heh, no, that's the original sd.c.  I just tried to keep the patch
as tiny as possible to show what is needed.  You are welcome to add
complexity to it, though.  The surrounding code is, uhm, rather interesting.

> What it should really be doing if we find no valid information field is
> a break so we go out with the default good_sectors of zero (rather than
> arriving at that value via a circuitous route).

Yes, but the difficulty there is that all of the convoluted logic
seems to still be wanted to set a correct "block_sectors" value,
needed as a parameter on the final call:

    scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);

>How does this work?
...

Looks potentially buggy (still).

Cheers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 22:56 ` James Bottomley
  2006-04-26 23:04   ` Mark Lord
@ 2006-04-26 23:14   ` Andrew Morton
  2006-04-26 23:20     ` Mark Lord
  2006-04-26 23:22     ` James Bottomley
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Morton @ 2006-04-26 23:14 UTC (permalink / raw)
  To: James Bottomley; +Cc: lkml, linux-scsi, linux-kernel

James Bottomley <James.Bottomley@SteelEye.com> wrote:
>
> On Wed, 2006-04-26 at 16:27 -0400, Mark Lord wrote:
> > From: Mark Lord <lkml@rtr.ca>
> > 
> > I am looking into how SCSI/SATA handle medium (disk) errors,
> > and the observed behaviour is a little more random than expected,
> > due to a bug in sd.c.
> > 
> > When scsi_get_sense_info_fld() fails (returns 0), it does NOT update the
> > value of first_err_block.  But sd_rw_intr() merrily continues to use that
> > variable regardless, possibly making incorrect decisions about retries and the like.
> > 
> > This patch removes the randomness there, by using the first sector of the
> > request (SCpnt->request->sector) in such cases, instead of first_err_block.
> > 
> > The patch shows more context than usual, to help see what's going on.
> 
> Thanks for finding the bug.  Your solution is a bit, um, convoluted.
> What it should really be doing if we find no valid information field is
> a break so we go out with the default good_sectors of zero (rather than
> arriving at that value via a circuitous route).
> 
> And, of course, I couldn't resist eliminating the superfluous info_valid
> variable and tidying the logic to be programmatic instead of a switch
> case.  How does this work?

It'd be nice to have something simple-and-obvious for the
simple-and-obvious -stable maintainers.  That's if we think -stable needs
this fixed.

> +				int sector_size_div =
> +					512 / SCpnt->device->sector_size;
> +				error_sector /= sector_size_div;

You sure about this bit?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:04   ` Mark Lord
@ 2006-04-26 23:18     ` James Bottomley
  2006-04-26 23:28       ` Mark Lord
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2006-04-26 23:18 UTC (permalink / raw)
  To: Mark Lord; +Cc: linux-scsi, linux-kernel, Andrew Morton

On Wed, 2006-04-26 at 19:04 -0400, Mark Lord wrote:
> Yes, but the difficulty there is that all of the convoluted logic
> seems to still be wanted to set a correct "block_sectors" value,
> needed as a parameter on the final call:
> 
>     scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);

Erm, but that's only used for volume overflow (or other single sector
errors), which this isn't ... Actually, as far as I can tell, the whole
block_sectors calculation can be killed as well.

> >How does this work?
> ...
> 
> Looks potentially buggy (still).

Where?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:14   ` Andrew Morton
@ 2006-04-26 23:20     ` Mark Lord
  2006-04-26 23:35       ` Andrew Morton
  2006-04-26 23:22     ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: Mark Lord @ 2006-04-26 23:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James Bottomley, linux-scsi, linux-kernel

Andrew Morton wrote:
> 
> It'd be nice to have something simple-and-obvious for the
> simple-and-obvious -stable maintainers.

That's why I kept the original patch very simple and focused,
rather than trying to fix all of the convoluted code around it.

It's nice and simple, and *looks* correct.

A longer term cleanup of that function is better left to James!

> That's if we think -stable needs this fixed.

Let's say a bunch of read bio's get coalesced into a single
200+ sector request.  This then fails on one single bad sector
out of the 200+.  Without the patch, there is a very good chance
that sd.c will simply fail the entire request, all 200+ sectors.

With the patch, it will fail the first block, and then retry
the remaining blocks.  And repeat this until something works,
or until everything has failed one by one.

Better, but still not the best.

What I need to have happen when a request is failed due to bad-media,
is have it split the request into a sequence of single-block requests
that are passed to the LLD one at a time.  The ones with real bad
sectors will then be independently failed, and the rest will get done.

Much better.  Much more complex.

I'm thinking about something like that, just not sure whether to put it
(initially) in libata, sd.c, or the block layer.

Cheers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:14   ` Andrew Morton
  2006-04-26 23:20     ` Mark Lord
@ 2006-04-26 23:22     ` James Bottomley
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2006-04-26 23:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-scsi, linux-kernel

On Wed, 2006-04-26 at 16:14 -0700, Andrew Morton wrote:
> It'd be nice to have something simple-and-obvious for the
> simple-and-obvious -stable maintainers.  That's if we think -stable needs
> this fixed.

Well .. the original will do for that.  

> > +				int sector_size_div =
> > +					512 / SCpnt->device->sector_size;
> > +				error_sector /= sector_size_div;
> 
> You sure about this bit?

Yes.  If the <hardware sector size> is < 512 bytes then to convert the
listed error sector to the standard 512 byte sector size block index, we
have to divide (512/<hardware sector size>).

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:18     ` James Bottomley
@ 2006-04-26 23:28       ` Mark Lord
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Lord @ 2006-04-26 23:28 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, linux-kernel, Andrew Morton

James Bottomley wrote:
> On Wed, 2006-04-26 at 19:04 -0400, Mark Lord wrote:
>> Yes, but the difficulty there is that all of the convoluted logic
>> seems to still be wanted to set a correct "block_sectors" value,
>> needed as a parameter on the final call:
>>
>>     scsi_io_completion(SCpnt, good_bytes, block_sectors << 9);
> 
> Erm, but that's only used for volume overflow (or other single sector
> errors), which this isn't ... Actually, as far as I can tell, the whole
> block_sectors calculation can be killed as well.

I wonder if it can be done away with as a parameter for scsi_io_completion() ?
If not, then we'll need some nice big comments in that function to warn
against relying on a valid value for that parameter in specific cases.

Cheers

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:20     ` Mark Lord
@ 2006-04-26 23:35       ` Andrew Morton
  2006-04-27  1:43         ` Mark Lord
                           ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andrew Morton @ 2006-04-26 23:35 UTC (permalink / raw)
  To: Mark Lord; +Cc: James.Bottomley, linux-scsi, linux-kernel

Mark Lord <lkml@rtr.ca> wrote:
>
> > That's if we think -stable needs this fixed.
> 
> Let's say a bunch of read bio's get coalesced into a single
> 200+ sector request.  This then fails on one single bad sector
> out of the 200+.  Without the patch, there is a very good chance
> that sd.c will simply fail the entire request, all 200+ sectors.
> 
> With the patch, it will fail the first block, and then retry
> the remaining blocks.  And repeat this until something works,
> or until everything has failed one by one.

Yowch.  I have the feeling that this'll take our EIO-handling time from
far-too-long to far-too-long*200.

I am still traumatised by my recent ten-minute wait for a dodgy DVD to
become ejectable.

I don't think -stable needs this, personally.

> Better, but still not the best.
> 
> What I need to have happen when a request is failed due to bad-media,
> is have it split the request into a sequence of single-block requests
> that are passed to the LLD one at a time.  The ones with real bad
> sectors will then be independently failed, and the rest will get done.
> 
> Much better.  Much more complex.
> 
> I'm thinking about something like that, just not sure whether to put it
> (initially) in libata, sd.c, or the block layer.

block, I suspect.  My DVD trauma was IDE-induced.  Jens is mulling the
problem - I'd suggest you coordinate with him.

It would be a good thing to fix.

It's moderately hard to test, though.  Easy enough for DVDs and CDs, but
it's harder to take a marker pen to a hard drive.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:35       ` Andrew Morton
@ 2006-04-27  1:43         ` Mark Lord
  2006-04-27  9:28         ` Rogier Wolff
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Mark Lord @ 2006-04-27  1:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James.Bottomley, linux-scsi, linux-kernel, Jens Axboe

Andrew Morton wrote:
> Mark Lord <lkml@rtr.ca> wrote:
>>> That's if we think -stable needs this fixed.
>> Let's say a bunch of read bio's get coalesced into a single
>> 200+ sector request.  This then fails on one single bad sector
>> out of the 200+.  Without the patch, there is a very good chance
>> that sd.c will simply fail the entire request, all 200+ sectors.
>>
>> With the patch, it will fail the first block, and then retry
>> the remaining blocks.  And repeat this until something works,
>> or until everything has failed one by one.
> 
> Yowch.  I have the feeling that this'll take our EIO-handling time from
> far-too-long to far-too-long*200.

That's how it always used to work (eg. SUSE9 2.6.5+ kernels; Jens?).

>> What I need to have happen when a request is failed due to bad-media,
>> is have it split the request into a sequence of single-block requests
>> that are passed to the LLD one at a time.  The ones with real bad
>> sectors will then be independently failed, and the rest will get done.
>>
>> Much better.  Much more complex.
>>
>> I'm thinking about something like that, just not sure whether to put it
>> (initially) in libata, sd.c, or the block layer.
> 
> block, I suspect.  My DVD trauma was IDE-induced.  Jens is mulling the
> problem - I'd suggest you coordinate with him.

I've been pinging Jens about it for a couple of weeks now; no response.

> It would be a good thing to fix.
> 
> It's moderately hard to test, though.  Easy enough for DVDs and CDs, but
> it's harder to take a marker pen to a hard drive.

I have a bunch of "pre-marked" SATA drives here just for the purpose..

Cheers


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:35       ` Andrew Morton
  2006-04-27  1:43         ` Mark Lord
@ 2006-04-27  9:28         ` Rogier Wolff
  2006-04-27  9:37         ` Avi Kivity
  2006-04-27 16:03         ` Mark Lord
  3 siblings, 0 replies; 14+ messages in thread
From: Rogier Wolff @ 2006-04-27  9:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mark Lord, James.Bottomley, linux-scsi, linux-kernel

On Wed, Apr 26, 2006 at 04:35:36PM -0700, Andrew Morton wrote:
> Mark Lord <lkml@rtr.ca> wrote:
> >
> > > That's if we think -stable needs this fixed.
> > 
> > Let's say a bunch of read bio's get coalesced into a single
> > 200+ sector request.  This then fails on one single bad sector
> > out of the 200+.  Without the patch, there is a very good chance
> > that sd.c will simply fail the entire request, all 200+ sectors.
> > 
> > With the patch, it will fail the first block, and then retry
> > the remaining blocks.  And repeat this until something works,
> > or until everything has failed one by one.
> 
> Yowch.  I have the feeling that this'll take our EIO-handling time from
> far-too-long to far-too-long*200.

Whoa! Can't get worse. 

Hmm. OK. Maybe it can. It seems that IDE at least used to do the
"*200" behaviour.

I now issue a read for (1k-)block 1024 on my disk, which has blocks
1024 ... 1536 being bad. Kernel has decided i'm doing linear reads, so
it will issue a readahead for 1024-1536. 

I then don't want to have the kernel continue the 1025-1536 readahead.

Also, if just block 1027 is bad, a normal read of block 1024 will read
the page 1024-1027, and return EIO on 1024 because it couldn't read
the whole page. We've learned to live with this, but if you guys are
going to look into it, keep this in mind, please.... 

> It's moderately hard to test, though.  Easy enough for DVDs and CDs,
> but it's harder to take a marker pen to a hard drive.

I (accidentally) scratched a SCSI drive with a paperclip some weeks
ago. 65 bad blocks. (36 files hurt). I did manage to use the paperclip
to push the metal part that had fallen off the inside of the drive
somewhere to a spot where I could remove it. That metal part was
preventing the head from leaving the parking zone. 

Anyway, I have enough disks with bad blocks that I can make the
following offer: Serious kernel developpers that need a
drive-with-bad-blocks for testing should submit their snail-mail
address. I'll see what I can do. (Specify SCSI or IDE or both). 

	Roger Wolff
	R.E.Wolff@harddisk-recovery.nl

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement. 
Does it sit on the couch all day? Is it unemployed? Please be specific! 
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:35       ` Andrew Morton
  2006-04-27  1:43         ` Mark Lord
  2006-04-27  9:28         ` Rogier Wolff
@ 2006-04-27  9:37         ` Avi Kivity
  2006-04-27 16:03         ` Mark Lord
  3 siblings, 0 replies; 14+ messages in thread
From: Avi Kivity @ 2006-04-27  9:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Mark Lord, James.Bottomley, linux-scsi, linux-kernel

Andrew Morton wrote:
> block, I suspect.  My DVD trauma was IDE-induced.  Jens is mulling the
> problem - I'd suggest you coordinate with him.
>
> It would be a good thing to fix.
>
> It's moderately hard to test, though.  Easy enough for DVDs and CDs, but
> it's harder to take a marker pen to a hard drive.
>   
If it's in the block layer, perhaps using a device mapper error target 
can help in testing. One could write a table to map an entire partition 
to a logical volume, then replace it with a table that maps a few 
sectors to the error target.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors
  2006-04-26 23:35       ` Andrew Morton
                           ` (2 preceding siblings ...)
  2006-04-27  9:37         ` Avi Kivity
@ 2006-04-27 16:03         ` Mark Lord
  3 siblings, 0 replies; 14+ messages in thread
From: Mark Lord @ 2006-04-27 16:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James.Bottomley, linux-scsi, linux-kernel

Andrew Morton wrote:
> Mark Lord <lkml@rtr.ca> wrote:
>>> That's if we think -stable needs this fixed.
>> Let's say a bunch of read bio's get coalesced into a single
>> 200+ sector request.  This then fails on one single bad sector
>> out of the 200+.  Without the patch, there is a very good chance
>> that sd.c will simply fail the entire request, all 200+ sectors.
>>
>> With the patch, it will fail the first block, and then retry
>> the remaining blocks.  And repeat this until something works,
>> or until everything has failed one by one.
> 
> Yowch.  I have the feeling that this'll take our EIO-handling time from
> far-too-long to far-too-long*200.
> 
> I am still traumatised by my recent ten-minute wait for a dodgy DVD to
> become ejectable.
> 
> I don't think -stable needs this, personally.

Perhaps, perhaps not.  The current behaviour is semi *random*, though.
Sometimes it may just fail the entire request (wrong!),
sometimes it may do the (almost as wrong) fail a block at a time
from the beginning, until the bad sector is passed, and then succeed
on the remainder.

Ugh.
>> What I need to have happen when a request is failed due to bad-media,
>> is have it split the request into a sequence of single-block requests
>> that are passed to the LLD one at a time.  The ones with real bad
>> sectors will then be independently failed, and the rest will get done.
>>
>> Much better.  Much more complex.

-- 
Mark Lord
Real-Time Remedies Inc.
mlord@pobox.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2006-04-27 16:03 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-26 20:27 [PATCH] drivers/scsi/sd.c: fix uninitialized variable in handling medium errors Mark Lord
2006-04-26 20:52 ` Mark Lord
2006-04-26 22:56 ` James Bottomley
2006-04-26 23:04   ` Mark Lord
2006-04-26 23:18     ` James Bottomley
2006-04-26 23:28       ` Mark Lord
2006-04-26 23:14   ` Andrew Morton
2006-04-26 23:20     ` Mark Lord
2006-04-26 23:35       ` Andrew Morton
2006-04-27  1:43         ` Mark Lord
2006-04-27  9:28         ` Rogier Wolff
2006-04-27  9:37         ` Avi Kivity
2006-04-27 16:03         ` Mark Lord
2006-04-26 23:22     ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox