Time spent waiting for uncorrectable errors

linux-ide.vger.kernel.org archive mirror

* Time spent waiting for uncorrectable errors
From: Alex Goller @ 2004-02-16  8:21 UTC
  To: linux-ide

Hi,

Is there any data on how long current disks keep trying to read a
sector before giving up with an uncorrectable error? The problem is
that I have no reliable way to reproduce the error. I will try to read
from a hopefully (!) broken disk this afternoon and measure the time
it takes.

bye, alex
-- 
alexander goller		alex@vivien.franken.de

* Re: Time spent waiting for uncorrectable errors
From: Hans-Peter Jansen @ 2004-02-19 15:13 UTC
  To: Alex Goller, linux-ide

On Monday 16 February 2004 09:21, Alex Goller wrote:
> Hi,
>
> Is there any data on how long current disks keep trying to read a
> sector before giving up with an uncorrectable error? The problem is
> that I have no reliable way to reproduce the error. I will try to
> read from a hopefully (!) broken disk this afternoon and measure
> the time it takes.

IIRC, the kernel retries a read of a defective block exactly 8 times.

The problem is that (according to some Maxtor guy) a drive that
returns a hard sector error has already tried to read it internally a
few thousand times (~2650), so the 8 kernel retries result in about
8 x 2650 = 21200 physical read attempts.
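
To make the arithmetic explicit, here is a minimal sketch in C. The
function name is made up for illustration; the 8 and ~2650 are just the
figures recalled above, not measured values.

```c
/*
 * Hypothetical helper, not kernel code: worst-case number of physical
 * read attempts for one bad sector, assuming every driver-level retry
 * makes the drive run its full internal retry sequence again.
 */
unsigned long physical_retries(unsigned long kernel_retries,
                               unsigned long drive_retries)
{
    return kernel_retries * drive_retries;
}
```

With the numbers from this mail, physical_retries(8, 2650) gives 21200.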

Unfortunately this renders an unpatched linux kernel useless for data 
recovery tasks. Well, not useless in general, but you simply need a 
_lot_ of patience.

Last time I did it myself, it took about 40 hours to copy a
defective 80 GB HD with dd_rescue. Fortunately, the damage was in
some mpeg2 streams, which are quite robust in handling long runs of
zeros ;-)..

Pete

* Re: Time spent waiting for uncorrectable errors
From: Bartlomiej Zolnierkiewicz @ 2004-02-19 15:39 UTC
  To: Hans-Peter Jansen; +Cc: Alex Goller, linux-ide

On Thursday 19 of February 2004 16:13, Hans-Peter Jansen wrote:
> On Monday 16 February 2004 09:21, Alex Goller wrote:
> > Hi,
> >
> > Is there any data on how long current disks keep trying to read a
> > sector before giving up with an uncorrectable error? The problem is
> > that I have no reliable way to reproduce the error. I will try to
> > read from a hopefully (!) broken disk this afternoon and measure
> > the time it takes.
>
> IIRC, the kernel retries a read of a defective block exactly 8 times.
>
> The problem is that (according to some Maxtor guy) a drive that
> returns a hard sector error has already tried to read it internally
> a few thousand times (~2650), so the 8 kernel retries result in
> about 8 x 2650 = 21200 physical read attempts.
>
> Unfortunately this renders an unpatched linux kernel useless for data
> recovery tasks. Well, not useless in general, but you simply need a
> _lot_ of patience.

So where are the patches? :-)


* Re: Time spent waiting for uncorrectable errors
From: Hans-Peter Jansen @ 2004-02-19 16:30 UTC
  To: Bartlomiej Zolnierkiewicz; +Cc: Alex Goller, linux-ide

On Thursday 19 February 2004 16:39, Bartlomiej Zolnierkiewicz wrote:
> On Thursday 19 of February 2004 16:13, Hans-Peter Jansen wrote:
> >
> > IIRC, the kernel retries a read of a defective block exactly 8
> > times.
> >
> > The problem is that (according to some Maxtor guy) a drive that
> > returns a hard sector error has already tried to read it
> > internally a few thousand times (~2650), so the 8 kernel retries
> > result in about 8 x 2650 = 21200 physical read attempts.
> >
> > Unfortunately this renders an unpatched linux kernel useless for
> > data recovery tasks. Well, not useless in general, but you simply
> > need a _lot_ of patience.
>
> So where are the patches? :-)

First of all, I'm still on 2.4 here. Sorry..

When I looked into this last autumn, I discovered several problems:
 - CRC errors are also used for the PIO fallback
 - IDE CRC error handling is scattered over several modules for disk/
   cdrom/ide-scsi
 - medium and transport errors (cable...) have to be differentiated
   correctly
and I'm a real dummy in these matters.
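
As a rough illustration of the last point, this is how the error
register bits could be sorted into transport and medium errors. It is a
standalone sketch, not kernel code: the bit values are what I recall
from the 2.4 headers (linux/hdreg.h and linux/ide.h) and should be
checked against your tree, and enum err_class and classify_error() are
invented names.

```c
/* Error register bits, as recalled from the 2.4 headers (assumption). */
#define ABRT_ERR  0x04  /* command aborted */
#define ECC_ERR   0x40  /* uncorrectable ECC error */
#define BBD_ERR   0x80  /* pre-EIDE meaning: block marked bad */
#define ICRC_ERR  0x80  /* newer meaning: CRC error during transfer */
#define BAD_CRC   (ABRT_ERR | ICRC_ERR)

/* Hypothetical classification, for illustration only. */
enum err_class { ERR_OTHER, ERR_TRANSPORT, ERR_MEDIUM };

enum err_class classify_error(unsigned char err)
{
    /*
     * Bit 7 is shared between "bad block" and "interface CRC", so the
     * CRC test (bit 7 together with ABRT) has to come first.
     */
    if ((err & BAD_CRC) == BAD_CRC)
        return ERR_TRANSPORT;   /* cable or interface problem */
    if (err & (BBD_ERR | ECC_ERR))
        return ERR_MEDIUM;      /* the platter itself is bad */
    return ERR_OTHER;
}
```

The order of the two tests is the same as in the ide-disk.c error path
touched by the patch further down.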

Here's something I hacked up for internal use only. Please don't
consume on an empty/full stomach:

--- linux-x/include/linux/ide.h	2003-09-23 21:13:31.000000000 +0200
+++ linux/include/linux/ide.h	2003-10-21 18:06:25.000000000 +0200
@@ -793,6 +793,9 @@
 	int		forced_lun;	/* if hdxlun was given at boot */
 	int		lun;		/* logical unit */
 	int		crc_count;	/* crc counter to reduce drive speed */
+	int		crc_err;	/* count crc errors */
+	u8		fail_on_crc_err; /* fail on all crc errors, will prevent */
+					/* retries and automatic speed reduce */
 
 	char		special_buf[4];	/* IDE_DRIVE_CMD, free use */
 } ide_drive_t;
--- linux-x/drivers/ide/ide-disk.c	2003-09-23 20:03:19.000000000 +0200
+++ linux/drivers/ide/ide-disk.c	2003-10-21 18:00:55.000000000 +0200
@@ -905,9 +905,21 @@
 			    hwif->INB(IDE_COMMAND_REG) == WIN_SPECIFY)
 				return ide_stopped;
 		} else if ((err & BAD_CRC) == BAD_CRC) {
-			/* UDMA crc error, just retry the operation */
-			drive->crc_count++;
+			if (drive->crc_err++ == 0x7fffffff)
+			    /* we have > MAX_INT-1 errors, stop counting
+			       to avoid a wrap around */
+			    drive->crc_err--;
+			if (drive->fail_on_crc_err)
+			    /* no retries */
+			    rq->errors = ERROR_MAX;
+			else
+			    /* UDMA crc error, just retry the operation */
+			    drive->crc_count++;
 		} else if (err & (BBD_ERR | ECC_ERR)) {
+			if (drive->crc_err++ == 0x7fffffff)
+			    /* we have > MAX_INT-1 errors, stop counting
+			       to avoid a wrap around */
+			    drive->crc_err--;
 			/* retries won't help these */
 			rq->errors = ERROR_MAX;
 		} else if (err & TRK0_ERR) {
@@ -1565,6 +1577,8 @@
 	ide_add_setting(drive,	"acoustic",		SETTING_RW,					HDIO_GET_ACOUSTIC,	HDIO_SET_ACOUSTIC,	TYPE_BYTE,	0,	254,				1,	1,	&drive->acoustic,		set_acoustic);
  	ide_add_setting(drive,	"failures",		SETTING_RW,					-1,			-1,			TYPE_INT,	0,	65535,				1,	1,	&drive->failures,		NULL);
  	ide_add_setting(drive,	"max_failures",		SETTING_RW,					-1,			-1,			TYPE_INT,	0,	65535,				1,	1,	&drive->max_failures,		NULL);
+ 	ide_add_setting(drive,	"crc_err",		SETTING_READ,					-1,			-1,			TYPE_INT,	0,	0x7fffffff,			1,	1,	&drive->crc_err,		NULL);
+ 	ide_add_setting(drive,	"fail_on_crc_err",	SETTING_RW,					-1,			-1,			TYPE_BYTE,	0,	1,				1,	1,	&drive->fail_on_crc_err,	NULL);
 }
 
 static int idedisk_ioctl (ide_drive_t *drive, struct inode *inode,
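
For anyone who wants to play with the logic outside the kernel, the
core of the BAD_CRC branch can be sketched as a standalone C fragment.
The field names follow the patch, but struct drive_stats,
count_error() and on_crc_error() are invented for this example, and
the value of ERROR_MAX is assumed to be 8 (in line with the "8 times"
mentioned earlier in the thread). Unlike the patch, the counter is
tested before it is incremented, which avoids incrementing a signed
int past INT_MAX (formally undefined in ISO C).

```c
#include <limits.h>

#define ERROR_MAX 8  /* assumed value, check include/linux/ide.h */

/* Hypothetical stand-in for the relevant ide_drive_t fields. */
struct drive_stats {
    int crc_err;                    /* total errors seen */
    int crc_count;                  /* feeds the speed reduction logic */
    unsigned char fail_on_crc_err;  /* 1: fail at once, no retries */
};

/* Saturating increment: stop counting at INT_MAX instead of wrapping. */
static void count_error(struct drive_stats *d)
{
    if (d->crc_err < INT_MAX)
        d->crc_err++;
}

/* Returns the new value for the request's error count. */
int on_crc_error(struct drive_stats *d, int rq_errors)
{
    count_error(d);
    if (d->fail_on_crc_err)
        return ERROR_MAX;   /* no retries */
    d->crc_count++;         /* UDMA crc error, just retry the operation */
    return rq_errors;
}
```

With fail_on_crc_err set, the first CRC error already pushes the
request to ERROR_MAX, so the driver gives up on that sector instead of
retrying it.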


Here's an equally butt ugly attempt for ide-scsi:

