* Driver retries disk errors.
From: Rogier Wolff @ 2004-08-30 16:39 UTC
To: linux-kernel; +Cc: linux-ide
Hi,
We encounter "bad" drives quite a lot more often than other
people do (look at the e-mail address). We are, however, wondering why the
IDE code still retries a bad block 8 times. By the time the drive
reports "bad block" it has already tried it several times, including a
bunch of "recalibrates" etc. For comparison, the SCSI disk driver
doesn't do any retrying.
So, why do we still do this?
- The driver may still work for MFM drives and less "intelligent"
controllers?
- Someone has recently seen that this actually helps?
In fact we are regularly able to recover data from drives: we have a
userspace application that retries over and over again, and this
sometimes recovers "marginal" blocks. This could be considered "good
practice" if there is a filesystem requesting the block. On the other
hand, when this happens, the drive is usually beyond being usable for
a filesystem: if we recover one block this way, the next block will
fail and the filesystem "crashes" anyway. In fact this behaviour
may mask the first warnings that something is going wrong...
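A minimal user-space sketch of the point above, assuming a hypothetical `read_sector(lba)` callback that raises `MediaError` on an unreadable sector (neither name comes from the kernel). The loop still retries, but it reports how many attempts failed, so a read that only succeeds on a retry becomes a logged warning instead of passing silently.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("retry-sketch")


class MediaError(Exception):
    """Raised by the (hypothetical) read callback for an unreadable sector."""


def read_with_retries(read_sector: Callable[[int], bytes], lba: int,
                      max_retries: int) -> Tuple[bytes, int]:
    """Try once, then up to max_retries more times.

    Returns (data, failed_attempts). A nonzero failed_attempts on success
    means the block was "marginal" and should be treated as an early warning.
    """
    failures = 0
    while True:
        try:
            data = read_sector(lba)
        except MediaError:
            failures += 1
            if failures > max_retries:
                raise  # give up: propagate the last media error
            continue
        if failures:
            log.warning("LBA %d read only after %d failed attempt(s); "
                        "the media may be going bad", lba, failures)
        return data, failures
```

With `max_retries=0` this is the no-retry behaviour being proposed; with 8 it mirrors the retry count under discussion, and the warning is exactly the signal that would otherwise be masked.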
So, I'm arguing for: we remove the retry code altogether, OR we add
an option to re-enable the retry code for MFM-era drives(*) (Note: those
are more than 10 years old, so almost (but not quite) extinct).
Roger.
(*) Note: tested last month: the driver still works for MFM
drives. However, the initialization apparently no longer suffices:
the drive did not work when the BIOS didn't think there was a
drive.
--
+-- Rogier Wolff -- www.harddisk-recovery.nl -- 0800 220 20 20 --
| Files vanished, files lost, all your data gone?!
| Stay calm and contact Harddisk-recovery.nl!
* Re: Driver retries disk errors.
From: Theodore Ts'o @ 2004-08-30 17:46 UTC
To: Rogier Wolff; +Cc: linux-kernel, linux-ide
On Mon, Aug 30, 2004 at 06:39:31PM +0200, Rogier Wolff wrote:
> We encounter "bad" drives with quite a lot more regularity than other
> people (look at the Email address). We're however, wondering why the
> IDE code still retries a bad block 8 times?
I could see retrying 2 or 3 times, but 8 times does seem to be a bit
much, agreed.
> In fact we regularly are able to recover data from drives: we have a
> userspace application that retries over and over again, and this
> sometimes recovers "marginal" blocks. This could be considered "good
> practise" if there is a filesystem requesting the block. On the other
> hand, when this happens, the drive is usually beyond being usable for
> a filesystem: if we recover one block this way, the next block will be
> errorred and the filesystem "crashes" anyway. In fact this behaviour
> may masquerade the first warnings that something is going wrong....
If the block gets successfully read after 2 or 3 tries, it might be a
good idea for the kernel to automatically do a forced rewrite of the
block, which should cause the disk to do its own disk block
sparing/reassignment.
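Ted's suggestion can be sketched against a toy disk model. Everything here is hypothetical, including the assumption that a rewrite makes the drive spare out the sector; whether real drives actually do so is disputed later in this thread. The idea: if a read succeeds only after a retry, write the recovered data straight back.

```python
class MediaError(Exception):
    """Unreadable sector."""


class ToyDisk:
    """Toy model: a 'weak' sector fails a given number of reads; rewriting
    it stands in for the drive reassigning it to a spare sector."""

    def __init__(self, nsectors: int, sector_size: int = 512):
        self.sectors = [bytes(sector_size)] * nsectors
        self.weak = {}           # lba -> reads that will still fail
        self.reassigned = set()

    def read(self, lba: int) -> bytes:
        if self.weak.get(lba, 0) > 0:
            self.weak[lba] -= 1
            raise MediaError(lba)
        return self.sectors[lba]

    def write(self, lba: int, buf: bytes) -> None:
        self.sectors[lba] = buf
        if self.weak.pop(lba, None) is not None:
            self.reassigned.add(lba)


def read_and_refresh(disk: ToyDisk, lba: int, tries: int = 3) -> bytes:
    """Read with up to `tries` attempts; rewrite the block if a retry was needed."""
    for attempt in range(1, tries + 1):
        try:
            data = disk.read(lba)
        except MediaError:
            if attempt == tries:
                raise
            continue
        if attempt > 1:
            disk.write(lba, data)  # forced rewrite of a marginal block
        return data
    raise ValueError("tries must be >= 1")
```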
- Ted
* Re: Driver retries disk errors.
From: James Courtier-Dutton @ 2004-08-30 18:26 UTC
To: Theodore Ts'o; +Cc: Rogier Wolff, linux-kernel, linux-ide
Theodore Ts'o wrote:
> On Mon, Aug 30, 2004 at 06:39:31PM +0200, Rogier Wolff wrote:
>
>>We encounter "bad" drives with quite a lot more regularity than other
>>people (look at the Email address). We're however, wondering why the
>>IDE code still retries a bad block 8 times?
>
>
> I could see retrying 2 or 3 times, but 8 times does seem to be a bit
> much, agreed.
>
>
>>In fact we regularly are able to recover data from drives: we have a
>>userspace application that retries over and over again, and this
>>sometimes recovers "marginal" blocks. This could be considered "good
>>practise" if there is a filesystem requesting the block. On the other
>>hand, when this happens, the drive is usually beyond being usable for
>>a filesystem: if we recover one block this way, the next block will be
>>errorred and the filesystem "crashes" anyway. In fact this behaviour
>>may masquerade the first warnings that something is going wrong....
>
>
> If the block gets successfully read after 2 or 3 tries, it might be a
> good idea for the kernel to automatically do a forced rewrite of the
> block, which should cause the disk to do its own disk block
> sparing/reassignment.
>
> - Ted
It does the same retries with CD-ROMs and DVDs, and if the retries fail,
it disables DMA! It even does the retries when reading CD-Audio.
Maybe there should be a "retries" setting that can be set by hdparm; then
we could set the retry counts, and what happens when a retry fails, on a
per-device basis.
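A per-device policy of the kind James describes might look roughly like this. `RetryPolicy`, `next_action` and the device names are illustrative only; they are not an existing hdparm option or kernel interface.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    FAIL_REQUEST = "fail"      # hand the error back to the caller
    FALL_BACK_TO_PIO = "pio"   # the current "disable DMA" behaviour


@dataclass
class RetryPolicy:
    retries: int = 8
    on_failure: OnFailure = OnFailure.FALL_BACK_TO_PIO


# Hypothetical per-device table, e.g. filled in from user space.
POLICIES = {
    "/dev/hda": RetryPolicy(retries=1, on_failure=OnFailure.FAIL_REQUEST),
    "/dev/hdc": RetryPolicy(retries=2, on_failure=OnFailure.FAIL_REQUEST),  # DVD drive
}


def next_action(device: str, failed_attempts: int, using_dma: bool) -> str:
    """Decide what to do after the failed_attempts-th failure on device."""
    policy = POLICIES.get(device, RetryPolicy())
    if failed_attempts <= policy.retries:
        return "retry"
    if using_dma and policy.on_failure is OnFailure.FALL_BACK_TO_PIO:
        return "switch_to_pio"
    return "fail"
```

Devices without an entry keep today's behaviour (8 retries, then drop to PIO), so only explicitly configured devices change.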
* Re: Driver retries disk errors.
From: Rogier Wolff @ 2004-08-30 22:25 UTC
To: James Courtier-Dutton; +Cc: Theodore Ts'o, linux-kernel, linux-ide
On Mon, Aug 30, 2004 at 07:26:27PM +0100, James Courtier-Dutton wrote:
> It does the same retries with CD-ROM and DVDs, and if the retries fail,
> it disables DMA!
As a matter of fact, we've had a computer where I tried to
get an MFM drive working. There I had changed lots of settings
in the BIOS to disable the onboard IDE and so on.
When we tried to get IDE working again, we ran into a
situation where the secondary channel would not do DMA unless
<something in the BIOS>. There the "disable DMA" strategy
works: the drive is "switched down" and something works.
I remember from the old days that the rationale was: "To enable
the user to keep using the system to fix the problem".
However, in practice, this failure is not something that
you can fix "if you have access to your drive", but something
you have to fix in the BIOS. So does this still help?
Well, maybe PIO is so "basic" that it will always work,
and is a good "last resort".
The "same retries with CD-ROM" stems from the fact that the
code was initially duplicated (as in cp ide-disk.c ide-cd.c).
The fact that DVDs are nowadays writable should be a hint that
the drivers may be better off getting merged one of these days.
Roger.
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-08-31 11:38 UTC
To: James Courtier-Dutton
Cc: Theodore Ts'o, Rogier Wolff, Linux Kernel Mailing List,
linux-ide
On Mon, 2004-08-30 at 19:26, James Courtier-Dutton wrote:
> Theodore Ts'o wrote:
> > If the block gets successfully read after 2 or 3 tries, it might be a
> > good idea for the kernel to automatically do a forced rewrite of the
> > block, which should cause the disk to do its own disk block
> > sparing/reassignment.
Not really, as far as I can tell. It isn't a disk any more, it's a storage
appliance on a funny connector. It already knows a lot about retries
internally, as well as about rewriting blocks with a high ECC error
count. In fact you actually have to issue a different command to do
a read/write without retry.
> It does the same retries with CD-ROM and DVDs, and if the retries fail,
> it disables DMA! It even does the retries when reading CD-Audio.
> Maybe there should be a "retrys" setting that can be set by hdparm, then
> we could set the retry counts, and what happens when a retry fails on a
> per device basis.
It probably should be smarter about error strategy here. You can use
hdparm to control some of this in the IDE case but not enough.
Alan
* Re: Driver retries disk errors.
From: Eric Mudama @ 2004-09-02 16:23 UTC
To: Alan Cox
Cc: James Courtier-Dutton, Theodore Ts'o, Rogier Wolff,
Linux Kernel Mailing List, linux-ide
On Tue, 31 Aug 2004 12:38:45 +0100, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> Not really as far as I can tell. It isn't a disk any more, its a storage
> appliance on a funny connector. It already knows a lot about retries
> internally as well as rewriting blocks with high ECC error
> count. In fact you actually have to issue a different command to do
> read/write without retry.
True, but in the later versions of the ATA specification, the retry
option was deprecated. I think you'll find virtually every ATA drive
built today ignores that "suggestion" from the host.
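For reference, the "different command" in the older ATA command sets is a separate opcode. The values below are the classic 28-bit PIO read/write opcodes; Linux's <linux/hdreg.h> of this era names them WIN_READ/WIN_READ_ONCE and WIN_WRITE/WIN_WRITE_ONCE, which is presumably the "read-once" Andre refers to. The helper is only an illustration, and, as Eric says, a current drive may simply ignore the distinction.

```python
# Classic 28-bit PIO opcodes from the early ATA command sets. The
# "without retry" variants were later made obsolete, and modern drives
# are expected to treat them like the normal commands.
READ_SECTORS = 0x20
READ_SECTORS_WITHOUT_RETRY = 0x21
WRITE_SECTORS = 0x30
WRITE_SECTORS_WITHOUT_RETRY = 0x31


def rw_opcode(write: bool, drive_may_retry: bool) -> int:
    """Pick the opcode for a PIO transfer (illustrative helper only)."""
    if write:
        return WRITE_SECTORS if drive_may_retry else WRITE_SECTORS_WITHOUT_RETRY
    return READ_SECTORS if drive_may_retry else READ_SECTORS_WITHOUT_RETRY
```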
--eric
* Re: Driver retries disk errors.
From: Bill Davidsen @ 2004-08-31 15:16 UTC
To: linux-kernel
James Courtier-Dutton wrote:
> It does the same retries with CD-ROM and DVDs, and if the retries fail,
> it disables DMA! It even does the retries when reading CD-Audio.
> Maybe there should be a "retrys" setting that can be set by hdparm, then
> we could set the retry counts, and what happens when a retry fails on a
> per device basis.
Thinking of hotswap, I would suggest that a "device category" would be useful
for this. Yes, you could build policy into the plug code, but it still
needs to get the policy from somewhere.
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
* Re: Driver retries disk errors.
From: Rogier Wolff @ 2004-08-30 22:17 UTC
To: Theodore Ts'o, linux-kernel, linux-ide
On Mon, Aug 30, 2004 at 01:46:32PM -0400, Theodore Ts'o wrote:
> > a filesystem: if we recover one block this way, the next block will be
> > errorred and the filesystem "crashes" anyway. In fact this behaviour
> > may masquerade the first warnings that something is going wrong....
>
> If the block gets successfully read after 2 or 3 tries, it might be a
> good idea for the kernel to automatically do a forced rewrite of the
> block, which should cause the disk to do its own disk block
> sparing/reassignment.
Hi Ted,
I agree that this is the theory. In practice, however, I've never
seen it work correctly. We've seen several disks with, say, 1-5 bad
blocks and nothing else, and "dd if=/dev/zero of=/dev/<disk>" doesn't
seem to cure them.
Roger.
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-08-31 11:45 UTC
To: Rogier Wolff; +Cc: Linux Kernel Mailing List, linux-ide
On Mon, 2004-08-30 at 17:39, Rogier Wolff wrote:
> We encounter "bad" drives with quite a lot more regularity than other
> people (look at the Email address). We're however, wondering why the
> IDE code still retries a bad block 8 times? By the time the drive
> reports "bad block" it has already tried it several times, including a
> bunch of "recalibrates" etc etc. For comparison, the Scsi-disk driver
> doesn't do any retrying.
It helps for some things like magneto-opticals. For generic hard drives
it's only relevant for older devices.
> (*) Note: Tested last month: The driver still works for MFM
> drives. However, the initialization apparently is not enough
> anymore. The drive did not work when the BIOS didn't think there was a
> drive.
Please file a bug report if 2.6 also shows that problem.
* Re: Driver retries disk errors.
From: Andre Hedrick @ 2004-08-31 13:45 UTC
To: Alan Cox; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
Rogier,
Because the command layer says to execute retries, regardless.
Modern drives now convert read-once commands into reads with retry.
You need special opcodes to revert to the desired behaviour.
Media forensics is not a cakewalk.
Andre Hedrick
LAD Storage Consulting Group
On Tue, 31 Aug 2004, Alan Cox wrote:
> On Mon, 2004-08-30 at 17:39, Rogier Wolff wrote:
> > We encounter "bad" drives with quite a lot more regularity than other
> > people (look at the Email address). We're however, wondering why the
> > IDE code still retries a bad block 8 times? By the time the drive
> > reports "bad block" it has already tried it several times, including a
> > bunch of "recalibrates" etc etc. For comparison, the Scsi-disk driver
> > doesn't do any retrying.
>
> It helps for some things like magneto-opticals. For generic hard drives
> its only relevant for older devices.
>
> > (*) Note: Tested last month: The driver still works for MFM
> > drives. However, the initialization apparently is not enough
> > anymore. The drive did not work when the BIOS didn't think there was a
> > drive.
>
> Please file a bug report if 2.6 also shows that problem.
>
* Re: Driver retries disk errors.
From: Rogier Wolff @ 2004-08-31 13:54 UTC
To: Alan Cox; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
On Tue, Aug 31, 2004 at 12:45:15PM +0100, Alan Cox wrote:
> On Mon, 2004-08-30 at 17:39, Rogier Wolff wrote:
> > We encounter "bad" drives with quite a lot more regularity than other
> > people (look at the Email address). We're however, wondering why the
> > IDE code still retries a bad block 8 times? By the time the drive
> > reports "bad block" it has already tried it several times, including a
> > bunch of "recalibrates" etc etc. For comparison, the Scsi-disk driver
> > doesn't do any retrying.
>
> It helps for some things like magneto-opticals. For generic hard drives
> its only relevant for older devices.
>
> > (*) Note: Tested last month: The driver still works for MFM
> > drives. However, the initialization apparently is not enough
> > anymore. The drive did not work when the BIOS didn't think there was a
> > drive.
>
> Please file a bug report if 2.6 also shows that problem.
Will try to test when we have time.
So, can we agree on:
- might be needed for
- Floppies?
- MO drives
- older drives
Can we auto-detect these cases? (Linus doesn't like configurable
parameters that need tweaking to work well, and I agree: 99% of
users want stuff that works (well) out of the box.)
How about we set num-retries to 1, and increase it to 8 for
"weird devices" (floppy, MO) and older drives?
How do we detect "older drives"? Would "MFM: the user specified the
geometry" be a valid way to detect an "older drive"?
Or do we want to include the 40MB-1GB generation of drives as well? How
do we detect those if we want to include them?
I do want to make num_retries a configurable parameter, in case
the autodetection gets it wrong: we get drives that we want to
recover without the kernel-level retries...
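The proposal above could be written down as a default-selection rule along these lines. `DriveInfo`, its fields and `default_retries` are hypothetical; "user-specified geometry means older drive" is the heuristic suggested here, and whether 40MB-1GB drives should also count as "older" is left open, as in the mail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriveInfo:
    media: str                   # e.g. "disk", "floppy", "mo", "cdrom"
    user_geometry: bool = False  # geometry given by the user (MFM-era drive)


def default_retries(drive: DriveInfo, override: Optional[int] = None) -> int:
    """Proposed default: 1 retry, 8 for 'weird' media and older drives.

    An explicit override (the configurable parameter) always wins, so a
    recovery lab can set 0 and a cautious admin can set 8.
    """
    if override is not None:
        return override
    if drive.media in ("floppy", "mo"):
        return 8
    if drive.user_geometry:      # proposed proxy for "older drive"
        return 8
    return 1
```

Alan's reply argues for the opposite default (keep retries, and let forensics users lower them); in this sketch that only changes the final `return 1`.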
(Still, I argue that you need to treat an error that goes away on retry as an
early warning that your media is going bad, and you need to get your
data off ASAP! If the kernel silently retries and succeeds, the user
won't notice a thing and will keep using the drive (or MO media) until
the error becomes irrecoverable. I recommend we put the retry at the
user level. As in "person behind keyboard".)
I'll try to make a patch, as long as we agree on a feature set
first...
Roger.
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-08-31 14:12 UTC
To: Rogier Wolff; +Cc: Linux Kernel Mailing List, linux-ide
On Tue, 2004-08-31 at 14:54, Rogier Wolff wrote:
> So, can we agree on:
> - might be needed for
> - Floppies?
> - MO drives
> - older drives
Other random stuff it saves our backsides on that we don't know about.
> How about we set the num-retries to 1, and increase to 8 for
> "weird devices" (floppy, MO), and older drives.
Disagree. I want it robust. If you want to set low retry counts then
the user should do so for special cases like forensics.
> I do want to make the num_retries thing a configurable parameter,
> should the autodetect get it wrong: We get drives that we want to
> recover without the kernel-level retries...
Making it configurable is good, but I can't help feeling that this
belongs at the block layer. I wonder what Jens thinks. It might well
have to be done by the driver, because only the driver knows enough, but
the ioctl/config option ought to be common.
> (still: I argue that you need to consider a "retry-works" error as an
> early warning that your media is going bad, and you need to get your
> data off ASAP! If the kernel silently retries and succeeds, the user
> won't notice a thing and continue using the drive (or MO media) until
> the error becomes irrecoverable. I recommend we put the retry at the
> user level. As in "person behind keyboard".)
M/O media retries generally do the right thing and have the right
effect. If you want to know whether your drive is failing, use SMART and ask
the drive.
Remember: Storage appliance not disk. Treat it like a storage
appliance and you'll get better results.
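As one way of "asking the drive", here is a sketch that scans the attribute table printed by smartmontools' `smartctl -A` for a few sector-health counters. It assumes the usual ten-column layout with RAW_VALUE last; the layout can differ between smartctl versions, and raw-value meaning is vendor-specific, so treat nonzero counts as a hint rather than a verdict. The sample table is invented for illustration.

```python
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable"}


def worrying_attributes(smartctl_a_output: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes that are nonzero."""
    found = {}
    for line in smartctl_a_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


# Invented example output, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
```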
Alan
* Re: Driver retries disk errors.
From: Erik Mouw @ 2004-08-31 15:56 UTC
To: Alan Cox; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
On Tue, Aug 31, 2004 at 03:12:52PM +0100, Alan Cox wrote:
> On Tue, 2004-08-31 at 14:54, Rogier Wolff wrote:
> > How about we set the num-retries to 1, and increase to 8 for
> > "weird devices" (floppy, MO), and older drives.
>
> Disagree. I want it robust. If you want to set low retry counts then
> the user should do so for special cases like forensics.
The SCSI disk driver has been doing a single retry for quite some time
and it hasn't really bitten people. Why would the IDE disk driver be
different? The only case where I can imagine a retry being OK is when we
get a UDMA CRC error (caused by bad cables).
(OK, for SCSI drives you have a lot more control over how a drive
should treat errors, but the kernel will not retry a block when the
drive has reported it bad.)
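The "retry only on a UDMA CRC error" rule can be expressed as a test on the ATA error register. The bit values below are the standard ones (comments give the <linux/hdreg.h> names); note that bit 7 meant "bad block" on pre-EIDE drives, so on very old hardware this same test would retry exactly the case it means to skip. This is a sketch, not what either driver does.

```python
# ATA error register bits (valid when the status register has ERR set).
ICRC = 0x80  # CRC error during an Ultra DMA transfer (ICRC_ERR);
             # on pre-EIDE drives this bit meant "bad block" (BBD_ERR)
UNC = 0x40   # uncorrectable data error (ECC_ERR)
IDNF = 0x10  # requested sector ID not found (ID_ERR)
ABRT = 0x04  # command aborted (ABRT_ERR)
AMNF = 0x01  # address mark not found (MARK_ERR)


def worth_retrying(error_reg: int) -> bool:
    """Retry only transfer (cable) errors, never a sector the drive gave up on."""
    if error_reg & (UNC | IDNF | AMNF):
        return False
    return bool(error_reg & ICRC)
```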
Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-08-31 15:13 UTC
To: Erik Mouw; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
On Tue, 2004-08-31 at 16:56, Erik Mouw wrote:
> The SCSI disk driver has been doing a single retry for quite some time
> and it hasn't really bitten people. Why would the IDE disk driver be
> different? The only case I can imagine a retry would be OK, is when we
> get an UDMA CRC error (caused by bad cables).
Retries also pop up in other, less obvious cases and conveniently paper
over a wide variety of timeouts, power management quirks and drives just
having a random fit. Eight is probably excessive in all cases.
For non-hard-disk cases, many devices do want and need retries.
Alan
* Re: Driver retries disk errors.
From: Erik Mouw @ 2004-08-31 17:00 UTC
To: Alan Cox; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
On Tue, Aug 31, 2004 at 04:13:54PM +0100, Alan Cox wrote:
> On Tue, 2004-08-31 at 16:56, Erik Mouw wrote:
> > The SCSI disk driver has been doing a single retry for quite some time
> > and it hasn't really bitten people. Why would the IDE disk driver be
> > different? The only case I can imagine a retry would be OK, is when we
> > get an UDMA CRC error (caused by bad cables).
>
> Retries also pop up in other less obvious cases and conveniently paper
> over a wide variety of timeouts, power management quirks and drives just
> having a random fit. Eight is probably excessive in all cases.
There are indeed all sorts of other retries in various layers, the
worst being when the kernel reads ahead a couple of blocks on an
IDE disk (1). You can work around them with raw IO or O_DIRECT.
> For non hard disk cases many devices do want and need retry.
And many others do not. CompactFlash readers are usually implemented as
a USB storage device, which in turn is implemented as a SCSI
"disk". So far I haven't seen a CompactFlash card that could be "fixed" by
retries.
iSCSI appliances can also make things worse: when the target machine is
implemented as a simple "pass everything to the real SCSI disk" device,
it's not really different from a directly connected SCSI disk. However,
when the iSCSI target interprets the SCSI commands and has its own way
to deal with bad blocks (i.e.: it retries the blocks), things can get
very bad when the kernel also uses a couple of retries.
In my experience it would be good if the IDE disk driver behaved
like the SCSI disk driver: no retries on a bad block. I agree that it
would be a good idea to make it configurable through the block layer to
avoid code duplication (something like blockdev --getretries/--setretries).
Erik
(1) Imagine an application doing a linear read on a file with an 8-block
readahead and the last block being bad. The kernel will try to
read that bad block 16 times, but because the IDE driver also has 8
retries, the kernel will try to read that bad block *64* times. It
usually takes an IDE drive about 2 seconds to figure out a block is
bad, so the application gets stuck for 2 minutes on that single bad
block.
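The footnote's figures compound multiplicatively. A quick check, assuming the 64 comes from 8 kernel-level reads of the bad block times 8 driver tries and 2 seconds per failed read (taken literally, 16 kernel-level reads would give 128 reads, about 4 minutes):

```python
def stall_seconds(kernel_reads: int, driver_tries: int, secs_per_failure: float) -> float:
    """Time spent on one bad block when every kernel-level read of it is
    itself tried driver_tries times by the driver."""
    return kernel_reads * driver_tries * secs_per_failure


print(stall_seconds(8, 8, 2.0))  # 128.0 seconds, a bit over 2 minutes
```

With a single driver-level try, the same case drops to 8 × 1 × 2 = 16 seconds.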
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-08-31 16:12 UTC
To: Erik Mouw; +Cc: Rogier Wolff, Linux Kernel Mailing List, linux-ide
On Tue, 2004-08-31 at 18:00, Erik Mouw wrote:
> > For non hard disk cases many devices do want and need retry.
>
> And many others do not. CompactFlash readers are usually implemented as
> a USB storage device, which on its turn is implemented as a SCSI
> "disk". So far I haven't seen a CompactFlash which could be "fixed" by
> retries.
It does no harm trying. It does real harm not being conservative and
losing people's data. You recover people's data after it's lost; the
IDE layer's job is to make sure it doesn't get lost in the first place.
> (1) Imagine an application doing a linear read on a file with an 8
> block read ahead and the last block being bad. The kernel will try to
> read that bad block 16 times, but because the IDE driver also has 8
> retries, the kernel will try to read that bad block *64* times. It
> usually takes an IDE drive about 2 seconds to figure out a block is
> bad, so the application gets stuck for 2 minutes in that single bad
> block.
Right now I know of no way to tell which is readahead for a failed
command or of telling the block layer to forget them. Fix this at the
block layer and IDE can abort the readahead sequence happily enough
because IDE is too dumb to have issued further commands to the drive at
this point.
Alan
* Re: Driver retries disk errors.
From: Bill Davidsen @ 2004-09-01 15:18 UTC
To: linux-kernel
Alan Cox wrote:
> On Tue, 2004-08-31 at 18:00, Erik Mouw wrote:
>
>>>For non hard disk cases many devices do want and need retry.
>>
>>And many others do not. CompactFlash readers are usually implemented as
>>a USB storage device, which on its turn is implemented as a SCSI
>>"disk". So far I haven't seen a CompactFlash which could be "fixed" by
>>retries.
>
>
> It does no harm trying. It does real harm not being conservative and
> losing peoples data. You recover people's data after its lost, the
> IDE layer's job is to make sure it doesn't get lost in the first place.
>
>
>>(1) Imagine an application doing a linear read on a file with an 8
>>block read ahead and the last block being bad. The kernel will try to
>>read that bad block 16 times, but because the IDE driver also has 8
>>retries, the kernel will try to read that bad block *64* times. It
>>usually takes an IDE drive about 2 seconds to figure out a block is
>>bad, so the application gets stuck for 2 minutes in that single bad
>>block.
>
>
> Right now I know of no way to tell which is readahead for a failed
> command or of telling the block layer to forget them. Fix this at the
> block layer and IDE can abort the readahead sequence happily enough
> because IDE is too dumb to have issued further commands to the drive at
> this point.
It would probably be good to retry "read what you were asked, nothing
more" on error, to avoid passing back errors caused by readahead. I
suspect this would avoid some issues reading data off CDs as well, where
one program can read a disc cleanly while another ends up with a short image and an error.
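The "read what you were asked, nothing more" fallback, sketched at user level with a hypothetical `read_range(start, count)` callback that returns a list of sectors and raises `MediaError` if any sector in the range is bad:

```python
from typing import Callable, List


class MediaError(Exception):
    """A sector in the requested range could not be read."""


def read_with_readahead(read_range: Callable[[int, int], List[bytes]],
                        start: int, count: int, readahead: int) -> List[bytes]:
    """Read count sectors from start, speculatively fetching readahead more.

    If the combined read fails, retry only the sectors the caller asked
    for, so an error in the readahead tail is never reported to them.
    """
    try:
        return read_range(start, count + readahead)[:count]
    except MediaError:
        return read_range(start, count)  # errors here are real and propagate
```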
* Re: Driver retries disk errors.
From: Alan Cox @ 2004-09-01 14:46 UTC
To: Bill Davidsen; +Cc: Linux Kernel Mailing List
On Wed, 2004-09-01 at 16:18, Bill Davidsen wrote:
> If would probably be good to retry "read what you were asked, nothing
> more" on error, to avoid passing back errors caused by readahead. I
> suspect this would avoid some issues reading data off CD as well, where
> one software can read clean and another ends with a short image and error.
Sure but as I understand the block layer currently (and I may be missing
something in the 2.6 code) I can't do that from a driver.
* Re: Driver retries disk errors.
From: Bill Davidsen @ 2004-09-01 18:54 UTC
To: Alan Cox; +Cc: Linux Kernel Mailing List
On Wed, 1 Sep 2004, Alan Cox wrote:
> On Wed, 2004-09-01 at 16:18, Bill Davidsen wrote:
> > If would probably be good to retry "read what you were asked, nothing
> > more" on error, to avoid passing back errors caused by readahead. I
> > suspect this would avoid some issues reading data off CD as well, where
> > one software can read clean and another ends with a short image and error.
>
> Sure but as I understand the block layer currently (and I may be missing
> something in the 2.6 code) I can't do that from a driver.
>
Sorry, that was unclear. I was speaking of a general approach rather than
what would be done in the driver. Clearly that's best done at a higher
level. Drivers should not be making policy decisions of that type, but I
don't think it's good to return a read error caused by data the program
didn't request (i.e. readahead).
Unless S.M.A.R.T. is lying, that happens so seldom on disks that the
overhead of a retry doesn't matter. And on CDs it makes things work where
they currently fail.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Driver retries disk errors.
From: Romano Giannetti @ 2004-09-01 15:28 UTC
To: Linux Kernel Mailing List
On Tue, Aug 31, 2004 at 05:12:50PM +0100, Alan Cox wrote:
>
> > (1) Imagine an application doing a linear read on a file with an 8
> > block read ahead and the last block being bad. The kernel will try to
> > read that bad block 16 times, but because the IDE driver also has 8
> > retries, the kernel will try to read that bad block *64* times. It
> > usually takes an IDE drive about 2 seconds to figure out a block is
> > bad, so the application gets stuck for 2 minutes in that single bad
> > block.
>
> Right now I know of no way to tell which is readahead for a failed
> command or of telling the block layer to forget them. Fix this at the
> block layer and IDE can abort the readahead sequence happily enough
> because IDE is too dumb to have issued further commands to the drive at
> this point.
Just a question from someone who is almost kernel-illiterate. Could this explain the
behavior of my laptop yesterday while reading a damaged DVD? I had to wait almost
a full minute of retries before I was able to kill xine...
If the retries are kept, it would be nice to at least allow kill -9
between them. I do not know if that's foolish and/or impossible, so please
do not bash too hard...
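The suggestion can be modelled in user space as a retry loop that checks for a pending kill before each attempt. This sketch assumes a hypothetical `read_once()` callback that returns `None` on a media error, and it uses a `threading.Event` to stand in for a pending fatal signal. It does not address how the kernel would deliver that signal to a task sleeping inside the driver, which is the hard part:

```python
import threading


def read_with_retries(read_once, retries, cancelled):
    """Call read_once() up to 1 + retries times, stopping early if cancelled.

    read_once: zero-argument callable returning data, or None on a media error.
    cancelled: threading.Event standing in for "a fatal signal is pending".
    Returns (data or None, number of attempts made).
    """
    attempts = 0
    for _ in range(1 + retries):
        if cancelled.is_set():
            break  # the user asked to kill us: skip the remaining retries
        attempts += 1
        data = read_once()
        if data is not None:
            return data, attempts
    return None, attempts


if __name__ == "__main__":
    kill = threading.Event()
    calls = 0

    def bad_sector():
        global calls
        calls += 1
        if calls == 3:
            kill.set()  # simulate kill -9 arriving during the third attempt
        return None  # every attempt fails

    print(read_with_retries(bad_sector, retries=8, cancelled=kill))  # (None, 3)
```

With the flag set during the third failed attempt, the loop stops after three attempts instead of running all nine. Because the check happens only between attempts, a single slow attempt still has to finish before the kill takes effect.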
Have a nice day,
Romano
--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 596 569
* Re: Driver retries disk errors.
2004-09-01 15:28 ` Romano Giannetti
@ 2004-09-01 14:44 ` Alan Cox
2004-09-01 23:14 ` Rogier Wolff
2004-09-02 16:26 ` Eric Mudama
0 siblings, 2 replies; 31+ messages in thread
From: Alan Cox @ 2004-09-01 14:44 UTC (permalink / raw)
To: Romano Giannetti; +Cc: Linux Kernel Mailing List
On Wed, 2004-09-01 at 16:28, Romano Giannetti wrote:
> Just a question from a kernel-almost-illiterate. Could this explain the
> behavior of my laptop yesterday, reading a damaged DVD? I had to wait almost
> one full minute of retry until being able to kill xine...
That's the block layer. It's actually hard to fix the kill -9 case.
> If maintaining the retries, it could be nice to allow at least kill -9
> between them. I do not know if that's foolish and/or impossible, so please
> do not bash too hard...
Things like Xine are precisely the cases where you want retry turned off
by the application: if the sector is bad then you want to skip when
playing movies, while you don't want to skip while writing out your
database.
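Until the kernel offers such a switch, an application can at least choose its own retry budget on top of the driver's. Below is a sketch of a block-by-block reader for data read through a regular file or block device. It assumes Unix `os.pread` semantics, where a media error surfaces as `EIO`. The function name, the 2048-byte default (the usual CD-ROM data sector size) and the zero-fill behaviour are illustrative choices. The driver's own retries still happen underneath each `pread` call:

```python
import errno
import os


def rescue_read(path, block_size=2048, retries=0):
    """Read `path` block by block; return (data, bad_offsets).

    retries: extra attempts per block after an EIO. 0 suits a media player
    (skip at once); a caller that must not lose data would pass more.
    Blocks that still fail are replaced by zeros, so offsets in the
    returned data match offsets on the device.
    """
    bad_offsets = []
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek works for block devices, where fstat() reports st_size == 0.
        size = os.lseek(fd, 0, os.SEEK_END)
        for offset in range(0, size, block_size):
            want = min(block_size, size - offset)
            for _ in range(1 + retries):
                try:
                    chunk = os.pread(fd, want, offset)
                    break
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise  # only media errors are retried or skipped
            else:
                chunk = b"\0" * want  # every attempt failed: pad and record
                bad_offsets.append(offset)
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b"".join(chunks), bad_offsets
```

`retries=0` gives the media-player behaviour (skip at once and keep going). A caller that must not lose data would pass a larger value and treat a non-empty `bad_offsets` list as a failure.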
* Re: Driver retries disk errors.
2004-09-01 14:44 ` Alan Cox
@ 2004-09-01 23:14 ` Rogier Wolff
2004-09-02 9:29 ` Alan Cox
2004-09-02 16:26 ` Eric Mudama
1 sibling, 1 reply; 31+ messages in thread
From: Rogier Wolff @ 2004-09-01 23:14 UTC (permalink / raw)
To: Alan Cox; +Cc: Romano Giannetti, Linux Kernel Mailing List
On Wed, Sep 01, 2004 at 03:44:38PM +0100, Alan Cox wrote:
> On Wed, 2004-09-01 at 16:28, Romano Giannetti wrote:
> > Just a question from a kernel-almost-illiterate. Could this explain the
> > behavior of my laptop yesterday, reading a damaged DVD? I had to wait almost
> > one full minute of retry until being able to kill xine...
>
> That's the block layer. It's actually hard to fix the kill -9 case.
I don't think so. It starts with the ide-cd level driver
doing 8 retries. Most disks we see retry internally for about
4 seconds before reporting a bad block. A CD taking twice
as long would not be abnormal (seeks are about 10 times
as expensive on CDs). 8 times 8 seconds is about a full minute.
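The arithmetic behind that estimate can be written as a small helper. The per-attempt times (about 4 s for most disks, roughly 8 s for a CD) are the rough figures from this message, not measured constants. `reissues` is a hypothetical knob for the layers above the driver resubmitting the same failing request:

```python
def worst_case_stall(attempts, seconds_per_attempt, reissues=1):
    """Upper bound, in seconds, on how long one bad sector can block a reader.

    attempts: tries the driver makes per request (8 for IDE in this thread).
    seconds_per_attempt: time the drive spends on its own recovery before
        reporting the failure.
    reissues: how many times the layers above resubmit the failing request
        (1 if nothing above the driver retries).
    """
    return attempts * seconds_per_attempt * reissues


print(worst_case_stall(8, 8))  # 64: the CD estimate, about a minute
print(worst_case_stall(8, 4))  # 32: a typical disk
```

The product shows why retries at several layers are so costly: every extra layer that resubmits the request multiplies the stall rather than adding to it.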
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
* Re: Driver retries disk errors.
2004-09-01 23:14 ` Rogier Wolff
@ 2004-09-02 9:29 ` Alan Cox
2004-09-02 10:54 ` Rogier Wolff
2004-09-02 14:30 ` John Stoffel
0 siblings, 2 replies; 31+ messages in thread
From: Alan Cox @ 2004-09-02 9:29 UTC (permalink / raw)
To: Rogier Wolff; +Cc: Romano Giannetti, Linux Kernel Mailing List
On Thu, 2004-09-02 at 00:14, Rogier Wolff wrote:
> I don't think so. It starts with the ide-cd level driver
> doing 8 retries. Most disk we see retry themselves for about a
> 4 second delay before reporting a bad block. A CD taking twice
"Most", that is the heart of the reason for not taking them out.
> that much would not sound abnormal. (seeks are about 10 times
> as expensive on CDs). 8 times 8 seconds is a full minute.
As I said, media players need a way to turn retries off.
* Re: Driver retries disk errors.
2004-09-02 9:29 ` Alan Cox
@ 2004-09-02 10:54 ` Rogier Wolff
2004-09-02 14:30 ` John Stoffel
1 sibling, 0 replies; 31+ messages in thread
From: Rogier Wolff @ 2004-09-02 10:54 UTC (permalink / raw)
To: Alan Cox; +Cc: Rogier Wolff, Romano Giannetti, Linux Kernel Mailing List
On Thu, Sep 02, 2004 at 10:29:29AM +0100, Alan Cox wrote:
> On Thu, 2004-09-02 at 00:14, Rogier Wolff wrote:
> > I don't think so. It starts with the ide-cd level driver
> > doing 8 retries. Most disk we see retry themselves for about a
> > 4 second delay before reporting a bad block. A CD taking twice
>
> "Most", that is the heart of the reason for not taking them out.
Some retry for only about a second; the rest take more than
4 seconds.
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
* Re: Driver retries disk errors.
2004-09-02 9:29 ` Alan Cox
2004-09-02 10:54 ` Rogier Wolff
@ 2004-09-02 14:30 ` John Stoffel
2004-09-02 14:59 ` Alan Cox
1 sibling, 1 reply; 31+ messages in thread
From: John Stoffel @ 2004-09-02 14:30 UTC (permalink / raw)
To: Alan Cox; +Cc: Rogier Wolff, Romano Giannetti, Linux Kernel Mailing List
>> that much would not sound abnormal. (seeks are about 10 times
>> as expensive on CDs). 8 times 8 seconds is a full minute.
Alan> As I said media players need a way to turn it to no retry
I just ran into this with a scratched CD-ROM and the program 'grip',
which ended up requiring a reboot of my 2.6.8 kernel to regain
control of /dev/cdrom on my system. Needless to say, I wasn't very
happy about this.
I really think that we need some way to keep such deadlocks from
happening. I really dislike having a device lock up a user application
so hard that it can't be exited. There's no real reason we should be
doing this any more. If we have to, let the user kill it and just
have the kernel make it into a zombie, but at least let the user kill
it off.
John
* Re: Driver retries disk errors.
2004-09-02 14:30 ` John Stoffel
@ 2004-09-02 14:59 ` Alan Cox
2004-09-02 16:07 ` John Stoffel
0 siblings, 1 reply; 31+ messages in thread
From: Alan Cox @ 2004-09-02 14:59 UTC (permalink / raw)
To: John Stoffel; +Cc: Rogier Wolff, Romano Giannetti, Linux Kernel Mailing List
On Thu, 2004-09-02 at 15:30, John Stoffel wrote:
> I really think that we need some way to keep such deadlocks from
> happening. I really dislike having a device lockup a user application
> so hard that it can't be exited. There's no real reason we should be
> doing this any more. If we have to, let the user kill it and just
> have the kernel make it into a zombie, but at least let the user kill
> it off.
If you had to reboot, file a bug: nothing in the block error recovery
code, or below it, should ever hang indefinitely.
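One way to make sure a retry loop cannot run away is to bound it by both an attempt count and a time budget. This is a sketch that assumes a hypothetical `read_once()` callback that returns `None` on a media error. The clock is injectable so the behaviour can be checked with simulated time:

```python
import time


def retry_until_deadline(read_once, max_attempts, budget_s, clock=time.monotonic):
    """Call read_once() until it succeeds or a limit is hit.

    Stops after max_attempts calls or once budget_s seconds have passed,
    whichever comes first, so the loop cannot run forever as long as each
    individual attempt returns.
    read_once: zero-argument callable returning data, or None on a media error.
    Returns (data or None, number of attempts made).
    """
    deadline = clock() + budget_s
    attempts = 0
    while attempts < max_attempts and clock() < deadline:
        attempts += 1
        data = read_once()
        if data is not None:
            return data, attempts
    return None, attempts
```

With failures that each take 8 simulated seconds and a 20-second budget, the loop gives up after three attempts even though eight were allowed. With a generous budget, the attempt count is what stops it.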
* Re: Driver retries disk errors.
2004-09-02 14:59 ` Alan Cox
@ 2004-09-02 16:07 ` John Stoffel
0 siblings, 0 replies; 31+ messages in thread
From: John Stoffel @ 2004-09-02 16:07 UTC (permalink / raw)
To: Alan Cox
Cc: John Stoffel, Rogier Wolff, Romano Giannetti,
Linux Kernel Mailing List
Alan> On Thu, 2004-09-02 at 15:30, John Stoffel wrote:
>> I really think that we need some way to keep such deadlocks from
>> happening. I really dislike having a device lockup a user application
>> so hard that it can't be exited. There's no real reason we should be
>> doing this any more. If we have to, let the user kill it and just
>> have the kernel make it into a zombie, but at least let the user kill
>> it off.
Alan> If you had to reboot file a bug, none of the block error
Alan> recovery code or below should ever hang indefinitely.
Once I can reproduce it reliably, I'll send a better report. I've
been holding off on my comments until now, but got caught up in the
moment.
I also know now that it should time out and come back to life. I even
had a backtrace of the hung process, but didn't save it. Mea culpa.
* Re: Driver retries disk errors.
2004-09-01 14:44 ` Alan Cox
2004-09-01 23:14 ` Rogier Wolff
@ 2004-09-02 16:26 ` Eric Mudama
1 sibling, 0 replies; 31+ messages in thread
From: Eric Mudama @ 2004-09-02 16:26 UTC (permalink / raw)
To: Alan Cox; +Cc: Romano Giannetti, Linux Kernel Mailing List
On Wed, 01 Sep 2004 15:44:38 +0100, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> Things like Xine are precisely the cases where you want retry turned off
> by the application - if the sector is bad then you want to skip when
> playing movies, while you don't want to skip while writing out your
> database
This is what they're trying to accomplish with the ATA-7 Streaming Feature
Set: tell the drive to just read through errors and send back the
garbage, without doing error recovery, for high-bandwidth media
readback. The first drives to support this feature set will be coming
out relatively soon...
* Re: Driver retries disk errors.
2004-08-31 15:13 ` Alan Cox
2004-08-31 17:00 ` Erik Mouw
@ 2004-08-31 22:55 ` Christer Weinigel
1 sibling, 0 replies; 31+ messages in thread
From: Christer Weinigel @ 2004-08-31 22:55 UTC (permalink / raw)
To: Alan Cox; +Cc: Erik Mouw, Rogier Wolff, Linux Kernel Mailing List, linux-ide
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
> Retries also pop up in other less obvious cases and conveniently paper
> over a wide variety of timeouts, power management quirks and drives just
> having a random fit. Eight is probably excessive in all cases.
>
> For non hard disk cases many devices do want and need retry.
For ripping CDs I'd prefer it if the application could control retries,
instead of defaulting to whatever the CD player prefers.
I have a CD, KLF -- Ultra Rare Tracks, and unfortunately someone I
lent it to managed to literally stomp on it, so the disc has a
lot of scratches. I would very much like to rip as much as possible
of this CD so that I can at least listen to part of it, but every time
I try, the CD player just gets stuck in an endless retry loop.
The same thing happens when trying to rip a "copy controlled" CD,
which is also pretty irritating since this means that I can't rip my
Teddybears Stockholm CD and put the songs on my iPod.
/Christer
--
"Just how much can I get away with and still go to heaven?"
Freelance consultant specializing in device driver programming for Linux
Christer Weinigel <christer@weinigel.se> http://www.weinigel.se
Thread overview: 31+ messages
2004-08-30 16:39 Driver retries disk errors Rogier Wolff
2004-08-30 17:46 ` Theodore Ts'o
2004-08-30 18:26 ` James Courtier-Dutton
2004-08-30 22:25 ` Rogier Wolff
2004-08-31 11:38 ` Alan Cox
2004-09-02 16:23 ` Eric Mudama
2004-08-31 15:16 ` Bill Davidsen
2004-08-30 22:17 ` Rogier Wolff
2004-08-31 11:45 ` Alan Cox
2004-08-31 13:45 ` Andre Hedrick
2004-08-31 13:54 ` Rogier Wolff
2004-08-31 14:12 ` Alan Cox
2004-08-31 15:56 ` Erik Mouw
2004-08-31 15:13 ` Alan Cox
2004-08-31 17:00 ` Erik Mouw
2004-08-31 16:12 ` Alan Cox
2004-09-01 15:18 ` Bill Davidsen
2004-09-01 14:46 ` Alan Cox
2004-09-01 18:54 ` Bill Davidsen
2004-09-01 15:28 ` Romano Giannetti
2004-09-01 14:44 ` Alan Cox
2004-09-01 23:14 ` Rogier Wolff
2004-09-02 9:29 ` Alan Cox
2004-09-02 10:54 ` Rogier Wolff
2004-09-02 14:30 ` John Stoffel
2004-09-02 14:59 ` Alan Cox
2004-09-02 16:07 ` John Stoffel
2004-09-02 16:26 ` Eric Mudama
2004-08-31 22:55 ` Christer Weinigel
[not found] <fa.d48te6f.1ol6tbb@ifi.uio.no>
[not found] ` <fa.eti1vu1.2nqlj5@ifi.uio.no>
2004-08-31 0:17 ` Robert Hancock
2004-09-01 23:04 ` Rogier Wolff