* IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
From: Daniel B. @ 2003-10-06 18:42 UTC
To: linux-kernel@vger.kernel.org
I just got bitten _again_ by IDE DMA timeout errors and massive
filesystem corruption in kernel 2.4.22 (on an Asus A7M266-D dual-Athlon
XP motherboard (AMD 768 chip / amd7441 IDE controller)).
(I had turned DMA off in my init scripts, but apparently Debian
unstable's k7-smp configuration enables DMA by default before my init
scripts get control. Ext3 journal "recovery" trashed my system
partition.)
What's going on with the IDE DMA bugs? They have existed since 2.2
(right?), and even at .22 in the 2.4 series they still exist. Why
have they been around so long? Is it that few kernel developers use
the combinations of hardware or configuration options that expose
the bugs (like my dual-CPU box with IDE, not SCSI, disks)?
Are the DMA bugs believed to be fixed (for real) yet? If so, in which
version?
Is there any consolidated documentation of the combinations of factors
that cause corruption, or of how to reliably avoid corruption (like
all the things to check to make sure your kernel never even tries to
enable DMA)?
Also, why does a DMA timeout cause such corruption? Doesn't the kernel
keep track of uncompleted operations, retain the information needed to
try again, and try again if there's a failure? If not, why not?
If it can't try again, shouldn't the kernel at least abort after one
disk-write failure instead of performing additional writes, which
frequently depend on the previous writes? (E.g., if I try to read
block 1's data and write it to block 2, and then write something new
to block 1, if the first write fails but I continue and do the second
write, data gets destroyed. If the first write fails and I stop right
away, less is destroyed.)
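The block 1 / block 2 example above can be made concrete with a toy model. This is a minimal sketch under stated assumptions: `run_ops` and its parameters are hypothetical names, the "disk" is just a Python dict, and a failed write is assumed to leave the target block untouched. It is not Linux kernel code; it only shows why continuing after a lost write can destroy data that stopping would have preserved.

```python
# Toy model of the copy-then-overwrite example (hypothetical; not kernel code).
# Each op is (destination_block, source), where source is either a literal
# value or ("copy", n), meaning "read block n at the moment this op runs".

def run_ops(disk, ops, failing_op, stop_on_failure):
    """Apply ops to `disk` (a dict mapping block -> data) in order.

    The op at index `failing_op` is lost, as with a DMA timeout where the
    data never reaches the drive (assumption: the target block is unchanged).
    With stop_on_failure=True no further writes are issued (fail-stop).
    """
    for i, (dst, source) in enumerate(ops):
        if i == failing_op:
            if stop_on_failure:
                break
            continue  # carry on as if the lost write had succeeded
        if isinstance(source, tuple):
            data = disk[source[1]]  # read the source block now
        else:
            data = source
        disk[dst] = data
    return disk


if __name__ == "__main__":
    ops = [(2, ("copy", 1)), (1, "new contents")]
    print(run_ops({1: "old", 2: "junk"}, ops, failing_op=0, stop_on_failure=False))
    # {1: 'new contents', 2: 'junk'}: the old block 1 data is gone
    print(run_ops({1: "old", 2: "junk"}, ops, failing_op=0, stop_on_failure=True))
    # {1: 'old', 2: 'junk'}: the old data survives and can be copied again
```

In the "continue" run the only copy of the old block 1 data is overwritten before it was ever saved; in the fail-stop run nothing is lost except the work still to be done.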
Daniel
--
Daniel Barclay
dsb@smart.net
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
From: Bartlomiej Zolnierkiewicz @ 2003-10-06 19:11 UTC
To: Daniel B.; +Cc: linux-kernel@vger.kernel.org
There are different kinds of IDE DMA errors.
Please post the error messages, your dmesg output and your .config.
On Monday 06 of October 2003 20:42, Daniel B. wrote:
> I just got bitten _again_ by IDE DMA timeout errors and massive
> filesystem corruption in kernel 2.4.22 (on an Asus A7M266-D dual-Athlon
> XP motherboard (AMD 768 chip / amd7441 IDE controller)).
>
> (I had turned DMA off in my init scripts, but apparently Debian
> unstable's k7-smp configuration enables DMA by default before my init
> scripts get control. Ext3 journal "recovery" trashed my system
> partition.)
>
> What's going on with the IDE DMA bugs? They have existed since 2.2
> (right?), and even at .22 in the 2.4 series they still exist. Why
> have they been around so long? Is it that few kernel developers use
> the combinations of hardware or configuration options that expose
> the bugs (like my dual-CPU box with IDE, not SCSI, disks)?
Well, yes. I, for example, have no problems :-).
> Are the DMA bugs believed to be fixed (for real) yet? If so, in which
> version?
>
> Is there any consolidated documentation of the combinations of factors
> that cause corruption, or of how to reliably avoid corruption (like
> all the things to check to make sure your kernel never even tries to
> enable DMA)?
>
>
> Also, why does a DMA timeout cause such corruption? Doesn't the kernel
> keep track of uncompleted operations, retain the information needed to
> try again, and try again if there's a failure? If not, why not?
>
> If it can't try again, shouldn't the kernel at least abort after one
> disk-write failure instead of performing additional writes, which
> frequently depend on the previous writes? (E.g., if I try to read
> block 1's data and write it to block 2, and then write something new
> to block 1, if the first write fails but I continue and do the second
> write, data gets destroyed. If the first write fails and I stop right
> away, less is destroyed.)
Are you sure you don't have a faulty drive?
--bartlomiej
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
In reply to: Mudama, Eric @ 2003-10-06 19:32 UTC
From: Daniel B. @ 2003-10-06 20:20 UTC
Cc: linux-kernel
"Mudama, Eric" wrote:
...
> > Doesn't the kernel keep track of uncompleted operations,
> > retain the information needed to try again, and try again
> > if there's a failure? If not, why not?
>
> If the disk has write cache enabled, this isn't necessarily possible, since
> there's nothing in the IDE specification that guarantees the order of writes
> to the media without a FLUSH CACHE (EXT) command.
Are you sure? If you issue a write to block 1 and then issue another
write to block 1, it would have to guarantee the relative order of those
writes (or equivalent optimization in the write cache), wouldn't it?
> Hypothetically, if you were doing full-pack random writes continuously with
> no idle time and no FLUSH CACHE, you can have writes that are days old still
> in the drive's buffer and still un-attempted. A write with write-cache
> enabled reports ending status at the completion of the transfer. There is
> no mechanism to tell the host that a cached write failed, other than giving
> an error on the next command.
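The write-cache behaviour described in the quoted paragraph can be sketched as a toy model. Everything here is a hypothetical illustration, not the ATA specification or any drive's firmware: `WriteCachingDrive`, `destage_one` and `flush_cache` are invented names, the drive destages blocks in an arbitrary order (highest block number first), and a deferred media error is simply returned as the status of the next command while that command is still accepted.

```python
class WriteCachingDrive:
    """Toy drive with write cache enabled (hypothetical model).

    write() reports success as soon as the data is in the cache. Media
    writes happen later, in whatever order the drive chooses, and a media
    error can only be reported as an error on a *subsequent* command.
    """

    def __init__(self, bad_blocks=()):
        self.cache = {}            # block -> data not yet on the media
        self.media = {}            # block -> data actually on the platters
        self.bad_blocks = set(bad_blocks)
        self.pending_error = None  # deferred error for the next command

    def _check_deferred(self):
        # Hand back (and clear) any error left over from an earlier destage.
        if self.pending_error is not None:
            err, self.pending_error = self.pending_error, None
            return err
        return "ok"

    def write(self, block, data):
        status = self._check_deferred()
        self.cache[block] = data   # ending status at end of the transfer
        return status

    def destage_one(self):
        """Drive-internal: move one cached block to the media."""
        if not self.cache:
            return
        block = max(self.cache)    # arbitrary order, not the host's order
        data = self.cache.pop(block)
        if block in self.bad_blocks:
            self.pending_error = f"write error on block {block}"
        else:
            self.media[block] = data

    def flush_cache(self):
        """Rough model of FLUSH CACHE: destage everything, then report."""
        while self.cache:
            self.destage_one()
        return self._check_deferred()
```

In this model a write to block 7 returns "ok", the later failure of block 7 shows up as the status of an unrelated write to block 5, and only the flush tells the host that everything it issued has reached the media.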
But we're not talking about errors IN the disk drive after the
communication between the kernel and drive is already done. We're talking
about errors in the communication BETWEEN the kernel and the drive (lost
DMA interrupts), aren't we?
If the kernel issues a write command to the drive, and never gets a
response (DMA-complete interrupt?) from the drive that it has accepted
the command, why can't the kernel repeat the write command?
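A minimal sketch of the retry being asked about, under the same assumptions the question makes: only one write is outstanding at a time, and a lost completion interrupt is assumed to mean the write had no effect. `FlakyChannel` and `write_with_retry` are hypothetical names; a real IDE driver would also have to reset the channel and re-program the DMA engine before each retry, which is only noted in a comment here.

```python
class FlakyChannel:
    """Hypothetical host-to-drive link that sometimes loses the completion
    interrupt. `lost` is a list of booleans consumed one per attempt."""

    def __init__(self, lost):
        self.lost = list(lost)
        self.media = {}
        self.attempts = 0

    def issue_write(self, block, data):
        """Return True if the completion interrupt arrived."""
        self.attempts += 1
        lose = self.lost.pop(0) if self.lost else False
        if not lose:
            # Assumption: a lost completion means the write did not happen.
            self.media[block] = data
        return not lose


def write_with_retry(chan, block, data, max_retries=3):
    """Re-issue a write whose completion was never seen.

    Safe in this model because rewriting the same data to the same block is
    idempotent and nothing else was issued in between (no queueing).
    """
    for _ in range(1 + max_retries):
        if chan.issue_write(block, data):
            return True
        # A real driver would reset the channel/drive here before retrying.
    return False
```

The design choice worth noting is the idempotence argument in the docstring: once other commands may have been issued between the original attempt and the retry, re-sending alone no longer guarantees the intended on-disk order.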
Daniel
--
Daniel Barclay
dsb@smart.net
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
From: Valdis.Kletnieks @ 2003-10-06 20:45 UTC
To: Daniel B.; +Cc: linux-kernel
On Mon, 06 Oct 2003 16:20:42 EDT, "Daniel B." said:
> Are you sure? If you issue a write to block 1 and then issue another
> write to block 1, it would have to guarantee the relative order of those
> writes (or equivalent optimization in the write cache), wouldn't it?
If the old 'block 1' data is still in the write cache, then another write
should overlay it - that's a very basic optimization. Consider the case of a
very active block that has a popular inode that's being atime-updated a lot (or
whatever causes a lot of activity - ignore the in-memory cache and sync/fsync
for the moment). You really don't want 34 writes to the same block taking up 34
blocks of space in the write cache....
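The overlay optimization described above can be illustrated with a tiny write-back cache keyed by block number. `CoalescingWriteCache` is a hypothetical name, and this is a sketch of the idea rather than how any particular drive manages its buffer.

```python
class CoalescingWriteCache:
    """Toy write-back cache: a new write to a block that is still cached
    overlays the old data instead of taking another slot (hypothetical)."""

    def __init__(self):
        self.slots = {}  # block -> latest data, in first-write order

    def write(self, block, data):
        # Overwrite in place if the block is already cached.
        self.slots[block] = data

    def __len__(self):
        return len(self.slots)


if __name__ == "__main__":
    cache = CoalescingWriteCache()
    for i in range(34):
        cache.write(42, f"inode atime update {i}")
    print(len(cache))  # 1
```

Thirty-four atime updates to the same block occupy one slot, and the slot holds the last update, so for two writes to the same block the relative order Daniel asks about is preserved in the sense that the later write wins.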
The ordering issue comes when the following type of thing happens:
1) a write for block 993 is issued (metadata, perhaps)
2) a write for block 10934 is issued - actual file contents or something that
depends on 993 being written.
3) Disk writes 10934 out.
4) Things go bad (power hit, whatever) before 993 gets written out.
5) fsck. ;)
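The five steps above can be replayed in a few lines. This sketch assumes the drive destages cached blocks in an order of its own choosing and that power is lost after a fixed number of blocks reach the media; `simulate` and `consistent` are hypothetical helpers, and the `flush_between` flag is a rough stand-in for the host issuing FLUSH CACHE after each write and waiting for it before issuing the next.

```python
def simulate(issue_order, destage_order, power_fails_after, flush_between=False):
    """Return the set of blocks on the media after a power failure.

    issue_order: blocks in the order the host issues writes.
    destage_order: order in which the drive moves cached blocks to media.
    power_fails_after: number of blocks destaged before power is lost.
    flush_between: if True, the host flushes after every write, so each
        block reaches the media before the next one is even issued.
    """
    if flush_between:
        destage_order = list(issue_order)  # flushes force issue order
    return set(destage_order[:power_fails_after])


def consistent(on_media, meta=993, data=10934):
    """Block `data` is only meaningful if block `meta` is also on the media."""
    return data not in on_media or meta in on_media
```

With the drive writing 10934 before 993 and power failing after one block, the media holds 10934 alone and the check fails (step 5, fsck); with a flush after each write the same power failure leaves only 993 on the media, which is still consistent.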
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
From: Daniel B. @ 2003-10-06 21:07 UTC
Cc: linux-kernel
Valdis.Kletnieks@vt.edu wrote:
>
> ...
>
> The ordering issue comes when the following type of thing happens:
>
> 1) a write for block 993 is issued (metadata, perhaps)
> 2) a write for block 10934 is issued - actual file contents or something that
> depends on 993 being written.
> 3) Disk writes 10934 out.
> 4) Things go bad (power hit, whatever) before 993 gets written out.
> 5) fsck. ;)
Is that scenario relevant to DMA errors?
I'm talking about problems in steps 1 and 2, not in later steps.
If the kernel starts a write command for block 993, wouldn't it wait
for a DMA interrupt signalling that the drive has received and accepted
the command before the kernel starts the write command for block 10934?
If it timed out waiting for that interrupt, can't it re-issue the
write for block 993 before proceeding?
Daniel
--
Daniel Barclay
dsb@smart.net
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
From: Jeff Garzik @ 2003-10-06 21:26 UTC
To: Daniel B.; +Cc: linux-kernel
Daniel B. wrote:
> If the kernel starts a write command for block 993, wouldn't it wait
> for a DMA interrupt signalling that the drive has received and accepted
> the command before the kernel starts the write command for block 10934?
With command queueing, no, it would not wait.
> If it timed out waiting for that interrupt, can't it re-issue the
> write for block 993 before proceeding?
Assuming a large amount of sanity in your OS driver... certainly.
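To illustrate why queueing complicates the retry (the driver does not wait for 993 before issuing 10934), here is a hypothetical sketch of the bookkeeping a driver would need: every queued command is kept until its completion arrives, so that anything still outstanding after a timeout can be re-sent. `QueuedCommandTracker` and its methods are invented for illustration; a real queue has a small, fixed number of tags, while this sketch uses an ever-increasing counter for simplicity.

```python
class QueuedCommandTracker:
    """Hypothetical driver-side bookkeeping for tagged (queued) writes.

    Several writes can be outstanding at once, so the driver must keep a
    copy of every command until its completion arrives; that copy is what
    makes re-issuing after a timeout possible.
    """

    def __init__(self):
        self.next_tag = 0
        self.outstanding = {}  # tag -> (block, data)

    def issue(self, block, data):
        tag = self.next_tag
        self.next_tag += 1
        self.outstanding[tag] = (block, data)
        return tag

    def complete(self, tag):
        # Completion received: the command no longer needs to be retained.
        self.outstanding.pop(tag)

    def commands_to_reissue(self):
        """On a timeout, re-send every command without a completion,
        oldest tag first."""
        return [self.outstanding[t] for t in sorted(self.outstanding)]
```

If 10934 completes while 993 is still outstanding, a retry can recover 993's data, but it cannot undo the fact that the dependent write already went out first; keeping that order intact is the "large amount of sanity" the driver needs.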
Jeff
* Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
In reply to: Valdis.Kletnieks @ 2003-10-07 06:03 UTC
From: Daniel B. @ 2003-10-07 13:32 UTC
To: Valdis.Kletnieks; +Cc: linux-kernel
Valdis.Kletnieks@vt.edu wrote:
>
> On Tue, 07 Oct 2003 01:24:19 EDT, "Daniel B." said:
>
> > So if some command/batch/etc. wasn't acknowledged, why can't the
> > kernel retry the command/batch/etc.?
>
> The problem is that the disk ack'ed the command when the block went into the
> write cache.
That's the acknowledgment I'm talking about.
> You *DONT* in general get back another ack when the block
> actually hits the platters.
I know. I wasn't talking about any acknowledgment after actually writing
the data to the medium.
> > Given the seriousness of disk data corruption, why isn't the Linux kernel
> > more reliable here? Hasn't this family of IDE problems been around
> > for a couple of years now?
>
> It's hard for the kernel to be more reliable unless you just disable the write cache.
Again, I'm NOT talking about write-cache problems. I'm talking about
problems in the communication/handshaking between the kernel and
the drive.
> The biggest reason we don't see more issues like this is that the average MTBF
> really is up in the 100K hours and up range
That reliability figure is for the _drives_.
That figure obviously does not apply to kernel-to-drive communication,
because I've had dozens of DMA-interrupt corruptions in the last two
or so years.
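Taking the quoted 100K-hour MTBF at face value, a rough check of the argument above, assuming a constant failure rate (exponential model), a single drive and two years of continuous power-on time:

$$
\begin{aligned}
t &= 2 \times 8760\ \text{h} = 17\,520\ \text{h} \\
\mathbb{E}[\text{failures}] &= \frac{t}{\text{MTBF}} = \frac{17\,520}{100\,000} \approx 0.175 \\
P(\text{at least one failure}) &= 1 - e^{-t/\text{MTBF}} \approx 0.16
\end{aligned}
$$

Under those assumptions, drive failures alone would predict well under one event in two years, not dozens.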
> Yes, this family of problems has been around ever since write caches were
> introduced.
I'm not talking about problems related to write caches. I'm talking
about DMA interrupt problems. Why do you think I'm talking about
inside-the-black-box write-cache problems?
> It's just taken until now that we've got file system code that's
> rock solid enough
Rock solid? Hah! If file system (and other disk-related) code is so
solid, why did my root partition get screwed so badly it can't boot?
(Even if it's bad hardware's fault that an interrupt got lost, and
even if it's unreasonably complicated (or impossible) for the
kernel to retry an unacknowledged command, why didn't the kernel
stop writing to that disk after the first unacknowledged command?)
> that the write cache is a major reliability issue - for the
> longest time, one kernel bug or another has been more of a concern.
It's not "has been"; it is still a problem in the newest released
stable kernel (is 2.4.22 still the newest?).
Daniel
--
Daniel Barclay
dsb@smart.net