From: "Daniel B." <dsb@smart.net>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: IDE DMA errors, massive disk corruption:  Why?  Fixed Yet?  Why not  re-do failed op?
Date: Mon, 06 Oct 2003 14:42:24 -0400
Message-ID: <3F81B790.B8AF7136@smart.net>

I just got bitten _again_ by IDE DMA timeout errors and massive 
filesystem corruption in kernel 2.4.22 (on an Asus A7M266-D dual-Athlon 
XP motherboard (AMD 768 chip / amd7441 IDE controller)).

(I had turned DMA off in my init scripts, but apparently Debian 
unstable's k7-smp configuration enables DMA by default before my init
scripts get control.  Ext3 journal "recovery" trashed my system 
partition.)

What's going on with the IDE DMA bugs?  They have existed since 2.2 
(right?), and even at .22 in the 2.4 series they still exist.  Why
have they been around so long?  Is it that few kernel developers use
the combinations of hardware or configuration options that expose
the bugs (like my dual-CPU box with IDE, not SCSI, disks)?

Are the DMA bugs believed to be fixed (for real) yet?  If so, in which 
version?

Is there any consolidated documentation of the combinations of factors
that cause corruption, or of how to reliably avoid corruption (like
all the things to check to make sure your kernel never even tries to 
enable DMA)?


Also, why does a DMA timeout cause such corruption?  Doesn't the kernel 
keep track of uncompleted operations, retain the information needed to
try again, and try again if there's a failure?  If not, why not?

If it can't try again, shouldn't the kernel at least abort after one 
disk-write failure instead of performing additional writes, which
frequently depend on the previous writes?  (E.g., suppose I read 
block 1's data and write it to block 2, and then write something new 
to block 1.  If the first write fails but I continue and do the second
write, block 1's old data is destroyed.  If the first write fails and
I stop right away, less is destroyed.)
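
To make that example concrete, here is a toy sketch in Python.  It is
not kernel code and does not claim to reflect how the IDE layer
actually behaves: the in-memory "disk", the names run() and
old_data_survives(), and the injected failure are all made up for
illustration.  It only compares the two policies described above.

```python
class WriteError(Exception):
    """Stands in for a DMA timeout on a write (illustrative only)."""


def run(abort_on_failure, failing_write=0):
    """Copy block 1 to block 2, then overwrite block 1.

    failing_write is the index (0-based) of the write that fails;
    pass -1 for a run with no failure at all.
    """
    # Hypothetical in-memory "disk": block number -> contents.
    disk = {1: "old-1", 2: "old-2"}
    writes_issued = 0

    def write(block, data):
        nonlocal writes_issued
        index = writes_issued
        writes_issued += 1
        if index == failing_write:
            # The failed write leaves the target block unchanged.
            raise WriteError(f"write to block {block} timed out")
        disk[block] = data

    # Write 0 saves block 1's data in block 2; write 1 replaces block 1.
    steps = [(2, disk[1]), (1, "new-1")]
    for block, data in steps:
        try:
            write(block, data)
        except WriteError:
            if abort_on_failure:
                break  # stop issuing dependent writes
            # otherwise ignore the error and carry on
    return disk


def old_data_survives(disk):
    """True if block 1's original contents still exist somewhere."""
    return "old-1" in disk.values()


if __name__ == "__main__":
    print("abort on failure:   ", run(abort_on_failure=True))
    print("continue on failure:", run(abort_on_failure=False))
```

With the first write failing, the abort policy leaves both blocks as
they were, while the continue policy overwrites block 1 without ever
having saved its old contents to block 2.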




Daniel
-- 
Daniel Barclay
dsb@smart.net
