From: "Daniel B." <dsb@smart.net>
To: Valdis.Kletnieks@vt.edu
Cc: linux-kernel@vger.kernel.org
Subject: Re: IDE DMA errors, massive disk corruption: Why? Fixed Yet? Why not re-do failed op?
Date: Tue, 07 Oct 2003 09:32:10 -0400
Message-ID: <3F82C05A.71F72E72@smart.net>
In-Reply-To: 200310070603.h97631Yl011804@turing-police.cc.vt.edu

Valdis.Kletnieks@vt.edu wrote:
> 
> On Tue, 07 Oct 2003 01:24:19 EDT, "Daniel B." said:
> 
> > So if some command/batch/etc. wasn't acknowledged, why can't the
> > kernel retry the command/batch/etc.?
> 
> The problem is that the disk ack'ed the command when the block went into the
> write cache.  

That's the acknowledgment I'm talking about.


> You *DONT* in general get back another ack when the block
> actually hits the platters.

I know.  I wasn't talking about any acknowledgment sent after the data
is actually written to the medium.

> > Given the seriousness of disk data corruption, why isn't the Linux kernel
> > more reliable here?  Hasn't this family of IDE problems been around
> > for a couple of years now?
> 
> It's hard for the kernel to be more reliable unless you just disable the write cache.

Again, I'm NOT talking about write-cache problems.  I'm talking about
problems in the communication/handshaking between the kernel and
the drive.


> The biggest reason we don't see more issues like this is that the average MTBF
> really is up in the 100K hours and up range

That reliability figure is for the _drives_.

That figure obviously does not apply to kernel-to-drive communication,
because I've had dozens of DMA-interrupt corruptions in the last two
or so years.



> Yes, this family of problems has been around ever since write caches were
> introduced. 

I'm not talking about problems related to write caches.  I'm talking 
about DMA interrupt problems.  Why do you think I'm talking about
inside-the-black-box write-cache problems?


> It's just taken until now that we've got file system code that's
> rock solid enough 

Rock solid?  Hah!  If file system (and other disk-related) code is so
solid, why did my root partition get damaged so badly that it can't boot?

(Even if it's bad hardware's fault that an interrupt got lost, and 
even if it's unreasonably complicated (or impossible) for the
kernel to retry an unacknowledged command, why didn't the kernel
stop writing to that disk after the first unacknowledged command?)
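
To make that question concrete, here is a minimal sketch of the policy
I'm asking about: retry a command that was not acknowledged, and if it
is still not acknowledged, stop issuing writes to that disk.  This is a
user-space toy model, not kernel IDE code.  Every name in it
(WriteGuard, DiskOffline, send, max_retries) is hypothetical, and it
assumes that re-sending the same data to the same block is harmless.

```python
from dataclasses import dataclass
from typing import Callable


class DiskOffline(Exception):
    """Raised once the model has stopped issuing writes to the disk."""


@dataclass
class WriteGuard:
    # Hypothetical transport hook: returns True if the drive acknowledged
    # the command, False if the acknowledgment never arrived.
    send: Callable[[int, bytes], bool]
    max_retries: int = 2      # assumed retry budget, not a real kernel value
    offline: bool = False     # set after the first write that is never acked

    def write(self, block: int, data: bytes) -> None:
        # Fail-stop: after one unrecoverable write, refuse all further writes
        # instead of continuing to modify a disk in an unknown state.
        if self.offline:
            raise DiskOffline(f"block {block} refused: disk marked offline")

        # One initial attempt plus max_retries re-sends of the same command.
        for _ in range(1 + self.max_retries):
            if self.send(block, data):
                return

        self.offline = True
        raise DiskOffline(f"block {block} never acknowledged")
```

With a send hook that drops the first acknowledgment, write() succeeds
on the second attempt.  With a hook that never acknowledges, the first
write() raises DiskOffline and every later write() is refused without
touching the disk at all.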


> that the write cache is a major reliability issue - for the
> longest time, one kernel bug or another has been more of a concern.

It's not "has been"--it is still a problem in the newest released
stable kernel (is .22 still the newest?).

Daniel
-- 
Daniel Barclay
dsb@smart.net
