Re: unreadable drives can be synchronized?

From: "Colin McCabe" <colin.p.mccabe@gmail.com>
To: Bill Davidsen <davidsen@tmr.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: unreadable drives can be synchronized?
Date: Wed, 16 May 2007 16:09:47 -0400
Message-ID: <7296208f0705161309s75f2b989mf09d3b0736fae9a8@mail.gmail.com>
In-Reply-To: <464B3DC2.9090806@tmr.com>

On 5/16/07, Bill Davidsen <davidsen@tmr.com> wrote:
> Colin McCabe wrote:
> > Hi all,
> >
> > I am running software RAID on Linux 2.6.21.
> >
> > While experimenting with adding and removing devices from the RAID
> > array, I noticed something very troubling. I have a bad drive (let's
> > call it drive B) which gets random read errors. I also have a good
> > drive, call it drive A.
> >
> > B can synchronize with A. But then, if I remove A from the raid array,
> > A cannot be re-added. This is because the bad drive, B, cannot be read
> > from.
> >
> > Basically, B appears to be "write-only"; it will never return an error
> > on a write, but just try to read from it, and you will be sorry.
> >
> You may be able to recover from this (why would you do such a thing?) by
> stopping the array and reassembling the array with only the "good" drive
> and the other as failed. Caution, I made this up, it should work but I
> have no bad drive to use for a test, we have a good recycling system in
> my area.

This is an embedded systems application. There isn't any important
data on drives A or B at the moment.

What concerns me is that apparently these Fujitsu disks have errors
that only show up when you try to read from them. I don't know if this
is a firmware bug or a physical limitation of the way the drive
detects errors. I actually have two different drives which could fill
the role of drive B in this scenario.

If I run a "check" on the array with both drives in it, md speedily
removes B once it realizes that it can't read from it. But what bothers
me is that B is able to become active without ever having been tested
by a read. So it seems that, at minimum, careful admins should run a
"check" immediately after adding a new disk to an array.
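
For anyone who wants to script that, here is a rough sketch of how I would
request a "check" through the md sysfs files (sync_action and mismatch_cnt).
The function names and the MD_SYSFS_ROOT and MD_POLL_SECS variables are made
up for this example; they are not part of md or mdadm. A check makes md read
from every member, so a drive like B should hit its read errors during the
pass instead of after A has been pulled.

```shell
#!/bin/sh
# Sketch only: helper names and MD_SYSFS_ROOT / MD_POLL_SECS are hypothetical.
# MD_SYSFS_ROOT defaults to /sys/block, where md exposes <array>/md/.

md_dir() {
    # Print the sysfs directory for an array name such as "md0".
    echo "${MD_SYSFS_ROOT:-/sys/block}/$1/md"
}

md_start_check() {
    # Writing "check" to sync_action asks md to read all members
    # and compare them without rewriting anything.
    echo check > "$(md_dir "$1")/sync_action"
}

md_wait_idle() {
    # Poll until sync_action reads "idle" again.
    while [ "$(cat "$(md_dir "$1")/sync_action")" != "idle" ]; do
        sleep "${MD_POLL_SECS:-10}"
    done
}

md_mismatches() {
    # Print the mismatch count recorded by the last check.
    cat "$(md_dir "$1")/mismatch_cnt"
}
```

Usage would be something like "md_start_check md0 && md_wait_idle md0 &&
md_mismatches md0", followed by a look at /proc/mdstat to see whether a
member was kicked out during the pass.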

Colin

> > Writing is fine:
> > [root@cmccabe-devel root]# dd if=/dev/zero of=/dev/sdb bs=524288
> > dd: writing `/dev/sdb': No space left on device
> > 114464+0 records in
> > 114463+0 records out
> >
> > Reading is not:
> > [root@cmccabe-devel root]# dd if=/dev/sdb of=/dev/null bs=524288
> > ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x2 frozen
> > ata1.00: cmd 60/00:00:00:b0:01/01:00:00:00:00/40 tag 0 cdb 0x0 data 131072 in
> > [ ... copious errors ... ]
> >
> > I have disabled write caching using hdparm -W0.
> > Both drives are: Fujitsu MHV2060BH, 60 GB, Serial ATA
> > The SATA controller is: ICH6
> >
> > My problem is that even though B gets into the synchronized state, it
> > is no good at all. This is potentially misleading, and if someone
> > removes A after synchronizing B, the system will probably crash, since
> > there will be no good drives left.
> >
> > I wonder if anyone else is interested in a "paranoid recovery" mode
> > where the md layer tests the data that has been written. Even if this
> > doubles the recovery time, I think that it would be desirable for many
> > applications.
>
>
> --
> bill davidsen <davidsen@tmr.com>
>   CTO TMR Associates, Inc
>   Doing interesting things with small computers since 1979
>
>
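
To make the "paranoid recovery" idea quoted above a bit more concrete, here
is a userspace sketch of the write-then-read-back test I have in mind. It is
not md code and does not show how md would implement such a mode. The
function name verify_readback and the VERIFY_READ_FLAGS variable are
hypothetical. It overwrites the start of the target, so it is only suitable
for a disk with no data you care about.

```shell
#!/bin/sh
# Sketch only: verify_readback and VERIFY_READ_FLAGS are hypothetical names.
# WARNING: destroys the first BLOCKS * 512 KiB of TARGET.

verify_readback() {
    target="$1"            # file or block device to test
    blocks="${2:-1}"       # number of 512 KiB blocks to write and verify
    bs=524288
    pattern=$(mktemp) || return 2

    # Build a random reference pattern (128 x 4 KiB = 512 KiB per block).
    dd if=/dev/urandom of="$pattern" bs=4096 count=$((blocks * 128)) 2>/dev/null

    status=0
    # Write the pattern without truncating the target, then flush.
    dd if="$pattern" of="$target" bs="$bs" conv=notrunc 2>/dev/null || status=1
    sync

    if [ "$status" -eq 0 ]; then
        # Read the same region back and compare it with the reference.
        # On a real disk the read may be served from the page cache; with
        # GNU dd, VERIFY_READ_FLAGS="iflag=direct" is one way to avoid that.
        dd if="$target" bs="$bs" count="$blocks" ${VERIFY_READ_FLAGS:-} \
            2>/dev/null | cmp -s - "$pattern" || status=1
    fi

    rm -f "$pattern"
    return "$status"       # 0 = read back intact, 1 = write or verify failed
}
```

Something like "VERIFY_READ_FLAGS=iflag=direct verify_readback /dev/sdb 16"
would then fail on a drive that accepts writes but cannot be read back.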
