Re: Bad drive discovered during raid5 reshape


From: Neil Brown <neilb@suse.de>
To: Kyle Stuart <kstuart@sisna.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Bad drive discovered during raid5 reshape
Date: Tue, 30 Oct 2007 17:29:15 +1100
Message-ID: <18214.53051.652280.492087@notabene.brown>
In-Reply-To: message from Kyle Stuart on Monday October 29

On Monday October 29, kstuart@sisna.com wrote:
> Hi,
> I bought two new hard drives to expand my raid array today and
> unfortunately one of them appears to be bad. The problem didn't arise
> until after I attempted to grow the raid array. I was trying to expand
> the array from 6 to 8 drives. I added both drives using mdadm --add
> /dev/md1 /dev/sdb1 which completed, then mdadm --add /dev/md1 /dev/sdc1
> which also completed. I then ran mdadm --grow /dev/md1 --raid-devices=8.
> It passed the critical section, then began the grow process.
> 
> After a few minutes I started to hear unusual sounds from within the
> case. Fearing the worst I tried to cat /proc/mdstat which resulted in no
> output so I checked dmesg which showed that /dev/sdb1 was not working
> correctly. After several minutes dmesg indicated that mdadm gave up and
> the grow process stopped. After googling around I tried the solutions
> that seemed most likely to work, including removing the new drives with
> mdadm --remove --force /dev/md1 /dev/sd[bc]1 and rebooting after which I
> ran mdadm -Af /dev/md1. The grow process restarted then failed almost
> immediately. Trying to mount the drive gives me a reiserfs replay
> failure and suggests running fsck. I don't dare fsck the array since
> I've already messed it up so badly. Is there any way to go back to the
> original working 6 disc configuration with minimal data loss? Here's
> where I'm at right now, please let me know if I need to include any
> additional information.

Looks like you are in real trouble.  Both the drives seem bad in some
way.  If it was just sdc that was failing it would have picked up
after the "-Af", but when it tried, sdb gave errors.

Having two failed devices in a RAID5 is not good!

Your best bet goes like this:

  The reshape has started and got up to some point.  The data
  before that point is spread over 8 drives.  The data after is over
  6.
  We need to restripe the 8-drive data back to 6 drives.  This can be
  done with the test_stripe tool that can be built from the mdadm
  source. 
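
To make the two layouts concrete: a RAID5 stripe holds one chunk per
drive, one of which is parity, so the amount of data per stripe changes
with the drive count.  A minimal sketch of that arithmetic follows; the
function name and the 64 KiB chunk size in the example are only
illustrative assumptions, so use the chunk size "mdadm -E" reports.

```shell
# data_per_stripe DISKS CHUNK_KIB
# Prints the KiB of array data held by one full RAID5 stripe:
# one chunk per disk, minus one chunk for parity.
data_per_stripe() {
    disks=$1
    chunk_kib=$2
    echo $(( (disks - 1) * chunk_kib ))
}

# Example with a hypothetical 64 KiB chunk:
#   data_per_stripe 8 64    # 8-drive layout, prints 448
#   data_per_stripe 6 64    # 6-drive layout, prints 320
```

The same data therefore sits at different places on each drive in the
two layouts, which is why the already-reshaped part has to be read with
the 8-drive geometry and written back with the 6-drive one.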

  1/ Find out how far the reshape progressed, by using "mdadm -E" on
     one of the devices.
  2/ use something like
        test_stripe save /some/file 8 $chunksize 5 2 0 $length  /dev/......

     If you get all the args right, this should copy the data from
     the array into /some/file.
     You could possibly do the same thing by assembling the array
     read-only (set /sys/module/md_mod/parameters/start_ro to 1)
     and using 'dd' to read from the array.  It might be worth doing
     both and checking that you get the same result.

  3/ use something like
        test_stripe restore /some/file 6 ..........
     to restore the data to just 6 devices.

  4/ use "mdadm -C" to create the array anew on the 6 devices.  Make
     sure the device order, the chunksize, etc. are preserved.

     Once you have done this, the start of the array should (again)
     look like the content of /some/file.  It wouldn't hurt to check.

   Then your data would be as much back together as possible.
   You will probably still need to do an fsck, but I think you did the
   right thing in holding off.  Don't do an fsck until you are sure
   the array is writable.
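
For step 1, something like the following could pull the reshape
position out of the "mdadm -E" output.  This is only a sketch: it
assumes the output contains a line of the form
"Reshape pos'n : <number> (...)", and the function name is made up.
Check which unit your mdadm version prints before using the number as
a length.

```shell
# reshape_pos: read "mdadm -E" output on stdin and print the first number
# on the "Reshape pos'n" line (assumed format: "Reshape pos'n : N (...)").
# Prints nothing and returns non-zero if no such line is found.
reshape_pos() {
    awk -F: '/Reshape pos/ { split($2, f, " "); print f[1]; found = 1; exit }
             END { exit !found }'
}

# Usage (the device name is a placeholder):
#   mdadm -E /dev/sdX1 | reshape_pos
```

It is worth running this against more than one member device and
checking that they agree.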

You can probably do the above without using test_stripe by using dd to
copy the data off the array before you recreate it, then using dd to
put the same data back.  Using test_stripe as well might give you extra
confidence.
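
A rough sketch of the dd route, with a check that what was saved
matches what is on the array.  The function names are made up, and KIB
stands for however much of the array had been reshaped; nothing here is
specific to md, so try it on scratch files first.

```shell
# save_head SRC FILE KIB: copy the first KIB kibibytes of SRC into FILE.
save_head() {
    dd if="$1" of="$2" bs=1024 count="$3" 2>/dev/null
}

# same_head SRC FILE KIB: succeed only if the first KIB kibibytes of SRC
# are byte-for-byte identical to the whole of FILE.
same_head() {
    dd if="$1" bs=1024 count="$3" 2>/dev/null | cmp -s - "$2"
}

# restore_head FILE DST KIB: write FILE back over the start of DST
# without truncating DST.
restore_head() {
    dd if="$1" of="$2" bs=1024 count="$3" conv=notrunc 2>/dev/null
}

# Usage (paths and the KIB value are placeholders):
#   save_head /dev/md1 /some/file "$kib" && same_head /dev/md1 /some/file "$kib"
#   ... recreate the array on 6 devices ...
#   restore_head /some/file /dev/md1 "$kib" && same_head /dev/md1 /some/file "$kib"
```

Only the restore step writes to the array, so the save and compare
steps can be repeated as often as you like while the array is
read-only.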

Feel free to ask questions.

NeilBrown
