From: Oliver Schinagl <oliver+list@schinagl.nl>
To: Pierre Martineau <pierre.martineau@inserm.fr>,
	linux-raid@vger.kernel.org
Subject: Re: RAID5 recovering
Date: Mon, 15 Apr 2013 17:49:42 +0200
Message-ID: <516C2196.4050308@schinagl.nl>
In-Reply-To: <20130415151939.GA8383@cthulhu.home.robinhill.me.uk>

On 15-04-13 17:19, Robin Hill wrote:
> On Mon Apr 15, 2013 at 03:47:39PM +0200, Pierre Martineau wrote:
>
>> Dear Raid experts,
>>
>> I have a RAID5 volume that recently crashed and I need your advice
>> before taking any irreversible action.
>>
>> Let me first summarize the past and current state.
>>
>> 1) I had a nicely running RAID5 volume with 3 x 1 TB disks (LVM on top
>> and several LVM volumes in ext3 and ext4) but the volume was now a bit too
>> small and I decided to add a new 1 TB disk.
>>
> Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5 - if
> you have the space, adding another disk and going to RAID6 will be much
> safer.
+1
RAID5 is great, it really is, but RAID6 is so much better.
>> 2) I added a new disk and did not do anything for a couple of days (Raid
>> still running with 3 disks)
>>
>> 3) One of the old disks failed and was ejected from the RAID.
>>
>> 4) The ejected disk was not even present as /dev/sdX. I thus tested the
>> connections and the disk came back.
>>
>> 5) I resynced the ejected disk and was back to my original 3-disk array.
>>
>> 6) I waited 2-3 days and everything was fine. I then added the new disk
>> and resynced.
>>
>> 7) I now had a running 4-disk RAID5 array. I created a new volume and
>> started copying data to it.
>>
>> 8) During the weekend, 2 disks were ejected from the array: the newly
>> installed one and the same one as before (step 3).
>>
>> 9) Again the 2 disks were not present as /dev/sdX. I checked the
>> connections again and the problem was a Molex connector. The two ejected
>> disks were on the same Molex connector, which explains why both were
>> detected as faulty.
>>
>> Now, my list of errors as a newbie.
>>
>> 4) I did not save all the information before proceeding (mdadm
>> --examine, /etc/mdadm/mdadm.conf, syslog, ...)
>>
>> 5) I tried to assemble the disks with
>> mdadm --assemble --scan
>> with no result
>>
>> 6) I then tried the following, and this is my big error, I think!
>> mdadm --assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>
>> I forgot /dev/md0 after --assemble in this command.
>> Because of this the /dev/sdb1 superblock was removed and now mdadm --examine
>> /dev/sdb1 returns "No md superblock detected on /dev/sdb1"
>>
>> I would now like to be more cautious. If some kind expert on the list
>> would tell me whether the method proposed below is the right approach,
>> I will be grateful for the rest of my life :-)
>>
>> 7) I read the RAID wiki and the list.
>>
>> 8) I saved
>> mdadm --examine /dev/sd[bcde]1
>> dmesg
>> syslog
>> /etc/mdadm/mdadm.conf
>> fdisk -lu /dev/sd[bcde]
>>
>> I put the contents of these files at the end of this message (except dmesg
>> and syslog because they are very long).
>>
>> 9) /dev/sdd is the new disk. This is clear in the fdisk listing since it
>> is a 4K sector disk.
>> The normal order of the raid is thus (see mdadm --examine /dev/sd[de]1)
>> sdb1 sdc1 sde1 sdd1
>>
>> 10) Events are
>> /dev/sdb1: no md superblock (see 6)
>> /dev/sdc1: Events : 112358
>> /dev/sdd1: Events : 112333
>> /dev/sde1: Events : 112358
>>
>> It seems that sdd was the first disk removed.
>> Presumably sdb1 is in sync since it was running with sdc1 when sdd1
>> and sde1 were ejected from the array (see 8) but I can't be sure since I
>> stupidly erased its superblock!
>>
>> 11) I propose to re-create the array with the --assume-clean option,
>> then check everything using "fsck -n" and "mount -o ro"
>> the command would be:
>>
>> mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
>> --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1
>>
> <-- snip -->
>
> Have you tried to force assemble the array first? Recreating the array
> is a risky option, so should be avoided if possible. First try doing:
>    mdadm -Af /dev/md0 /dev/sd[cde]1
I don't know whether this would have been the best first course of action.
You forcibly assembled the array from members with mismatched event counts.
You got lucky this time and only had minor corruption; it could have been
much, much worse.
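
The event counters can be compared before anything is forced. A minimal sketch, assuming the `mdadm --examine` output of each member was first saved to a text file and contains an "Events : N" line in the format quoted above; the helper names `events_of` and `event_lag` and the file names are made up for illustration:

```shell
#!/bin/sh
# Print the event counter found in saved `mdadm --examine` output.
# Prints nothing if the file has no "Events :" line (e.g. no superblock).
events_of() {
    sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Print "<file> <events> <lag>" for each file, where lag is how far that
# member is behind the highest counter seen. "-" marks a missing counter.
event_lag() {
    max=0
    for f in "$@"; do
        e=$(events_of "$f")
        [ -n "$e" ] && [ "$e" -gt "$max" ] && max=$e
    done
    for f in "$@"; do
        e=$(events_of "$f")
        if [ -n "$e" ]; then
            echo "$f $e $((max - e))"
        else
            echo "$f - -"
        fi
    done
}

# Hypothetical usage:
#   mdadm --examine /dev/sdc1 > sdc1.examine    (repeat per member)
#   event_lag sdb1.examine sdc1.examine sdd1.examine sde1.examine
```

A member with a non-zero lag stopped being updated earlier than the others, so its contents are the ones most likely to be stale.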

You could have examined the superblock first with
    hexdump -C /dev/sdb1 | less

to see whether it is actually all zero, or whether only some fields are and
could hopefully be recreated by examining the other disks.
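
One caveat when looking for the superblock: the proposed create command uses -e 0.90, and 0.90 metadata is stored near the end of the member device, in the last 64 KiB-aligned 64 KiB block, not at the start. A minimal sketch of locating and checking that block, assuming 0.90 metadata really is in use here; the helper names `sb090_offset` and `nonzero_bytes` are made up for illustration:

```shell
#!/bin/sh
# Byte offset of a 0.90 md superblock for a device of $1 bytes:
# round the size down to a 64 KiB boundary, then step back one 64 KiB block.
sb090_offset() {
    echo $(( ($1 / 65536) * 65536 - 65536 ))
}

# Count the non-zero bytes in the 64 KiB block starting at byte offset $2
# of file or device $1. The offset must be 64 KiB aligned, as returned by
# sb090_offset. A result of 0 means the block is entirely zero.
nonzero_bytes() {
    dd if="$1" bs=65536 skip=$(( $2 / 65536 )) count=1 2>/dev/null \
        | tr -d '\000' | wc -c | tr -d ' '
}

# Hypothetical usage on the real device (read-only operations):
#   size=$(blockdev --getsize64 /dev/sdb1)
#   off=$(sb090_offset "$size")
#   nonzero_bytes /dev/sdb1 "$off"
#   dd if=/dev/sdb1 bs=65536 skip=$((off / 65536)) count=1 2>/dev/null | hexdump -C | less
# An intact 0.90 superblock starts with the magic 0xa92b4efc, which shows
# up as "fc 4e 2b a9" in hexdump -C output on a little-endian machine.
```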

I personally would have trusted the re-creation method more. Dump all
superblocks first (as a backup, with dd, so you can always write them back).
Then re-create the array using sd[bce]1 (sdd1 wasn't fully in sync) and run
fsck -n (a read-only test). If that is okay, mount read-only (I would even
mark the array as read-only). If all that works, you have a working
3-of-4 (degraded) array. Then re-add sdd1.
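
Spelled out, that sequence might look roughly like the sketch below. It only prints the commands so they can be reviewed before anything is run. The metadata version, chunk size, size and device order are taken from the create command proposed earlier in the thread, with "missing" in the slot of sdd1; the function name `recovery_plan`, the volume group and logical volume names and the mount point are placeholders, not values from this thread:

```shell
#!/bin/sh
# Print (do not run) the degraded re-create and read-only check steps.
# $1 = volume group, $2 = logical volume, $3 = mount point (all placeholders).
recovery_plan() {
    vg=$1 lv=$2 mnt=$3
    cat <<EOF
mdadm --stop /dev/md0
mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 missing
mdadm --readonly /dev/md0
vgchange -ay $vg
fsck -n /dev/$vg/$lv
mount -o ro /dev/$vg/$lv $mnt
EOF
}

# Hypothetical usage:
#   recovery_plan vg0 data /mnt/check
# Notes:
# - On a read-only array, ext3/ext4 with an unreplayed journal may refuse a
#   plain "-o ro" mount; "-o ro,noload" skips journal replay.
# - Only once every check passes: unmount, then
#     mdadm --readwrite /dev/md0
#     mdadm /dev/md0 --add /dev/sdd1
#   (a plain --add, since sdd1 still carries the old array's superblock).
```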

If you dump the superblocks via dd (some hexdump juju should show you where
the LVM/ext data starts, and everything up to that point should be dumped,
about 4 MiB I guess), you should have a perfectly acceptable way to get
your superblocks back into their original state if needed.
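
A minimal sketch of such a dd backup and restore, working in 64 KiB blocks so the same helpers cover either the first 4 MiB (64 blocks from block 0) or a block near the end of the device. The helper names `save_region` and `restore_region` and the backup file names are made up for illustration:

```shell
#!/bin/sh
BS=65536   # work in 64 KiB blocks

# save_region SRC OUT FIRST_BLOCK NBLOCKS
# Copy NBLOCKS blocks, starting at block FIRST_BLOCK of SRC, into OUT.
save_region() {
    dd if="$1" of="$2" bs=$BS skip="$3" count="$4" 2>/dev/null
}

# restore_region BACKUP DST FIRST_BLOCK
# Write the backup back at the same block offset. conv=notrunc keeps the
# rest of DST intact when DST is a regular file such as a disk image.
restore_region() {
    dd if="$1" of="$2" bs=$BS seek="$3" conv=notrunc 2>/dev/null
}

# Hypothetical usage, first 4 MiB of one member:
#   save_region /dev/sdc1 sdc1.head.bin 0 64
#   restore_region sdc1.head.bin /dev/sdc1 0     (only if it must go back)
```

Keeping the backup files on a disk that is not part of the array avoids losing them along with it.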

Also, I recall having read on this list that RAID5 disk 'order' doesn't
matter? Apparently it only matters with RAID6.
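
On the order question, it may help to look at how chunks are placed. With the left-symmetric layout (the mdadm default for RAID5), the slot a device occupies determines which data and parity chunks it holds, which suggests that order does matter when an array is re-created. A minimal sketch of that mapping, assuming left-symmetric layout; the helper names are made up for illustration:

```shell
#!/bin/sh
# Slot holding parity for stripe $1 in a $2-device left-symmetric RAID5.
parity_slot() {
    echo $(( ($2 - 1) - ($1 % $2) ))
}

# Slot holding data chunk $2 (0-based within the stripe) of stripe $1,
# for $3 devices: data starts right after the parity slot and wraps.
data_slot() {
    p=$(parity_slot "$1" "$3")
    echo $(( (p + 1 + $2) % $3 ))
}

# Print one row per stripe showing what each slot holds (P or D<n>).
# $1 = number of devices, $2 = number of stripes to print.
layout_rows() {
    n=$1 stripes=$2 s=0
    while [ "$s" -lt "$stripes" ]; do
        p=$(parity_slot "$s" "$n")
        row="" slot=0
        while [ "$slot" -lt "$n" ]; do
            if [ "$slot" -eq "$p" ]; then
                row="$row P"
            else
                # Inverse of data_slot: chunk index within this stripe.
                d=$(( (slot - p - 1 + n) % n ))
                row="$row D$(( s * (n - 1) + d ))"
            fi
            slot=$((slot + 1))
        done
        echo "stripe $s:$row"
        s=$((s + 1))
    done
}
```

For example, `layout_rows 4 2` prints "stripe 0: D0 D1 D2 P" and "stripe 1: D4 D5 P D3". Swapping two devices between slots would hand the filesystem the wrong chunks in most stripes.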

Anyway, you got it all back, so lucky you :)
>
> If that works then you'll need to re-add (and rebuild) /dev/sdb1. If it
> doesn't work, try rerunning (after making sure the array is stopped) and
> adding "-vvv" for extra verbosity, then send through the output from
> that and anything relevant from dmesg.
>
> HTH,
>      Robin


Thread overview: 6+ messages
2013-04-15 13:47 RAID5 recovering Pierre Martineau
2013-04-15 15:19 ` Robin Hill
2013-04-15 15:49   ` Oliver Schinagl [this message]
2013-04-15 15:58   ` Pierre Martineau
2013-04-16  8:30   ` Roman Mamedov
2013-04-16 16:41     ` Roy Sigurd Karlsbakk
