Re: Raid5 crashed, need comments on possible repair solution

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: Christoph Nelles <evilazrael@evilazrael.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 crashed, need comments on possible repair solution
Date: Tue, 24 Apr 2012 07:00:44 +1000	[thread overview]
Message-ID: <20120424070044.707745b8@notabene.brown> (raw)
In-Reply-To: <4F955F80.80903@evilazrael.de>

[-- Attachment #1: Type: text/plain, Size: 3452 bytes --]

On Mon, 23 Apr 2012 15:56:16 +0200 Christoph Nelles
<evilazrael@evilazrael.de> wrote:

> Hi,
> 
> Linux RAID worked for me fine in the last few years, but yesterday while
> reorganizing the HW in my server the RAID5 crashed. It was a
> Software-RAID Level 5 with 6x 3TB drives and ran XFS on top of it. I
> have no idea why it crashed, but now all superblocks are invalid (one
> dump follows) and sadly i have no information on the raid disk layout
> (in which sequence the drives were). All drives from the raid are
> available and running.
> 
> As i cannot afford to buy 6x more drives for making a backup prior
> trying to fix the situation, i need a non-destructive approach to fix
> the RAID configuration and the superblocks.
> 
> >From my understanding of the RAID5 implementation the correct order of
> drives is important.
> 
> First Question:
> 1) Am i right that the order is important and i have to try to find the
> right sequence of drives?
> 
> So i would create a loop over all permutations of the drive list and for
> each permutation:
> - Scrub the Superblock mdadm --zero-superblock /dev/sd[bcdefg]1
> - Recreate the RAID5 mdadm --create /dev/md0 -c 64 -l 5 \
> 	-n 6 --assume-clean <drive permutation>
> - Run xfs_check to see if it recognizes the FS xfs_check -s /dev/md0
> - Stop the RAID mdadm --stop /dev/md0
> 
> 2) Is that a promising approach to repair the RAID5 array?
> 3) According the man page the --assume-cleanthat no data is affected
> unless you write to the array, so this effectively prevents a rebuild?
> This is important for me, as i don't want to trigger a rebuild as this
> will certainly send my data to hell.
> 4) Any other idea for repairing the RAID without loosing user data?
> 
> Thanks in advance for any answers.
> 
> 
> Currently the RAID superblocks on each device look like this:
> 
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 53a294b5:975244fc:343b0f94:16652fce
>            Name : grml:0
>   Creation Time : Fri Apr 15 20:55:52 2011
>      Raid Level : -unknown-
>    Raid Devices : 0
> 
>  Avail Dev Size : 5860529039 (2794.52 GiB 3000.59 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 9688dc72:02140045:c16a2123:4f6cc006
> 
>     Update Time : Sun Apr 22 23:56:14 2012
>        Checksum : 350d8d74 - correct
>          Events : 1
> 
> 
>    Device Role : spare
>    Array State :  ('A' == active, '.' == missing)
> 
> 
> Interestingly at the Update Time the system should have been shut down:
> Apr 22 23:55:55 router init: Switching to runlevel: 0
> [...]
> Apr 22 23:56:03 router exiting on signal 15
> Apr 22 23:59:21 router syslogd 1.5.0: restart.
> 
> I have really no clue what happened.

This is really worrying.  It's about the 3rd or 4th report recently which
contains:

>      Raid Level : -unknown-
>    Raid Devices : 0

and that should not be possible.  There must be some recent bug that causes
the array to be "cleared" *before* writing out the metadata - and that should
be impossible.
What kernel are you running?

You are correct that order is important.  Your algorithm looks good.
However I suggest that you first look through your system looks to see if

  RAID conf printout:

appears at all.  That could contain the device order.

NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

next prev parent reply	other threads:[~2012-04-23 21:00 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-23 13:56 Raid5 crashed, need comments on possible repair solution Christoph Nelles
2012-04-23 21:00 ` NeilBrown [this message]
2012-04-23 21:47   ` Christoph Nelles
2012-04-23 23:01     ` NeilBrown
2012-05-12 17:19       ` Pierre Beck
2012-05-14 21:00         ` C.J. Adams-Collier KF7BMP

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120424070044.707745b8@notabene.brown \
    --to=neilb@suse.de \
    --cc=evilazrael@evilazrael.de \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).