Re: Raid5 crashed, need comments on possible repair solution

From: NeilBrown <neilb@suse.de>
To: Christoph Nelles <evilazrael@evilazrael.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 crashed, need comments on possible repair solution
Date: Tue, 24 Apr 2012 07:00:44 +1000
Message-ID: <20120424070044.707745b8@notabene.brown>
In-Reply-To: <4F955F80.80903@evilazrael.de>

On Mon, 23 Apr 2012 15:56:16 +0200 Christoph Nelles
<evilazrael@evilazrael.de> wrote:

> Hi,
> 
> Linux RAID worked for me fine in the last few years, but yesterday while
> reorganizing the HW in my server the RAID5 crashed. It was a
> Software-RAID Level 5 with 6x 3TB drives and ran XFS on top of it. I
> have no idea why it crashed, but now all superblocks are invalid (one
> dump follows) and sadly i have no information on the raid disk layout
> (in which sequence the drives were). All drives from the raid are
> available and running.
> 
> As i cannot afford to buy 6x more drives for making a backup prior to
> trying to fix the situation, i need a non-destructive approach to fix
> the RAID configuration and the superblocks.
> 
> From my understanding of the RAID5 implementation the correct order of
> drives is important.
> 
> First Question:
> 1) Am i right that the order is important and i have to try to find the
> right sequence of drives?
> 
> So i would create a loop over all permutations of the drive list and for
> each permutation:
> - Scrub the Superblock mdadm --zero-superblock /dev/sd[bcdefg]1
> - Recreate the RAID5 mdadm --create /dev/md0 -c 64 -l 5 \
> 	-n 6 --assume-clean <drive permutation>
> - Run xfs_check to see if it recognizes the FS xfs_check -s /dev/md0
> - Stop the RAID mdadm --stop /dev/md0
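
A minimal dry-run sketch of the loop described above, in Python. It only
builds and prints the command lines for each drive permutation and executes
nothing. The device names come from the /dev/sd[bcdefg]1 glob quoted above;
the helper names commands_for and dry_run are hypothetical, and the flags are
copied unchanged from the list rather than checked against any particular
mdadm version.

```python
import itertools
import shlex

# Member partitions, taken from the /dev/sd[bcdefg]1 glob in the message.
DRIVES = [f"/dev/sd{c}1" for c in "bcdefg"]


def commands_for(order, md="/dev/md0"):
    """Return the four commands from the list above for one drive order."""
    return [
        ["mdadm", "--zero-superblock", *order],
        ["mdadm", "--create", md, "-c", "64", "-l", "5",
         "-n", str(len(order)), "--assume-clean", *order],
        ["xfs_check", "-s", md],
        ["mdadm", "--stop", md],
    ]


def dry_run(drives=DRIVES, limit=None):
    """Yield (permutation index, shell-quoted command); runs nothing."""
    for i, order in enumerate(itertools.permutations(drives)):
        if limit is not None and i >= limit:
            break
        for cmd in commands_for(order):
            yield i, shlex.join(cmd)


if __name__ == "__main__":
    # Show only the first two of the 720 candidate orders.
    for i, line in dry_run(limit=2):
        print(f"[{i}] {line}")
```

Six drives give 6! = 720 orders, so printing the plan first makes it easier
to review before anything touches the disks. Actually running the commands
(for example via subprocess) and stopping at the first order for which
xfs_check succeeds is deliberately left out of this sketch.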
> 
> 2) Is that a promising approach to repair the RAID5 array?
> 3) According to the man page, --assume-clean means that no data is
> affected unless you write to the array, so this effectively prevents a
> rebuild?
> This is important for me, as i don't want to trigger a rebuild as this
> will certainly send my data to hell.
> 4) Any other idea for repairing the RAID without losing user data?
> 
> Thanks in advance for any answers.
> 
> 
> Currently the RAID superblocks on each device look like this:
> 
> /dev/sdg1:
>           Magic : a92b4efc
>         Version : 1.2
>     Feature Map : 0x0
>      Array UUID : 53a294b5:975244fc:343b0f94:16652fce
>            Name : grml:0
>   Creation Time : Fri Apr 15 20:55:52 2011
>      Raid Level : -unknown-
>    Raid Devices : 0
> 
>  Avail Dev Size : 5860529039 (2794.52 GiB 3000.59 GB)
>     Data Offset : 2048 sectors
>    Super Offset : 8 sectors
>           State : active
>     Device UUID : 9688dc72:02140045:c16a2123:4f6cc006
> 
>     Update Time : Sun Apr 22 23:56:14 2012
>        Checksum : 350d8d74 - correct
>          Events : 1
> 
> 
>    Device Role : spare
>    Array State :  ('A' == active, '.' == missing)
> 
> 
> Interestingly at the Update Time the system should have been shut down:
> Apr 22 23:55:55 router init: Switching to runlevel: 0
> [...]
> Apr 22 23:56:03 router exiting on signal 15
> Apr 22 23:59:21 router syslogd 1.5.0: restart.
> 
> I have really no clue what happened.

This is really worrying.  It's about the 3rd or 4th report recently which
contains:

>      Raid Level : -unknown-
>    Raid Devices : 0

and that should not be possible.  There must be some recent bug that causes
the array to be "cleared" *before* writing out the metadata - and that should
be impossible.
What kernel are you running?

You are correct that order is important.  Your algorithm looks good.
However I suggest that you first look through your system logs to see if

  RAID conf printout:

appears at all.  That could contain the device order.
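
One way to do that search, as a minimal Python sketch. It assumes the lines
following the marker look like "disk 0, o:1, dev:sdb1", possibly behind a
syslog prefix; that line format, the sample log lines, and the function name
device_orders are assumptions for illustration and should be checked against
what your logs actually contain.

```python
import re

MARKER = "RAID conf printout:"
# Assumed per-device line format: "disk <slot>, o:<flag>, dev:<name>"
DISK_RE = re.compile(r"disk (\d+), o:(\d+), dev:(\S+)")


def device_orders(lines):
    """Return one {slot: device} dict per MARKER block found in lines."""
    orders = []
    current = None
    for line in lines:
        if MARKER in line:
            current = {}          # start collecting a new block
            orders.append(current)
            continue
        if current is None:
            continue              # not inside a block
        m = DISK_RE.search(line)
        if m:
            current[int(m.group(1))] = m.group(3)
        elif "--- level:" not in line:
            current = None        # any other line ends the block
    return orders


# Hypothetical sample lines, not taken from the reporter's system.
SAMPLE = [
    "Apr 15 21:00:00 router kernel: RAID conf printout:",
    "Apr 15 21:00:00 router kernel:  --- level:5 rd:3 wd:3",
    "Apr 15 21:00:00 router kernel:  disk 0, o:1, dev:sdc1",
    "Apr 15 21:00:00 router kernel:  disk 1, o:1, dev:sdb1",
    "Apr 15 21:00:00 router kernel:  disk 2, o:1, dev:sdd1",
    "Apr 15 21:00:01 router kernel: unrelated message",
]

for order in device_orders(SAMPLE):
    print([order[slot] for slot in sorted(order)])
```

To scan a real log, pass an open file instead of SAMPLE. Keep in mind that
kernel device names can change between boots, so a logged name such as sdc1
may not refer to the same physical drive today.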

NeilBrown



Thread overview: 6+ messages
2012-04-23 13:56 Raid5 crashed, need comments on possible repair solution Christoph Nelles
2012-04-23 21:00 ` NeilBrown [this message]
2012-04-23 21:47   ` Christoph Nelles
2012-04-23 23:01     ` NeilBrown
2012-05-12 17:19       ` Pierre Beck
2012-05-14 21:00         ` C.J. Adams-Collier KF7BMP
