From: NeilBrown <neilb@suse.de>
To: Christoph Nelles <evilazrael@evilazrael.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 crashed, need comments on possible repair solution
Date: Tue, 24 Apr 2012 07:00:44 +1000
Message-ID: <20120424070044.707745b8@notabene.brown>
In-Reply-To: <4F955F80.80903@evilazrael.de>
On Mon, 23 Apr 2012 15:56:16 +0200 Christoph Nelles
<evilazrael@evilazrael.de> wrote:
> Hi,
>
> Linux RAID worked for me fine in the last few years, but yesterday while
> reorganizing the HW in my server the RAID5 crashed. It was a
> Software-RAID Level 5 with 6x 3TB drives and ran XFS on top of it. I
> have no idea why it crashed, but now all superblocks are invalid (one
> dump follows) and sadly i have no information on the raid disk layout
> (in which sequence the drives were). All drives from the raid are
> available and running.
>
> As i cannot afford to buy 6x more drives for making a backup prior
> trying to fix the situation, i need a non-destructive approach to fix
> the RAID configuration and the superblocks.
>
> From my understanding of the RAID5 implementation the correct order of
> drives is important.
>
> First Question:
> 1) Am i right that the order is important and i have to try to find the
> right sequence of drives?
>
> So i would create a loop over all permutations of the drive list and for
> each permutation:
> - Scrub the superblock:
>     mdadm --zero-superblock /dev/sd[bcdefg]1
> - Recreate the RAID5:
>     mdadm --create /dev/md0 -c 64 -l 5 \
>         -n 6 --assume-clean <drive permutation>
> - Run xfs_check to see if it recognizes the FS:
>     xfs_check -s /dev/md0
> - Stop the RAID:
>     mdadm --stop /dev/md0
>
> 2) Is that a promising approach to repair the RAID5 array?
> 3) According to the man page, --assume-clean means that no data is affected
> unless you write to the array, so this effectively prevents a rebuild?
> This is important for me, as i don't want to trigger a rebuild as this
> will certainly send my data to hell.
> 4) Any other idea for repairing the RAID without losing user data?
>
> Thanks in advance for any answers.
>
>
> Currently the RAID superblocks on each device look like this:
>
> /dev/sdg1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x0
> Array UUID : 53a294b5:975244fc:343b0f94:16652fce
> Name : grml:0
> Creation Time : Fri Apr 15 20:55:52 2011
> Raid Level : -unknown-
> Raid Devices : 0
>
> Avail Dev Size : 5860529039 (2794.52 GiB 3000.59 GB)
> Data Offset : 2048 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 9688dc72:02140045:c16a2123:4f6cc006
>
> Update Time : Sun Apr 22 23:56:14 2012
> Checksum : 350d8d74 - correct
> Events : 1
>
>
> Device Role : spare
> Array State : ('A' == active, '.' == missing)
>
>
> Interestingly at the Update Time the system should have been shut down:
> Apr 22 23:55:55 router init: Switching to runlevel: 0
> [...]
> Apr 22 23:56:03 router exiting on signal 15
> Apr 22 23:59:21 router syslogd 1.5.0: restart.
>
> I have really no clue what happened.

This is really worrying. It's about the 3rd or 4th report recently which
contains:

> Raid Level : -unknown-
> Raid Devices : 0

and that should not be possible. There must be some recent bug that causes
the array to be "cleared" *before* writing out the metadata - and that should
be impossible.

What kernel are you running?

You are correct that order is important. Your algorithm looks good.
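
As a rough illustration only, the loop from the quoted plan could be laid out as a dry run that builds the command lines for every drive order without executing any of them. The device names are expanded from the /dev/sd[bcdefg]1 glob in the quoted message; the function names `commands_for` and `dry_run` are made up for this sketch, and the mdadm and xfs_check arguments are copied unchanged from the plan rather than checked against a particular tool version.

```python
from itertools import permutations

# Device names expanded from the /dev/sd[bcdefg]1 glob in the quoted message.
DEVICES = [f"/dev/sd{letter}1" for letter in "bcdefg"]


def commands_for(order):
    """Return the four command lines of the quoted plan for one drive order."""
    order = list(order)
    return [
        ["mdadm", "--zero-superblock", *order],
        ["mdadm", "--create", "/dev/md0", "-c", "64", "-l", "5",
         "-n", str(len(order)), "--assume-clean", *order],
        ["xfs_check", "-s", "/dev/md0"],
        ["mdadm", "--stop", "/dev/md0"],
    ]


def dry_run(devices=DEVICES):
    """Yield (order, commands) for every permutation; nothing is executed."""
    for order in permutations(devices):
        yield order, commands_for(order)
```

With six drives this yields 720 candidate orders. Printing `" ".join(cmd)` for each command gives a script that can be reviewed by hand before anything touches the disks.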
However I suggest that you first look through your system logs to see if

   RAID conf printout:

appears at all. That could contain the device order.
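
A minimal sketch of that log search, assuming the kernel follows the "RAID conf printout:" line with a " --- level:..." line and one "disk N, o:M, dev:NAME" line per slot. That line format is an assumption and should be checked against the actual logs; the function name `device_orders` is made up for this example.

```python
import re

MARKER = "RAID conf printout:"
# Assumed per-slot line format: " disk 0, o:1, dev:sdb1"
DISK_LINE = re.compile(r"disk (\d+), o:(\d+), dev:(\S+)")


def device_orders(lines):
    """Return one {slot: device} dict per 'RAID conf printout:' block found."""
    orders = []
    current = None
    for line in lines:
        if MARKER in line:
            # Start collecting a new printout block.
            current = {}
            orders.append(current)
            continue
        if current is None:
            continue
        match = DISK_LINE.search(line)
        if match:
            current[int(match.group(1))] = match.group(3)
        elif "--- level:" not in line:
            # Any other line ends the current printout block.
            current = None
    return orders
```

Feeding it the lines of a saved syslog file, e.g. `device_orders(open(path))` for whatever log path applies, returns the slot-to-device mapping of each printout found, oldest first. Device names can change between boots, so a mapping taken from an old log is only a hint.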
NeilBrown