From: NeilBrown <neilb@suse.de>
To: Stephen Muskiewicz <stephen_muskiewicz@uml.edu>
Cc: linux-raid@vger.kernel.org
Subject: Re: Need help recovering RAID5 array
Date: Sat, 6 Aug 2011 11:29:10 +1000
Message-ID: <20110806112910.332d450a@notabene.brown>
In-Reply-To: <4E3C0BCA.9000407@uml.edu>

On Fri, 5 Aug 2011 11:27:06 -0400 Stephen Muskiewicz
<stephen_muskiewicz@uml.edu> wrote:

> Hello,
>
> I'm hoping to figure out how I can recover a RAID5 array that suddenly
> won't start after one of our servers took a power hit.
> I'm fairly confident that all the individual disks of the RAID are OK
> and that I can recover my data (without having to resort to asking my
> sysadmin to fetch the backup tapes), but despite my extensive Googling
> and reviewing the list archives and mdadm manpage, so far nothing I've
> tried has worked. Hopefully I am just missing something simple.
>
> Background: The server is a Sun X4500 (thumper) running CentOS 5.5. I
> have confirmed using the (Sun provided) "hd" utilities that all of the
> individual disks are online and none of the device names appear to have
> changed from before the power outage. There are also two other RAID5
> arrays as well as the /dev/md0 RAID1 OS mirror on the same box that did
> come back cleanly (these have ext3 filesystems on them, the one that
> failed to come up is just a raw partition used via iSCSI if that makes
> any difference.) The array that didn't come back is /dev/md/51, the
> ones that did are /dev/md/52 and /dev/md/53. I have confirmed that all
> three device files do exist in /dev/md. (/dev/md51 is also a symlink to
> /dev/md/51, as are /dev/md52 and /dev/md53 for the working arrays). We
> also did quite a bit of testing on the box before we deployed the arrays
> and haven't seen this problem before now, previously all of the arrays
> came back online as expected. Of course it has also been about 7 months
> since the box has gone down but I don't think there were any major
> changes since then.
>
> When I boot the system (tried this twice including a hard power down
> just to be sure), I see "mdadm: No suitable drives found for /dev/md51".
> Again the other 2 arrays come up just fine. I have checked that the
> array is listed in /etc/mdadm.conf
>
> (I will apologize for a lack of specific mdadm output in my details
> below, the network people have conveniently (?) picked this weekend to
> upgrade the network in our campus building and I am currently unable to
> access the server until they are done!)
>
> "mdadm --detail /dev/md/51" does (as expected?) display: "mdadm: md
> device /dev/md51 does not appear to be active"
>
> I have done an "mdadm --examine" on each of the drives in the array and
> each one shows a state of "clean" with a status of "U" (and all of the
> other drives in the sequence shown as "u"). The array name and UUID
> value look good and the "update time" appears to be about when the
> server lost power. All the checksums read "correct" as well. So I'm
> confident all the individual drives are there and OK.
>
> I do have the original mdadm command used to construct the array.
> (There are 8 active disks in the array plus 2 spares.) I am using
> version 1.0 metadata with the -N arg to provide a name for each array.
> So I used this command with the assemble option (but without the -N or
> -u options):
>
> mdadm -A /dev/md/51 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
> /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1
>
> But this just gave the "no suitable drives found" message.
>
> I retried the mdadm command using -N <name> and -u <UUID> options but in
> both cases saw the same result.
>
> One odd thing that I noticed was that when I ran an:
> mdadm --detail --scan
>
> The output *does* display all three arrays, but the name of the arrays
> shows up as "ARRAY /dev/md/<arrayname>" rather than the "ARRAY
> /dev/md/NN" that I would expect (and that is in my /etc/mdadm.conf
> file). Not sure if this has anything to do with the problem or not.
> There are no /dev/md/<arrayname> device files or symlinks on the system.

So maybe the only problem is that the names are missing from /dev/md/ ???

When you can access the server again, could you report:

  cat /proc/mdstat
  grep md /proc/partitions
  ls -l /dev/md*

and maybe

  mdadm -Ds
  mdadm -Es
  cat /etc/mdadm.conf

just for completeness.

It certainly looks like your data is all there but maybe not appearing
exactly where you expect it.
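
If it helps, the commands listed above could be wrapped in a small script
so that everything ends up in one labelled report. This is only a sketch:
the function name md_report is made up for illustration, and it assumes a
POSIX shell run as root on the affected server. All of the commands only
read state, so it should not change anything on the arrays.

```shell
#!/bin/sh
# Sketch: run each diagnostic command and label its output.
# md_report is a hypothetical helper name, not part of mdadm.
md_report() {
    for cmd in \
        "cat /proc/mdstat" \
        "grep md /proc/partitions" \
        "ls -l /dev/md*" \
        "mdadm -Ds" \
        "mdadm -Es" \
        "cat /etc/mdadm.conf"
    do
        # Header line so each section of the report is easy to find.
        printf '### %s\n' "$cmd"
        # A failing command is noted and the loop carries on.
        sh -c "$cmd" 2>&1 || printf '(command failed or matched nothing)\n'
        printf '\n'
    done
}

# Example use (file name is arbitrary):
#   md_report > md-report.txt
```

The resulting file can then be pasted into a reply as-is.
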
>
> I *think* my next step based on the various posts I've read would be to
> try the same mdadm -A command with --force, but I'm a little wary of
> that and want to make sure I actually understand what I'm doing so I
> don't screw up the array entirely and lose all my data! I'm not sure if
> I should be giving it *all* of the drives as an arg, including the
> spares or should I just pass it the active drives? Should I use the
> --raid-devices and/or --spare-devices options? Anything else I should
> include or not include?

When you do a "-A --force" you do give it all the drives that might be part
of the array so it has maximum information.

--spare-devices and --raid-devices are not meaningful with --assemble.
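
As a rough sketch of what that could look like, assuming the ten partitions
from your original command are still the right ones: the script below only
prints the command rather than running it, so the device list can be
checked first. The variable and function names are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: prints the assemble command instead of executing it.
# Assumption: array path and member list are taken from the earlier
# "mdadm -A" attempt; verify them before running the printed command.
ARRAY=/dev/md/51
MEMBERS="/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 \
         /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1"

assemble_cmd() {
    # All ten members (8 active plus 2 spares) are listed, and neither
    # --raid-devices nor --spare-devices is passed.
    # $MEMBERS is left unquoted on purpose so it splits into words.
    echo mdadm --assemble --force "$ARRAY" $MEMBERS
}

assemble_cmd
```

Once the printed line looks right it can be run by hand as root.
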
NeilBrown
>
> Thanks in advance for any advice you can provide. I won't be able to
> test until Monday morning but it would be great to be armed with things
> to try so I can hopefully get back up and running soon and minimize all
> of those "When will the network share be back up?" questions that I'm
> already anticipating getting.
>
> Cheers,
> -steve