From: Harry Mangalam <harry.mangalam@uci.edu>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: rescue an alien md raid5
Date: Mon, 23 Feb 2009 10:13:45 -0800
Message-ID: <200902231013.46082.harry.mangalam@uci.edu>
Here's an unusual (long) tale of woe.
We had a USRobotics 8700 NAS appliance with 4 SATA disks in RAID5:
<http://www.usr.com/support/product-template.asp?prod=8700>
which was a fine (if crude) ARM-based Linux NAS until it stroked out
at some point, leaving us with a degraded RAID5 and comatose NAS
device.
We'd like to get the files back, of course, so I've moved the disks
to a Linux PC, hooked them up to a cheap Silicon Image 4x SATA
controller, and brought the whole frankenmess up with mdadm. It
reported a clean but degraded array:
===============================================================
root@pnh-rcs:/# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Wed Feb 14 16:30:17 2007
Raid Level : raid5
Array Size : 1464370176 (1396.53 GiB 1499.52 GB)
Used Dev Size : 488123392 (465.51 GiB 499.84 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Dec 12 20:26:27 2008
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 7a60cd58:ad85ebdc:3b55d79a:a33c7fe6
Events : 0.264294
    Number   Major   Minor   RaidDevice   State
       0       0        0        0        removed
       1       8       35        1        active sync   /dev/sdc3
       2       8       51        2        active sync   /dev/sdd3
       3       8       67        3        active sync   /dev/sde3
===============================================================
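For the record, the assembly step was essentially this (a sketch
from memory, not a transcript; device names as above, and the --run
flag to start a degraded 3-of-4 array is my assumption):
===============================================================
# assemble from the three surviving members; --run starts the
# array even though it is degraded (3 of 4 devices present)
mdadm --assemble --run /dev/md0 /dev/sdc3 /dev/sdd3 /dev/sde3

# or let mdadm find the members by their superblock UUID
mdadm --assemble --scan --uuid=7a60cd58:ad85ebdc:3b55d79a:a33c7fe6
===============================================================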
The original 500 GB Maxtor disks (/dev/sd[bcde]) were each formatted
with 3 partitions, as follows (shown for sdc; sdb was the bad disk,
so I had to replace it):
===============================================================
Disk /dev/sdc: 500.1 GB, 500107862016 bytes
16 heads, 63 sectors/track, 969021 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Disk identifier: 0x00000000
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1         261      131543+  83  Linux
/dev/sdc2             262         522      131544   82  Linux swap / Solaris
/dev/sdc3             523      969022  488123496+   89  Unknown
===============================================================
I formatted the replacement (a Seagate, with a different make and
layout) as a single partition, /dev/sdb1:
===============================================================
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x21d01216
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       60801   488384001   83  Linux
===============================================================
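(In hindsight I could probably have cloned the partition table from
one of the surviving Maxtors onto the replacement instead; a sketch
with sfdisk, untested on this mixed-geometry pair:)
===============================================================
# dump the table of a surviving original disk and replay it onto
# the replacement -- this would overwrite sdb's current table
sfdisk -d /dev/sdc > sdc-table.dump
sfdisk /dev/sdb < sdc-table.dump
===============================================================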
I then tried to rebuild the RAID by stopping the array, removing the
bad disk, and adding the new one. The array came up and reported
that it was rebuilding; after several hours it finished and reported
itself clean (although during a reboot it became /dev/md1 instead of
md0).
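The rebuild sequence was roughly this (again a sketch, not a
transcript):
===============================================================
mdadm --stop /dev/md0                 # stop the degraded array
mdadm --assemble --run /dev/md0 /dev/sdc3 /dev/sdd3 /dev/sde3
mdadm /dev/md0 --add /dev/sdb1        # add new disk; resync begins
===============================================================
Afterward it looked healthy: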
===============================================================
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sdb1[0] sde3[3] sdd3[2] sdc3[1]
1464370176 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
===============================================================
===============================================================
$ mdadm --detail /dev/md1
/dev/md1:
Version : 00.90.03
Creation Time : Wed Feb 14 16:30:17 2007
Raid Level : raid5
Array Size : 1464370176 (1396.53 GiB 1499.52 GB)
Used Dev Size : 488123392 (465.51 GiB 499.84 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Feb 23 09:06:27 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 7a60cd58:ad85ebdc:3b55d79a:a33c7fe6
Events : 0.265494
    Number   Major   Minor   RaidDevice   State
       0       8       17        0        active sync   /dev/sdb1
       1       8       35        1        active sync   /dev/sdc3
       2       8       51        2        active sync   /dev/sdd3
       3       8       67        3        active sync   /dev/sde3
===============================================================
The docs and files on the USR web site imply that the native
filesystem was originally XFS, but when I try to mount it as such,
it fails:
mount -vvv -t xfs /dev/md1 /mnt
mount: fstab path: "/etc/fstab"
mount: lock path: "/etc/mtab~"
mount: temp path: "/etc/mtab.tmp"
mount: no LABEL=, no UUID=, going to mount /dev/md1 by path
mount: spec: "/dev/md1"
mount: node: "/mnt"
mount: types: "xfs"
mount: opts: "(null)"
mount: mount(2) syscall: source: "/dev/md1", target: "/mnt",
filesystemtype: "xfs", mountflags: -1058209792, data: (null)
mount: wrong fs type, bad option, bad superblock on /dev/md1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
and when I check dmesg:
[ 245.008000] SGI XFS with ACLs, security attributes, realtime, large
block numbers, no debug enabled
[ 245.020000] SGI XFS Quota Management subsystem
[ 245.020000] XFS: SB read failed
[ 327.696000] md: md0 stopped.
[ 327.696000] md: unbind<sdc1>
[ 327.696000] md: export_rdev(sdc1)
[ 327.696000] md: unbind<sde1>
[ 327.696000] md: export_rdev(sde1)
[ 327.696000] md: unbind<sdd1>
[ 327.696000] md: export_rdev(sdd1)
[ 439.660000] XFS: bad magic number
[ 439.660000] XFS: SB validate failed
Repeated attempts just repeat the last two lines above. This implies
that the superblock is bad, and xfs_repair reports the same:
xfs_repair /dev/md1
- creating 2 worker thread(s)
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
...... <lots of ...> ...
..found candidate secondary superblock...
unable to verify superblock, continuing...
<lots of ...> ...
...found candidate secondary superblock...
unable to verify superblock, continuing...
<lots of ...> ...
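One more thing I could try is checking by hand whether the XFS magic
("XFSB") appears anywhere near the start of the array, in case the
filesystem actually begins at some offset inside it (a sketch; I
haven't run this yet):
===============================================================
# is the XFS magic in the very first sector of the array?
dd if=/dev/md1 bs=512 count=1 2>/dev/null | od -c | head -2

# scan the first 64MB for 'XFSB', printing byte offsets, in case
# the filesystem starts somewhere inside the array
dd if=/dev/md1 bs=1M count=64 2>/dev/null | grep -abo XFSB | head
===============================================================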
So my question is: what should I do now? Were those first two
partitions (the ones I didn't recreate on the replacement disk)
important? Should I remove the replacement disk, create the 3
partitions, and try again, or am I just well and truly hosed?
--
Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway,
UC Irvine 92697 949 824-0084(o), 949 285-4487(c)
---
Good judgment comes from experience;
Experience comes from bad judgment. [F. Brooks.]