From: NeilBrown <neilb@suse.de>
To: Oliver Schinagl <oliver+list@schinagl.nl>
Cc: Martin Ziler <martin.ziler@googlemail.com>, linux-raid@vger.kernel.org
Subject: Re: degraded raid 6 (1 bad drive) showing up inactive, only spares
Date: Fri, 8 Jun 2012 08:34:50 +1000
Message-ID: <20120608083450.5dba2d2a@notabene.brown>
In-Reply-To: <4FD11A32.90107@schinagl.nl>
On Thu, 07 Jun 2012 23:16:34 +0200 Oliver Schinagl <oliver+list@schinagl.nl>
wrote:
> Since I'm still working on repairing my own array, and using a wrong
> version of mdadm corrupted one of my raid10 arrays, I'm trying to hexedit
> the start of an image of the disk to recover the metadata.
>
> A quick question: if I've edited/checked the first superblock
> (I'm using
> https://raid.wiki.kernel.org/index.php/RAID_superblock_formats for
> reference, and it looks quite accurate),
> would I need to check other areas of the disk for superblocks? Or will
> the first superblock be enough?
Are we talking about filesystem superblocks or RAID superblocks?
There is only one RAID superblock - normally 4K from the start (with 1.2
metadata). There may be lots of filesystem superblocks. I think extX only
uses the first one if it is good, but I don't know for certain.
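
For example, to eyeball the raw superblock on a 1.2-metadata member
(the device name here is just a placeholder; note the magic a92b4efc is
stored little-endian, so the dump should start with "fc 4e 2b a9"):

  dd if=/dev/sdX2 bs=4096 skip=1 count=1 2>/dev/null | hexdump -C | head
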
NeilBrown
>
> On 07-06-12 14:29, NeilBrown wrote:
> > On Thu, 7 Jun 2012 13:55:32 +0200 Martin Ziler<martin.ziler@googlemail.com>
> > wrote:
> >
> >> Hello everybody,
> >>
> >> I am running a 9-disk raid6 without hot spares. I already had one drive go bad; I replaced it and continued using the array without any degraded-raid messages. Recently another drive started going bad according to its SMART info. As it wasn't quite dead, I left the array as it was, without really using it much, while waiting for a replacement drive I had ordered. When I booted the machine up to replace the drive, I was greeted by an inactive array with all devices showing up as spares.
> >>
> >> md0 : inactive sdh2[0](S) sdi2[7](S) sde2[6](S) sdd2[5](S) sdf2[1](S) sdg2[2](S) sdc1[9](S) sdb2[3](S)
> >> 15579088439 blocks super 1.2
> >>
> >> mdadm --examine confirms that. I already searched the web quite a bit and found this mailing list. Maybe someone here can give me some input. Normally a degraded raid should still be active, so I am quite surprised that my array with only one drive missing goes inactive. I have appended the info mdadm --examine puts out for all the drives. However, the first two should suffice, as only /dev/sdk differs from the rest. The faulty drive - sdk - is still recognized as a raid6 member, whereas all the others show up as spares. With lots of bad sectors, sdk isn't accessible any more.
> > You must be running kernel 3.2.1 or 3.3 (I think).
> >
> > You've been bitten by a rather nasty bug.
> >
> > You can get your data back, but it will require a bit of care, so don't rush
> > it.
> >
> > The metadata on almost all the devices has been seriously corrupted. The
> > only way to repair it is to recreate the array.
> > Doing this just writes new metadata and assembles the array. It doesn't touch
> > the data, so if we get the --create command right, all your data will be
> > available again.
> > If we get it wrong, you won't be able to see your data, but we can easily stop
> > the array and create again with different parameters until we get it right.
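> >
> > In practice that retry loop is just stop, then create again with the
> > devices permuted, e.g.:
> >
> >   mdadm -S /dev/md0
> >   # then re-run the --create command below with a different device order
> >
> > With --assume-clean nothing gets resynced, so each attempt is cheap.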
> >
> > First thing to do is to get a newer kernel. I would recommend the latest in
> > the 3.3.y series.
> >
> > Then you need to:
> > - make sure you have a version of mdadm which sets the data offset to 1M
> >   (2048 sectors). I think 3.2.3 or earlier does that - don't upgrade to
> >   3.2.5.
> > - find the chunk size - it looks like it is 4M, as sdk2 isn't corrupt.
> > - find the order of devices. This should be in your kernel logs in a
> >   "RAID conf printout" (see the example below). Hopefully device names
> >   haven't changed.
> >
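> > For example, something like either of these should turn up the last
> > printout (the exact log file varies by distro; dmesg only helps if the
> > printout is still in the ring buffer):
> >
> >   dmesg | grep -A 12 'RAID conf printout'
> >   grep -A 12 'RAID conf printout' /var/log/kern.log
> >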
> > Then (with new kernel running)
> >
> > mdadm --create /dev/md0 -l6 -n9 -c 4M -e 1.2 /dev/sdb2 /dev/sdc1 /dev/sdd2 \
> >     /dev/sde2 /dev/sdf2 /dev/sdg2 /dev/sdh2 /dev/sdi2 missing \
> >     --assume-clean
> >
> > Make double-sure you add that --assume-clean.
> >
> > Note the last device is 'missing'. That corresponds to sdk2 (which we
> > know is device 8 - the last of 9 (0..8)). It has failed, so it is not part
> > of the array any more. For the others I just guessed the order. You should
> > try to verify it before you proceed (see "RAID conf printout" in your
> > kernel logs).
> >
> > After the 'create', use "mdadm -E" to look at one device and make sure
> > the Data Offset, Avail Dev Size and Array Size are the same as we saw
> > on sdk2.
> > If they are, try "fsck -n /dev/md0". That assumes ext3 or ext4. If you had
> > something else on the array some other command might be needed.
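> > e.g. (assuming ext3/ext4 as above; xfs_repair -n would be the
> > read-only equivalent if the array held XFS):
> >
> >   mdadm -E /dev/sdb2 | egrep 'Data Offset|Avail Dev Size|Array Size'
> >   fsck -n /dev/md0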
> >
> > If that looks bad, "mdadm -S /dev/md0" and try again with a different order.
> > If it looks good, "echo check > /sys/block/md0/md/sync_action" and watch
> > "mismatch_cnt" in the same directory. If it stays low (a few hundred at most)
> > all is good. If it goes up into the thousands something is wrong - try another
> > order.
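> >
> > i.e., roughly:
> >
> >   echo check > /sys/block/md0/md/sync_action
> >   watch cat /sys/block/md0/md/mismatch_cnt
> >
> > (watch is just one way to poll it; the count only settles once the
> > check has finished.)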
> >
> > Once you have the array working again,
> > "echo repair > /sys/block/md0/md/sync_action",
> > then add your new device to be rebuilt.
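> >
> > e.g. (the replacement's device name is only a guess - use whatever the
> > new disk comes up as, partitioned like the others):
> >
> >   echo repair > /sys/block/md0/md/sync_action
> >   mdadm --add /dev/md0 /dev/sdk2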
> >
> > Good luck.
> > Please ask if you are unsure about anything.
> >
> > NeilBrown
> >
> >>
> >> /dev/sdk2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : raid6
> >> Raid Devices : 9
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Array Size : 27172970496 (12957.08 GiB 13912.56 GB)
> >> Used Dev Size : 3881852928 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : clean
> >> Device UUID : 882eb11a:33b499a7:dd5856b7:165f916c
> >>
> >> Update Time : Fri Jun 1 20:26:45 2012
> >> Checksum : b8c58093 - correct
> >> Events : 623119
> >>
> >> Layout : left-symmetric
> >> Chunk Size : 4096K
> >>
> >> Device Role : Active device 8
> >> Array State : AAAAAAAAA ('A' == active, '.' == missing)
> >>
> >>
> >> /dev/sdh2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 44008309:1dfb1408:cabfbd0a:64de3739
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : 27f93899 - correct
> >> Events : 2
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> ---------------------------------------------------------------------------------------------------------------
> >>
> >> /dev/sdi2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 135f196d:184f11a1:09207617:4022e1a5
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : 9ded8f86 - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sde2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 3517bcc4:2acb381f:f5006058:5bd5c831
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : 408957c0 - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sdd2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 9e8b2d2c:844a009a:fd6914a2:390f10ac
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : e6bdee68 - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sdf2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 87ad38ac:4ccbd831:ee5502cd:28dafaad
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : 2b7a47f6 - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sdg2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : eef2f06f:28f881a5:da857a00:fb90e250
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : 393ba0f8 - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sdc1:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3985162143 (1900.27 GiB 2040.40 GB)
> >> Used Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 4cf86fb0:6f334e2c:19e89c99:0532f557
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : a6e42bdc - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)
> >>
> >> /dev/sdb2:
> >> Magic : a92b4efc
> >> Version : 1.2
> >> Feature Map : 0x0
> >> Array UUID : 25be3ab5:ef5f1166:d64b0e0e:4df143ed
> >> Name : server:0 (local to host server)
> >> Creation Time : Mon Jul 25 23:40:50 2011
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >>
> >> Avail Dev Size : 3881859248 (1851.01 GiB 1987.51 GB)
> >> Data Offset : 2048 sectors
> >> Super Offset : 8 sectors
> >> State : active
> >> Device UUID : 4852882a:b8a3989f:aad747c5:25f20d47
> >>
> >> Update Time : Thu Jun 7 12:27:52 2012
> >> Checksum : a8e25edd - correct
> >> Events : 2
> >>
> >>
> >> Device Role : spare
> >> Array State : ('A' == active, '.' == missing)