From: NeilBrown <neilb@suse.de>
To: jahammonds prost <gmitch64@yahoo.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Failed RAID 6 array advice
Date: Wed, 2 Mar 2011 16:26:57 +1100
Message-ID: <20110302162657.036d4ab5@notabene.brown>
In-Reply-To: <65522.87245.qm@web55807.mail.re3.yahoo.com>
On Tue, 1 Mar 2011 21:05:33 -0800 (PST) jahammonds prost <gmitch64@yahoo.com>
wrote:
> I've just had a 3rd drive fail on one of my RAID 6 arrays, and I'm looking for
> some advice on how to get it back enough that I can recover the data, and then
> replace the other failed drives.
>
>
> mdadm -V
> mdadm - v3.0.3 - 22nd October 2009
>
>
> Not the most up-to-date release, but it seems to be the latest one available on
> FC12
>
>
>
> The /etc/mdadm.conf file is
>
> ARRAY /dev/md0 uuid=1470c671:4236b155:67287625:899db153
>
>
> Which explains why I didn't get emailed about the drive failures. This isn't my
> standard file, and I don't know how it was changed, but that's another issue for
> another day.
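Just as an aside on the missing mails (a guess only, based on the file above):
mdadm --monitor sends its notifications to the address on the MAILADDR line in
mdadm.conf, so with nothing but an ARRAY line there is nowhere for it to send
anything.  Putting a line like the following back (with a real address,
obviously) should restore the notifications:

  MAILADDR you@example.com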
>
>
>
> mdadm --detail /dev/md0
> /dev/md0:
> Version : 1.2
> Creation Time : Sat Jun 5 10:38:11 2010
> Raid Level : raid6
> Used Dev Size : 488383488 (465.76 GiB 500.10 GB)
> Raid Devices : 15
> Total Devices : 12
> Persistence : Superblock is persistent
> Update Time : Tue Mar 1 22:17:41 2011
> State : active, degraded, Not Started
> Active Devices : 12
> Working Devices : 12
> Failed Devices : 0
> Spare Devices : 0
> Chunk Size : 512K
> Name : file00bert.woodlea.org.uk:0 (local to host
> file00bert.woodlea.org.uk)
> UUID : 1470c671:4236b155:67287625:899db153
> Events : 254890
>     Number   Major   Minor   RaidDevice State
>        0       8      113        0      active sync   /dev/sdh1
>        1       8       17        1      active sync   /dev/sdb1
>        2       8      177        2      active sync   /dev/sdl1
>        3       0        0        3      removed
>        4       8       33        4      active sync   /dev/sdc1
>        5       8      193        5      active sync   /dev/sdm1
>        6       0        0        6      removed
>        7       8       49        7      active sync   /dev/sdd1
>        8       8      209        8      active sync   /dev/sdn1
>        9       8      161        9      active sync   /dev/sdk1
>       10       0        0       10      removed
>       11       8      225       11      active sync   /dev/sdo1
>       12       8       81       12      active sync   /dev/sdf1
>       13       8      241       13      active sync   /dev/sdp1
>       14       8        1       14      active sync   /dev/sda1
>
>
>
> The output from the failed drives are as follows.
>
>
> mdadm --examine /dev/sde1
> /dev/sde1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 1470c671:4236b155:67287625:899db153
> Name : file00bert.woodlea.org.uk:0 (local to host
> file00bert.woodlea.org.uk)
> Creation Time : Sat Jun 5 10:38:11 2010
> Raid Level : raid6
> Raid Devices : 15
> Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
> Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
> Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : clean
> Device UUID : 3e284f2e:d939fb97:0b74eb88:326e879c
> Internal Bitmap : 2 sectors from superblock
> Update Time : Tue Mar 1 21:53:31 2011
> Checksum : 768f0f34 - correct
> Events : 254591
> Chunk Size : 512K
> Device Role : Active device 10
> Array State : AAA.AA.AAAAAAAA ('A' == active, '.' == missing)
>
>
> The above is the drive that failed tonight, and the one I would like to re-add
> back into the array. There have been no writes to the filesystem on the array in
> the last couple of days (other than what ext4 would do on its own).
>
>
> mdadm --examine /dev/sdi1
> /dev/sdi1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 1470c671:4236b155:67287625:899db153
> Name : file00bert.woodlea.org.uk:0 (local to host
> file00bert.woodlea.org.uk)
> Creation Time : Sat Jun 5 10:38:11 2010
> Raid Level : raid6
> Raid Devices : 15
> Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
> Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
> Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 8e668e39:06d8281b:b79aa3ab:a1d55fb5
> Internal Bitmap : 2 sectors from superblock
> Update Time : Thu Feb 10 18:20:54 2011
> Checksum : 4078396b - correct
> Events : 254075
> Chunk Size : 512K
> Device Role : Active device 3
> Array State : AAAAAA.AAAAAAAA ('A' == active, '.' == missing)
>
>
> mdadm --examine /dev/sdj1
> /dev/sdj1:
> Magic : a92b4efc
> Version : 1.2
> Feature Map : 0x1
> Array UUID : 1470c671:4236b155:67287625:899db153
> Name : file00bert.woodlea.org.uk:0 (local to host
> file00bert.woodlea.org.uk)
> Creation Time : Sat Jun 5 10:38:11 2010
> Raid Level : raid6
> Raid Devices : 15
> Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
> Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
> Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
> Data Offset : 272 sectors
> Super Offset : 8 sectors
> State : active
> Device UUID : 37d422cc:8436960a:c3c4d11c:81a8e4fa
> Internal Bitmap : 2 sectors from superblock
> Update Time : Thu Oct 21 23:45:06 2010
> Checksum : 78950bb5 - correct
> Events : 21435
> Chunk Size : 512K
> Device Role : Active device 6
> Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing)
>
>
> Looks like sdj1 failed waaay back in Oct last year (sigh). As I said, I am not
> too bothered about adding these last 2 drives back into the array, since they
> failed so long ago. I have a couple of spare drives sitting here, and I will
> replace these 2 drives with them (once I have completed a badblocks run on them).
> Looking at the output of dmesg, there are no other errors showing for the 3
> drives, other than them being kicked out of the array for being non-fresh.
>
> I guess I have a couple of questions.
>
> What's the correct process for adding the failed /dev/sde1 back into the array
> so I can start it? I don't want to rush into this and make things worse.
If you think that the drives really are working and that it was a cabling
problem, then stop the array (if it isn't stopped already) and assemble with
--force:

  mdadm --assemble --force /dev/md0 /dev....list of devices

Then find the devices that it chose not to include and add them individually:

  mdadm /dev/md0 --add /dev/something
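For your particular array, going by the --detail and --examine output above,
that sequence would look something like the sketch below - double-check the
device names before running anything, as they can move around after a reboot:

  mdadm --stop /dev/md0
  mdadm --assemble --force /dev/md0 \
      /dev/sdh1 /dev/sdb1 /dev/sdl1 /dev/sdc1 /dev/sdm1 /dev/sdd1 /dev/sdn1 \
      /dev/sdk1 /dev/sdo1 /dev/sdf1 /dev/sdp1 /dev/sda1 /dev/sde1
  mdadm --detail /dev/md0            # check which devices were accepted
  mdadm /dev/md0 --add /dev/sdXX1    # for any member that was left out
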
However, if any device has a bad block that cannot be read, then this won't
work.
In that case you need to get a new device, partition it to have a partition
EXACTLY the same size, use dd_rescue to copy all the good data from the bad
drive to the new drive, remove the bad drive from the system, and then run the
"--assemble --force" command with the new drive rather than the old one.
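As a rough sketch of that copy step, assuming the failing disk is /dev/sde and
the new disk shows up as /dev/sdX (a made-up name - substitute the real one):

  # copy the partition table so the new partition is exactly the same size
  sfdisk -d /dev/sde | sfdisk /dev/sdX
  # copy whatever can still be read, continuing past unreadable sectors
  dd_rescue /dev/sde1 /dev/sdX1
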
>
> What's the correct process for replacing the 2 other drives?
> I am presuming that I need to --fail, then --remove, then --add the drives (one
> at a time?), but I want to make sure.
They are already failed and removed, so there is no point in trying to do
that again.
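Once the replacement drives have been through badblocks and been partitioned,
they just need to be added and the array will rebuild onto them.  Roughly (the
device names here are placeholders):

  mdadm /dev/md0 --add /dev/sdX1
  mdadm /dev/md0 --add /dev/sdY1
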
Good luck.
NeilBrown
>
>
> Thanks for your help.
>
>
> Graham.