* Failed RAID 6 array advice
From: jahammonds prost @ 2011-03-02 5:05 UTC
To: linux-raid
I've just had a third drive fail on one of my RAID 6 arrays, and I'm looking for
some advice on how to get it back up enough that I can recover the data, and then
replace the other failed drives.
mdadm -V
mdadm - v3.0.3 - 22nd October 2009
Not the most up-to-date release, but it seems to be the latest one available on
FC12.
The /etc/mdadm.conf file contains only:
ARRAY /dev/md0 uuid=1470c671:4236b155:67287625:899db153
which explains why I didn't get emailed about the drive failures. This isn't my
standard file, and I don't know how it was changed, but that's another issue for
another day.
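For reference, a rough sketch of what my standard file would contain (the
MAILADDR line is the piece missing above, and it is what mdadm --monitor uses
to send failure mails; the address here is just a placeholder):
# /etc/mdadm.conf - hypothetical example, not the file currently on this box
DEVICE partitions
ARRAY /dev/md0 UUID=1470c671:4236b155:67287625:899db153
MAILADDR root@localhost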
mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sat Jun 5 10:38:11 2010
Raid Level : raid6
Used Dev Size : 488383488 (465.76 GiB 500.10 GB)
Raid Devices : 15
Total Devices : 12
Persistence : Superblock is persistent
Update Time : Tue Mar 1 22:17:41 2011
State : active, degraded, Not Started
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : file00bert.woodlea.org.uk:0 (local to host
file00bert.woodlea.org.uk)
UUID : 1470c671:4236b155:67287625:899db153
Events : 254890
Number Major Minor RaidDevice State
0 8 113 0 active sync /dev/sdh1
1 8 17 1 active sync /dev/sdb1
2 8 177 2 active sync /dev/sdl1
3 0 0 3 removed
4 8 33 4 active sync /dev/sdc1
5 8 193 5 active sync /dev/sdm1
6 0 0 6 removed
7 8 49 7 active sync /dev/sdd1
8 8 209 8 active sync /dev/sdn1
9 8 161 9 active sync /dev/sdk1
10 0 0 10 removed
11 8 225 11 active sync /dev/sdo1
12 8 81 12 active sync /dev/sdf1
13 8 241 13 active sync /dev/sdp1
14 8 1 14 active sync /dev/sda1
The --examine output from the failed drives is as follows.
mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 1470c671:4236b155:67287625:899db153
Name : file00bert.woodlea.org.uk:0 (local to host
file00bert.woodlea.org.uk)
Creation Time : Sat Jun 5 10:38:11 2010
Raid Level : raid6
Raid Devices : 15
Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 3e284f2e:d939fb97:0b74eb88:326e879c
Internal Bitmap : 2 sectors from superblock
Update Time : Tue Mar 1 21:53:31 2011
Checksum : 768f0f34 - correct
Events : 254591
Chunk Size : 512K
Device Role : Active device 10
Array State : AAA.AA.AAAAAAAA ('A' == active, '.' == missing)
The above is the drive that failed tonight, and the one I would like to re-add
to the array. There have been no writes to the filesystem on the array in
the last couple of days (other than whatever ext4 does on its own).
mdadm --examine /dev/sdi1
/dev/sdi1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 1470c671:4236b155:67287625:899db153
Name : file00bert.woodlea.org.uk:0 (local to host
file00bert.woodlea.org.uk)
Creation Time : Sat Jun 5 10:38:11 2010
Raid Level : raid6
Raid Devices : 15
Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 8e668e39:06d8281b:b79aa3ab:a1d55fb5
Internal Bitmap : 2 sectors from superblock
Update Time : Thu Feb 10 18:20:54 2011
Checksum : 4078396b - correct
Events : 254075
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAAAA.AAAAAAAA ('A' == active, '.' == missing)
mdadm --examine /dev/sdj1
/dev/sdj1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 1470c671:4236b155:67287625:899db153
Name : file00bert.woodlea.org.uk:0 (local to host
file00bert.woodlea.org.uk)
Creation Time : Sat Jun 5 10:38:11 2010
Raid Level : raid6
Raid Devices : 15
Avail Dev Size : 976767730 (465.76 GiB 500.11 GB)
Array Size : 12697970688 (6054.86 GiB 6501.36 GB)
Used Dev Size : 976766976 (465.76 GiB 500.10 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : active
Device UUID : 37d422cc:8436960a:c3c4d11c:81a8e4fa
Internal Bitmap : 2 sectors from superblock
Update Time : Thu Oct 21 23:45:06 2010
Checksum : 78950bb5 - correct
Events : 21435
Chunk Size : 512K
Device Role : Active device 6
Array State : AAAAAAAAAAAAAAA ('A' == active, '.' == missing)
Looks like sdj1 failed way back in October last year (sigh). I am not too
bothered about adding these last two drives back into the array, since they
failed so long ago. I have a couple of spare drives sitting here, and I will
replace those two drives with them (once I have run badblocks on the spares).
Looking at the output of dmesg, there are no other errors showing for the three
drives, other than them being kicked out of the array for being non-fresh.
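In case it is useful, here is a quick way to compare the event counters across
all the members (drive letters taken from the --detail listing above; the
counts show how far behind sdi1, and especially sdj1, have fallen compared to
sde1):
for d in /dev/sd[a-p]1; do
    echo -n "$d "
    mdadm --examine "$d" 2>/dev/null | grep Events
done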
I guess I have a couple of questions.
What's the correct process for adding the failed /dev/sde1 back into the array
so I can start it? I don't want to rush into this and make things worse.
What's the correct process for replacing the two other drives?
I am presuming that I need to --fail, then --remove, then --add the drives (one
at a time?), but I want to make sure.
Thanks for your help.
Graham.
* Re: Failed RAID 6 array advice
From: Mikael Abrahamsson @ 2011-03-02 5:26 UTC
To: jahammonds prost; +Cc: linux-raid
On Tue, 1 Mar 2011, jahammonds prost wrote:
> What's the correct process for adding the failed /dev/sde1 back into the
> array so I can start it? I don't want to rush into this and make things
> worse.
There are a lot of discussions about this in the archives, but basically I
recommend the following:
Make sure you're running the latest mdadm; right now that's 3.1.4. Compile
it yourself if you have to. After that, stop the array and use
--assemble --force to get the array up and running again with the drives
you know are good (and make sure you don't use the drives that were offlined a
long time ago).
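As an untested sketch based on the --detail and --examine output you posted
(the twelve members still listed as active, plus sde1, leaving out the stale
sdi1 and sdj1):
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdh1 /dev/sdk1 \
    /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sde1
cat /proc/mdstat   # the array should come up degraded with 13 of 15 devices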
> What's the correct process for replacing the 2 other drives?
> I am presuming that I need to --fail, then --remove then --add the drives (one
> at a time?), but I want to make sure.
Yes. Once you have a working degraded array, you just add them; a resync
should happen, and everything should be OK if the resync succeeds.
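Roughly, with sdX1 and sdY1 standing in for the partitions on your two
replacement drives:
mdadm /dev/md0 --add /dev/sdX1
mdadm /dev/md0 --add /dev/sdY1
cat /proc/mdstat   # watch the recovery until both new members finish rebuilding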
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Failed RAID 6 array advice
From: NeilBrown @ 2011-03-02 5:26 UTC
To: jahammonds prost; +Cc: linux-raid
On Tue, 1 Mar 2011 21:05:33 -0800 (PST) jahammonds prost <gmitch64@yahoo.com>
wrote:
> I guess I have a couple of questions.
>
> What's the correct process for adding the failed /dev/sde1 back into the array
> so I can start it? I don't want to rush into this and make things worse.
If you think that the drives really are working and that it was a cabling
problem, then stop the array (if it isn't stopped already) and assemble with
--force:
mdadm --assemble --force /dev/md0 /dev....list of devices
Then find the devices that it chose not to include and add them individually:
mdadm /dev/md0 --add /dev/something
However, if any device has a bad block that cannot be read, then this won't
work.
In that case you need to get a new device, partition it to have a partition
EXACTLY the same size, use dd_rescue to copy all the good data from the bad
drive to the new drive, remove the bad drive from the system, and then run the
"--assemble --force" command with the new drive rather than the old one.
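Something like the following, where sdZ is just a placeholder for the
replacement disk (cloning the partition table with sfdisk is one way to get a
partition of exactly the same size):
sfdisk -d /dev/sde | sfdisk /dev/sdZ   # clone the partition layout onto the new disk
dd_rescue /dev/sde1 /dev/sdZ1          # copy what is readable, continuing past bad sectors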
>
> What's the correct process for replacing the 2 other drives?
> I am presuming that I need to --fail, then --remove then --add the drives (one
> at a time?), but I want to make sure.
They are already failed and removed, so there is no point in trying to do
that again.
Good luck.
NeilBrown