* mdadm dropped disk, won't re-add
@ 2012-02-15 13:58 John Paul Adrian Glaubitz
2012-02-15 14:45 ` Robin Hill
From: John Paul Adrian Glaubitz @ 2012-02-15 13:58 UTC (permalink / raw)
To: linux-raid
Hello,
I have a rather big problem with my Linux software RAID5.
It consists of 4 SATA disks, each 1 TB in size, resulting in a 3 TB RAID5
volume (/dev/md0, assembled from /dev/sd{b,c,d,e}1).
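As a quick check on the 3 TB figure: RAID5 spends one device's worth of space on parity, so the usable capacity is (N - 1) times the per-device size. The sketch below plugs in the Used Dev Size reported by mdadm -E in the attachment (in KiB); the function name is only illustrative.

```python
def raid5_capacity_kib(num_devices: int, dev_size_kib: int) -> int:
    """Usable RAID5 capacity: one device's worth of space holds parity."""
    if num_devices < 2:
        raise ValueError("RAID5 needs at least 2 devices")
    return (num_devices - 1) * dev_size_kib


used_dev_size_kib = 976759936  # "Used Dev Size" from mdadm -E
array_size_kib = raid5_capacity_kib(4, used_dev_size_kib)
print(array_size_kib)  # 2930279808, matching the reported "Array Size"
```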
Today, mdadm kicked sde1 out of the array because its cable seemed to be
causing problems. I shut down the machine, replaced the cable and tried
re-adding the disk; however, mdadm refused to add the drive.
So I re-partitioned the disk and added sde1 as a new device; mdadm instantly
started rebuilding the array. Unfortunately, during the rebuild, mdadm
decided to kick sdc1, and I have now ended up with two failed drives.
I have tried re-adding sdc1 with the --re-add option, but mdadm again
refuses to re-add the drive.
I haven't changed anything since, as I don't know what to do next. I
don't want to do any further damage to the array and hope that someone
knows how to restore it.
My primary question is whether mdadm actually deletes any important data
on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
writes data to the newly added disk sde1.
mdadm is version 3.2.3, kernel is Linux 3.2.0 on Debian Wheezy.
Can anyone give further advice?
I'm attaching the output of mdadm -E /dev/sd{b,c,d,e}1.
Kind Regards,
Adrian
[-- Attachment #2: mddata.txt --]
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 6db22c7b:7d9287e2:d01e5766:86e12a40 (local to host z6)
Creation Time : Fri Apr 23 13:53:33 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Feb 15 13:27:31 2012
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c0bf6492 - correct
Events : 311622
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : 6db22c7b:7d9287e2:d01e5766:86e12a40 (local to host z6)
Creation Time : Fri Apr 23 13:53:33 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Feb 15 13:25:25 2012
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 1
Spare Devices : 1
Checksum : c0bf6411 - correct
Events : 311617
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     3       8       33        3      active sync   /dev/sdc1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       8       33        3      active sync   /dev/sdc1
   4     4       8       65        4      spare   /dev/sde1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : 6db22c7b:7d9287e2:d01e5766:86e12a40 (local to host z6)
Creation Time : Fri Apr 23 13:53:33 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Feb 15 13:27:31 2012
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c0bf64b0 - correct
Events : 311622
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     0       8       49        0      active sync   /dev/sdd1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.00
UUID : 6db22c7b:7d9287e2:d01e5766:86e12a40 (local to host z6)
Creation Time : Fri Apr 23 13:53:33 2010
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Feb 15 13:27:31 2012
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 2
Spare Devices : 1
Checksum : c0bf64c2 - correct
Events : 311622
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice State
this     4       8       65        4      spare   /dev/sde1
   0     0       8       49        0      active sync   /dev/sdd1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       8       65        4      spare   /dev/sde1
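One detail worth reading out of the output above: the event counter on sdc1 (311617) lags the other members (311622) by only a few events, which is the kind of small mismatch a forced assembly is meant to tolerate. A minimal Python sketch for extracting that comparison from `mdadm -E` text follows; the function names and the trimmed sample string are illustrative and not part of mdadm.

```python
import re


def parse_events(examine_text: str) -> dict[str, int]:
    """Map each device header ('/dev/sdb1:') to its 'Events :' counter."""
    events = {}
    device = None
    for line in examine_text.splitlines():
        line = line.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            device = header.group(1)
            continue
        match = re.fullmatch(r"Events\s*:\s*(\d+)", line)
        if match and device is not None:
            events[device] = int(match.group(1))
    return events


def stale_members(events: dict[str, int]) -> dict[str, int]:
    """Return how many events each device lags behind the newest one."""
    newest = max(events.values())
    return {dev: newest - n for dev, n in events.items() if n < newest}


# Trimmed to the relevant lines of the attachment.
sample = """
/dev/sdb1:
Events : 311622
/dev/sdc1:
Events : 311617
/dev/sdd1:
Events : 311622
/dev/sde1:
Events : 311622
"""

print(stale_members(parse_events(sample)))  # {'/dev/sdc1': 5}
```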
* Re: mdadm dropped disk, won't re-add
From: Robin Hill @ 2012-02-15 14:45 UTC (permalink / raw)
To: John Paul Adrian Glaubitz; +Cc: linux-raid
On Wed Feb 15, 2012 at 02:58:42PM +0100, John Paul Adrian Glaubitz wrote:
> Hello,
>
> I have a rather big problem with my Linux software RAID5.
>
> It consists of 4 SATA disks, each 1 TB in size, resulting in a 3 TB RAID5
> volume (/dev/md0, assembled from /dev/sd{b,c,d,e}1).
>
> Today, mdadm kicked sde1 out of the array because its cable seemed to be
> causing problems. I shut down the machine, replaced the cable and tried
> re-adding the disk; however, mdadm refused to add the drive.
>
> So I re-partitioned the disk and added sde1 as a new device; mdadm instantly
> started rebuilding the array. Unfortunately, during the rebuild, mdadm
> decided to kick sdc1, and I have now ended up with two failed drives.
>
> I have tried re-adding sdc1 with the --re-add option, but mdadm again
> refuses to re-add the drive.
>
That's a safety measure. If mdadm can't actually re-add the drive, it
fails rather than falling back to an --add (as older mdadm versions
did), which could lose data.
> I haven't changed anything since, as I don't know what to do next. I
> don't want to do any further damage to the array and hope that someone
> knows how to restore it.
>
> My primary question is whether mdadm actually deletes any important data
> on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
> writes data to the newly added disk sde1.
>
It just writes data/parity to the newly added disk. The remaining disks
are only written to if other applications write to the array during the
rebuild.
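To illustrate why the rebuild only needs to read from the surviving members: in RAID5 the parity chunk of each stripe is the XOR of its data chunks, so any single missing chunk can be recomputed from the rest and written to the new disk. Below is a toy Python sketch with made-up 4-byte chunks; it ignores the 64K chunk size and left-symmetric layout that the real array uses.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# One stripe on a 4-disk RAID5: three data chunks plus their parity.
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = xor_chunks([d0, d1, d2])

# Pretend the disk holding d1 was replaced: rebuild it from the survivors.
# Only reads touch d0, d2 and parity; the result goes to the new disk.
rebuilt = xor_chunks([d0, d2, parity])
assert rebuilt == d1
```

This also shows why a second member dropping out mid-rebuild is a problem: with two chunks of a stripe missing, the XOR no longer has enough inputs.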
> mdadm is version 3.2.3, kernel is Linux 3.2.0 on Debian Wheezy.
>
> Can anyone give further advice?
>
What errors does dmesg give about why sdc1 was failed? You'll need to
fix that before you try recovering the array. If it's a drive error then
using ddrescue to clone it (or as much of it as possible) to sde1 would
probably be your best bet, then get a replacement drive.
Once you've fixed that issue then you should be able to force assemble
the array (mdadm -S /dev/md0; mdadm -Af /dev/md0) and continue/restart
the recovery process. I'd recommend doing a fsck on the filesystem
afterwards as well, especially if you've replaced sdc.
If the force assembly fails then try it with added verbosity (mdadm -S
/dev/md0; mdadm -Afvvv /dev/md0) and post the output from that (and from
dmesg) and hopefully someone will be able to figure out what's going
wrong.
Cheers,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: mdadm dropped disk, won't re-add
From: John Paul Adrian Glaubitz @ 2012-02-15 23:01 UTC (permalink / raw)
To: Robin Hill; +Cc: linux-raid
Hi,
On Feb 15, 2012, at 3:45 PM, Robin Hill wrote:
>> I have tried re-adding sdc1 with the --re-add option, but mdadm again
>> refuses to re-add the drive.
>>
> That's a safety measure. If mdadm can't actually re-add the drive, it
> fails rather than falling back to an --add (as older mdadm versions
> did), which could lose data.
Aha, thanks for clarifying.
>> My primary question is whether mdadm actually deletes any important data
>> on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
>> writes data to the newly added disk sde1.
>>
> It just writes data/parity to the newly added disk. The remaining disks
> are only written to if other applications write to the array during the
> rebuild.
Great :). I was hoping so.
>> Can anyone give further advice?
>>
> What errors does dmesg give about why sdc1 was failed? You'll need to
> fix that before you try recovering the array. If it's a drive error then
> using ddrescue to clone it (or as much of it as possible) to sde1 would
> probably be your best bet, then get a replacement drive.
Those were cable-related errors: the SATA link failed. The disk is OK and
the SMART log is clean.
> Once you've fixed that issue then you should be able to force assemble
> the array (mdadm -S /dev/md0; mdadm -Af /dev/md0) and continue/restart
> the recovery process. I'd recommend doing a fsck on the filesystem
> afterwards as well, especially if you've replaced sdc.
It worked; the RAID is now rebuilding. I actually had a friend with more
expertise (he is a casual kernel hacker himself) take a look at it, and
he fixed everything.
Basically, he reassembled the array from sd{b,c,d}1 with the --force option,
corrected the partitioning on the sde disk (I had accidentally created a
partition larger than those on the other disks, so he just copied the
partition table from one of the other disks in the array) and then added
sde1 as a new disk.
The RAID is now rebuilding and will be finished in 4 hours.
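For reference, the steps described above map roughly onto the commands below. This is a sketch and not a transcript of what was actually run: using sfdisk to copy the partition table, and /dev/sdb as the source disk, are assumptions. The Python helper only builds the command lines; it does not execute anything.

```python
import shlex


def recovery_plan(array: str, good: list[str], source_disk: str,
                  new_disk: str, new_part: str) -> list[str]:
    """Build (but do not run) the command lines for the recovery steps."""
    return [
        # Stop the array before reassembling it (as suggested in the thread).
        shlex.join(["mdadm", "--stop", array]),
        # Force assembly from the three members that still hold the data.
        shlex.join(["mdadm", "--assemble", "--force", array, *good]),
        # Assumed tool: copy the partition table so the new partition matches.
        f"sfdisk -d {shlex.quote(source_disk)} | sfdisk {shlex.quote(new_disk)}",
        # Add the re-partitioned device; md then rebuilds onto it.
        shlex.join(["mdadm", array, "--add", new_part]),
    ]


plan = recovery_plan("/dev/md0", ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"],
                     "/dev/sdb", "/dev/sde", "/dev/sde1")
for cmd in plan:
    print(cmd)
```

Printing the plan first and running each line by hand keeps a human in the loop, which seems prudent when the array has no redundancy left.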
Thanks a lot for your quick help!
Adrian