* Adding a new disk after disk failure on raid6 volume
From: BERTRAND Joël @ 2011-12-20 8:46 UTC (permalink / raw)
To: linux-raid
Hello,
I have been using several software RAID volumes for a very long time. Last week, a
disk crashed in a raid6 volume and I tried to replace the faulty disk.
Today, when Linux boots, it only assembles this volume with the new disk
marked as 'faulty' or 'removed', and I don't understand why...
The system is a sparc64 SMP server running Linux Debian/testing:
Root rayleigh:[~] > uname -a
Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
GNU/Linux
Root rayleigh:[~] > dpkg-query -l | grep mdadm
ii mdadm 3.2.2-1
The faulty device is /dev/sde1:
Root rayleigh:[~] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
each disk contains only one partition (type FD, Linux raid autodetect). If I
add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
and my raid6 runs with all disks. But then I get the same superblock on
/dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
one disappears as well (I think both superblocks are actually the same).
For information:
Root rayleigh:[~] > mdadm --examine --scan
ARRAY /dev/md6 UUID=a003dce6:121c0c4a:3f886e0a:7567841c
ARRAY /dev/md0 UUID=7439e08d:fc4de395:22484380:bdd49890
ARRAY /dev/md1 UUID=d035cc29:f693b530:a3f65a60:fc74e45f
ARRAY /dev/md2 UUID=dd9b6218:838d551e:e9582b84:96b48232
ARRAY /dev/md3 UUID=d5639361:22e3ea3e:1405d837:f1e5c9ea
ARRAY /dev/md4 UUID=41b4f376:e14d8be1:f3ff4b3c:33ab8d40
ARRAY /dev/md5 UUID=cba7995c:045168a1:f998aa64:f0e66714
ARRAY /dev/md7 UUID=3c07a5ac:79f3ad38:980f40e8:743f4cce
Root rayleigh:[~] > mdadm --examine /dev/sdc
/dev/sdc:
MBR Magic : 55aa
Partition[0] : 143637102 sectors at 63 (type fd)
Root rayleigh:[~] > mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : 3c07a5ac:79f3ad38:980f40e8:743f4cce (local to host
rayleigh)
Creation Time : Sun Dec 17 16:56:20 2006
Raid Level : raid6
Used Dev Size : 71802368 (68.48 GiB 73.53 GB)
Array Size : 359011840 (342.38 GiB 367.63 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 7
Update Time : Tue Dec 20 09:38:02 2011
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 464688f - correct
Events : 1602268
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 0 0 2 faulty removed
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 active sync /dev/sdg1
5 5 8 113 5 active sync /dev/sdh1
6 6 8 129 6 active sync /dev/sdi1
Root rayleigh:[~] >
All disks return the same information except /dev/sde when it is part of the
running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
the same information). What is my mistake? Is this a known issue?
Best regards,
JB
* Re: Adding a new disk after disk failure on raid6 volume
From: Robin Hill @ 2011-12-20 9:21 UTC (permalink / raw)
To: BERTRAND Joël; +Cc: linux-raid
On Tue Dec 20, 2011 at 09:46:13AM +0100, BERTRAND Joël wrote:
> Hello,
>
> I have been using several software RAID volumes for a very long time. Last week, a
> disk crashed in a raid6 volume and I tried to replace the faulty disk.
> Today, when Linux boots, it only assembles this volume with the new disk
> marked as 'faulty' or 'removed', and I don't understand why...
>
> The system is a sparc64 SMP server running Linux Debian/testing:
>
> Root rayleigh:[~] > uname -a
> Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
> GNU/Linux
> Root rayleigh:[~] > dpkg-query -l | grep mdadm
> ii mdadm 3.2.2-1
>
> The faulty device is /dev/sde1:
>
> Root rayleigh:[~] > cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
> 359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
>
> All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
> each disk contains only one partition (type FD, Linux raid autodetect). If I
> add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
> and my raid6 runs with all disks. But then I get the same superblock on
> /dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
> one disappears as well (I think both superblocks are actually the same).
>
<- SNIP info ->
>
> All disks return the same information except /dev/sde when it is part of the
> running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
> the same information). What is my mistake? Is this a known issue?
>
It's a known issue with 0.90 superblocks, yes. There's no information in
the superblock which allows md to tell whether it's on the partition or
on the whole disk, so for a partition spanning (almost) the whole disk the
same superblock can be valid for both. Version 1.x superblocks contain extra
information which can be used to differentiate between the two. I'm a little
surprised that the other drives don't get detected in the same way, though.
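As a rough illustration (only a sketch, assuming the usual 0.90 placement rule
that the superblock lives in the last 64 KiB-aligned block of the device), you
can compute where each superblock would sit and see that the two positions can
coincide when the partition starts at sector 63 and runs to (nearly) the end of
the disk:

  disk=$(blockdev --getsize64 /dev/sde)       # whole-disk size in bytes
  part=$(blockdev --getsize64 /dev/sde1)      # partition size in bytes
  start=$((63 * 512))                         # partition offset (sector 63, as on /dev/sdc)
  echo $(( (disk & ~65535) - 65536 ))           # where a whole-disk 0.90 superblock would sit
  echo $(( start + (part & ~65535) - 65536 ))   # where a superblock written via /dev/sde1 would sit
  # if the two numbers are equal, --examine finds one and the same superblock
  # through both /dev/sde and /dev/sde1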
The standard suggestions are either to change the mdadm.conf file to
look only at partitions for arrays (e.g. DEVICE /dev/sd*1), or (more
complicated) to re-create the array using a version 1 superblock. This
can be done without losing the data, but you need to be very careful to
use the 1.0 superblock and the exact same parameters you used for the
original array (with the disks in the same order). Neil has posted
instructions for this fairly recently, so you should be able to find
those in the archives.
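Very roughly, and only as a sketch (the values below are read off the --examine
output earlier in the thread and would need re-checking against Neil's
instructions, disk order included, before running anything):

  # option 1: in /etc/mdadm/mdadm.conf, scan partitions only
  DEVICE /dev/sd*1

  # option 2: re-create with a 1.0 superblock (also stored at the end of the
  # device, so the data layout is unchanged) -- only once the array is fully
  # in sync again, and entirely at your own risk
  mdadm --create /dev/md7 --metadata=1.0 --assume-clean \
        --level=6 --raid-devices=7 --chunk=64 --layout=left-symmetric \
        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1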
HTH,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Adding a new disk after disk failure on raid6 volume
From: John Robinson @ 2011-12-20 11:39 UTC (permalink / raw)
To: BERTRAND Joël, linux-raid
On 20/12/2011 09:21, Robin Hill wrote:
> On Tue Dec 20, 2011 at 09:46:13AM +0100, BERTRAND Joël wrote:
>
>> Hello,
>>
>> I have been using several software RAID volumes for a very long time. Last week, a
>> disk crashed in a raid6 volume and I tried to replace the faulty disk.
>> Today, when Linux boots, it only assembles this volume with the new disk
>> marked as 'faulty' or 'removed', and I don't understand why...
>>
>> The system is a sparc64 SMP server running Linux Debian/testing:
>>
>> Root rayleigh:[~]> uname -a
>> Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
>> GNU/Linux
>> Root rayleigh:[~]> dpkg-query -l | grep mdadm
>> ii mdadm 3.2.2-1
>>
>> The faulty device is /dev/sde1:
>>
>> Root rayleigh:[~]> cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
>> 359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
>>
>> All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
>> each disk contains only one partition (type FD, Linux raid autodetect). If I
>> add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
>> and my raid6 runs with all disks. But then I get the same superblock on
>> /dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
>> one disappears as well (I think both superblocks are actually the same).
>>
> <- SNIP info ->
>>
>> All disks return the same information except /dev/sde when it is part of the
>> running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
>> the same information). What is my mistake? Is this a known issue?
>>
> It's a known issue with 0.90 superblocks, yes. There's no information in
> the superblock which allows md to tell whether it's on the partition or
> on the whole disk, so for a partition spanning (almost) the whole disk the
> same superblock can be valid for both. Version 1.x superblocks contain extra
> information which can be used to differentiate between the two. I'm a little
> surprised that the other drives don't get detected in the same way, though.
I think the above issue only occurs on partitions with particular
alignments, iirc starting at multiples of 8 sectors. Old fdisk would
always create the first partition starting at sector 63, and that was
the case with the output we saw for /dev/sdc, but a new fdisk will
likely create the partition starting at sector 2048.
Alternatively, or additionally, the problem may be that very old fdisk
had a bug where it miscounted and didn't create partitions right up to
the last "cylinder" of the disc, so the md metadata on the last
partition wasn't in the same place as it would have been if it was
for the whole disc.
Either way, I would recommend that the OP --fail, --remove and
--zero-superblock his /dev/sde1, then copy a working partition table
from sdc with `dd if=/dev/sdc of=/dev/sde bs=512 count=1`, then
`blockdev --rereadpt /dev/sde`, then `fdisk -lu /dev/sde` just to make
sure that there is now an sde1 that's identical to sdc1, then --add the
new /dev/sde1.
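Put together, that would be something along these lines (a sketch of the steps
above rather than commands actually run here, using the device names from this
thread):

  mdadm /dev/md7 --fail /dev/sde1 --remove /dev/sde1   # drop the bad member
  mdadm --zero-superblock /dev/sde1                    # wipe the stale 0.90 superblock
  dd if=/dev/sdc of=/dev/sde bs=512 count=1            # copy sdc's partition table to sde
  blockdev --rereadpt /dev/sde                         # make the kernel re-read it
  fdisk -lu /dev/sde                                   # check sde1 is now identical to sdc1
  mdadm /dev/md7 --add /dev/sde1                       # re-add and let the rebuild run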
Hope this helps!
Cheers,
John.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Adding a new disk after disk failure on raid6 volume
From: BERTRAND Joël @ 2011-12-20 22:17 UTC (permalink / raw)
To: John Robinson; +Cc: linux-raid
Thanks a lot. Now /dev/sde and /dev/sde1 no longer have the same superblock. My
raid volume is rebuilding, and I shall try to reboot this server as soon
as possible to check.
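For the check, something as simple as this should do (a sketch, not commands
taken from the thread):

  cat /proc/mdstat            # md7 should show [7/7] [UUUUUUU]
  mdadm --examine /dev/sde    # should show only the MBR now, like /dev/sdc above
  mdadm --examine /dev/sde1   # should still show the raid6 member superblock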
Regards,
JKB