* Adding a new disk after disk failure on raid6 volume
From: BERTRAND Joël @ 2011-12-20 8:46 UTC (permalink / raw)
To: linux-raid
Hello,
I have been using several software RAID volumes for a very long time. Last week, a
disk crashed in a raid6 volume and I tried to replace the faulty disk.
Today, when Linux boots, it only assembles this volume with the new disk
marked as 'faulty' or 'removed', and I don't understand why...
The system is a sparc64 SMP server running Linux Debian/testing:
Root rayleigh:[~] > uname -a
Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
GNU/Linux
Root rayleigh:[~] > dpkg-query -l | grep mdadm
ii mdadm 3.2.2-1
The faulty device is /dev/sde1:
Root rayleigh:[~] > cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
each disk contains only one partition (type FD, Linux raid autodetect). If I
add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
and my raid6 runs with all disks. But then I get the same superblock on
/dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
one disappears as well (I think both superblocks are actually the same).
For information:
Root rayleigh:[~] > mdadm --examine --scan
ARRAY /dev/md6 UUID=a003dce6:121c0c4a:3f886e0a:7567841c
ARRAY /dev/md0 UUID=7439e08d:fc4de395:22484380:bdd49890
ARRAY /dev/md1 UUID=d035cc29:f693b530:a3f65a60:fc74e45f
ARRAY /dev/md2 UUID=dd9b6218:838d551e:e9582b84:96b48232
ARRAY /dev/md3 UUID=d5639361:22e3ea3e:1405d837:f1e5c9ea
ARRAY /dev/md4 UUID=41b4f376:e14d8be1:f3ff4b3c:33ab8d40
ARRAY /dev/md5 UUID=cba7995c:045168a1:f998aa64:f0e66714
ARRAY /dev/md7 UUID=3c07a5ac:79f3ad38:980f40e8:743f4cce
Root rayleigh:[~] > mdadm --examine /dev/sdc
/dev/sdc:
MBR Magic : 55aa
Partition[0] : 143637102 sectors at 63 (type fd)
Root rayleigh:[~] > mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : 3c07a5ac:79f3ad38:980f40e8:743f4cce (local to host
rayleigh)
Creation Time : Sun Dec 17 16:56:20 2006
Raid Level : raid6
Used Dev Size : 71802368 (68.48 GiB 73.53 GB)
Array Size : 359011840 (342.38 GiB 367.63 GB)
Raid Devices : 7
Total Devices : 6
Preferred Minor : 7
Update Time : Tue Dec 20 09:38:02 2011
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 1
Spare Devices : 0
Checksum : 464688f - correct
Events : 1602268
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 33 0 active sync /dev/sdc1
0 0 8 33 0 active sync /dev/sdc1
1 1 8 49 1 active sync /dev/sdd1
2 2 0 0 2 faulty removed
3 3 8 81 3 active sync /dev/sdf1
4 4 8 97 4 active sync /dev/sdg1
5 5 8 113 5 active sync /dev/sdh1
6 6 8 129 6 active sync /dev/sdi1
Root rayleigh:[~] >
All disks return the same information except /dev/sde when it is part of the
running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
the same information). What is my mistake? Is this a known issue?
Best regards,
JB
* Re: Adding a new disk after disk failure on raid6 volume
From: Robin Hill @ 2011-12-20 9:21 UTC (permalink / raw)
To: BERTRAND Joël; +Cc: linux-raid
On Tue Dec 20, 2011 at 09:46:13AM +0100, BERTRAND Joël wrote:
> Hello,
>
> I have been using several software RAID volumes for a very long time. Last week, a
> disk crashed in a raid6 volume and I tried to replace the faulty disk.
> Today, when Linux boots, it only assembles this volume with the new disk
> marked as 'faulty' or 'removed', and I don't understand why...
>
> The system is a sparc64 SMP server running Linux Debian/testing:
>
> Root rayleigh:[~] > uname -a
> Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
> GNU/Linux
> Root rayleigh:[~] > dpkg-query -l | grep mdadm
> ii mdadm 3.2.2-1
>
> The faulty device is /dev/sde1:
>
> Root rayleigh:[~] > cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
> 359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
>
> All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
> each disk contains only one partition (type FD, Linux raid autodetect). If I
> add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
> and my raid6 runs with all disks. But then I get the same superblock on
> /dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
> one disappears as well (I think both superblocks are actually the same).
>
<- SNIP info ->
>
> All disks return the same information except /dev/sde when it is part of the
> running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
> the same information). What is my mistake? Is this a known issue?
>
It's a known issue with 0.90 superblocks, yes. There's no information in
the superblock which allows md to tell whether it's on the partition or
on the whole disk, so for a partition spanning (almost) the whole disk the
same superblock can be valid for both. Version 1.x superblocks contain extra
information which can be used to differentiate between the two. I'm a little
surprised that the other drives don't get detected in the same way, though.
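As a rough illustration (only a sketch, assuming the usual 0.90 placement rule
that the superblock lives in the last 64 KiB-aligned block of the device), you
can compute where each superblock would sit and see that the two positions can
coincide when the partition starts at sector 63 and runs to (nearly) the end of
the disk:

  disk=$(blockdev --getsize64 /dev/sde)       # whole-disk size in bytes
  part=$(blockdev --getsize64 /dev/sde1)      # partition size in bytes
  start=$((63 * 512))                         # partition offset (sector 63, as on /dev/sdc)
  echo $(( (disk & ~65535) - 65536 ))           # where a whole-disk 0.90 superblock would sit
  echo $(( start + (part & ~65535) - 65536 ))   # where a superblock written via /dev/sde1 would sit
  # if the two numbers are equal, --examine finds one and the same superblock
  # through both /dev/sde and /dev/sde1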
The standard suggestions are either to change the mdadm.conf file to
look only at partitions for arrays (e.g. DEVICE /dev/sd*1), or (more
complicated) to re-create the array using a version 1 superblock. This
can be done without losing the data, but you need to be very careful to
use the 1.0 superblock and the exact same parameters you used for the
original array (with the disks in the same order). Neil has posted
instructions for this fairly recently, so you should be able to find
those in the archives.
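Very roughly, and only as a sketch (the values below are read off the --examine
output earlier in the thread and would need re-checking against Neil's
instructions, disk order included, before running anything):

  # option 1: in /etc/mdadm/mdadm.conf, scan partitions only
  DEVICE /dev/sd*1

  # option 2: re-create with a 1.0 superblock (also stored at the end of the
  # device, so the data layout is unchanged) -- only once the array is fully
  # in sync again, and entirely at your own risk
  mdadm --create /dev/md7 --metadata=1.0 --assume-clean \
        --level=6 --raid-devices=7 --chunk=64 --layout=left-symmetric \
        /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1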
HTH,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |
* Re: Adding a new disk after disk failure on raid6 volume
From: John Robinson @ 2011-12-20 11:39 UTC (permalink / raw)
To: BERTRAND Joël, linux-raid
On 20/12/2011 09:21, Robin Hill wrote:
> On Tue Dec 20, 2011 at 09:46:13AM +0100, BERTRAND Joël wrote:
>
>> Hello,
>>
>> I have been using several software RAID volumes for a very long time. Last week, a
>> disk crashed in a raid6 volume and I tried to replace the faulty disk.
>> Today, when Linux boots, it only assembles this volume with the new disk
>> marked as 'faulty' or 'removed', and I don't understand why...
>>
>> The system is a sparc64 SMP server running Linux Debian/testing:
>>
>> Root rayleigh:[~]> uname -a
>> Linux rayleigh 2.6.36.2 #1 SMP Sun Jan 2 11:50:13 CET 2011 sparc64
>> GNU/Linux
>> Root rayleigh:[~]> dpkg-query -l | grep mdadm
>> ii mdadm 3.2.2-1
>>
>> The faulty device is /dev/sde1:
>>
>> Root rayleigh:[~]> cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md7 : active raid6 sdc1[0] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[1]
>> 359011840 blocks level 6, 64k chunk, algorithm 2 [7/6] [UU_UUUU]
>>
>> All disks (/dev/sd[cdefghi]) are the same model (Fujitsu SCA-2 73 GB) and
>> each disk contains only one partition (type FD, Linux raid autodetect). If I
>> add /dev/sde1 to the raid6 with mdadm -a /dev/md7 /dev/sde1, the disk is added
>> and my raid6 runs with all disks. But then I get the same superblock on
>> /dev/sde1 and on /dev/sde! If I remove the /dev/sde superblock, the /dev/sde1
>> one disappears as well (I think both superblocks are actually the same).
>>
> <- SNIP info ->
>>
>> All disks return the same information except /dev/sde when it is part of the
>> running array (mdadm --examine /dev/sde and mdadm --examine /dev/sde1 return
>> the same information). What is my mistake? Is this a known issue?
>>
> It's a known issue with 0.90 superblocks, yes. There's no information in
> the superblock which allows md to tell whether it's on the partition or
> on the whole disk, so for a partition spanning (almost) the whole disk the
> same superblock can be valid for both. Version 1.x superblocks contain extra
> information which can be used to differentiate between the two. I'm a little
> surprised that the other drives don't get detected in the same way, though.
I think the above issue only occurs on partitions with particular
alignments, iirc starting at multiples of 8 sectors. Old fdisk would
always create the first partition starting at sector 63, and that was
the case with the output we saw for /dev/sdc, but a new fdisk will
likely create the partition starting at sector 2048.
Alternatively, or additionally, the problem may be that very old fdisk
had a bug where it miscounted and didn't create partitions right up to
the last "cylinder" of the disc, so the md metadata on the last
partition wasn't in the same place as it would have been if it was
for the whole disc.
Either way, I would recommend that the OP --fail, --remove and
--zero-superblock his /dev/sde1, then copy a working partition table
from sdc with `dd if=/dev/sdc of=/dev/sde bs=512 count=1`, then
`blockdev --rereadpt /dev/sde`, then `fdisk -lu /dev/sde` just to make
sure that there is now an sde1 that's identical to sdc1, then --add the
new /dev/sde1.
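Put together, that would be something along these lines (a sketch of the steps
above rather than commands actually run here, using the device names from this
thread):

  mdadm /dev/md7 --fail /dev/sde1 --remove /dev/sde1   # drop the bad member
  mdadm --zero-superblock /dev/sde1                    # wipe the stale 0.90 superblock
  dd if=/dev/sdc of=/dev/sde bs=512 count=1            # copy sdc's partition table to sde
  blockdev --rereadpt /dev/sde                         # make the kernel re-read it
  fdisk -lu /dev/sde                                   # check sde1 is now identical to sdc1
  mdadm /dev/md7 --add /dev/sde1                       # re-add and let the rebuild run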
Hope this helps!
Cheers,
John.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Adding a new disk after disk failure on raid6 volume
From: BERTRAND Joël @ 2011-12-20 22:17 UTC (permalink / raw)
To: John Robinson; +Cc: linux-raid
Thanks a lot. Now /dev/sde and /dev/sde1 no longer have the same superblock. My
raid volume is rebuilding, and I shall try to reboot this server as soon
as possible to check.
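For the check, something as simple as this should do (a sketch, not commands
taken from the thread):

  cat /proc/mdstat            # md7 should show [7/7] [UUUUUUU]
  mdadm --examine /dev/sde    # should show only the MBR now, like /dev/sdc above
  mdadm --examine /dev/sde1   # should still show the raid6 member superblock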
Regards,
JKB