linux-raid.vger.kernel.org archive mirror
* Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-08 20:52 UTC
  To: linux-raid

Hi :)

I've had a drive die recently, and took the opportunity to upgrade my
aging Debian sarge box to etch. Once the new OS was running and I had
finished beating booting from /dev/md0 with a blunt instrument, I turned
my attention to the data raid-sets... there's only one I can't sort out,
and of course.. it's the biggest one :)

eddie:~# cat /proc/mdstat 
Personalities : [raid1] [raid6] [raid5] [raid4] 
md3 : active raid5 hda4[1] hdg4[2]
      349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]

eddie:~# mdadm --add /dev/md3 /dev/hde5
mdadm: add new device failed for /dev/hde5 as 3: Invalid argument

hda and hdg are 200G drives. hde is the new one at 250G. I have
configured the partition sizes + types (0xFD) identically, but the
partition number is different on hde because I wanted to put extra stuff
in the 'spare' 50G, so the set should be using hda4, hdg4 and hde5.
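
(For reference, the layout on each drive can be double-checked with
something like:

  fdisk -l /dev/hde

which lists the start, end, blocks and type of every partition.)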

eddie:~# mdadm -E /dev/hda4
/dev/hda4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
  Creation Time : Sat Apr  9 23:46:23 2005
     Raid Level : raid5
    Device Size : 174851072 (166.75 GiB 179.05 GB)
     Array Size : 349702144 (333.50 GiB 358.09 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 3

    Update Time : Sun Jun  8 21:38:02 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 50a43565 - correct
         Events : 0.5864408

         Layout : left-asymmetric
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     1       3        4        1      active sync   /dev/hda4

   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4


eddie:~# mdadm -E /dev/hde5
/dev/hde5:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
  Creation Time : Sat Apr  9 23:46:23 2005
     Raid Level : raid5
    Device Size : 174851072 (166.75 GiB 179.05 GB)
     Array Size : 349702144 (333.50 GiB 358.09 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 3

    Update Time : Sun Jun  8 19:05:11 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 50a3f09d - correct
         Events : 0.5860156

         Layout : left-asymmetric
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     3      33        5       -1      spare   /dev/hde5

   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4



eddie:~# mdadm -E /dev/hdg4
/dev/hdg4:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
  Creation Time : Sat Apr  9 23:46:23 2005
     Raid Level : raid5
    Device Size : 174851072 (166.75 GiB 179.05 GB)
     Array Size : 349702144 (333.50 GiB 358.09 GB)
   Raid Devices : 3
  Total Devices : 2
Preferred Minor : 3

    Update Time : Sun Jun  8 21:38:12 2008
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 50a43598 - correct
         Events : 0.5864412

         Layout : left-asymmetric
     Chunk Size : 1024K

      Number   Major   Minor   RaidDevice State
this     2      34        4        2      active sync   /dev/hdg4

   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4

As you can see, only hde thinks it's a spare, when I want it to replace
that 'removed' in all cases. What can I do?

Debian etch, so kernel 2.6.18 and mdadm 2.5.6.

Cheers,
Gavin.




* Re: Rebuild after a drive replacement
From: David Greaves @ 2008-06-08 21:32 UTC
  To: Gavin Hamill; +Cc: linux-raid

Gavin Hamill wrote:
> Hi :)
> 
> I've had a drive die recently, and took the opportunity to upgrade my
> aging Debian sarge box to etch. Once the new OS was running and I had
> finished beating booting from /dev/md0 with a blunt instrument, I turned
> my attention to the data raid-sets... there's only one I can't sort out,
> and of course.. it's the biggest one :)
> 
> eddie:~# cat /proc/mdstat 
> Personalities : [raid1] [raid6] [raid5] [raid4] 
> md3 : active raid5 hda4[1] hdg4[2]
>       349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
> 
> eddie:~# mdadm --add /dev/md3 /dev/hde5
try:
  mdadm /dev/md3 --add /dev/hde5

David




* Re: Rebuild after a drive replacement
From: NeilBrown @ 2008-06-08 22:25 UTC
  To: Gavin Hamill; +Cc: linux-raid

On Mon, June 9, 2008 6:52 am, Gavin Hamill wrote:
> Hi :)
>
> I've had a drive die recently, and took the opportunity to upgrade my
> aging Debian sarge box to etch. Once the new OS was running and I had
> finished beating booting from /dev/md0 with a blunt instrument, I turned
> my attention to the data raid-sets... there's only one I can't sort out,
> and of course.. it's the biggest one :)
>
> eddie:~# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md3 : active raid5 hda4[1] hdg4[2]
>       349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
>
> eddie:~# mdadm --add /dev/md3 /dev/hde5
> mdadm: add new device failed for /dev/hde5 as 3: Invalid argument

Do you get any kernel messages when this fails?
   dmesg | tail

> As you can see, only hde thinks it's a spare, when I want it to replace
> that 'removed' in all cases. What can I do?

Given that the --add failed, this is normal.  The device is added to
the array as a spare, then recovery starts.  Once recovery finishes,
the device is changed to be a full member of the array.
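
Once the add succeeds you can watch recovery progress with e.g.:

   watch cat /proc/mdstat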

NeilBrown



* Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09  8:12 UTC
  To: linux-raid

On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:

> > eddie:~# mdadm --add /dev/md3 /dev/hde5
> > mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
> 
> Do you get any kernel messages when this fails?
>    dmesg | tail

Yes indeed...

md: hde5 has invalid sb, not importing!
md: md_import_device returned -22

I have tried zeroing the superblock on hde5 (and mdadm -E /dev/hde5
confirms that it's gone) and then adding the drive again, but it
doesn't change anything :(
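
For anyone following along, the standard way to do the zeroing is:

  mdadm --zero-superblock /dev/hde5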

> > As you can see, only hde thinks it's a spare, when I want it to replace
> > that 'removed' in all cases. What can I do?
> 
> Given that the --add failed, this is normal.  The device is added to
> the array as a spare, then recovery starts.  Once recovery finishes,
> the device is changed to be a full member of the array.

Well, whatever's going on, it isn't being configured correctly as a
spare, so no rebuild begins:

md3 : active raid5 hda4[1] hdg4[2]
      349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
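
For a per-slot view there is also:

  mdadm --detail /dev/md3

which lists each member's state and any spares.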

Again, any advice warmly received :) At least now each drive is from a
different vendor, so I'm spreading my bets against a double-failure...

Cheers,
Gavin.




* Re: Rebuild after a drive replacement
From: NeilBrown @ 2008-06-09  9:04 UTC
  To: Gavin Hamill; +Cc: linux-raid

On Mon, June 9, 2008 6:12 pm, Gavin Hamill wrote:
> On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:
>
>> > eddie:~# mdadm --add /dev/md3 /dev/hde5
>> > mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
>>
>> Do you get any kernel messages when this fails?
>>    dmesg | tail
>
> Yes indeed...
>
> md: hde5 has invalid sb, not importing!
> md: md_import_device returned -22

Thanks.

I suspect the partition is slightly too small.
Check the sizes in /proc/partitions.

hde5 needs to be at least 174851136K.
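
For example:

   grep hd /proc/partitions

(the #blocks column there is in 1K units).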

Newer versions of mdadm notice this and report a better
error message (I think 2.6.3 contains the fix).

So: you might need to repartition the drive and make hde5 a
little bigger.
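
After repartitioning, something like:

   blockdev --rereadpt /dev/hde

should get the kernel to re-read the partition table (or just reboot),
and:

   blockdev --getsize64 /dev/hde5

will confirm the new size in bytes.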

NeilBrown



* Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09  9:15 UTC
  To: linux-raid

On Mon, 2008-06-09 at 19:04 +1000, NeilBrown wrote:
> On Mon, June 9, 2008 6:12 pm, Gavin Hamill wrote:
> > On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:
> >

> I suspect the partition is slightly too small.
> Check the sizes in /proc/partitions.
> 
> hde5 needs to be at least 174851136K.

Well spotted, that man! hde5 is indeed 50000 blocks smaller than the
others. I guess the Debian installer must use different calculations
from cfdisk :(

> Newer versions of mdadm notice this and report a better
> error message (I think 2.6.3 contains the fix).
> 
> So: you might need to repartition the drive and make hde5 a
> little bigger.

Yup - thanks muchly... I'll give it a go and report back.

Cheers,
Gavin.




* [SOLVED] Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09 18:56 UTC
  To: linux-raid

On Mon, 2008-06-09 at 19:04 +1000, NeilBrown wrote:


> So: you might need to repartition the drive and make hde5 a
> little bigger.

Yup, that was precisely it. I rebooted to refresh the updated partition
table, ran mdadm --add /dev/md3 /dev/hde5, and it's happily rebuilding
as I write. Now I'll be a little better armed when the next drive fails
(smartctl is already showing the signs...)
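
(For the curious, a quick health verdict per drive comes from e.g.:

  smartctl -H /dev/hda

and likewise for the other drives.)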

Hurrah!

Cheers,
Gavin.



