* Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-08 20:52 UTC
To: linux-raid
Hi :)
I've had a drive die recently, and took the opportunity to upgrade my
aging Debian sarge box to etch. Once the new OS was running and I had
finished beating booting from /dev/md0 with a blunt instrument, I turned
my attention to the data raid-sets... there's only one I can't sort out,
and of course.. it's the biggest one :)
eddie:~# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md3 : active raid5 hda4[1] hdg4[2]
349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
eddie:~# mdadm --add /dev/md3 /dev/hde5
mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
hda and hdg are 200G drives. hde is the new one at 250G. I have
configured the partition sizes + types (0xFD) identically, but the
partition number is different on hde because I wanted to put extra stuff
in the 'spare' 50G.. so the set should be using hda4, hdg4, hde5.
eddie:~# mdadm -E /dev/hda4
/dev/hda4:
Magic : a92b4efc
Version : 00.90.00
UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
Creation Time : Sat Apr 9 23:46:23 2005
Raid Level : raid5
Device Size : 174851072 (166.75 GiB 179.05 GB)
Array Size : 349702144 (333.50 GiB 358.09 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 3
Update Time : Sun Jun 8 21:38:02 2008
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 50a43565 - correct
Events : 0.5864408
Layout : left-asymmetric
Chunk Size : 1024K
      Number   Major   Minor   RaidDevice State
this     1       3        4        1      active sync   /dev/hda4
   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4
eddie:~# mdadm -E /dev/hde5
/dev/hde5:
Magic : a92b4efc
Version : 00.90.00
UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
Creation Time : Sat Apr 9 23:46:23 2005
Raid Level : raid5
Device Size : 174851072 (166.75 GiB 179.05 GB)
Array Size : 349702144 (333.50 GiB 358.09 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 3
Update Time : Sun Jun 8 19:05:11 2008
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 50a3f09d - correct
Events : 0.5860156
Layout : left-asymmetric
Chunk Size : 1024K
      Number   Major   Minor   RaidDevice State
this     3      33        5       -1      spare   /dev/hde5
   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4
eddie:~# mdadm -E /dev/hdg4
/dev/hdg4:
Magic : a92b4efc
Version : 00.90.00
UUID : 788aae08:cdb33dda:d82c141f:f33b4b89
Creation Time : Sat Apr 9 23:46:23 2005
Raid Level : raid5
Device Size : 174851072 (166.75 GiB 179.05 GB)
Array Size : 349702144 (333.50 GiB 358.09 GB)
Raid Devices : 3
Total Devices : 2
Preferred Minor : 3
Update Time : Sun Jun 8 21:38:12 2008
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 50a43598 - correct
Events : 0.5864412
Layout : left-asymmetric
Chunk Size : 1024K
      Number   Major   Minor   RaidDevice State
this     2      34        4        2      active sync   /dev/hdg4
   0     0       0        0        0      removed
   1     1       3        4        1      active sync   /dev/hda4
   2     2      34        4        2      active sync   /dev/hdg4
As you can see, only hde thinks it's a spare, when I want it to replace
that 'removed' in all cases. What can I do?
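For readers comparing the three `mdadm -E` dumps above by hand, a small Python sketch can pull out the fields that differ. It only summarizes what each superblock reports and does not explain why the --add fails. The function names are made up for illustration, and treating the Events value as a two-part "hi.lo" counter is an assumption about how this mdadm version prints 0.90 superblocks.

```python
def parse_examine(text: str) -> dict:
    """Turn 'Key : Value' lines from `mdadm -E` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # skip lines without a 'Key : Value' shape, e.g. '/dev/hda4:'
            fields[key.strip()] = value.strip()
    return fields


def events_key(value: str) -> tuple:
    """Assumption: Events is printed as 'hi.lo'; compare the parts as integers."""
    return tuple(int(part) for part in value.split("."))


def newest_first(dumps: dict) -> list:
    """Order device names by their superblock event count, newest first."""
    return sorted(
        dumps,
        key=lambda dev: events_key(parse_examine(dumps[dev])["Events"]),
        reverse=True,
    )


if __name__ == "__main__":
    # Excerpts copied from the three dumps in this message.
    dumps = {
        "hda4": "Update Time : Sun Jun 8 21:38:02 2008\nEvents : 0.5864408",
        "hde5": "Update Time : Sun Jun 8 19:05:11 2008\nEvents : 0.5860156",
        "hdg4": "Update Time : Sun Jun 8 21:38:12 2008\nEvents : 0.5864412",
    }
    for dev in newest_first(dumps):
        info = parse_examine(dumps[dev])
        print(dev, info["Events"], info["Update Time"])
```

The superblock on hde5 sorts last: it was last written at 19:05, well before the two active members.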
Debian etch, so kernel 2.6.18 and mdadm 2.5.6.
Cheers,
Gavin.
* Re: Rebuild after a drive replacement
From: David Greaves @ 2008-06-08 21:32 UTC
To: Gavin Hamill; +Cc: linux-raid
Gavin Hamill wrote:
> Hi :)
>
> I've had a drive die recently, and took the opportunity to upgrade my
> aging Debian sarge box to etch. Once the new OS was running and I had
> finished beating booting from /dev/md0 with a blunt instrument, I turned
> my attention to the data raid-sets... there's only one I can't sort out,
> and of course.. it's the biggest one :)
>
> eddie:~# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md3 : active raid5 hda4[1] hdg4[2]
> 349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
>
> eddie:~# mdadm --add /dev/md3 /dev/hde5
try:
mdadm /dev/md3 --add /dev/hde5
David
* Re: Rebuild after a drive replacement
From: NeilBrown @ 2008-06-08 22:25 UTC
To: Gavin Hamill; +Cc: linux-raid
On Mon, June 9, 2008 6:52 am, Gavin Hamill wrote:
> Hi :)
>
> I've had a drive die recently, and took the opportunity to upgrade my
> aging Debian sarge box to etch. Once the new OS was running and I had
> finished beating booting from /dev/md0 with a blunt instrument, I turned
> my attention to the data raid-sets... there's only one I can't sort out,
> and of course.. it's the biggest one :)
>
> eddie:~# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md3 : active raid5 hda4[1] hdg4[2]
> 349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
>
> eddie:~# mdadm --add /dev/md3 /dev/hde5
> mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
Do you get any kernel messages when this fails?
dmesg | tail
> As you can see, only hde thinks it's a spare, when I want it to replace
> that 'removed' in all cases. What can I do?
Given that the --add failed, this is normal. The device is added to
the array as a spare, then recovery starts. Once recovery has finished,
the device is changed to be a full member of the array.
NeilBrown
* Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09 8:12 UTC
To: linux-raid
On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:
> > eddie:~# mdadm --add /dev/md3 /dev/hde5
> > mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
>
> Do you get any kernel messages when this fails?
> dmesg | tail
Yes indeed...
md: hde5 has invalid sb, not importing!
md: md_import_device returned -22
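The -22 in that kernel message is a negated errno value, and errno 22 is EINVAL, the same "Invalid argument" that mdadm printed. A minimal Python sketch to confirm the mapping (the helper name `describe_kernel_errno` is made up for illustration):

```python
import errno
import os


def describe_kernel_errno(ret: int) -> str:
    """Translate a kernel-style return code such as -22 into its errno name.

    Hypothetical helper: kernel code conventionally returns -errno on failure.
    """
    code = -ret if ret < 0 else ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # "md: md_import_device returned -22" from the dmesg output above
    print(describe_kernel_errno(-22))
```

So the two messages describe the same failure; the kernel line just adds that it was the superblock check that rejected the device.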
I have tried zeroing the superblock on hde5 (and mdadm -E /dev/hde5
confirms that it's gone) and then adding the drive again, but it doesn't
change anything :(
> > As you can see, only hde thinks it's a spare, when I want it to replace
> > that 'removed' in all cases. What can I do?
>
> Given that the --add failed, this is normal. The device is added to
> the array as a spare, then recovery starts. Once recovery has finished,
> the device is changed to be a full member of the array.
Well, whatever's going on, it isn't being configured correctly as a
spare, so no rebuild begins:
md3 : active raid5 hda4[1] hdg4[2]
349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]
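The `[3/2] [_UU]` part of that status line is what shows the array is still degraded: three slots, two in use, and slot 0 empty. A rough Python sketch of reading those two fields follows; the function name and the returned keys are made up for illustration.

```python
import re


def parse_mdstat_status(status_line: str) -> dict:
    """Pull the [n/m] and [_UU] fields out of an mdstat status line."""
    counts = re.search(r"\[(\d+)/(\d+)\]", status_line)
    flags = re.search(r"\[([U_]+)\]", status_line)
    if counts is None or flags is None:
        raise ValueError("no [n/m] [U_] fields found in line")
    raid_devices = int(counts.group(1))
    working = int(counts.group(2))
    return {
        "raid_devices": raid_devices,
        "working": working,
        # '_' marks a slot with no working device, 'U' a slot that is up
        "missing_slots": [i for i, c in enumerate(flags.group(1)) if c == "_"],
        "degraded": working < raid_devices,
    }


if __name__ == "__main__":
    line = "349702144 blocks level 5, 1024k chunk, algorithm 0 [3/2] [_UU]"
    print(parse_mdstat_status(line))
```

The empty slot 0 matches the "removed" entry for RaidDevice 0 in the mdadm -E output earlier in the thread.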
Again, any advice warmly received :) At least now each drive is from a
different vendor, so I'm spreading my bets against a double-failure...
Cheers,
Gavin.
* Re: Rebuild after a drive replacement
From: NeilBrown @ 2008-06-09 9:04 UTC
To: Gavin Hamill; +Cc: linux-raid
On Mon, June 9, 2008 6:12 pm, Gavin Hamill wrote:
> On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:
>
>> > eddie:~# mdadm --add /dev/md3 /dev/hde5
>> > mdadm: add new device failed for /dev/hde5 as 3: Invalid argument
>>
>> Do you get any kernel messages when this fails?
>> dmesg | tail
>
> Yes indeed...
>
> md: hde5 has invalid sb, not importing!
> md: md_import_device returned -22
Thanks.
I suspect the partition is slightly too small.
Check the sizes in /proc/partitions.
hde5 needs to be at least 174851136K.
Newer versions of mdadm notice this and report a better
error message (I think 2.6.3 contains the fix).
So: you might need to repartition the drive and make hde5 a
little bigger.
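The 174851136K figure can be reproduced from the Device Size reported by mdadm -E (174851072K) plus the space the version 0.90 superblock needs. The sketch below assumes the usual 0.90 layout: a 64K superblock placed at the last 64K-aligned offset that still leaves 64K before the end of the partition. The function names are made up for illustration.

```python
MD_RESERVED_KIB = 64  # assumed size and alignment of the 0.90 superblock area


def usable_kib_0_90(partition_kib: int) -> int:
    """Data space left on a partition once the 0.90 superblock is placed."""
    aligned_down = partition_kib & ~(MD_RESERVED_KIB - 1)
    return aligned_down - MD_RESERVED_KIB


def min_partition_kib_0_90(device_size_kib: int) -> int:
    """Smallest partition whose usable space covers device_size_kib."""
    # Round the per-device size up to a 64K boundary, then add the reserved block.
    aligned_up = -(-device_size_kib // MD_RESERVED_KIB) * MD_RESERVED_KIB
    return aligned_up + MD_RESERVED_KIB


if __name__ == "__main__":
    device_size = 174851072  # "Device Size" from mdadm -E, in K
    needed = min_partition_kib_0_90(device_size)
    print("minimum partition size:", needed, "K")
    print("usable at that size:", usable_kib_0_90(needed), "K")
    print("usable at 1K less:", usable_kib_0_90(needed - 1), "K")
```

A partition even 1K short of that minimum loses a whole 64K block of usable space, so it cannot hold the array's per-device size.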
NeilBrown
* Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09 9:15 UTC
To: linux-raid
On Mon, 2008-06-09 at 19:04 +1000, NeilBrown wrote:
> On Mon, June 9, 2008 6:12 pm, Gavin Hamill wrote:
> > On Mon, 2008-06-09 at 08:25 +1000, NeilBrown wrote:
> >
> I suspect the partition is slightly too small.
> Check the sizes in /proc/partitions.
>
> hde5 needs to be at least 174851136K.
Well spotted, that man! hde5 is indeed 50000 blocks smaller than the
others. I guess the Debian installer must use different calculations
than cfdisk :(
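A rough Python sketch of the same check against /proc/partitions, whose #blocks column is in 1K units. The partition sizes in the sample text are made-up placeholders, not the real values from this machine; only the 174851136K requirement comes from the thread. The function names are also made up for illustration.

```python
def parse_proc_partitions(text: str) -> dict:
    """Map partition name -> size in 1K blocks from /proc/partitions text."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: major minor #blocks name (the header row is skipped)
        if len(parts) == 4 and parts[0].isdigit():
            sizes[parts[3]] = int(parts[2])
    return sizes


def too_small(sizes: dict, name: str, required_kib: int) -> bool:
    """True if the named partition is smaller than the required size."""
    return sizes[name] < required_kib


if __name__ == "__main__":
    # Hypothetical sample; on a real system use open("/proc/partitions").read()
    sample = """major minor  #blocks  name

   3     4  174851176 hda4
  33     5  174801176 hde5
  34     4  174851176 hdg4
"""
    required = 174851136  # minimum size quoted above, in K
    sizes = parse_proc_partitions(sample)
    for name in sorted(sizes):
        verdict = "too small" if too_small(sizes, name, required) else "ok"
        print(name, sizes[name], verdict)
```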
> Newer versions of mdadm notice this and report a better
> error message (I think 2.6.3 contains the fix).
>
> So: you might need to repartition the drive and make hde5 a
> little bigger.
Yup - thanks muchly.. I'll give it a go and report back.
Cheers,
Gavin.
* [SOLVED] Re: Rebuild after a drive replacement
From: Gavin Hamill @ 2008-06-09 18:56 UTC
To: linux-raid
On Mon, 2008-06-09 at 19:04 +1000, NeilBrown wrote:
> So: you might need to repartition the drive and make hde5 a
> little bigger.
Yup, that was precisely it. Rebooted to refresh the updated partition
table, ran mdadm --add /dev/md3 /dev/hde5 and it's happily rebuilding
as I write. Now I'll be a little better armed when the next drive fails
(smartctl is already showing the signs...)
Hurrah!
Cheers,
Gavin.