linux-raid.vger.kernel.org archive mirror
* After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-03  6:49 UTC
  To: linux-raid

Kernel: 2.6.24 i386
mdadm: 2.6.4

Hi,

I had a RAID5 array in the configuration 250/250/400/400 (so only
250/250/250/250 was actually being used).

After a partition rearrangement it was possible to increase the size
of the two 250 partitions to 400. I did the following, once for each
250G partition:

  mdadm /dev/md0 --fail <partition>
  mdadm /dev/md0 --remove <partition>
  cfdisk <disk>                       (resize the partition)
  mdadm /dev/md0 --add <partition>

then waited some hours for each rebuild to complete.
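Rebuild progress is visible in /proc/mdstat while this runs, e.g.:

  watch -n 60 cat /proc/mdstat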

The new array was running fine. Here is its status:

champagne:/home/jules# mdadm --detail /dev/md0
/dev/md0:
         Version : 00.90.03
   Creation Time : Tue Jan 30 21:28:07 2007
      Raid Level : raid5
      Array Size : 726732096 (693.07 GiB 744.17 GB)
   Used Dev Size : 242244032 (231.02 GiB 248.06 GB)
    Raid Devices : 4
   Total Devices : 4
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Mon Jun  2 22:46:33 2008
           State : clean
  Active Devices : 4
Working Devices : 4
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 64K

            UUID : 52252ae8:5d1fd858:31a51f4c:5ff55ddd
          Events : 0.1638776

     Number   Major   Minor   RaidDevice State
        0       8        2        0      active sync   /dev/sda2
        1       8       18        1      active sync   /dev/sdb2
        2       8       50        2      active sync   /dev/sdd2
        3       8       34        3      active sync   /dev/sdc2

However, I was under the impression that --grow --size=max would grow
it up to the real limits of the partitions. This didn't work.
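(That is, with the array assembled:

  mdadm --grow /dev/md0 --size=max

which is supposed to extend the per-device size to the largest space
available on the smallest member.)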

Thinking something was cached in internal tables with incorrect
partition sizes, I rebooted the machine.

Bad idea :(

The RAID array failed to reconstruct. The boot messages said there
were only two working devices, which is not enough to start the array.

/dev/sdd2 and /dev/sdc2 (which were the partitions I didn't touch) were
both there. /dev/sda2 and /dev/sdb2 were not added.

I tried (mistake?) adding /dev/sda2 explicitly with --add but it added
as a spare, not as a proper member.

I tried assembling explicitly with mdadm --assemble /dev/md0 /dev/sda2
/dev/sdb2 /dev/sdc2 /dev/sdd2 and it complained of no RAID superblock
on /dev/sdb2.

Help? What next ;) Is there enough information in /dev/sdd2 and 
/dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2 
and /dev/sdb2? Do I need to try to resize my partitions back to their 
old size so it can find the old superblock? Even if by adding /dev/sda2 
as a spare I've corrupted its superblock entirely, sdb2 should still 
have enough to save my array with 3 out of 4 devices?


Many thanks,

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-03 21:19 UTC
  To: linux-raid

Jules Bean wrote:
> Help? What next ;) Is there enough information in /dev/sdd2 and 
> /dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2 
> and /dev/sdb2? Do I need to try to resize my partitions back to their 
> old size so it can find the old superblock? Even if by adding /dev/sda2 
> as a spare I've corrupted its superblock entirely, sdb2 should still 
> have enough to save my array with 3 out of 4 devices?

I have become convinced (correct me if you think I'm wrong) that the 
problem was cfdisk resizing the partitions but the kernel tables not 
being updated.

Therefore although I thought the partitions were 400G, the kernel still 
thought they were 250G, and presumably the raid subsystem used that figure.

So the raid subsystem probably recorded its superblock as if the
partitions were still only 250G long? So I ought to be able to find that
superblock again by resizing the partitions back?

Alas I didn't take precise notes of my old partition table (stupid 
error). I have tried a couple of cylinder counts near 250G but no luck. 
Is there any good way to 'search for' something which looks like a RAID
superblock?

Does the mdadm --detail output I pasted in my last message hold any clues?

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: NeilBrown @ 2008-06-03 21:27 UTC
  To: Jules Bean; +Cc: linux-raid

On Wed, June 4, 2008 7:19 am, Jules Bean wrote:
> Jules Bean wrote:
>> Help? What next ;) Is there enough information in /dev/sdd2 and
>> /dev/sdc2 to reconstruct the apparently missing superblocks on /dev/sda2
>> and /dev/sdb2? Do I need to try to resize my partitions back to their
>> old size so it can find the old superblock? Even if by adding /dev/sda2
>> as a spare I've corrupted its superblock entirely, sdb2 should still
>> have enough to save my array with 3 out of 4 devices?
>
> I have become convinced (correct me if you think I'm wrong) that the
> problem was cfdisk resizing the partitions but the kernel tables not
> being updated.

That sounds likely.... That really should get fixed one day!

>
> Therefore although I thought the partitions were 400G, the kernel still
> thought they were 250G, and presumably the raid subsystem used that
> figure.
>
> So the raid subsystem probably recorded its superblock as if the
> partitions were still only 250G long? So I ought to be able to find that
> superblock again by resizing the partitions back?
>
> Alas I didn't take precise notes of my old partition table (stupid
> error). I have tried a couple of cylinder counts near 250G but no luck.
> Is there any good way to 'search for' something which looks like a RAID
> superblock?
>
> Does the mdadm --detail output I pasted in my last message hold any clues?
>

Yes.  Based on the "used device size", the smallest device was between
242244096K and 242244160K.  (A 0.90 superblock occupies the last
64K-aligned 64K of the device, so the device size is the used size plus
somewhere between 64K and 128K.)

Hopefully both of the smaller devices were the same size.

NeilBrown



* Re: After partition resize, RAID5 array does not assemble on boot
From: Peter Rabbitson @ 2008-06-03 21:46 UTC
  To: Jules Bean; +Cc: linux-raid

Jules Bean wrote:

> Is there any good way to 'search for' somethign which looks like a RAID 
> superblock?
> 

The superblock itself starts with the "magic number" 0xA92B4EFC
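So a brute-force search is possible. A rough, untested sketch (one dd
per 64K block, so slow; 0.90 superblocks only ever sit at 64K-aligned
offsets, and on a little-endian box the magic reads back as a92b4efc):

  dev=/dev/sdb2      # the partition to search
  blocks=$(( $(blockdev --getsize64 $dev) / 65536 ))
  for (( i = 0; i < blocks; i++ )); do
      word=$(dd if=$dev bs=64k skip=$i count=1 2>/dev/null \
             | od -An -tx4 -N4 | tr -d ' ')
      [ "$word" = a92b4efc ] && echo "candidate superblock at $(( i * 64 )) KiB"
  done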

HTH

Peter


* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04  6:31 UTC
  To: NeilBrown; +Cc: linux-raid

NeilBrown wrote:
>> Does the mdadm --detail output I pasted in my last message hold any clues?
>>
> 
> Yes.  Based on the "used device size", the smallest device was between
> 242244096K and 242244160K
> 
> Hopefully both of the smaller devices were the same size.

They were (the same size) but I couldn't find the superblock. I tried 
several possible partition sizes around 242244032K (plus or minus a 
cylinder or two) and no luck.

In the end I gritted my teeth and, following the advice in

http://joshuahayes.blogspot.com/2006/11/expand-existing-raid-5-array.html

I forced mdadm --create to recreate the array using the correct
parameters:

  mdadm --create /dev/md0 --chunk=64 --level=5 --layout=left-symmetric \
        --raid-devices=4 --size=242244032 \
        /dev/sda2 /dev/sdb2 /dev/sdd2 /dev/sdc2

and after a reboot, everything came all the way up to multiuser (which 
is significant, because /usr was on an LVM on this RAID partition) and 
as far as I can see, everything is fine.

Phew!

As to where my superblock has gone, the only theory I have is that the 
MD layer knew that my partitions were 400G large while the kernel was 
convinced they were 250G large, so the md layer tried to write the 
superblock at (approx) +400G, and the kernel refused to do that.

However, since the used dev size was only 242244032, all my actual data 
was safe and just using mdadm to recreate the superblocks was all that 
was needed.
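Before trusting a recreated array it is worth a read-only check first,
something like the following (the LVM names are placeholders):

  mdadm --detail /dev/md0       # confirm geometry and device order
  vgchange -ay                  # activate the LVM sitting on top
  fsck -n /dev/<vg>/<lv>        # read-only check; makes no repairs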

The lesson here is always reboot after changing partition sizes, unless
you have a tool which reliably flushes the kernel partition table cache
(partprobe?).
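For instance (how well each of these copes with a busy disk varies):

  blockdev --rereadpt /dev/sda    # ask the kernel to reread the table
  partprobe /dev/sda              # parted's equivalent
  cat /proc/partitions            # verify the new sizes took effect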

Thanks for the help,

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: Peter Rabbitson @ 2008-06-04  6:36 UTC
  To: Jules Bean; +Cc: NeilBrown, linux-raid

Jules Bean wrote:
> The lesson here is always reboot after changing partition sizes, unless 
> you have a tool which reliably flushes the kernel partition table cache 
> (partprobe?)
> 

hdparm -z ?


* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-04  7:58 UTC
  To: Jules Bean, NeilBrown; +Cc: linux-raid

Jules Bean wrote:
> As to where my superblock has gone, the only theory I have is that the
> MD layer knew that my partitions were 400G large while the kernel was
> convinced they were 250G large, so the md layer tried to write the
> superblock at (approx) +400G, and the kernel refused to do that.

I failed to do a similar grow operation recently and had to re-create.

I was using a 0.9 sb, which is stored at the end of the disk.
I have no idea how this is supposed to work...

If I have sda1 at 250MB then the sb is at (250MB - d).
I'd like to stop the array, remove the partition, grow the partition to
400MB, and start the array.
This won't work because md won't find an sb at (400MB - d) and so won't
know that it's an md component.

However, with a 1.1 or 1.2 sb (stored at the start of the device) I
think it would work.

I tried using Michael Tokarev's mdsuper to pull the sb from the partition,
resize and then push it to the end of the new partition but that went wrong
somewhere.


I think the process should be:

1 stop array
2 mdadm --save-superblock=component.sb /dev/<component>
3 grow partition
4 mdadm --write-superblock=component.sb /dev/<component>
5 start array
6 grow array
7 grow fs

For sb 1.1 and 1.2, steps 2+4 should be no-ops.
In step 4 mdadm may want to call the reread pt ioctl (which is what
blockdev --rereadpt does).
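For a 0.90 sb, steps 2 and 4 could in principle be approximated with dd
(an untested sketch; <component> is whichever partition is being grown,
and the arithmetic is just the 0.90 rule that the sb occupies the last
64K-aligned 64K block of the device):

  dev=/dev/<component>
  dd if=$dev of=component.sb bs=64k count=1 \
     skip=$(( $(blockdev --getsize64 $dev) / 65536 - 1 ))
  # ... grow the partition, make the kernel reread the table ...
  dd if=component.sb of=$dev bs=64k count=1 \
     seek=$(( $(blockdev --getsize64 $dev) / 65536 - 1 ))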

This approach, it seems to me, would avoid any reconstruction and would be a
'safer' way to grow the components.

If this sounds reasonable then I'd happily have a go at implementing
--save-superblock/--write-superblock

David


* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04  8:30 UTC
  To: David Greaves; +Cc: NeilBrown, linux-raid

David Greaves wrote:
> Jules Bean wrote:
>> As to where my superblock has gone, the only theory I have is that the
>> MD layer knew that my partitions were 400G large while the kernel was
>> convinced they were 250G large, so the md layer tried to write the
>> superblock at (approx) +400G, and the kernel refused to do that.
> 
> I failed to do a similar grow operation recently and had to re-create.
> 
> I was using 0.9 sb which is stored at the end of the disk.
> I have no idea how this is supposed to work...
> 
> If I have sda1 at 250MB then the sb is at (250MB - d).
> I'd like to stop the array, remove the partition, grow the partition to
> 400MB, and start the array.
> This won't work because md won't find an sb at (400MB - d) and so won't
> know that it's an md component.

That doesn't matter.

Just add it as a fresh component. The old SB is irrelevant.

1. Fail component
2. remove component
3. resize partition
4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is 
safest.
5. add component as new
6. watch as md layer rebuilds

If I hadn't screwed up step 4, I would have been fine. I have now done 
step 4 correctly and grown my array to used dev size 400 (up from 250).

Of course this does assume your RAID level has the redundancy required 
to remove a component (i.e. not RAID0).

 > in step 4 mdadm may want to call the reread pt ioctl (which is what
 > blockdev --rereadpt does)

Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk' 
does try to call this ioctl but nonetheless it doesn't work if some 
other partition on that disk is busy (e.g. involved in some other md 
device, or mounted elsewhere). I saw messages to this effect whilst 
experimenting with different partition sizes looking for my missing 
superblock.


> This approach, it seems to me, would avoid any reconstruction and would be a
> 'safer' way to grow the components.

It would avoid reconstruction which is good for the impatient.

I don't really see that it's "safer" though. I would have thought it was 
quicker but potentially less safe.

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-04 11:51 UTC
  To: Jules Bean; +Cc: NeilBrown, linux-raid

Jules Bean wrote:
> Just add it as a fresh component. The old SB is irrelevant.
> 
> 1. Fail component
> 2. remove component
> 3. resize partition
> 4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is
> safest.
> 5. add component as new


> 6. watch as md layer rebuilds
And that's the bit that shouldn't be needed :)

Otherwise you're 'just' replacing devices one at a time which isn't very
interesting.

> Of course this does assume your RAID level has the redundancy required
> to remove a component (i.e. not RAID0).
Neil, would the sb-move approach support RAID0?

>> in step 4 mdadm may want to call the reread pt ioctl (which is what
>> blockdev --rereadpt does)
> 
> Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk'
> does try to call this ioctl but nonetheless it doesn't work if some
> other partition on that disk is busy (e.g. involved in some other md
> device, or mounted elsewhere). I saw messages to this effect whilst
> experimenting with different partition sizes looking for my missing
> superblock.
I read (http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/4319.html)
that there are some ioctls that work even when some disk partitions are
still in use (provided there is no impact). However, I couldn't see them
in the block-layer ioctls or in any code in fs/partitions/check.c.

>> This approach, it seems to me, would avoid any reconstruction and
>> would be a
>> 'safer' way to grow the components.
> 
> It would avoid reconstruction which is good for the impatient.
> 
> I don't really see that it's "safer" though. I would have thought it was
> quicker but potentially less safe.

Avoiding a lot of time stress testing the disks in degraded mode isn't 'safer'?

David



* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-04 13:14 UTC
  To: David Greaves; +Cc: NeilBrown, linux-raid

David Greaves wrote:
>> I don't really see that it's "safer" though. I would have thought it was
>> quicker but potentially less safe.
> 
> Avoiding a lot of time stress testing the disks in degraded mode isn't 'safer'?

Stress testing the disks by an md rebuild is a feature! It increases 
confidence that they work.

;)

Seriously, I understand your point now. Yes, a rebuild-free partition 
resize would be a nice feature. So would a "help, please find my 
superblock by exhaustive scanning" utility ;)

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: Bill Davidsen @ 2008-06-06 13:52 UTC
  To: Jules Bean; +Cc: David Greaves, NeilBrown, linux-raid

Jules Bean wrote:
> David Greaves wrote:
>>> I don't really see that it's "safer" though. I would have thought it 
>>> was
>>> quicker but potentially less safe.
>>
>> Avoiding a lot of time stress testing the disks in degraded mode 
>> isn't 'safer'?
>
> Stress testing the disks by an md rebuild is a feature! It increases 
> confidence that they work.
>
> ;)
>
> Seriously, I understand your point now. Yes, a rebuild-free partition 
> resize would be a nice feature. So would a "help, please find my 
> superblock by exhaustive scanning" utility ;)

Since this code must work when a partition is added on a totally new 
drive, and when the partition is grown DOWN from the low end, clearly 
the default must be a rebuild. And running "repair" before doing this 
stuff is a really good idea!

What is needed is to do something like assume-clean on the old data and 
a sync on the new chunks. I don't see that there is a remotely safe way 
to do that, currently, although if you were willing to be unsafe you 
could remove a partition, grow it at the "top" end, and reassemble with 
--assume-clean. Sprinkling with holy water first might be a good thing. 
I'm just thinking out loud here, there are probably good reasons why 
this wouldn't work.
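Spelled out, that unsafe experiment would look roughly like this (a
sketch only, reusing the create parameters from earlier in the thread;
parity in the newly added region stays unsynchronised until a repair):

  mdadm --stop /dev/md0
  # grow each member partition at the "top" end and make the kernel
  # reread the partition table, then recreate without a resync:
  mdadm --create /dev/md0 --assume-clean --chunk=64 --level=5 \
        --layout=left-symmetric --raid-devices=4 \
        /dev/sda2 /dev/sdb2 /dev/sdd2 /dev/sdc2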

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck




* Re: After partition resize, RAID5 array does not assemble on boot
From: David Greaves @ 2008-06-06 14:42 UTC
  To: Bill Davidsen; +Cc: Jules Bean, NeilBrown, linux-raid

Bill Davidsen wrote:
> What is needed is to do something like assume-clean on the old data and
> a sync on the new chunks. I don't see that there is a remotely safe way
> to do that,

I assumed that --grow --size=max on a v1.2 superblock would do exactly that...
(well, start a resync at the old partition-end location anyway).

David


* Re: After partition resize, RAID5 array does not assemble on boot
From: Jules Bean @ 2008-06-06 14:46 UTC
  To: David Greaves; +Cc: Bill Davidsen, NeilBrown, linux-raid

David Greaves wrote:
> Bill Davidsen wrote:
>> What is needed is to do something like assume-clean on the old data and
>> a sync on the new chunks. I don't see that there is a remotely safe way
>> to do that,
> 
> I assumed that --grow --size=max on a v1.2 superblock would do exactly that...
> (well, start a resync at the old partition-end location anyway).

I can confirm that when I did this on my array, it only resynced from
the old end. It "started" at 70% complete or so.

Jules


* Re: After partition resize, RAID5 array does not assemble on boot
From: Neil Brown @ 2008-06-12  3:59 UTC
  To: David Greaves; +Cc: Jules Bean, linux-raid

On Wednesday June 4, david@dgreaves.com wrote:
> Jules Bean wrote:
> > Just add it as a fresh component. The old SB is irrelevant.
> > 
> > 1. Fail component
> > 2. remove component
> > 3. resize partition
> > 4. FORCE KERNEL TO NOTICE NEW SIZE (that's what I got wrong!). Reboot is
> > safest.
> > 5. add component as new
> 
> 
> > 6. watch as md layer rebuilds
> And that's the bit that shouldn't be needed :)
> 
> Otherwise you're 'just' replacing devices one at a time which isn't very
> interesting.
> 
> > Of course this does assume your RAID level has the redundancy required
> > to remove a component (i.e. not RAID0).
> Neil, would the sb-move approach support RAID0?

Maybe.  If the old partitions were all exactly the same size, and the
new are too.
If you start with different sized partitions, md/raid0 uses all the
space by having some stripes across fewer devices.  So changing the
sizes of the partitions will confuse things.

> 
> >> in step 4 mdadm may want to call the reread pt ioctl (which is what
> >> blockdev --rereadpt does)
> > 
> > Seems to me that whilst cfdisk makes no visible attempt, plain 'fdisk'
> > does try to call this ioctl but nonetheless it doesn't work if some
> > other partition on that disk is busy (e.g. involved in some other md
> > device, or mounted elsewhere). I saw messages to this effect whilst
> > experimenting with different partition sizes looking for my missing
> > superblock.
> I read (http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-10/4319.html)
> that there are some ioctls that work even when some disk partitions are
> still in use (provided there is no impact).
> However, I couldn't see them in the block-layer ioctls or in any code in
> fs/partitions/check.c.

They're in linux/block/ioctl.c:
  BLKPG_DEL_PARTITION
  BLKPG_ADD_PARTITION

These only work if the partition being changed isn't in use, though.
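For illustration, util-linux's delpart/addpart drive exactly these
ioctls, so a single partition can be re-registered without touching the
rest of the disk (a sketch; the start/length values, in 512-byte
sectors, are made up):

  delpart /dev/sda 2                       # BLKPG_DEL_PARTITION for sda2
  addpart /dev/sda 2 488397168 781422768   # BLKPG_ADD_PARTITION: start, length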

NeilBrown

