Assume-clean for md grow

From: Chris Webb @ 2009-02-25 14:22 UTC
  To: linux-raid

I use md arrays made up of slots synthesized through device mapper from
physical storage distributed across a cluster of machines. When these slots
are created, they are guaranteed to be zero-initialised, so I can safely do
mdadm --create --assume-clean to avoid an initial resync.

When I grow the backing slots, the new storage space is also
zero-initialised, so I'd like to be able to do the equivalent of
mdadm --grow --size=max --assume-clean. However, --assume-clean isn't
supported for grow operations.

Is there some way I can tell the kernel driver (perhaps through sysfs) that
the resync is clean/already complete to avoid unnecessary heavy IO on every
grow? I tried things like

  echo idle >/sys/block/mdX/md/sync_action

without success.

Cheers,

Chris.

* Re: Assume-clean for md grow
From: NeilBrown @ 2009-02-25 19:34 UTC
  To: Chris Webb; +Cc: linux-raid

On Thu, February 26, 2009 1:22 am, Chris Webb wrote:
> I use md arrays made up of slots synthesized through device mapper from
> physical storage distributed across a cluster of machines. When these slots
> are created, they are guaranteed to be zero-initialised, so I can safely do
> mdadm --create --assume-clean to avoid an initial resync.
>
> When I grow the backing slots, the new storage space is also
> zero-initialised, so I'd like to be able to do the equivalent of
> mdadm --grow --size=max --assume-clean. However, --assume-clean isn't
> supported for grow operations.
>
> Is there some way I can tell the kernel driver (perhaps through sysfs) that
> the resync is clean/already complete to avoid unnecessary heavy IO on every
> grow? I tried things like
>
>   echo idle >/sys/block/mdX/md/sync_action
>
> without success.

No, there isn't any way to get --grow to --assume-clean.
And I cannot immediately think of a neat way to implement it.
I'll have a think about it and let you know if I come up with anything.

If you want to hack your own kernel, you could just remove the
		mddev->recovery_cp = mddev->dev_sectors;
from raid1_resize (assuming it is raid1 arrays that you are growing).
Then the resync-after-grow would never happen.

Or if you want to be slightly more subtle, remove the
	if (mddev->pers)
		return -EBUSY;
from resync_start_store.  Then before a grow that you want to be
--assume-clean, write into /sys/block/mdXX/md/resync_start the
number of sectors in the final raid1 array.

That last might be the way I end up doing it, but I'm not sure yet.

NeilBrown


* Re: Assume-clean for md grow
From: Chris Webb @ 2009-02-26 14:54 UTC
  To: NeilBrown; +Cc: linux-raid

NeilBrown <neilb@suse.de> writes:

> Or if you want to be slightly more subtle, remove the
> 	if (mddev->pers)
> 		return -EBUSY;
> from resync_start_store.  Then before a grow that you want to be
> --assume-clean, write into /sys/block/mdXX/md/resync_start the
> number of sectors in the final raid1 array.

Hi Neil. Thanks for this suggestion: it looks fine for what we're looking to
achieve. Interestingly, this will mean writing a smaller value than that shown
by a read from this file for an in-sync array, but will it still work?

I'll build a patched kernel and give it a go a little later on.

Best wishes,

Chris.

* Re: Assume-clean for md grow
From: Chris Webb @ 2009-02-26 15:57 UTC
  To: NeilBrown; +Cc: linux-raid

Chris Webb <chris@arachsys.com> writes:

> NeilBrown <neilb@suse.de> writes:
> 
> > Or if you want to be slightly more subtle, remove the
> > 	if (mddev->pers)
> > 		return -EBUSY;
> > from resync_start_store.  Then before a grow that you want to be
> > --assume-clean, write into /sys/block/mdXX/md/resync_start the
> > number of sectors in the final raid1 array.
> 
> Hi Neil. Thanks for this suggestion: it looks fine for what we're looking to
> achieve. Interestingly, this will mean writing a smaller value than that shown
> by a read from this file for an in-sync array, but will it still work?

For instance, 

  3# cat /sys/block/md127/{size,md/resync_start}
  2097136
  18446744073709551615

I just tried growing the slots up from 1G to 3G, then

  echo $((2097136 + 2*1024*1024*2)) >/sys/block/md127/md/resync_start
  mdadm --grow --size=3145720 /dev/md127

but this gives me

  md127 : active raid1 dm-3[1] dm-2[0]
        3145720 blocks super 1.1 [2/2] [UU]
          resync=PENDING

in /proc/mdstat, which is presumably not right?

Does this sysfs file hold the size of the array or the size of the
components, for implementing the non-RAID1 case?
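
The figures in this message can be cross-checked with a little shell arithmetic. This is only a sketch: it assumes that /sys/block/md127/size is reported in 512-byte sectors and that the block count in /proc/mdstat is in units of 1 KiB, and the variable names are purely illustrative.

```shell
# Values taken from the message above.
old_sectors=2097136     # /sys/block/md127/size before the grow
added_gib=2             # slots grown from 1G to 3G

# 1 GiB = 1024 * 1024 KiB, and each KiB is two 512-byte sectors.
added_sectors=$((added_gib * 1024 * 1024 * 2))
new_sectors=$((old_sectors + added_sectors))

# Convert back to 1 KiB blocks for comparison with /proc/mdstat.
new_kib=$((new_sectors / 2))

echo "new size: $new_sectors sectors = $new_kib KiB"
```

Under those assumptions the value written to resync_start (6291440) is the size of the grown array in sectors, and half of it matches the 3145720 blocks shown in /proc/mdstat.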

Also, if I don't know the size of the final array[1], is it safe to write a
value much larger than the size of the array in here, or will that cause
future grows to be clean when this isn't necessarily intended?

[1] One of the things which has been most awkward with using md as part of an
automated storage system has been going from component size to available
array size and back again, given that bitmap reservation depends on the
original size of the array, not the current size of the array. (In the end,
we've cheated and always written everything in terms of change in component
size vs change in array size, and been generous with the amount of space we
allocate to components on initial device create. It does feel like I'm
coding far too much knowledge of the internal choices mdadm makes into my
management layer, though!)

Cheers,

Chris.

* Re: Assume-clean for md grow
From: Chris Webb @ 2009-02-26 16:03 UTC
  To: NeilBrown; +Cc: linux-raid

Chris Webb <chris@arachsys.com> writes:

> I just tried growing the slots up from 1G to 3G, then
> 
>   echo $((2097136 + 2*1024*1024*2)) >/sys/block/md127/md/resync_start
>   mdadm --grow --size=3145720 /dev/md127
> 
> but this gives me
> 
>   md127 : active raid1 dm-3[1] dm-2[0]
>         3145720 blocks super 1.1 [2/2] [UU]
>           resync=PENDING
> 
> in /proc/mdstat, which is presumably not right?

Rewriting the original value of 18446744073709551615 back to
/sys/block/md127/md/resync_start removes this, although it looks like any
value lower than this gives a PENDING state. Presumably this means that:-

  echo 18446744073709551614 >/sys/block/md127/md/resync_start
  mdadm --grow --size=... /dev/md127
  echo 18446744073709551615 >/sys/block/md127/md/resync_start

is a generic recipe for growing an array 'assume-clean' without needing to know
anything more about the eventual size of the array other than the parameter
passed to --size= (max or component size).
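
The three-step recipe above could be wrapped in a small shell function along the following lines. This is a sketch under several assumptions, not a tested procedure: it presumes a kernel patched as Neil suggested, so that resync_start can be written while the array is active; that the newly added space really is already consistent; and that the recipe behaves as presumed above, which nobody in this thread has confirmed. The function name and the SYSFS_BLOCK and MDADM override variables are hypothetical and exist only to allow a dry run against a scratch directory.

```shell
# Value read back from resync_start on an in-sync array, and the value one
# below it. Both are handled as strings, never in $(( )) arithmetic, since
# they may not fit in the signed integers a shell uses for arithmetic.
RESYNC_CLEAN=18446744073709551615
RESYNC_BELOW=18446744073709551614

grow_assume_clean() {
    dev=$1      # array device, e.g. /dev/md127
    size=$2     # passed to --size= (max or a component size)
    attr="${SYSFS_BLOCK:-/sys/block}/${dev##*/}/md/resync_start"

    # Remember the current value so it can be put back if the grow fails.
    saved=$(cat "$attr") || return 1

    echo "$RESYNC_BELOW" >"$attr" || return 1
    if ! "${MDADM:-mdadm}" --grow --size="$size" "$dev"; then
        echo "$saved" >"$attr"
        return 1
    fi
    echo "$RESYNC_CLEAN" >"$attr"
}
```

It would be called as, for example, grow_assume_clean /dev/md127 max. On an unpatched kernel the first write is expected to fail with EBUSY, in which case the function returns without attempting the grow.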
