From: keld@keldix.com
To: David Brown <david@westcontrol.com>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: [md PATCH 15/23] md/raid10 - support resizing some RAID10 arrays.
Date: Wed, 14 Mar 2012 13:37:46 +0100
Message-ID: <20120314123746.GA11117@www5.open-std.org>
In-Reply-To: <4F6070EF.5060604@westcontrol.com>
On Wed, Mar 14, 2012 at 11:20:31AM +0100, David Brown wrote:
> On 14/03/2012 09:32, NeilBrown wrote:
> >On Wed, 14 Mar 2012 08:51:44 +0100 David Brown<david@westcontrol.com>
> >wrote:
> >
> >>On 14/03/2012 07:27, NeilBrown wrote:
> >>>On Wed, 14 Mar 2012 07:17:46 +0100 keld@keldix.com wrote:
> >>>
> >>>>Hi Neil
> >>>>
> >>>>What is the problem with adding space to the 'far' layout?
> >>>>
> >>>>I would think you could just create the new array part 1 from the
> >>>>old array part 2, and then sync the new array part 2 with the new
> >>>>array part 1. (in the case of a far=2 array, for n>2 similar
> >>>>constructs would apply).
> >>>
> >>>If I understand your proposal correctly, you would lose redundancy
> >>>during the process, which is not acceptable.
> >>>
> >>
> >>That's how I understood the suggestion too. And in some cases, that
> >>might be a good choice for the user - if they have good backups, they
> >>might be happy to risk such a re-shape. Of course, they would have to
> >>use the "--yes-I-really-understand-the-risks" flag to mdadm, but other
> >>than that it should be pretty simple to implement.
> >
> >Patches welcome :-)
> >
> >(well, actually not - I really don't like the idea. But my point is that
> >these things turn out to be somewhat more complicated than they appear at
> >first).
> >
>
> I haven't written any code for md raid, but I've looked at enough to
> know that you have to tread carefully - especially as people expect a
> particularly high level of code correctness in this area. "Pretty
> simple to implement" is a relative term!
>
> I can imagine use cases where it would be better to have an unsafe
> resize than no resize - and maybe also cases where a fast unsafe resize
> is better than a slow safe resize. But I can also imagine people
> getting upset when they find they have used the wrong one, and I can
> also see that implementing one "fast but unsafe" feature could easily be
> the start of a slippery slope.
>
> >>
> >>For a safe re-shape of raid10, you would need to move the "far" copy
> >>backwards to the right spot on the growing disk (or forwards if you are
> >>shrinking the array). It could certainly be done safely, and would be
> >>very useful for users, but it is not quite as simple as an unsafe re-size.
> >
> >Reshaping a raid10-far to use a different amount of the device would
> >certainly be possible, but is far from trivial.
> >One interesting question is how to record all the intermediate states in
> >the
> >metadata.
> >
>
> I had only been thinking of the data itself, not the metadata.
>
> When doing the reshape, you would start off with some free space at the
> end of the device (assuming you are growing the raid). You copy a block
> of data from near the end of the far copy to its new place in the free
> space. Then you can update the metadata to track that change. While
> you are doing the metadata update, both the original part of the far
> copy, and the new part are valid, so you should be safe if you get a
> crash during the update. Once the metadata has been updated, you've got
> some new free space ready to move the next block. I don't /think/ you'd
> need to track much new metadata - just a "progress so far" record.
>
> Of course, any changes made to the data and filesystems while this is in
> progress might cause more complications...
>
>
>
> One particular situation that might be easier as a special case, and
> would be common in practice, would be when growing a raid10,far to
> devices that are at least twice the size. If you pretend that the
> existing raid10,f device sits on top of a newly created, bigger raid10,f
> device, then standard raid10,far synchronisation code would copy over
> everything to the right place in the bigger disks - even if the data
> changes underway. This artificial big raid10,f would have its metadata
> in memory only - there is no need to save anything, since you still have
> the normal original raid10 copy for safety. Once the new big raid is
> fully synchronised, you write its metadata over the original raid10
> metadata.
>
>
> I'm just throwing around ideas here. If they are of help or inspiration
> to anyone, that's great - if not, that's okay too.
>
> mvh.,
>
> David
>
>
>
>
> >NeilBrown
> >
> >
> >
> >>
> >>mvh.,
> >>
> >>David
> >>
> >>
> >>>If I don't understand properly - please explain in a bit more
> >>>detail.
Well, my knowledge of the kernel code is quite limited, and I originally did not know of your
aim of keeping it safe. Anyway, for keeping it safe I do have some ideas for raid10,far.

Assuming a grow, you could copy within the same raid0 part - that is, you could just do a
raid0-style grow, and do such a grow for each of the raid0 parts in the raid10,far array.
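
To illustrate what I mean by the raid0 parts, here is a rough model of where raid10,far puts
the copies of a block (illustrative C only, with made-up names; it ignores chunking and the
far_offset variant - the real geometry lives in drivers/md/raid10.c):

#include <stdio.h>

/* Copy 0 is a plain raid0 over the first part of each disk; copy c is the
 * same raid0 layout shifted c disks sideways and c strides further down
 * the disks. */
struct location {
        int disk;               /* member device index           */
        long long dev_block;    /* block offset on that device   */
};

static struct location far_copy(long long block, int copy,
                                int ndisks, long long stride)
{
        struct location loc;

        loc.disk = (int)((block + copy) % ndisks);
        loc.dev_block = (long long)copy * stride + block / ndisks;
        return loc;
}

int main(void)
{
        /* example: far=2 over 4 disks, each raid0 part 1000 blocks deep */
        for (long long b = 0; b < 4; b++) {
                struct location c0 = far_copy(b, 0, 4, 1000);
                struct location c1 = far_copy(b, 1, 4, 1000);

                printf("block %lld: copy0 disk %d @ %lld, copy1 disk %d @ %lld\n",
                       b, c0.disk, c0.dev_block, c1.disk, c1.dev_block);
        }
        return 0;
}

In this model a grow onto bigger devices only changes the stride: copy 0 stays where it is and
simply gains space, while each later copy has to move further down the disks. Moving one such
copy is the raid0-style grow per part.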
Some details: to get started, you could initially copy the first blocks to an unused area on one
of the new disks. Once you are clear of the problem of writing over data that has not yet been
copied, you just copy along and keep track of where you are: the block number last read and the
block number last written. No other copies need to be kept, and ordinary reads and writes go to
the old or the new disk blocks according to that divide. For metadata, I don't know what more
you would need to keep track of. For efficiency I would use quite large buffers, say stripes of
2 to 8 MB.
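
That is essentially the procedure David sketches above, just seen per raid0 part. In rough
userspace-style C (made-up names, no locking or error handling - not actual md code), the
bookkeeping could look something like this, with the copy running from the last block downwards
so a write never lands on data that still has to be read:

#define COPY_BLOCKS 4096        /* 4096 * 512 bytes = 2 MB per copy step  */

struct reshape_state {
        long long old_start;    /* old device offset of this raid0 part   */
        long long new_start;    /* new device offset of this raid0 part   */
        long long nr_blocks;    /* data blocks in this part               */
        long long moved;        /* blocks (from the top) already moved -
                                   the only extra metadata to record      */
};

/* stand-ins for the real block I/O and metadata update paths */
extern void read_blocks(long long dev_block, void *buf, long long count);
extern void write_blocks(long long dev_block, void *buf, long long count);
extern void commit_reshape_state(struct reshape_state *st);

static void move_far_copy(struct reshape_state *st)
{
        static char buf[COPY_BLOCKS * 512];

        while (st->moved < st->nr_blocks) {
                long long n = st->nr_blocks - st->moved;
                long long first;

                if (n > COPY_BLOCKS)
                        n = COPY_BLOCKS;
                first = st->nr_blocks - st->moved - n;

                /* read from the old place, write to the new, and only then
                 * advance and record the progress; until the write is on
                 * disk the old location is still the valid one */
                read_blocks(st->old_start + first, buf, n);
                write_blocks(st->new_start + first, buf, n);

                st->moved += n;
                commit_reshape_state(st);
        }
}

/* Ordinary I/O during the reshape uses the same record: blocks at or above
 * nr_blocks - moved live at the new offset, blocks below it at the old one. */
static long long block_location(const struct reshape_state *st, long long block)
{
        if (block >= st->nr_blocks - st->moved)
                return st->new_start + block;
        return st->old_start + block;
}

As long as the grow step (new_start - old_start) is at least one buffer's worth, a chunk's
source and destination never overlap either, so a crash in the middle only means redoing the
chunk that was in flight.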
Best regards
keld