Re: [PATCH 00/29] OLCE, migrations and raid10 takeover

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Neil Brown <neilb@suse.de>
To: "Kwolek, Adam" <adam.kwolek@intel.com>
Cc: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
	"Neubauer, Wojciech" <Wojciech.Neubauer@intel.com>
Subject: Re: [PATCH 00/29] OLCE, migrations and raid10 takeover
Date: Tue, 28 Dec 2010 13:24:25 +1100	[thread overview]
Message-ID: <20101228132425.5bebf1ae@notabene.brown> (raw)
In-Reply-To: <905EDD02F158D948B186911EB64DB3D176C65EA1@irsmsx503.ger.corp.intel.com>

On Mon, 27 Dec 2010 14:56:05 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> Hi,
> 
> Please find below my few comments for current grow/reshape code on devel 3.2.
> 
> In super-intel.c, update_reshape_container_disk update is defined.
> It is intended to send for container only and operates on all devices /arrays/ in container in one shot. 
> This is IMSM incompatible way. IMSM updates disks list in metadata and first device information during first array reshape.
> During second array reshape, device is updated only (using disk information updated during first reshape)
> As it I am showing above. Container operation is in fact set of single array operations. This is due to single migration area (per container),
> where backup is stored. It is related to currently reshaped device and we can have single array in reshape state only.
> 
> This drives to conclusion that update should be issued per array and should have device id (dev minor or subdev).
> 
> Considering above current implementation of update_reshape_container_disk is not compatible with imsm.
> 
> This is driven by current reshape_container() implementation in Grow.c.
> The flow goes as follow:
> 1. reshape_container()
>   - metadata update for container (+ sync_metadata)
>   - call in loop reshape_array() for each array in container
> 
> 2. reshape_array()
>    - adds disks for each device in container - but it should add disks to this particular array it is called for (loop is on reshape_container() already)
>    - reshape_array() - currently doesn't update metadata, if so - in line 1891 sync_metadata() seems to be not necessary /no reshape_super() call/
>    - currently reshape_array() assumes that metadata is ready for all devices in container - this is not imsm compatible /as described above/
> 
> How mdadm should work to be imsm compatible:
> 1. container operation is set of array operations (not atomic/single container update)
> 2. for every array the same disk set should be used (it is covered in current devel 3.2 code but not in imsm compatible way)
> 3. there is not metadata update after reshape_super() call for container.
> 4. first array update should update disk list in metadata
> 6. second array update should reuse (+verify) used disk list present in metadata already
> 7. reshape cannot be run in parallel
> 
> Main code changes I'm proposing?
> Problem 1.
> 	Main code in Grow.c cannot assume when whole metadata /all arrays in container/ will be updated for particular external meta
> 	during reshape_super() in reshape_container(). It can issue no update to container/arrays/ due to compatibility reason.
> Resolution 1.
> 	reshape_super() has to be called for container and for every array in reshape_array() (i.e.) before new disks will be added to array (and after monitor is running).
> 	We can distinguish calls by comparing (as currently), st->container_dev and st->devnum.
> 	This causes that in reshape_array(), spares cannot be added to all arrays in container at one time (based on metadata information), 
> 	but for current array only (all arrays will be covered by loop in reshape_container()).
> 	This means also, that disks cannot be added based on metadata information without adding additional reshape_suped() call in reshape_array(),
> 	as it cannot be assumed that all arrays in container are updated at this moment without this call.
> 
> Problem 2.
> 	As reshape_container() performs action for all arrays in container probably we should allow for code reuse from Grow.c by external meta.
> Resolution 2.
> 	In my implementation, I've did it by moving manage_reshape() call to the end of reshape processing. In this function I've manage array size and backward takeover to raid0 /optionally/
>   	I think that this would be correct for imsm, but to have flexibility manage_reshape() call should not be changed. In such situation additional function can be added:
> 	i.e. finalize_reshape() for purposes I've pointed above (after wait_reshape() and before level change action, Grow.c:2045).
> 	Action after manage_reshape() should be based on manage_reshape() on condition if it exists and is yes, on return code form this call.
> 	This allows metadata specific code decide to reuse or not /or for some cases/ as much as it is possible of main reshape code.
> 
> Problem 3.
> 	After manage_reshape() call we are returning to main reshape_container() code, I assume that manage_reshape() should execute at some point fork() and return,
> 	as code in Grow.c does. This can cause running parallel reshapes using i.e. single backup file.
> Resolution 3. 
> 	Resolution will depend on particular metadata implementation, but as it is single backup_file used, I can assume that intention is sequential reshapes run.
> 	This can be handled in reshape_super() by waiting for all arrays to be in idle state. This entry condition should be checked in reshape_super() called in reshape_container()
> 	without waiting for state change, and in reshape_super() called in reshape_array() with waiting for idle state.
> 
> 
> Considering all above I think that container reshape should work as follow (using current devel 3.2 code):
> 
> reshape_container()
> {
>   a) reshape_super(container context)
> 	- check reshape condition and exit with error when any array is not ready for reshape
> 	  Checking condition here makes us safe that we can wait for required condition during reshape_super() call in array context later
> 	- leaving this call here makes possible to implement container update also if metadata requires this
>   b) fork() if reshape_super() call succeed (+).
> 	For multi-array container this is required to return to console for sequential work on arrays. 
> 	When we want to procced for second array, we have to wait until first reshape is finished. 
> 	This would block console.
>   c) call in loop for all devices reshape_array()
> 	1. analyse_change()
> 	2. change level and start monitor (if required)
> 	3. reshape_super(array context) (+)
> 		- check reshape condition, if any array is in reshape state in container this time wait for reshape end.
> 		- send metadata update and allow for reshape processing
> 	4. load metadata (container_content()) and based on this information add disks to md for single array (currently processed)
> 	5. set other array parameters to md (including raid_disks)
> 	6. start_reshape()
> 	7. manage_reshape()(+) -> if metadata requires take ownership of process or ...
> 	8. control reshape by Grow.c code using fork() 
> 	   (possible that fork usage can be optional for container, as it is forked already)
> 		- child_XXX()
> 		- wait reshape
> 		- finalize_reshape(for external metadata)(+) - we can handle size change in common code usage case
> 		- backward takeover
> 	9. return to reshape_container() and process next array (c.) if any
> }
> 
> Things added to your current design (marked above with (+)):
> 	1. additional fork()
> 	2. additional reshape_super() in reshape_array()
> 	3. change in call to manage_reshape() to allow processing in main code for external metadata
> 		Depends on:
> 			- if manage_reshape() exists
> 			- manage_reshape() return code
> 	4. added finalize_reshape()
> 
> 
> Above flow gives possibility to:
> 1. run reshapes in sequence or
> 2. run reshapes in parallel 
> 3. reuse Grow.c code (we can use manage_reshape, finalize_reshape, code in Grow.c or any reasonable combination)
> 
> 	and all this is up to metadata specific code (reshape_super()/manage_reshape()/finalize_reshape()).
> 
> Please let me know what do you think about this. If you like it, I'll start work on code changes.
> 
> BR
> Adam
> 


Hi Adam,
 thanks for you comments and the obvious effort you have put into
 understanding the new structure.

 I agree that my code is making some assumption that don't fit quite right
 with IMSM.  However I think they can be resolved with relatively small
 changes.

 I don't think we should support parallel reshaping at all.  md doesn't allow
 multiple arrays that share a device to be reshaped at the same time anyway,
 and running parallel reshapes would result in horrible performance.  So we
 should only support multiple reshapes in series.  Therefore on a single
 'fork' call is needed.  The forked child then handles all of the reshaping.

 I only want one call to ->reshape_super.  It essentially tells the metadata
 handler that the user has requested a reshape, and as the user only made one
 request, only one call to ->reshape_super should be made.

 I had assume that the one call would update the metadata for all of the
 arrays, but you say that is incorrect for IMSM. In that case it should just
 update the metadata for the first array.  When that reshape completes, the
 metadata will be told (in ->set_array_state) and it should at that point
 update the metadata to reflect that the second array is undergoing reshape.

 This requires slightly different behaviour in reshape_container.  Instead of
 calling ->container_content just once and then reshaping each array it should
 repeatedly:
    - load the metadata
    - call ->container_content, find the first array which is ready for
      reshape
    - process that reshape so that it completes
    - ping_monitor to ensure that the metadata is updated
 That should be quite general and compatible with IMSM.

 I am not entirely sure that you wanted finalize_reshape to do.  If it was to
 update the metadata to show that the reshape was complete, then that should
 be done in ->set_array_state.  If it was to e.g. convert a RAID4 back to
 RAID0, that is already done towards the end of reshape_array.

Thanks,
NeilBrown

next prev parent reply	other threads:[~2010-12-28  2:24 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-09 15:18 [PATCH 00/29] OLCE, migrations and raid10 takeover Adam Kwolek
2010-12-09 15:18 ` [PATCH 01/29] Add state_of_reshape for external metadata Adam Kwolek
2010-12-09 15:18 ` [PATCH 02/29] imsm: Prepare reshape_update in mdadm Adam Kwolek
2010-12-14  0:07   ` Neil Brown
2010-12-14  7:54     ` Kwolek, Adam
2010-12-09 15:19 ` [PATCH 03/29] imsm: Process reshape_update in mdmon Adam Kwolek
2010-12-09 15:19 ` [PATCH 04/29] imsm: Block array state change during reshape Adam Kwolek
2010-12-09 15:19 ` [PATCH 05/29] Process reshape initialization by managemon Adam Kwolek
2010-12-09 15:19 ` [PATCH 06/29] imsm: Verify slots in meta against slot numbers set by md Adam Kwolek
2010-12-09 15:19 ` [PATCH 07/29] imsm: Cancel metadata changes on reshape start failure Adam Kwolek
2010-12-09 15:19 ` [PATCH 08/29] imsm: Do not accept messages sent by mdadm Adam Kwolek
2010-12-09 15:19 ` [PATCH 09/29] imsm: Do not indicate resync during reshape Adam Kwolek
2010-12-09 15:20 ` [PATCH 10/29] imsm: Fill delta_disks field in getinfo_super() Adam Kwolek
2010-12-09 15:20 ` [PATCH 11/29] Control reshape in mdadm Adam Kwolek
2010-12-09 15:20 ` [PATCH 12/29] Finalize reshape after adding disks to array Adam Kwolek
2010-12-09 15:20 ` [PATCH 13/29] Add reshape progress updating Adam Kwolek
2010-12-09 15:20 ` [PATCH 14/29] WORKAROUND: md reports idle state during reshape start Adam Kwolek
2010-12-09 15:20 ` [PATCH 15/29] FIX: core during getting map Adam Kwolek
2010-12-09 15:20 ` [PATCH 16/29] Enable reshape for subarrays Adam Kwolek
2010-12-09 15:21 ` [PATCH 17/29] Change manage_reshape() placement Adam Kwolek
2010-12-09 15:21 ` [PATCH 18/29] Migration: raid5->raid0 Adam Kwolek
2010-12-09 15:21 ` [PATCH 19/29] Detect level change Adam Kwolek
2010-12-09 15:21 ` [PATCH 20/29] Migration raid0->raid5 Adam Kwolek
2010-12-09 15:21 ` [PATCH 21/29] Read chunk size and layout from mdstat Adam Kwolek
2010-12-09 15:21 ` [PATCH 22/29] FIX: mdstat doesn't read chunk size correctly Adam Kwolek
2010-12-09 15:21 ` [PATCH 23/29] Migration: Chunk size migration Adam Kwolek
2010-12-09 15:21 ` [PATCH 24/29] Add takeover support for external meta Adam Kwolek
2010-12-09 15:22 ` [PATCH 25/29] Takeover raid10 -> raid0 for external metadata Adam Kwolek
2010-12-09 15:22 ` [PATCH 26/29] Takeover raid0 -> raid10 " Adam Kwolek
2010-12-09 15:22 ` [PATCH 27/29] FIX: Problem with removing array after takeover Adam Kwolek
2010-12-09 15:22 ` [PATCH 28/29] IMSM compatibility for raid0 -> raid10 takeover Adam Kwolek
2010-12-09 15:22 ` [PATCH 29/29] Add spares to raid0 in mdadm Adam Kwolek
2010-12-16 11:20 ` [PATCH 00/29] OLCE, migrations and raid10 takeover Neil Brown
2010-12-20  8:27   ` Wojcik, Krzysztof
2010-12-20 21:59     ` Neil Brown
2010-12-21 12:37     ` Neil Brown
2010-12-27 14:56       ` Kwolek, Adam
2010-12-28  2:24         ` Neil Brown [this message]
2010-12-28  8:49           ` Kwolek, Adam
2010-12-29 10:34             ` Neil Brown
2010-12-29 12:29               ` Kwolek, Adam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101228132425.5bebf1ae@notabene.brown \
    --to=neilb@suse.de \
    --cc=Wojciech.Neubauer@intel.com \
    --cc=adam.kwolek@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=ed.ciechanowski@intel.com \
    --cc=krzysztof.wojcik@intel.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).