From: NeilBrown <neilb@suse.de>
To: "Kwolek, Adam" <adam.kwolek@intel.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
"Neubauer, Wojciech" <Wojciech.Neubauer@intel.com>
Subject: Re: [PATCH 0/3] Continue expansion after reboot
Date: Sun, 27 Feb 2011 17:51:20 +1100
Message-ID: <20110227175120.20b6fe2c@notabene.brown>
In-Reply-To: <905EDD02F158D948B186911EB64DB3D17904F0CA@irsmsx503.ger.corp.intel.com>
On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:
>
>
> > -----Original Message-----
> > From: Kwolek, Adam
> > Sent: Wednesday, February 23, 2011 10:02 AM
> > To: 'NeilBrown'
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: RE: [PATCH 0/3] Continue expansion after reboot
> >
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Wednesday, February 23, 2011 4:38 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> > >
> > > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@intel.com>
> > > wrote:
> > >
> > > > Currently reshaped/expanded array is assembled but it stays in
> > > inactive state.
> > > > This patches allows for array assembly when array is under
> > expansion.
> > > > Array with reshape/expansion information in metadata is assembled
> > > > and reshape process continues automatically.
> > > >
> > > > Next step:
> > > > Problem is how to address container operation during assembly.
> > > > > 1. After first array being reshaped, assembly process looks if mdmon
> > > > sets migration for other array in container. If yes it continues
> > > work
> > > > for next array.
> > > >
> > > > 2. Assembly process performs reshape of currently reshaped array
> > only.
> > > > Mdmon sets next array for reshape and user triggers manually
> > mdadm
> > > > to finish container operation with just the same parameters set.
> > > >
> > > > Reshape finish can be executed for container operation by container
> > > re-assembly
> > > > also (this works in current code).
> > > >
> > >
> > > Yes, this is an awkward problem.
> > >
> > > Just to be sure we are thinking about the same thing:
> > > When restarting an array in which migration is already underway
> > mdadm
> > > simply
> > > forks and continues monitoring that migration.
> > > However if it is an array-wide migration, then when the migration of
> > > the
> > > first array completes, mdmon will update the metadata on the second
> > > array,
> > > but it isn't clear how mdadm can be told to start monitoring that
> > > array.
> > >
> > > How about this:
> > > > the imsm metadata handler should report that an array is 'undergoing
> > > > migration' if it is, or if an earlier array in the container is
> > > undergoing a
> > > migration which will cause 'this' array to subsequently be migrated
> > > too.
> > >
> > > So if the first array is in the middle of a 4drive->5drive
> > conversion
> > > and
> > > > the second array is simply at '4 drives', then imsm reports (to
> > > > container_content) that the second array is actually undergoing a
> > > migration
> > > from 4 to 5 drives, and is at the very beginning.
> > >
> > > When mdadm assembles that second array it will fork a child to
> > monitor
> > > it.
> > > It will need to somehow wait for mdmon to really update the metadata
> > > before
> > > it starts. This can probably be handled in the ->manage_reshape
> > > function.
> > >
> > > > Something along those lines would be the right way to go I think. It
> > > > avoids
> > > any races between arrays being assembled at different times.
> >
> >
> > This looks fine for me.
> >
> > >
> > >
> > > > Adam Kwolek (3):
> > > > FIX: Assemble device in reshape state with new disks number
> > >
> > > I don't think this patch is correct. We need to configure the array
> > > with the
> > > 'old' number of devices first, then 'reshape_array' will also set the
> > > 'new'
> > > number of devices.
> > > > What exactly was the problem you were trying to fix?
> >
> > When array is being assembled with old raid disk number assembly cannot
> > set readOnly array state
> > (error on sysfs state writing). Array stays in inactive state, so
> > nothing (reshape) happened later.
> >
> > I think that array cannot be assembled with old disks number (added new
> > disks are present as spares)
> > because begin of array uses new disks already. This means we are
> > assembling array with not complete disk set.
> > Stripes on begin can be corrupted (not all disks present in array). At
> > this point inactive array state is ok to keep safe user data.
> >
> >
> > I'll test is setting old disk number and later configuration change in
> > disks number and array state resolves problem.
> > I'll let you know results.
>
> I've made some investigations. I've tried assemble algorithm (as you suggested):
> Conditions:
> reshape 3 disk raid5 array to 4 disks raid5 array
> is interrupted. Restart is invoked by command 'mdadm -As'
>
> 1. Assemble() builds container with new disks number
> 2. Assemble() builds container content (array with /old/ 3 disks)
> 3. array is set to frozen to block monitor
> 4. sync_max in sysfs is set to 0, to block md until reshape monitoring takes care of the reshape process
> 5. Continue_reshape() starts reshape process
> 6. Continue_reshape() continues reshape process
>
> Problems I've met:
> 1. not all disks in Assembly() are added to array (old disks number limitation)
I want to fix this by getting sysfs_set_array to set up the new raid_disks
number.
It currently doesn't, because the number of disks that md is to expect could
be different from the number of disks recorded in the metadata, and
"analyse_change" might be needed to resolve the difference.
A particular example is that the metadata might think a RAID0 is changing from
4 devices to 5 devices, but md needs to be told that a RAID4 is changing from
5 devices to 6 devices.
However, in that case we really need to do the 'analyse_change' before calling
sysfs_set_array anyway.
So get sysfs_set_array to set up the array fully, and find somewhere
appropriate to put a call to analyse_change ... possibly modifying
analyse_change a bit ...
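
(Illustrative sketch of the RAID0 example above, not mdadm code. The name
md_view_of_change is hypothetical, and the only rule it encodes is the one
stated in the example: a RAID0 change is presented to md as a RAID4 change
with one extra device on each side. Passing other levels through unchanged is
an assumption of the sketch, not a description of what analyse_change does.)

```python
def md_view_of_change(level, before, after):
    """Translate a metadata-level disk-count change into what md is told.

    Hypothetical helper for illustration only. Only the RAID0 case from the
    example above is modelled; every other level is returned unchanged,
    which is an assumption of this sketch.
    """
    if level == 0:
        # RAID0 is reshaped by way of RAID4, so md sees one more device
        # both before and after the change.
        return (4, before + 1, after + 1)
    return (level, before, after)


# Metadata says: RAID0, 4 devices -> 5 devices.
print(md_view_of_change(0, 4, 5))
```

The point of the sketch is only that the numbers handed to sysfs_set_array
may differ from the numbers in the metadata, so the translation has to happen
first.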
> 2. setting reshape_position invokes automatically reshape start in md on array run
That shouldn't be a problem. We start the array read-only and the reshape
will not start while that is set.
So:
   set 'old' shape of array,
   set reshape_position,
   set 'new' shape of array,
   start array 'readonly',
   set sync_max to 0,
   enable read/write,
   allow reshape to continue while monitoring it with mdadm.
Does this work, or is there something I have missed?
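
(Illustrative sketch of the ordering listed above, not mdadm code. It assumes
the md sysfs attribute names raid_disks, reshape_position, array_state and
sync_max, reduces the 'shape' to the disk count alone, and assumes "active"
as the value that enables read/write. The helper names restart_plan and
apply_plan and the position 1024 are made up for the example, and the final
monitoring step is not modelled. The demo writes to a scratch directory, not
to a real array.)

```python
import os
import tempfile


def restart_plan(old_disks, new_disks, reshape_position):
    """Return the ordered (attribute, value) writes for the steps above.

    Hypothetical helper. The shape is simplified to raid_disks only; a real
    array also has level, layout and chunk size.
    """
    return [
        ("raid_disks", str(old_disks)),               # 'old' shape
        ("reshape_position", str(reshape_position)),  # where reshape stopped
        ("raid_disks", str(new_disks)),               # 'new' shape
        ("array_state", "readonly"),                  # start read-only
        ("sync_max", "0"),                            # hold the reshape
        ("array_state", "active"),                    # read/write (assumed value)
    ]


def apply_plan(md_dir, plan):
    """Write each attribute in order to files under md_dir."""
    for attr, value in plan:
        with open(os.path.join(md_dir, attr), "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run: 3 -> 4 disks, arbitrary checkpoint, scratch directory only.
    with tempfile.TemporaryDirectory() as scratch:
        apply_plan(scratch, restart_plan(3, 4, 1024))
        print(sorted(os.listdir(scratch)))
```

Keeping the plan as data makes the ordering easy to inspect before anything
is written, which is the part that matters here.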
> 3. setting of reshape position clears delta_disks in md (and other parameters, for now not important)
That shouldn't matter ... where do we set reshape_position that it causes
a problem?
> 4. Assembly() closes handle to array (it has to be not closed and used in reshape continuation)
I'm not sure what you are getting at.... reshape continuation is handled by
Grow_continue which is passed the handle to the array. It should fork and
monitor the array in the background, so it has its own copy of the handle
???
> 5. reshape continuation can require backup file. It depends where it was interrupted during expansion,
> Other reshapes can always require backup file
Yes ... Why is this a problem?
> 6. to run reshape, 'reshape' has to be written to sync_action.
> Raid5_start_reshape() is not prepared for reshape restart (i.e reshape position can be 0 or max array value
> - it depends on operation grow/shrink)
Yes ... raid5_start_reshape isn't used for restarting a reshape.
run() will start the reshape thread, which will not run because the array is
read-only.
Once you switch the array to read-write the sync_thread should get woken up
and will continue the reshape.
I think the remainder of your email is also addressed by what I have said
above so I won't try to address specific things.
Please let me know if you see any problem with what I have outlined.
Thanks!
NeilBrown
> 7. After array start flag MD_RECOVERY_NEEDED is set, so reshape cannot be started from mdadm
> As array is started with not all disks (old raid disks), we cannot allow for such check (???)
> I've made workaround (setting reshape position clears this flag for external meta)
>
> I've started reshape again on /all/ new disks number, but it still starts from array begin. This is a matter of search where checkpoint is lost.
>
> I've tested my first idea also.
> To do as much as we can, as for native meta (reshape is started by array run).
> Some problems are similar as before (p.4, p.5)
> The only serious problem, that I've got with this is how to let to know md about delta_disks.
> I've resolved it by adding special case in raid_disks_store(),
> similar to native metadata when old_disks number is guessed.
> For external metadata, I am storing old and then new disks numbers, md calculates delta disks from this raid disks numbers sequence.
> (as I remember you do not want to expose delta disks in sysfs).
>
> Other issue that I'm observing in both methods is sync_action sysfs entry behavior. It reports reshape->idle->reshape...
> This 'idle' for a very short time causes migration cancelation. I've made workaround in mdmon for a now.
>
> Both methods are not fully workable yet, but I think this will change on Monday.
>
> Considering above, I still like more method when we construct array with new disks number.
> Begin of array /already reshaped/ has all disks present. In the same way md works for native arrays.
>
> I'm waiting for your comments/questions/ideas.
>
> BR
> Adam
>
> >
> > BR
> > Adam
> >
> > >
> > >
> > > > imsm: FIX: Report correct array size during reshape
> > > > imsm: FIX: initalize reshape progress as it is stored in
> > > metatdata
> > > >
> > > These both look good - I have applied them. Thanks.
> > >
> > > NeilBrown
> > >