[PATCH 0/3] Continue expansion after reboot

Linux RAID subsystem development
 help / color / mirror / Atom feed

* [PATCH 0/3] Continue expansion after reboot
@ 2011-02-22 14:13 Adam Kwolek
  2011-02-22 14:13 ` [PATCH 1/3] imsm: FIX: initalize reshape progress as it is stored in metatdata Adam Kwolek
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Adam Kwolek @ 2011-02-22 14:13 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

Currently reshaped/expanded array is assembled but it stays in inactive state.
This patches allows for array assembly when array is under expansion.
Array with reshape/expansion information in metadata is assembled
and reshape process continues automatically.

Next step:
Problem is how to address container operation during assembly.
1. After first array being reshaped, assebly process looks if mdmon
   sets migration for other array in container. If yes it continues work
   for next array.

2. Assembly process performs reshape of currently reshaped array only.
   Mdmon sets next array for reshape and user triggers manually mdadm
   to finish container operation with just the same parameters set.

Reshape finish can be executed for container operation by container re-assembly
also (this works in current code).

BR
Adam

---

Adam Kwolek (3):
      FIX: Assemble device in reshape state with new disks number
      imsm: FIX: Report correct array size during reshape
      imsm: FIX: initalize reshape progress as it is stored in metatdata

 Assemble.c    |    2 ++
 super-intel.c |   24 +++++++++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] imsm: FIX: initalize reshape progress as it is stored in metatdata
  2011-02-22 14:13 [PATCH 0/3] Continue expansion after reboot Adam Kwolek
@ 2011-02-22 14:13 ` Adam Kwolek
  2011-02-22 14:13 ` [PATCH 2/3] imsm: FIX: Report correct array size during reshape Adam Kwolek
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Adam Kwolek @ 2011-02-22 14:13 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

reshape prodess cannot be restarted due to no checkpoint information
in mdinfo.
When metadata is read during reshape process or reshape restart,
rehape_progress (mdinfo field) has to be initialized to value
stored in metadata. This allows start reshape from stored
in metadata checkpoint.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e8d2c6b..1b3407a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1784,6 +1784,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 	else
 		info->delta_disks = 0;
 
+	info->reshape_progress = 0;
 	if (map_to_analyse->map_state == IMSM_T_STATE_UNINITIALIZED ||
 	    dev->vol.dirty) {
 		info->resync_start = 0;
@@ -1797,6 +1798,15 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 			info->resync_start = blocks_per_unit * units;
 			break;
 		}
+		case MIGR_GEN_MIGR: {
+			__u64 blocks_per_unit = blocks_per_migr_unit(dev);
+			__u64 units = __le32_to_cpu(dev->vol.curr_migr_unit);
+
+			info->reshape_progress = blocks_per_unit * units;
+			dprintf("IMSM: General Migration checkpoint : %llu "
+			       "(%llu) -> read reshape progress : %llu\n",
+				units, blocks_per_unit, info->reshape_progress);
+		}
 		case MIGR_VERIFY:
 			/* we could emulate the checkpointing of
 			 * 'sync_action=check' migrations, but for now
@@ -1804,7 +1814,6 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 			 */
 		case MIGR_REBUILD:
 			/* this is handled by container_content_imsm() */
-		case MIGR_GEN_MIGR:
 		case MIGR_STATE_CHANGE:
 			/* FIXME handle other migrations */
 		default:


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] imsm: FIX: Report correct array size during reshape
  2011-02-22 14:13 [PATCH 0/3] Continue expansion after reboot Adam Kwolek
  2011-02-22 14:13 ` [PATCH 1/3] imsm: FIX: initalize reshape progress as it is stored in metatdata Adam Kwolek
@ 2011-02-22 14:13 ` Adam Kwolek
  2011-02-22 14:13 ` [PATCH 3/3] FIX: Assemble device in reshape state with new disks number Adam Kwolek
  2011-02-23  3:37 ` [PATCH 0/3] Continue expansion after reboot NeilBrown
  3 siblings, 0 replies; 9+ messages in thread
From: Adam Kwolek @ 2011-02-22 14:13 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When reshape is started imsm stores new size in metadata.
mdadm requires "old" size to proper initialization restarted array.

When reshape is in progress getinfo_super_imsm_volume() should report
computed array size value instead array size stored in metatda.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 1b3407a..c101dca 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1806,6 +1806,19 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
 			dprintf("IMSM: General Migration checkpoint : %llu "
 			       "(%llu) -> read reshape progress : %llu\n",
 				units, blocks_per_unit, info->reshape_progress);
+			unsigned long long array_blocks;
+			int used_disks;
+
+			used_disks = imsm_num_data_members(dev, 1);
+			if (used_disks > 0) {
+				array_blocks = map->blocks_per_member *
+					used_disks;
+				/* round array size down to closest MB
+				 */
+				info->custom_array_size = (array_blocks
+						>> SECT_PER_MB_SHIFT)
+						<< SECT_PER_MB_SHIFT;
+			}
 		}
 		case MIGR_VERIFY:
 			/* we could emulate the checkpointing of


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] FIX: Assemble device in reshape state with new disks number
  2011-02-22 14:13 [PATCH 0/3] Continue expansion after reboot Adam Kwolek
  2011-02-22 14:13 ` [PATCH 1/3] imsm: FIX: initalize reshape progress as it is stored in metatdata Adam Kwolek
  2011-02-22 14:13 ` [PATCH 2/3] imsm: FIX: Report correct array size during reshape Adam Kwolek
@ 2011-02-22 14:13 ` Adam Kwolek
  2011-02-23  3:37 ` [PATCH 0/3] Continue expansion after reboot NeilBrown
  3 siblings, 0 replies; 9+ messages in thread
From: Adam Kwolek @ 2011-02-22 14:13 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

When expansion was interrupted, assembly should include all disks
in to new array. To do this delta_disks has to be added
to raid disks number.

When metadata reports all disks in content->array.raid_disks
then delta_disks should be equal to 0 (it is included in raid_disks already).

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 Assemble.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 317be8b..8d89dbe 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1508,6 +1508,8 @@ int assemble_container_content(struct supertype *st, int mdfd,
 
 	sysfs_init(content, mdfd, 0);
 
+	if (content->reshape_active)
+		content->array.raid_disks += content->delta_disks;
 	sra = sysfs_read(mdfd, 0, GET_VERSION);
 	if (sra == NULL || strcmp(sra->text_version, content->text_version) != 0)
 		if (sysfs_set_array(content, md_get_version(mdfd)) != 0) {


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Continue expansion after reboot
  2011-02-22 14:13 [PATCH 0/3] Continue expansion after reboot Adam Kwolek
                   ` (2 preceding siblings ...)
  2011-02-22 14:13 ` [PATCH 3/3] FIX: Assemble device in reshape state with new disks number Adam Kwolek
@ 2011-02-23  3:37 ` NeilBrown
  2011-02-23  9:02   ` Kwolek, Adam
  2011-02-25 15:55   ` Kwolek, Adam
  3 siblings, 2 replies; 9+ messages in thread
From: NeilBrown @ 2011-02-23  3:37 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: linux-raid, dan.j.williams, ed.ciechanowski, wojciech.neubauer

On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> Currently reshaped/expanded array is assembled but it stays in inactive state.
> This patches allows for array assembly when array is under expansion.
> Array with reshape/expansion information in metadata is assembled
> and reshape process continues automatically.
> 
> Next step:
> Problem is how to address container operation during assembly.
> 1. After first array being reshaped, assebly process looks if mdmon
>    sets migration for other array in container. If yes it continues work
>    for next array.
> 
> 2. Assembly process performs reshape of currently reshaped array only.
>    Mdmon sets next array for reshape and user triggers manually mdadm
>    to finish container operation with just the same parameters set.
> 
> Reshape finish can be executed for container operation by container re-assembly
> also (this works in current code).
> 

Yes, this is an awkward problem.

Just to be sure we are thinking about the same thing:
  When restarting an array in which migration is already underway mdadm simply
  forks and continues monitoring that migration.
  However if it is an array-wide migration, then when the migration of the
  first array completes, mdmon will update the metadata on the second array,
  but it isn't clear how mdadm can be told to start monitoring that array.

How about this:
  the imsm metadata handler should report that an array is 'undergoing
  migration if it is, or if an earlier array in the container is undergoing a
  migration which will cause 'this' array to subsequently be migrated too.

  So if the first array is in the middle of a 4drive->5drive conversion and
  the second array is simply at '4 drives', then imsm reported (to
  container_content) that the second drive is actually undergoing a migration
  from 4 to 5 drives, and is at the very beginning.

  When mdadm assembles that second array it will fork a child to monitor it.
  It will need to somehow wait for mdmon to really update the metadata before
  it starts.  This can probably be handled in the ->manage_reshape function.

Something along those line would be the right way to go I think.  It avoid
any races between arrays being assembled at different times.

> Adam Kwolek (3):
>       FIX: Assemble device in reshape state with new disks number

I don't think this patch is correct.  We need to configure the array with the
'old' number of devices first, then 'reshape_array' will also set the 'new'
number of devices.
What exactly what the problem you were  trying to fix?

>       imsm: FIX: Report correct array size during reshape
>       imsm: FIX: initalize reshape progress as it is stored in metatdata
>
These both look good - I have applied them.  Thanks.

NeilBrown

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 0/3] Continue expansion after reboot
  2011-02-23  3:37 ` [PATCH 0/3] Continue expansion after reboot NeilBrown
@ 2011-02-23  9:02   ` Kwolek, Adam
  2011-02-25 15:55   ` Kwolek, Adam
  1 sibling, 0 replies; 9+ messages in thread
From: Kwolek, Adam @ 2011-02-23  9:02 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 23, 2011 4:38 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] Continue expansion after reboot
> 
> On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@intel.com>
> wrote:
> 
> > Currently reshaped/expanded array is assembled but it stays in
> inactive state.
> > This patches allows for array assembly when array is under expansion.
> > Array with reshape/expansion information in metadata is assembled
> > and reshape process continues automatically.
> >
> > Next step:
> > Problem is how to address container operation during assembly.
> > 1. After first array being reshaped, assebly process looks if mdmon
> >    sets migration for other array in container. If yes it continues
> work
> >    for next array.
> >
> > 2. Assembly process performs reshape of currently reshaped array only.
> >    Mdmon sets next array for reshape and user triggers manually mdadm
> >    to finish container operation with just the same parameters set.
> >
> > Reshape finish can be executed for container operation by container
> re-assembly
> > also (this works in current code).
> >
> 
> Yes, this is an awkward problem.
> 
> Just to be sure we are thinking about the same thing:
>   When restarting an array in which migration is already underway mdadm
> simply
>   forks and continues monitoring that migration.
>   However if it is an array-wide migration, then when the migration of
> the
>   first array completes, mdmon will update the metadata on the second
> array,
>   but it isn't clear how mdadm can be told to start monitoring that
> array.
> 
> How about this:
>   the imsm metadata handler should report that an array is 'undergoing
>   migration if it is, or if an earlier array in the container is
> undergoing a
>   migration which will cause 'this' array to subsequently be migrated
> too.
> 
>   So if the first array is in the middle of a 4drive->5drive conversion
> and
>   the second array is simply at '4 drives', then imsm reported (to
>   container_content) that the second drive is actually undergoing a
> migration
>   from 4 to 5 drives, and is at the very beginning.
> 
>   When mdadm assembles that second array it will fork a child to monitor
> it.
>   It will need to somehow wait for mdmon to really update the metadata
> before
>   it starts.  This can probably be handled in the ->manage_reshape
> function.
> 
> Something along those line would be the right way to go I think.  It
> avoid
> any races between arrays being assembled at different times.


This looks fine for me.

> 
> 
> > Adam Kwolek (3):
> >       FIX: Assemble device in reshape state with new disks number
> 
> I don't think this patch is correct.  We need to configure the array
> with the
> 'old' number of devices first, then 'reshape_array' will also set the
> 'new'
> number of devices.
> What exactly what the problem you were  trying to fix?

When array is being assembled with old raid disk number assembly cannot set readOnly array state
(error on sysfs state writing). Array stays in inactive state, so nothing (reshape) happened later.

I think that array cannot be assembled with old disks number (added new disks are present as spares)
because begin of array uses new disks already. This means we are assembling array with not complete disk set.
Stripes on begin can be corrupted (not all disks present in array). At this point inactive array state is ok to keep safe user data.


I'll test is setting old disk number and later configuration change in disks number and array state resolves problem.
I'll let you know results.

BR
Adam

> 
> 
> >       imsm: FIX: Report correct array size during reshape
> >       imsm: FIX: initalize reshape progress as it is stored in
> metatdata
> >
> These both look good - I have applied them.  Thanks.
> 
> NeilBrown
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 0/3] Continue expansion after reboot
  2011-02-23  3:37 ` [PATCH 0/3] Continue expansion after reboot NeilBrown
  2011-02-23  9:02   ` Kwolek, Adam
@ 2011-02-25 15:55   ` Kwolek, Adam
  2011-02-27  6:51     ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: Kwolek, Adam @ 2011-02-25 15:55 UTC (permalink / raw)
  To: Kwolek, Adam, NeilBrown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech



> -----Original Message-----
> From: Kwolek, Adam
> Sent: Wednesday, February 23, 2011 10:02 AM
> To: 'NeilBrown'
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: RE: [PATCH 0/3] Continue expansion after reboot
> 
> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Wednesday, February 23, 2011 4:38 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> >
> > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@intel.com>
> > wrote:
> >
> > > Currently reshaped/expanded array is assembled but it stays in
> > inactive state.
> > > This patches allows for array assembly when array is under
> expansion.
> > > Array with reshape/expansion information in metadata is assembled
> > > and reshape process continues automatically.
> > >
> > > Next step:
> > > Problem is how to address container operation during assembly.
> > > 1. After first array being reshaped, assebly process looks if mdmon
> > >    sets migration for other array in container. If yes it continues
> > work
> > >    for next array.
> > >
> > > 2. Assembly process performs reshape of currently reshaped array
> only.
> > >    Mdmon sets next array for reshape and user triggers manually
> mdadm
> > >    to finish container operation with just the same parameters set.
> > >
> > > Reshape finish can be executed for container operation by container
> > re-assembly
> > > also (this works in current code).
> > >
> >
> > Yes, this is an awkward problem.
> >
> > Just to be sure we are thinking about the same thing:
> >   When restarting an array in which migration is already underway
> mdadm
> > simply
> >   forks and continues monitoring that migration.
> >   However if it is an array-wide migration, then when the migration of
> > the
> >   first array completes, mdmon will update the metadata on the second
> > array,
> >   but it isn't clear how mdadm can be told to start monitoring that
> > array.
> >
> > How about this:
> >   the imsm metadata handler should report that an array is 'undergoing
> >   migration if it is, or if an earlier array in the container is
> > undergoing a
> >   migration which will cause 'this' array to subsequently be migrated
> > too.
> >
> >   So if the first array is in the middle of a 4drive->5drive
> conversion
> > and
> >   the second array is simply at '4 drives', then imsm reported (to
> >   container_content) that the second drive is actually undergoing a
> > migration
> >   from 4 to 5 drives, and is at the very beginning.
> >
> >   When mdadm assembles that second array it will fork a child to
> monitor
> > it.
> >   It will need to somehow wait for mdmon to really update the metadata
> > before
> >   it starts.  This can probably be handled in the ->manage_reshape
> > function.
> >
> > Something along those line would be the right way to go I think.  It
> > avoid
> > any races between arrays being assembled at different times.
> 
> 
> This looks fine for me.
> 
> >
> >
> > > Adam Kwolek (3):
> > >       FIX: Assemble device in reshape state with new disks number
> >
> > I don't think this patch is correct.  We need to configure the array
> > with the
> > 'old' number of devices first, then 'reshape_array' will also set the
> > 'new'
> > number of devices.
> > What exactly what the problem you were  trying to fix?
> 
> When array is being assembled with old raid disk number assembly cannot
> set readOnly array state
> (error on sysfs state writing). Array stays in inactive state, so
> nothing (reshape) happened later.
> 
> I think that array cannot be assembled with old disks number (added new
> disks are present as spares)
> because begin of array uses new disks already. This means we are
> assembling array with not complete disk set.
> Stripes on begin can be corrupted (not all disks present in array). At
> this point inactive array state is ok to keep safe user data.
> 
> 
> I'll test is setting old disk number and later configuration change in
> disks number and array state resolves problem.
> I'll let you know results.

I've made some investigations. I've tried assemble algorithm (as you suggested):
Conditions:
 reshape 3 disk raid5 array to 4 disks raid5 array
   is interrupted. Restart is invoked by command 'mdadm -As'

1. Assemble() builds container with new disks number
2. Assemble() builds container content (array with /old/ 3 disks)
3. array is set to frozen to block monitor
4. sync_max in sysfs is set to 0, to block md until reshape monitoring takes carry about reshape process
5. Continue_reshape() starts reshape process
6. Continue_reshape() continues reshape process

Problems I've met:
1. not all disks in Assembly() are added to array (old disks number limitation)
2. setting reshape_position invokes automatically reshape start in md on array run
3. setting of reshape position clears delta_disks in md (and other parameters, for now not important)
4. Assembly() closes handle to array (it has to be not closed and used in reshape continuation)
5. reshape continuation can require backup file. It depends where it was interrupted during expansion,
   Other reshapes can always require backup file 
6. to run reshape, 'reshape' has to be written to sync_action.
    Raid5_start_reshape() is not prepared for reshape restart (i.e reshape position can be 0 or max array value
    - it depends on operation grow/shrink)
7. After array start flag MD_RECOVERY_NEEDED is set, so reshape cannot be started from mdadm
    As array is started with not all disks (old raid disks), we cannot allow for such check (???)
    I've made workaround (setting reshape position clears this flag for external meta)

I've started reshape again on /all/ new disks number, but it still starts from array begin. This is a matter of search where checkpoint is lost.

I've tested my first idea also.
To do as much as we can, as for native meta (reshape is started by array run).
Some problems are similar as before (p.4, p.5)
The only serious problem, that I've got with this is how to let to know md about delta_disks.
I've resolved it by adding special case in raid_disks_store(), 
similar to native metadata when old_disks number is guessed.
For external metadata, I am storing old and then new disks numbers, md calculates delta disks from this raid disks numbers sequence. 
(as I remember you do not want to expose delta disks in sysfs). 

Other issue that I'm observing in both methods is sync_action sysfs entry behavior. It reports reshape->idle->reshape...
This 'idle' for a very short time causes migration cancelation. I've made workaround in mdmon for a now.

Both methods are not fully workable yet, but I think this will change on Monday.

Considering above, I still like more method when we construct array with new disks number.
Begin of array /already reshaped/ has all disks present. In the same way md works for native arrays.

I'm waiting for your comments/questions/ideas.

BR
Adam

> 
> BR
> Adam
> 
> >
> >
> > >       imsm: FIX: Report correct array size during reshape
> > >       imsm: FIX: initalize reshape progress as it is stored in
> > metatdata
> > >
> > These both look good - I have applied them.  Thanks.
> >
> > NeilBrown
> >


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] Continue expansion after reboot
  2011-02-25 15:55   ` Kwolek, Adam
@ 2011-02-27  6:51     ` NeilBrown
  2011-02-28 13:35       ` Kwolek, Adam
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2011-02-27  6:51 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech

On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: Kwolek, Adam
> > Sent: Wednesday, February 23, 2011 10:02 AM
> > To: 'NeilBrown'
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: RE: [PATCH 0/3] Continue expansion after reboot
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Wednesday, February 23, 2011 4:38 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> > >
> > > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek <adam.kwolek@intel.com>
> > > wrote:
> > >
> > > > Currently reshaped/expanded array is assembled but it stays in
> > > inactive state.
> > > > This patches allows for array assembly when array is under
> > expansion.
> > > > Array with reshape/expansion information in metadata is assembled
> > > > and reshape process continues automatically.
> > > >
> > > > Next step:
> > > > Problem is how to address container operation during assembly.
> > > > 1. After first array being reshaped, assebly process looks if mdmon
> > > >    sets migration for other array in container. If yes it continues
> > > work
> > > >    for next array.
> > > >
> > > > 2. Assembly process performs reshape of currently reshaped array
> > only.
> > > >    Mdmon sets next array for reshape and user triggers manually
> > mdadm
> > > >    to finish container operation with just the same parameters set.
> > > >
> > > > Reshape finish can be executed for container operation by container
> > > re-assembly
> > > > also (this works in current code).
> > > >
> > >
> > > Yes, this is an awkward problem.
> > >
> > > Just to be sure we are thinking about the same thing:
> > >   When restarting an array in which migration is already underway
> > mdadm
> > > simply
> > >   forks and continues monitoring that migration.
> > >   However if it is an array-wide migration, then when the migration of
> > > the
> > >   first array completes, mdmon will update the metadata on the second
> > > array,
> > >   but it isn't clear how mdadm can be told to start monitoring that
> > > array.
> > >
> > > How about this:
> > >   the imsm metadata handler should report that an array is 'undergoing
> > >   migration if it is, or if an earlier array in the container is
> > > undergoing a
> > >   migration which will cause 'this' array to subsequently be migrated
> > > too.
> > >
> > >   So if the first array is in the middle of a 4drive->5drive
> > conversion
> > > and
> > >   the second array is simply at '4 drives', then imsm reported (to
> > >   container_content) that the second drive is actually undergoing a
> > > migration
> > >   from 4 to 5 drives, and is at the very beginning.
> > >
> > >   When mdadm assembles that second array it will fork a child to
> > monitor
> > > it.
> > >   It will need to somehow wait for mdmon to really update the metadata
> > > before
> > >   it starts.  This can probably be handled in the ->manage_reshape
> > > function.
> > >
> > > Something along those line would be the right way to go I think.  It
> > > avoid
> > > any races between arrays being assembled at different times.
> > 
> > 
> > This looks fine for me.
> > 
> > >
> > >
> > > > Adam Kwolek (3):
> > > >       FIX: Assemble device in reshape state with new disks number
> > >
> > > I don't think this patch is correct.  We need to configure the array
> > > with the
> > > 'old' number of devices first, then 'reshape_array' will also set the
> > > 'new'
> > > number of devices.
> > > What exactly what the problem you were  trying to fix?
> > 
> > When array is being assembled with old raid disk number assembly cannot
> > set readOnly array state
> > (error on sysfs state writing). Array stays in inactive state, so
> > nothing (reshape) happened later.
> > 
> > I think that array cannot be assembled with old disks number (added new
> > disks are present as spares)
> > because begin of array uses new disks already. This means we are
> > assembling array with not complete disk set.
> > Stripes on begin can be corrupted (not all disks present in array). At
> > this point inactive array state is ok to keep safe user data.
> > 
> > 
> > I'll test is setting old disk number and later configuration change in
> > disks number and array state resolves problem.
> > I'll let you know results.
> 
> I've made some investigations. I've tried assemble algorithm (as you suggested):
> Conditions:
>  reshape 3 disk raid5 array to 4 disks raid5 array
>    is interrupted. Restart is invoked by command 'mdadm -As'
> 
> 1. Assemble() builds container with new disks number
> 2. Assemble() builds container content (array with /old/ 3 disks)
> 3. array is set to frozen to block monitor
> 4. sync_max in sysfs is set to 0, to block md until reshape monitoring takes carry about reshape process
> 5. Continue_reshape() starts reshape process
> 6. Continue_reshape() continues reshape process
> 
> Problems I've met:
> 1. not all disks in Assembly() are added to array (old disks number limitation)

I want to fix this by getting sysfs_set_array to set up the new raid_disks
number.
It currently doesn't because the number of disks that md is to expect could
be different to the number of disks recorded in the metadata, and
"analyse_change" might be needed to resolve the difference.
A particular example is that the metadata might think a RAID0 is changing from
4 device to 5 devices, but md need to be told that a RAID4 is changing from
5 devices to 6 devices.
However in the case, we really need to do the 'analyse_change' before calling
sysfs_set_array anyway.

So get sysfs_set_array to set up the array fully, and find somewhere
appropriate to put a call to analyse_change ... possibly modifying
analyse_change a bit ...

> 2. setting reshape_position invokes automatically reshape start in md on array run

That shouldn't be a problem..  We start the array read-only and the reshape
will not start while that is set.
So:
  set 'old' shape of array,
  set reshape_position
  set 'new' shape of array
  start array 'readonly'
  set sync_max to 0
  enable read/write
  allow reshape to continue while monitoring it with mdadm.

Does this work, or is there something I have missed.


> 3. setting of reshape position clears delta_disks in md (and other parameters, for now not important)

That shouldn't matter ... where do we set reshape_position that it causes
a problem?

> 4. Assembly() closes handle to array (it has to be not closed and used in reshape continuation)

I'm not sure what you are getting at.... reshape continuation is handled by 
Grow_continue which is passed the handle to the array.  It should fork and
monitor the array in the background, so it has its own copy of the handle
???


> 5. reshape continuation can require backup file. It depends where it was interrupted during expansion,
>    Other reshapes can always require backup file 

Yes ... Why is this a problem?

> 6. to run reshape, 'reshape' has to be written to sync_action.
>     Raid5_start_reshape() is not prepared for reshape restart (i.e reshape position can be 0 or max array value
>     - it depends on operation grow/shrink)

Yes ... raid5_start_reshape isn't used for restarting a reshape.
run() will start the reshape thread, which will not run because the array is
read-only
Once you switch the array to read-write the sync_thread should get woken up
and will continue the reshape.


I think the remainder of your email is also addressed by what I have said
above so I won't try to address specific things.

Please let me know if you see any problem with what I have outlined.

Thanks!

NeilBrown




> 7. After array start flag MD_RECOVERY_NEEDED is set, so reshape cannot be started from mdadm
>     As array is started with not all disks (old raid disks), we cannot allow for such check (???)
>     I've made workaround (setting reshape position clears this flag for external meta)
> 
> I've started reshape again on /all/ new disks number, but it still starts from array begin. This is a matter of search where checkpoint is lost.
> 
> I've tested my first idea also.
> To do as much as we can, as for native meta (reshape is started by array run).
> Some problems are similar as before (p.4, p.5)
> The only serious problem, that I've got with this is how to let to know md about delta_disks.
> I've resolved it by adding special case in raid_disks_store(), 
> similar to native metadata when old_disks number is guessed.
> For external metadata, I am storing old and then new disks numbers, md calculates delta disks from this raid disks numbers sequence. 
> (as I remember you do not want to expose delta disks in sysfs). 
> 
> Other issue that I'm observing in both methods is sync_action sysfs entry behavior. It reports reshape->idle->reshape...
> This 'idle' for a very short time causes migration cancelation. I've made workaround in mdmon for a now.
> 
> Both methods are not fully workable yet, but I think this will change on Monday.
> 
> Considering above, I still like more method when we construct array with new disks number.
> Begin of array /already reshaped/ has all disks present. In the same way md works for native arrays.
> 
> I'm waiting for your comments/questions/ideas.
> 
> BR
> Adam
> 
> > 
> > BR
> > Adam
> > 
> > >
> > >
> > > >       imsm: FIX: Report correct array size during reshape
> > > >       imsm: FIX: initalize reshape progress as it is stored in
> > > metatdata
> > > >
> > > These both look good - I have applied them.  Thanks.
> > >
> > > NeilBrown
> > >


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 0/3] Continue expansion after reboot
  2011-02-27  6:51     ` NeilBrown
@ 2011-02-28 13:35       ` Kwolek, Adam
  0 siblings, 0 replies; 9+ messages in thread
From: Kwolek, Adam @ 2011-02-28 13:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Williams, Dan J, Ciechanowski, Ed,
	Neubauer, Wojciech



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Sunday, February 27, 2011 7:51 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] Continue expansion after reboot
> 
> On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Kwolek, Adam
> > > Sent: Wednesday, February 23, 2011 10:02 AM
> > > To: 'NeilBrown'
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > Neubauer, Wojciech
> > > Subject: RE: [PATCH 0/3] Continue expansion after reboot
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > Sent: Wednesday, February 23, 2011 4:38 AM
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > > > Neubauer, Wojciech
> > > > Subject: Re: [PATCH 0/3] Continue expansion after reboot
> > > >
> > > > On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek
> <adam.kwolek@intel.com>
> > > > wrote:
> > > >
> > > > > Currently reshaped/expanded array is assembled but it stays in
> > > > inactive state.
> > > > > This patches allows for array assembly when array is under
> > > expansion.
> > > > > Array with reshape/expansion information in metadata is
> assembled
> > > > > and reshape process continues automatically.
> > > > >
> > > > > Next step:
> > > > > Problem is how to address container operation during assembly.
> > > > > 1. After first array being reshaped, assebly process looks if
> mdmon
> > > > >    sets migration for other array in container. If yes it
> continues
> > > > work
> > > > >    for next array.
> > > > >
> > > > > 2. Assembly process performs reshape of currently reshaped array
> > > only.
> > > > >    Mdmon sets next array for reshape and user triggers manually
> > > mdadm
> > > > >    to finish container operation with just the same parameters
> set.
> > > > >
> > > > > Reshape finish can be executed for container operation by
> container
> > > > re-assembly
> > > > > also (this works in current code).
> > > > >
> > > >
> > > > Yes, this is an awkward problem.
> > > >
> > > > Just to be sure we are thinking about the same thing:
> > > >   When restarting an array in which migration is already underway
> > > mdadm
> > > > simply
> > > >   forks and continues monitoring that migration.
> > > >   However if it is an array-wide migration, then when the
> migration of
> > > > the
> > > >   first array completes, mdmon will update the metadata on the
> second
> > > > array,
> > > >   but it isn't clear how mdadm can be told to start monitoring
> that
> > > > array.
> > > >
> > > > How about this:
> > > >   the imsm metadata handler should report that an array is
> 'undergoing
> > > >   migration if it is, or if an earlier array in the container is
> > > > undergoing a
> > > >   migration which will cause 'this' array to subsequently be
> migrated
> > > > too.
> > > >
> > > >   So if the first array is in the middle of a 4drive->5drive
> > > conversion
> > > > and
> > > >   the second array is simply at '4 drives', then imsm reported (to
> > > >   container_content) that the second drive is actually undergoing
> a
> > > > migration
> > > >   from 4 to 5 drives, and is at the very beginning.
> > > >
> > > >   When mdadm assembles that second array it will fork a child to
> > > monitor
> > > > it.
> > > >   It will need to somehow wait for mdmon to really update the
> metadata
> > > > before
> > > >   it starts.  This can probably be handled in the ->manage_reshape
> > > > function.
> > > >
> > > > Something along those line would be the right way to go I think.
> It
> > > > avoid
> > > > any races between arrays being assembled at different times.
> > >
> > >
> > > This looks fine for me.
> > >
> > > >
> > > >
> > > > > Adam Kwolek (3):
> > > > >       FIX: Assemble device in reshape state with new disks
> number
> > > >
> > > > I don't think this patch is correct.  We need to configure the
> array
> > > > with the
> > > > 'old' number of devices first, then 'reshape_array' will also set
> the
> > > > 'new'
> > > > number of devices.
> > > > What exactly what the problem you were  trying to fix?
> > >
> > > When array is being assembled with old raid disk number assembly
> cannot
> > > set readOnly array state
> > > (error on sysfs state writing). Array stays in inactive state, so
> > > nothing (reshape) happened later.
> > >
> > > I think that array cannot be assembled with old disks number (added
> new
> > > disks are present as spares)
> > > because begin of array uses new disks already. This means we are
> > > assembling array with not complete disk set.
> > > Stripes on begin can be corrupted (not all disks present in array).
> At
> > > this point inactive array state is ok to keep safe user data.
> > >
> > >
> > > I'll test is setting old disk number and later configuration change
> in
> > > disks number and array state resolves problem.
> > > I'll let you know results.
> >
> > I've made some investigations. I've tried assemble algorithm (as you
> suggested):
> > Conditions:
> >  reshape 3 disk raid5 array to 4 disks raid5 array
> >    is interrupted. Restart is invoked by command 'mdadm -As'
> >
> > 1. Assemble() builds container with new disks number
> > 2. Assemble() builds container content (array with /old/ 3 disks)
> > 3. array is set to frozen to block monitor
> > 4. sync_max in sysfs is set to 0, to block md until reshape monitoring
> takes carry about reshape process
> > 5. Continue_reshape() starts reshape process
> > 6. Continue_reshape() continues reshape process
> >
> > Problems I've met:
> > 1. not all disks in Assembly() are added to array (old disks number
> limitation)
> 
> I want to fix this by getting sysfs_set_array to set up the new
> raid_disks
> number.
> It currently doesn't because the number of disks that md is to expect
> could
> be different to the number of disks recorded in the metadata, and
> "analyse_change" might be needed to resolve the difference.
> A particular example is that the metadata might think a RAID0 is
> changing from
> 4 device to 5 devices, but md need to be told that a RAID4 is changing
> from
> 5 devices to 6 devices.
> However in the case, we really need to do the 'analyse_change' before
> calling
> sysfs_set_array anyway.
> 
> So get sysfs_set_array to set up the array fully, and find somewhere
> appropriate to put a call to analyse_change ... possibly modifying
> analyse_change a bit ...

In the nearest patches that I hope makes me closer to the final code, I will not use analyze changes.
I want to have that makes the thing (restore from checkpoint) workable first.

> 
> > 2. setting reshape_position invokes automatically reshape start in md
> on array run
> 
> That shouldn't be a problem..  We start the array read-only and the
> reshape
> will not start while that is set.
> So:
>   set 'old' shape of array,
>   set reshape_position
>   set 'new' shape of array
>   start array 'readonly'
>   set sync_max to 0
>   enable read/write
>   allow reshape to continue while monitoring it with mdadm.
> 
> Does this work, or is there something I have missed.

Everything in mdadm seems to be ok.
One small problem is in md (raid5.c:5052):
For grow case there is check for checkpoint. For my code chunk_sectors and new_chunk_sectors
are the same, so array is not started. If I ignore '==' case array can be assembled.
 
> 
> 
> > 3. setting of reshape position clears delta_disks in md (and other
> parameters, for now not important)
> 
> That shouldn't matter ... where do we set reshape_position that it
> causes
> a problem?

Yes I've found it during my tests.

> 
> > 4. Assembly() closes handle to array (it has to be not closed and used
> in reshape continuation)
> 
> I'm not sure what you are getting at.... reshape continuation is handled
> by
> Grow_continue which is passed the handle to the array.  It should fork
> and
> monitor the array in the background, so it has its own copy of the
> handle
> ???

Assemble_container_content() closes array handle. I've not fork() inside this function,
but probably this will be better.

> 
> > 5. reshape continuation can require backup file. It depends where it
> was interrupted during expansion,
> >    Other reshapes can always require backup file
> 
> Yes ... Why is this a problem?

... not a problem, rather TBD ;)
Assemble operation has not pointed backup file, so backup file name has to be generated.


> 
> > 6. to run reshape, 'reshape' has to be written to sync_action.
> >     Raid5_start_reshape() is not prepared for reshape restart (i.e
> reshape position can be 0 or max array value
> >     - it depends on operation grow/shrink)
> 
> Yes ... raid5_start_reshape isn't used for restarting a reshape.
> run() will start the reshape thread, which will not run because the
> array is
> read-only
> Once you switch the array to read-write the sync_thread should get woken
> up
> and will continue the reshape.

This is what I wanted to hear :).

> 
> 
> I think the remainder of your email is also addressed by what I have
> said
> above so I won't try to address specific things.
> 
> Please let me know if you see any problem with what I have outlined.
> 
> Thanks!
> 
> NeilBrown

Everything is more or less clear, I'll prepare few patches that allows
For reshape continuation for expansion to get your feedback.

BR
Adam


> 
> 
> 
> > 7. After array start flag MD_RECOVERY_NEEDED is set, so reshape cannot
> be started from mdadm
> >     As array is started with not all disks (old raid disks), we cannot
> allow for such check (???)
> >     I've made workaround (setting reshape position clears this flag
> for external meta)
> >
> > I've started reshape again on /all/ new disks number, but it still
> starts from array begin. This is a matter of search where checkpoint is
> lost.
> >
> > I've tested my first idea also.
> > To do as much as we can, as for native meta (reshape is started by
> array run).
> > Some problems are similar as before (p.4, p.5)
> > The only serious problem, that I've got with this is how to let to
> know md about delta_disks.
> > I've resolved it by adding special case in raid_disks_store(),
> > similar to native metadata when old_disks number is guessed.
> > For external metadata, I am storing old and then new disks numbers, md
> calculates delta disks from this raid disks numbers sequence.
> > (as I remember you do not want to expose delta disks in sysfs).
> >
> > Other issue that I'm observing in both methods is sync_action sysfs
> entry behavior. It reports reshape->idle->reshape...
> > This 'idle' for a very short time causes migration cancelation. I've
> made workaround in mdmon for a now.
> >
> > Both methods are not fully workable yet, but I think this will change
> on Monday.
> >
> > Considering above, I still like more method when we construct array
> with new disks number.
> > Begin of array /already reshaped/ has all disks present. In the same
> way md works for native arrays.
> >
> > I'm waiting for your comments/questions/ideas.
> >
> > BR
> > Adam
> >
> > >
> > > BR
> > > Adam
> > >
> > > >
> > > >
> > > > >       imsm: FIX: Report correct array size during reshape
> > > > >       imsm: FIX: initalize reshape progress as it is stored in
> > > > metatdata
> > > > >
> > > > These both look good - I have applied them.  Thanks.
> > > >
> > > > NeilBrown
> > > >


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-02-28 13:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-22 14:13 [PATCH 0/3] Continue expansion after reboot Adam Kwolek
2011-02-22 14:13 ` [PATCH 1/3] imsm: FIX: initalize reshape progress as it is stored in metatdata Adam Kwolek
2011-02-22 14:13 ` [PATCH 2/3] imsm: FIX: Report correct array size during reshape Adam Kwolek
2011-02-22 14:13 ` [PATCH 3/3] FIX: Assemble device in reshape state with new disks number Adam Kwolek
2011-02-23  3:37 ` [PATCH 0/3] Continue expansion after reboot NeilBrown
2011-02-23  9:02   ` Kwolek, Adam
2011-02-25 15:55   ` Kwolek, Adam
2011-02-27  6:51     ` NeilBrown
2011-02-28 13:35       ` Kwolek, Adam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox