RE: Devel 3.2 branch issues

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* RE: Devel 3.2 branch issues
@ 2010-11-22 22:39 Czarnowska, Anna
  2010-11-23  0:52 ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-22 22:39 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

> by the way, some of the changes in you of the patches you sent have not
> been
> included in any form.  They include:
> 
> - the getinfo_super_disks method.  I couldn't see why you need this.
> All the
>   info about the state of the arrays should already be available.
>   If there is something that you need that we don't have, please
> explain and
>   we can see how best to add it back in.

Marcin has already answered this but here is my explanation.
Current test devstate[i]==0 is always true for container so any device seems a good candidate to move.
To be able to identify members, failed devices and real spares we updated devstate for containers.
To find members we can just check which disks are used in subarrays, but a failed disk is removed from subarray after a short while and as soon as it happens we are not able to see a difference between the failed disk and a spare unless we look at metadata. 

> - min_active_disk_size_in_array.  I don't think the minimum current
> size is
>   really a good guide.  I've kept the code for letting the metadata
> handler
>   check the size, but anything beyond that should be done with domains
> I
>   think.
>   E.g have a domain '2G-or-greater' which is assigned to all 2G or
> greater
>   devices.  Then anything smaller will automatically be excluded from
> arrays
>   with those devices.

So if someone doesn't base domains on size they may have a small spare added to an array where it cannot be used. 
Min_active_disk_size was more than required for an array that didn't occupy the whole disk but at least it ensured that we are not throwing in something that wouldn't help. If we do this the array will remain degraded but will have spare - so Monitor may think it does not need more.
For this reason we also checked the case when there was a spare in "to" container. If the spare was not suitable (size check here too) we would still look for a good one. 

And now back to assembly. There is still a segmentation fault when we try to assemble a subarray. Occurs when there is any config file and we run "mdadm -As" or "mdadm -Asc /etc/mdadm.conf". content is NULL when we try to compare uuid in line 413 in Assemble.c.
We are going to prepare some tests to add to current suite so it will be easier to verify new patches. 

Regards
Anna
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Devel 3.2 branch issues
  2010-11-22 22:39 Devel 3.2 branch issues Czarnowska, Anna
@ 2010-11-23  0:52 ` Neil Brown
  2010-11-23 12:04   ` Czarnowska, Anna
  2010-11-25  8:01   ` Neil Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Neil Brown @ 2010-11-23  0:52 UTC (permalink / raw)
  To: Czarnowska, Anna
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

On Mon, 22 Nov 2010 22:39:00 +0000
"Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:

> 
> > by the way, some of the changes in you of the patches you sent have not
> > been
> > included in any form.  They include:
> > 
> > - the getinfo_super_disks method.  I couldn't see why you need this.
> > All the
> >   info about the state of the arrays should already be available.
> >   If there is something that you need that we don't have, please
> > explain and
> >   we can see how best to add it back in.
> 
> Marcin has already answered this but here is my explanation.
> Current test devstate[i]==0 is always true for container so any device seems a good candidate to move.
> To be able to identify members, failed devices and real spares we updated devstate for containers.
> To find members we can just check which disks are used in subarrays, but a failed disk is removed from subarray after a short while and as soon as it happens we are not able to see a difference between the failed disk and a spare unless we look at metadata. 

Thanks.  That makes sense.  I'll look at the code and see about applying it.

> 
> > - min_active_disk_size_in_array.  I don't think the minimum current
> > size is
> >   really a good guide.  I've kept the code for letting the metadata
> > handler
> >   check the size, but anything beyond that should be done with domains
> > I
> >   think.
> >   E.g have a domain '2G-or-greater' which is assigned to all 2G or
> > greater
> >   devices.  Then anything smaller will automatically be excluded from
> > arrays
> >   with those devices.
>  
> So if someone doesn't base domains on size they may have a small spare added to an array where it cannot be used. 
> Min_active_disk_size was more than required for an array that didn't occupy the whole disk but at least it ensured that we are not throwing in something that wouldn't help. If we do this the array will remain degraded but will have spare - so Monitor may think it does not need more.
> For this reason we also checked the case when there was a spare in "to" container. If the spare was not suitable (size check here too) we would still look for a good one. 

I don't think it is possible to come up with an automatic way to determine if
a given spare suits a given array that is always correct.  There are too many
subtleties.
So I would like to allow the sysadmin to exercise complete control, and have
defaults that make reasonable sense in common cases.

The 'complete control' can be exercised through domain - though I will
probably add some size based rule mechanism to the policy code so devices can
be categorised by size if wanted.
The 'safe default' is probably best left to the metadata handler.  So
ultimately all metadata types *should* specify min_acceptable_spare_size,
and we will just make do with that.

Does that sound OK?


> 
> And now back to assembly. There is still a segmentation fault when we try to assemble a subarray. Occurs when there is any config file and we run "mdadm -As" or "mdadm -Asc /etc/mdadm.conf". content is NULL when we try to compare uuid in line 413 in Assemble.c.

Yes - patch below should fix this.

> We are going to prepare some tests to add to current suite so it will be easier to verify new patches. 

That would be greatly appreciated!!

Thanks,
NeilBrown

commit 87477e6d5e4201bf2bd812f34f8321983310bd99
Author: NeilBrown <neilb@suse.de>
Date:   Tue Nov 23 11:34:36 2010 +1100

    Assemble: get content before testing it.
    
    When checking that a container matches the required uuid,
    we need to call 'getinfo_super' before we have a 'content'
    to test.
    
    Reported-by: "Czarnowska, Anna" <anna.czarnowska@intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Assemble.c b/Assemble.c
index 1a1e128..607f2af 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -409,6 +409,11 @@ int Assemble(struct supertype *st, char *mddev,
                                if (ident->container[0] != '/') {
                                        /* we have a uuid */
                                        int uuid[4];
+
+                                       content = &info;
+                                       memset(content, 0, sizeof(*content));
+                                       tst->ss->getinfo_super(tst, content, NULL);
+
                                        if (!parse_uuid(ident->container, uuid) ||
                                            !same_uuid(content->uuid, uuid, tst->ss->swapuuid)) {
                                                if (report_missmatch)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: Devel 3.2 branch issues
  2010-11-23  0:52 ` Neil Brown
@ 2010-11-23 12:04   ` Czarnowska, Anna
  2010-11-25  8:01   ` Neil Brown
  1 sibling, 0 replies; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-23 12:04 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw


> I don't think it is possible to come up with an automatic way to
> determine if
> a given spare suits a given array that is always correct.  There are
> too many
> subtleties.
> So I would like to allow the sysadmin to exercise complete control, and
> have
> defaults that make reasonable sense in common cases.
> 
> The 'complete control' can be exercised through domain - though I will
> probably add some size based rule mechanism to the policy code so
> devices can
> be categorised by size if wanted.
> The 'safe default' is probably best left to the metadata handler.  So
> ultimately all metadata types *should* specify
> min_acceptable_spare_size,
> and we will just make do with that.
> 
> Does that sound OK?

Yes. 
When all metadata types have min_acceptable_spare_size there will be no need for min_active_disk_size at all.
One more thought: Manage_subdevs checks component_size so for native metadata it will not allow to add a spare that is too small.
But checking size in Monitor will prevent unnecessary removal and re-adding. 
It would make sense to get Manage_subdevs to check the size properly for external metadata too.
Anna
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Devel 3.2 branch issues
  2010-11-23  0:52 ` Neil Brown
  2010-11-23 12:04   ` Czarnowska, Anna
@ 2010-11-25  8:01   ` Neil Brown
  2010-11-25 10:28     ` Czarnowska, Anna
  1 sibling, 1 reply; 12+ messages in thread
From: Neil Brown @ 2010-11-25  8:01 UTC (permalink / raw)
  To: Czarnowska, Anna
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

On Tue, 23 Nov 2010 11:52:13 +1100 Neil Brown <neilb@suse.de> wrote:

> On Mon, 22 Nov 2010 22:39:00 +0000
> "Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:
> 
> > 
> > > by the way, some of the changes in you of the patches you sent have not
> > > been
> > > included in any form.  They include:
> > > 
> > > - the getinfo_super_disks method.  I couldn't see why you need this.
> > > All the
> > >   info about the state of the arrays should already be available.
> > >   If there is something that you need that we don't have, please
> > > explain and
> > >   we can see how best to add it back in.
> > 
> > Marcin has already answered this but here is my explanation.
> > Current test devstate[i]==0 is always true for container so any device seems a good candidate to move.
> > To be able to identify members, failed devices and real spares we updated devstate for containers.
> > To find members we can just check which disks are used in subarrays, but a failed disk is removed from subarray after a short while and as soon as it happens we are not able to see a difference between the failed disk and a spare unless we look at metadata. 
> 
> Thanks.  That makes sense.  I'll look at the code and see about applying it.
> 

OK, I have something, though I haven't tested it.

It uses your getinfo_super_disks and does the following to choose a spare
from an external array.  There are a couple of rearrangement patches before
this so it won't apply as-it, but should appear in my devel-3.2 within a few
hours.

NeilBrown

commit 5739e0d007a3eea80f5108d73d444751dbbde1ef
Author: NeilBrown <neilb@suse.de>
Date:   Thu Nov 25 18:58:27 2010 +1100

    Monitor: choose spare correctly for external metadata.
    
    When metadata is managed externally - probably as a container - we
    need to examine that metadata to see which devices are spares.
    
    So use the getinfo_super_disk message and use the info returned.
    
    Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Monitor.c b/Monitor.c
index 5fc18d1..9ba49f2 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -798,6 +798,63 @@ static int choose_spare(struct state *from, struct state *to,
 	return dev;
 }
 
+static int container_choose_spare(struct state *from, struct state *to,
+				  struct domainlist *domlist)
+{
+	/* This is similar to choose_spare, but we cannot trust devstate,
+	 * so we need to read the metadata instead
+	 */
+
+	struct supertype *st = from->metadata;
+	int fd = open(st->devname, O_RDONLY);
+	int err;
+	struct mdinfo *disks, *d;
+	unsigned long long min_size
+		= min_spare_size_required(to);
+	int dev;
+
+	if (fd < 0)
+		return 0;
+	if (!st->ss->getinfo_super_disks)
+		return 0;
+	
+	err = st->ss->load_container(st, fd, NULL);
+	close(fd);
+	if (err)
+		return 0;
+
+	disks = st->ss->getinfo_super_disks(st);
+	st->ss->free_super(st);
+
+	if (!disks)
+		return 0;
+	
+	for (d = disks->devs ; d && !dev ; d = d->next) {
+		if (d->disk.state == 0) {
+			struct dev_policy *pol;
+			unsigned long long dev_size;
+			dev = makedev(d->disk.major,d->disk.minor);
+			
+			if (min_size &&
+			    dev_size_from_id(dev,  &dev_size) &&
+			    dev_size < min_size)
+				continue;
+
+			pol = devnum_policy(dev);
+			if (from->spare_group)
+				pol_add(&pol, pol_domain,
+					from->spare_group, NULL);
+			if (!domain_test(domlist, pol, to->metadata->ss->name))
+				dev = 0;
+			
+			dev_policy_free(pol);
+		}
+	}
+	sysfs_free(disks);
+	return dev;
+}
+
+
 static void try_spare_migration(struct state *statelist, struct alert_info *info)
 {
 	struct state *from;
@@ -827,7 +884,11 @@ static void try_spare_migration(struct state *statelist, struct alert_info *info
 				int devid;
 				if (!check_donor(from, to, domlist))
 					continue;
-				devid = choose_spare(from, to, domlist);
+				if (from->metadata->ss->external)
+					devid = container_choose_spare(
+						from, to, domlist);
+				else
+					devid = choose_spare(from, to, domlist);
 				if (devid > 0
 				    && move_spare(from, to, devid, info))
 						break;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: Devel 3.2 branch issues
  2010-11-25  8:01   ` Neil Brown
@ 2010-11-25 10:28     ` Czarnowska, Anna
  2010-11-26 18:23       ` Czarnowska, Anna
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-25 10:28 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Neil Brown
> Sent: Thursday, November 25, 2010 9:02 AM
> To: Czarnowska, Anna
> Cc: linux-raid@vger.kernel.org; Neubauer, Wojciech; Williams, Dan J;
> Ciechanowski, Ed; Labun, Marcin; Hawrylewicz Czarnowski, Przemyslaw
> Subject: Re: Devel 3.2 branch issues
> 
> On Tue, 23 Nov 2010 11:52:13 +1100 Neil Brown <neilb@suse.de> wrote:
> 
> > On Mon, 22 Nov 2010 22:39:00 +0000
> > "Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:
> >
> > >
> > > > by the way, some of the changes in you of the patches you sent
> have not
> > > > been
> > > > included in any form.  They include:
> > > >
> > > > - the getinfo_super_disks method.  I couldn't see why you need
> this.
> > > > All the
> > > >   info about the state of the arrays should already be available.
> > > >   If there is something that you need that we don't have, please
> > > > explain and
> > > >   we can see how best to add it back in.
> > >
> > > Marcin has already answered this but here is my explanation.
> > > Current test devstate[i]==0 is always true for container so any
> device seems a good candidate to move.
> > > To be able to identify members, failed devices and real spares we
> updated devstate for containers.
> > > To find members we can just check which disks are used in
> subarrays, but a failed disk is removed from subarray after a short
> while and as soon as it happens we are not able to see a difference
> between the failed disk and a spare unless we look at metadata.
> >
> > Thanks.  That makes sense.  I'll look at the code and see about
> applying it.
> >
> 
> OK, I have something, though I haven't tested it.
> 
> It uses your getinfo_super_disks and does the following to choose a
> spare
> from an external array.  There are a couple of rearrangement patches
> before
> this so it won't apply as-it, but should appear in my devel-3.2 within
> a few
> hours.
> 
> NeilBrown
> 

Well, this didn't help. 
In the set of tests I have just posted even the basic ones fail for imsm.
For native there are still some problems with tests:
 5c - spare not moved to degraded array in the same domain. This is really basic test with 4 arrays instead of 2.
 9 - spare moved between different metadata arrays
13 - spare moved despite action=include which doesn't allow migration

Test9 run in scan mode generates a segmentation fault.

I will have a look at this in debugger and give you more info on the reasons later on.

Anna

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Devel 3.2 branch issues
  2010-11-25 10:28     ` Czarnowska, Anna
@ 2010-11-26 18:23       ` Czarnowska, Anna
  2010-11-28 22:59         ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-26 18:23 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

> Well, this didn't help.
> In the set of tests I have just posted even the basic ones fail for
> imsm.
> For native there are still some problems with tests:
>  5c - spare not moved to degraded array in the same domain. This is
> really basic test with 4 arrays instead of 2.
>  9 - spare moved between different metadata arrays
> 13 - spare moved despite action=include which doesn't allow migration
> 
> Test9 run in scan mode generates a segmentation fault.
> 
> I will have a look at this in debugger and give you more info on the
> reasons later on.
> 
> Anna

After applying yesterday's fixes test5 and test9 don't fail any more for native.

Test6 often fails because Monitor keeps removing and re-adding spare
that is too small to add to degraded array. Test fails when we see it removed.

I have prepared few further fixes to address possible problems with Monitor
as described in each patch.

Spare migration works now also for imsm. (test12 and 13 still fail).
Test12 - two spares are taken when just one needed.
Test13 - action=include so spare should not be moved

Scan mode still needs to be investigated.

Anna

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Devel 3.2 branch issues
  2010-11-26 18:23       ` Czarnowska, Anna
@ 2010-11-28 22:59         ` Neil Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Neil Brown @ 2010-11-28 22:59 UTC (permalink / raw)
  To: Czarnowska, Anna
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

On Fri, 26 Nov 2010 18:23:34 +0000 "Czarnowska, Anna"
<anna.czarnowska@intel.com> wrote:

> > Well, this didn't help.
> > In the set of tests I have just posted even the basic ones fail for
> > imsm.
> > For native there are still some problems with tests:
> >  5c - spare not moved to degraded array in the same domain. This is
> > really basic test with 4 arrays instead of 2.
> >  9 - spare moved between different metadata arrays
> > 13 - spare moved despite action=include which doesn't allow migration
> > 
> > Test9 run in scan mode generates a segmentation fault.
> > 
> > I will have a look at this in debugger and give you more info on the
> > reasons later on.
> > 
> > Anna
> 
> After applying yesterday's fixes test5 and test9 don't fail any more for native.
> 
> Test6 often fails because Monitor keeps removing and re-adding spare
> that is too small to add to degraded array. Test fails when we see it removed.
> 
> I have prepared few further fixes to address possible problems with Monitor
> as described in each patch.

Thanks for these patches and more particularly for all the testing effort,
finding and fixing my bugs!

I have applied them all.

NeilBrown


> 
> Spare migration works now also for imsm. (test12 and 13 still fail).
> Test12 - two spares are taken when just one needed.
> Test13 - action=include so spare should not be moved
> 
> Scan mode still needs to be investigated.
> 
> Anna


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Patch 00/17] Autorebuild
@ 2010-10-29 14:13 Czarnowska, Anna
  2010-11-17 10:22 ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-10-29 14:13 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw, Czarnowska, Anna

This is updated series of patches forming autorebuild functionality in mdadm monitor based on new policy code.

Autorebuild Monitoring application:
Autorebuild monitor is part of monitor application (mdadm -F). In the current code of mdadm monitor autorebuild feature was based on spare group assignment in mdadm.conf file and worked only for native metadata. This code has been retained for compatibility with old config format.
The new autorebuild implementation works also for external metadata types. It uses the concept of domains in mdadm.conf introduced by Neil Brown. 
Monitoring application periodically checks the state of MD active arrays and triggers a rebuild if there are eligible spare disks in other arrays/containers.
Degraded arrays are checked one by one. If there is a spare disk in other array/container that matches the domain of the degraded array and the domain action allows for spare sharing the spare is moved using existing Manage_subdevs function. If the addition fails, the spare device is moved back to the original container and next potential spare is tried. The process is repeated until all arrays are checked and the process is put into a sleep state for a configured period.

New option --no-sharing has been added to Monitor mode to be able to run monitoring only (without moving spares). This is recommended when many instances of monitor are to be run on the same set of devices.
Spare sharing is allowed in only one instance of Monitor running with --scan option. User is still able to start Monitoring functions in multiple instances without --scan option.

The autorebuild build-in assumptions are:
1\spares are shared between the arrays of the same metadata 2\spares are moved only from containers/volumes that are not degraded 3\spares are moved to containers/volumes lacking a *good* spare (size)

Anna Czarnowska
Przemyslaw Hawrylewicz-Czarnowski
Marcin Labun

0001-added-path-path_id-to-give-the-information-on-the-pa.patch
0002-Update-of-udev-rules-to-support-IMSM-devices.patch
0003-extension-of-IncrementalRemove-to-store-location-pat.patch
0004-Incremental-for-bare-disks-implementation-of-spare-s.patch
0005-Util-get-device-size-from-id.patch
0006-Monitor-set-err-on-arrays-not-in-mdstat.patch
0007-Monitor-spare-group-based-spare-sharing-moved-to-sep.patch
0008-mdadm-added-no-sharing-option-for-Monitor-mode.patch
0009-Monitor-avoid-skipping-checks-on-external-arays.patch
0010-Monitor-include-containers-in-scan-mode.patch
0011-Monitor-link-containers-with-subarrays-in-statelist.patch
0012-imsm-create-mdinfo-list-of-disks-in-a-container-from.patch
0013-Monitor-autorebuild-functionality-added.patch
0014-Monitor-Respect-policy-in-auto-rebuild-in-mdadm-moni.patch
0015-Monitor-more-accurate-size-check-when-looking-for-sp.patch
0016-IMSM-Fix-problem-in-mdmon-monitor-of-using-removed-d.patch
0017-Policy-is-aware-of-metadata-disk-s-controller-domain.patch

Incremental.c      |  230 +++++++++++++++---
Makefile           |    3 +
Monitor.c          |  691 ++++++++++++++++++++++++++++++++++++++++++++--------
ReadMe.c           |    4 +
managemon.c        |   38 +++
mdadm.c            |   29 ++-
mdadm.h            |   49 ++++-
policy.c           |  134 +++++++++-
super-intel.c      |  274 ++++++++++++++++++---
udev-md-raid.rules |    7 +-
util.c             |   23 ++
11 files changed, 1290 insertions(+), 192 deletions(-)
---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch 00/17] Autorebuild
  2010-10-29 14:13 [Patch 00/17] Autorebuild Czarnowska, Anna
@ 2010-11-17 10:22 ` Neil Brown
  2010-11-18 23:14   ` Czarnowska, Anna
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2010-11-17 10:22 UTC (permalink / raw)
  To: Czarnowska, Anna
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

On Fri, 29 Oct 2010 15:13:40 +0100
"Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:

> This is updated series of patches forming autorebuild functionality in mdadm monitor based on new policy code.

I Anna and all,

I have decided that the best way forward is for me to apply all of your
patches and fix them up on the way.  It turned out there was quite a few
changes that I wanted to make that I only discovered while examining the
patches very closely, so this seems to be a very useful exercise.

I haven't finished yet, but you can see my current state - which is up to
about patch 11 in your series - in my devel-3.2 branch.

Probably the most significant change so far is the interpretation of
action=spare-same-slot.  I have defined that in a way that is *more*
permissive than action=spare, where you had it less permissive.

The comment for that change set is below.
The new list of policy actions is:

enum policy_action {
	act_default,
	act_include,
	act_re_add,
	act_spare,	/* This only applies to bare devices */
	act_spare_same_slot, /* this allows non-bare devices,
			      * but only if recent removal */
	act_force_spare, /* this allow non-bare devices in any case */
	act_err
};

I'll hopefully get the rest of your patches done tomorrow and then I might
start on Adam... and then on the changes that I want to make.
If you can do some testing once enough code is in place, that would be great.
Thanks,
NeilBrown

commit a2191ce8af4aa178d62df759fab47ef4dc8e6f67
Author: NeilBrown <neilb@suse.de>
Date:   Wed Nov 17 12:46:35 2010 +1100

    Add action=spare-same-slot policy.
    
    When "mdadm -I" is given a device with no metadata, mdadm tries to add
    it as a 'spare' somewhere based on policy.
    
    This patch changes the behaviour in two ways:
    
    1/ If the device is at a 'path' where a previous device was removed
      from an array or container, then we preferentially add the spare to
      that array or container.
    
    2/ Previously only 'bare' devices were considered for adding as
      spares.  Now if action=spare-same-slot is active, we will add
      non-bare devices, but *only* if the path was previously in use
      for some array, and the device will only be added to that array.
    
    Based on code
      From: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
    
    Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com>
    Signed-off-by: NeilBrown <neilb@suse.de>


> 
> Autorebuild Monitoring application:
> Autorebuild monitor is part of monitor application (mdadm -F). In the current code of mdadm monitor autorebuild feature was based on spare group assignment in mdadm.conf file and worked only for native metadata. This code has been retained for compatibility with old config format.
> The new autorebuild implementation works also for external metadata types. It uses the concept of domains in mdadm.conf introduced by Neil Brown. 
> Monitoring application periodically checks the state of MD active arrays and triggers a rebuild if there are eligible spare disks in other arrays/containers.
> Degraded arrays are checked one by one. If there is a spare disk in other array/container that matches the domain of the degraded array and the domain action allows for spare sharing the spare is moved using existing Manage_subdevs function. If the addition fails, the spare device is moved back to the original container and next potential spare is tried. The process is repeated until all arrays are checked and the process is put into a sleep state for a configured period.
> 
> New option --no-sharing has been added to Monitor mode to be able to run monitoring only (without moving spares). This is recommended when many instances of monitor are to be run on the same set of devices.
> Spare sharing is allowed in only one instance of Monitor running with --scan option. User is still able to start Monitoring functions in multiple instances without --scan option.
> 
> The autorebuild build-in assumptions are:
> 1\spares are shared between the arrays of the same metadata 2\spares are moved only from containers/volumes that are not degraded 3\spares are moved to containers/volumes lacking a *good* spare (size)
> 
> Anna Czarnowska
> Przemyslaw Hawrylewicz-Czarnowski
> Marcin Labun
> 
> 0001-added-path-path_id-to-give-the-information-on-the-pa.patch
> 0002-Update-of-udev-rules-to-support-IMSM-devices.patch
> 0003-extension-of-IncrementalRemove-to-store-location-pat.patch
> 0004-Incremental-for-bare-disks-implementation-of-spare-s.patch
> 0005-Util-get-device-size-from-id.patch
> 0006-Monitor-set-err-on-arrays-not-in-mdstat.patch
> 0007-Monitor-spare-group-based-spare-sharing-moved-to-sep.patch
> 0008-mdadm-added-no-sharing-option-for-Monitor-mode.patch
> 0009-Monitor-avoid-skipping-checks-on-external-arays.patch
> 0010-Monitor-include-containers-in-scan-mode.patch
> 0011-Monitor-link-containers-with-subarrays-in-statelist.patch
> 0012-imsm-create-mdinfo-list-of-disks-in-a-container-from.patch
> 0013-Monitor-autorebuild-functionality-added.patch
> 0014-Monitor-Respect-policy-in-auto-rebuild-in-mdadm-moni.patch
> 0015-Monitor-more-accurate-size-check-when-looking-for-sp.patch
> 0016-IMSM-Fix-problem-in-mdmon-monitor-of-using-removed-d.patch
> 0017-Policy-is-aware-of-metadata-disk-s-controller-domain.patch
> 
> Incremental.c      |  230 +++++++++++++++---
> Makefile           |    3 +
> Monitor.c          |  691 ++++++++++++++++++++++++++++++++++++++++++++--------
> ReadMe.c           |    4 +
> managemon.c        |   38 +++
> mdadm.c            |   29 ++-
> mdadm.h            |   49 ++++-
> policy.c           |  134 +++++++++-
> super-intel.c      |  274 ++++++++++++++++++---
> udev-md-raid.rules |    7 +-
> util.c             |   23 ++
> 11 files changed, 1290 insertions(+), 192 deletions(-)
> ---------------------------------------------------------------------
> Intel Technology Poland sp. z o.o.
> z siedziba w Gdansku
> ul. Slowackiego 173
> 80-298 Gdansk
> 
> Sad Rejonowy Gdansk Polnoc w Gdansku, 
> VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
> numer KRS 101882
> 
> NIP 957-07-52-316
> Kapital zakladowy 200.000 zl
> 
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Patch 00/17] Autorebuild
  2010-11-17 10:22 ` Neil Brown
@ 2010-11-18 23:14   ` Czarnowska, Anna
  2010-11-19 12:43     ` Devel 3.2 branch issues Czarnowska, Anna
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-18 23:14 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

Hi Neil,
I started testing new code today. Just the Incremental part.

There are few problems:
1. Cookie file is cleared before it is read so spare-same-slot can't work. It should be just open for reading. (probably a typo)
2. Container uuid instead of subarray uuid is written in cookie file, so for ddf it may not be clear which subarray used the slot.
3. Incremental fail does not work for external metadata. Przemek's original patch did fail the disk in subarrays. Now Manage_subdevs tries to fail a disk in container while subarray is expected. Do you intend to change Manage_subdevs to take a container?
4. With spare-same-slot when there is a cookie and disk has no metadata then we probably shouldn't look at domains. Just add.

This is all for now.
Regards
Anna


> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Wednesday, November 17, 2010 11:23 AM
> To: Czarnowska, Anna
> Cc: linux-raid@vger.kernel.org; Neubauer, Wojciech; Williams, Dan J;
> Ciechanowski, Ed; Labun, Marcin; Hawrylewicz Czarnowski, Przemyslaw
> Subject: Re: [Patch 00/17] Autorebuild
> 
> On Fri, 29 Oct 2010 15:13:40 +0100
> "Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:
> 
> > This is updated series of patches forming autorebuild functionality
> in mdadm monitor based on new policy code.
> 
> I Anna and all,
> 
> I have decided that the best way forward is for me to apply all of your
> patches and fix them up on the way.  It turned out there was quite a
> few
> changes that I wanted to make that I only discovered while examining
> the
> patches very closely, so this seems to be a very useful exercise.
> 
> I haven't finished yet, but you can see my current state - which is up
> to
> about patch 11 in your series - in my devel-3.2 branch.
> 
> Probably the most significant change so far is the interpretation of
> action=spare-same-slot.  I have defined that in a way that is *more*
> permissive than action=spare, where you had it less permissive.
> 
> The comment for that change set is below.
> The new list of policy actions is:
> 
> enum policy_action {
> 	act_default,
> 	act_include,
> 	act_re_add,
> 	act_spare,	/* This only applies to bare devices */
> 	act_spare_same_slot, /* this allows non-bare devices,
> 			      * but only if recent removal */
> 	act_force_spare, /* this allow non-bare devices in any case */
> 	act_err
> };
> 
> I'll hopefully get the rest of your patches done tomorrow and then I
> might
> start on Adam... and then on the changes that I want to make.
> If you can do some testing once enough code is in place, that would be
> great.
> Thanks,
> NeilBrown
> 
> commit a2191ce8af4aa178d62df759fab47ef4dc8e6f67
> Author: NeilBrown <neilb@suse.de>
> Date:   Wed Nov 17 12:46:35 2010 +1100
> 
>     Add action=spare-same-slot policy.
> 
>     When "mdadm -I" is given a device with no metadata, mdadm tries to
> add
>     it as a 'spare' somewhere based on policy.
> 
>     This patch changes the behaviour in two ways:
> 
>     1/ If the device is at a 'path' where a previous device was removed
>       from an array or container, then we preferentially add the spare
> to
>       that array or container.
> 
>     2/ Previously only 'bare' devices were considered for adding as
>       spares.  Now if action=spare-same-slot is active, we will add
>       non-bare devices, but *only* if the path was previously in use
>       for some array, and the device will only be added to that array.
> 
>     Based on code
>       From: Przemyslaw Czarnowski
> <przemyslaw.hawrylewicz.czarnowski@intel.com>
> 
>     Signed-off-by: Przemyslaw Czarnowski
> <przemyslaw.hawrylewicz.czarnowski@intel.com>
>     Signed-off-by: NeilBrown <neilb@suse.de>
> 
> 
> >
> > Autorebuild Monitoring application:
> > Autorebuild monitor is part of monitor application (mdadm -F). In the
> current code of mdadm monitor autorebuild feature was based on spare
> group assignment in mdadm.conf file and worked only for native
> metadata. This code has been retained for compatibility with old config
> format.
> > The new autorebuild implementation works also for external metadata
> types. It uses the concept of domains in mdadm.conf introduced by Neil
> Brown.
> > Monitoring application periodically checks the state of MD active
> arrays and triggers a rebuild if there are eligible spare disks in
> other arrays/containers.
> > Degraded arrays are checked one by one. If there is a spare disk in
> other array/container that matches the domain of the degraded array and
> the domain action allows for spare sharing the spare is moved using
> existing Manage_subdevs function. If the addition fails, the spare
> device is moved back to the original container and next potential spare
> is tried. The process is repeated until all arrays are checked and the
> process is put into a sleep state for a configured period.
> >
> > New option --no-sharing has been added to Monitor mode to be able to
> run monitoring only (without moving spares). This is recommended when
> many instances of monitor are to be run on the same set of devices.
> > Spare sharing is allowed in only one instance of Monitor running with
> --scan option. User is still able to start Monitoring functions in
> multiple instances without --scan option.
> >
> > The autorebuild build-in assumptions are:
> > 1\spares are shared between the arrays of the same metadata 2\spares
> are moved only from containers/volumes that are not degraded 3\spares
> are moved to containers/volumes lacking a *good* spare (size)
> >
> > Anna Czarnowska
> > Przemyslaw Hawrylewicz-Czarnowski
> > Marcin Labun
> >
> > 0001-added-path-path_id-to-give-the-information-on-the-pa.patch
> > 0002-Update-of-udev-rules-to-support-IMSM-devices.patch
> > 0003-extension-of-IncrementalRemove-to-store-location-pat.patch
> > 0004-Incremental-for-bare-disks-implementation-of-spare-s.patch
> > 0005-Util-get-device-size-from-id.patch
> > 0006-Monitor-set-err-on-arrays-not-in-mdstat.patch
> > 0007-Monitor-spare-group-based-spare-sharing-moved-to-sep.patch
> > 0008-mdadm-added-no-sharing-option-for-Monitor-mode.patch
> > 0009-Monitor-avoid-skipping-checks-on-external-arays.patch
> > 0010-Monitor-include-containers-in-scan-mode.patch
> > 0011-Monitor-link-containers-with-subarrays-in-statelist.patch
> > 0012-imsm-create-mdinfo-list-of-disks-in-a-container-from.patch
> > 0013-Monitor-autorebuild-functionality-added.patch
> > 0014-Monitor-Respect-policy-in-auto-rebuild-in-mdadm-moni.patch
> > 0015-Monitor-more-accurate-size-check-when-looking-for-sp.patch
> > 0016-IMSM-Fix-problem-in-mdmon-monitor-of-using-removed-d.patch
> > 0017-Policy-is-aware-of-metadata-disk-s-controller-domain.patch
> >
> > Incremental.c      |  230 +++++++++++++++---
> > Makefile           |    3 +
> > Monitor.c          |  691
> ++++++++++++++++++++++++++++++++++++++++++++--------
> > ReadMe.c           |    4 +
> > managemon.c        |   38 +++
> > mdadm.c            |   29 ++-
> > mdadm.h            |   49 ++++-
> > policy.c           |  134 +++++++++-
> > super-intel.c      |  274 ++++++++++++++++++---
> > udev-md-raid.rules |    7 +-
> > util.c             |   23 ++
> > 11 files changed, 1290 insertions(+), 192 deletions(-)
> > ---------------------------------------------------------------------
> > Intel Technology Poland sp. z o.o.
> > z siedziba w Gdansku
> > ul. Slowackiego 173
> > 80-298 Gdansk
> >
> > Sad Rejonowy Gdansk Polnoc w Gdansku,
> > VII Wydzial Gospodarczy Krajowego Rejestru Sadowego,
> > numer KRS 101882
> >
> > NIP 957-07-52-316
> > Kapital zakladowy 200.000 zl
> >
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). Any review or distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Devel 3.2 branch issues
  2010-11-18 23:14   ` Czarnowska, Anna
@ 2010-11-19 12:43     ` Czarnowska, Anna
  2010-11-22  3:29       ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Czarnowska, Anna @ 2010-11-19 12:43 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

Hi Neil,
Our validation team have reported problems with assembly on the devel 3.2 branch.
I have verified that currently it is not possible to assemble any array.

Patch: Assemble - avoid including wayward devices
It does not affect native metadata but breaks assembly of external arrays.
Only one disk is assembled for any raid level. 

After patch: super_by_fd: return subarray info explicitly
Assembly becomes much slower.

Patch: Assemble: small cleanup of error checking
Breaks assembly for all metadata types. Nothing assembles after it is applied.

These are just early modifications of Assemble.c. The impact of further changes
can't be verified at the moment. 
Are you aware of the above issues? This is stopping our further validation.
I also mentioned issues with Incremental in previous mail.
When are you planning to submit the rest of modified autorebuild code?

Regards
Anna

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku, 
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego, 
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Devel 3.2 branch issues
  2010-11-19 12:43     ` Devel 3.2 branch issues Czarnowska, Anna
@ 2010-11-22  3:29       ` Neil Brown
  2010-11-22 17:18         ` Labun, Marcin
  2010-11-23 17:34         ` Labun, Marcin
  0 siblings, 2 replies; 12+ messages in thread
From: Neil Brown @ 2010-11-22  3:29 UTC (permalink / raw)
  To: Czarnowska, Anna
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Labun, Marcin,
	Hawrylewicz Czarnowski, Przemyslaw

On Fri, 19 Nov 2010 12:43:20 +0000
"Czarnowska, Anna" <anna.czarnowska@intel.com> wrote:

> Hi Neil,
> Our validation team have reported problems with assembly on the devel 3.2 branch.
> I have verified that currently it is not possible to assemble any array.
> 
> Patch: Assemble - avoid including wayward devices
> It does not affect native metadata but breaks assembly of external arrays.
> Only one disk is assembled for any raid level. 

Thanks - fixed.

> 
> After patch: super_by_fd: return subarray info explicitly
> Assembly becomes much slower.

Yep .. I was calling 'strcpy' with a NULL as the source - bad.
Fixed, though a subsequent patch removed the strcpy anyway.


> 
> Patch: Assemble: small cleanup of error checking
> Breaks assembly for all metadata types. Nothing assembles after it is applied.
> 

I'm not sure this is true, but the test/03* tests of assembly certainly fail.
I've fixed that.  Thanks.


> These are just early modifications of Assemble.c. The impact of further changes
> can't be verified at the moment. 
> Are you aware of the above issues? This is stopping our further validation.
> I also mentioned issues with Incremental in previous mail.
> When are you planning to submit the rest of modified autorebuild code?

Shortly .. 

by the way, some of the changes in you of the patches you sent have not been
included in any form.  They include:

- the getinfo_super_disks method.  I couldn't see why you need this.  All the
  info about the state of the arrays should already be available.
  If there is something that you need that we don't have, please explain and
  we can see how best to add it back in.

- min_active_disk_size_in_array.  I don't think the minimum current size is
  really a good guide.  I've kept the code for letting the metadata handler
  check the size, but anything beyond that should be done with domains I
  think.
  E.g have a domain '2G-or-greater' which is assigned to all 2G or greater
  devices.  Then anything smaller will automatically be excluded from arrays
  with those devices.

- The remove_from_super method.  As Dan pointed out there seems to be
  something wrong there so I chose to just leave it out for now.  If you
  could explain again what is needed, we can find the best way to add that
  functionality.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Devel 3.2 branch issues
  2010-11-22  3:29       ` Neil Brown
@ 2010-11-22 17:18         ` Labun, Marcin
  2010-11-22 18:47           ` Dan Williams
  2010-11-23 17:34         ` Labun, Marcin
  1 sibling, 1 reply; 12+ messages in thread
From: Labun, Marcin @ 2010-11-22 17:18 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Czarnowska, Anna,
	Williams, Dan J, Ciechanowski, Ed,
	Hawrylewicz Czarnowski, Przemyslaw

> - the getinfo_super_disks method.  I couldn't see why you need this.
> All the
>   info about the state of the arrays should already be available.
>   If there is something that you need that we don't have, please
> explain and
>   we can see how best to add it back in.

For external metadata we have added a metadata handler to get a disk state (a spare or not a spare) based on current metadata state on disk.
Ioctl(GET_DISK_INFO) does not have a disk state info for containers (returns 0 - so we don't know if it is a spare or a failed disk).
We know that a disk is an array member based on check its state in the array.

Since Monitor code on devel-3.2 does not have calls to getinfo_super_disks method, 
auto-rebuild grabs the first disk in container and tries to move it to a degraded one (without a successes), and the first one happens to be array member and have state = 0 (a good spare).
Like all disks in container. 

There is also a fatal in pol_add, when trying to update policy rules with NULL spare_group:
@@ -732,7 +738,8 @@ static int move_spare(struct state *from, struct state *to,
 				continue;

 			pol = devnum_policy(from->devid[d]);
-			pol_add(&pol, pol_domain, from->spare_group, NULL);
+			if (from->spare_group)
+				pol_add(&pol, pol_domain, from->spare_group, NULL);

We will send you our FT integrated with your mdadm test suit in a couple of days. 

Marcin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Devel 3.2 branch issues
  2010-11-22 17:18         ` Labun, Marcin
@ 2010-11-22 18:47           ` Dan Williams
  0 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2010-11-22 18:47 UTC (permalink / raw)
  To: Labun, Marcin
  Cc: Neil Brown, linux-raid@vger.kernel.org, Neubauer, Wojciech,
	Czarnowska, Anna, Ciechanowski, Ed,
	Hawrylewicz Czarnowski, Przemyslaw

On 11/22/2010 9:18 AM, Labun, Marcin wrote:
>> - the getinfo_super_disks method.  I couldn't see why you need this.
>> All the
>>    info about the state of the arrays should already be available.
>>    If there is something that you need that we don't have, please
>> explain and
>>    we can see how best to add it back in.
>
> For external metadata we have added a metadata handler to get a disk state (a spare or not a spare) based on current metadata state on disk.
> Ioctl(GET_DISK_INFO) does not have a disk state info for containers (returns 0 - so we don't know if it is a spare or a failed disk).
> We know that a disk is an array member based on check its state in the array.

I'm still catching up on the devel-3.2 getinfo/load_super reworks, but I 
think this info would probably fit into the new 'map' parameter of 
getinfo_super().  Spares can be indicated as 'working' in the map.  I.e. 
if map returns 1 for a container member and that disk is not currently 
in use in a subarray then we can assume it is a spare at the container 
level.

Alternatively we could just return 2 in the map to indicate spare, but I 
think in the locations we care about we already know that it is not 
currently in use in a subarray.

--
Dan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Devel 3.2 branch issues
  2010-11-22  3:29       ` Neil Brown
  2010-11-22 17:18         ` Labun, Marcin
@ 2010-11-23 17:34         ` Labun, Marcin
  1 sibling, 0 replies; 12+ messages in thread
From: Labun, Marcin @ 2010-11-23 17:34 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Neubauer, Wojciech, Williams, Dan J,
	Ciechanowski, Ed, Hawrylewicz Czarnowski, Przemyslaw,
	Czarnowska, Anna

This is fixed version of original patch for the problem of imsm using spare disk 
that has been removed from a container. In previous patch  there was a problem of
releasing spare structures too early for spares that has been a part of volume (mdadm -f) - fixed.
As for Dan's comment, 

> Since we do not update the metadata can we just lazily queue an modified
> imsm_delete() update the next time we call activate_spare() and find the spare removed?  That way it is just garbage collection without this new infrastructure that gives the appearance we are writing metadata when removing a spare.

In fact, we are not updating metadata when removing a spare. I am reusing current imsm communication method between mdmon threads that already is used to do much more than just metadata updates. 
In your proposal we would still need to send info to  monitor that a disk shall not longer be used in a container.

Marcin


From 49406f135843a6bc2d2d28a34f8e8647fcced4d0 Mon Sep 17 00:00:00 2001
From: Marcin Labun <marcin.labun@intel.com>
Date: Wed, 17 Nov 2010 00:09:02 +0100
Subject: [PATCH] IMSM: Fix problem in mdmon monitor of using removed disk from in imsm container.

Manager thread shall pass the information to monitor thread (mdmon)
that some devices are removed from container. Otherwise, monitor (mdmon)
might use such devices (spares) to rebuild the array that has gone degraded.

This problem happens for imsm containers, since a list of the container disks
is maintained in intel_super structure. When array goes degraded, the list is
searched to find a spare disks to start rebuild.
Without this fix the rebuild could be stared on the spare device that was
a member of the container, but has been removed from it.

New super type function handler has been introduced to prepare metadata
format specific information about removed devices.
int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo,
                         int fd);
The message prepared in remove_from_super is later processed
by proceess_update handler in monitor thread.

Signed-off-by: Marcin Labun <marcin.labun@intel.com>
---
 managemon.c   |   38 +++++++++++++
 mdadm.h       |    7 ++-
 super-intel.c |  173 +++++++++++++++++++++++++++++++++++++++++++++++----------
 3 files changed, 187 insertions(+), 31 deletions(-)

diff --git a/managemon.c b/managemon.c
index 8915522..93b130a 100644
--- a/managemon.c
+++ b/managemon.c
@@ -297,6 +297,43 @@ static void add_disk_to_container(struct supertype *st, struct mdinfo *sd)
 	st->update_tail = NULL;
 }
 
+/*
+ * Create and queue update structure about the removed disks.
+ * The update is prepared by super type handler and passed to the monitor
+ * thread.
+ */
+static void remove_disk_from_container(struct supertype *st, struct mdinfo *sd)
+{
+	int dfd;
+	char nm[20];
+	struct metadata_update *update = NULL;
+	mdu_disk_info_t dk = {
+		.number = -1,
+		.major = sd->disk.major,
+		.minor = sd->disk.minor,
+		.raid_disk = -1,
+		.state = 0,
+	};
+	/* nothing to do if super type handler does not support
+	 * remove disk primitive
+	 */
+	if (!st->ss->remove_from_super)
+		return;
+	dprintf("%s: remove %d:%d to container\n",
+		__func__, sd->disk.major, sd->disk.minor);
+
+	sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);
+	dfd = dev_open(nm, O_RDWR);
+	if (dfd < 0)
+		return;
+
+	st->update_tail = &update;
+	st->ss->remove_from_super(st, &dk, dfd);
+	st->ss->write_init_super(st);
+	queue_metadata_update(update);
+	st->update_tail = NULL;
+}
+
 static void manage_container(struct mdstat_ent *mdstat,
 			     struct supertype *container)
 {
@@ -334,6 +371,7 @@ static void manage_container(struct mdstat_ent *mdstat,
 			if (!found) {
 				cd = *cdp;
 				*cdp = (*cdp)->next;
+				remove_disk_from_container(container, cd);
 				free(cd);
 			} else
 				cdp = &(*cdp)->next;
diff --git a/mdadm.h b/mdadm.h
index 2d1db36..6309a62 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -596,7 +596,12 @@ extern struct superswitch {
 	 * when hot-adding a spare.
 	 */
 	int (*add_to_super)(struct supertype *st, mdu_disk_info_t *dinfo,
-			     int fd, char *devname);
+			    int fd, char *devname);
+	/* update the metadata to delete a device,
+	 * when hot-removing a spare.
+	 */
+	int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo,
+				 int fd);
 
 	/* Write metadata to one device when fixing problems or adding
 	 * a new device.
diff --git a/super-intel.c b/super-intel.c
index 9b4ad19..ac168e8 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -233,6 +233,10 @@ struct intel_dev {
 	unsigned index;
 };
 
+enum action {
+  DISK_REMOVE = 0,
+  DISK_ADD
+};
 /* internal representation of IMSM metadata */
 struct intel_super {
 	union {
@@ -258,8 +262,10 @@ struct intel_super {
 		int extent_cnt;
 		struct extent *e; /* for determining freespace @ create */
 		int raiddisk; /* slot to fill in autolayout */
+		enum action action;
 	} *disks;
-	struct dl *add; /* list of disks to add while mdmon active */
+	struct dl *disk_mgmt_list; /* list of disks to add/remove while mdmon
+				      active */
 	struct dl *missing; /* disks removed while we weren't looking */
 	struct bbm_log *bbm_log;
 	const char *hba; /* device path of the raid controller for this metadata */
@@ -285,6 +291,7 @@ enum imsm_update_type {
 	update_kill_array,
 	update_rename_array,
 	update_add_disk,
+	update_add_remove_disk
 };
 
 struct imsm_update_activate_spare {
@@ -316,7 +323,7 @@ struct imsm_update_rename_array {
 	int dev_idx;
 };
 
-struct imsm_update_add_disk {
+struct imsm_update_add_remove_disk {
 	enum imsm_update_type type;
 };
 
@@ -2428,6 +2435,7 @@ static void __free_imsm_disk(struct dl *d)
 	free(d);
 
 }
+
 static void free_imsm_disks(struct intel_super *super)
 {
 	struct dl *d;
@@ -3393,6 +3401,7 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 	dd->devname = devname ? strdup(devname) : NULL;
 	dd->fd = fd;
 	dd->e = NULL;
+	dd->action = DISK_ADD;
 	rv = imsm_read_serial(fd, devname, dd->serial);
 	if (rv) {
 		fprintf(stderr,
@@ -3412,8 +3421,8 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 		dd->disk.scsi_id = __cpu_to_le32(0);
 
 	if (st->update_tail) {
-		dd->next = super->add;
-		super->add = dd;
+		dd->next = super->disk_mgmt_list;
+		super->disk_mgmt_list = dd;
 	} else {
 		dd->next = super->disks;
 		super->disks = dd;
@@ -3422,6 +3431,45 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 	return 0;
 }
 
+
+static int remove_from_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
+				  int fd)
+{
+	struct intel_super *super = st->sb;
+	struct dl *dd;
+
+	/* remove from super works only in mdmon - for communication
+	 * manager - monitor. Check if communication memory buffer
+	 * is prepared.
+	 */
+	if (!st->update_tail) {
+		fprintf(stderr,
+			Name ": %s shall be used in mdmon context only"
+			"(line %d).\n", __func__, __LINE__);
+		return 1;
+	}
+	dd = malloc(sizeof(*dd));
+	if (!dd) {
+		fprintf(stderr,
+			Name ": malloc failed %s:%d.\n", __func__, __LINE__);
+		return 1;
+	}
+	memset(dd, 0, sizeof(*dd));
+	dd->major = dk->major;
+	dd->minor = dk->minor;
+	dd->index = -1;
+	dd->fd = fd;
+	dd->disk.status = SPARE_DISK;
+	dd->action = DISK_REMOVE;
+
+	if (st->update_tail) {
+		dd->next = super->disk_mgmt_list;
+		super->disk_mgmt_list = dd;
+	}
+
+	return 0;
+}
+
 static int store_imsm_mpb(int fd, struct imsm_super *mpb);
 
 static union {
@@ -3574,13 +3622,13 @@ static int create_array(struct supertype *st, int dev_idx)
 	return 0;
 }
 
-static int _add_disk(struct supertype *st)
+static int mgmt_disk(struct supertype *st)
 {
 	struct intel_super *super = st->sb;
 	size_t len;
-	struct imsm_update_add_disk *u;
+	struct imsm_update_add_remove_disk *u;
 
-	if (!super->add)
+	if (!super->disk_mgmt_list)
 		return 0;
 
 	len = sizeof(*u);
@@ -3591,7 +3639,7 @@ static int _add_disk(struct supertype *st)
 		return 1;
 	}
 
-	u->type = update_add_disk;
+	u->type = update_add_remove_disk;
 	append_metadata_update(st, u, len);
 
 	return 0;
@@ -3613,10 +3661,10 @@ static int write_init_super_imsm(struct supertype *st)
 
 		/* determine if we are creating a volume or adding a disk */
 		if (current_vol < 0) {
-			/* in the add disk case we are running in mdmon
-			 * context, so don't close fd's
+			/* in the mgmt (add/remove) disk case we are running
+			 * in mdmon context, so don't close fd's
 			 */
-			return _add_disk(st);
+			return mgmt_disk(st);
 		} else
 			rv = create_array(st, current_vol);
 
@@ -4873,10 +4921,9 @@ static int store_imsm_mpb(int fd, struct imsm_super *mpb)
 static void imsm_sync_metadata(struct supertype *container)
 {
 	struct intel_super *super = container->sb;
-
+	dprintf("sync metadata: %d\n", super->updates_pending);
 	if (!super->updates_pending)
 		return;
-
 	write_super_imsm(super, 0);
 
 	super->updates_pending = 0;
@@ -5165,8 +5212,80 @@ static int disks_overlap(struct intel_super *super, int idx, struct imsm_update_
 	return 0;
 }
 
+
+static struct dl *get_disk_super(struct intel_super *super, int major, int minor)
+{
+	struct dl *dl = NULL;
+	for (dl = super->disks; dl; dl = dl->next)
+		if ((dl->major == major) &&  (dl->minor == minor))
+			return dl;
+	return NULL;
+}
+
+static int remove_disk_super(struct intel_super *super, int major, int minor)
+{
+	struct dl *prev = NULL;
+	struct dl *dl;
+
+	prev = NULL;
+	for (dl = super->disks; dl; dl = dl->next) {
+		if ((dl->major == major) && (dl->minor == minor)) {
+			/* remove */
+			if (prev)
+				prev->next = dl->next;
+			else
+				super->disks = dl->next;
+			dl->next = NULL;
+			__free_imsm_disk(dl);
+			dprintf("%s: removed %x:%x\n",
+				__func__, major, minor);
+			break;
+		}
+		prev = dl;
+	}
+	return 0;
+}
+
 static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned index);
 
+static int add_remove_disk_update(struct intel_super *super)
+{
+	int check_degraded = 0;
+	struct dl *disk = NULL;
+	/* add/remove some spares to/from the metadata/contrainer */
+	while (super->disk_mgmt_list) {
+		struct dl *disk_cfg;
+
+		disk_cfg = super->disk_mgmt_list;
+		super->disk_mgmt_list = disk_cfg->next;
+		disk_cfg->next = NULL;
+
+		if (disk_cfg->action == DISK_ADD) {
+			disk_cfg->next = super->disks;
+			super->disks = disk_cfg;
+			check_degraded = 1;
+			dprintf("%s: added %x:%x\n",
+				__func__, disk_cfg->major,
+				disk_cfg->minor);
+		} else if (disk_cfg->action == DISK_REMOVE) {
+			dprintf("Disk remove action processed: %x.%x\n",
+				disk_cfg->major, disk_cfg->minor);
+			disk = get_disk_super(super,
+					     disk_cfg->major,
+					     disk_cfg->minor);
+			/* remove spare disks only */
+			if (disk->index == -1) {
+				remove_disk_super(super,  
+						  disk_cfg->major,
+						  disk_cfg->minor);
+			}
+			/* release allocate disk structure */
+			__free_imsm_disk(disk_cfg);
+		}
+	}
+	return check_degraded;
+}
+
 static void imsm_process_update(struct supertype *st,
 			        struct metadata_update *update)
 {
@@ -5476,31 +5595,24 @@ static void imsm_process_update(struct supertype *st,
 		super->updates_pending++;
 		break;
 	}
-	case update_add_disk:
-
+	case update_add_remove_disk: {
 		/* we may be able to repair some arrays if disks are
-		 * being added */
-		if (super->add) {
+		 * being added, check teh status of add_remove_disk
+		 * if discs has been added.
+		 */
+		if (add_remove_disk_update(super)) {
 			struct active_array *a;
 
 			super->updates_pending++;
- 			for (a = st->arrays; a; a = a->next)
+			for (a = st->arrays; a; a = a->next)
 				a->check_degraded = 1;
 		}
-		/* add some spares to the metadata */
-		while (super->add) {
-			struct dl *al;
-
-			al = super->add;
-			super->add = al->next;
-			al->next = super->disks;
-			super->disks = al;
-			dprintf("%s: added %x:%x\n",
-				__func__, al->major, al->minor);
-		}
-
 		break;
 	}
+	default:
+		fprintf(stderr, "error: unsuported process update type:"
+			"(type: %d)\n",	type);
+	}
 }
 
 static void imsm_prepare_update(struct supertype *st,
@@ -5685,6 +5797,7 @@ struct superswitch super_imsm = {
 	.write_init_super = write_init_super_imsm,
 	.validate_geometry = validate_geometry_imsm,
 	.add_to_super	= add_to_super_imsm,
+	.remove_from_super	= remove_from_super_imsm,
 	.detail_platform = detail_platform_imsm,
 	.kill_subarray = kill_subarray_imsm,
 	.update_subarray = update_subarray_imsm,
-- 
1.6.4.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-11-28 22:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-11-22 22:39 Devel 3.2 branch issues Czarnowska, Anna
2010-11-23  0:52 ` Neil Brown
2010-11-23 12:04   ` Czarnowska, Anna
2010-11-25  8:01   ` Neil Brown
2010-11-25 10:28     ` Czarnowska, Anna
2010-11-26 18:23       ` Czarnowska, Anna
2010-11-28 22:59         ` Neil Brown
  -- strict thread matches above, loose matches on Subject: below --
2010-10-29 14:13 [Patch 00/17] Autorebuild Czarnowska, Anna
2010-11-17 10:22 ` Neil Brown
2010-11-18 23:14   ` Czarnowska, Anna
2010-11-19 12:43     ` Devel 3.2 branch issues Czarnowska, Anna
2010-11-22  3:29       ` Neil Brown
2010-11-22 17:18         ` Labun, Marcin
2010-11-22 18:47           ` Dan Williams
2010-11-23 17:34         ` Labun, Marcin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).