* A policy frame work for mdadm (incorporating domains and hotplug and such)
From: Neil Brown @ 2010-07-01  6:50 UTC
  To: Dan Williams, Doug Ledford
  Cc: Labun, Marcin, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid


Hi all,
 I figured it was time to make a firm decision on what "domains" and related
 things would look like in mdadm.  In all the discussions so far I have just
 been making suggestions and exploring possibilities and wandering around the
 edges of the issue.  But that cannot last forever as there is need for some
 certainty.

 I had a read through Doug's patch set and Przemyslaw's and Anna's work on
 top of that and there were certain aspects of what I saw that I didn't
 like.
 In particular the model of what a 'domain' was seems to keep changing, first
 growing special cases for partitions, and then growing subsets (which I admit
 I didn't completely understand).  When something grows and changes like that
 so quickly there is a very real possibility that the final result won't meet
 the original needs any more.

 I think we need to start with something that is *right* - at least as far as
 it goes.  Refinements that are predictable are ok, but structural changes
 aren't.

 So here is my concrete proposal on how these things will work.  I have
 already started implementing it, which shows that I'm fairly committed to
 this and would need a very strong argument for significant change to happen.


 The first step is to forget about domains.  We will come back to them later
 as they are important and useful.  But they are not central and we won't be
 starting there.  So forget them.  (Forget what? I don't remember anything...)

 What we need is a policy framework, for encoding policy about the various
 automatic actions that mdadm performs.  We already have bits of policy like
 the spare-group tag (which guides automatic spare migration) and the 'auto'
 mdadm.conf line (which guides automatic assembly).  However that is all
 ad-hoc and as the amount of policy increases, the amount of interaction
 increases so we need a unifying platform.  That is where we need to start.

 So point 1 is that we need a policy framework.

 Point 2 is that policy revolves primarily around devices (rather than
 arrays) and to a lesser extent around metadata types.
 It is devices that are migrated, devices that arrays are built from, devices
 that are automatically made into spares etc.
 Metadata types often encode some specific policy in the metadata, so they
 need some fairly strong role in the policy framework too.  Often the
 metadata type is like a parameter to a policy.  "You can incorporate this
 device in any imsm array".

 So Abstraction 1 is a "Policy statement".

 A policy statement applies to a particular device, possibly in the context
 of a particular metadata, and asserts that a particular name has a
 particular value.
     action=spare (ddf1)
 might be a policy statement about a device.  It says that where ddf1
 metadata is involved, the device can be made a hot-spare when it is
 hot-plugged.
     auto=homehost (0.90)
 might be another which says that auto-assembly may use a non-disambiguated
 name (no trailing _NN) when assembling this device into a metadata=0.90
 array providing the homehost information in the metadata matches this host.

 A statement might not have any metadata type associated.
     action=ignore
 applies irrespective of metadata type.
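
 As a rough illustration only (this is not mdadm code; the class and method
 names are hypothetical), a policy statement could be modelled as a
 name/value pair with an optional metadata context:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyStatement:
    """One policy statement about a device (hypothetical model)."""
    name: str                       # e.g. "action", "auto", "domain"
    value: str                      # e.g. "spare", "homehost"
    metadata: Optional[str] = None  # None: applies to every metadata type

    def applies_to(self, metadata: str) -> bool:
        # A statement without a metadata context applies everywhere.
        return self.metadata is None or self.metadata == metadata

    def __str__(self) -> str:
        suffix = f" ({self.metadata})" if self.metadata else ""
        return f"{self.name}={self.value}{suffix}"


statements = [
    PolicyStatement("action", "spare", "ddf1"),
    PolicyStatement("auto", "homehost", "0.90"),
    PolicyStatement("action", "ignore"),
]
for s in statements:
    print(s)
```

 Here the third statement has no metadata context, so applies_to() is true
 for any metadata type, while the first only applies where ddf1 is involved.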

 The policy names that I currently envisage are:

   action=  ignore, include, spare, force-spare

     which covers the hotplug actions that --incremental might perform.

   auto=  yes, homehost, no
 
     which covers the functionality currently in the AUTO mdadm.conf line

   domain=  arbitrary-string

     This provides the 'domain' isolation functionality.
     The semantics I have in mind (and the precise details here are fairly
     important so this cannot be changed lightly) are:
       A device can have a number of domains, possibly from various sources.
       An array can have a number of domains, from the devices plus from
       spare-group

      A device may be attached to an array if all of the domains of the device
      are also domains of the array.  The array may have extra domains.  The
      device may not.

      This requires that if there are overlapping domains, they must properly
      nest. i.e. the intersection of two domains must be empty, or one of the
      domains.  It might make sense to have a domain 'global' which all
      devices have, and some other domains which just subsets have.
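
      The attachment rule is just a subset test.  A minimal sketch, assuming
      domains are held as plain sets of strings (the function name is made up
      for illustration, and the empty-set case is not treated specially here):

```python
def can_attach(device_domains, array_domains):
    """True if every domain of the device is also a domain of the array."""
    return set(device_domains) <= set(array_domains)


# The array may have extra domains ...
print(can_attach({"global"}, {"global", "ahci"}))
# ... but the device may not.
print(can_attach({"global", "usb"}, {"global", "ahci"}))
```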

  There is probably room for other policies like whether to start an
  incrementally assembled degraded array early, or wait until it is not
  degraded.  Maybe some policy of handling "prodigal device" situations where
  two halves of a mirror both think they are "it" and the other is "not".

By now Doug (hope your back is feeling better) will have noticed that
partitions haven't been mentioned yet.  So it is time for them.

Point 3: partitions become a new metadata type (or types).

If we want mdadm to ensure there is a MBR partition table on a device, then
provide a policy statement like
   action=spare (mbr)

so if the device doesn't have recognised metadata, mdadm configures it as a
spare of type mbr, getting the table from some compatible pre-existing device.
There is probably room to refine this to get the table from a file like
Doug's patches aimed to.  That wouldn't be my first preference as it requires
extra configuration, but it might be necessary.  That would require adding
some sort of argument to each policy statement, so that they become
  name = value (metadata) other-arguments
I'd rather keep that to a bare minimum though.


Note that the above syntax is all abstract syntax.  It reflects the internal
data structures, but not necessarily the way that policy will be expressed to
mdadm.  For that we need to start with some concrete syntax for mdadm.conf.
So:

  Point 4:  policy is specified in mdadm.conf by "POLICY" lines (aka policy
  rules)

   A policy line contains match words, assignment words, and metadata words.
     match words are name=value  or possibly  name==value - haven't decided
               yet.
     assignment words are name=value (or name:=value ... probably not)
     metadata words are "metadata=foo"

   A device matches a policy line if, for each match name that appears, the
   device matches at least one of the values.
   So if we have
              POLICY a==1 a==2 b==3 b==4

   then for a device to match it must have an 'a' of 1 or 2, and a 'b' of
   3 or 4, but it doesn't matter what the device has for 'c'.

   One device may match multiple POLICY lines and if it does so, it
   accumulates all the assigned words.  The ordering of policy lines is
   irrelevant to the end result.  For this to work we might need to add
   a "word!=value" - I hope not, but it wouldn't be a big problem.


   If a device matches a policy line then a separate policy statement is
   created combining each assignment word with each metadata word (if there
   are any).  This list of policy statements is added to the device's policy.
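
   To make the matching and accumulation rules concrete, here is a small
   sketch.  It is not the implementation: it assumes '==' marks match words
   and '=' marks assignment words (that syntax is still undecided), it
   represents a device as a plain dict of attributes, and all function names
   are invented for the example.

```python
from collections import defaultdict


def parse_policy_line(line):
    """Split a POLICY line into match words, assignments and metadata words."""
    words = line.split()
    if not words or words[0] != "POLICY":
        raise ValueError("not a POLICY line")
    match = defaultdict(set)   # match name -> set of acceptable values
    assign = []                # (name, value) assignment words
    metadata = []              # values of metadata=... words
    for word in words[1:]:
        if "==" in word:
            name, value = word.split("==", 1)
            match[name].add(value)
        else:
            name, value = word.split("=", 1)
            if name == "metadata":
                metadata.append(value)
            else:
                assign.append((name, value))
    return dict(match), assign, metadata


def device_matches(device, match):
    # Every match name that appears must hit at least one listed value;
    # attributes that the line does not mention are ignored.
    return all(device.get(name) in values for name, values in match.items())


def statements_for(device, lines):
    """Accumulate (name, value, metadata) statements from all matching lines."""
    result = set()             # a set, so line order cannot matter
    for line in lines:
        match, assign, metadata = parse_policy_line(line)
        if not device_matches(device, match):
            continue
        for name, value in assign:
            # One statement per metadata word, or one with no metadata.
            for meta in (metadata or [None]):
                result.add((name, value, meta))
    return result


lines = [
    "POLICY a==1 a==2 b==3 b==4 action=spare metadata=ddf1 metadata=imsm",
    "POLICY a==1 domain=global",
]
device = {"a": "1", "b": "4", "c": "9"}
for statement in sorted(statements_for(device, lines), key=str):
    print(statement)
```

   The device matches both lines, so it collects action=spare once per
   metadata word plus a domain=global statement with no metadata context.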

 Sometimes policy is very metadata dependent so:

 Point 5: policy can be specified by the metadata handler too.

   If a device is found to have metadata on it, then when that metadata is
   loaded (->load_super())  it might add some policy statements to the
   device.  If it does they will all be in the context of the relevant
   metadata type.  This will probably include 'domain' assignments to restrict
   spare migration.

 But wait, there's more

 Point 6:  We probably have platform policy too. I'm not really sure what
 this will involve, and what if anything needs to be explicit.  Maybe just

     platform-policy  imsm

 in mdadm.conf tells mdadm to query the platform and deduce some policy
 statements or policy rules.

 There is a strong pattern that when a set of devices is partitioned, all the
 '1' partitions go in one array, all the '2' partitions in another etc.
 It might be useful to have config-file support for this pattern, so a
 possible config file line would be:

    partition-policy  path=foo domain=bar

 which effectively makes multiple policy lines each of which has '-partNN'
 added to all 'path' values and all 'domain' values.  But I'm getting ahead
 of myself...
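
 Purely as a sketch of what that expansion might look like (the function
 name and the fixed upper bound on partition numbers are assumptions made
 for the example, not part of the proposal):

```python
def expand_partition_policy(line, max_partitions=4):
    """Expand one partition-policy line into per-partition POLICY lines."""
    words = line.split()
    if not words or words[0] != "partition-policy":
        raise ValueError("not a partition-policy line")
    expanded = []
    for n in range(1, max_partitions + 1):
        out = ["POLICY"]
        for word in words[1:]:
            name, sep, value = word.partition("=")
            # Only 'path' and 'domain' values get the -partNN suffix.
            if sep and name in ("path", "domain"):
                word = f"{name}={value}-part{n}"
            out.append(word)
        expanded.append(" ".join(out))
    return expanded


for policy in expand_partition_policy("partition-policy path=foo domain=bar", 2):
    print(policy)
```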

 The 'match' names that I imagine are:
    path=   which is given a 'glob' pattern to match against the path name
               from /dev/disk/by-path/
    type=   which is either 'disk' or 'partition'

 We could also have size=  which uses the standardised disk sizes so it
 would be easy to say that all 2GB devices only migrate to arrays with 2GB
 devices in them.
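
 A sketch of how the path= and type= match names might be evaluated,
 assuming the device's name under /dev/disk/by-path/ and its type are
 already known.  The function name and the sample path are hypothetical;
 fnmatchcase() from the Python standard library stands in for whatever glob
 matcher mdadm would really use.

```python
from fnmatch import fnmatchcase


def matches_path_and_type(dev_path, dev_type, path_glob=None, wanted_type=None):
    """Check a device against optional path= and type= match words."""
    if path_glob is not None and not fnmatchcase(dev_path, path_glob):
        return False
    if wanted_type is not None and dev_type != wanted_type:
        return False
    return True


by_path = "pci-0000:00:1f.2-scsi-0:0:0:0"   # made-up by-path name
print(matches_path_and_type(by_path, "disk", "pci-0000:00:1f.2-*", "disk"))
print(matches_path_and_type(by_path, "disk", "pci-0000:00:1f.2-*", "partition"))
```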


So: given a device we extract a bunch of policy statements from various
sources.  Now we need to know how to apply those policy statements in
different situations.  There are various contexts where we need to review
policy.

A/ When considering adding a device to an array.
   This can happen at hot-plug either because the device looked like
   a member of the array, or because the device is being added as a new
   spare.

   The primary policy information here is 'domain'.
   We extract a list of domains that the device is in which are specific
   to the metadata of the array (or are not metadata-specific)

   We also get a list of domains for the array by extracting
   a similar list for each device and including any spare-group
   from mdadm.conf

   Then we check if the set of domains for the device is a subset of the set
   of domains for the array.  If it is (and is non-empty), the addition is
   allowed.  If it isn't then the addition probably isn't allowed, though we
   might invent some other policy like "no-strict-domains", or assert that
   domains don't apply when the user explicitly makes a request or uses
   --force.  Or something.

   This might have some variation depending on whether the 'add this to an
   array' came from --create or --assemble or --add or --re-add or
   --incremental or --monitor doing spare migration.
   My point at the moment isn't to give the entire algorithm but to show how
   the policy framework would inform that algorithm.
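
   The core of that check could be sketched as follows.  This is only an
   illustration: policy statements are assumed to be (name, value, metadata)
   tuples with metadata set to None when not metadata-specific, the function
   names are invented, and none of the variations mentioned above (--force,
   "no-strict-domains" and so on) are modelled.

```python
def domains_for(statements, metadata):
    """Domains of one device that are relevant for the given metadata type."""
    return {value for name, value, meta in statements
            if name == "domain" and meta in (None, metadata)}


def array_domains(member_statements, metadata, spare_group=None):
    """Union of the members' domains, plus any spare-group from mdadm.conf."""
    domains = set()
    for statements in member_statements:
        domains |= domains_for(statements, metadata)
    if spare_group is not None:
        domains.add(spare_group)
    return domains


def addition_allowed(device_statements, member_statements, metadata,
                     spare_group=None):
    dev = domains_for(device_statements, metadata)
    arr = array_domains(member_statements, metadata, spare_group)
    # Allowed only if the device's set is non-empty and a subset of the array's.
    return bool(dev) and dev <= arr


members = [[("domain", "global", None), ("domain", "ahci", "imsm")]]
new_disk = [("domain", "global", None)]
usb_disk = [("domain", "global", None), ("domain", "usb", None)]
print(addition_allowed(new_disk, members, "imsm"))
print(addition_allowed(usb_disk, members, "imsm"))
```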

B/ when considering what to do with a device that has been passed to
    --incremental.

   For this we need to
       1/ identify an array, and hence a metadata type
       2/ find the 'action' policy for the device with that metadata type.
       3/ if there is more than one, fail
       4/ if the one is 'ignore' do nothing
       5/ if 'A' above says we cannot add this device, then give up
       6/ consider which of 'include', 'spare', 'force-spare' might apply
          here.....

   If the device has recognisable metadata, which identifies an array, then
   the array identified in step 1 is just that array.
   If the device does not have recognisable metadata, then we consider 
   each array in turn (though we might optimise out some easy cases like
   if all metadatas say 'ignore' then don't bother listing arrays).

   If multiple arrays all allow the device to be added, we would need to
   choose the first which is degraded (unless we invented some other policy).
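
   Steps 2 to 6 and the array choice might look roughly like this.  It is a
   sketch under several assumptions that are not settled above: statements
   are (name, value, metadata) tuples, identical 'action' values from
   several sources count as one, a device with no 'action' policy is left
   alone, and no array is chosen when none is degraded.  All names are
   hypothetical.

```python
def decide_action(statements, metadata, domain_check_ok):
    """Return 'include', 'spare' or 'force-spare', or None to do nothing."""
    # Step 2: find the 'action' policy for this metadata type.
    actions = {value for name, value, meta in statements
               if name == "action" and meta in (None, metadata)}
    # Step 3: more than one distinct action is a failure.
    if len(actions) > 1:
        raise ValueError("conflicting action policies: "
                         + ", ".join(sorted(actions)))
    if not actions:
        return None            # assumption: no policy means do nothing
    action = actions.pop()
    # Step 4: 'ignore' means do nothing.
    if action == "ignore":
        return None
    # Step 5: give up if the check in 'A' refuses the device.
    if not domain_check_ok:
        return None
    # Step 6: the caller decides how include/spare/force-spare applies.
    return action


def choose_array(candidates):
    """Pick the first degraded array among (name, degraded) candidates."""
    for name, degraded in candidates:
        if degraded:
            return name
    return None                # assumption: nothing chosen if none degraded


stmts = [("action", "spare", "imsm"), ("action", "ignore", "ddf1")]
print(decide_action(stmts, "imsm", True))
print(choose_array([("md0", False), ("md1", True), ("md2", True)]))
```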


So this is how I want these things to work, and this is what I'm going to be
coding.  I should have the basic framework in place early next week (assuming
no major interruptions) at which point I'll make the code available.

The part of this that I'm least confident of is assigning domains to arrays.
Extracting a list of policy statements for each device sounds a bit
cumbersome.  Maybe if I cache enough bits of it, it will work nicely.

Comments, as always, are most welcome.

Thanks,
NeilBrown


* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
From: Dan Williams @ 2010-07-01  8:26 UTC
  To: Neil Brown
  Cc: Doug Ledford, Labun, Marcin, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid@vger.kernel.org

On 6/30/2010 11:50 PM, Neil Brown wrote:
>        This requires that if there are overlapping domains, they must properly
>        nest. i.e. the intersection of two domains must be empty, or one of the
>        domains.It might make sense to have a domain 'global' which all
>        devices have, and some other domains which just subsets have.

You lost me here "or one of the domains..." must be a superset of the other?

How do we a priori know which domain an array belongs to?  Will we
require them to be tagged (which makes our job easier at the cost of some
configuration file maintenance for the administrator)?  Taking the
domain == controller example, if a user identifies an array as
belonging to controller1 in the configuration file and later moves a set
of member devices to controller2, I assume we ignore those devices, right?

This would simplify things for the imsm assembly case because it 
requires the array-to-domain association to be identified ahead of time 
rather than arbitrarily autodetected by where we happen to find the 
first array member.

If an assembly statement is ambiguous we fail and ask for the domain to 
be clarified.

>    There is probably room for other policies like whether to start an
>    incrementally assembled degraded array early, or wait until it is not
>    degraded.  Maybe some policy of handling "prodigal device" situations where
>    two halfs of a mirror both this they are "it" and the other is "not".
>
> By now Doug (hope your back is feeling better) will have noticed that
> partitions haven't been mentioned yet.  So it is time for them.
>
> Point 3: partitions become a new metadata type (or types).
>
> If we want mdadm to ensure there is a MBR partition table on a device, then
> provide a policy statement like
>     action=spare (mbr)

Where the metadata type is determined by the current arrays in the 
domain where the device was attached if I am following correctly.

[..]
>  Point 6:  We probably have platform policy too. I'm not really sure what
>  this will involve, and what if anything needs to be explicit.  Maybe just
>
>      platform-policy  imsm
>
>  in mdadm.conf tell mdadm to query the platform and deduce some policy
>  statements or police rules.

I don't know if we need to add platform policy to the configuration 
file, maybe we can revisit this when we have a metadata format where 
"RAID mode" cannot be disabled in the firmware.  For now the policies 
enforced by the platform really are not optional (lest we confuse 
firmware), so I'd just as soon not allow them to be configured.  The 
mitigations are to turn off raid mode or set the environment variable, which
should tell you that you are doing something tricky.  I'll come back if 
I think of a non-critical platform dependent policy.

[..]
> The part of this that I'm least confident of is assigning domains to arrays.

It would be nice if every array came pre-tagged with the domain it
belongs to, but that can't be a requirement.  Conversely users that don't
set up a domain will sometimes find one forced upon them by the 
metadata.  On such a platform where there are hardware defined domains I 
think it would be reasonable for the user to identify which domain is 
the context for the action.

For example (assuming an empty mdadm.conf), suppose sda has imsm metadata
and is attached to ahci, while sdb has imsm metadata but is attached to usb.

	mdadm -A /dev/md0 /dev/sda /dev/sdb

...we fail with an error message like "/dev/sda was tagged as a member 
of the ahci domain while /dev/sdb is only a member of the global domain, 
aborting".

	mdadm -A /dev/md0 /dev/sda /dev/sdb --domain ahci

...would succeed with a message like "/dev/sdb is not a member of the 
ahci domain, ignoring."

> Extracting a list of policy statements for each device sounds a bit
> cumbersome.  Maybe if I cache enough bits of it, it will work nicely.
>
> Comments, as always, are most welcome.

Thanks for the thoughtful write up, as always.

--
Dan


* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
From: Neil Brown @ 2010-07-06  5:19 UTC
  To: Dan Williams
  Cc: Doug Ledford, Labun, Marcin, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid@vger.kernel.org

On Thu, 01 Jul 2010 01:26:45 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On 6/30/2010 11:50 PM, Neil Brown wrote:
> >        This requires that if there are overlapping domains, they must properly
> >        nest. i.e. the intersection of two domains must be empty, or one of the
> >        domains.It might make sense to have a domain 'global' which all
> >        devices have, and some other domains which just subsets have.
> 
> You lost me here "or one of the domains..." must be a superset of the other?

Yes, it is the mathematician in me: precise and obscure...

If the domains (sets of devices) are A and B then

  A intersect B    is-an-element-of    { {} , A , B }
(empty or one of the domains)
so either they are disjoint or one is a subset of the other.
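
The same condition can be checked mechanically.  A small sketch, assuming
each domain is given as the set of devices it contains (the function name
and the device names are only for illustration):

```python
from itertools import combinations


def properly_nested(domains):
    """True if every pair of domains is disjoint or one contains the other."""
    for a, b in combinations(domains.values(), 2):
        common = a & b
        # The intersection must be empty, or equal to A, or equal to B.
        if common and common != a and common != b:
            return False
    return True


nested = {
    "global": {"sda", "sdb", "sdc"},
    "first": {"sda"},
    "second": {"sdb"},
}
overlapping = {
    "x": {"sda", "sdb"},
    "y": {"sdb", "sdc"},
}
print(properly_nested(nested))
print(properly_nested(overlapping))
```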

> 
> How do we a priori know which domain an array belongs to?  Will we 
> require them to be tagged (makes our job easier at the cost of some 
> configuration file maintenance for the administrator).  Taking the 
> domain == controller example, if a user identifies an array as 	 
> belonging to controller1 in the configuration file and later moves a set 
> of member devices to controller2 I assume we ignore those devices right?

I'm not sure.  If the user asks for something that doesn't make sense, how
brutal should we be?

My tendency would be to use domains mainly as a guide.  If the user
explicitly asks to violate domain constraints, we let them and give a
warning.  If the metadata explicitly states a relationship between two devices
that cannot be united without violating a domain constraint, we obey the
metadata (with a warning).  But mdadm never violates a domain constraint
without something that explicit.

Possibly a policy assertion could add that certain devices must always obey
all domain constraints.  This would turn those warnings into errors.

But it is only a tendency....

> 
> This would simplify things for the imsm assembly case because it 
> requires the array-to-domain association to be identified ahead of time 
> rather than arbitrarily autodetected by where we happen to find the 
> first array member.
> 
> If an assembly statement is ambiguous we fail and ask for the domain to 
> be clarified.
> 
> >    There is probably room for other policies like whether to start an
> >    incrementally assembled degraded array early, or wait until it is not
> >    degraded.  Maybe some policy of handling "prodigal device" situations where
> >    two halfs of a mirror both this they are "it" and the other is "not".
> >
> > By now Doug (hope your back is feeling better) will have noticed that
> > partitions haven't been mentioned yet.  So it is time for them.
> >
> > Point 3: partitions become a new metadata type (or types).
> >
> > If we want mdadm to ensure there is a MBR partition table on a device, then
> > provide a policy statement like
> >     action=spare (mbr)
> 
> Where the metadata type is determined by the current arrays in the 
> domain where the device was attached if I am following correctly.

In the example the metadata (mbr) is explicitly included in the policy.
If you had a policy which just said "action=spare" without identifying a
metadata type then yes: if all the other devices in the same domain had a
common metadata type that would be used.

> 
> [..]
> >  Point 6:  We probably have platform policy too. I'm not really sure what
> >  this will involve, and what if anything needs to be explicit.  Maybe just
> >
> >      platform-policy  imsm
> >
> >  in mdadm.conf tell mdadm to query the platform and deduce some policy
> >  statements or police rules.
> 
> I don't know if we need to add platform policy to the configuration 
> file, maybe we can revisit this when we have a metadata format where 
> "RAID mode" cannot be disabled in the firmware.  For now the policies 
> enforced by the platform really are not optional (lest we confuse 
> firmware), so I'd just as soon not allow them to be configured.  The 
> mitigations are turn off raid mode or set the environment variable which 
> should tell you that you are doing something tricky.  I'll come back if 
> I think of a non-critical platform dependent policy.

OK, we'll leave that aspect for future decision.


> 
> [..]
> > The part of this that I'm least confident of is assigning domains to arrays.
> 
> It would be nice if every array came pre-tagged with what domain it 
> belongs, but that can't be a requirement.  Conversely users that don't 
> set up a domain will sometimes find one forced upon them by the 
> metadata.  On such a platform where there are hardware defined domains I 
> think it would be reasonable for the user to identify which domain is 
> the context for the action.
> 
> Like the following, (assuming an empty mdadm.conf) sda has imsm metadata 
> attached to ahci and sdb has imsm metadata, but is attached to usb.
> 
> 	mdadm -A /dev/md0 /dev/sda /dev/sdb
> 
> ...we fail with an error message like "/dev/sda was tagged as a member 
> of the ahci domain while /dev/sdb is only a member of the global domain, 
> aborting".
> 
> 	mdadm -A /dev/md0 /dev/sda /dev/sdb --domain ahci
> 
> ...would succeed with a message like "/dev/sdb is not a member of the 
> ahci domain, ignoring."

If someone actually has two such devices which do constitute a valid imsm
array then either:
 - they are trying to recover from some sort of failure (copied a failing
   device to a spare usb device?) and don't want mdadm to get in their way, or
 - they explicitly created it that way and jumped any hurdles at that point, or
 - something else I haven't thought of.

In the first two cases I would rather assemble the array and just give a
warning.

When creating such an array it might be appropriate to give some warning,
and require confirmation but I cannot quite think how that should look yet.

I'm toying with the idea of recording the domains of an array in the 'map'
file.   They would be assessed when assembling the array and updated when
a spare was added etc.  They would be the union of the domains of all
component devices plus anything explicitly configured for the array.
Not sure how much this would help yet though.

Maybe if something were explicitly configured, we would ignore the domains of
the component devices (?).

Thanks,
NeilBrown


> 
> > Extracting a list of policy statements for each device sounds a bit
> > cumbersome.  Maybe if I cache enough bits of it, it will work nicely.
> >
> > Comments, as always, are most welcome.
> 
> Thanks for the thoughtful write up, as always.
> 
> --
> Dan



* RE: A policy frame work for mdadm (incorporating domains and hotplug and such)
From: Labun, Marcin @ 2010-07-06 14:03 UTC
  To: Neil Brown
  Cc: Williams, Dan J, Doug Ledford, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid@vger.kernel.org


> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Thursday, July 01, 2010 8:50 AM
> To: Williams, Dan J; Doug Ledford
> Cc: Labun, Marcin; Czarnowska, Anna; Hawrylewicz Czarnowski,
> Przemyslaw; Ciechanowski, Ed; Healey, Douglas D; Neubauer, Wojciech;
> linux-raid@vger.kernel.org
> Subject: A policy frame work for mdadm (incorporating domains and
> hotplug and such)

[cut]


> 
>  Point 2 is that policy revolves primarily around devices (rather than
>  arrays) and to a lesser extent around metadata types.
>  It is devices that are migrated, devices that arrays are built from,
>  devices that are automatically made into spares etc.
>  Metadata types often encode some specific policy in the metadata, so
>  they need some fairly strong role in the policy framework too.  Often
>  the metadata type is like a parameter to a policy.  "You can
>  incorporate this device in any imsm array".
> 
>  So Abstraction 1 is a "Policy statement".
> 
>  A policy statement applies to a particular device, possibly in the
>  context of a particular metadata, and asserts that a particular name
>  has a particular value.
>      action=spare (ddf1)
>  might be a policy statement about a device.  It says that where ddf1
>  metadata is involved, the device can be made a hot-spare when it is
>  hot-plugged.
>      auto=homehost (0.90)
>  might be another which says that auto-assembly may use a
>  non-disambiguated name (no trailing _NN) when assembling this device
>  into a metadata=0.90 array providing the homehost information in the
>  metadata matches this host.
> 
>  A statement might not have any metadata type associated.
>      action=ignore
>  applies irrespective of metadata type.
> 
>  The policy names that I currently envisage are:
> 
>    action=  ignore, include, spare, force-spare
> 
>      which covers the hotplug actions that --incremental might perform.
> 
>    auto=  yes, homehost, no
> 
>      which covers the functionality currently in the AUTO mdadm.conf
>      line
> 
>    domain=  arbitrary-string
> 
>      This provides the 'domain' isolation functionality.
>      The semantics I have in mind (and the precise details here are
>      fairly important so this cannot be changed lightly) are:
>        A device can have a number of domains, possibly from various
>        sources.
>        An array can have a number of domains, from the devices plus
>        from spare-group
> 
>       A device may be attached to an array if all of the domains of the
>       device are also domains of the array.  The array may have extra
>       domains.  The device may not.
> 
>       This requires that if there are overlapping domains, they must
>       properly nest. i.e. the intersection of two domains must be empty,
>       or one of the domains.  It might make sense to have a domain
>       'global' which all devices have, and some other domains which just
>       subsets have.
> 

What is the domain name used for?
I understand that each policy line can have a domain token.
In this case each device that matches the "match" keywords gains another domain.
But how can it use it?

If each policy line has its own domain, then we have an explicit one-to-one
relation between POLICY line and domain name.
Can different POLICY lines share the same domain name?

Another question is domain intersection.
It seems to me that the plan is to test whether the "matching" keywords intersect.
In that case, would domains have a non-empty intersection if the path globs overlap?
What else? Type (disk, partition)? Metadata name?


Matching multiple policy lines that assign conflicting values to the same
policy name is an invalid configuration, right?
(For instance, one policy defines action=ignore and the other action=include,
and a disk matches both POLICY lines.)
How should we deal with it?




Thanks,
Marcin Labun





* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
From: Neil Brown @ 2010-07-06 22:40 UTC
  To: Labun, Marcin
  Cc: Williams, Dan J, Doug Ledford, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid@vger.kernel.org

On Tue, 6 Jul 2010 15:03:07 +0100
"Labun, Marcin" <Marcin.Labun@intel.com> wrote:

> 
> > -----Original Message-----
> > From: Neil Brown [mailto:neilb@suse.de]
> > Sent: Thursday, July 01, 2010 8:50 AM
> > To: Williams, Dan J; Doug Ledford
> > Cc: Labun, Marcin; Czarnowska, Anna; Hawrylewicz Czarnowski,
> > Przemyslaw; Ciechanowski, Ed; Healey, Douglas D; Neubauer, Wojciech;
> > linux-raid@vger.kernel.org
> > Subject: A policy frame work for mdadm (incorporating domains and
> > hotplug and such)
> 
> [cut]
> 
> 
> > 
> >  Point 2 is that policy revolves primarily around devices (rather than
> >  arrays) and to a lesser extent around metadata types.
> >  It is devices that are migrated, devices that arrays are built from,
> >  devices that are automatically made into spares etc.
> >  Metadata types often encode some specific policy in the metadata, so
> >  they need some fairly strong role in the policy framework too.  Often
> >  the metadata type is like a parameter to a policy.  "You can
> >  incorporate this device in any imsm array".
> > 
> >  So Abstraction 1 is a "Policy statement".
> > 
> >  A policy statement applies to a particular device, possibly in the
> >  context of a particular metadata, and asserts that a particular name
> >  has a particular value.
> >      action=spare (ddf1)
> >  might be a policy statement about a device.  It says that where ddf1
> >  metadata is involved, the device can be made a hot-spare when it is
> >  hot-plugged.
> >      auto=homehost (0.90)
> >  might be another which says that auto-assembly may use a
> >  non-disambiguated name (no trailing _NN) when assembling this device
> >  into a metadata=0.90 array providing the homehost information in the
> >  metadata matches this host.
> > 
> >  A statement might not have any metadata type associated.
> >      action=ignore
> >  applies irrespective of metadata type.
> > 
> >  The policy names that I currently envisage are:
> > 
> >    action=  ignore, include, spare, force-spare
> > 
> >      which covers the hotplug actions that --incremental might perform.
> > 
> >    auto=  yes, homehost, no
> > 
> >      which covers the functionality currently in the AUTO mdadm.conf
> >      line
> > 
> >    domain=  arbitrary-string
> > 
> >      This provides the 'domain' isolation functionality.
> >      The semantics I have in mind (and the precise details here are
> >      fairly important so this cannot be changed lightly) are:
> >        A device can have a number of domains, possibly from various
> >        sources.
> >        An array can have a number of domains, from the devices plus
> >        from spare-group
> > 
> >       A device may be attached to an array if all of the domains of the
> >       device are also domains of the array.  The array may have extra
> >       domains.  The device may not.
> > 
> >       This requires that if there are overlapping domains, they must
> >       properly nest. i.e. the intersection of two domains must be empty,
> >       or one of the domains.  It might make sense to have a domain
> >       'global' which all devices have, and some other domains which just
> >       subsets have.
> > 
> 
> What is the usage of domain name? 

They determine whether a device may be attached to an array.
Both the device and the array have domain names.
A device may only be attached to an array if all of the domain names
it has also belong to the array.

The exact mechanism for assigning domain names to arrays is not finalised,
but the first-estimate is that it is the union of the domain names of all
current members.
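
As a rough illustration of that first estimate, here is a minimal sketch
that builds an array's domain set as the union of its members' domains.
The names `struct domset`, `domset_add` and `array_domains` are
hypothetical and exist only for this example; nothing like them is in
mdadm yet, and the fixed-size set is just an assumption to keep it short.

```c
#include <assert.h>
#include <string.h>

#define MAX_DOMAINS 16

/* Hypothetical container: a small set of domain names. */
struct domset {
	const char *name[MAX_DOMAINS];
	int count;
};

/* Add 'dom' to the set unless it is already present. */
static void domset_add(struct domset *s, const char *dom)
{
	int i;

	for (i = 0; i < s->count; i++)
		if (strcmp(s->name[i], dom) == 0)
			return;
	assert(s->count < MAX_DOMAINS);
	s->name[s->count++] = dom;
}

/* First estimate: the domains of an array are the union of the
 * domains of all of its current members.
 */
static void array_domains(struct domset *array,
			  const struct domset *members, int nmembers)
{
	int m, i;

	array->count = 0;
	for (m = 0; m < nmembers; m++)
		for (i = 0; i < members[m].count; i++)
			domset_add(array, members[m].name[i]);
}
```

An array built from one member with domains {global, first} and another
with just {global} would end up with the two domains global and first.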

> I understand that each policy line can have a domain token.

"can" yes.  However some policy lines may not.

> In this case each device that matches "match" keyword gains another domain. 
> But how it can use it?

The domains will typically be subsets one of another.

We might have three sets of devices.
All are given domain=global
First set is given domain=first
second set is given domain=second

Then we create two arrays, one from devices in the first set, one from
devices in the second set.
Now any spare in the first set can be used in the first array
Any spare in the second set can be used in the second array
Any spare in the third set can be used in any array.
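
The attach rule behind this example can be sketched as a simple subset
test.  This is only an illustration under the assumption that domains
are kept as NULL-terminated lists of strings; `domains_allow_attach` and
the `set1`/`set2`/`set3` tables are made-up names, not mdadm code.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if every domain of the device is also a domain of the
 * array, otherwise 0.  The array may have extra domains; the device
 * may not.  Both lists are NULL-terminated.
 */
static int domains_allow_attach(const char *const *dev,
				const char *const *array)
{
	assert(dev && array);
	for (; *dev; dev++) {
		const char *const *a;

		for (a = array; *a; a++)
			if (strcmp(*dev, *a) == 0)
				break;
		if (!*a)
			return 0;	/* device has a domain the array lacks */
	}
	return 1;
}

/* Domains of the three sets of devices in the example. */
static const char *const set1[] = { "global", "first", NULL };
static const char *const set2[] = { "global", "second", NULL };
static const char *const set3[] = { "global", NULL };
```

If an array simply inherits the domains of its members, then the first
array has the domains in set1 and the second those in set2.  A device
from set3 passes the test against either array, while a device from
set1 fails it against the second array.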

> 
> If each policy line has its own domain, so we have explicit relation 
> one-to-one between POLICY line and domain name.

No, each policy line does not have its own domain.  Some do, some don't.

> Can different POLICY line share the same domain name?

Definitely.

> 
> Another questions is domain intersection. 
> It seems to me that the plan is test if "matching" keyword intersect. 

I don't currently foresee any explicit testing for intersection.
Certainly some devices will match several policy lines, and those policy
lines might make related statements - e.g. they might both give different
domains to the device, or they might even give the same domain to a device.

In general, policy can accumulate.  As mentioned above having multiple
domains associated with a device is perfectly fine.

I'm not sure yet how "action=" policies would accumulate.  I suspect that the
most permissive would apply, but I'm not satisfied that I have explored all
the possibilities yet so I'm not sure.

> In case domain would have non-empty intersection if the path globs overlap? 

Yes, domains can have non-empty intersections.  This will normally mean that
one is a subset of another, though that isn't strictly necessary.

> What else? Type (disk, partition) Metadata name?

Sorry, I don't know what you are asking here.

> 
> 
> Matching multiple policy lines that define conflicting policy names assignment is invalid configuration, right?
> (for instance one policy defines action=ignore and the other include and disk matches both POLICY line)
> How to deal with it? 

Good question.
I'm beginning to think that "action=ignore" isn't something we really want.
It seems appropriate at first glance, but it is inconsistent with the other
possible actions as it forbids rather than allows.
So we might just discard 'action=ignore'.  If we don't want to consider
certain devices for md at all, then tell udev not to run mdadm -I on them.

That leaves
  action=include    # include this in the array if it appears to be a current
                    # member
  action=re-add     # include this in the array if it appears to be a current
                    # member or a member that was recently removed
  action=spare      # include this if either of the above apply, or if it is
                    # a bare device, make it a spare
  action=force-spare# as above, but if that doesn't succeed, make it a spare
                    # anyway

Then if multiple action policies apply to the same device, we simply take
the most permissive - the last one in the above list.
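
A minimal sketch of that accumulation, assuming the actions are held in
an enum ordered from least to most permissive as in the list above.  The
names `enum action`, `parse_action` and `effective_action` are
hypothetical, and treating an unrecognised value as granting nothing is
a choice made for this example only.

```c
#include <assert.h>
#include <string.h>

/* Ordered from least to most permissive, following the list above. */
enum action { ACT_NONE, ACT_INCLUDE, ACT_RE_ADD, ACT_SPARE, ACT_FORCE_SPARE };

/* Map an action= value to the enum; unknown values grant nothing. */
static enum action parse_action(const char *s)
{
	static const char *const names[] = {
		"", "include", "re-add", "spare", "force-spare"
	};
	int i;

	for (i = ACT_INCLUDE; i <= ACT_FORCE_SPARE; i++)
		if (strcmp(s, names[i]) == 0)
			return (enum action)i;
	return ACT_NONE;
}

/* Combine the values of every action= policy that applies to one
 * device by keeping the most permissive of them.
 */
static enum action effective_action(const char *const *values, int n)
{
	enum action best = ACT_NONE;
	int i;

	assert(values || n == 0);
	for (i = 0; i < n; i++) {
		enum action a = parse_action(values[i]);

		if (a > best)
			best = a;
	}
	return best;
}
```

So a device matched by lines giving action=include, action=spare and
action=re-add would be treated as action=spare.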

I'm actually wondering if these 'action's should apply during --assemble as
well.
Normally Assemble will only include devices that are current members.
Anything that is a bit old is rejected (by the kernel). i.e. action=include.
Probably if action=re-add were in force, then those rejected devices should be
immediately re-added.
Can that be extended to other devices? If there are other devices which don't
have a domain conflict and which are bare and have action=spare, should they
be immediately added as spare devices?

I'm not at all sure about force-spare though... it seems just too easy for it
to destroy data.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-07-06 22:40                             ` Neil Brown
@ 2010-07-08  7:54                               ` Labun, Marcin
  0 siblings, 0 replies; 11+ messages in thread
From: Labun, Marcin @ 2010-07-08  7:54 UTC (permalink / raw)
  To: Neil Brown
  Cc: Williams, Dan J, Doug Ledford, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid@vger.kernel.org

Is there a chance to see the policy framework code?
Thanks,
Marcin

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Wednesday, July 07, 2010 12:40 AM
> To: Labun, Marcin
> Cc: Williams, Dan J; Doug Ledford; Czarnowska, Anna; Hawrylewicz
> Czarnowski, Przemyslaw; Ciechanowski, Ed; Healey, Douglas D; Neubauer,
> Wojciech; linux-raid@vger.kernel.org
> Subject: Re: A policy frame work for mdadm (incorporating domains and
> hotplug and such)
> 
> On Tue, 6 Jul 2010 15:03:07 +0100
> "Labun, Marcin" <Marcin.Labun@intel.com> wrote:
> 
> >
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@suse.de]
> > > Sent: Thursday, July 01, 2010 8:50 AM
> > > To: Williams, Dan J; Doug Ledford
> > > Cc: Labun, Marcin; Czarnowska, Anna; Hawrylewicz Czarnowski,
> > > Przemyslaw; Ciechanowski, Ed; Healey, Douglas D; Neubauer,
> Wojciech;
> > > linux-raid@vger.kernel.org
> > > Subject: A policy frame work for mdadm (incorporating domains and
> > > hotplug and such)
> >
> > [cut]
> >
> >
> > >
> > >  Point 2 is that policy revolves primarily around devices (rather
> than
> > >  arrays) and to a lesser extent around metadata types.
> > >  It is devices that are migrated, devices that arrays are built
> from,
> > > devices
> > >  that are automatically made into spares etc.
> > >  Metadata types often encode some specific policy in the metadata,
> so
> > > they
> > >  need some fairly strong role in the policy framework too.  Often
> the
> > >  metadata type is like a parameter to a policy.  "You can
> incorporate
> > > this
> > >  device in any imsm array".
> > >
> > >  So Abstraction 1 is a "Policy statement".
> > >
> > >  A policy statement applies to a particular device, possibly in the
> > > context
> > >  of a particular metadata, and asserts that a particular name has a
> > >  particular value.
> > >      action=spare (ddf1)
> > >  might be a policy statement about a device.  It says that where
> ddf1
> > >  metadata is involved, the device can be made a hot-spare when it
> is
> > >  hot-plugged.
> > >      auto=homehost (0.90)
> > >  might be another which says that auto-assembly may use a non-
> > > disambiguated
> > >  name (no trailing _NN) when assembling this device into a
> > > metadata=0.90
> > >  array providing the homehost information in the metadata matches
> this
> > > host.
> > >
> > >  A statement might not have any metadata type associated.
> > >      action=ignore
> > >  applies irrespective of metadata type.
> > >
> > >  The policy names that I currently envisage are:
> > >
> > >    action=  ignore, include, spare, force-spare
> > >
> > >      which covers the hotplug actions that --incremental might
> perform.
> > >
> > >    auto=  yes, homehost, no
> > >
> > >      which covers the functionality currently in the AUTO
> mdadm.conf
> > > line
> > >
> > >    domain=  arbitrary-string
> > >
> > >      This provides the 'domain' isolation functionality.
> > >      The semantics I have in mind (and the precise details here are
> > > fairly
> > >      important so this cannot be changed lightly) are:
> > >        A device can have a number of domains, possibly from various
> > > sources.
> > >        An array can have a number of domains, from the devices plus
> > > from
> > >        spare-group
> > >
> > >       A device may be attached to an array if all of the domains of
> the
> > > device
> > >       are also domains of the array.  The array may have extra
> domains.
> > > The
> > >       device may not.
> > >
> > >       This requires that if there are overlapping domains, they
> must
> > > properly
> > >       nest. i.e. the intersection of two domains must be empty, or
> one
> > > of the
> > >       domains.  It might make sense to have a domain 'global' which
> all
> > >       devices have, and some other domains which just subsets have.
> > >
> >
> > What is the usage of domain name?
> 
> They determine whether a device may be attached to an array.
> Both the device and the array have domain names.
> A device may only be attached to an array if all of the domain names
> it has also belong to the array.
> 
> The exact mechanism for assigning domain names to arrays is not
> finalised,
> but the first-estimate is that it is the union of the domain names of
> all
> current members.
> 
> > I understand that each policy line can have a domain token.
> 
> "can" yes.  However some policy lines may not.
> 
> > In this case each device that matches "match" keyword gains another
> domain.
> > But how it can use it?
> 
> The domains will typically be subset one of another.
> 
> We might have three sets of devices.
> All are given domain=global
> First set is given domain=first
> second set is given domain=second
> 
> Then we create two arrays, one from devices in the first set, one from
> devices in the second set.
> Now any spare in the first set can be use in the first array
> Any spare in the second set can be used in the second array
> Any spare in the third set can be used in any array.
> 
> >
> > If each policy line has its own domain, so we have explicit relation
> > one-to-one between POLICY line and domain name.
> 
> No, each policy line does not its own domain.  Some do, some don't.
> 
> > Can different POLICY line share the same domain name?
> 
> Definitely.
> 
> >
> > Another questions is domain intersection.
> > It seems to me that the plan is test if "matching" keyword intersect.
> 
> I don't currently foresee any explicit testing for intersection.
> Certainly some devices will match several policy lines, and those
> policy
> lines might make related statements - e.g. that might both give
> different
> domains to the device, or they might even give the same domain to a
> device.
> 
> In generally policy can accumulate.  As mentioned above having multiple
> domains associated with a device is perfectly fine.
> 
> I'm not sure yet how "action=" policies would accumulate.  I suspect
> that the
> most permissive would apply, but I'm not satisfied that I have explored
> all
> the possibilities yet so I'm not sure.
> 
> > In case domain would have non-empty intersection if the path globs
> overlap?
> 
> Yes, domains can have non-empty intersections.  This will normally mean
> that
> one is a subset of another, though that isn't strictly necessary.
> 
> > What else? Type (disk, partition) Metadata name?
> 
> Sorry, I don't know what you are asking here.
> 
> >
> >
> > Matching multiple policy lines that define conflicting policy names
> assignment is invalid configuration, right?
> > (for instance one policy defines action=ignore and the other include
> and disk matches both POLICY line)
> > How to deal with it?
> 
> Good question.
> I'm beginning to think that "action=ignore" isn't something we really
> want.
> It seems appropriate at first glance, but it is inconsistent with the
> other
> possible actions as it forbids rather than allows.
> So we might just discard 'action=ignore'.  If we don't want to consider
> certain devices for md at all, then tell udev not to run mdadm -I on
> them.
> 
> That leave
>   action=include    # include this in the array if it appears to be a
> current
>                     # member
>   action=re-add     # include this in the array if it appears to be a
> current
>                     # member or a member that was recently removed
>   action=spare      # include this if either of the above apply, or if
> it is
>                     # a bare device, make it a spare
>   action=force-space# as above, but if that doesn't succeed, make is a
> spare
>                     # anyway
> 
> Then if multiple action policies apply to the same device, we simply
> take
> the most permissive - the last one in the above list.
> 
> I'm actually wondering if these 'action's should apply during --
> assemble as
> well.
> Normally Assemble will only include devices that are current members.
> Anything that is a bit old is rejected (by the kernel). i.e.
> action=include.
> Probably if action=re-add were in-force, then those rejected devices
> should be
> immediately re-added.
> Can that be extended to other devices? If there are other devices which
> don't
> have a domain conflict and which are bare and have action=spare, should
> they
> be immediately added as spare devices?
> 
> I'm not at all sure about force-space though... it seems just too easy
> for it
> to destroy data.
> 
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-07-01  6:50                         ` A policy frame work for mdadm (incorporating domains and hotplug and such) Neil Brown
  2010-07-01  8:26                           ` Dan Williams
  2010-07-06 14:03                           ` Labun, Marcin
@ 2010-07-08  7:58                           ` Neil Brown
  2010-07-27 16:17                             ` Hawrylewicz Czarnowski, Przemyslaw
  2 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-07-08  7:58 UTC (permalink / raw)
  To: Dan Williams
  Cc: Doug Ledford, Labun, Marcin, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech, linux-raid

On Thu, 1 Jul 2010 16:50:07 +1000
Neil Brown <neilb@suse.de> wrote:

> So this is how I want these things to work, and this is what I'm going to be
> coding.  I should have the basic framework in place early next week (assuming
> no major interruptions) at which point I'll make the code available.

As one might expect there was a fairly significant interruption, so I didn't
get as far as I hoped.

Below is my current code, which compiles but is otherwise untested.

This is just the infrastructure for reading, manipulating, and checking
policy.
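
For orientation, going by the keywords the parser in the patch below
recognises ("policy" and "part-policy" lines carrying path=, type=,
metadata=, action= and domain= words), configuration lines might look
roughly like this.  The path globs are made up for illustration, and the
exact syntax should be treated as an assumption until the code has
actually been tested.

```
policy path=pci-0000:00:1f.2-* domain=global
policy path=pci-0000:00:1f.2-scsi-0:* type=disk metadata=imsm domain=first action=spare
part-policy path=pci-0000:00:1f.2-scsi-1:* domain=second action=include
```

The path= value is matched as a glob against the device's name under
/dev/disk/by-path, and type= is compared against "disk" or "part".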

The next big step is implementing 'mbr' and 'gpt' metadata types (for
partitioning) and making sure I can make that idea work.
Then I need to generate a domain list given an array, and write code
to compare domain lists.

Then we should be able to start connecting the policy framework with the code
that will make use of the policy.

NeilBrown

Add policy framework.

From: NeilBrown <neilb@suse.de>

---
 Makefile |   10 +
 config.c |   10 +
 mdadm.h  |   59 ++++++++
 policy.c |  471 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 544 insertions(+), 6 deletions(-)
 create mode 100644 policy.c

diff --git a/Makefile b/Makefile
index 3af1665..11181e7 100644
--- a/Makefile
+++ b/Makefile
@@ -87,26 +87,26 @@ MAN4DIR = $(MANDIR)/man4
 MAN5DIR = $(MANDIR)/man5
 MAN8DIR = $(MANDIR)/man8
 
-OBJS =  mdadm.o config.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
+OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
 	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
 	Incremental.o \
 	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
 	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
 	platform-intel.o probe_roms.o
 
-SRCS =  mdadm.c config.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
+SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
 	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
 	Incremental.c \
 	mdopen.c super0.c super1.c super-ddf.c super-intel.c bitmap.c \
 	restripe.c sysfs.c sha1.c mapfile.c crc32.c sg_io.c msg.c \
 	platform-intel.c probe_roms.c
 
-MON_OBJS = mdmon.o monitor.o managemon.o util.o mdstat.o sysfs.o config.o \
+MON_OBJS = mdmon.o monitor.o managemon.o util.o mdstat.o sysfs.o config.o policy.o \
 	Kill.o sg_io.o dlink.o ReadMe.o super0.o super1.o super-intel.o \
 	super-ddf.o sha1.o crc32.o msg.o bitmap.o \
 	platform-intel.o probe_roms.o
 
-MON_SRCS = mdmon.c monitor.c managemon.c util.c mdstat.c sysfs.c config.c \
+MON_SRCS = mdmon.c monitor.c managemon.c util.c mdstat.c sysfs.c config.c policy.c \
 	Kill.c sg_io.c dlink.c ReadMe.c super0.c super1.c super-intel.c \
 	super-ddf.c sha1.c crc32.c msg.c bitmap.c \
 	platform-intel.c probe_roms.c
@@ -114,7 +114,7 @@ MON_SRCS = mdmon.c monitor.c managemon.c util.c mdstat.c sysfs.c config.c \
 STATICSRC = pwgr.c
 STATICOBJS = pwgr.o
 
-ASSEMBLE_SRCS := mdassemble.c Assemble.c Manage.c config.c dlink.c util.c \
+ASSEMBLE_SRCS := mdassemble.c Assemble.c Manage.c config.c policy.c dlink.c util.c \
 	super0.c super1.c super-ddf.c super-intel.c sha1.c crc32.c sg_io.c mdstat.c \
 	platform-intel.c probe_roms.c sysfs.c
 ASSEMBLE_AUTO_SRCS := mdopen.c
diff --git a/config.c b/config.c
index 20c46e9..995b41d 100644
--- a/config.c
+++ b/config.c
@@ -75,7 +75,7 @@ char DefaultConfFile[] = CONFFILE;
 char DefaultAltConfFile[] = CONFFILE2;
 
 enum linetype { Devices, Array, Mailaddr, Mailfrom, Program, CreateDev,
-		Homehost, AutoMode, LTEnd };
+		Homehost, AutoMode, Policy, PartPolicy, LTEnd };
 char *keywords[] = {
 	[Devices]  = "devices",
 	[Array]    = "array",
@@ -85,6 +85,8 @@ char *keywords[] = {
 	[CreateDev]= "create",
 	[Homehost] = "homehost",
 	[AutoMode] = "auto",
+	[Policy]   = "policy",
+	[PartPolicy]="part-policy",
 	[LTEnd]    = NULL
 };
 
@@ -766,6 +768,12 @@ void load_conffile(void)
 		case AutoMode:
 			autoline(line);
 			break;
+		case Policy:
+			policyline(line, rule_policy);
+			break;
+		case PartPolicy:
+			policyline(line, rule_part);
+			break;
 		default:
 			fprintf(stderr, Name ": Unknown keyword %s\n", line);
 		}
diff --git a/mdadm.h b/mdadm.h
index d15e73e..f7e6548 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -724,6 +724,65 @@ extern void get_one_disk(int mdfd, mdu_array_info_t *ainf,
 			 mdu_disk_info_t *disk);
 void wait_for(char *dev, int fd);
 
+/*
+ * Data structures for policy management.
+ * Each device can have a policy structure that lists
+ * various name/value pairs each possibly with a metadata associated.
+ * The policy list is sorted by name/value/metadata
+ */
+struct dev_policy {
+	struct dev_policy *next;
+	char *name;	/* None of these strings are allocated.  They are
+			 * all just references to strings which are known
+			 * to exist elsewhere.
+			 * name and metadata can be compared by address equality.
+			 */
+	char *metadata;
+	char *value;
+};
+
+extern char pol_act[], pol_domain[], pol_metadata[];
+
+/* iterate over the sublist starting at list, having the same
+ * 'name' as 'list', and matching the given metadata (where
+ * NULL matches anything)
+ */
+#define pol_for_each(item, list, metadata)				\
+	for (item = list;						\
+	     item && item->name == list->name;				\
+	     item = item->next)						\
+		if (!(!metadata || !item->metadata || metadata == item->metadata)) \
+			; else
+
+/*
+ * policy records read from mdadm are largely just name-value pairs.
+ * The names are constants, not strdupped
+ */
+struct pol_rule {
+	struct pol_rule *next;
+	char *type;	/* rule_policy or rule_part */
+	struct rule {
+		struct rule *next;
+		char *name;
+		char *value;
+		char *dups; /* duplicates of 'value' with a partNN appended */
+	} *rule;
+};
+
+extern char rule_policy[], rule_part[];
+extern char rule_path[], rule_type[];
+
+extern void policyline(char *line, char *type);
+
+enum policy_action {
+	act_default,
+	act_include,
+	act_re_add,
+	act_spare,
+	act_force_spare,
+	act_err
+};
+
 #if __GNUC__ < 3
 struct stat64;
 #endif
diff --git a/policy.c b/policy.c
new file mode 100644
index 0000000..7019d11
--- /dev/null
+++ b/policy.c
@@ -0,0 +1,471 @@
+/*
+ * mdadm - manage Linux "md" devices aka RAID arrays.
+ *
+ * Copyright (C) 2001-2009 Neil Brown <neilb@suse.de>
+ *
+ *
+ *    This program is free software; you can redistribute it and/or modify
+ *    it under the terms of the GNU General Public License as published by
+ *    the Free Software Foundation; either version 2 of the License, or
+ *    (at your option) any later version.
+ *
+ *    This program is distributed in the hope that it will be useful,
+ *    but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *    GNU General Public License for more details.
+ *
+ *    You should have received a copy of the GNU General Public License
+ *    along with this program; if not, write to the Free Software
+ *    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *    Author: Neil Brown
+ *    Email: <neilb@suse.de>
+ */
+
+#include "mdadm.h"
+#include <dirent.h>
+#include <fnmatch.h>
+#include <ctype.h>
+#include "dlink.h"
+/*
+ * Policy module for mdadm.
+ * A policy statement about a device lists a set of values for each
+ * of a set of names.  Each value can have a metadata type as context.
+ *
+ * names include:
+ *   action - the actions that can be taken on hot-plug
+ *   domain - the domain(s) that the device is part of
+ *
+ * Policy information is extracted from various sources, but
+ * particularly from a set of policy rules in mdadm.conf
+ */
+
+void pol_new(struct dev_policy **pol, char *name, char *val, char *metadata)
+{
+	struct dev_policy *n = malloc(sizeof(*n));
+	n->name = name;
+	n->value = val;
+	n->metadata = metadata;
+	n->next = *pol;
+	*pol = n;
+}
+
+int pol_lesseq(struct dev_policy *a, struct dev_policy *b)
+{
+	int cmp;
+
+	if (a->name < b->name)
+		return 1;
+	if (a->name > b->name)
+		return 0;
+
+	cmp = strcmp(a->value, b->value);
+	if (cmp < 0)
+		return 1;
+	if (cmp > 0)
+		return 0;
+
+	return (a->metadata <= b->metadata);
+}
+
+void pol_sort(struct dev_policy **pol)
+{
+	/* sort policy list in *pol by name/value/metadata
+	 * using merge sort
+	 */
+
+	struct dev_policy *pl[2];
+	pl[0] = *pol;
+	pl[1] = NULL;
+
+	do {
+		struct dev_policy **plp[2], *p[2];
+		int curr = 0;
+		struct dev_policy nul = { NULL };
+		struct dev_policy *prev = &nul;
+		int next = 0;
+
+		/* p[] are the two lists that we are merging.
+		 * plp[] are the ends of the two lists we create
+		 * from the merge.
+		 * 'curr' is which of plp[] that we are currently
+		 *   adding items to.
+		 * 'next' is which of p[] we will take the next
+		 *   item from.
+		 * 'prev' is the last value, which was placed in
+		 * plp[curr].
+		 */
+		plp[0] = &pl[0];
+		plp[1] = &pl[1];
+		p[0] = pl[0];
+		p[1] = pl[1];
+
+		/* take least of p[0] and p[1]
+		 * if it is larger than prev, add to
+		 * plp[curr], else swap curr then add
+		 */
+		while (p[0] || p[1]) {
+			if (p[next] == NULL ||
+			    (p[1-next] != NULL &&
+			     !(pol_lesseq(prev, p[1-next])
+			       ^pol_lesseq(p[1-next], p[next])
+			       ^pol_lesseq(p[next], prev)))
+				)
+				next = 1 - next;
+
+			if (!pol_lesseq(prev, p[next]))
+				curr = 1 - curr;
+
+			*plp[curr] = prev = p[next];
+			plp[curr] = &p[next]->next;
+			p[next] = p[next]->next;
+		}
+		*plp[0] = NULL;
+		*plp[1] = NULL;
+	} while (pl[0] && pl[1]);
+	if (pl[0])
+		*pol = pl[0];
+	else
+		*pol = pl[1];
+}
+
+void pol_dedup(struct dev_policy *pol)
+{
+	/* This is a sorted list - remove duplicates. */
+	while (pol && pol->next) {
+		if (pol_lesseq(pol->next, pol)) {
+			struct dev_policy *tmp = pol->next;
+			pol->next = tmp->next;
+			free(tmp);
+		} else
+			pol = pol->next;
+	}
+}
+
+#if 0
+struct dev_policy *pol_dup(struct dev_policy *pol)
+{
+	struct dev_policy *rv = NULL;
+	struct dev_policy **ep = &rv;
+
+	while (pol) {
+		pol_new(ep, pol->name, pol->value, pol->metadata);
+		ep = &(*ep)->next;
+		pol = pol->next;
+	}
+	return rv;
+}
+#endif
+
+/*
+ * pol_find finds the first entry in the policy
+ * list to match name.
+ * If it returns non-NULL there is at least one
+ * value, but how many can only be found by
+ * iterating through the list.
+ */
+struct dev_policy *pol_find(struct dev_policy *pol, char *name)
+{
+	while (pol && pol->name < name)
+		pol = pol->next;
+
+	if (!pol || pol->name != name)
+		return NULL;
+	return pol;
+}
+
+char *path_from_fd(int fd)
+{
+	struct stat stb1, stb2;
+	int prefix_len;
+	DIR *by_path;
+	char symlink[PATH_MAX] = "/dev/disk/by-path/";
+	struct dirent *ent;
+
+	fstat(fd, &stb1);
+
+	by_path = opendir(symlink);
+	if (!by_path)
+		return NULL;
+	prefix_len = strlen(symlink);
+
+	while ((ent = readdir(by_path)) != NULL) {
+		if (ent->d_type != DT_LNK)
+			continue;
+		strncpy(symlink + prefix_len,
+			ent->d_name,
+			sizeof(symlink) - prefix_len);
+		if (stat(symlink, &stb2) < 0)
+			continue;
+		if ((stb1.st_mode & S_IFMT) !=
+		    (stb2.st_mode & S_IFMT))
+			continue;
+		if (stb1.st_rdev != stb2.st_rdev)
+			continue;
+		closedir(by_path);
+		return strdup(ent->d_name);
+	}
+	closedir(by_path);
+	return NULL;
+}
+
+char type_part[] = "part";
+char type_disk[] = "disk";
+char *type_from_fd(int fd)
+{
+	if (test_partition(fd))
+		return type_part;
+	else
+		return type_disk;
+}
+
+int pol_match(struct rule *rule, char *path, char *type)
+{
+	/* check if this rule matches on path and type */
+	int pathok = 0; /* 0 == no path, 1 == match, -1 == no match yet */
+	int typeok = 0;
+
+	while (rule) {
+		if (rule->name == rule_path) {
+			if (pathok == 0)
+				pathok = -1;
+			if (fnmatch(rule->value, path, 0) == 0)
+				pathok = 1;
+		}
+		if (rule->name == rule_type) {
+			if (typeok == 0)
+				typeok = -1;
+			if (strcmp(rule->value, type) == 0)
+				typeok = 1;
+		}
+		rule = rule->next;
+	}
+	return pathok >= 0 && typeok >= 0;
+}
+
+void pol_merge(struct dev_policy **pol, struct rule *rule)
+{
+	/* copy any name assignments from rule into pol */
+	struct rule *r;
+	char *metadata = NULL;
+	for (r = rule; r ; r = r->next)
+		if (r->name == pol_metadata)
+			metadata = r->value;
+
+	for (r = rule; r ; r = r->next)
+		if (r->name == pol_act ||
+		    r->name == pol_domain)
+			pol_new(pol, r->name, r->value, metadata);
+}
+
+static int path_has_part(char *path, char **part)
+{
+	/* check if path ends with "-partNN" and
+	 * if it does, place a pointer to "-partNN"
+	 * in 'part'.
+	 */
+	int l = strlen(path);
+	while (l > 1 && isdigit(path[l-1]))
+		l--;
+	if (l < 5 || strncmp(path+l-5, "-part", 5) != 0)
+		return 0;
+	*part = path+l-5;
+	return 1;
+}
+
+void pol_merge_part(struct dev_policy **pol, struct rule *rule, char *part)
+{
+	/* copy any name assignments from rule into pol, appending
+	 * -part to any domain.  The string with -part appended is
+	 * stored with the rule so it has a lifetime to match
+	 * the rule.
+	 */
+	struct rule *r;
+	char *metadata = NULL;
+	for (r = rule; r ; r = r->next)
+		if (r->name == pol_metadata)
+			metadata = r->value;
+
+	for (r = rule; r ; r = r->next) {
+		if (r->name == pol_act)
+			pol_new(pol, r->name, r->value, metadata);
+		else if (r->name == pol_domain) {
+			char *dom;
+			int len;
+			if (r->dups == NULL)
+				r->dups = dl_head();
+			len = strlen(r->value);
+			for (dom = dl_next(r->dups); dom != r->dups; dom = dl_next(dom))
+				if (strcmp(dom+len+1, part)== 0)
+					break;
+			if (dom == r->dups) {
+				char *newdom = dl_strndup(r->value, len + 1 + strlen(part));
+				strcat(strcat(newdom, "-"), part);
+				dl_add(r->dups, newdom);
+				dom = newdom;
+			}
+			pol_new(pol, r->name, dom, metadata);
+		}
+	}
+}
+
+static struct pol_rule *config_rules = NULL;
+static struct pol_rule **config_rules_end = NULL;
+static int config_rules_has_path = 0;
+
+/*
+ * most policy comes from a set of policy rules that are
+ * read from the config file.
+ * device_policy() gathers policy information for the
+ * device opened in 'fd'.
+ */
+struct dev_policy *device_policy(int fd)
+{
+	char *path = NULL;
+	char *type = type_from_fd(fd);
+	struct pol_rule *rules;
+	struct dev_policy *pol = NULL;
+
+	if (config_rules_has_path) {
+		path = path_from_fd(fd);
+		if (!path || !type) {
+			free(path);
+			return NULL;
+		}
+	}
+
+	rules = config_rules;
+
+	while (rules) {
+		char *part;
+		if (rules->type == rule_policy)
+			if (pol_match(rules->rule, path, type))
+				pol_merge(&pol, rules->rule);
+		if (rules->type == rule_part && strcmp(type, type_part) == 0)
+			if (path_has_part(path, &part)) {
+				*part = 0;
+				if (pol_match(rules->rule, path, type_disk))
+					pol_merge_part(&pol, rules->rule, part+1);
+				*part = '-';
+			}
+		rules = rules->next;
+	}
+	pol_sort(&pol);
+	pol_dedup(pol);
+	free(path);
+	return pol;
+}
+
+/*
+ * process policy rules read from config file.
+ */
+
+char rule_path[] = "path";
+char rule_type[] = "type";
+
+char rule_policy[] = "policy";
+char rule_part[] = "part-policy";
+
+char pol_metadata[] = "metadata";
+char pol_act[] = "action";
+char pol_domain[] = "domain";
+
+static int try_rule(char *w, char *name, struct rule **rp)
+{
+	struct rule *r;
+	int len = strlen(name);
+	if (strncmp(w, name, len) != 0 ||
+	    w[len] != '=')
+		return 0;
+	r = malloc(sizeof(*r));
+	r->next = *rp;
+	r->name = name;
+	r->value = strdup(w+len+1);
+	r->dups = NULL;
+	*rp = r;
+	return 1;
+}
+
+void policyline(char *line, char *type)
+{
+	struct pol_rule *pr;
+	char *w;
+
+	if (config_rules_end == NULL)
+		config_rules_end = &config_rules;
+
+	pr = malloc(sizeof(*pr));
+	pr->type = type;
+	pr->rule = NULL;
+	for (w = dl_next(line); w != line ; w = dl_next(w)) {
+		if (try_rule(w, rule_path, &pr->rule))
+			config_rules_has_path = 1;
+		else if (! try_rule(w, rule_type, &pr->rule) &&
+			 ! try_rule(w, pol_metadata, &pr->rule) &&
+			 ! try_rule(w, pol_act, &pr->rule) &&
+			 ! try_rule(w, pol_domain, &pr->rule))
+			fprintf(stderr, Name ": policy rule %s unrecognised and ignored\n",
+				w);
+	}
+	pr->next = config_rules;
+	config_rules = pr;
+}
+
+void policy_free(void)
+{
+	while (config_rules) {
+		struct pol_rule *pr = config_rules;
+		struct rule *r;
+
+		config_rules = config_rules->next;
+
+		for (r = pr->rule; r; ) {
+			struct rule *next = r->next;
+			free(r->value);
+			if (r->dups)
+				free_line(r->dups);
+			free(r);
+			r = next;
+		}
+		free(pr);
+	}
+	config_rules_end = NULL;
+	config_rules_has_path = 0;
+}
+
+void dev_policy_free(struct dev_policy *p)
+{
+	struct dev_policy *t;
+	while (p) {
+		t = p;
+		p = p->next;
+		free(t);
+	}
+}
+
+enum policy_action map_act(char *act)
+{
+	if (strcmp(act, "include") == 0)
+		return act_include;
+	if (strcmp(act, "re-add") == 0)
+		return act_re_add;
+	if (strcmp(act, "spare") == 0)
+		return act_spare;
+	if (strcmp(act, "force-spare") == 0)
+		return act_force_spare;
+	return act_err;
+}
+
+enum policy_action policy_action(struct dev_policy *plist, char *metadata)
+{
+	enum policy_action rv = act_default;
+	struct dev_policy *p;
+
+	plist = pol_find(plist, pol_act);
+	pol_for_each(p, plist, metadata) {
+		enum policy_action a = map_act(p->value);
+		if (a > rv)
+			rv = a;
+	}
+	return rv;
+}

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-07-08  7:58                           ` Neil Brown
@ 2010-07-27 16:17                             ` Hawrylewicz Czarnowski, Przemyslaw
  2010-07-27 22:18                               ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Hawrylewicz Czarnowski, Przemyslaw @ 2010-07-27 16:17 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Labun, Marcin, Czarnowska, Anna,
	Hawrylewicz Czarnowski, Przemyslaw, Ciechanowski, Ed,
	Healey, Douglas D, Neubauer, Wojciech

Hi Neil,

I was wondering whether there is anything new in this area.
Please share your ideas and/or any new code :)


* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-07-27 16:17                             ` Hawrylewicz Czarnowski, Przemyslaw
@ 2010-07-27 22:18                               ` Neil Brown
  2010-08-12 13:53                                 ` Labun, Marcin
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-07-27 22:18 UTC (permalink / raw)
  To: Hawrylewicz Czarnowski, Przemyslaw
  Cc: linux-raid@vger.kernel.org, Labun, Marcin, Czarnowska, Anna,
	Ciechanowski, Ed, Healey, Douglas D, Neubauer, Wojciech

On Tue, 27 Jul 2010 17:17:46 +0100
"Hawrylewicz Czarnowski, Przemyslaw"
<przemyslaw.hawrylewicz.czarnowski@intel.com> wrote:

> Hi Neil,
> 
> I was wondering whether there is anything new in this area.
> Please share your ideas and/or any new code :)

No, other things have taken priority.  I might get back to it next week.

NeilBrown

* RE: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-07-27 22:18                               ` Neil Brown
@ 2010-08-12 13:53                                 ` Labun, Marcin
  2010-08-24  4:44                                   ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Labun, Marcin @ 2010-08-12 13:53 UTC (permalink / raw)
  To: Neil Brown
  Cc: linux-raid@vger.kernel.org, Hawrylewicz Czarnowski, Przemyslaw,
	Czarnowska, Anna, Ciechanowski, Ed, Healey, Douglas D,
	Neubauer, Wojciech

Hi Neil,
We have made most of the changes you suggested to the auto-rebuild and
hot-(un)plug code, and we are eager to port our code to your new policy
framework.  What is the current plan for releasing the policy framework?
Thanks,
Marcin




* Re: A policy frame work for mdadm (incorporating domains and hotplug and such)
  2010-08-12 13:53                                 ` Labun, Marcin
@ 2010-08-24  4:44                                   ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2010-08-24  4:44 UTC (permalink / raw)
  To: Labun, Marcin
  Cc: linux-raid@vger.kernel.org, Hawrylewicz Czarnowski, Przemyslaw,
	Czarnowska, Anna, Ciechanowski, Ed, Healey, Douglas D,
	Neubauer, Wojciech

On Thu, 12 Aug 2010 14:53:36 +0100
"Labun, Marcin" <Marcin.Labun@intel.com> wrote:

> Hi Neil,
> We have made most of the changes you suggested to the auto-rebuild and
> hot-(un)plug code, and we are eager to port our code to your new policy
> framework.  What is the current plan for releasing the policy framework?
> Thanks,
> Marcin
> 

Sorry I have been so slow in responding to this - always too many calls on my
time :-(

I have just pushed out a devel-3.2 branch which should give a pretty good
idea of where I am going.

http://neil.brown.name/git?p=mdadm;a=shortlog;h=refs/heads/devel-3.2

I haven't actually tested anything yet, so there might be some crazy stuff in
there, but it shouldn't be too crazy.

I need to work out exactly how the "copy a partition table to a bare disk"
thing should work, then I want to start writing some test scripts and
actually test it.  Unfortunately loop devices don't appear in
/dev/disk/by-path/, so I might need to figure out something else
clever ... or just create some symlinks by hand.
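
For test scripts, one way to make hand-made symlinks usable is to let the
lookup take the directory as a parameter.  The sketch below is only an
illustration under that assumption: link_name_for_device() and its 'dir'
argument are hypothetical and are not in the patch, which hard-codes the
directory in path_from_fd().  It matches a device by file type and st_rdev
as path_from_fd() does, but uses lstat() rather than d_type to recognise
symlinks.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return a malloc'd copy of the name of the first symlink in 'dir' that
 * resolves to the same device node as 'want', or NULL if there is none.
 * The caller frees the result.
 */
char *link_name_for_device(const char *dir, const struct stat *want)
{
	DIR *d;
	struct dirent *ent;
	char *found = NULL;

	assert(dir != NULL && want != NULL);
	d = opendir(dir);
	if (!d)
		return NULL;

	while (!found && (ent = readdir(d)) != NULL) {
		char full[PATH_MAX];
		struct stat lst, st;

		if (snprintf(full, sizeof(full), "%s/%s", dir, ent->d_name)
		    >= (int)sizeof(full))
			continue;	/* name too long, skip it */
		if (lstat(full, &lst) < 0 || !S_ISLNK(lst.st_mode))
			continue;	/* only symlinks are of interest */
		if (stat(full, &st) < 0)
			continue;	/* dangling link */
		if ((st.st_mode & S_IFMT) != (want->st_mode & S_IFMT))
			continue;	/* block/char type differs */
		if (st.st_rdev != want->st_rdev)
			continue;	/* a different device */
		found = strdup(ent->d_name);
	}
	closedir(d);
	return found;
}
```

A test script could then populate a scratch directory with symlinks to loop
devices and pass that directory in place of the real by-path directory.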

NeilBrown




Thread overview: 11+ messages
2010-07-01  6:50                         ` A policy frame work for mdadm (incorporating domains and hotplug and such) Neil Brown
2010-07-01  8:26                           ` Dan Williams
2010-07-06  5:19                             ` Neil Brown
2010-07-06 14:03                           ` Labun, Marcin
2010-07-06 22:40                             ` Neil Brown
2010-07-08  7:54                               ` Labun, Marcin
2010-07-08  7:58                           ` Neil Brown
2010-07-27 16:17                             ` Hawrylewicz Czarnowski, Przemyslaw
2010-07-27 22:18                               ` Neil Brown
2010-08-12 13:53                                 ` Labun, Marcin
2010-08-24  4:44                                   ` Neil Brown
