Re: RFC - device names and mdadm with some reference to udev.

From: Neil Brown <neilb@suse.de>
To: Doug Ledford <dledford@redhat.com>
Cc: linux-raid@vger.kernel.org,
	"martin f. krafft" <madduck@debian.org>,
	Michal Marek <mmarek@novell.com>,
	Kay Sievers <kay.sievers@vrfy.org>
Subject: Re: RFC - device names and mdadm with some reference to udev.
Date: Fri, 31 Oct 2008 20:45:35 +1100
Message-ID: <18698.54207.663716.200113@notabene.brown>
In-Reply-To: message from Doug Ledford on Thursday October 30

On Thursday October 30, dledford@redhat.com wrote:
> On Mon, 2008-10-27 at 09:56 +1100, Neil Brown wrote:
> > Greeting.
> >  This is a Request For Comments....
> 
> OK, I've taken my time responding to this at least partially because I
> wanted to get you my current changes first.

You mean it isn't a law of the Internet that every email must be
replied to in less than 2 hours!  I never knew!!

Thanks for taking the time when the time was right.
> > 
> >  In 2.6.28, partitioned devices (mdp) won't be needed any more as md
> >  will make use of the "extended partition" functionality recently
> >  added.  All md devices can be partitioned.  The device number for the
> >  partitions will be very different to that of the whole device, but
> >  udev should hide all of that.  So we don't have to worry too much
> >  about mdp devices.
> 
> Back compatibility, and the ability to use current mdadm on older
> kernels may mean that we need to deal with mdp devices regardless.

True.  My thoughts were that the needs of mdp should not drive design
any more.  Certainly we keep back compatibility where practical and
fix bugs (thanks!) and include support for mdp on at least an equal
level with md.  But any extra concerns don't need to drive design.

> 
> >  So I think the following is how I want things to work.  I am very
> >  open to comments and suggestions.  Particularly I want to know what
> >  (if anything) this will break.
> > 
> >  1/ The only device nodes created will be /dev/mdX and /dev/md_dX
> >     along with partitions /dev/mdXpY and /dev/md_dXpY as appropriate.
> >     These will be created by mdadm in accordance with the "--auto"
> >     flag unless something in mdadm.conf says to leave it to udev.
> >     In that case, mdadm will create a temporary node
> >     (/dev/.mdadm.whatever) and remove it once udev has created the
> >     real thing.
> 
> One thing I noticed in my work on the incremental stuff, is that the
> user friendly device naming method still wants to create
> these /dev/md_dX{pY} array names.  I'm actually in favor of doing away
> with the notion that an array needs to be numbered and exist in a
> numbered format in the /dev/ namespace.  If you have a user friendly
> name, such as /dev/md/root and /dev/md/boot, or /dev/md/root_p1
> and /dev/md/root_p2, I see no need to add additional numbered devices.
> Instead, just allow the device number of the named devices to be random.

I have considered dropping the "/dev/mdXX" names altogether, and I
think mdadm.2 sometimes does that.  But I've decided against it.
My reasons are:

 1/ udev is going to create them anyway, so there is no point trying
    to hide them.
 2/ those names appear in /proc/mdstat and despite all the rhetoric
    about naming policy not belonging in the kernel, the kernel does
    set some naming policy, "mdX" etc are part of that, and we cannot
    avoid it.
    Joe Sysadmin will see a name in /proc/mdstat and might want to
    access that device.  Having it easily available in /dev is good.

My current thought is that /dev/md/ provides human friendly names.
/dev/disk/by-id/md-whatever provides script-friendly names.  And /dev
directly contains kernel-friendly names.
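
As a rough illustration of that three-way split, here is a minimal
Python sketch that sorts a path into one of the three classes.  The
function name and the regular expression are assumptions made for this
example, derived only from the name shapes mentioned in this thread
(mdX, md_dX, mdXpY, md_dXpY); this is not code from mdadm or udev.

```python
import re

# Assumed shape of the kernel-chosen names discussed in this thread:
# /dev/mdX, /dev/md_dX, and their partitions /dev/mdXpY, /dev/md_dXpY.
KERNEL_NAME = re.compile(r"^/dev/md(_d)?\d+(p\d+)?$")


def classify_md_path(path):
    """Return 'kernel', 'human', 'script', or 'other' for a /dev path."""
    if KERNEL_NAME.match(path):
        return "kernel"      # node named by the kernel, also seen in /proc/mdstat
    if path.startswith("/dev/md/") and len(path) > len("/dev/md/"):
        return "human"       # human-friendly entry under /dev/md/
    if path.startswith("/dev/disk/by-id/md-"):
        return "script"      # stable name intended for scripts
    return "other"
```

For example, classify_md_path("/dev/md/root") falls in the "human"
class, while "/dev/md_d1p2" is a kernel-friendly name.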

> 
> >  2/ There will be various symlinks to these devices.
> >     a/ if "symlinks=yes" is given in mdadm.conf, symlinks from
> >          /dev/md/X or /dev/md/dX will be created.
> >     b/ if udev is configured like on Debian,
> >               /dev/disk/by-id/md-name-XXXX
> > 	and   /dev/disk/by-id/md-uuid-UUUU
> >        will be created (by udev).
> >     c/ If there is a 'name' associated with the array then
> >         /dev/md/name will be created as a link.
> >     d/ if an explicit device name of /dev/name was given,
> >         either on a -A, -B, -C, command or in mdadm.conf,
> > 	then the 'name' must match the name of the array,
> > 	and /dev/name will be used as well as /dev/md/name.
> 
> I think all these symlinks are problematic.  We have a naming
> consistency problem, and creating all these links just perpetuates that
> problem.  I would be in favor of standardizing the namespace location
> and semantics and doing away with all the symlinks.  Do that, and within
> one release cycle all the confusion will be gone.

Your last sentence is very pragmatic and sensible.  If confusion
exists, we really want to move firmly away from it, and people will
cope, particularly if things become clearer (even if they are
different to what they are used to).

I am dropping support for the "--symlinks" option and matching
mdadm.conf entry. 
/dev/mdXXX will always be the device node.  There will always be (at
most) one entry in /dev/md/ which points to it.  It might be e.g.
/dev/md/0, but only if no better name is available.

Hopefully this will be clear if documented well.

I think having a large number of symlinks from different places in
/dev is inevitable.   But if we come up with clear definitions of
meaning, purpose, and behaviour, we should be safe.

> 
> There's no need to make autostarted arrays that we can't identify as
> being intended solely for this host hard to find.  It's a little tricky
> if there's no homehost in the array, so let's skip that for a second.
> If there *is* a homehost, and we don't list the array in mdadm.conf or
> it doesn't match our homehost, then I think the answer is to just start
> the array auto-readonly with the name /dev/md/homehost:name.  Since we
> are assuming that homehost:name is unique even if it isn't our device,
> then that means it's sufficient for naming the device uniquely in
> our /dev/md space.  Now, if we don't have a homehost on the array, then
> I would do as you suggest and use a random high device number and have
> udev create any appropriate links.  Of course, udev would make those
> same links on devices with a homehost, so the final difference is just
> that you create a homehost:name device when possible, skip it when not.
> All the rest is the same.

I agree.  I think I have implemented some of this, but not all.  In
particular, I don't think the idea of not starting unexpectedly-degraded
arrays which are foreign is implemented yet.  I will do that.
I also now create e.g. /dev/md/homehost:name when that might be
appropriate.  However it isn't always the case that the name of the
homehost is known.  For 0.90, I can tell if a particular homehost
matches, but I cannot tell the correct homehost name.

But yes, we can still assemble the array and provide some sort of
meaningful name in /dev/md.
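
To make the naming rule concrete, here is a small Python sketch of one
way the single /dev/md/ entry could be chosen.  The function, its
parameters, and the exact order of preference are assumptions for
illustration only; they follow the discussion above (a plain name for
local arrays, homehost:name for foreign ones, the number only when
nothing better is available) and are not taken from the mdadm source.

```python
def md_link_name(minor, name=None, homehost=None, local_host=None):
    """Pick the one /dev/md/ entry for an array (illustrative only).

    minor      -- the X in /dev/mdX
    name       -- array name from the metadata, if any
    homehost   -- homehost recorded in the metadata, if known
    local_host -- this machine's homehost
    """
    if name and homehost and homehost != local_host:
        # Foreign array whose homehost is known: qualify the name.
        return f"/dev/md/{homehost}:{name}"
    if name:
        # Local array, or homehost unknown: use the plain name.
        return f"/dev/md/{name}"
    # No better name available: fall back to the number.
    return f"/dev/md/{minor}"
```

With local_host="alpha", an array named "root" whose homehost is "beta"
would get /dev/md/beta:root, while an unnamed array on /dev/md0 would
get /dev/md/0.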

> > 
> >  2/ Auto-assembly of new arrays must not conflict with auto-assembly
> >     of previously existing arrays, even if the devices comprising the
> >     new arrays are discovered earlier.  This is what the 'homehost'
> >     concept is for.  Your array will only get assembled with a
> >     predictable name if it is known to be attached to 'this' host.
> 
> Really, with the advent of mount-by-label filesystem usage, this
> argument has become less legitimate.  That's not to say that using
> homehost intelligently isn't desirable, but even if there is a name
> conflict, and the wrong array gets assembled first, it really doesn't
> matter since the upper layers will detect the proper filesystem by
> filesystem label or uuid and use whatever device contains the filesystem
> they want.  So, I would treat homehost as a convenience and a hint, but
> I wouldn't allow lack of homehost or wrong homehost to prevent assembly.

Agreed.  auto-read-only and not starting unexpectedly-degraded foreign
arrays make me more comfortable about this.

> >  4/ auto-assembly needs to do the right thing on a SAN where multiple
> >     hosts can each see multiple arrays.  Clearly only one host should
> >     write to any one array at one time (until I get some
> >     cluster-awareness going, which I had hoped to work on this year,
> >     but it doesn't look like I will).
> >     In this case, I don't think read-auto is enough.  We either need
> >     to not assemble arrays that aren't known to belong to us, or we
> >     need to assemble them read-only and require an explicit
> >     read-write setting.
> > 
> >     So we need some way to know which devices could be visible to
> >     other hosts.
> >     I could have a global flag in mdadm.conf "Options SAN"
> >     I could have a SAN-DEVICES to match "DEVICES", but as just about
> >     everything is "/dev/sd*" these days, I don't know if that would
> >     work.
> > 
> >     Any suggestions concerning this would be welcome.
> 
> The scariest suggestion, but probably the most complete and automated,
> would be to have mdadm do a search on any constituent devices to find
> out what the eventual low level driver is.  If it's a fiber channel
> driver, or iSCSI, then don't auto assemble.  If it's sata/e-sata, or
> local SAS, then it's more likely auto assemble is fine.  But, that level
> of mucking around in /sys for each device would probably be quite ugly.


Quite.  And I'd almost certainly get it wrong.  One day someone might
come up with a solution that can be automated.  For now I think I
will stick with configuration in mdadm.conf.
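
For reference, the kind of /sys inspection being discussed might look
roughly like the Python sketch below.  It is only a heuristic under
stated assumptions: the marker strings ("/session" for iSCSI, "/rport-"
for Fibre Channel) reflect how those transports commonly appear in a
resolved sysfs path, and both function names are invented for this
example.  It shows why the approach is fragile and is not a proposal
for mdadm.

```python
import os

# Assumed path components that suggest a shared (SAN) transport.
# These are heuristics, not a complete or authoritative list.
SHARED_MARKERS = ("/session", "/rport-")


def device_sysfs_path(devname):
    """Resolve /sys/block/<devname> to its full sysfs device path."""
    return os.path.realpath(os.path.join("/sys/block", devname))


def looks_shared(sysfs_path):
    """Guess whether a resolved sysfs path belongs to a SAN transport."""
    return any(marker in sysfs_path for marker in SHARED_MARKERS)
```

A caller would combine the two, e.g. looks_shared(device_sysfs_path("sdb")),
and skip auto-assembly when any member device looks shared.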

> 
> > I'm also wondering if I should include a udev 'rules' file for md in
> > the mdadm distribution.  Obviously it would be no more than a
> > recommendation, but it might give me a voice in guiding how udev
> > interacted with mdadm.
> 
> Actually, this would probably be very helpful.  For instance, that udev
> rules file is probably the way you decide whether mdadm or udev
> creates/deletes all the links.  The actions of mdadm and udev have to be
> synchronized in order to avoid confusion about responsibilities.

Good.  I'm feeling quite positive about the idea of distributing an
mdadm.rules file.  I'm now even starting to understand udev rules
files!


Thanks for your thoughtful contributions.

NeilBrown
