linux-raid.vger.kernel.org archive mirror
From: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>,
	linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	martin f krafft <madduck-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org>,
	Michal Marek <mmarek-l3A5Bk7waGM@public.gmane.org>,
	Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Bill Nottingham <notting-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
Date: Sat, 6 Feb 2010 14:07:03 -0700
Message-ID: <e9c3a7c21002061307le6f5d56ked4fa3711bdd2367@mail.gmail.com>
In-Reply-To: <4B6DAC06.6060909-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

>> 1/ If you take a look at native md superblock support you see that the
>> support code is duplicated between kernel-space and user space, having
>> it all handled in userspace means only one code base to maintain
>> (elegant aspect #1).
>
> Elegance is in the eye of the beholder.  More on that in a minute.
>

True, but let's agree that superblock formats are quirky, arbitrary,
and all-around inelegant.  Only needing to write that code once is at
the very least an aid to one's sanity.

>> 2/ The kernel can simply worry about the *mechanism* of providing raid
>> while all the assembly *policy* and support for any number of
>> superblock formats is relegated to where policy belongs (elegant
>> aspect #2).
>
> I would argue that dirty/clean state manipulation is *not* policy and
> *is* mechanism.  So, by your definition of what should be in the kernel
> combined with my definition of what dirty/clean state manipulation is,
> the solution is not only not elegant, it's flat incorrect.

You are conveniently blurring the lines between event generation and
event handling.  The kernel handles all the detail of detecting,
notifying and reaping the event.  The arbitrary superblock specific
actions that need to happen in response to that event are really not
very interesting to the rest of the mechanism of providing raid.  You
could argue that I am conveniently drawing a line, and you would be
right.  There are convenient aspects of having this portion of the
solution in userspace which do not compromise the integrity of the
raid mechanism.
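
As a rough illustration of that split (a sketch only, not mdmon's
actual code): the "mechanism" side below just routes generic events,
while everything format-specific lives in a per-format handler.  Every
name here (Event, MetadataHandler, dispatch, run_demo, the event kinds,
the array names and the "example-format" handler) is hypothetical and
chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A generic event as the mechanism side would report it (hypothetical)."""
    kind: str   # illustrative kinds: "dirty", "clean", "device_failed"
    array: str  # illustrative array name


class MetadataHandler:
    """Format-specific response to an event; subclasses stand in for formats."""
    name = "generic"

    def __init__(self):
        self.log = []

    def handle(self, event):
        # A real handler would update the on-disk metadata in its own format;
        # here we only record what would have been done.
        self.log.append(f"{self.name}: record {event.array} {event.kind}")


class ImsmHandler(MetadataHandler):
    name = "imsm"


class ExampleFormatHandler(MetadataHandler):
    name = "example-format"


def dispatch(events, handlers, formats):
    """Route each event to the handler for its array's metadata format.

    This function knows nothing about any superblock layout.
    """
    for event in events:
        handlers[formats[event.array]].handle(event)


def run_demo():
    handlers = {"imsm": ImsmHandler(), "example-format": ExampleFormatHandler()}
    formats = {"md126": "imsm", "md127": "example-format"}
    events = [
        Event("dirty", "md126"),
        Event("device_failed", "md127"),
        Event("clean", "md126"),
    ]
    dispatch(events, handlers, formats)
    return handlers["imsm"].log, handlers["example-format"].log
```

Adding another format in this sketch means adding another handler
class; dispatch() stays untouched, which is the point being argued.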

We can now also handle spare assignment policy, hot-plug policy, and
corner-case disagreements between superblock formats over the
definition of a "container", all without thrashing the kernel.

>
>> 2a/ This simply follows in the path of the design decision to not
>> support in-kernel auto-assembly of version-1 superblocks which started
>> the requirement to use an initramfs to boot software raid.  (this is a
>> not so elegant aspect because it mandates an initramfs to boot, but I
>> don't think a general purpose distro can ever get away from that
>> requirement).
>
> I'm fine with needing mdadm to assemble the device.  I'm not fine with
> needing mdmon once it's assembled.
>
>> I will say that needing to touch several software packages (kernel,
>> initramfs, initscripts, mdadm) to get imsm superblock support has
>> added some excitement to the process in the short term.  Long term I
>> think the elegant aspects of the decision will prove their worth.
>
> I will say that needing to touch multiple software packages might not be
> a bad thing, but think of *how* they had to be changed.  We had to add
> special exceptions for mdmon all over the place: kernel scheduler (for
> suspend/resume, mdmon can't be frozen like the rest of user space or
> else writing our suspend to disk image doesn't work), initramfs,
> initscripts after initramfs, initscripts on halt, SELinux.  In all these
> cases, we had to take something that we want to keep simple and add
> special case rules and exceptions for mdmon.  That pretty solidly says
> that while this arrangement may have been elegant for *you*, it was not
> elegant in the overall grand scheme of things.

No, nothing elegant about that, but I think you would agree this isn't
something we threw over the wall and walked away from.  Making mdmon
more convenient to handle is hopefully an obvious priority.  Yes, I
know you would like to see it die, but we are where we are.

>
> What would have been smart was to leave array creation, assembly,
> verification, and modification to user space, but to put *all* of the
> raid mechanics, including superblock clean/dirty state processing and
> array shut down capabilities, in the kernel.  Had you done that, I would
> have called your solution elegant.
>
> It's at this point that I feel obliged to mention that, in terms of this
> whole big argument, the incremental map file has at least some amount of
> sense belonging in /dev, it's really the mdmon .pid and .sock files that
> don't, and those files wouldn't even exist had you designed things as I
> mention here.  It's the fact that you have two files per device that you
> should be placing in a specific place on the filesystem in order for
> them to be useful and adhere to standards yet the program they belong to
> needs to exist outside the context of any filesystem that I think is
> pretty strong evidence of the inelegance of this design.
>

This comment makes me see Neil's argument in a different light
(hopefully I am not mischaracterizing it): essentially, we are
waiting for the standards to catch up with this new class of program.
FUSE, CUSE, and mdmon belong to a class of programs that move
functionality traditionally exclusive to kernel space into userspace.
Debian's /lib/init/rw looks to be a response to this grey area of the
standards (not that I have any familiarity with the LSB).

--
Dan

