From: Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Neil Brown <neilb-l3A5Bk7waGM@public.gmane.org>,
	linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Dan Williams
	<dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	martin f krafft <madduck-8fiUuRrzOP0dnm+yROfE0A@public.gmane.org>,
	Michal Marek <mmarek-l3A5Bk7waGM@public.gmane.org>,
	Bill Nottingham <notting-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: [[Patch mdadm] 2/5] Move the files mdmon opens into /dev/ to support handoff after pivotroot
Date: Sun, 07 Feb 2010 23:13:49 +0100
Message-ID: <4B6F3B1D.9020008@redhat.com>
In-Reply-To: <4B6B15B3.8030205-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

Hi All,

On 02/04/2010 07:45 PM, Doug Ledford wrote:
> On 02/04/2010 01:40 AM, Neil Brown wrote:
>>

<snip>

>> Because we want to unmount and completely discard the filesystem that holds
>> the mdmon binary that was run early, we need to kill it and start a new one
>> running from final namespace.  This is also needed as to a small extent the
>> filesystem is used to communicate between mdadm and a running mdmon, and
>> having them have the same root is less confusing.
>>
>> There are three ways we can achieve this.
>>
>> 1/ If we can assume that between the time when the original "mount" completes
>>     and when the "mount -o remount,rw" happens the filesystem doesn't write to
>>     the device, then we can simply kill mdmon after the root is mounted, and
>>     restart it before remounting.   However I don't trust filesystem
>>     implementers so I won't recommend that.
>>
>> 2/ Before the pivot root we can kill the old mdmon and start the new one
>>     chrooted into the final root.
>> 3/ After the pivot root we can kill the old mdmon and start the new one.
>>
>> Number 2 is the approach that we (Well mostly Dan) originally intended and
>> that the code implements ... or tries to.  It got broken and I never
>> noticed.  I think I have fixed it now for 3.1.2.
>
> Note, as I recall, Hans switched things to be #3 for various reasons.
> That he switched it to #3 doesn't affect mdmon really, as it still is
> just killing and restarting, but doing it after the pivot root solved a
> couple issues.  I don't recall what they were, you would have to talk to
> Hans about that.
>

The reason I made this change was that although the mdmon takeover
mechanism was designed to be used as in 2., at the time I was integrating
this code into Fedora and tying all the bits together, the mdmon code for
doing 2. was very, very broken. Back then I sent Dan a long list of issues
with it, which I believe are all fixed now.

But using option 3. just worked from the time I integrated this and
has stayed working. I've never seen a need to switch things back to 2.,
and given that 2. requires all kinds of trickery and is hard to get right,
whereas 3. is pretty easy to get right and much less prone to breaking
(regressing), I think that staying with 3. is a good solution / decision.
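
To illustrate how little option 3. needs, here is a minimal Python sketch
of the "kill the old mdmon and start the new one" step as it could be run
after the pivot root. It assumes the --takeover and --all options described
in mdmon(8); the helper names (takeover_command, restart_mdmon) and the
injectable `run` parameter are made up for this example and are not part of
mdadm or of any distribution's boot scripts.

```python
import subprocess


def takeover_command(container=None):
    """Build the argv that restarts mdmon from the final root.

    Assumption: `mdmon --takeover` replaces an already running monitor,
    and `--all` applies that to every active container (per mdmon(8)).
    """
    argv = ["mdmon", "--takeover"]
    # Either name one container device or let mdmon find all of them.
    argv.append(container if container else "--all")
    return argv


def restart_mdmon(container=None, run=subprocess.run):
    """Run the takeover command; `run` can be swapped out for a dry run."""
    return run(takeover_command(container), check=True)
```

Calling restart_mdmon() once the real root is in place would be the whole
handoff in this sketch; nothing has to be chrooted or prepared before the
pivot.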

As for the whole question of where to store the mdmon .pid and .sock files,
my 2 cents is that /dev is the only dir where a socket file (which cannot
be moved across filesystems) can be made in the initramfs and still be
accessible from the real root. Other things, like
/lib/whythefuckputthisinslashlib/rw, can only be implemented by:
1) adding a second tmpfs which stays alive after the chroot to the real
    root.
2) symlinks which need to be present both on the real root and in the
    initramfs, with the big problem being ensuring from the initramfs
    that they are there on the read-only root fs.

Both of which are needlessly complicated and fragile. So as far as I'm
concerned, Fedora and the next RHEL will have these files under /dev. And
if upstream does not want this, then we will just keep patching mdadm /
mdmon to do this till the end of time. Note that /dev is already (ab)used
in the same way for passing dhcp leases from the initramfs to the running
system when / lives on a network device, and for a few other bits of state
which need to be passed between the initramfs and the real root.
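
A rough Python sketch of the state-passing idea follows. It assumes a
state directory that is visible both before and after the switch to the
real root (something under /dev in the scheme argued for here). The
directory is a plain parameter, and the file names and helper functions
(publish_state, read_pid, try_move) are hypothetical; this is not how
mdmon itself is written.

```python
import errno
import os
import socket


def publish_state(state_dir, name):
    """Create <name>.pid and a listening <name>.sock inside state_dir."""
    os.makedirs(state_dir, exist_ok=True)
    pid_path = os.path.join(state_dir, name + ".pid")
    sock_path = os.path.join(state_dir, name + ".sock")
    with open(pid_path, "w") as f:
        f.write("%d\n" % os.getpid())
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket from an earlier run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    return server, pid_path, sock_path


def read_pid(state_dir, name):
    """Read back the pid that publish_state() recorded."""
    with open(os.path.join(state_dir, name + ".pid")) as f:
        return int(f.read().strip())


def try_move(src, dst):
    """Try to rename src to dst; return False if they are on different
    filesystems (EXDEV), which is why a socket cannot simply be moved
    from the initramfs to the real root later on."""
    try:
        os.rename(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.EXDEV:
            return False
        raise
```

A tool running from the real root would then only need read_pid() and a
connect() to the same path, with no symlinks or second tmpfs involved.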

Pretty? No, but effective and simple, and any time you have this
state-passing problem it is the solution you will most likely end up
with, because it is KISS and KISS is good.

Regards,

Hans

