Re: RFC: mdadm and bringing up raid sets from initrd (dracut)

From: Harald Hoyer <harald@redhat.com>
To: David Zeuthen <david@fubar.dk>
Cc: Doug Ledford <dledford@redhat.com>,
	Hans de Goede <hdegoede@redhat.com>,
	initramfs <initramfs@vger.kernel.org>,
	linux-hotplug@vger.kernel.org, "Danecki,
	Jacek" <jacek.danecki@intel.com>
Subject: Re: RFC: mdadm and bringing up raid sets from initrd (dracut)
Date: Thu, 16 Jul 2009 10:56:18 +0000
Message-ID: <4A5F0752.1020400@redhat.com>
In-Reply-To: <1247583632.1991.39.camel@localhost.localdomain>

On 07/14/2009 05:00 PM, David Zeuthen wrote:
> On Tue, 2009-07-14 at 10:14 -0400, Doug Ledford wrote:
>> On Jul 14, 2009, at 11:02 AM, Hans de Goede wrote:
>>> Hi,
>>> On 07/14/2009 03:39 PM, Doug Ledford wrote:
>>>> On Jul 14, 2009, at 6:59 AM, Hans de Goede wrote:
>>>>> Hi,
>>>>>
>>>>> As you probably know I'm working on making Fedora 12 use mdraid
>>>>> instead of dmraid for Intel BIOS-RAID setups.
>>>>>
>>>>> The installer (anaconda) part is mostly done (needs more testing)
>>>>> and now I'm looking at implementing support for this in dracut
>>>>> (the new mkinitrd for Fedora 12).
>>>>>
>>>>> So I've been testing how this works for both imsm mdraid sets
>>>>> and native mdraid metadata sets, in both cases using a 2 disk
>>>>> mirror, so that the set can also be brought up in degraded mode.
>>>>>
>>>>> Currently the udev rules use incremental assembly like this:
>>>>> mdadm -I /dev/mdraid-member
>>>> Hmmm...does dracut use udev during initramfs time?
>>> Yes, it uses udev for everything, making discovery of / consistent
>>> with the discovery of other storage devices.
>> I'm not sure I like or agree with that philosophy.  I absolutely
>> *don't* want my / filesystem or raid device treated like some plug in,
>> temporary, roaming raid device.  They *aren't* the same, not in terms
>> of importance to the running of the machine and not in terms of
>> reliability requirements.  By using mdadm -A in the mkinitrd calls, I
>> was able to put in an mdadm.conf file and limit what arrays get
>> started to arrays found non-ambiguously in that mdadm.conf file and
>> identified by UUID.  When you switch to incremental assembly for root,
>> you risk the possibility of name space collisions and non-
>> deterministic bring up of your / array.
>
> I'm concerned about this too. To be more specific, I'm concerned about
> both automatically assembling things like RAID arrays / LVM logical
> volumes and also automounting devices [1].
>
> Anyway, my point with all this is that maybe we are going about things
> wrong in the initramfs. My understanding is that dracut roughly works
> this way (please let me know if this is wrong)
>
>   1. when generating the initramfs image, we leave information in
>      the kernel command-line about the root filesystem - typically
>      the UUID - e.g. root=UUID=786263c4-5e28-4cdc-97b8-1ab6e221c344
>
>   2. when the initramfs starts, we trigger all uevents and wait for
>      things to settle
>
>   3. Autoassembly / magic:
>
>      - If we see e.g. md components, we activate them via udev rules
>      - If we see e.g. LUKS devices, we unlock them (by interacting with
>        the user asking for the passphrase) via udev rules.
>      - Ditto for e.g. LVM
>
>   4. if we see the rootfs (matching on e.g. the UUID passed on the
>      kernel command line) we create the /dev/root symlink
>
>   5. when the system has settled (e.g. no more uevents) we mount
>      /dev/root and transition to non-early user space. If there
>      is no /dev/root link, we bail out
>
> Now, my beef is 3. above. I think it is way too optimistic to just
> auto-assemble / unlock etc. everything. E.g. we end up doing a lot of
> work not related to the rootfs that is better done in non-early user
> space.
>
> Instead, just like we specify the UUID for rootfs on the command-line,
> we need to leave some instructions to the initramfs logic on _exactly_
> what things should be autoassembled / unlocked / etc. in order to find
> the rootfs. So the kernel command-line wouldn't really be "just" the
> UUID of rootfs; it would be a whole recipe of actions to do. E.g.
>
>   ROOTFS=UUID=1234          \ # this is the UUID of my rootfs
>   MD_ASSEMBLE=UUID=4567     \ # assemble MD array with UUID 4567
>   LUKS_UNLOCK=UUID=89ab       # unlock LUKS device with UUID 89ab
>
> which would work for e.g. cases where rootfs is on a LUKS device which
> is on a MD array. In other words, we'd need a whole "recipe" passed to
> the initramfs (the mkinitrd tool would generate this recipe), not just
> the UUID of the rootfs.
>
> Coincidentally, if we had something like this and the format of the
> "recipe" was documented somewhere, it would be easy to e.g. implement
> "rescue" functionality as described here
>
> http://www.redhat.com/archives/fedora-desktop-list/2009-July/msg00019.html
>
> since graphical disk utilities would just find /etc/grub.conf (or
> similar), read the recipe and then start assembling/unlocking bits and
> mount them as appropriate in /mnt/rescue/.
>
> Actually this is very close to what Doug is asking for when he says
> (paraphrased) "just include mdadm.conf instead of this magic". The key
> difference, however, is that the user _won't_ have to use mdadm.conf or
> care about config files - it's all taken care of by the mkinitrd binary
> when building the recipe. This is a good thing as having one less config
> file to worry about is good.
>
> Thanks for considering, and sorry for the long mail,
> David
>
> [1] : As some background information, I've spent a good chunk of my
> life, five years or so, dealing with end users complaining about how
> plain block devices got automounted when they were plugged in. FWIW, the
> complaints range from the nonsensical (irritated users: "these
> desktop kids shall not decide how UNIX works") to actual bugs where the
> on-disk contents were mis-detected and either something wrong got
> automounted or we failed to automount at all.
>
> If I've learned anything it's that you need to be very very careful here
> - unlike Windows and other operating systems with such capabilities,
> Linux is.. different.. mostly because we support so many different ways
> to put a file system through things like md and dm. And you need to make
> it very easy to turn things like this off.
>
>
>

David, thanks for your suggestion. As of yesterday, dracut now recognizes the
following command-line parameters:

LVM
        rd_NO_LVM
               disable LVM detection

        rd_LVM_VG=<volume group name>
               only activate the volume groups with the given name

crypto LUKS
        rd_NO_LUKS
               disable crypto LUKS detection

        rd_LUKS_UUID=<luks uuid>
               only activate the LUKS partitions with the given UUID

MD
        rd_NO_MD
               disable MD RAID detection

        rd_MD_UUID=<md uuid>
               only activate the raid sets with the given UUID

DMRAID
        rd_NO_DM
               disable DM RAID detection

        rd_DM_UUID=<dmraid uuid>
               only activate the raid sets with the given UUID
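
A minimal sketch of how the parameters listed above might be interpreted,
written in Python for illustration only. The function names
parse_rd_options and should_activate are hypothetical and are not taken
from dracut. Two behaviours are assumptions rather than something stated
above: that a filter parameter may appear more than once, and that with no
filter given every detected device of that type is activated.

```python
# Hypothetical sketch: parse_rd_options and should_activate are
# illustrative names, not part of dracut.

# Each subsystem maps to its "disable" flag and its "only these" filter,
# as listed in the parameter overview above.
SUBSYSTEMS = {
    "lvm":    ("rd_NO_LVM",  "rd_LVM_VG"),
    "luks":   ("rd_NO_LUKS", "rd_LUKS_UUID"),
    "md":     ("rd_NO_MD",   "rd_MD_UUID"),
    "dmraid": ("rd_NO_DM",   "rd_DM_UUID"),
}


def parse_rd_options(cmdline):
    """Split a kernel command line into per-subsystem settings."""
    words = cmdline.split()
    options = {}
    for name, (no_flag, filter_key) in SUBSYSTEMS.items():
        prefix = filter_key + "="
        options[name] = {
            "disabled": no_flag in words,
            # Assumption: the filter parameter may be given more than once.
            "allowed": [w[len(prefix):] for w in words if w.startswith(prefix)],
        }
    return options


def should_activate(options, subsystem, ident):
    """Decide whether a detected device or volume group is activated."""
    opts = options[subsystem]
    if opts["disabled"]:
        return False
    # Assumption: without a filter, everything detected is activated.
    return not opts["allowed"] or ident in opts["allowed"]
```

With the command line "root=UUID=1234 rd_NO_LVM rd_MD_UUID=4567", this
sketch activates the MD set with UUID 4567, skips any other MD set, and
skips every LVM volume group.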
