Re: RFC: mdadm and bringing up raid sets from initrd (dracut)

linux-hotplug.vger.kernel.org archive mirror

From: Neil Brown <neilb@suse.de>
To: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: David Zeuthen <david-o55+BOBDEFg@public.gmane.org>,
	Hans de Goede <hdegoede-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	initramfs <initramfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-hotplug-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, "Danecki,
	Jacek" <jacek.danecki-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Harald Hoyer <harald-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: RFC: mdadm and bringing up raid sets from initrd (dracut)
Date: Thu, 16 Jul 2009 11:09:52 +0000
Message-ID: <19039.2688.856024.360983@notabene.brown>
In-Reply-To: message from Dan Williams on Wednesday July 15

On Wednesday July 15, dan.j.williams@intel.com wrote:
> [ Cc: Neil ]
> 
> On Tue, Jul 14, 2009 at 7:30 AM, David Zeuthen<david@fubar.dk> wrote:
> > On Tue, 2009-07-14 at 12:59 +0200, Hans de Goede wrote:
> >> Currently the udev rules use incremental assembly like this:
> >> mdadm -I /dev/mdraid-member
> >>
> >> There are 2 problems with this:
> >> 1) When doing this for native mdraid metadata arrays, if only
> >>     one disk is present the set never gets activated
> >> 2) When doing this for imsm metadata arrays, as soon as the
> >>     first disk is incrementally added, the set gets activated
> >>     in degraded mode and stays that way, the second disk
> >>     will get added to the container, but not to the actual
> >>     sets in the container
> >
> > FWIW, this incremental assembly business in mdadm is actually not a very
> > good idea. At least not the current implementation. I'm not sure whether
> > it's still a Fedora-ism or whether it's something that's in upstream
> > mdadm yet. I'm talking about this udev rule
> >
> >  /lib/udev/rules.d/65-md-incremental.rules:
> >  # This file causes block devices with Linux RAID (mdadm) signatures to
> >  # automatically cause mdadm to be run.
> >  # See udev(8) for syntax
> >
> >  SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", \
> >        IMPORT{program}="/sbin/mdadm --examine --export $tempnode", \
> >        RUN+="/bin/bash -c '[ ! -f /dev/.in_sysinit ] && mdadm -I $env{DEVNAME}'"
> >
> > For example, if the user plugs in a random old disk that happens to
> > contain half of a RAID1 mirror, then the incremental assembly bits set
> > up an inert md-device and the user is now left to his own devices to
> > sort this out when he's told by partitioning tools etc. that the disk
> > (or a partition of it) he just plugged in is "busy" (it is claimed by
> > the inert md node).
> >
> > I actually had to add some extra code to the GNOME Disk Utility bits to
> > handle such things (stop inert md devices) - makes the user experience
> > quite a bit worse since there's now an extra state to worry about. And
> > most current users don't use the UI bits yet for this so they get extra
> > confused when trying to use e.g. parted(8) or fdisk(8) on the device.
> >
> > FWIW, I'd wish people would stop playing games like this. If you want to
> > do auto-assembly at the system-level, at the very least don't leave the
> > system in a state like this. For example, one way to do auto-assembly
> > without such bugs would be to use libudev to enumerate all md component
> > devices with the same MD_UUID. Then you count the number of components
> > and only start the array if the number of components equals MD_DEVICES.
> > That's much better than incrementally adding to an md device node that
> > might never get used.

Yes:  auto-assembly is hard, and easy to get wrong.

While I don't claim that the current scheme is at all perfect, I don't
think your suggestion is a clear improvement.
The whole point of RAID is to survive drive failure, and that includes
drives being missing.
So I don't think "completely ignore the array if not all expected
drives are present" is the correct answer.
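
To make the trade-off concrete, here is a minimal sketch of the two
policies under discussion. It is not mdadm's actual logic. It assumes
each member is described by a dict of the variables that
"mdadm --examine --export" prints (MD_UUID and MD_DEVICES are named in
the quoted text; MD_LEVEL and the MIN_MEMBERS table are assumptions made
for illustration, and the function names are hypothetical).

```python
from collections import defaultdict

# Fewest members each level can run with, given n expected members.
# Simplified assumption: RAID10 is left out because the answer depends
# on the layout; unknown levels fall back to "all members required".
MIN_MEMBERS = {
    "raid0": lambda n: n,
    "raid1": lambda n: 1,
    "raid5": lambda n: n - 1,
    "raid6": lambda n: n - 2,
}


def group_by_uuid(members):
    """Group member records by the array (MD_UUID) they belong to."""
    arrays = defaultdict(list)
    for member in members:
        arrays[member["MD_UUID"]].append(member)
    return dict(arrays)


def strict_policy(found):
    """Start only when every expected member is present."""
    return len(found) == int(found[0]["MD_DEVICES"])


def degraded_policy(found):
    """Start as soon as enough members are present to run degraded."""
    expected = int(found[0]["MD_DEVICES"])
    level = found[0]["MD_LEVEL"]
    need = max(1, MIN_MEMBERS.get(level, lambda n: n)(expected))
    return len(found) >= need


# Hypothetical scan result: one half of a RAID1, two of three RAID5
# members, and one half of a RAID0.
members = [
    {"DEVNAME": "/dev/sda1", "MD_UUID": "aaaa", "MD_LEVEL": "raid1", "MD_DEVICES": "2"},
    {"DEVNAME": "/dev/sdb1", "MD_UUID": "bbbb", "MD_LEVEL": "raid5", "MD_DEVICES": "3"},
    {"DEVNAME": "/dev/sdc1", "MD_UUID": "bbbb", "MD_LEVEL": "raid5", "MD_DEVICES": "3"},
    {"DEVNAME": "/dev/sdd1", "MD_UUID": "cccc", "MD_LEVEL": "raid0", "MD_DEVICES": "2"},
]

decisions = {
    uuid: (strict_policy(found), degraded_policy(found))
    for uuid, found in group_by_uuid(members).items()
}
```

The strict policy never starts the degraded RAID1 or RAID5 here, which
is the objection raised above. The degraded policy starts both, but it
would equally start the lone half of an old mirror that a user plugs in,
so neither choice is a clear win on its own.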

It is very easy to remove unwanted raid metadata 
(mdadm --zero-superblock), and making that easily accessible from a
GUI would probably be a good and useful thing, and might solve some
problems for some people.
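
As a rough illustration of what such a GUI action might wrap, the sketch
below builds the mdadm invocation and only executes it when explicitly
asked to. The function names and the dry-run default are assumptions
made for this example, and /dev/sdX1 is a placeholder device name.

```python
import subprocess


def zero_superblock_cmd(device):
    """Return the argv for wiping md metadata from one member device."""
    return ["mdadm", "--zero-superblock", device]


def zero_superblock(device, dry_run=True):
    """Wipe md metadata from `device`; with dry_run, only report the command.

    The operation is destructive, so the default is to do nothing. A
    device that is still claimed by an array (even an inert one) would
    presumably need that array stopped first.
    """
    cmd = zero_superblock_cmd(device)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: shows what would be executed without touching any device.
planned = zero_superblock("/dev/sdX1")
print(" ".join(planned))
```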

One thing that I have contemplated is for md to not claim exclusive
ownership of drives until the array is activated and switched to
read-write.  That would address the 'my drive was stolen by md'
problem, but it may well create other problems in its place.

My general goal at present is to make mdadm sufficiently flexible that
a distro can choose a suitable policy and implement it.  If someone
comes up with a policy that works convincingly well, I could then make
that the default approach that mdadm takes.
There is certainly still room for improvement and I am happy to
discuss possibilities.

NeilBrown

