From: Robert White <rwhite@pobox.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
Goffredo Baroncelli <kreijack@inwind.it>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: BTRFS messes up snapshot LV with origin
Date: Fri, 28 Nov 2014 23:55:07 -0800
Message-ID: <54797BDB.1050905@pobox.com>
In-Reply-To: <20141129045924.GX17395@hungrycats.org>
On 11/28/2014 08:59 PM, Zygo Blaxell wrote:
> On Fri, Nov 28, 2014 at 06:05:48PM +0100, Goffredo Baroncelli wrote:
>> On 11/27/2014 05:15 AM, Zygo Blaxell wrote:
>>> This is a weakness of the current udev and asynchronous device hotplug
>>> concept: there is no notion of bus enumeration in progress, so we can be
>>> trying to assemble multi-device storage before we have all the devices
>>> visible. Assembly of aggregate storage (whatever it is--btrfs, md,
>>> lvm2...) has to wait until all known storage buses are fully enumerated
>>> in order to detect if there are duplicates.
>>
>> It is more complex than that. Some devices may appear after the "1st" bus
>> enumeration.
>
> That case is well handled already--a new enumeration will start with the
> second (and all later) hotplug events.
>
> The problem arises when we try to assemble disk arrays before the
> known end of the "1st" (or any) enumeration. There is no way for an
> enumerating agent to tell other agents "this is definitely not the
> complete list of devices yet, other devices may be inserted imminently"
> and defer all the multi-device assembly until the address space of the
> enumerating bus is fully covered.
>
mdadm has an "attached but not started" state for arrays that handles
this condition during incremental assembly (see "mdadm --incremental
/dev/whatever").
To slightly misuse the vocabulary: as each partition is encountered and
submitted to the system, it is checked for a superblock. If one is found,
it carries the identity of an array; if that array doesn't exist yet it
is allocated, otherwise the device is added to the existing array. The
array is started only once all of its devices are accounted for, unless
an option is given to allow earlier starts, and even then "enough" of
the devices must be present to make sense (e.g. no more than one device
missing from a RAID5, or the right combination of devices for a RAID10,
i.e. at least one from each mirrored pair).
So we'd need a "partially assembled but not started" state and some
ioctls to do things like force-start or force-disown a filesystem that
cannot be "finished" automatically.
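The incremental assembly described above, plus the force-start and
force-disown operations proposed for the not-yet-started state, can be
modeled as a small state machine. This is a minimal sketch, not mdadm's
or btrfs's code: the superblock is reduced to a hypothetical
(uuid, level, members, slot) tuple, and the RAID10 rule assumes a near=2
layout where slots (0,1), (2,3), ... mirror each other.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    ATTACHED = "attached"  # known to the system, but not started
    STARTED = "started"


@dataclass
class Array:
    uuid: str
    level: str    # only "raid5" and "raid10" get special rules here
    members: int  # device count recorded in the (hypothetical) superblock
    slots: dict = field(default_factory=dict)  # slot number -> device path
    state: State = State.ATTACHED

    def complete(self):
        return len(self.slots) == self.members

    def enough(self):
        if self.level == "raid5":
            # RAID5 tolerates at most one missing member.
            return len(self.slots) >= self.members - 1
        if self.level == "raid10":
            # Assumed near=2 layout: slots (0,1), (2,3), ... are mirror
            # pairs, so at least one device from every pair must be present.
            return all(2 * p in self.slots or 2 * p + 1 in self.slots
                       for p in range(self.members // 2))
        return self.complete()


class Assembler:
    def __init__(self, allow_early=False):
        self.arrays = {}  # array uuid -> Array
        self.allow_early = allow_early

    def add_device(self, path, superblock):
        """superblock is None or a hypothetical (uuid, level, members, slot)."""
        if superblock is None:
            return None  # no superblock: not an array member, ignore it
        uuid, level, members, slot = superblock
        array = self.arrays.get(uuid)
        if array is None:  # first member seen: allocate the array
            array = self.arrays[uuid] = Array(uuid, level, members)
        array.slots[slot] = path
        if array.state is State.ATTACHED and (
                array.complete() or (self.allow_early and array.enough())):
            array.state = State.STARTED
        return array

    def force_start(self, uuid):
        """Hypothetical force-start: start only if "enough" devices exist."""
        array = self.arrays[uuid]
        if not array.enough():
            raise RuntimeError("not enough devices to start " + uuid)
        array.state = State.STARTED

    def disown(self, uuid):
        """Hypothetical force-disown: forget an array that never finished."""
        return self.arrays.pop(uuid)
```

The key design point is that "attached" and "started" are separate
states, so devices can keep arriving in any order without anything being
exposed prematurely.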
That sort of thing is easy to do with devices, because a device doesn't
have to be opened, and it can reject an open attempt, or at least the
reads and writes that follow one.
Unfortunately, a filesystem can really only exist as a mounted thing,
and thereafter can only be controlled by remounting it. The most
efficient way to handle this would be an alternate filesystem operations
structure filled mostly with dummy operations that return ENOENT and
friends. The remount that finally fulfilled the filesystem's
requirements would then swap that struct for the fully functional one.
That remount would need an "adddev=" option and some others like it
(much as AUFS adds layers).
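The operations-table swap described above can be illustrated in user
space. This is a toy model, not kernel code: PendingOps, LiveOps, Mount
and the adddev keyword are hypothetical names, and the "live" filesystem
is a one-entry in-memory dictionary. Every call through the placeholder
table fails with ENOENT until a remount supplies the last required
device, at which point the whole table is replaced at once.

```python
import errno


class PendingOps:
    """Placeholder operations table: every call fails with ENOENT."""

    def lookup(self, name):
        raise FileNotFoundError(errno.ENOENT, "filesystem not ready", name)

    def readdir(self):
        raise FileNotFoundError(errno.ENOENT, "filesystem not ready")


class LiveOps:
    """Fully functional table over a toy in-memory directory."""

    def __init__(self, entries):
        self.entries = entries

    def lookup(self, name):
        if name not in self.entries:
            raise FileNotFoundError(errno.ENOENT, "no such entry", name)
        return self.entries[name]

    def readdir(self):
        return sorted(self.entries)


class Mount:
    def __init__(self, required_devices):
        self.required = set(required_devices)
        self.present = set()
        self.ops = PendingOps()  # callers always go through self.ops

    def remount(self, adddev):
        """Models a remount carrying a hypothetical adddev= option."""
        self.present.add(adddev)
        if isinstance(self.ops, PendingOps) and self.required <= self.present:
            # Requirements fulfilled: switch out the whole table in one step.
            self.ops = LiveOps({"hello.txt": b"hi"})
```

Because callers only ever reach the filesystem through the ops
reference, no individual operation needs a "not ready yet" check of its
own.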
It's all doable, but it stretches the "mount" paradigm nearly to
breaking. You would need an operation like "mount -t btrfs -o
do_we_need_this /dev/whatever /this/datum/means/nothing" to match a
device and attach it "wherever it goes"; otherwise you might end up
computing the Cartesian product of trial attachments of each new device
against every active filesystem to match it up, which is an ugly
external scripting requirement.
As for waiting until the address space is fully covered: meh. If a
ready-or-not (or ready-enough) status is established in the filesystem,
it would be undesirable for the filesystem to know anything about any
other subsystem. We don't care whether enumeration is "done"; we only
care whether we have a coherent set of storage, and whether that set is
enough to be fully ready, enough to be only read-ready, or just plain
not enough.
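The point that readiness should depend only on the storage in hand can
be made concrete with a classifier whose inputs deliberately say nothing
about bus enumeration. This is a sketch under an assumed policy, not
btrfs behavior: "tolerated_missing" is a hypothetical per-profile
redundancy figure, and treating a degraded-but-survivable set as
read-only is an illustrative choice.

```python
from enum import Enum


class Readiness(Enum):
    NOT_ENOUGH = 0
    READ_ONLY = 1
    READY = 2


def classify(expected, present, tolerated_missing):
    """Judge readiness purely from the devices we have.

    expected: device ids the filesystem's metadata says it needs.
    present: device ids seen so far (unrelated extras are ignored).
    tolerated_missing: assumed redundancy of the profile.
    """
    missing = len(set(expected) - set(present))
    if missing == 0:
        return Readiness.READY
    if missing <= tolerated_missing:
        # Assumed policy: a degraded but survivable set is read-only.
        return Readiness.READ_ONLY
    return Readiness.NOT_ENOUGH
```

Nothing here asks whether a bus has finished scanning; a late device
simply changes the answer the next time the set is classified.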
In theory, the idempotent mount commands could be

    mount -t btrfs some-uuid-instead-of-device /mount/point
    mount -t btrfs some-other-uuid-here /other/mount/point

to create entities with zero devices attached, followed by

    mount -t btrfs -o trydev /dev/something /this/bit/is/ignored

repeated for all possible somethings. /mount/point and
/other/mount/point would return ENOENT for their contents until they
were ready enough.
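The UUID-first flow above avoids the Cartesian product because each
device's superblock already names the filesystem it belongs to. A toy
model, with hypothetical names throughout: MountTable, mount_uuid and
trydev stand in for the imagined mount operations, and the superblock is
reduced to a (uuid, devid, num_devices) tuple. The "contents" of a ready
mount point are just its device paths, as a stand-in for real files.

```python
import errno


class PendingFs:
    def __init__(self, uuid):
        self.uuid = uuid
        self.num_devices = None  # unknown until a member device is seen
        self.devices = {}        # devid -> device path

    def ready(self):
        return (self.num_devices is not None
                and len(self.devices) >= self.num_devices)


class MountTable:
    def __init__(self):
        self.by_uuid = {}
        self.by_mountpoint = {}

    def mount_uuid(self, uuid, mountpoint):
        """'mount -t btrfs some-uuid /mount/point': zero-device entity."""
        fs = self.by_uuid.setdefault(uuid, PendingFs(uuid))  # idempotent
        self.by_mountpoint[mountpoint] = fs
        return fs

    def trydev(self, path, superblock):
        """'mount -o trydev /dev/x ...': route by the superblock's UUID."""
        if superblock is None:
            return None  # not a filesystem member at all
        uuid, devid, num_devices = superblock
        fs = self.by_uuid.get(uuid)
        if fs is None:
            return None  # no pending filesystem wants this device
        fs.num_devices = num_devices
        fs.devices[devid] = path
        return fs

    def listdir(self, mountpoint):
        fs = self.by_mountpoint[mountpoint]
        if not fs.ready():
            raise FileNotFoundError(errno.ENOENT, "not ready", mountpoint)
        return sorted(fs.devices.values())  # toy stand-in for real contents
```

Each trydev call is a single dictionary lookup, so attaching N devices
costs N lookups rather than N trial mounts per active filesystem.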
In practice this is very impure compared to mdadm, which has the /dev/md
namespace in which to build its devices before any actual mount is
possible.