From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: Can I see what device was used to mount btrfs?
Date: Wed, 3 May 2017 07:32:52 -0400
Message-ID: <1ed1b82c-9a1c-193f-5640-b19c6cb509f6@gmail.com>
In-Reply-To: <20170502221506.3dfe125e@jupiter.sol.kaishome.de>
On 2017-05-02 16:15, Kai Krakow wrote:
> On Tue, 2 May 2017 21:50:19 +0200,
> Goffredo Baroncelli <kreijack@inwind.it> wrote:
>
>> On 2017-05-02 20:49, Adam Borowski wrote:
>>>> It could be some daemon that waits for btrfs to become complete.
>>>> Do we have something?
>>> Such a daemon would also have to read the chunk tree.
>>
>> I don't think that a daemon is necessary. As a proof of concept, in the
>> past I developed a mount helper [1] which handled the mount of a
>> btrfs filesystem: this helper first checks whether the filesystem is a
>> multi-device one; if so, it waits until all the devices have
>> appeared. Finally, it mounts the filesystem.
>>
>>> It's not so simple -- such a btrfs device would have THREE states:
>>>
>>> 1. not mountable yet (multi-device with not enough disks present)
>>> 2. mountable ro / rw-degraded
>>> 3. healthy
>>
>> My mount.btrfs could be "programmed" to wait for a timeout and then
>> mount the filesystem as degraded if not all devices are present.
>> This is a very simple strategy, but it could be expanded.
>>
>> I am inclined to think that the current approach doesn't fit the
>> btrfs requirements well. The roles and responsibilities are spread
>> across too many layers (udev, systemd, mount)... I hoped that my helper
>> could be adopted in order to concentrate all the responsibility in only
>> one binary; this would reduce the number of interfaces with the other
>> subsystems (e.g. systemd, udev).
>>
>> For example, it would be possible to implement a sanity check that
>> prevents mounting a btrfs filesystem if two devices expose the same
>> UUID...
>
> Ideally, the btrfs wouldn't even appear in /dev until it was assembled
> by udev. But apparently that's not the case, and I think this is where
> the problems come from. I wish btrfs would not show up as device nodes
> in /dev that the mount command identified as btrfs. Instead, btrfs
> would expose (probably through udev) a device node
> in /dev/btrfs/fs_identifier when it is ready.
>
> Apparently, the core problem of how to handle degraded btrfs still
> remains. Maybe it could be solved by adding more stages of btrfs nodes,
> like /dev/btrfs-incomplete (for unusable btrfs), /dev/btrfs-degraded
> (for btrfs still missing devices but at least one stripe of btrfs raid
> available) and /dev/btrfs as the final stage. That way, a mount process
> could wait for a while, and if the device doesn't appear, it tries the
> degraded stage instead. If the fs is opened from the degraded dev node
> stage, udev (or other processes) that scan for devices should stop
> assembling the fs if they still do so.
That won't work, though, because BTRFS is a _filesystem_, not a block
layer. We don't have any way of hiding things. Even if we did, we
would still need to parse the superblocks and chunk tree, and at that
point, it just makes more sense to try to mount the FS instead. IOW,
the correct way to determine if a BTRFS volume is mountable is to try to
mount it, not to wait and try to find all the devices.
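
As a rough illustration of "just try to mount it", combined with the
timeout-then-degraded strategy quoted earlier, here is a minimal Python
sketch. The names mount_once and mount_with_fallback are hypothetical, as
are the timeout and interval defaults; the only external pieces assumed
are mount(8) and the btrfs "degraded" mount option. It also assumes the
member devices get registered with the kernel by something else (udev,
for instance) while it retries, and that it runs with enough privileges
to mount.

```python
import subprocess
import time


def mount_once(device, mountpoint, options=""):
    """Run mount(8) once; return True if it exited successfully."""
    cmd = ["mount", "-t", "btrfs"]
    if options:
        cmd += ["-o", options]
    cmd += [device, mountpoint]
    return subprocess.run(cmd, stderr=subprocess.DEVNULL).returncode == 0


def mount_with_fallback(device, mountpoint, timeout=30.0, interval=1.0,
                        attempt=mount_once, sleep=time.sleep,
                        clock=time.monotonic):
    """Retry a normal mount until the timeout, then try a degraded mount.

    Returns "normal", "degraded", or None if nothing worked.
    attempt, sleep and clock are injectable so the logic can be
    exercised without touching real devices.
    """
    deadline = clock() + timeout
    while True:
        # The mount attempt itself is the mountability check.
        if attempt(device, mountpoint, ""):
            return "normal"
        if clock() >= deadline:
            break
        sleep(interval)
    # Devices are still missing after the timeout: fall back.
    if attempt(device, mountpoint, "degraded"):
        return "degraded"
    return None
```

Nothing here parses superblocks or the chunk tree in userspace; the
kernel does that as part of the mount attempt, and the exit status is
the only signal used.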
>
> bcache has a similar approach by hiding an fs within a protective
> superblock. Unless bcache is set up, the fs won't show up in /dev, and
> that fs won't be visible by other means. Btrfs should do something
> similar and only show a single device node if assembled completely. The
> component devices would have superblocks ignored by mount, and only the
> final node would expose a virtual superblock and the compound device
> after it. Of course, this makes things like compound device resizing
> more complicated, maybe even impossible.
Except there is no 'btrfs' device node for a filesystem. The only node
is /dev/btrfs-control, which is used for a small handful of things that
don't involve the mountability of any filesystem. To reiterate, we are
_NOT_ a block layer, so there is _NO_ associated block device for an
assembled multi-device volume, nor should there be.
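
Since no block device stands for the assembled volume, one way to see
which devices back a mounted filesystem is to look it up by filesystem
UUID in sysfs. The sketch below assumes the kernel exposes a
/sys/fs/btrfs/<fsid>/devices/ directory with one entry per member device
for mounted filesystems; member_devices is a hypothetical helper name,
and the sysfs_root parameter exists only so the function can be pointed
at a test directory.

```python
import os


def member_devices(fsid, sysfs_root="/sys/fs/btrfs"):
    """Return the sorted member device names listed for one filesystem.

    fsid is the filesystem UUID as it appears under sysfs_root.
    Raises FileNotFoundError if no such filesystem entry exists.
    """
    devdir = os.path.join(sysfs_root, fsid, "devices")
    return sorted(os.listdir(devdir))
```

The result is a list of kernel device names (no /dev/ prefix), however
many members the volume has.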
>
> If I'm not totally wrong, I think this is also how zfs exposes its
> pools. You need user space tools to make the fs pools visible in the
> tree. If zfs is incomplete, there's nothing to mount, and thus no race
> condition. But I never tried zfs seriously, so I do not know.
For zvols, yes, this is how it works. For actual filesystem datasets,
it behaves almost identically to BTRFS AFAIK.