From: Goffredo Baroncelli <kreijack@inwind.it>
To: Austin S Hemmelgarn <ahferroin7@gmail.com>,
Lennart Poettering <lennart@poettering.net>,
Harald Hoyer <harald@redhat.com>
Cc: linux-btrfs@vger.kernel.org, Kay Sievers <kay@vrfy.org>,
Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>
Subject: Re: Extend BTRFS_IOC_DEVICES_READY for degraded RAID
Date: Mon, 05 Jan 2015 18:57:21 +0100
Message-ID: <54AAD081.9010206@inwind.it>
In-Reply-To: <54AAC3AD.3010802@gmail.com>
On 2015-01-05 18:02, Austin S Hemmelgarn wrote:
> On 2015-01-05 11:36, Goffredo Baroncelli wrote:
>> On 2015-01-05 12:31, Lennart Poettering wrote:
>>> On Mon, 05.01.15 10:46, Harald Hoyer (harald@redhat.com) wrote:
>>>
>>>> We have BTRFS_IOC_DEVICES_READY to report whether all devices are
>>>> present, so that a udev rule can report ID_BTRFS_READY and
>>>> SYSTEMD_READY.
>>>>
>>>> I think we need a third state here for a degraded RAID, which
>>>> can be mounted, but only after a certain timeout or with kernel
>>>> command line parameters.
>>>>
>>>> We also have to rethink how to handle the udev DB update for
>>>> the change of state: incomplete -> degraded -> complete
>>>
>>> I am not convinced that automatically booting degraded arrays
>>> would be a good idea. Instead, requiring one manual step before
>>> booting a degraded array sounds OK to me.
>>
>> I think that a good use case is when the root filesystem is a raid
>> one.
>>
>> However I don't think that the current architecture is flexible
>> enough to perform this job:
>> - mounting a raid filesystem in degraded mode is good for some
>> setups but it is not the right solution for all: a configuration
>> parameter to allow one behavior or the other is needed;
>> - the degraded mode should be allowed only if not all the devices
>> are discovered AND a timeout has expired. This timeout is another
>> variable which (IMHO) should be configurable;
> These first 2 points can be easily handled with some simple logic in
> userspace without needing a mount helper.
If you implement it in a mount.btrfs helper, this logic is available
in all cases, not only when mounting the root fs.
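
As an illustration only, the first two points (a switch that allows or
forbids degraded mounts, plus a configurable timeout) could be sketched
as below. The function name, the Action values and the parameters are
all hypothetical; nothing here is an existing mount.btrfs interface.

```python
from enum import Enum


class Action(Enum):
    MOUNT = "mount"                    # all devices present
    WAIT = "wait"                      # keep waiting for missing devices
    MOUNT_DEGRADED = "mount-degraded"  # timeout expired, policy allows it
    FAIL = "fail"                      # timeout expired, policy forbids it


def decide(found: int, expected: int, elapsed_s: float,
           timeout_s: float, allow_degraded: bool) -> Action:
    """Decide what a (hypothetical) mount helper should do next."""
    if found >= expected:
        return Action.MOUNT
    # Degraded mode is considered only when devices are missing AND
    # the configurable timeout has expired.
    if elapsed_s < timeout_s:
        return Action.WAIT
    return Action.MOUNT_DEGRADED if allow_degraded else Action.FAIL
```

Because the decision is a plain function of its inputs, the same logic
could serve the root fs and every other mount point.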
>> - there are different degrees of degraded mode: if the raid is a
>> RAID6, losing a device would be acceptable; losing two devices may
>> be unacceptable. Again there is no simple answer; a configurable
>> policy is needed;
> This can be solved by providing 2 new return values for the
> BTRFS_IOC_DEVICES_READY ioctl (instead of just one), one for
> arrays that are in such a state that losing another disk will almost
> certainly cause data loss (ie, a RAID6 with two missing devices, or a
> BTRFS raid1/10 with one missing device), and one for an array that
> (theoretically) won't lose any data if one more device drops out (ie,
> a RAID6 (or something with higher parity) with one missing disk)
This is a detail; the point is that this policy has to be implemented
somewhere. I am suggesting not to "spread" this logic across too many
subsystems (kernel, systemd, udev, scripts...).
BTRFS couples a filesystem with a device manager. This exposes a lot of
new problems and options. I am suggesting to create a "tool" to manage all
these new problems/options. This tool is (of course) btrfs specific, and I
am convinced that a good place to start is a mount.btrfs helper.
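
To make the "degrees of degraded mode" concrete, here is a minimal
sketch of the classification discussed above. The state names are
hypothetical, and the table lists the number of missing devices each
profile is guaranteed to survive; it is not what the ioctl returns
today.

```python
# Missing devices each profile is guaranteed to tolerate.
TOLERATED_MISSING = {
    "single": 0,
    "raid0": 0,
    "raid1": 1,
    "raid10": 1,
    "raid5": 1,
    "raid6": 2,
}


def classify(profile: str, missing: int) -> str:
    """Map a profile and a missing-device count to a (hypothetical) state."""
    tolerated = TOLERATED_MISSING[profile]
    if missing == 0:
        return "complete"
    if missing > tolerated:
        return "unmountable"
    if missing == tolerated:
        # Losing one more device would almost certainly lose data.
        return "degraded-critical"
    # At least one more device can drop out without data loss.
    return "degraded-safe"
```

With these assumptions a RAID6 with one missing device is
"degraded-safe", while a RAID6 with two missing devices or a raid1 with
one missing device is "degraded-critical". Whether either state may be
mounted is then a policy question for the tool.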
> [...] and
> then provide a module parameter to allow forcing the kernel to report
> one or the other.
This policy should differ by mount point: if the machine is a
remote one, I can allow mounting the root filesystem even in degraded
mode to start some "recovery"; but a more conservative policy may be
applied to the other filesystems.
This is one of the reasons to keep the policy out of the kernel.
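
A per-mount-point policy of this kind could look roughly like the
following sketch. The Policy type, the POLICIES table and may_mount()
are invented for illustration; a module parameter could not express
this, because it would apply to every filesystem at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allow_degraded: bool
    max_missing: int = 0  # most missing devices accepted when degraded


# Hypothetical configuration: lenient for the root fs of a remote
# machine, conservative (the default) for everything else.
POLICIES = {"/": Policy(allow_degraded=True, max_missing=1)}
DEFAULT_POLICY = Policy(allow_degraded=False)


def may_mount(mount_point: str, missing: int) -> bool:
    """Return True if the policy for mount_point permits mounting."""
    if missing == 0:
        return True
    policy = POLICIES.get(mount_point, DEFAULT_POLICY)
    return policy.allow_degraded and missing <= policy.max_missing
```

Here the root fs may come up with one device missing so that recovery
can start, while /home (not listed, so it gets the default) refuses any
degraded mount.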
>> - pay attention that the current architecture has some flaws: if a
>> device disappears during the device discovery, ID_BTRFS_READY
>> returns OK even if a device is missing.
> Point 4 would require some kind of continuous
> scanning/notification (and therefore add more bulk, the lack of which
> is in my opinion one of the biggest advantages of BTRFS over ZFS),
> and even then there will always be the possibility that a device
> drops out between you calling the ioctl and trying to mount the
> filesystem.
If you shorten the window, it is less likely to happen.
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5