linux-btrfs.vger.kernel.org archive mirror
From: Tomasz Pala <gotar@polanet.pl>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: degraded permanent mount option
Date: Mon, 29 Jan 2018 00:13:30 +0100
Message-ID: <20180128231330.GB26726@polanet.pl>
In-Reply-To: <CAJCQCtSCC5RFEybWTpWVDeVS9MPwBsbY0-F4C_mB0Mq5EDhH9g@mail.gmail.com>

On Sun, Jan 28, 2018 at 13:28:55 -0700, Chris Murphy wrote:

>> Are you sure you really understand the problem? No mount happens because
>> systemd waits for indication that it can mount and it never gets this
>> indication.
> 
> "not ready" is rather vague terminology but yes that's how systemd
> ends up using the ioctl this rule depends on, even though the rule has
> nothing to do with readiness per se. If all devices for a volume

If you avoid using THIS ioctl, then you'd have nothing to fire the rule
at all. One way or another, it is btrfs that must emit _some_ event or
be polled _somehow_.

> aren't found, we can correctly conclude a normal mount attempt *will*
> fail. But that's all we can conclude. What I can't parse in all of
> this is if the udev rule is a one shot, if the ioctl is a one shot, if
> something is constantly waiting for "not all devices are found" to
> transition to "all devices are found" or what. I can't actually parse

It's not one shot. It works like this:

sda1 appears -> udev catches event -> udev detects btrfs and IOCTLs => not ready
sdb1 appears -> udev catches event -> udev detects btrfs and IOCTLs => ready

The end.
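
The two-event sequence above can be modeled with a small toy simulation.
This is only a sketch: ToyBtrfsVolume and devices_ready are hypothetical
names standing in for the kernel's device registry and the ready IOCTL,
not real APIs.

```python
class ToyBtrfsVolume:
    """Toy stand-in for the kernel's view of one multi-device btrfs volume."""

    def __init__(self, expected_devices):
        self.expected = set(expected_devices)
        self.seen = set()

    def devices_ready(self, devnode):
        """Register devnode, then report whether every member has been seen.

        Mimics the ready IOCTL as described in this thread: it answers only
        when asked, and says "ready" only once all members are present.
        """
        self.seen.add(devnode)
        return self.expected <= self.seen


def on_udev_event(volume, devnode):
    """Simplified version of what the udev rule does per block device event."""
    ready = volume.devices_ready(devnode)
    return {"ID_BTRFS_READY": "1" if ready else "0"}


volume = ToyBtrfsVolume(["/dev/sda1", "/dev/sdb1"])
first = on_udev_event(volume, "/dev/sda1")   # only one member known so far
second = on_udev_event(volume, "/dev/sdb1")  # both members known
print(first, second)
```

Nothing in this model ever calls back on its own: the answer changes only
when a new device event asks again.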

If there were some other device appearing after assembly, like /dev/md1,
or if there were some event generated by btrfs code itself, udev could
catch this and follow. Now, if you unplug sdb1, there's no such event at
all.

Since this IOCTL is the *only* thing that udev can rely on, it cannot be
removed from the logic. So even if you create a timer to force assembly,
you must do it by influencing the IOCTL response.

Or by creating some other IOCTL for this purpose, or some userspace
daemon, or whatever.

> the two critical lines in this rule. I
> 
> # let the kernel know about this btrfs filesystem, and check if it is complete
> IMPORT{builtin}="btrfs ready $devnode"

This sends the IOCTL.

> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
      ^^^^^^^^^^^^^^this is the IOCTL response being checked

and SYSTEMD_READY set to 0 prevents systemd from mounting.
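
The effect of these two rule lines can be sketched in Python. The function
names are made up for illustration, and the env dict is a simplified
stand-in for udev device properties; only the ID_BTRFS_READY and
SYSTEMD_READY keys come from the rule quoted above.

```python
def apply_btrfs_rule(env):
    """Mirror the quoted rule line: ID_BTRFS_READY=="0" sets SYSTEMD_READY="0"."""
    result = dict(env)  # do not mutate the caller's properties
    if result.get("ID_BTRFS_READY") == "0":
        result["SYSTEMD_READY"] = "0"
    return result


def systemd_may_mount(env):
    """systemd treats the device as unavailable while SYSTEMD_READY is "0"."""
    return env.get("SYSTEMD_READY") != "0"


incomplete = apply_btrfs_rule({"ID_BTRFS_READY": "0"})
complete = apply_btrfs_rule({"ID_BTRFS_READY": "1"})
print(systemd_may_mount(incomplete), systemd_may_mount(complete))
```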

> I think the Btrfs ioctl is a one shot. Either they are all present or not.

The rules are called once per (block) device.
So once btrfs has seen all the devices and returns READY, the device
finally becomes systemd-ready. It is trivial to re-trigger the udev rule
(udevadm trigger), but there is no way to force btrfs to return READY
after any timeout.

> The waiting is a policy by systemd udev rule near as I can tell.

There is no problem in waiting or re-triggering. This can be done in ~10
lines of rules. The problem is that the IOCTL won't EVER return READY until
ALL the components are present.
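
To illustrate why re-triggering alone does not help, here is a toy model
(not the real IOCTL; ioctl_ready and retrigger_until_ready are hypothetical
helpers). Asking again any number of times gives the same answer as long as
a member is missing.

```python
def ioctl_ready(expected, present):
    """Toy model of the ready IOCTL: true only if every member is present."""
    return set(expected) <= set(present)


def retrigger_until_ready(expected, present, attempts):
    """Re-ask the toy IOCTL `attempts` times, like repeated `udevadm trigger`.

    Returns the attempt number that reported ready, or None if none did.
    """
    for attempt in range(1, attempts + 1):
        if ioctl_ready(expected, present):
            return attempt
    return None


expected = ["/dev/sda1", "/dev/sdb1"]
degraded = retrigger_until_ready(expected, ["/dev/sda1"], attempts=10)
complete = retrigger_until_ready(expected, expected, attempts=10)
print(degraded, complete)
```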

It's as simple as that: there MUST be some mechanism at the device-manager
level that tells whether a compound device is mountable, degraded or not;
upper layers (systemd-mount) do not care about degradation, and handling
redundancy/mirrors/chunks/stripes/spares is not their job.
It (systemd) can (easily!) handle an expiration timer to push a pending
compound device to be force-assembled, but currently there is no way to push.


If the IOCTL were extended to return TRYING_DEGRADED (when
instructed to do so after an expired timeout), systemd could handle
additional per-filesystem fstab options, like x-systemd.allow-degraded.

Then it would be possible to have a best-effort policy for rootfs (to make
the machine boot), and a stricter one for crucial data (do not mount it
when there is no redundancy; wait for operator intervention).
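
A rough sketch of how such a policy decision might look. Everything here
is hypothetical: TRYING_DEGRADED is only the response proposed in this
message, x-systemd.allow-degraded is only the suggested option name, and
neither exists today.

```python
from enum import Enum


class ReadyState(Enum):
    NOT_READY = 0
    READY = 1
    TRYING_DEGRADED = 2  # hypothetical: proposed here, not an existing response


def should_mount(state, fstab_options):
    """Decide whether to mount, given the (hypothetical) extended response."""
    if state is ReadyState.READY:
        return True
    if state is ReadyState.TRYING_DEGRADED:
        # Hypothetical per-filesystem opt-in, as suggested in this message.
        return "x-systemd.allow-degraded" in fstab_options
    return False


rootfs_opts = {"defaults", "x-systemd.allow-degraded"}  # best effort: boot anyway
data_opts = {"defaults"}                                # strict: wait for operator

print(should_mount(ReadyState.TRYING_DEGRADED, rootfs_opts))
print(should_mount(ReadyState.TRYING_DEGRADED, data_opts))
```

With this split, the per-filesystem policy lives in fstab, while the
decision whether a degraded assembly is possible at all stays with btrfs.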

-- 
Tomasz Pala <gotar@pld-linux.org>
