Re: degraded permanent mount option

linux-btrfs.vger.kernel.org archive mirror

From: Chris Murphy <lists@colorremedies.com>
To: Tomasz Pala <gotar@polanet.pl>
Cc: "Majordomo vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: degraded permanent mount option
Date: Sat, 27 Jan 2018 13:57:29 -0700
Message-ID: <CAJCQCtT8_zdmc5oTLLa7AQt5_ObQchwBvFJCNf2UkC7ygn0rXw@mail.gmail.com>
In-Reply-To: <20180127110619.GA10472@polanet.pl>

On Sat, Jan 27, 2018 at 4:06 AM, Tomasz Pala <gotar@polanet.pl> wrote:

> As for the regular by-UUID mounts: these links are created by udev WHEN
> underlying devices appear. Does btrfs volume appear? No.

If I boot with rd.break=pre-mount I can absolutely mount a
multiple-device Btrfs volume that has a missing device, either by UUID
with the --uuid flag or by /dev/sdXY, along with -o degraded. And I can
then use the exit command to continue the startup process. In fact I
can try to mount without -o degraded, and the mount command "works" in
that it does not complain about an invalid node or UUID.

The Btrfs systemd udev rule is a sledgehammer because it has no
timeout. It neither times out and tries to mount anyway, nor does it
time out and just drop to a dracut prompt. A number of things in
systemd startup have timeouts. I have no idea how they get defined,
but that single thing would make this a lot better. Right now the
Btrfs udev rule means that if all devices aren't available, startup
hangs indefinitely.

I don't know systemd or systemd-udev well enough to know if this rule
can have a timer. Service units absolutely can have timers, so maybe
there's a way to marry a udev rule with a service which has a timer.
The absolute dumbest thing that's still better than now is, when the
timer expires, to just fail and drop to a dracut prompt. Better would
be to try a normal mount anyway, which also fails to a dracut prompt,
but additionally gives us a kernel error for Btrfs (the missing device
open ctree error you'd expect to get when mounting without -o degraded
when you're missing a device). And even better would be a way for the
user to edit the service unit to indicate "upon timeout being reached,
use mount -o degraded rather than just mount". This is the simplest of
Boolean logic, so I'd be surprised if systemd doesn't offer a way for
us to do exactly what I'm describing.

Again the central problem is the udev rule now means "wait for device
to appear" with no timed fallback.

The mdadm case has this, and it's done by dracut. At this same stage
of startup with a missing device, there is in fact no fs volume UUID
yet because the array hasn't started. Dracut+mdadm knows there's a
missing device so it's just iterating: look, sleep 3, look, sleep 3,
look, sleep 3. It's on a loop. And after that loop hits something like
100, the script says f it, start array anyway, so now there is a
degraded array, and for the first time the fs volume UUID appears, and
systemd goes "ahaha! mount that!" and it does it normally.
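
To make the loop concrete, here is a rough model of that behavior. It
is only an illustrative Python sketch, not dracut's actual shell code:
the function and parameter names are hypothetical, and the 3-second
sleep and the limit of 100 are just the approximate figures mentioned
above.

```python
import time


def wait_then_force_start(all_members_present, start_array,
                          max_tries=100, interval=3.0, sleep=time.sleep):
    """Poll for missing array members; after max_tries, start anyway.

    all_members_present: hypothetical callable, True once every member is seen.
    start_array: hypothetical callable taking degraded=True/False.
    Returns "normal" if all members showed up, "degraded" if forced.
    """
    for _ in range(max_tries):
        if all_members_present():
            start_array(degraded=False)
            return "normal"
        sleep(interval)  # look, sleep, look, sleep ...
    # Loop limit reached: give up waiting and start the array degraded.
    start_array(degraded=True)
    return "degraded"
```

Only after start_array runs does the filesystem UUID exist for systemd
to act on, which is why the timeout policy lives in the script and not
in systemd or the kernel.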

So the timer and timeout and what happens at the timeout is defined by
dracut. That's probably why the systemd folks say "not our problem"
and why the kernel folks say "not our problem".

> If btrfs pretends to be device manager it should expose more states,
> especially "ready to be mounted, but not fully populated" (i.e.
> "degraded mount possible"). Then systemd could _fallback_ after timing
> out to degraded mount automatically according to some systemd-level
> option.

No, mdadm is a device manager and it has no such facility. Something
issues a command to start the array anyway, and only then do you find
out if there are enough devices to start it. I don't understand the
value of knowing whether it is possible. Just try to mount it degraded,
and if that fails, we fail. Nothing more can be done automatically;
it's up to an admin.

And even if you had this "degraded mount possible" state, you still
need a timer. So just build the timer.

If all devices ready ioctl is true, the timer doesn't start, it means
all devices are available, mount normally.
If all devices ready ioctl is false, the timer starts, if all devices
appear later the ioctl goes to true, the timer is belayed, mount
normally.
If all devices ready ioctl is false, the timer starts, when the timer
times out, mount normally which fails and gives us a shell to
troubleshoot at.
OR
If all devices ready ioctl is false, the timer starts, when the timer
times out, mount with -o degraded which either succeeds and we boot or
it fails and we have a troubleshooting shell.
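
The cases above amount to a small decision function. The following is
a minimal Python sketch under stated assumptions, not an existing
systemd or dracut interface: devices_ready is a stand-in for the "all
devices ready" ioctl check, and the timeout, poll interval, and
degraded_on_timeout names and defaults are hypothetical.

```python
import time


def decide_mount(devices_ready, timeout=90.0, poll=1.0,
                 degraded_on_timeout=False,
                 clock=time.monotonic, sleep=time.sleep):
    """Decide how to mount, following the cases listed above.

    Returns "normal" (all devices present), "normal-after-timeout"
    (mount expected to fail and drop to a shell), or "degraded"
    (mount with -o degraded after the timeout).
    """
    if devices_ready():
        return "normal"  # timer never starts
    deadline = clock() + timeout  # timer starts
    while clock() < deadline:
        sleep(poll)
        if devices_ready():
            return "normal"  # devices appeared, timer cancelled
    # Timer expired with devices still missing: apply the admin's policy.
    return "degraded" if degraded_on_timeout else "normal-after-timeout"
```

Either timeout result can still end in a troubleshooting shell; the
difference is only whether a degraded mount is attempted first.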

The central problem is the lack of a timer and timeout.

> Unless there is *some* signalling from btrfs, there is really not much
> systemd can *safely* do.

That is not true. It's not how mdadm works anyway.

-- 
Chris Murphy
