From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Tomasz Pala <gotar@polanet.pl>,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: degraded permanent mount option
Date: Tue, 30 Jan 2018 10:05:34 -0500
Message-ID: <2e6b43ce-048f-2404-9455-c768f95e34fb@gmail.com>
In-Reply-To: <20180130134649.GA7126@polanet.pl>
On 2018-01-30 08:46, Tomasz Pala wrote:
> On Mon, Jan 29, 2018 at 08:05:42 -0500, Austin S. Hemmelgarn wrote:
>
>> Seriously, _THERE IS A RACE CONDITION IN SYSTEMD'S CURRENT HANDLING OF
>> THIS_. It's functionally no different than prefacing an attempt to send
>> a signal to a process by checking if the process exists, or trying to
>> see if some other process is using a file that might be locked by
>
> Seriously, there is a race condition on train stations. People check if
> the train has stopped and opened the door before they move their legs to
> get in, but the train might be already gone - so this is pointless.
>
> Instead, they should move their legs continuously and if the train is
> not on the station yet, just climb back and retry.
No, that's really not a good analogy, given that checking for the
presence of a train takes a normal person milliseconds while the
event being raced against (the train departing) takes minutes. In the
case being discussed, the check takes milliseconds and the event being
raced against also takes milliseconds. The scale here is drastically
different.
> See the difference? I hope now you know what is the race condition.
> It is the condition, where CONSEQUENCES are fatal.
Yes, the consequences of the condition being discussed are functionally
fatal (you completely fail to mount the volume), because systemd doesn't
retry mounting the root filesystem, it just breaks, which is absolutely
at odds with the whole 'just works' mentality I always hear from the
systemd fanboys and developers.
You're already looping forever _waiting_ for the volume to appear. How
is that any different from looping forever trying to _mount_ the volume
instead, given that failing to mount the volume is not going to damage
things? The issue here is that systemd refuses to implement any method
of actually retrying things that fail during startup.
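To make the retry argument concrete, here is a minimal sketch in Python of
a bounded retry loop. It is not systemd's code or anything from the kernel.
The names `mount_with_retry` and `try_mount` are hypothetical, and
`try_mount` is assumed to raise OSError when a mount attempt fails.

```python
import time


def mount_with_retry(try_mount, attempts=5, delay=1.0, sleep=time.sleep):
    """Call try_mount() until it succeeds or the attempts run out.

    try_mount is a hypothetical callable standing in for a real mount
    attempt; it is assumed to raise OSError on failure.
    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            try_mount()
            return attempt
        except OSError as exc:
            # A failed attempt is simply remembered and retried.
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: report the last error to the caller.
    raise last_error
```

The point of the sketch is only that the loop acts and handles the failure,
rather than checking some readiness state first and then acting on a result
that may already be stale.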
> mounting BEFORE volume is complete is FATAL - since no userspace daemon
> would ever retrigger the mount and the system won't come up. Provide one
> btrfsd volume manager and systemd could probably switch to using it.
And here you've lost any respect I might have had for you.
**YOU DO NOT NEED A DAEMON TO DO EVERY LAST TASK ON THE SYSTEM**
Period, end of story.
<rant>
This is one of the two biggest things I hate about systemd (the journal
is the other one for those who care). You don't need some special
daemon to set the time, or to set the hostname, or to fetch account
data, or even to track who's logged in (though I understand that the
last one is not systemd's fault originally).
As much as it may surprise the systemd developers, people got on just
fine handling setting the system time, setting the hostname, fetching
account info, tracking active users, and any number of myriad other
tasks before systemd decided they needed to have their own special daemon.
</rant>
In this particular case, you don't need a daemon because the kernel does
the state tracking. It only checks that state completely though _when
you ask it to mount the filesystem_ because it requires doing 99% of the
work of mounting the filesystem (quite literally, you're doing pretty
much everything short of actually hooking things up in the VFS layer).
We are not a case like MD, where there's just a tiny bit of metadata to
parse to check what the state is supposed to be. Imagine if LVM
required you to unconditionally activate all the LVs in a VG when you
activate the VG, and what logic would be required to validate the VG
then, and you're pretty close to what's needed to check state for a
BTRFS volume (translating LVs to chunks and the VG to the filesystem as
a whole). There is no point in trying to parse that data every time a
new device shows up; it's a waste of time (at a minimum, you're almost
doubling the amount of time it takes to mount a volume if you are doing
this each time a device shows up), energy, and resources in general.
>
> mounting AFTER volume is complete is FINE - and if the "pseudo-race" happens
> and volume disappears, then this was either some operator action, so the
> umount SHOULD happen, or we are facing some MALFUNCTION, which is fatal
> itself, not by being a "race condition".
Short of catastrophic failure, the _volume_ doesn't disappear, a
component device does, and that is where the problem lies, especially
given that the ioctl only tracks that each component device has been
seen, not that all are present at the moment the ioctl is invoked.
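The difference between "seen" and "present" can be shown with a toy model
in Python. This is not the kernel's implementation, and the class and method
names are hypothetical. It only mirrors the behaviour described above, on
the assumption that a device is recorded once it has been seen and that the
record is not cleared when the device goes away.

```python
class SeenTracker:
    """Toy model of 'has every component device been seen at least once?'"""

    def __init__(self, expected):
        self.expected = set(expected)  # devices that make up the volume
        self.seen = set()              # devices seen at any point so far
        self.present = set()           # devices attached right now

    def device_added(self, dev):
        self.seen.add(dev)
        self.present.add(dev)

    def device_removed(self, dev):
        # Only 'present' shrinks; 'seen' is never cleared in this model.
        self.present.discard(dev)

    def ready(self):
        # What a 'seen'-based readiness check would answer.
        return self.expected <= self.seen

    def all_present(self):
        # What a mount attempt actually depends on.
        return self.expected <= self.present
```

If both devices of a two-device volume show up and one then drops out,
`ready()` in this model still returns True while `all_present()` returns
False, which is exactly the window in which a mount triggered by the
readiness check would fail.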