Subject: Re: degraded permanent mount option
To: Tomasz Pala, Btrfs BTRFS
References: <20180127110619.GA10472@polanet.pl> <20180127132641.mhmdhpokqrahgd4n@angband.pl> <20180128003910.GA31699@polanet.pl> <20180128223946.GA26726@polanet.pl> <20180129085404.GA2500@polanet.pl> <20180129112456.r7ksq5mwp3ie6gmg@angband.pl> <6804d30d-53ff-403f-1eac-ac5da01509f7@gmail.com> <20180130134649.GA7126@polanet.pl>
From: "Austin S. Hemmelgarn"
Message-ID: <2e6b43ce-048f-2404-9455-c768f95e34fb@gmail.com>
Date: Tue, 30 Jan 2018 10:05:34 -0500
In-Reply-To: <20180130134649.GA7126@polanet.pl>

On 2018-01-30 08:46, Tomasz Pala wrote:
> On Mon, Jan 29, 2018 at 08:05:42 -0500, Austin S. Hemmelgarn wrote:
>
>> Seriously, _THERE IS A RACE CONDITION IN SYSTEMD'S CURRENT HANDLING OF
>> THIS_. It's functionally no different than prefacing an attempt to send
>> a signal to a process by checking if the process exists, or trying to
>> see if some other process is using a file that might be locked by
>
> Seriously, there is a race condition at train stations. People check if
> the train has stopped and opened the doors before they move their legs to
> get in, but the train might already be gone - so this is pointless.
>
> Instead, they should move their legs continuously, and if the train is
> not at the station yet, just climb back and retry.

No, that's really not a good analogy, because the check for the presence
of a train takes a normal person milliseconds, while the event being
raced against (the train departing) takes minutes. In the case being
discussed, the check takes milliseconds and the event being raced
against also takes milliseconds. The scale here is drastically
different.

> See the difference? I hope now you know what a race condition is.
> It is a condition where the CONSEQUENCES are fatal.

Yes, the consequences of the condition being discussed are functionally
fatal (you completely fail to mount the volume), because systemd doesn't
retry mounting the root filesystem; it just breaks, which is absolutely
at odds with the whole 'just works' mentality I always hear from the
systemd fanboys and developers. You're already looping forever _waiting_
for the volume to appear. How is that any different from looping forever
trying to _mount_ the volume instead, given that a failed mount attempt
isn't going to damage anything? The issue here is that systemd refuses
to implement any method of actually retrying things that fail during
startup (a sketch of such a retry loop follows below).

> mounting BEFORE the volume is complete is FATAL - since no userspace
> daemon would ever retrigger the mount and the system won't come up.
> Provide a btrfsd volume manager and systemd could probably switch to
> using it.

And here you've lost any respect I might have had for you. **YOU DO NOT
NEED A DAEMON TO DO EVERY LAST TASK ON THE SYSTEM.** Period, end of
story. This is one of the two biggest things I hate about systemd (the
journal is the other one, for those who care).
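(Here's that retry-loop sketch, to make the argument concrete: just keep
trying the mount and back off on failure, instead of polling device
state first. The device path, mount point, and 30-attempt policy are
invented for illustration; this is not code systemd or btrfs-progs
ship, only the pattern being argued for.)

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mount.h>

    int main(void)
    {
        int tries;

        for (tries = 0; tries < 30; tries++) {
            /* mount(2) on an incomplete btrfs volume just fails with
             * an error; it doesn't damage anything, so it's safe to
             * keep retrying until the devices show up. */
            if (mount("/dev/sdb1", "/mnt", "btrfs", 0, NULL) == 0) {
                puts("mounted");
                return 0;
            }
            fprintf(stderr, "mount failed (%s), retrying\n",
                    strerror(errno));
            sleep(1);
        }
        fprintf(stderr, "giving up after 30 tries\n");
        return 1;
    }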
You don't need some special daemon to set the time, or to set the
hostname, or to fetch account data, or even to track who's logged in
(though I understand that the last one is not originally systemd's
fault). As much as it may surprise the systemd developers, people got
along just fine setting the system time, setting the hostname, fetching
account info, tracking active users, and handling myriad other tasks
before systemd decided each of them needed its own special daemon.

In this particular case, you don't need a daemon because the kernel does
the state tracking. It only checks that state completely _when you ask
it to mount the filesystem_, though, because the check requires doing
99% of the work of mounting the filesystem (quite literally, everything
short of actually hooking things up in the VFS layer). This is not a
case like MD, where there's just a tiny bit of metadata to parse to
determine what the state is supposed to be. Imagine if LVM required you
to unconditionally activate all the LVs in a VG when you activate the
VG, and consider what logic would then be needed to validate the VG;
that's pretty close to what's needed to check state for a BTRFS volume
(with LVs translating to chunks and the VG to the filesystem as a
whole). There is no point in trying to parse that data every time a new
device shows up; it's a waste of time (at a minimum, you'd almost double
the time it takes to mount a volume if you did this for each device as
it appeared), energy, and resources in general.

> mounting AFTER the volume is complete is FINE - and if the
> "pseudo-race" happens and the volume disappears, then this was either
> some operator action, so the umount SHOULD happen, or we are facing
> some MALFUNCTION, which is fatal in itself, not by being a "race
> condition".

Short of catastrophic failure, the _volume_ doesn't disappear, a
component device does, and that is where the problem lies, especially
given that the ioctl only tracks that each component device has been
seen, not that all of them are present at the moment the ioctl is
invoked.
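(For the curious, the check in question is the BTRFS_IOC_DEVICES_READY
ioctl on /dev/btrfs-control, the same one the udev builtin and 'btrfs
device ready' issue. A minimal sketch of calling it, with only token
error handling:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/btrfs.h>

    int main(int argc, char **argv)
    {
        struct btrfs_ioctl_vol_args args;
        int fd, ret;

        if (argc != 2) {
            fprintf(stderr, "usage: %s <device>\n", argv[0]);
            return 2;
        }

        fd = open("/dev/btrfs-control", O_RDWR);
        if (fd < 0) {
            perror("/dev/btrfs-control");
            return 2;
        }

        memset(&args, 0, sizeof(args));
        strncpy(args.name, argv[1], BTRFS_PATH_NAME_MAX);

        /* Returns 0 once every device belonging to this device's
         * filesystem has been registered at some point; it does NOT
         * re-verify that each one is still present right now. */
        ret = ioctl(fd, BTRFS_IOC_DEVICES_READY, &args);
        close(fd);

        printf("%s: %s\n", argv[1],
               ret == 0 ? "all devices seen" : "not ready (or error)");
        return ret == 0 ? 0 : 1;
    }

A return of 0 only means the kernel has seen every component device at
some point, which is exactly the limitation described above.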