Date: Mon, 29 Jan 2018 00:13:30 +0100
From: Tomasz Pala
To: Btrfs BTRFS
Subject: Re: degraded permanent mount option
Message-ID: <20180128231330.GB26726@polanet.pl>

On Sun, Jan 28, 2018 at 13:28:55 -0700, Chris Murphy wrote:

>> Are you sure you really understand the problem? No mount happens because
>> systemd waits for indication that it can mount and it never gets this
>> indication.
>
> "not ready" is rather vague terminology but yes that's how systemd
> ends up using the ioctl this rule depends on, even though the rule has
> nothing to do with readiness per se. If all devices for a volume

If you avoid using THIS ioctl, then you'd have nothing to fire the rule
at all. One way or another, it is btrfs that must emit _some_ event or
be polled _somehow_.

> aren't found, we can correctly conclude a normal mount attempt *will*
> fail. But that's all we can conclude. What I can't parse in all of
> this is if the udev rule is a one shot, if the ioctl is a one shot, if
> something is constantly waiting for "not all devices are found" to
> transition to "all devices are found" or what. I can't actually parse

It's not one shot.
This works like this:

sda1 appears -> udev catches event -> udev detects btrfs and IOCTLs => not ready
sdb1 appears -> udev catches event -> udev detects btrfs and IOCTLs => ready

The end. If there were some other device appearing after assembly, like
/dev/md1, or if there were some event generated by btrfs code itself,
udev could catch this and follow. Now, if you unplug sdb1, there's no
such event at all.

Since this IOCTL is the *only* thing that udev can rely on, it cannot be
removed from the logic. So even if you create a timer to force assembly,
you must do it by influencing the IOCTL response. Or by creating some
other IOCTL for this purpose, or creating some userspace daemon or
whatever.

> the two critical lines in this rule. I
>
> # let the kernel know about this btrfs filesystem, and check if it is complete
> IMPORT{builtin}="btrfs ready $devnode"

This sends the IOCTL.

> # mark the device as not ready to be used by the system
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
  ^^^^^^^^^^^^^^this is the IOCTL response being checked

and SYSTEMD_READY set to 0 prevents systemd from mounting.

> I think the Btrfs ioctl is a one shot. Either they are all present or not.

The rules are called once per (block) device. So when btrfs scans all
the devices to return READY, this would finally be systemd-ready. It is
trivial to re-trigger the udev rule (udevadm trigger), but there is no
way to force btrfs to return READY after any timeout.

> The waiting is a policy by systemd udev rule near as I can tell.

There is no problem in waiting or re-triggering. This can be done in ~10
lines of rules. The problem is that the IOCTL won't EVER return READY
until ALL the components are present.

It's as simple as that: there MUST be some mechanism at the
device-manager level that tells if a compound device is mountable,
degraded or not; upper layers (systemd-mount) do not care about
degradation, handling redundancy/mirrors/chunks/stripes/spares is not
its job.
It (systemd) can (easily!)
handle an expiration timer to push a pending compound device to be
force-assembled, but currently there is no way to push.

If the IOCTL were extended to return TRYING_DEGRADED (when instructed
to do so after an expired timeout), systemd could handle additional
per-filesystem fstab options, like x-systemd.allow-degraded.

Then it would be possible to have a best-effort policy for rootfs (to
make the machine boot), and a more strict one for crucial data (do not
mount it when there is no redundancy, wait for operator intervention).

-- 
Tomasz Pala
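
The flow described above can be sketched as a toy model. This is only
an illustration, not kernel or systemd code: the Volume class, the
devices_ready() and systemd_ready() functions and the allow_degraded
parameter are names made up for this sketch, and TRYING_DEGRADED is the
extension proposed in this mail, which the real IOCTL does not offer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Ready(Enum):
    NOT_READY = 0
    READY = 1
    TRYING_DEGRADED = 2  # hypothetical: proposed in this mail, not implemented


@dataclass
class Volume:
    """Toy stand-in for the kernel's view of one multi-device filesystem."""
    expected: int                           # number of member devices
    seen: set = field(default_factory=set)  # device nodes reported so far

    def devices_ready(self, devnode: str, allow_degraded: bool = False) -> Ready:
        """Model of the query the udev rule makes once per block device event."""
        self.seen.add(devnode)
        if len(self.seen) >= self.expected:
            return Ready.READY
        # Current behaviour: NOT_READY until every member is present.
        # Proposed behaviour: the caller may ask for a degraded answer,
        # for example after its own timer has expired.
        return Ready.TRYING_DEGRADED if allow_degraded else Ready.NOT_READY


def systemd_ready(answer: Ready, fstab_allows_degraded: bool) -> bool:
    """Policy layer: may the mount proceed for this filesystem?"""
    if answer is Ready.READY:
        return True
    return answer is Ready.TRYING_DEGRADED and fstab_allows_degraded
```

With Volume(expected=2), a call for /dev/sda1 alone returns NOT_READY
and nothing ever changes that unless sdb1 shows up. Only the
hypothetical allow_degraded=True path gives the upper layer something
to act on, and the per-filesystem flag (standing in for the suggested
x-systemd.allow-degraded option) then decides between best-effort and
strict policy.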