From: Goffredo Baroncelli <kreijack@libero.it>
To: Chris Murphy <lists@colorremedies.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: problem with degraded boot and systemd
Date: Wed, 21 May 2014 00:00:24 +0200
Message-ID: <537BD078.7070504@libero.it>
In-Reply-To: <45D5C607-ED9D-49BB-BA60-CA2B0E94223D@colorremedies.com>

On 05/19/2014 02:54 AM, Chris Murphy wrote:
> Summary:
> 
> It's insufficient to pass rootflags=degraded to get the system root
> to mount when a device is missing. It looks like when a device is
> missing, udev doesn't create the dev-disk-by-uuid linkage that then
> causes systemd to change the device state from dead to plugged. Only
> once plugged, will systemd attempt to mount the volume. This issue
> was brought up on systemd-devel under the subject "timed out waiting
> for device dev-disk-by\x2duuid" for those who want details.
> 
[...]
> 
> I think the key problem is either a limitation of udev, or a problem
> with the existing udev rule, that prevents the link creation for any
> remaining btrfs device. Or maybe it's intentional. But I'm not a udev
> expert. This is the current udev rule:
> 
> # cat /usr/lib/udev/rules.d/64-btrfs.rules 
> # do not edit this file, it will be overwritten on update
> 
> SUBSYSTEM!="block", GOTO="btrfs_end"
> ACTION=="remove", GOTO="btrfs_end"
> ENV{ID_FS_TYPE}!="btrfs", GOTO="btrfs_end"
> 
> # let the kernel know about this btrfs filesystem, and check if it is complete 
> IMPORT{builtin}="btrfs ready $devnode"
> 
> # mark the device as not ready to be used by the system 
> ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
> 
> LABEL="btrfs_end"


The key is the line 

	IMPORT{builtin}="btrfs ready $devnode"

This line sets ID_BTRFS_READY=0 if the filesystem is not ready (not all
of its devices have been seen yet); otherwise it sets ID_BTRFS_READY=1 [1].
The next line 

	ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"

sets SYSTEMD_READY=0 if the filesystem is not ready, so the "plug" event
is never raised to systemd and the device stays "dead".

This is my understanding.
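
To make the reading above concrete, here is a small Python model of the
quoted rule. It is only a sketch of my understanding, not how udev
evaluates rules: the function name and arguments are invented for
illustration, and btrfs_ready stands in for whatever the "btrfs ready"
builtin reports.

```python
def btrfs_rule_properties(subsystem, action, fs_type, btrfs_ready):
    """Return the properties the quoted 64-btrfs.rules would set.

    Hypothetical model only. btrfs_ready stands in for the result of
    the "btrfs ready $devnode" builtin (True = all devices present).
    """
    # The three GOTO="btrfs_end" guards: the rule does nothing unless
    # this is a btrfs block device that is not being removed.
    if subsystem != "block" or action == "remove" or fs_type != "btrfs":
        return {}

    # IMPORT{builtin}="btrfs ready $devnode"
    props = {"ID_BTRFS_READY": "1" if btrfs_ready else "0"}

    # ENV{ID_BTRFS_READY}=="0", ENV{SYSTEMD_READY}="0"
    if props["ID_BTRFS_READY"] == "0":
        props["SYSTEMD_READY"] = "0"
    return props
```

With one device missing the model yields ID_BTRFS_READY=0 and
SYSTEMD_READY=0 for every remaining device, which matches the behavior
you describe: no remaining device is ever marked as usable.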

 
 
> How this works with raid:
> 
> RAID assembly is separate from filesystem mount. The volume UUID
> isn't available until the RAID is successfully assembled.
> 
> On at least Fedora (dracut) systems with the system root on an md
> device, the initramfs contains 30-parse-md.sh which includes a loop
> to check for the volume UUID. If it's not found, the script sleeps
> for 0.5 seconds, and then looks for it again, up to 240 times. If
> it's still not found at attempt 240, then the script executes mdadm
> -R to forcibly run the array with fewer than all devices present
> (degraded assembly). Now the volume UUID exists, udevd creates the
> linkage, systemd picks this up and changes device state from dead to
> plugged, and then executes a normal mount command.
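
As I read the description above, the loop amounts to the following
Python sketch. It models only what is described here, not the actual
dracut script; uuid_present and force_run are hypothetical callables,
the latter standing in for "mdadm -R". With 240 attempts of 0.5 seconds
each, the worst-case wait before the forced run is 120 seconds.

```python
import time


def wait_then_force(uuid_present, force_run, attempts=240, delay=0.5,
                    sleep=time.sleep):
    """Poll for the volume UUID, then force a degraded run.

    Returns "complete" if the UUID appeared on its own, "forced" if
    the retry budget ran out and force_run() had to be called.
    """
    for _ in range(attempts):
        if uuid_present():
            return "complete"
        sleep(delay)
    force_run()  # stands in for "mdadm -R" (degraded assembly)
    return "forced"
```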

> The approximate Btrfs equivalent down the road would be a similar
> initrd script, or maybe a user space daemon, that causes btrfs device
> ready to confirm/deny all devices are present. And after x number of
> failures, then it issues an equivalent to mdadm -R which right now
> we don't seem to have.

I suggest implementing a mount.btrfs command that waits for all the
needed disks until a timeout expires. After this timeout it could try
a "degraded" mount until a second timeout expires. Only then does it fail.
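
A rough Python sketch of the two-timeout logic, just to show the idea.
Nothing here exists today: all_devices_present and do_mount are
hypothetical callables (the first would be something like a
"btrfs device ready" check, the second the real mount with or without
the degraded option), and the 30 second defaults are arbitrary.

```python
import time


def mount_with_fallback(all_devices_present, do_mount,
                        full_timeout=30.0, degraded_timeout=30.0,
                        poll=0.5, clock=time.monotonic, sleep=time.sleep):
    """Return "normal", "degraded", or "failed".

    do_mount(degraded) must return True when the mount succeeded.
    """
    # Phase 1: wait for every device, then mount normally.
    deadline = clock() + full_timeout
    while clock() < deadline:
        if all_devices_present() and do_mount(False):
            return "normal"
        sleep(poll)

    # Phase 2: keep trying a degraded mount until the second timeout.
    deadline = clock() + degraded_timeout
    while clock() < deadline:
        if do_mount(True):
            return "degraded"
        sleep(poll)

    # Both timeouts expired: only now does the mount fail.
    return "failed"
```

The clock and sleep parameters exist only so the logic can be exercised
without really waiting; a real helper would not need them.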

Each time a device appears, the system may start mount.btrfs. Each
invocation has to test whether another instance of mount.btrfs is already
handling the same filesystem; if so it exits, otherwise it follows the
above behavior.
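
One possible way to do the "is another instance running?" test is a
per-filesystem lock file taken with flock(2). The Python sketch below
assumes that approach; the function name, the lock file name and the
/run default are all made up for illustration.

```python
import fcntl
import os


def try_claim_filesystem(fs_uuid, lock_dir="/run"):
    """Return a lock fd if no other instance holds this filesystem.

    Returns None when another instance already owns the lock, in
    which case the caller should simply exit.
    """
    # Hypothetical naming scheme: one lock file per filesystem UUID.
    path = os.path.join(lock_dir, "mount.btrfs-%s.lock" % fs_uuid)
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Non-blocking exclusive lock: fail at once if already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep it open for as long as we own the mount attempt
```

The lock is dropped when the descriptor is closed, including when the
process dies, so a crashed instance would not block later ones.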


> 
> That equivalent might be a decoupling of degraded as a mount option,
> such that the user space tool deals with degradedness. And the mount
>[...]
> 
> Chris Murphy
G.Baroncelli

[1] http://lists.freedesktop.org/archives/systemd-commits/2012-September/002503.html

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
