Re: Large multi-device BTRFS array (usually) fails to mount on boot.

From: "Joshua" <joshua@mailmag.net>
To: "Graham Cobb" <g.btrfs@cobb.uk.net>, linux-btrfs@vger.kernel.org
Subject: Re: Large multi-device BTRFS array (usually) fails to mount on boot.
Date: Fri, 19 Feb 2021 23:56:36 +0000
Message-ID: <d2ca3e092c22c8186fe3f8a0e65192d5@mailmag.net>
In-Reply-To: <f559cfb0-1e9c-7dc8-f54d-2a9fc71980cd@cobb.uk.net>

February 19, 2021 2:45 PM, "Graham Cobb" <g.btrfs@cobb.uk.net> wrote:

> On 19/02/2021 17:42, Joshua wrote:
> 
>> February 3, 2021 3:16 PM, "Graham Cobb" <g.btrfs@cobb.uk.net> wrote:
>> 
>>> On 03/02/2021 21:54, joshua@mailmag.net wrote:
>> 
>> Good Evening.
>> 
>> I have a large BTRFS array, (14 Drives, ~100 TB RAW) which has been having problems mounting on
>> boot without timing out. This causes the system to drop to emergency mode. I am then able to mount
>> the array in emergency mode and all data appears fine, but upon reboot it fails again.
>> 
>> I actually first had this problem around a year ago, and initially put considerable effort into
>> extending the timeout in systemd, as I believed that to be the problem. However, all the methods I
>> attempted did not work properly or caused the system to continue booting before the array was
>> mounted, causing all sorts of issues. Eventually, I was able to almost completely resolve it by
>> defragmenting the extent tree and subvolume tree for each subvolume. (btrfs fi defrag
>> /mountpoint/subvolume/) This seemed to reduce the time required to mount, and made it mount on boot
>> the majority of the time.
>>> Not what you asked, but adding "x-systemd.mount-timeout=180s" to the
>>> mount options in /etc/fstab works reliably for me to extend the timeout.
>>> Of course, my largest filesystem is only 20TB, across only two devices
>>> (two lvm-over-LUKS, each on separate physical drives) but it has very
>>> heavy use of snapshot creation and deletion. I also run with commit=15
>>> as power is not too reliable here and losing power is the most frequent
>>> cause of a reboot.
>> 
>> Thanks for the suggestion, but I have not been able to get this method to work either.
>> 
>> Here's what my fstab looks like, let me know if this is not what you meant!
>> 
>> UUID={snip} / ext4 errors=remount-ro 0 0
>> UUID={snip} /mnt/data btrfs defaults,noatime,compress-force=zstd:2,x-systemd.mount-timeout=300s 0 0
> 
> Hmmm. The line from my fstab is:
> 
> LABEL=lvmdata /mnt/data btrfs defaults,subvolid=0,noatime,nodiratime,compress=lzo,skip_balance,commit=15,space_cache=v2,x-systemd.mount-timeout=180s,nofail 0 3

Not very important, but note that noatime implies nodiratime.  https://lwn.net/Articles/245002/

> I note that I do have "nofail" in there, although it doesn't fail for me
> so I assume it shouldn't make a difference.

Ahh, I bet you're right, at least indirectly.

It appears nofail makes the system continue booting even if the mount is unsuccessful. I'd rather avoid that, since some services depend on this volume. For example, some docker containers could misbehave if the path to the data they expect doesn't exist.

So it's not exactly the outcome I'd prefer, but it may work.
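
One way to keep dependent services from starting without the volume, even with nofail set, might be to declare the dependency on the service side. A minimal sketch, assuming the service in question is docker.service and using a hypothetical drop-in file name. RequiresMountsFor= is a documented systemd.unit(5) setting that adds Requires= and After= dependencies on the mount units needed for the given path:

```ini
# /etc/systemd/system/docker.service.d/wait-for-data.conf
# (hypothetical file name; docker.service is an assumed unit name)
[Unit]
# Pull in, and order after, the mount unit for /mnt/data, so the
# service is not started unless that mount is active.
RequiresMountsFor=/mnt/data
```

A "systemctl daemon-reload" would be needed for the drop-in to take effect. This is untested against the setup described here.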

I'm really not sure how nofail interacts with x-systemd.mount-timeout.  I would think it would increase the timeout period, but that's not what I'm seeing.  Perhaps there's some other kind of internal systemd timeout, and systemd gives up and continues to boot after that runs out, but allows the mount to continue for the time specified?  Seems kinda weird.
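
For reference, systemd.mount(5) describes nofail as changing only the dependency and ordering of the mount relative to local-fs.target (the mount becomes wanted rather than required, and boot no longer waits for it), while x-systemd.mount-timeout= corresponds to the TimeoutSec= setting of the mount unit. A sketch of setting the same timeout through a drop-in instead of fstab, assuming the unit generated for /mnt/data is named mnt-data.mount and using a hypothetical file name:

```ini
# /etc/systemd/system/mnt-data.mount.d/timeout.conf
# (hypothetical file name; mnt-data.mount is the unit name assumed for /mnt/data)
[Mount]
# Time allowed for the mount command to finish before systemd gives up.
TimeoutSec=300s
```

Whether this behaves any differently from the fstab option on this system is an open question. It is only meant to show where the timeout lives.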

I'll give it a try and see what happens.  I'll try to remember to report back here with the results.

> I can't swear that the disk is currently taking longer to mount than the
> systemd default (and I will not be in a position to reboot this system
> any time soon to check). But I am quite sure this made a difference when
> I added it.
> 
> Not sure why it isn't working for you, unless it is some systemd
> problem. It isn't systemd giving up and dropping to emergency because of
> some other startup problem that occurs before the mount is finished, is
> it? I could believe systemd cancels any mounts in progress when that
> happens.
> 
> Graham

Thread overview: 7+ messages
2021-02-03 21:54 Large multi-device BTRFS array (usually) fails to mount on boot joshua
2021-02-03 23:08 ` Graham Cobb
2021-02-04  0:56 ` Qu Wenruo
2021-02-06  5:00 ` Joshua
2021-02-19 17:42 ` Joshua
2021-02-19 22:45   ` Graham Cobb
2021-02-19 23:56   ` Joshua [this message]
