* first mount(s) after unclean shutdown always fail, second attempt
@ 2020-07-02 2:18 Marc Lehmann
From: Marc Lehmann @ 2020-07-02 2:18 UTC (permalink / raw)
To: linux-btrfs
As requested by Qu Wenruo, I herewith "separately report" the spurious
mount bug again. It's mostly a copy of my first bug report, because, after
all, it's the same bug, but I tried to clarify things.
I have a server with multiple btrfs filesystems and some moderate-sized
dmcache caches (a few million blocks/100s of GBs).
When the server has an unclean shutdown, dmcache treats all cached blocks
as dirty. This has the effect of extremely slow I/O, as dmcache basically
caches a lot of random I/O, and writing these blocks back to the rotating
disk backing store can take hours. This, I think, is related to the
problem.
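In case it helps with diagnosis: while that writeback is running, the dirty-block count can be watched with dmsetup. A rough sketch (the exact position of the dirty counter within the "cache" status line is documented in the kernel's device-mapper cache documentation and may differ between versions):

  # print the status line of every dm device; for targets of type "cache",
  # one of the numeric fields is the number of dirty cache blocks
  dmsetup status

Watching that counter fall to zero should show when the writeback is done.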
When the server is in this condition, all btrfs filesystems on slow
stores (regardless of whether they use dm-cache or not) fail their first
mount attempt(s) like this:
[ 173.243117] BTRFS info (device dm-7): has skinny extents
[ 864.982108] BTRFS error (device dm-7): open_ctree failed
All the filesystems in question are mounted twice during normal boots,
with different subvolumes, and systemd parallelises these mounts. This
might play a role in these failures.
Simply trying to mount the filesystems again (usually) succeeds with
seemingly no issues, so these are spurious mount failures. These repeated
mount attempts are also much faster, presumably because a lot of the data
is already in memory.
In my experience this is 100% reproducible (i.e. it happens on every
unclean shutdown). It happens both on "old" (4.19-era) filesystems and on
filesystems that have never seen anything older than 5.4 kernels.
It does _not_ happen with filesystems on SSDs, regardless of whether they
are mounted multiple times or not. It does happen to all filesystems that
are on rotating disks affected by dm-cache writes, regardless of whether
the filesystem itself uses dmcache or not.
The system in question is currently running 5.6.17, but the same thing
happens with 5.4 and 5.2 kernels, and it might have happened with much
earlier kernels as well, but I didn't have time to report this (as I
secretly hoped newer kernels would fix this, and unclean shutdowns are
rare).
Example btrfs kernel messages for one such unclean boot. This involved a
normal boot, followed by an unsuccessful "mount -va" in the emergency shell
(i.e. a second mount failure for the same filesystem), followed by a
successful "mount -va" in the shell.
[ 122.856787] BTRFS: device label LOCALVOL devid 1 transid 152865 /dev/mapper/cryptlocalvol scanned by btrfs (727)
[ 173.242545] BTRFS info (device dm-7): disk space caching is enabled
[ 173.243117] BTRFS info (device dm-7): has skinny extents
[ 363.573875] INFO: task mount:1103 blocked for more than 120 seconds.
[the above message repeats multiple times; backtraces etc. have been removed for clarity]
[ 484.405875] INFO: task mount:1103 blocked for more than 241 seconds.
[ 605.237859] INFO: task mount:1103 blocked for more than 362 seconds.
[ 605.252478] INFO: task mount:1211 blocked for more than 120 seconds.
[ 726.069900] INFO: task mount:1103 blocked for more than 483 seconds.
[ 726.084415] INFO: task mount:1211 blocked for more than 241 seconds.
[ 846.901874] INFO: task mount:1103 blocked for more than 604 seconds.
[ 846.916431] INFO: task mount:1211 blocked for more than 362 seconds.
[ 864.982108] BTRFS error (device dm-7): open_ctree failed
[ 867.551400] BTRFS info (device dm-7): turning on sync discard
[ 867.551875] BTRFS info (device dm-7): disk space caching is enabled
[ 867.552242] BTRFS info (device dm-7): has skinny extents
[ 867.565896] BTRFS error (device dm-7): open_ctree failed
[ 867.721885] BTRFS info (device dm-7): turning on sync discard
[ 867.722341] BTRFS info (device dm-7): disk space caching is enabled
[ 867.722691] BTRFS info (device dm-7): has skinny extents
Example fstab entries for the mounts above:
/dev/mapper/cryptlocalvol /localvol btrfs defaults,nossd,discard 0 0
/dev/mapper/cryptlocalvol /cryptlocalvol btrfs defaults,nossd,subvol=/ 0 0
Sometimes I also get the "super_total_bytes" message, but that one is
also spurious (i.e. subsequent mount attempts work). Running btrfs rescue
fix-device-size does not change anything, i.e. the filesystem still mounts
without error many times, until there is another unclean shutdown, at
which point the message has a chance to reappear.
In the example above, you can see that the super_total_bytes message did
not appear for the first two (failed) mount attempts.
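For clarity, the invocation I mean is of the form below, run with the
filesystem unmounted (device path as in the fstab entries above):

  btrfs rescue fix-device-size /dev/mapper/cryptlocalvol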
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\
* Re: first mount(s) after unclean shutdown always fail, second attempt
@ 2020-07-08 17:35 ` Calvin Walton
From: Calvin Walton @ 2020-07-08 17:35 UTC (permalink / raw)
To: Marc Lehmann, linux-btrfs
On Thu, 2020-07-02 at 04:18 +0200, Marc Lehmann wrote:
> When the server is in this condition, all btrfs filesystems on slow
> stores (regardless of whether they use dm-cache or not) fail their
> first mount attempt(s) like this:
>
> [ 173.243117] BTRFS info (device dm-7): has skinny extents
> [ 864.982108] BTRFS error (device dm-7): open_ctree failed
>
> All the filesystems in question are mounted twice during normal boots,
> with different subvolumes, and systemd parallelises these mounts. This
> might play a role in these failures.
>
> Simply trying to mount the filesystems again (usually) succeeds with
> seemingly no issues, so these are spurious mount failures. These
> repeated mount attempts are also much faster, presumably because a lot
> of the data is already in memory.
You shared kernel logs, but it would be helpful to see the systemd
journal. One thing to note is that by default systemd has a timeout on
mounts! It's entirely possible that as soon as the mount kernel thread
becomes unblocked, it notices that systemd has sent a SIGTERM/SIGKILL
and aborts the mount.
See the documentation (man systemd.mount) and consider increasing or
disabling the timeout on the affected mount units.
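For fstab-generated units that would be something along these lines
(untested sketch, reusing the first fstab line from your report; the
value can also be "infinity" to disable the timeout entirely):

/dev/mapper/cryptlocalvol /localvol btrfs defaults,nossd,discard,x-systemd.mount-timeout=30min 0 0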
--
Calvin Walton <calvin.walton@kepstin.ca>
* Re: first mount(s) after unclean shutdown always fail, second attempt
@ 2020-07-12 5:12 ` Marc Lehmann
From: Marc Lehmann @ 2020-07-12 5:12 UTC (permalink / raw)
To: Calvin Walton; +Cc: linux-btrfs
On Wed, Jul 08, 2020 at 01:35:08PM -0400, Calvin Walton <calvin.walton@kepstin.ca> wrote:
> You shared kernel logs, but it would be helpful to see the systemd
> journal. One thing to note is that by default systemd has a timeout on
> mounts! It's entirely possible that as soon as the mount kernel thread
> becomes unblocked, it notices that systemd has sent a SIGTERM/SIGKILL
> and aborts the mount.
>
> See the documentation (man systemd.mount) and consider increasing or
> disabling the timeout on the affected mount units.
Good idea, systemd indeed KILLed the mount:
Jul 01 02:02:26 cerebro systemd[1]: cryptlocalvol.mount: Mount process still around after SIGKILL. Ignoring.
Jul 01 02:02:26 cerebro systemd[1]: cryptlocalvol.mount: Failed with result 'timeout'.
Jul 01 02:02:26 cerebro systemd[1]: Failed to mount /cryptlocalvol.
However, are btrfs mounts really interruptible? In any case, the
problem happens both for systemd-triggered mounts and for mount
commands entered interactively (and also for volumes that have no
fstab entry/mount unit at all). For example, here is a case where the
initial systemd-controlled mount fails, the interactively entered "mount
/localvol" then fails once more, and only the third attempt succeeds:
May 30 03:17:53 cerebro kernel: BTRFS info (device dm-7): disk space caching is enabled
May 30 03:17:53 cerebro kernel: BTRFS info (device dm-7): has skinny extents
May 30 03:19:23 cerebro systemd[1]: localvol.mount: Mounting timed out. Terminating.
May 30 03:20:53 cerebro systemd[1]: localvol.mount: Mount process timed out. Killing.
May 30 03:20:53 cerebro systemd[1]: localvol.mount: Killing process 1116 (mount) with signal SIGKILL.
May 30 03:22:23 cerebro systemd[1]: localvol.mount: Mount process still around after SIGKILL. Ignoring.
May 30 03:22:23 cerebro systemd[1]: localvol.mount: Failed with result 'timeout'.
May 30 03:22:23 cerebro systemd[1]: Failed to mount /localvol.
May 30 03:27:53 cerebro kernel: BTRFS error (device dm-7): open_ctree failed
[systemd-initiated mount failed here]
May 30 03:27:54 cerebro kernel: BTRFS info (device dm-7): turning on discard
May 30 03:27:54 cerebro kernel: BTRFS info (device dm-7): disk space caching is enabled
May 30 03:27:54 cerebro kernel: BTRFS info (device dm-7): has skinny extents
May 30 03:27:54 cerebro kernel: BTRFS error (device dm-7): open_ctree failed
[emergency-shell interactive mount failed here]
May 30 03:28:14 cerebro kernel: BTRFS info (device dm-7): turning on discard
May 30 03:28:14 cerebro kernel: BTRFS info (device dm-7): disk space caching is enabled
May 30 03:28:14 cerebro kernel: BTRFS info (device dm-7): has skinny extents
[third attempt succeeded]
May 30 03:40:04 cerebro systemd[1]: localvol.mount: Succeeded.
Looking at the case above, I notice that the second mount fails
practically instantly, and that it was initiated the moment the previous
mount failed.
I think the reason is that systemd dropped me into the emergency shell
long before the (kernel) mount failed, and I likely entered the
interactive mount command long before the previous mount finished. That
would explain why the interactive mount appears to happen within one
second of the previous mount failure - it was probably already running
for minutes, waiting for some lock.
I have disabled the systemd mount timeout for the time being, to exclude
this case.
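One way to do that, in case someone wants to replicate it, is a per-unit
drop-in roughly like the following (the x-systemd.mount-timeout= fstab
option should presumably work as well), followed by "systemctl
daemon-reload":

  # /etc/systemd/system/localvol.mount.d/override.conf
  [Mount]
  TimeoutSec=infinity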
--
The choice of a Deliantra, the free code+content MORPG
-----==- _GNU_ http://www.deliantra.net
----==-- _ generation
---==---(_)__ __ ____ __ Marc Lehmann
--==---/ / _ \/ // /\ \/ / schmorp@schmorp.de
-=====/_/_//_/\_,_/ /_/\_\