Re: mount time of multi-disk arrays

From: "Benjamin O'Connor" <boconnor@tripadvisor.com>
To: Duncan <1i5t5.duncan@cox.net>
Cc: <linux-btrfs@vger.kernel.org>
Subject: Re: mount time of multi-disk arrays
Date: Mon, 7 Jul 2014 12:40:23 -0400
Message-ID: <53BACD77.8090200@tripadvisor.com>
In-Reply-To: <pan$e34e1$34425a38$28d282d8$6ffde053@cox.net>

As a point of reference, my BTRFS filesystem with 11 x 21TB devices in 
RAID0 with space cache enabled takes about 4 minutes to mount after a 
clean unmount.

The mount time varies a fair amount: it has been as low as 3 minutes and
has taken 5 minutes or longer.  These devices are all connected via 10Gb
iSCSI.

So far, mount time does not seem to have increased with the number of
devices.  I think that back when we had only 6 devices, it still took
roughly the same amount of time.

-ben

-- 
-----------------------------
Benjamin O'Connor
TechOps Systems Administrator
TripAdvisor Media Group

benoc@tripadvisor.com
c. 617-312-9072
-----------------------------


Duncan wrote:
> Konstantinos Skarlatos posted on Mon, 07 Jul 2014 16:54:05 +0300 as
> excerpted:
>
>> On 7/7/2014 4:38 PM, André-Sebastian Liebe wrote:
>>> can anyone tell me how much time is acceptable and to be expected for a
>>> multi-disk btrfs array with classical hard disk drives to mount?
>>>
>>> I'm having a bit of trouble with my current systemd setup, because it
>>> couldn't mount my btrfs raid anymore after adding the 5th drive. With
>>> the 4 drive setup it failed to mount once every few attempts. Now it fails
>>> every time because the default timeout of 1m 30s is reached and the mount
>>> is aborted.
>>> My last 10 manual mounts took between 1m57s and 2m12s to finish.
>
>> I have the exact same problem, and have to manually mount my large
>> multi-disk btrfs filesystems, so I would be interested in a solution as
>> well.
>
> I don't have a direct answer, as my btrfs devices are all SSD, but...
>
> a) Btrfs, like some other filesystems, is designed not to need a
> pre-mount (or pre-rw-mount) fsck, because it does what /should/ be a
> quick-scan at mount-time.  However, that isn't always as quick as it
> might be for a number of reasons:
>
> a1) Btrfs is still a relatively immature filesystem and certain
> operations are not yet optimized.  In particular, multi-device btrfs
> operations tend to still be using a first-working-implementation type of
> algorithm instead of one that is well optimized for parallel operation,
> and thus often serialize access to multiple devices where a more
> optimized algorithm would parallelize operations across multiple devices
> at the same time.  That will come, but it's not there yet.
>
> a2) Certain operations such as orphan cleanup ("orphans" are files that
> were deleted while they were in use and thus weren't fully deleted at the
> time; if they were still in use at unmount (remount-read-only), cleanup
> is done at mount-time) can delay mount as well.
>
> a3) Inode_cache mount option:  Don't use this unless you can explain
> exactly WHY you are using it, preferably backed up with benchmark
> numbers, etc.  It's useful only on 32-bit, generally high-file-activity
> server systems and has general-case problems, including long mount times
> and possible overflow issues that make it inappropriate for normal use.
> Unfortunately there's a lot of people out there using it that shouldn't
> be, and I even saw it listed on at least one distro (not mine!) wiki. =:^(
>
> a4) The space_cache mount option OTOH *IS* appropriate for normal use
> (and is in fact enabled by default these days), but particularly in
> improper shutdown cases can require rebuilding at mount time -- altho
> this should happen /after/ mount, the system will just be busy for some
> minutes, until the space-cache is rebuilt.  But the IO from a space_cache
> rebuild on one filesystem could slow down the mounting of filesystems
> that mount after it, as well as the boot-time launching of other post-
> mount launched services.
>
> If you're seeing the time go up dramatically with the addition of more
> filesystem devices, however, and you do /not/ have inode_cache active,
> I'd guess it's mainly the not-yet-optimized multi-device operations.
>
>
> b) As with any systemd-launched unit, however, there are systemd
> configuration mechanisms for working around specific unit issues,
> including timeout issues.  Of course most systems continue to use fstab
> and let systemd auto-generate the mount units, and in fact that is
> recommended, but either with fstab or directly created mount units,
> there's a timeout configuration option that can be set.
>
> b1) The general systemd *.mount unit [Mount] section option appears to be
> TimeoutSec=.  As is usual with systemd time values, a bare number is taken
> as seconds, or you can spell out the units (like "5min 20s").
>
> b2) I don't see it /specifically/ stated, but with a bit of reading
> between the lines, the corresponding fstab option appears to be either
> x-systemd.timeoutsec= or x-systemd.TimeoutSec= (IOW I'm not sure of the
> case).  You may also want to try x-systemd.device-timeout=, which /is/
> specifically mentioned, altho that appears to be specifically the timeout
> for the device to appear, NOT for the filesystem to mount after it does.
>
> b3) See the systemd.mount (5) and systemd-fstab-generator (8) manpages
> for more, that being what the above is based on.
>
> So it might take a bit of experimentation to find the exact command, but
> based on the above anyway, it /should/ be pretty easy to tell systemd to
> wait a bit longer for that filesystem.
>
> When you find the right invocation, please reply with it here, as I'm
> sure there's others who will benefit as well.  FWIW, I'm still on
> reiserfs for my spinning rust (only btrfs on my ssds), but I expect I'll
> switch them to btrfs at some point, so I may well use the information
> myself.  =:^)
>
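
To make the TimeoutSec= suggestion in (b1) concrete, here is a minimal
Python sketch.  It converts systemd-style time spans to seconds, compares
the slowest manual mount reported above (2m12s) with the 1m 30s default,
and renders [Mount] drop-in text with a longer timeout.  The function
names and the "5min" value are made up for illustration, and the parser
handles only a small subset of the units in systemd.time(7).  Where the
drop-in goes is also an assumption: a file such as timeout.conf in a
<unit-name>.mount.d/ directory under /etc/systemd/system is the usual
drop-in location, but check systemd.unit(5) for your version.  Later
systemd releases also document an x-systemd.mount-timeout= fstab option
in systemd.mount(5), which may be simpler where it is available.

```python
import re

# Seconds per unit for a small subset of the suffixes listed in systemd.time(7).
UNIT_SECONDS = {
    "": 1,  # a bare number is taken as seconds
    "s": 1, "sec": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
}

# One "<number><optional unit>" token, with optional surrounding whitespace.
TOKEN = re.compile(r"\s*(\d+)\s*([a-z]*)")


def parse_timespan(text):
    """Convert a span such as '90', '1m 30s' or '5min 20s' to whole seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty time span")
    total, pos = 0, 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None or match.group(2) not in UNIT_SECONDS:
            raise ValueError(f"unsupported time span: {text!r}")
        total += int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total


def mount_dropin(timeout):
    """Render drop-in text that sets TimeoutSec= in a unit's [Mount] section."""
    return f"[Mount]\nTimeoutSec={parse_timespan(timeout)}s\n"


default_timeout = parse_timespan("1m 30s")  # default quoted in the thread
slowest_mount = parse_timespan("2m12s")     # slowest manual mount reported

# True means the slowest observed mount would hit the default timeout.
print(slowest_mount > default_timeout)

# Hypothetical longer timeout; pick a value with headroom over your own mounts.
print(mount_dropin("5min"), end="")
```

After writing the generated text to a drop-in file, systemd would need a
"systemctl daemon-reload" before the new timeout takes effect.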
