From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: mount option nodatacow for VMs on SSD?
Date: Fri, 25 Nov 2016 12:01:37 +0000 (UTC)
Message-ID: <pan$c85ad$dae9ebf5$a0482059$7c135fb1@cox.net>
In-Reply-To: 20161125082840.GA32711@rus.uni-stuttgart.de
Ulli Horlacher posted on Fri, 25 Nov 2016 09:28:40 +0100 as excerpted:
> I have vmware and virtualbox VMs on btrfs SSD.
>
> I read in
> https://btrfs.wiki.kernel.org/index.php/SysadminGuide#When_To_Make_Subvolumes
>
> certain types of data (databases, VM images and similar typically
> big files that are randomly written internally) may require CoW to
> be disabled for them. So for example such areas could be placed in
> a subvolume, that is always mounted with the option "nodatacow".
>
> Does this apply to SSDs, too?
It can, because the root issue is the same: the COW-based fragmentation
that's always a problem when this sort of file is frequently and randomly
rewritten in part on a COW-based filesystem. The symptoms tend to be much
less severe on ssd, though, so it doesn't tend to be as big an issue
there.
On multi-gig database files or VM images, files can end up with 100K
extents due to COW-based rewriting. Obviously this can be a HUGE problem
on spinning rust due to its seek times, a problem zero-seek-time ssds
don't have, but the sheer amount of metadata overhead due to tracking all
those tiny extents can be a problem of its own, particularly when doing
maintenance such as btrfs balance or btrfs check. Both snapshotting and
quota tracking amplify this tracking overhead as well, and it's this
metadata problem, not seek time, that can still be an issue on ssds.
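
To see how fragmented a given image actually is, the extent count can be
read from filefrag (part of e2fsprogs). What follows is only a rough
sketch, assuming filefrag is installed and prints its usual
"N extents found" summary line; the helper names are made up for
illustration. Note that filefrag is known to over-report extents on
btrfs-compressed files, so treat the numbers as a guide rather than an
exact measure.

```python
import re
import subprocess

# Matches filefrag's summary line, e.g. "disk.img: 98213 extents found".
_SUMMARY = re.compile(r":\s*(\d+) extents? found\s*$")


def parse_extent_count(filefrag_output):
    """Return the extent count from filefrag's summary line."""
    for line in filefrag_output.splitlines():
        match = _SUMMARY.search(line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'extents found' line in filefrag output")


def extent_count(path):
    """Run filefrag on path and return the extent count it reports."""
    result = subprocess.run(
        ["filefrag", path], capture_output=True, text=True, check=True
    )
    return parse_extent_count(result.stdout)
```

Calling extent_count() on a VM image before and after a few days of use
gives a feel for whether fragmentation is building up on a particular
setup.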
That said, the autodefrag mount option, which eliminates some of the
heavy fragmentation due to copy-on-write (COW) that's the root problem,
tends to be faster on ssd. Between it ameliorating the root problem to a
large extent and the faster speed of ssds, it can often be all that's
needed there, particularly if you don't need quotas, so have them off,
and only do relatively limited snapshotting.
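
As a quick check that autodefrag is really active where the images live,
the option list in /proc/mounts can be inspected. This is just a sketch:
the function names and the /srv/vms mount point in the usage note are
hypothetical, and it assumes the usual space-separated /proc/mounts
layout (device, mount point, type, options, ...).

```python
def mount_options(mounts_text, mount_point):
    """Return the set of options for mount_point from /proc/mounts-style text."""
    options = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, comma-separated options, ...
        if len(fields) >= 4 and fields[1] == mount_point:
            # If a mount point is listed more than once, the last entry wins.
            options = set(fields[3].split(","))
    return options


def has_autodefrag(mounts_text, mount_point):
    """True if the filesystem mounted at mount_point lists autodefrag."""
    return "autodefrag" in mount_options(mounts_text, mount_point)
```

Typical use would be something like
has_autodefrag(open("/proc/mounts").read(), "/srv/vms").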
The problem with both the nodatacow mount option and the nocow file
attribute is that they disable some of the btrfs features and are
weakened by other features that may well be a big part of the reason
behind your choice of btrfs in the first place. Both btrfs compression,
if otherwise enabled, and checksumming and thus file integrity checking
(and repair in the case of btrfs raid1/10), would be complicated or
impossible to implement without COW, and thus are disabled in the NOCOW
case. Similarly, btrfs snapshotting depends on COW because the snapshot
locks in place the existing version so a rewrite must be written
elsewhere. As a result, snapshotting weakens NOCOW to what has been
called COW1: the first rewrite of a block after a snapshot is COWed, but
after that, further writes to the same block are rewritten in place at
the (new) block location. If you only do very occasional
snapshots that may not be a problem, but if you're doing regular
snapshots, particularly automated and multiple per day, the effect of the
snapshotting-forced COW1s may be fragmentation as bad as if NOCOW
weren't set at all.
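
For reference, the nocow file attribute is the 'C' flag set with
chattr +C. Per the chattr manpage it should be set on new or empty
files, and a directory with the flag passes it on to files created
inside it afterward, so the usual approach is to flag an empty directory
and then create or copy the images into it. A minimal sketch, assuming
chattr and lsattr (e2fsprogs) are available; the helper names are
invented for illustration:

```python
import os
import subprocess


def nocow_is_set(lsattr_line):
    """True if the flags column of an `lsattr -d` line contains 'C' (No_COW)."""
    parts = lsattr_line.split(maxsplit=1)
    # Uppercase 'C' is No_COW; lowercase 'c' is the unrelated compression flag.
    return bool(parts) and "C" in parts[0]


def make_nocow_directory(path):
    """Create path and flag it NOCOW so files created inside inherit the flag."""
    os.makedirs(path, exist_ok=True)
    subprocess.run(["chattr", "+C", path], check=True)
    listing = subprocess.run(
        ["lsattr", "-d", path], capture_output=True, text=True, check=True
    ).stdout
    if not nocow_is_set(listing):
        raise RuntimeError("NOCOW flag did not stick on " + path)
```

Images that already exist and contain data need to be copied into the
flagged directory as new files, not just moved or flagged in place, for
the attribute to take proper effect.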
So to some degree, if you're going to be setting the nocow attribute or
using the nodatacow mount option, you might as well just set up a
different partition/volume and mkfs to something other than btrfs for
those files. OTOH, the btrfs multi-device and storage pool features
aren't affected, so if they are big reasons you're doing btrfs, then
there's some reason to keep using btrfs and simply do the nodatacow mount
or nocow attribute if autodefrag isn't enough on its own to handle it.
Bottom line, the fragmentation is much less of a problem on ssds,
particularly with autodefrag which may well be enough, but as always, it
can be installation and task dependent, so if it's going to be a
production system, do your own testing and make your own decisions based
on the results. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman