From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Amount of scrubbed data goes from 15.90GiB to 26.66GiB after defragment -r -v -clzo on a fs always mounted with compress=lzo
Date: Thu, 12 May 2016 10:18:45 +0000 (UTC)
Message-ID: <pan$331b6$43ad41cc$c9aa5858$e14154bc@cox.net>
In-Reply-To: 264486fb-94ce-4040-a436-e9db5de9a203@linuxsystems.it
Niccolò Belli posted on Wed, 11 May 2016 21:50:43 +0200 as excerpted:
> Hi,
> Before doing the daily backup I did a btrfs check and btrfs scrub as
> usual.
> After that this time I also decided to run btrfs filesystem defragment
> -r -v -clzo on all subvolumes (from a live distro) and just to be sure I
> ran check and scrub once again.
>
> Before defragment: total bytes scrubbed: 15.90GiB with 0 errors
> After defragment: total bytes scrubbed: 26.66GiB with 0 errors
>
> What did happen? This is something like a night and day difference:
> almost double the data! As stated in the subject all the subvolumes have
> always been mounted with compress=lzo in /etc/fstab, even when I
> installed the distro a couple of days ago I manually mounted the
> subvolumes with -o compress=lzo. Instead I never used autodefrag.
I'd place money on your use of either snapshots or dedup. As CAM says
(perhaps too) briefly, defrag isn't snapshot (technically, reflink)
aware, and will break reflinks from other snapshots/dedups as it defrags
whatever file it's currently working on.
If there are few to no reflinks, as there won't be if you're not using
snapshots, btrfs dedup, etc, no problem. But where there are existing
reflinks, the mechanism both snapshots and the various btrfs dedup tools
use, defrag rewrites only the copy of the data it's working on, leaving
the others as they are. That effectively doubles the data usage (for the
snapshots and first defrag case): there's the old, possibly multiply
snapshot-reflinked copy, and the new defragged copy that no longer shares
extents with the snapshots and other previously reflinked copies.
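To make the reflink-breaking effect concrete, here is a toy model in Python. It is only a sketch under stated assumptions: extents are plain integer ids, and the helper names (new_extents, snapshot, defrag, extents_in_use) are made up for illustration. None of this is btrfs code or a btrfs API.

```python
from itertools import count

# Hypothetical extent-id allocator for this toy model (not a btrfs structure).
_next_extent = count(1)

def new_extents(n):
    """Allocate n fresh extent ids, standing in for newly written data."""
    return [next(_next_extent) for _ in range(n)]

def snapshot(file_extents):
    """A snapshot reflinks the same extents; no data is copied."""
    return list(file_extents)

def defrag(file_extents):
    """Snapshot-unaware defrag: the data is rewritten into brand-new extents."""
    return new_extents(len(file_extents))

def extents_in_use(*views):
    """Count distinct extents referenced by any file or snapshot."""
    return len(set().union(*views))

live = new_extents(100)              # the file as originally written
snap = snapshot(live)                # shares all 100 extents with the file
before = extents_in_use(live, snap)  # sharing intact
live = defrag(live)                  # file now points at 100 new extents
after = extents_in_use(live, snap)   # snapshot still pins the old 100
print(before, after)
```

In this model the count goes from 100 to 200 distinct extents, which is the doubling described above: the snapshot keeps the old extents alive while the defragged file references only new ones.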
And unlike a normal defrag, when you use the compress option, it forces a
rewrite of every file in order to (possibly re)compress it. So while a
normal defrag would have only rewritten some files and would have only
expanded data usage to the extent it actually did rewrites, the compress
option forced it to recompress all files it came across. That broke all
those reflinks and duplicated the data wherever existing snapshots, etc,
still referenced the old copies, thereby effectively doubling your data
usage.
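The difference between a plain defrag and a defrag with the compress option can be sketched the same way. This is a simplified model, assuming a plain defrag rewrites only files it considers fragmented while the compress option rewrites everything. The function name extra_usage and the file sizes are invented for the example and do not come from the report.

```python
def extra_usage(files, compress):
    """Return the GiB duplicated by a snapshot-unaware defrag run.

    files: list of (size_gib, fragmented, snapshotted) tuples.
    compress: True models defrag with the compress option (rewrite all files).
    """
    extra = 0.0
    for size_gib, fragmented, snapshotted in files:
        rewritten = compress or fragmented
        # A rewrite only costs extra space if something still pins the old copy.
        if rewritten and snapshotted:
            extra += size_gib
    return extra

# Made-up sizes, purely illustrative.
files = [
    (2.0, True, True),    # fragmented, held by a snapshot
    (3.0, False, True),   # not fragmented, held by a snapshot
    (5.0, False, True),   # not fragmented, held by a snapshot
    (4.0, True, False),   # fragmented, but no snapshot references it
]
plain = extra_usage(files, compress=False)
with_compress = extra_usage(files, compress=True)
print(plain, with_compress)
```

With these numbers the plain run duplicates only the one fragmented, snapshotted file, while the compress run duplicates every snapshotted file.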
The fact that it didn't /quite/ double usage may be down to the normal
compress mount option only doing a quick compression test and not
compressing it if the file doesn't seem particularly compressible based
on that quick test, while the defrag with compress likely actually checks
every (128 KiB compression) block, getting a bit better compression in
the process. So the defrag/compress run didn't quite double usage as it
compressed some stuff that the runtime compression didn't. (FWIW, you
can get the more thorough runtime compression behavior with the compress-
force option, which always tries compression, not just doing a quick test
and skipping compression on the entire file if the bit the test tried
didn't compress so well.)
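A rough illustration of why the run fell short of a full doubling, again as a toy model in Python. The two policies below are simplified stand-ins for the behavior described above, not the kernel's actual heuristic, and zlib is used only because it ships with Python (the filesystem in question uses lzo). The function names and the test data are assumptions for the example.

```python
import random
import zlib

BLOCK = 128 * 1024  # 128 KiB, the compression block size mentioned above

# Ratio from the reported scrub totals: well above 1, but short of 2.
before_gib, after_gib = 15.90, 26.66
ratio = after_gib / before_gib

def blocks(data):
    """Split data into BLOCK-sized pieces."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def stored_size_every_block(data):
    """Toy compress-force / defrag-with-compress: try every block, keep the smaller form."""
    return sum(min(len(b), len(zlib.compress(b))) for b in blocks(data))

def stored_size_quick_test(data):
    """Toy compress: test only the first block, skip the whole file if it does not shrink."""
    first = data[:BLOCK]
    if len(zlib.compress(first)) >= len(first):
        return len(data)
    return stored_size_every_block(data)

# A file whose first block is random (incompressible) and whose rest repeats.
rng = random.Random(0)
incompressible = bytes(rng.getrandbits(8) for _ in range(BLOCK))
compressible = b"btrfs " * (3 * BLOCK // 6)
data = incompressible + compressible

quick = stored_size_quick_test(data)
every = stored_size_every_block(data)
print(round(ratio, 2), quick, every)
```

In this model the quick test gives up on the whole file because its first block does not shrink, while the per-block policy still compresses the remaining three blocks. Files like that would take less space after the defrag/compress rewrite than before, pulling the overall ratio below 2.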
FWIW, around 3.9, btrfs defrag was actually snapshot/reflink aware for a
few releases, but it turned out that dealing with all those reflinks
simply didn't scale well with the then existing code, and people were
reporting defrag runs taking days or weeks, to (would-be) months with
enough snapshots and with quotas (which didn't scale well either) turned
on.
Obviously that was simply unworkable, so defrag's snapshot awareness was
reverted until they could make it scale better, as a working but snapshot
unaware defrag was clearly more practical than one that couldn't be run
because it'd take months, and that snapshot awareness has yet to be
reactivated.
So now the bottom line is don't defrag what you don't want un-reflinked.
FWIW, autodefrag has the same problem in theory, but the effect in
practice is far more limited, in part because it only does its defrag
thing when some part of the file is being rewritten (and thus COWed
elsewhere, doing a limited dereflink for the actually written block(s)
already). While autodefrag will magnify that a bit by COWing somewhat
larger extents, for files of any size (MiB scale and larger) it's not
going to rewrite and thus duplicate the entire file, as defrag could do.
And it's definitely not going to be rewriting all files in large sections
of the filesystem as recursive defrag with the compression option will.
Additionally, autodefrag will tend to defrag the file shortly after it
has been changed, likely before any snapshots have been taken if they're
only taken daily or so, so you'll only have effectively two copies of the
portion of the file that was changed, the old version as still locked in
place by previous snapshots and the new version, not the three that
you're likely to have if you wait until snapshots have been done before
doing the defrag (the old version as in previous snapshots, the new
version as initially written and locked in place by post-change pre-
defrag snapshots, and the new version as defragged).
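The two-copies versus three-copies point can be replayed as a small event model. This is a sketch only, assuming one earlier snapshot already pins the old version. The function name pinned_versions and the version labels are hypothetical.

```python
def pinned_versions(events):
    """Replay events and return the set of data versions still referenced.

    events: sequence of "write", "defrag", or "snapshot" strings.
    """
    live = "old"
    snapshots = ["old"]  # assumption: an earlier snapshot already pins the old version
    for event in events:
        if event == "write":
            live = "new-as-written"
        elif event == "defrag":
            live = "new-defragged"
        elif event == "snapshot":
            snapshots.append(live)
        else:
            raise ValueError(f"unknown event: {event}")
    return set(snapshots) | {live}

# autodefrag: the defrag happens shortly after the write, before the next snapshot.
autodefrag_first = pinned_versions(["write", "defrag", "snapshot"])
# manual defrag: a snapshot lands between the write and the defrag.
snapshot_first = pinned_versions(["write", "snapshot", "defrag"])
print(len(autodefrag_first), len(snapshot_first))
```

The first ordering leaves two versions pinned (old and defragged), the second leaves three (old, new as written, and new as defragged).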
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman