Re: btrfs is using 25% more disk than it should

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs is using 25% more disk than it should
Date: Sat, 20 Dec 2014 01:33:10 +0000 (UTC)
Message-ID: <pan$5ee96$62de2cb1$5eb9b857$32dd0ff2@cox.net>
In-Reply-To: CAN6BF2JoMki_KmpmVYVM-_ECqCg_w-qo9_6P=MiZbabMQyVN_g@mail.gmail.com

Daniele Testa posted on Sat, 20 Dec 2014 03:59:42 +0800 as excerpted:

> The file has both checksums and datacow on it. I will do "chattr +C"
> on the parent dir and re-create the file to make sure all files are
> marked as "nodatacow".
> 
> Should I also turn off checksums with the mount-flags if this filesystem
> only contain big VM-files? Or is it not needed if I put +C on the parent
> dir?

FWIW...

Turning off datacow, whether by chattr +C on the parent dir before 
creating the file or via the nodatacow mount option, turns off 
checksumming as well.  (For completeness, it also turns off compression, 
but I don't think that applies in your case.)
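For reference, chattr +C sets the FS_NOCOW_FL inode flag, which can also 
be read and set through the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls.  A 
minimal Python sketch follows.  It assumes a btrfs filesystem, 64-bit 
Linux with the generic ioctl encoding (x86_64, aarch64), and that the flag 
goes on an empty directory before the image files are created in it.  The 
helper names are illustrative, not part of any library.

```python
import fcntl
import os
import struct

# Values from <linux/fs.h>.  The two request numbers assume the generic
# Linux ioctl encoding (x86_64, aarch64); some other architectures differ.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_NOCOW_FL = 0x00800000  # the inode flag that "chattr +C" sets


def is_nocow(flags: int) -> bool:
    """Return True if the NOCOW bit is set in a raw inode-flags word."""
    return bool(flags & FS_NOCOW_FL)


def get_flags(path: str) -> int:
    """Read the inode flags of path (the same flags lsattr displays)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
        return struct.unpack("I", raw)[0]
    finally:
        os.close(fd)


def set_nocow(path: str) -> None:
    """Roughly what "chattr +C path" does.  Apply it to an empty directory
    so that files created in it afterwards inherit NOCOW."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("I", 0))
        flags = struct.unpack("I", raw)[0]
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, struct.pack("I", flags | FS_NOCOW_FL))
    finally:
        os.close(fd)
```

Usage would be something like set_nocow("/var/lib/images") on a fresh, 
empty directory (hypothetical path), then creating or copying the images 
into it.  Setting the flag on a file that already holds data is not 
reliably effective, which is why re-creating the file is the safe route.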

In general, active VM images (and database files) with default flags tend 
to get highly fragmented very fast, due to btrfs' default COW on a file 
with a heavy "internal rewrite" pattern (as opposed to append-only writes 
or full rename/replace on rewrite).  For relatively small files with this 
rewrite pattern, think typical desktop Firefox sqlite database files of a 
quarter GiB or less, the btrfs autodefrag mount option can be helpful.  
But because autodefrag triggers a rewrite of the entire file, it becomes 
less viable as file size grows.  Somewhere around half a GiB it stops 
working well, particularly on very active files where the incoming 
rewrite stream may be faster than btrfs can rewrite the entire file.
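The size rule of thumb above can be written down as a tiny helper.  This 
is only a sketch of the heuristic in this post (autodefrag up to about a 
quarter GiB, nocow from about half a GiB, judgment in between); the 
function name and exact cutoffs are illustrative, not anything btrfs 
itself enforces.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def suggest_strategy(size_bytes: int) -> str:
    """Rough rule of thumb for files with a heavy internal-rewrite pattern.

    The cutoffs mirror the estimates in this post, not a btrfs limit.
    """
    if size_bytes <= GiB // 4:
        return "autodefrag"  # small files, e.g. desktop sqlite databases
    if size_bytes < GiB // 2:
        return "borderline"  # autodefrag may or may not keep up
    return "nocow"           # large VM images and database files
```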

Making heavy-internal-rewrite pattern files over about half a GiB in 
size nocow is one suggested solution.  However, a snapshot locks the 
existing version in place, so the first write to each block after a 
snapshot must still be COWed (a one-time COW per block per snapshot).  If 
people are doing frequent automated snapshots (say once an hour), this 
can be a big problem, as the file ends up fragmenting pretty badly from 
these one-time COW writes as well.  So snapshots can undo much of what 
nocow buys you.
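To make the one-time COW concrete, here is a toy Python model, not btrfs 
code: a nocow file overwrites blocks in place, except that the first write 
to a block still shared with a snapshot goes to a newly allocated 
location.  Counting physically contiguous runs (extents) shows 
fragmentation growing with each snapshot-then-write cycle.  The class, 
the append-only allocator and the extent count are simplifying 
assumptions.

```python
def count_extents(layout: list[int]) -> int:
    """Count runs of physically contiguous blocks (a rough fragmentation measure)."""
    if not layout:
        return 0
    extents = 1
    for prev, cur in zip(layout, layout[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


class ToyNocowFile:
    """Hypothetical model: logical block i lives at physical block layout[i]."""

    def __init__(self, nblocks: int) -> None:
        self.layout = list(range(nblocks))  # starts out as one contiguous extent
        self.shared = [False] * nblocks     # True while a snapshot still references it
        self.next_free = nblocks            # toy allocator: append at end of "disk"

    def snapshot(self) -> None:
        # After a snapshot, every current block is shared with it.
        self.shared = [True] * len(self.layout)

    def write(self, block: int) -> None:
        if self.shared[block]:
            # First write since the snapshot: the block must be COWed once.
            self.layout[block] = self.next_free
            self.next_free += 1
            self.shared[block] = False
        # Otherwise nocow applies and the block is overwritten in place.
```

For example, with an 8-block file, writes without a snapshot leave one 
extent; after a snapshot, the first write to block 3 splits the file into 
three extents, while rewriting block 3 again changes nothing.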

There are ways to work around the problem (put the files in question on a 
separate subvolume and snapshot it less often than the parent, set up a 
cron job to do, say, weekly defrag on the files in question, etc.), but 
since you don't have snapshots going anyway, that's not a concern for you 
except as a preventative.  Consider it if you /do/ start doing snapshots.
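For the cron-job workaround, a minimal sketch follows.  The image 
directory, the *.img pattern, the script path and the schedule are all 
hypothetical; the script simply runs `btrfs filesystem defragment` on each 
matching file.  One caveat: on current kernels, defragmenting a file whose 
extents are shared with snapshots unshares them, so space usage can grow 
when snapshots are in use.

```python
#!/usr/bin/env python3
"""Defragment VM image files; meant to be run weekly from cron, e.g.

    0 3 * * 0  /usr/local/sbin/defrag-images.py /var/lib/images

(script path, schedule and directory are hypothetical).
"""
import subprocess
import sys
from pathlib import Path


def defrag_command(path: Path) -> list[str]:
    """Build the btrfs-progs command that defragments one file."""
    return ["btrfs", "filesystem", "defragment", str(path)]


def main(dirs: list[str]) -> None:
    for d in dirs:
        for image in sorted(Path(d).glob("*.img")):
            # check=True stops at the first failure and makes the script
            # exit non-zero, so cron reports it.
            subprocess.run(defrag_command(image), check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```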

So anyway, as I said, creating the file nocow (whether by mount option or 
chattr) will turn off checksumming too.  But consider a file that's 
frequently rewritten internally: corruption will very likely corrupt the 
VM anyway, and there are already mechanisms in place to deal with that 
(VM integrity mechanisms, backups, or simply disposable VMs where you fire 
up a new one when necessary).  At least with btrfs single-mode data, where 
there's no second copy to restore from if the checksum /does/ fail, 
turning off checksumming isn't necessarily as bad as it may seem.
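The difference between single and raid1 data when a checksum fails can be 
sketched as follows.  This is a simplified model, not btrfs code: btrfs 
defaults to crc32c, for which Python's zlib.crc32 (plain CRC-32) stands in 
here, and read_block is a hypothetical helper.  With one copy, a checksum 
mismatch can only be reported; with two, the reader can fall back to the 
mirror.

```python
import zlib


class ChecksumError(Exception):
    """Raised when no stored copy of a block matches its checksum."""


def read_block(copies: list[bytes], expected_csum: int) -> bytes:
    """Return the first copy whose checksum matches.

    copies holds one entry for single-profile data, two for raid1.
    """
    for data in copies:
        if zlib.crc32(data) == expected_csum:
            return data
    raise ChecksumError("all copies failed checksum verification")
```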

And it /should/ save you some metadata space, since btrfs no longer has 
to store a checksum for every data block... tho I'd not consider that 
savings worth turning off checksumming if that were the /only/ reason.  
The metadata difference is more a nice side-effect of an already commonly 
recommended practice for large VM image files than something you'd turn 
off checksumming for in the first place.  
Certainly, on most files I'd prefer the checksums, and in fact am running 
btrfs raid1 mode here specifically to get the benefit of having a second 
copy to retrieve from if the first attempted copy fails checksum.  But VM 
images and database files are a bit of an exception.
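To put a rough number on the metadata savings: assuming btrfs' default 
4-byte crc32c checksums and a 4 KiB sector size (newer kernels also offer 
larger checksum types), every 4 KiB of checksummed data costs 4 bytes in 
the checksum tree, about 1/1024 of the data size before tree overhead.  
The helper below is an illustrative calculation only.

```python
KiB = 1024
MiB = 1024 ** 2
GiB = 1024 ** 3


def csum_bytes(file_size: int, sector_size: int = 4 * KiB, csum_size: int = 4) -> int:
    """Approximate checksum storage for one file's data, ignoring tree overhead."""
    sectors = -(-file_size // sector_size)  # ceiling division: partial sectors count
    return sectors * csum_size


# A 100 GiB VM image carries about 100 MiB of crc32c checksums.
print(csum_bytes(100 * GiB) // MiB)  # → 100
```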

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
