linux-btrfs.vger.kernel.org archive mirror
From: Josef Bacik <jbacik@fb.com>
To: Daniele Testa <daniele.testa@gmail.com>, <linux-btrfs@vger.kernel.org>
Subject: Re: btrfs is using 25% more disk than it should
Date: Fri, 19 Dec 2014 16:10:44 -0500
Message-ID: <54949454.9020601@fb.com>
In-Reply-To: <CAN6BF2Luf3ERd+ShLyUavzM3bLmy9dT918Zg17xL9T42DNVtVQ@mail.gmail.com>

On 12/18/2014 09:59 AM, Daniele Testa wrote:
> Hey,
>
> I am hoping you guys can shed some light on my issue. I know that it's
> a common question that people see differences in the "disk used" when
> running different calculations, but I still think that my issue is
> weird.
>
> root@s4 / # mount
> /dev/md3 on /opt/drives/ssd type btrfs
> (rw,noatime,compress=zlib,discard,nospace_cache)
>
> root@s4 / # btrfs filesystem df /opt/drives/ssd
> Data: total=407.97GB, used=404.08GB
> System, DUP: total=8.00MB, used=52.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.25GB, used=672.21MB
> Metadata: total=8.00MB, used=0.00
>
> root@s4 /opt/drives/ssd # ls -alhs
> total 302G
> 4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
> 4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
> 302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
>     0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots
>
> root@s4 /opt/drives/ssd # du -h
> 0       ./snapshots
> 302G    .
>
> As seen above, I have a 410GB SSD mounted at "/opt/drives/ssd". On
> that partition, I have one single sparse file, taking 302GB of space
> (max 315GB). The snapshots directory is completely empty.
>
> However, for some weird reason, btrfs seems to think it takes 404GB.
> The big file is a disk that I use in a virtual server and when I write
> stuff inside that virtual server, the disk-usage of the btrfs
> partition on the host keeps increasing even if the sparse-file is
> constant at 302GB. I even have 100GB of "free" disk-space inside that
> virtual disk-file. Writing 1GB inside the virtual disk-file seems to
> increase the usage by about 4-5GB on the "outside".
>
> Does anyone have a clue on what is going on? How can the difference
> and behaviour be like this when I just have one single file? Is it
> also normal to have 672MB of metadata for a single file?
>

Hello and welcome to the wonderful world of btrfs, where COW can really 
suck hard without being super clear why!  It's 4pm on a Friday right 
before I'm gone for 2 weeks, so I'm a bit happy and drunk, and I'm going 
to use pretty pictures.  You have this case to start with:

file offset 0                                               offset 302g
[-------------------------prealloced 302g extent----------------------]

(man it's impressive I got all that lined up right)

On disk you have 2 things.  First, your file, which has a file extent 
item that says

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actually allocated space, 
has this

extent bytenr 123, len 302g, refs 1
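To make the two views concrete, here is a minimal Python sketch of the bookkeeping above.  It is a toy model, not btrfs's real on-disk format: the class names `FileExtent` and `ExtentItem` are hypothetical and only mirror the fields written out in this mail, and bytenr 123 is the same made-up placeholder.

```python
from dataclasses import dataclass

K = 1 << 10  # 1 KiB
G = 1 << 30  # 1 GiB


@dataclass
class FileExtent:
    """Toy file extent item: which slice of which disk extent backs a file range."""
    inode: int
    file_offset: int  # where this piece starts in the file
    size: int         # bytes of the file covered by this piece
    offset: int       # where this piece starts inside the disk extent
    disk_bytenr: int  # start of the disk extent it points into
    disk_len: int     # full length of that disk extent


@dataclass
class ExtentItem:
    """Toy extent tree entry: one physically allocated range."""
    bytenr: int
    length: int
    refs: int


# State right after preallocating the 302g image, before the guest writes anything.
file_extents = [FileExtent(256, 0, 302 * G, 0, 123, 302 * G)]
extent_tree = {123: ExtentItem(123, 302 * G, 1)}

file_size = sum(fe.size for fe in file_extents)          # what cp would copy
allocated = sum(e.length for e in extent_tree.values())  # what is really allocated
print(file_size // G, allocated // G)  # 302 302
```

At this point the two numbers agree; the rest of the mail is about how they drift apart.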

Now say you boot up your virt image and it writes one 4k block to offset 
0.  Now you have this

[4k][--------------------302g-4k--------------------------------------]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr (123+302g), len 4k, refs 1
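The split can be sketched in Python as a copy-on-write update of the toy model.  This is an illustration under stated assumptions, not the kernel's code: `cow_write` is a hypothetical helper, the write is assumed to land inside the existing file, the extent tree is a plain dict, and fields not needed here (inode, disklen) are dropped.

```python
from dataclasses import dataclass

K, G = 1 << 10, 1 << 30


@dataclass
class FileExtent:
    file_offset: int  # start of this piece in the file
    size: int         # bytes of the file it covers
    offset: int       # start of this piece inside the disk extent
    disk_bytenr: int  # which disk extent it points into


def cow_write(file_extents, extent_tree, pos, length, new_bytenr):
    """Hypothetical COW write: new data goes to a fresh extent; old extents are trimmed, not freed."""
    result = []
    for fe in file_extents:
        end = fe.file_offset + fe.size
        if end <= pos or fe.file_offset >= pos + length:
            result.append(fe)  # not touched by this write
            continue
        pieces = 0
        if fe.file_offset < pos:  # keep the part before the write
            result.append(FileExtent(fe.file_offset, pos - fe.file_offset, fe.offset, fe.disk_bytenr))
            pieces += 1
        if end > pos + length:  # keep the part after the write
            skip = pos + length - fe.file_offset
            result.append(FileExtent(pos + length, end - pos - length, fe.offset + skip, fe.disk_bytenr))
            pieces += 1
        # One file extent became `pieces` file extents pointing into the same disk extent.
        extent_tree[fe.disk_bytenr]["refs"] += pieces - 1
        if extent_tree[fe.disk_bytenr]["refs"] == 0:
            del extent_tree[fe.disk_bytenr]  # only now is the old space actually freed
    result.append(FileExtent(pos, length, 0, new_bytenr))
    extent_tree[new_bytenr] = {"len": length, "refs": 1}
    result.sort(key=lambda fe: fe.file_offset)
    return result


file_extents = [FileExtent(0, 302 * G, 0, 123)]
extent_tree = {123: {"len": 302 * G, "refs": 1}}

# The guest writes one 4k block at offset 0; the new data lands at 123 + 302g.
file_extents = cow_write(file_extents, extent_tree, 0, 4 * K, 123 + 302 * G)

file_size = sum(fe.size for fe in file_extents)
allocated = sum(e["len"] for e in extent_tree.values())
print(file_size == 302 * G, allocated == 302 * G + 4 * K)  # True True
print(extent_tree[123]["refs"])  # 1
```

The file's size never changes, but `allocated` picks up the new 4k while the old 302g extent stays fully accounted.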

See that?  Your file is still the same size; it is still 302g.  If you 
cp'ed it right now, it would copy 302g of information.  But what you have 
actually allocated on disk?  Well, that's now 302g + 4k.  Now let's say 
your virt thing decides to write to the middle, let's say at offset 12k; 
now you have this

inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, diskbytenr notimportant, disklen 4k
inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr (123+302g), len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1
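In this toy model, an extent's refs is just the number of file extents that point into it, which can be checked directly from the four file extents above.  The piece between the two writes covers file offsets 4k to 12k, so its size is 8k.  A sketch, assuming the first new extent sits at 123 + 302g as in the mail; `FIRST_4K` and `SECOND_4K` are hypothetical placeholder bytenrs, and the second one's value is arbitrary.

```python
from collections import Counter

K, G = 1 << 10, 1 << 30

FIRST_4K = 123 + 302 * G           # where the offset-0 write landed
SECOND_4K = 123 + 302 * G + 4 * K  # location doesn't matter; any distinct value works

# (file_offset, size, offset, disk_bytenr) for the four file extents listed above.
file_extents = [
    (0,      4 * K,            0,      FIRST_4K),
    (4 * K,  8 * K,            4 * K,  123),
    (12 * K, 4 * K,            0,      SECOND_4K),
    (16 * K, 302 * G - 16 * K, 16 * K, 123),
]

# refs = how many file extents point into each disk extent.
refs = Counter(bytenr for _, _, _, bytenr in file_extents)
print(refs[123], refs[FIRST_4K], refs[SECOND_4K])  # 2 1 1

# The pieces still tile the file with no gaps, so it is still exactly 302g long...
assert all(a[0] + a[1] == b[0] for a, b in zip(file_extents, file_extents[1:]))

# ...while 8k of extent 123 is no longer referenced but still allocated.
live_in_123 = sum(size for _, size, _, bytenr in file_extents if bytenr == 123)
dead_in_123 = 302 * G - live_in_123
print(dead_in_123 // K)  # 8
```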

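See that refs 2 change?  We split the original extent, so we have 2 file 
extents pointing to the same physical extent, so we bumped the ref 
count.  This will happen over and over again until we have completely 
overwritten the original extent, at which point your space usage will go 
back down to ~302g.  Until then, every block of the original extent that 
you've overwritten stays allocated alongside its new copy, because the 
extent is only freed as a whole once nothing references it anymore.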
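How high usage can climb before it drops back is simple arithmetic in this toy model.  The sketch below assumes each 4k block of the original extent is overwritten exactly once (in any order) and ignores metadata and compression; `allocated_after` is a hypothetical helper.  Under those assumptions the peak is just under twice the original extent size.

```python
K, G = 1 << 10, 1 << 30
BLOCK = 4 * K


def allocated_after(rewritten, orig_len, block=BLOCK):
    """Hypothetical helper: bytes allocated once `rewritten` distinct blocks of the
    original extent have each been overwritten exactly once (toy model)."""
    n_blocks = orig_len // block
    new_space = rewritten * block        # every overwrite was COWed to a fresh block
    still_pinned = rewritten < n_blocks  # some file extent still points into the old extent
    return new_space + (orig_len if still_pinned else 0)


# A small 64k "image" (16 blocks), overwritten one block at a time.
small = 64 * K
usage = [allocated_after(r, small) for r in range(small // BLOCK + 1)]
print(max(usage) // K, usage[-1] // K)  # 124 64

# Same formula for the 302g image, just before the last old block is overwritten.
n = 302 * G // BLOCK
peak = allocated_after(n - 1, 302 * G)
print(peak == 2 * 302 * G - BLOCK)  # True
```

Real guests rarely rewrite every block, so in practice usage tends to sit somewhere between the file size and that peak.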

We split big extents with COW, so unless you've got lots of space to 
spare or are going to use nodatacow, you should probably not preallocate 
virt images.  Thanks,

Josef


Thread overview: 22+ messages
2014-12-18 14:59 btrfs is using 25% more disk than it should Daniele Testa
2014-12-19 18:53 ` Phillip Susi
2014-12-19 19:59   ` Daniele Testa
2014-12-19 20:35     ` Phillip Susi
2014-12-19 21:15     ` Josef Bacik
2014-12-19 21:53       ` Phillip Susi
2014-12-19 22:06         ` Josef Bacik
2014-12-20  1:33     ` Duncan
2014-12-19 21:10 ` Josef Bacik [this message]
2014-12-19 21:17   ` Josef Bacik
2014-12-20  1:38     ` Duncan
2014-12-20  5:52     ` Zygo Blaxell
2014-12-20  6:18       ` Daniele Testa
2014-12-20  6:59         ` Duncan
2014-12-20 11:02         ` Josef Bacik
2014-12-20 11:28       ` Josef Bacik
2014-12-23 21:51         ` Zygo Blaxell
2014-12-20  9:15     ` Daniele Testa
2014-12-20 11:23     ` Robert White
2014-12-20 11:39       ` Josef Bacik
2014-12-21  1:40         ` Robert White
2014-12-21  3:04   ` Robert White
