linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs space used issue
Date: Wed, 28 Feb 2018 14:24:40 -0500
Message-ID: <2892a866-fdc3-b337-4cd4-2cd4a18b9f21@gmail.com>
In-Reply-To: <pan$e116f$e5aa2400$88c9d453$f5589ed1@cox.net>

On 2018-02-28 14:09, Duncan wrote:
> vinayak hegde posted on Tue, 27 Feb 2018 18:39:51 +0530 as excerpted:
> 
>> I am using btrfs, but I am seeing du -sh and df -h report hugely
>> different sizes on an SSD.
>>
>> mount:
>> /dev/drbd1 on /dc/fileunifier.datacache type btrfs
>> (rw,noatime,nodiratime,flushoncommit,discard,nospace_cache,recovery,commit=5,subvolid=5,subvol=/)
>>
>>
>> du -sh /dc/fileunifier.datacache/ -  331G
>>
>> df -h /dev/drbd1      746G  346G  398G  47% /dc/fileunifier.datacache
>>
>> btrfs fi usage /dc/fileunifier.datacache/
>> Overall:
>>      Device size:          745.19GiB
>>      Device allocated:     368.06GiB
>>      Device unallocated:   377.13GiB
>>      Device missing:       0.00B
>>      Used:                 346.73GiB
>>      Free (estimated):     396.36GiB    (min: 207.80GiB)
>>      Data ratio:           1.00
>>      Metadata ratio:       2.00
>>      Global reserve:       176.00MiB    (used: 0.00B)
>>
>> Data,single: Size:365.00GiB, Used:345.76GiB
>>     /dev/drbd1     365.00GiB
>>
>> Metadata,DUP: Size:1.50GiB, Used:493.23MiB
>>     /dev/drbd1       3.00GiB
>>
>> System,DUP: Size:32.00MiB, Used:80.00KiB
>>     /dev/drbd1      64.00MiB
>>
>> Unallocated:
>>     /dev/drbd1     377.13GiB
>>
>>
>> Even if we count 6G of metadata, that is only 331+6 = 337G.
>> Where are the remaining 9GB used?
>>
>> Please explain.
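
For reference, the figures in the output above can be cross-checked
with a few lines of Python.  This is only a rough sketch: the variable
names are mine, and it assumes that "Used" is the sum of each chunk
type's used space multiplied by its replication ratio (1 for single,
2 for DUP).

```python
# Figures copied from the `btrfs fi usage` output above, in GiB.
MIB = 1 / 1024        # one MiB expressed in GiB
KIB = MIB / 1024      # one KiB expressed in GiB

data_used = 345.76             # Data,single:  stored once
metadata_used = 493.23 * MIB   # Metadata,DUP: stored twice
system_used = 80.00 * KIB      # System,DUP:   stored twice

# Assumption: raw "Used" = sum of (used * number of copies).
raw_used = data_used * 1 + metadata_used * 2 + system_used * 2

# "Device allocated" = sum of the per-device chunk sizes.
allocated = 365.00 + 3.00 + 64.00 * MIB

du_total = 331.0               # what `du -sh` reported

print(f"raw used:  {raw_used:.2f} GiB")
print(f"allocated: {allocated:.2f} GiB")
print(f"gap vs du: {raw_used - du_total:.2f} GiB")
```

Under that assumption the sum lands within rounding of the reported
346.73GiB, and metadata accounts for roughly 1GiB of raw space rather
than 6G, so the gap between du and the filesystem's own accounting is
nearer 15GiB than 9.
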
> 
> Taking a somewhat higher level view than Austin's reply, on btrfs, plain
> df and to a somewhat lesser extent du[1] are at best good /estimations/
> of usage, and for df, space remaining.  Btrfs' COW/copy-on-write
> semantics and features such as the various replication/raid schemes,
> snapshotting, etc, are things df/du don't really understand, as they
> simply don't have and weren't /designed/ to have that level of
> filesystem-specific insight.  As a result they, particularly df due to
> its whole-filesystem focus, aren't particularly accurate on btrfs.
> Consider their output more a "best estimate given the rough data we
> have available" sort of report.
> 
> To get the real filesystem focused picture, use btrfs filesystem usage,
> or btrfs filesystem show combined with btrfs filesystem df.  That's what
> you should trust, altho various utilities that check for available space
> before doing something often use the kernel-call equivalent of (plain) df
> to ensure they have the required space, so it's worthwhile to keep an eye
> on it as the filesystem fills, as well.  Corrective action such as a
> filtered rebalance may be necessary if it gets too out of sync with
> btrfs filesystem usage, if btrfs filesystem usage unallocated drops
> below say five gigs, or if data or metadata size vs used shows a spread
> of multiple gigs.  (Your data shows a spread of ~20 gigs ATM, but with
> 377 gigs still unallocated it's no big deal; it would be a big deal if
> those were reversed, tho, with only 20 gigs unallocated and a spread of
> 300+ gigs in data size vs used.)
> 
> There are entries in the FAQ discussing free space issues that you should
> definitely read if you haven't, altho they obviously address the general
> case, so if you have more questions about an individual case after having
> read them, here is a good place to ask. =:^)
> 
> Everything having to do with "space" (see both the 1/Important-questions
> and 4/Common-questions sections) here:
> 
> https://btrfs.wiki.kernel.org/index.php/FAQ
> 
> Meanwhile, it's worth noting that not entirely intuitively, btrfs' COW
> implementation can "waste" space on larger files that are mostly, but not
> entirely, rewritten.  An example is the best way to demonstrate.
> Consider each x a used block and each - an unused but still referenced
> block:
> 
> Original file, written as a single extent (diagram works best with
> monospace, not arbitrarily rewrapped):
> 
> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> 
> First rewrite of part of it:
> 
> xxxxxxxxxxx------xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>             xxxxxx
> 
> 
> Nth rewrite, where some blocks of the original still remain as originally
> written:
> 
> ------------------xxx------------------------------
>             xxx---
> xxxx----xxx
>      xxxx
>                       xxxxxxxxxxxxxxxxxxxxx---xxxxxx
>                                            xxx
>                xxx
> 
> 
> As you can see, that first really large extent remains fully referenced,
> altho only three blocks of it remain in actual use.  All those -- won't
> be returned to free space until those last three blocks get rewritten as
> well, thus freeing the entire original extent.
> 
> I believe this effect is what Austin was referencing when he suggested
> the defrag, tho defrag won't necessarily /entirely/ clear it up.  One way
> to be /sure/ it's cleared up would be to rewrite the entire file,
> deleting the original, either by copying it to a different filesystem and
> back (with the off-filesystem copy guaranteeing that it can't use reflinks
> to the existing extents), or by using cp's --reflink=never option.
> (FWIW, I prefer the former, just to be sure, using temporary copies to a
> suitably sized tmpfs for speed where possible, tho obviously if the file
> is larger than your memory size that's not possible.)
Correct, this is why I recommended trying a defrag.  However, I've never
actually seen things so bad that a simple defrag didn't fix them (though
I have seen a few cases where the target extent size had to be set
higher than the default of 20MB).  Also, as counter-intuitive as it
might sound, autodefrag really doesn't help much with this, and can
actually make things worse.
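
To make the effect in the diagram concrete, here is a toy model of it.
It is a sketch under simplifying assumptions, not how btrfs is actually
implemented: the `Extent` and `ToyFile` names are made up, an extent is
only freed once none of its blocks are referenced, and a defrag is
modelled as one full rewrite into a single new extent (a real defrag
works towards a target extent size and may leave more than one).

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    length: int                              # blocks allocated on disk
    live: set = field(default_factory=set)   # offsets still referenced


class ToyFile:
    """Toy COW file: an extent stays allocated while any block is live."""

    def __init__(self, nblocks):
        first = Extent(nblocks, set(range(nblocks)))
        self.extents = [first]
        # block_map[i] = (extent, offset) backing logical block i
        self.block_map = [(first, i) for i in range(nblocks)]

    def rewrite(self, start, count):
        """Overwrite `count` blocks at `start`; new data gets a new extent."""
        new = Extent(count, set(range(count)))
        self.extents.append(new)
        for i in range(count):
            old, off = self.block_map[start + i]
            old.live.discard(off)            # old block no longer in use
            self.block_map[start + i] = (new, i)
        # Only extents with no live blocks at all are returned to free space.
        self.extents = [e for e in self.extents if e.live]

    def allocated(self):
        return sum(e.length for e in self.extents)

    def defrag(self):
        """Model a defrag as rewriting the whole file in one go."""
        self.rewrite(0, len(self.block_map))


f = ToyFile(51)
f.rewrite(0, 18)     # rewrite the head of the file
f.rewrite(21, 30)    # rewrite the tail; 3 original blocks remain live
print(len(f.block_map), f.allocated())   # 51 99
f.defrag()
print(len(f.block_map), f.allocated())   # 51 51
```

The file never grows past 51 blocks, but 99 are allocated until the
last three original blocks are rewritten, because they pin the whole
51-block original extent.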

This is also one of the things I was referring to in item 6 of the list
of causes I gave (I didn't spell it out there partly because I couldn't
come up with a good way to explain it clearly, which I feel you did an
excellent job of above).  The other big one is the handling of xattrs
and ACLs, which get accounted for by `df` but generally aren't by `du`
(at least, not reliably).
> 
> Of course where applicable, snapshots and dedup keep reflink-references
> to the old extents, so they must be adjusted or deleted as well, to
> properly free that space.
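
The "rewrite the entire file" approach can be sketched roughly as
below.  `rewrite_in_place` is a hypothetical helper, not an existing
tool.  It copies through ordinary read()/write() calls so that the new
file cannot share extents with the old one, then renames the copy over
the original.  It assumes there is enough free space for a second copy,
and it does not preserve ownership, xattrs, ACLs, timestamps or hard
links.

```python
import os
import tempfile


def rewrite_in_place(path, chunk_size=1024 * 1024):
    """Copy `path` into a new file, then rename it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
            dst.flush()
            os.fsync(dst.fileno())       # make sure the new copy is on disk
        # Carry over the permission bits only.
        os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        os.replace(tmp_path, path)       # atomically swap in the new copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

As noted above, any snapshot or reflink copy that still references the
old extents will keep them allocated even after a rewrite like this.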
> 
> ---
> [1] du: Because its purpose is different.  du's primary purpose is
> telling you in detail what space files take up, per-file and per-
> directory, without particular regard to usage on the filesystem itself.
> df's focus, by contrast, is on the filesystem as a whole.  So where two
> files share the same extent due to reflinking, du should and does count
> that usage for each file, because that's what each file /uses/ even if
> they both use the same extents.
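
The difference the footnote describes can be illustrated with a toy
calculation.  The extent IDs, sizes and file names below are invented
for the example; the point is only that summing per file counts a
shared extent once per referencing file, while the filesystem stores
it once.

```python
# Hypothetical extents and their sizes in MiB.
extent_size = {"A": 100, "B": 40, "C": 10}

# Hypothetical files and the extents each one references.
files = {
    "original.img": ["A", "B"],
    "reflink-copy.img": ["A", "B"],   # shares both extents with the original
    "notes.txt": ["C"],
}


def per_file_total(files, extent_size):
    """du-style view: every file is charged for every extent it uses."""
    return sum(extent_size[e] for extents in files.values() for e in extents)


def unique_total(files, extent_size):
    """Filesystem view: each extent is stored (and counted) once."""
    unique = {e for extents in files.values() for e in extents}
    return sum(extent_size[e] for e in unique)


print(per_file_total(files, extent_size))   # 290
print(unique_total(files, extent_size))     # 150
```

For real files, `btrfs filesystem du` reports shared and exclusive
usage separately, which is usually the easier way to see this.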
