linux-btrfs.vger.kernel.org archive mirror
From: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>
To: Shyam Prasad N <nspmangalore@gmail.com>,
	Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Btrfs occupies more space than du reports...
Date: Fri, 23 Feb 2018 08:23:47 -0500
Message-ID: <3968047d-32ef-780c-5375-77c923d96f38@gmail.com>
In-Reply-To: <CANT5p=q76WRoc8VHdLKqb8zZrGm3h4qpt2o4-zv=M-Mmd3rtBQ@mail.gmail.com>

On 2018-02-23 06:21, Shyam Prasad N wrote:
> Hi,
> 
> Can someone explain me why there is a difference in the number of
> blocks reported by df and du commands below?
> 
> =====================
> # df -h /dc
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/drbd1      746G  519G  225G  70% /dc
> 
> # btrfs filesystem df -h /dc/
> Data, single: total=518.01GiB, used=516.58GiB
> System, DUP: total=8.00MiB, used=80.00KiB
> Metadata, DUP: total=2.00GiB, used=1019.72MiB
> GlobalReserve, single: total=352.00MiB, used=0.00B
> 
> # du -sh /dc
> 467G    /dc
> =====================
> 
> df shows 519G is used. While recursive check using du shows only 467G.
> The filesystem doesn't contain any snapshots/extra subvolumes.
> Neither does it contain any mounted filesystem under /dc.
> I also considered that it could be a void left behind by one of the
> open FDs held by a process. So I rebooted the system. Still no
> changes.
> 
> The situation is even worse on a few other systems with similar configuration.
> 

At least part of this is a difference in how each tool computes space usage.

* `df` calls `statvfs` to get its data, which tries to count physical
allocation, accounting for replication profiles.  In other words, data in
chunks with the dup, raid1, and raid10 profiles gets counted twice, data
in raid5 and raid6 chunks gets counted with a bit of extra space for the
parity, etc.

* `btrfs fi df` looks directly at the filesystem itself and counts how 
much space is available to each chunk type in the `total` values and how 
much space is used in each chunk type in the `used` values, after 
replication.  If you add together the data used value and twice the 
system and metadata used values, you get the used value reported by 
regular `df` (or close to it, since `df` rounds at a lower
precision than `btrfs fi df` does).

* `du` scans the directory tree and looks at the file allocation values
returned from `stat` calls (or just looks at file sizes if you pass the
`--apparent-size` flag to it).  Like `btrfs fi df`, it reports values
after replication, but it has a couple of nasty caveats on BTRFS, namely
that it will report sizes for natively compressed files _before_
compression, and will count reflinked blocks once for each link.
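
As a rough check of the arithmetic described for `btrfs fi df`, the
numbers from the quoted output can be plugged in directly.  This is only
a sketch: the variable names are made up for illustration, and the
factor of two assumes (as the quoted output shows) that System and
Metadata use the DUP profile while Data is single.

```python
# Values copied from the quoted `btrfs filesystem df -h /dc/` output.
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

data_used = 516.58 * GIB        # Data, single: counted once
system_used = 80.00 * KIB       # System, DUP: counted twice
metadata_used = 1019.72 * MIB   # Metadata, DUP: counted twice

# Data used plus twice the system and metadata used values.
df_equivalent = data_used + 2 * (system_used + metadata_used)
df_equivalent_gib = df_equivalent / GIB

# The quoted `du -sh /dc` result, which is already rounded to whole GiB.
du_reported_gib = 467
unexplained_gib = df_equivalent_gib - du_reported_gib

print(f"df-equivalent used: {df_equivalent_gib:.2f} GiB")
print(f"gap relative to du: roughly {unexplained_gib:.0f} GiB")
```

The first figure comes out at about 518.57 GiB, which is consistent with
the 519G that `df -h` shows once rounded to whole GiB.  The second is
the part of the discrepancy that the difference between `df` and
`btrfs fi df` does not account for.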

Now, this doesn't explain the entirety of the discrepancy with `du`, but 
it should cover the whole difference between `df` and `btrfs fi df`.
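
To compare the two views on a given mount point, something like the
following could be used.  It is a minimal sketch and not how `df` or
`du` are actually implemented: the function names are hypothetical, it
only looks at regular files (real `du` also counts directories), and it
assumes `st_blocks` is in 512-byte units, as it is on Linux.

```python
import os
import stat


def df_used_bytes(path):
    """Approximate the Used column of `df`: total minus free blocks."""
    vfs = os.statvfs(path)
    return (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize


def du_like(root):
    """Return (allocated_bytes, apparent_bytes) for regular files under root.

    Hard links are counted once by tracking (device, inode) pairs.
    Reflinked copies have distinct inodes, so their shared blocks are
    counted once per file, which is the caveat described above.
    """
    seen = set()
    allocated = 0
    apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes, etc.
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            allocated += st.st_blocks * 512  # what plain `du` sums
            apparent += st.st_size           # what `--apparent-size` sums
    return allocated, apparent
```

Calling `df_used_bytes("/dc")` and `du_like("/dc")` and comparing the
results would show the same kind of gap as the `df` and `du` output
quoted above.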
