* Btrfs occupies more space than du reports...
From: Shyam Prasad N @ 2018-02-23 11:21 UTC
To: Btrfs BTRFS
Hi,
Can someone explain to me why there is a difference in the number of
blocks reported by the df and du commands below?
=====================
# df -h /dc
Filesystem      Size  Used Avail Use% Mounted on
/dev/drbd1      746G  519G  225G  70% /dc
# btrfs filesystem df -h /dc/
Data, single: total=518.01GiB, used=516.58GiB
System, DUP: total=8.00MiB, used=80.00KiB
Metadata, DUP: total=2.00GiB, used=1019.72MiB
GlobalReserve, single: total=352.00MiB, used=0.00B
# du -sh /dc
467G /dc
=====================
df shows 519G used, while a recursive check using du shows only 467G.
The filesystem doesn't contain any snapshots or extra subvolumes.
Neither does it contain any mounted filesystems under /dc.
I also considered that the space might be held by a deleted file still
open in some process, so I rebooted the system. Still no change.
The situation is even worse on a few other systems with similar
configurations.
--
-Shyam
* Re: Btrfs occupies more space than du reports...
From: Austin S. Hemmelgarn @ 2018-02-23 13:23 UTC
To: Shyam Prasad N, Btrfs BTRFS
On 2018-02-23 06:21, Shyam Prasad N wrote:
> Hi,
>
> Can someone explain to me why there is a difference in the number of
> blocks reported by the df and du commands below?
>
> =====================
> # df -h /dc
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/drbd1      746G  519G  225G  70% /dc
>
> # btrfs filesystem df -h /dc/
> Data, single: total=518.01GiB, used=516.58GiB
> System, DUP: total=8.00MiB, used=80.00KiB
> Metadata, DUP: total=2.00GiB, used=1019.72MiB
> GlobalReserve, single: total=352.00MiB, used=0.00B
>
> # du -sh /dc
> 467G /dc
> =====================
>
> df shows 519G used, while a recursive check using du shows only 467G.
> The filesystem doesn't contain any snapshots or extra subvolumes.
> Neither does it contain any mounted filesystems under /dc.
> I also considered that the space might be held by a deleted file still
> open in some process, so I rebooted the system. Still no change.
>
> The situation is even worse on a few other systems with similar
> configurations.
>
At least part of this is a difference in how each tool computes space usage.
* `df` calls `statvfs` to get its data, which tries to count physical
allocation, accounting for replication profiles. In other words, data in
chunks with the dup, raid1, and raid10 profiles gets counted twice, data
in raid5 and raid6 chunks gets counted with a bit of extra space for the
parity, and so on.
* `btrfs fi df` looks directly at the filesystem itself and counts how
much space is available to each chunk type in the `total` values and how
much space is used in each chunk type in the `used` values, after
replication. If you add together the data used value and twice the
system and metadata used values, you get the used value reported by
regular `df` (well, close to it; `df` rounds at a lower precision than
`btrfs fi df` does).
* `du` scans the directory tree and looks at the file allocation values
returned from `stat` calls (or just at file sizes if you pass it the
`--apparent-size` flag). Like `btrfs fi df`, it reports values after
replication, but it has a couple of nasty caveats on BTRFS: it reports
sizes for natively compressed files _before_ compression, and it counts
reflinked blocks once for each link.
Now, this doesn't explain the entirety of the discrepancy with `du`, but
it should cover the whole difference between `df` and `btrfs fi df`. A
few quick illustrations of the points above follow.
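
To make the first point concrete, GNU stat can print the raw statvfs
fields that `df` derives its output from (a quick sketch, using your
mount point):

  # fundamental block size, total, free, and available blocks
  stat -f --format='bsize=%S blocks=%b free=%f avail=%a' /dc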
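
To put numbers on the second point, plugging in your `btrfs fi df`
values (the 80 KiB of system data is negligible):

    516.58 GiB  (data, single, used)
  +   1.99 GiB  (metadata, DUP: 2 x 1019.72 MiB)
  = 518.57 GiB

which regular `df` rounds to the 519G you saw.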
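
And the reflink caveat is easy to demonstrate (a minimal sketch; the
file names are hypothetical):

  # write a 1 GiB file, then make a reflinked copy sharing its blocks
  dd if=/dev/zero of=/dc/orig bs=1M count=1024
  cp --reflink=always /dc/orig /dc/copy
  # du counts the shared blocks once per link, so the total is ~2.0G
  # even though only ~1 GiB is physically allocated
  du -ch /dc/orig /dc/copy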
* Re: Btrfs occupies more space than du reports...
From: Shyam Prasad N @ 2018-02-28 11:26 UTC
To: Austin S. Hemmelgarn; +Cc: Btrfs BTRFS
Hi,
Thanks for the reply.
> * `df` calls `statvfs` to get its data, which tries to count physical
> allocation, accounting for replication profiles. In other words, data
> in chunks with the dup, raid1, and raid10 profiles gets counted twice,
> data in raid5 and raid6 chunks gets counted with a bit of extra space
> for the parity, and so on.
Our data uses the single profile (no raid), metadata uses dup, we have
not used compression, no subvolumes have been created yet (other than
the default subvolume), and there are no other mount points within the
tree. Taking all of that into account, the numbers still don't make
sense to me. "btrfs fi usage" says the data "used" is much more than
it should be; the du figure is much closer to what I believe the real
disk usage is.
I tried an experiment: I filled up the available space (as per what
btrfs believes is available) with huge files. As soon as usage reached
100%, further writes started returning ENOSPC. That would normally be
the expected behaviour, but I'm afraid the same thing will happen when
these filesystems eventually fill up, even though on many of these
servers the actual data in use is much smaller (60-70 GB in some cases).
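(Essentially something like the following, where the file name is just
a placeholder:

  dd if=/dev/zero of=/dc/fill.bin bs=1M

which kept writing until it eventually failed with ENOSPC.)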
To me, it looks like some internal btrfs refcounting has gone wrong.
Maybe it thinks that some data blocks (which are actually free) are
still in use? Or is it some other refcounting issue?
We've tried both "btrfs check" and "btrfs scrub" so far; neither has
reported any errors.
Regards,
Shyam
--
-Shyam
* Re: Btrfs occupies more space than du reports...
From: Andrei Borzenkov @ 2018-02-28 15:10 UTC
To: Shyam Prasad N; +Cc: Austin S. Hemmelgarn, Btrfs BTRFS
On Wed, Feb 28, 2018 at 2:26 PM, Shyam Prasad N <nspmangalore@gmail.com> wrote:
> Hi,
>
> Thanks for the reply.
>
>> * `df` calls `statvfs` to get its data, which tries to count physical
>> allocation, accounting for replication profiles. In other words, data
>> in chunks with the dup, raid1, and raid10 profiles gets counted twice,
>> data in raid5 and raid6 chunks gets counted with a bit of extra space
>> for the parity, and so on.
>
> Our data uses the single profile (no raid), metadata uses dup, we have
> not used compression, no subvolumes have been created yet (other than
> the default subvolume), and there are no other mount points within the
> tree. Taking all of that into account, the numbers still don't make
> sense to me. "btrfs fi usage" says the data "used" is much more than
> it should be; the du figure is much closer to what I believe the real
> disk usage is.
> I tried an experiment: I filled up the available space (as per what
> btrfs believes is available) with huge files. As soon as usage reached
> 100%, further writes started returning ENOSPC. That would normally be
> the expected behaviour, but I'm afraid the same thing will happen when
> these filesystems eventually fill up, even though on many of these
> servers the actual data in use is much smaller (60-70 GB in some cases).
> To me, it looks like some internal btrfs refcounting has gone wrong.
> Maybe it thinks that some data blocks (which are actually free) are
> still in use?
One reason could be overwrites inside extents: btrfs does not (always)
physically split an extent when it is partially overwritten, so the
overwritten part of the old extent stays allocated even though it no
longer holds live data.
localhost:~ # df -k /mnt
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/sdb1        8387584  16704   7531456   1% /mnt
localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.580041 s, 181 MB/s
localhost:~ # sync
localhost:~ # df -k /mnt
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/sdb1        8387584 119552   7428864   2% /mnt
localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=25
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00781892 s, 134 MB/s
localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=50
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00780386 s, 134 MB/s
localhost:~ # dd if=/dev/urandom of=/mnt/file bs=1M count=1 conv=notrunc seek=75
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.00761908 s, 138 MB/s
localhost:~ # sync
localhost:~ # df -k /mnt
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/sdb1        8387584 122624   7425792   2% /mnt
So 3M is lost (122624 - 119552 = 3072 1K-blocks, matching the three
1 MiB overwrites). And if you overwrite 50M in the middle, you will get
50M of "lost" space.
I do not know how btrfs decides when to split an extent. Defragmenting
the file should free those partially overwritten extents again:

btrfs fi defrag -r /mnt
> Or is it some other refcounting issue?
> We've tried both "btrfs check" and "btrfs scrub" so far; neither has
> reported any errors.