Re: Massive loss of disk space

From: Duncan <1i5t5.duncan@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: Massive loss of disk space
Date: Thu, 3 Aug 2017 03:48:29 +0000 (UTC)
Message-ID: <pan$5725a$55de2372$c6dc24b7$520ac328@cox.net>
In-Reply-To: 798a9077-bcbd-076c-a458-3403010ce8ac@libero.it

Goffredo Baroncelli posted on Wed, 02 Aug 2017 19:52:30 +0200 as
excerpted:

> it seems that BTRFS always allocates the maximum space required, without
> considering the space already allocated. Is it too conservative? I think
> not: consider the following scenario:
> 
> a) create a 2GB file
> b) fallocate -o 1GB -l 2GB
> c) write from 1GB to 3GB
> 
> after b), the expectation is that c) always succeeds [1]: i.e. there is
> enough space on the filesystem. Due to the COW nature of BTRFS, you
> cannot rely on the already allocated space because there could be a
> small time window where both the old and the new data exist on the
> disk.

Not only for a small time window, but perhaps (effectively) permanently, 
due to either of two factors:

1) If the existing extents are reflinked by snapshots or other files, they 
obviously won't be released at all when the overwrite is completed.  
fallocate must account for this possibility, and behaving differently in 
the context of other reflinks would be confusing, so the best policy is 
to consistently behave as if the existing data will not be freed.

2) As the devs have commented a number of times, an extent isn't freed if 
there's still a reflink to part of it.  Say the original extent was a full 
1 GiB data chunk (the chunk being the max size of a native btrfs extent, 
which is one of the reasons a balance and defrag is recommended after 
conversion from ext4 and deletion of the ext4-saved subvolume, to break up 
the longer ext4 extents so they won't cause btrfs problems later).  If all 
but a single 4 KiB block of it has been rewritten, the full 1 GiB extent 
will remain referenced and continue to take its original full 1 GiB of 
space, *plus* the space of all the new-version extents of the overwritten 
data, of course.
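
To put rough numbers on point 2, here is a small Python sketch of the 
accounting.  It is only a model, not btrfs code: the function name is made 
up for illustration, and the rule that an old extent stays fully allocated 
while any block of it is still referenced is simply taken from the 
description above.

```python
KIB = 1024
GIB = 1024 ** 3
BLOCK = 4 * KIB


def space_used_after_overwrite(extent_size, overwritten_bytes):
    """Bytes on disk for one extent after part of it is COW-overwritten.

    Model assumption (from the list discussion, not from reading the code):
    the old extent stays fully allocated as long as at least one block of
    it is still referenced.
    """
    if not 0 <= overwritten_bytes <= extent_size:
        raise ValueError("overwritten_bytes must be within the extent")
    new_extents = overwritten_bytes
    # The old extent is only released once nothing in it is referenced.
    old_extent = 0 if overwritten_bytes == extent_size else extent_size
    return old_extent + new_extents


# All but one 4 KiB block of a 1 GiB extent has been rewritten.
used = space_used_after_overwrite(GIB, GIB - BLOCK)
print(f"{used / GIB:.6f} GiB used for 1 GiB of file data")
```

In this model the usage only drops back to 1 GiB once the very last block 
of the old extent has been overwritten too.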

So in our fallocate and overwrite scenario, we again must reserve space 
for two copies of the data: the original, which may well not be freed even 
without other reflinks if a single 4 KiB block of an extent remains 
unoverwritten, and the new version of the data.
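
The difference between the two policies for the quoted a) to c) scenario 
can be sketched as below.  Again this is just arithmetic under stated 
assumptions: `reservation` and its `count_existing` flag are hypothetical 
names, and the "optimistic" branch assumes the existing file has no holes.

```python
GIB = 1024 ** 3


def reservation(file_size, offset, length, count_existing):
    """Bytes to reserve for fallocate(offset, length) on a file of file_size.

    count_existing=False: reserve the whole range (the conservative policy
    discussed in this thread).
    count_existing=True: reserve only the part past the current end of
    file (hypothetical optimistic policy; assumes a file without holes).
    """
    if not count_existing:
        return length
    end = offset + length
    already_allocated = max(0, min(end, file_size) - offset)
    return length - already_allocated


# a) 2 GiB file, b) fallocate -o 1GB -l 2GB
conservative = reservation(2 * GIB, 1 * GIB, 2 * GIB, count_existing=False)
optimistic = reservation(2 * GIB, 1 * GIB, 2 * GIB, count_existing=True)
print(conservative // GIB, optimistic // GIB)  # prints: 2 1
```

The optimistic figure is only safe if the overlapping 1 GiB is certain to 
be freed by the overwrite, which is exactly what the two factors above 
call into question.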

At least that /was/ the behavior explained on-list prior to the 
hole-punching changes.  I'm not a dev and haven't seen a dev comment on 
whether that remains the behavior after hole-punching, which might at 
least naively be expected to automatically handle and free overwritten 
data.  I'd be interested in seeing someone who can read the code confirm 
one way or the other whether hole-punching changed that previous behavior.

> My opinion is that in general this behavior is correct due to the COW
> nature of BTRFS.
> The only exception that I can find is the "nocow" file. For these
> cases, taking into account the already allocated space would be better.

I'd say it's dangerously optimistic even then, considering that "nocow" 
is actually "cow1" in the presence of snapshots.
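
As a toy illustration of what "cow1" means here: the assumption is that a 
block of a nocow file which is still shared with a snapshot gets COWed on 
the first write after the snapshot, and is then rewritten in place on 
later writes.  The function and its block-level granularity are invented 
for the example, not taken from btrfs.

```python
def extra_blocks_for_writes(block_shared, writes):
    """Count new blocks allocated for a sequence of writes to a nocow file.

    block_shared: dict mapping block number -> True if a snapshot still
                  references that block.
    writes: iterable of block numbers written, in order.

    Model assumption: the first write to a shared block allocates a new
    block (one COW); afterwards that block is private and is overwritten
    in place.
    """
    shared = dict(block_shared)  # don't modify the caller's dict
    allocated = 0
    for block in writes:
        if shared.get(block, False):
            allocated += 1
            shared[block] = False
    return allocated


# Without a snapshot, rewriting in place needs no extra space.
print(extra_blocks_for_writes({0: False, 1: False}, [0, 0, 1, 0]))
# With a snapshot, each block touched costs one extra block, once.
print(extra_blocks_for_writes({0: True, 1: True}, [0, 0, 1, 0]))
```

So even for a nocow file, a reservation that counts the existing 
allocation as reusable can come up short as soon as a snapshot exists.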

Meanwhile, it's worth keeping in mind that it's exactly these sorts of 
corner-cases that are why btrfs is taking so long to stabilize.  
Supposedly "simple" expectations aren't always so simple, and if a 
filesystem gets it wrong, it's somebody's data hanging in the balance!  
(Tho if they've any wisdom at all, they'll ensure they're aware of the 
stability status of a filesystem before they put data on it, and will 
adjust their backup policies accordingly if they're using a still not 
fully stabilized filesystem such as btrfs.  Then the data itself won't 
actually be in any danger unless it was literally of throw-away value, 
only whatever specific instance of it was involved in that corner-case.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
