linux-btrfs.vger.kernel.org archive mirror
From: Goffredo Baroncelli <kreijack@libero.it>
To: "Austin S. Hemmelgarn" <ahferroin7@gmail.com>,
	pwm <pwm@iapetus.neab.net>, Hugo Mills <hugo@carfax.org.uk>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: Massive loss of disk space
Date: Wed, 2 Aug 2017 19:52:30 +0200
Message-ID: <798a9077-bcbd-076c-a458-3403010ce8ac@libero.it>
In-Reply-To: <7f2b5c3a-2f5c-e857-d2dc-3ea16b58ecaf@gmail.com>

Hi,

On 2017-08-01 17:00, Austin S. Hemmelgarn wrote:
> OK, I just did a dead simple test by hand, and it looks like I was right.  The method I used to check this is as follows:
> 1. Create and mount a reasonably small filesystem (I used an 8G temporary LV for this, a file would work too though).
> 2. Using dd or a similar tool, create a test file that takes up half of the size of the filesystem.  It is important that this _not_ be fallocated, but just written out.
> 3. Use `fallocate -l` to try and extend the size of the file beyond half the size of the filesystem.
> 
> For BTRFS, this will result in -ENOSPC, while for ext4 and XFS, it will succeed with no error.  Based on this and some low-level inspection, it looks like BTRFS treats the full range of the fallocate call as unallocated, and thus is trying to allocate space for regions of that range that are already allocated.

I can confirm this behavior; some steps to reproduce it are below [2]. However, I don't think it is a bug: this is the correct behavior for a COW filesystem (see below).


Looking at the function btrfs_fallocate() (file fs/btrfs/file.c):


static long btrfs_fallocate(struct file *file, int mode,
                            loff_t offset, loff_t len)
{
[...]
        alloc_start = round_down(offset, blocksize);        
        alloc_end = round_up(offset + len, blocksize);
[...]
        /*
         * Only trigger disk allocation, don't trigger qgroup reserve
         *
         * For qgroup space, it will be checked later.
         */
        ret = btrfs_alloc_data_chunk_ondemand(BTRFS_I(inode),
                        alloc_end - alloc_start);
[...]
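
To make the size computation in the excerpt concrete, here is a small userspace sketch in shell. It is only an illustration: the helper names are made up, the 4096-byte block size is an assumption, and nothing here is kernel code. It shows that the requested size depends only on offset and len, not on what the file already contains.

```shell
#!/bin/sh
# Userspace illustration only; these helper names are hypothetical and are
# not kernel APIs.
BLOCKSIZE=4096   # assumed block size

# Round $1 down / up to a multiple of $2.
round_down() { echo $(( $1 / $2 * $2 )); }
round_up()   { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# Bytes requested for fallocate(offset=$1, len=$2), mirroring
# alloc_end - alloc_start in the excerpt.
fallocate_request() {
    alloc_start=$(round_down "$1" "$BLOCKSIZE")
    alloc_end=$(round_up $(( $1 + $2 )) "$BLOCKSIZE")
    echo $(( alloc_end - alloc_start ))
}

# Same call as in the reproducer [2]: offset 0, len 4000 MiB.
fallocate_request 0 $((1024*1024*100*40))
```

Note that the current size of the file never enters the calculation.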


It seems that BTRFS always allocates the maximum space required, without considering the space already allocated. Is that too conservative? I don't think so. Consider the following scenario:

a) create a 2GB file
b) fallocate -o 1GB -l 2GB
c) write from 1GB to 3GB

After b), the expectation is that c) always succeeds [1], i.e. that there is enough space on the filesystem. Due to the COW nature of BTRFS, you cannot rely on the already allocated space, because there could be a small time window in which both the old and the new data exist on the disk.
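
The arithmetic behind this scenario can be sketched as follows. This is a deliberately simplified model, not a description of the real allocator: sizes are in MiB, the file is assumed to be fully written up to its current size, metadata and snapshots are ignored, and the function names are purely illustrative.

```shell
#!/bin/sh
# Simplified model, sizes in MiB. Assumes the file is fully written up to
# file_size (no holes) and ignores metadata overhead. Function names are
# illustrative only.

# Part of [offset, offset+len) lying beyond the current end of file.
# Arguments: file_size offset len
beyond_eof() {
    file_size=$1; offset=$2; len=$3
    end=$(( offset + len ))
    start=$offset
    if [ "$start" -lt "$file_size" ]; then start=$file_size; fi
    if [ "$end" -gt "$start" ]; then echo $(( end - start )); else echo 0; fi
}

# In-place (non-COW) overwrite: only the part beyond EOF needs new space.
need_in_place() { beyond_eof "$1" "$2" "$3"; }

# COW overwrite, worst case: every block in the range is written to a new
# location while the old copy still exists, so the whole len is needed.
need_cow_worst_case() { echo "$3"; }

# Scenario from the text: 2GB file, fallocate -o 1GB -l 2GB.
echo "in place:       $(need_in_place 2048 1024 2048) MiB"
echo "COW worst case: $(need_cow_worst_case 2048 1024 2048) MiB"
```

Under these assumptions the in-place case needs 1024 MiB of new space, while the COW worst case needs the full 2048 MiB.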

My opinion is that in general this behavior is correct due to the COW nature of BTRFS.
The only exception that I can find is "nocow" files. For these, taking into account the already allocated space would be better.

Comments are welcome.

BR
G.Baroncelli

[1] from man 2 fallocate
[...]
       After  a  successful call, subsequent writes into the range specified by offset and len are
       guaranteed not to fail because of lack of disk space.
[...]


[2]

-- create a 5G btrfs filesystem

# mkdir t1
# truncate --size 5G disk
# losetup /dev/loop0 disk
# mkfs.btrfs /dev/loop0
# mount /dev/loop0 t1

-- test
-- create a 1500 MB file, then expand it to 4000 MB
-- expected result: the file is 4000 MB in size
-- result: the expansion fails

# fallocate -l $((1024*1024*100*15))  file.bin
# fallocate -l $((1024*1024*100*40))  file.bin
fallocate: fallocate failed: No space left on device
# ls -lh file.bin 
-rw-r--r-- 1 root root 1.5G Aug  2 19:09 file.bin
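
A rough back-of-the-envelope check of why the second fallocate fails, as a shell sketch. This is an approximation under stated assumptions, not the real accounting: it ignores metadata, chunk granularity and any reserved space, so the actual limits are somewhat lower than the raw 5G. The variable and function names are made up for the illustration.

```shell
#!/bin/sh
# Sizes in MiB. Simplified model: ignores metadata, chunk granularity and
# reserved space, so real limits are somewhat lower.
fs_mib=$(( 5 * 1024 ))     # 5G loop device
existing_mib=1500          # space already held by file.bin
target_mib=4000            # size requested by the second fallocate

# Data space needed if the whole range [0, target) is reserved again.
need_full_range=$(( existing_mib + target_mib ))
# Data space needed if only the part beyond the current EOF is reserved.
need_extension_only=$(( existing_mib + (target_mib - existing_mib) ))

# Print "yes" if $1 MiB fits into $2 MiB, "no" otherwise.
fits() { if [ "$1" -le "$2" ]; then echo yes; else echo no; fi; }

echo "full range:     ${need_full_range} MiB, fits: $(fits "$need_full_range" "$fs_mib")"
echo "extension only: ${need_extension_only} MiB, fits: $(fits "$need_extension_only" "$fs_mib")"
```

With a full-range reservation the total (5500 MiB) exceeds the 5120 MiB device, which is consistent with the ENOSPC above; reserving only the extension (4000 MiB in total) would fit.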


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

