Re: Why is the actual disk usage of btrfs considered unknowable?

From: Goffredo Baroncelli <kreijack@inwind.it>
To: Shriramana Sharma <samjnaa@gmail.com>
Cc: Martin Steigerwald <Martin@lichtvoll.de>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Why is the actual disk usage of btrfs considered unknowable?
Date: Sun, 07 Dec 2014 20:19:22 +0100
Message-ID: <5484A83A.5090109@inwind.it>
In-Reply-To: <44320137.fRRuR6EFMP@merkaba>

On 12/07/2014 04:33 PM, Martin Steigerwald wrote:
> Hi Shriramana!
> 
> On Sunday, 7 December 2014, 20:45:59, Shriramana Sharma wrote:
>>> IIUC:
>>> 
>>> 1) btrfs fi df already shows the alloc-ed space and the space 
>>> used out of that.
>>> 
>>> 2) Despite snapshots, CoW and compression, the tree knows how
>>> many extents of data and metadata there are, and how many bytes
>>> on disk these occupy, regardless of the total (uncompressed,
>>> "unsnapshotted") size of all the directories and files on the
>>> disk.
>>> 
>>> So this means that btrfs fi df actually shows the real on-disk
>>> usage. In that case, why do we hear people saying it is not
>>> possible to know the actual on-disk usage, or when a
>>> btrfs-formatted disk (or partition) will run out of space?
> I have never read that the actual disk usage is unknown. What I have
> read is that the actual free space is unknown. And there are several
> reasons for that:
> 
> 1) On a compressed filesystem you cannot know the compression ratio
> of future data; you can only estimate it.
> 
> 2) On a compressed filesystem you can choose to leave parts of it
> uncompressed via file / directory attributes, I think. BTRFS can't
> know how much of the future data you are going to store compressed
> or uncompressed.
> 
> 3) From what I gathered, it is planned to allow different RAID /
> redundancy levels for different subvolumes. BTRFS can't know
> beforehand where applications will ask to save future data, i.e. in
> which subvolume.


3.1) Even on a single-disk filesystem, data and metadata have
different profiles: data chunks have no redundancy, so 64 KiB of data
consumes 64 KiB of disk space, whereas metadata chunks are usually
stored as DUP, so 64 KiB of metadata consumes 128 KiB on disk.
Moreover, you have to consider that small files are stored inside the
metadata chunks. This means that for big files the disk space consumed
equals the data size, but for small files it is doubled.
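A minimal sketch of the 3.1 arithmetic, assuming a single-disk
filesystem with SINGLE data chunks and DUP metadata chunks, and
ignoring the metadata items that every file needs anyway. The names
PROFILE_COPIES, on_disk_bytes and file_cost are hypothetical
illustrations, not btrfs code.

```python
# Hypothetical model: number of on-disk copies kept by each profile
# (only the two profiles discussed in 3.1 are modelled).
PROFILE_COPIES = {"single": 1, "dup": 2}

KiB = 1024


def on_disk_bytes(logical_bytes: int, profile: str) -> int:
    """Raw disk space consumed by `logical_bytes` stored under `profile`."""
    return logical_bytes * PROFILE_COPIES[profile]


def file_cost(file_size: int, stored_inline: bool) -> int:
    """Disk cost of one file's contents, assuming SINGLE data chunks and
    DUP metadata chunks (small files stored inline live in metadata).
    Per-file metadata items are deliberately ignored."""
    profile = "dup" if stored_inline else "single"
    return on_disk_bytes(file_size, profile)


print(on_disk_bytes(64 * KiB, "single") // KiB)  # 64
print(on_disk_bytes(64 * KiB, "dup") // KiB)     # 128
```

The same number of bytes therefore costs one or two times its size
depending on where it ends up, which is why the ratio between data size
and disk usage depends on the mix of files.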

Going back to your question: to be clearer, I use the following terms:
1- disk space used: the space used on the disk
2- size of data: the size of the data stored on the disk
3- disk free space: the unused space on the disk
4- free space: the amount of additional data the filesystem is still able to store

Values 1, 2 and 3 are known; what is unknown is value 4. In the
past I posted some patches that try to estimate value 4 as:

                                 size_of_data 
free_space = disk_free_space * -----------------
                                disk_space_used

This estimate assumes that the ratio size_of_data/disk_space_used
stays constant. For the reasons above, that assumption may be wrong.
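The formula above can be sketched as follows. The function names are
hypothetical, the 1:1 fallback for an empty filesystem is an added
assumption, and compression is ignored. The numbers are made up: 60 GiB
of data occupying 80 GiB on disk (say 40 GiB of large files plus 20 GiB
of small files kept in DUP metadata).

```python
GiB = 1024 ** 3


def estimate_free_space(disk_free_space: int, size_of_data: int,
                        disk_space_used: int) -> int:
    """free_space = disk_free_space * size_of_data / disk_space_used,
    i.e. assume future writes keep the ratio observed so far."""
    if disk_space_used == 0:
        # Empty filesystem: no ratio observed yet, assume 1:1 (an assumption).
        return disk_free_space
    return disk_free_space * size_of_data // disk_space_used


def actual_capacity(disk_free_space: int, copies: int) -> int:
    """Data that really fits if ALL future data is written with `copies`
    on-disk copies (1 = SINGLE data chunks, 2 = DUP metadata/inline)."""
    return disk_free_space // copies


free = 100 * GiB
est = estimate_free_space(free, size_of_data=60 * GiB, disk_space_used=80 * GiB)
print(est // GiB)                       # 75
print(actual_capacity(free, 1) // GiB)  # 100: only large files from now on
print(actual_capacity(free, 2) // GiB)  # 50: only small inline files
```

The estimate (75 GiB) lies between the 100 GiB that would fit if every
future write were a large file and the 50 GiB that would fit if every
future write were a small inline file; which figure turns out right
depends on what the user writes next.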

In conclusion, the disk usage is well known; what is unknown is
the space available to the user (who is not interested in
all the details inside a filesystem). The best that can be done
is an estimate like the one above.
BR
Goffredo

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
