Re: Why is the actual disk usage of btrfs considered unknowable?

From: Robert White <rwhite@pobox.com>
To: Martin Steigerwald <Martin@lichtvoll.de>,
	Shriramana Sharma <samjnaa@gmail.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: Why is the actual disk usage of btrfs considered unknowable?
Date: Sun, 07 Dec 2014 21:32:01 -0800
Message-ID: <548537D1.7070602@pobox.com>
In-Reply-To: <1610909.CxuY1Bb9iL@merkaba>

On 12/07/2014 07:40 AM, Martin Steigerwald wrote:
> Well what would be possible I bet would be a kind of system call like this:
>
> I need to write 5 GB of data in 100 of files to /opt/mynewshinysoftware, can I
> do it *and* give me a guarantee I can.
>
> So like a more flexible fallocate approach as fallocate just allocates one file
> and you would need to run it for all files you intend to create. But challenge
> would be to estimate metadata allocation beforehand accurately.
>
> Or have tar --fallocate -xf which for all files in the archive will first call
> fallocate and only if that succeeded, actually write them. But due to the
> nature of tar archives with their content listing across the whole archive,
> this means it may have to read the tar archive twice, so ZIP archives might be
> better suited for that.
>

What you suggest is Still Not Practical™ (the tar thing might have some 
ability if you were willing to analyze every file to the byte level).
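
For concreteness, the quoted "tar --fallocate" idea could be sketched 
roughly as below. This is only an illustration under assumptions: the 
function name extract_with_preallocation is made up, no such tar option 
is claimed to exist, and it reserves the uncompressed member size, so it 
accounts for neither compression nor metadata growth.

```python
import os
import tarfile


def extract_with_preallocation(archive_path, dest_dir):
    """Hypothetical two-pass extraction: reserve space first, write second.

    Trusts the archive's member names; a real tool would validate them.
    """
    reserved = []  # (member name, target path) for each preallocated file
    try:
        # Pass 1: walk the whole archive and reserve space for each file.
        with tarfile.open(archive_path) as tar:
            for member in tar:
                if not member.isfile():
                    continue
                target = os.path.join(dest_dir, member.name)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                fd = os.open(target, os.O_WRONLY | os.O_CREAT, 0o644)
                reserved.append((member.name, target))
                try:
                    if member.size > 0:  # zero length is rejected
                        os.posix_fallocate(fd, 0, member.size)
                finally:
                    os.close(fd)
    except OSError:
        # A reservation failed (for example ENOSPC): undo and give up.
        for _, target in reserved:
            os.unlink(target)
        raise

    # Pass 2: read the archive a second time and fill in the data.
    with tarfile.open(archive_path) as tar:
        for name, target in reserved:
            src = tar.extractfile(name)
            with open(target, "r+b") as out:
                while chunk := src.read(1 << 20):
                    out.write(chunk)
    return [target for _, target in reserved]
```

Even when pass 1 succeeds, that only says the byte counts were reserved 
at that moment, which is the limitation discussed next.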

Compression _can_ make a file _bigger_ than its base size. BTRFS decides 
whether or not to compress a file based on the results it gets when 
trying to compress the first N bytes. (I do not know the value of N.) But 
it is _easy_ to have a file where the first N bytes compress well but 
the bytes after N take up more space than their byte count. So to 
fallocate() the right size in blocks you'd have to compress the input 
and determine what BTRFS _would_ _do_ and then allocate that much space 
instead of the file size.
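
A small sketch of that kind of file, using plain zlib from Python. It 
does not model what BTRFS itself does: PREFIX_LEN is an arbitrary 
stand-in for N, and the helper names are made up for illustration. It 
only shows that a prefix can compress very well while the rest of the 
same file comes out larger than its byte count.

```python
import random
import zlib

PREFIX_LEN = 128 * 1024  # arbitrary stand-in for "N"; the real value is unknown here


def sample_file(total_len=1024 * 1024, seed=0):
    """Build data whose prefix is highly compressible and whose tail is not."""
    rng = random.Random(seed)
    prefix = b"A" * PREFIX_LEN
    tail_len = total_len - PREFIX_LEN
    # Pseudo-random bytes: effectively incompressible for zlib.
    tail = rng.getrandbits(8 * tail_len).to_bytes(tail_len, "little")
    return prefix + tail


def compressed_size(data):
    """Size in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data))


data = sample_file()
prefix, tail = data[:PREFIX_LEN], data[PREFIX_LEN:]
print("prefix:", len(prefix), "->", compressed_size(prefix))
print("tail:  ", len(tail), "->", compressed_size(tail))
```

A decision made only from the prefix would predict a tiny file; the tail 
says otherwise.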

And even then, if you didn't create all the names and directories first, 
you might find that the B-tree had to expand (allocate another tree node) 
one or more times to accommodate the actual files. Lather, rinse, repeat 
for any checksum trees and anything hitting a flush barrier because of 
commit= or sync() events. Other writers can perturb your results too, 
because this only matters when the filesystem is nearly full, and nearly 
full filesystems may not be quiescent at all.

So while the core problem isn't insoluble, in real life it is _not_ 
_worth_ _solving_.

On a nearly empty filesystem, it's going to fit.

On a reasonably empty filesystem, it's going to fit.

On a nearly full filesystem, it may or may not fit.

On a filesystem that is so close to full that you have reason to doubt 
it will fit, you are going to have a very bad time even if it fits.
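
Those cases amount to a coarse check rather than a guarantee, something 
like the sketch below. The names fit_outlook and fit_outlook_for_path 
and the 10% headroom threshold are assumptions picked purely for 
illustration, and the numbers from statvfs() are themselves only an 
estimate on BTRFS.

```python
import os


def fit_outlook(free_bytes, total_bytes, needed_bytes, headroom=0.10):
    """Coarse outlook only; the headroom fraction is an arbitrary assumption."""
    if needed_bytes > free_bytes:
        return "will not fit"
    remaining = free_bytes - needed_bytes
    if remaining >= headroom * total_bytes:
        return "going to fit"
    return "may or may not fit"


def fit_outlook_for_path(path, needed_bytes, headroom=0.10):
    """Apply fit_outlook to the filesystem holding path, via statvfs()."""
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize   # space available to unprivileged users
    total = st.f_blocks * st.f_frsize  # nominal filesystem size
    return fit_outlook(free, total, needed_bytes, headroom)


print(fit_outlook_for_path(".", 5 * 1024**3))  # the 5 GB from the question
```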

If you did manage to invent and implement an fallocate algorithm that 
could make this promise and make it stick, then some other running 
program is what's going to crash when you use up that last byte anyway.

Almost full filesystems are their own reward.
