linux-btrfs.vger.kernel.org archive mirror
From: Robert White <rwhite@pobox.com>
To: Josef Bacik <jbacik@fb.com>,
	Daniele Testa <daniele.testa@gmail.com>,
	linux-btrfs@vger.kernel.org
Subject: Re: btrfs is using 25% more disk than it should
Date: Sat, 20 Dec 2014 17:40:57 -0800
Message-ID: <54962529.4070608@pobox.com>
In-Reply-To: <54955FE6.3030601@fb.com>

On 12/20/2014 03:39 AM, Josef Bacik wrote:
> On 12/20/2014 06:23 AM, Robert White wrote:
>> On 12/19/2014 01:17 PM, Josef Bacik wrote:
>>> tl;dr: COW means you can in the worst case end up using 2 * filesize -
>>> blocksize of data on disk and the file will appear to be filesize.
>>> Thanks,
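To put that bound in bytes: the worst case described is a file of N blocks written once, then rewritten except for a single block. The rewrite lands in a new (N-1)-block extent, while the one untouched block keeps the whole original N-block extent allocated. A minimal sketch of the arithmetic; the function name, the 1 GiB file and the 4 KiB block size are only illustrative values, not anything from the thread.

```python
def cow_worst_case(file_size, block_size):
    """Return (apparent_bytes, allocated_bytes) for the two-write worst case.

    Assumed scenario (illustrative): the file is written as one extent of
    file_size bytes, then every block except one is rewritten as a second
    extent. The one surviving block pins the whole first extent.
    """
    if file_size % block_size:
        raise ValueError("file_size must be a whole number of blocks")
    original_extent = file_size              # still pinned by one block
    rewrite_extent = file_size - block_size  # new COW copy of the rest
    return file_size, original_extent + rewrite_extent


if __name__ == "__main__":
    apparent, allocated = cow_worst_case(1 << 30, 4096)  # 1 GiB, 4 KiB blocks
    print(apparent, allocated)  # 1073741824 2147479552
```

The allocated figure is exactly 2 * filesize - blocksize while the file still reports 1 GiB.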
>>
>> Isn't the worst case more like N^log(N) (where N is the file size in
>> blocks) in the pernicious case?
>>
>> Staggered block overwrites can "peer down" through gaps to create more
>> than two layers of retention. The only real requirement is that each
>> layer get smaller than the one before it so as to leave some of each of
>> its predecessor visible.
>>
>> So if I make a file of N blocks, then overwrite it with N-1 blocks,
>> then overwrite it again with N-2 blocks (etc.), I can easily create a
>> deep slope of obscured data.
>>
>> [-----------------]
>> [----------------]
>> [---------------]
>> [--------------]
>> [-------------]
>> [------------]
>> [-----------]
>> [----------]
>> [---------]
>> (etc...)
>>
>>
>> Or would I have to bracket the front and back?
>>
>> ----------
>>   --------
>>    ------
>>
>> Or could I bracket the sides?
>>
>> ---------
>> ---- ----
>> ---   ---
>> --     --
>> -       -
>>
>> There's got to be pathological patterns like this that can end up with a
>> heck of a lot of "hidden" data.
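The sloped and front/back-bracketed patterns can be checked with a toy model. This is a simplified sketch under my own assumptions, not btrfs code: each write becomes one new extent, an extent stays fully allocated while any of its blocks is still referenced by the file, and there is no compression, snapshotting, defrag or nodatacow. "Bracketed" is my reading of the second diagram: each layer trims one block from each end of the previous one. All function names are hypothetical.

```python
def allocated_blocks(file_blocks, writes):
    """Apply (offset, length) writes in order; return blocks still allocated.

    Toy assumption: one write == one extent, and an extent is kept whole
    as long as at least one of its blocks is still referenced by the file.
    """
    owner = [None] * file_blocks  # logical block -> id of extent holding it
    sizes = []                    # extent id -> extent length in blocks
    for offset, length in writes:
        ext_id = len(sizes)
        sizes.append(length)
        for b in range(offset, offset + length):
            owner[b] = ext_id
    live = {e for e in owner if e is not None}
    return sum(sizes[e] for e in live)


def sloped(n):
    """Write n blocks, then n-1, n-2, ... 1, always starting at offset 0."""
    return [(0, k) for k in range(n, 0, -1)]


def bracketed(n):
    """Write n blocks, then keep trimming one block from each end."""
    writes, lo, hi = [], 0, n
    while lo < hi:
        writes.append((lo, hi - lo))
        lo, hi = lo + 1, hi - 1
    return writes


if __name__ == "__main__":
    n = 10
    print("sloped:   ", allocated_blocks(n, sloped(n)))
    print("bracketed:", allocated_blocks(n, bracketed(n)))
```

For a 10-block file this prints 55 for the slope (10 + 9 + ... + 1, every layer still pinned by one visible block) and 30 for the bracketed pattern, which runs out of layers twice as fast, so the plain slope retains more in total.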
>
> Just the sloped case would do it; the pathological case would result in
> way more used than you expect.  So I guess the worst case would be
> something like
>
> (num_blocks + (num_blocks - 1)!) * blocksize

I think that for a single file it's not factorial but a consecutive sum
(the triangular-number formula usually credited to Gauss).

so max=((n * (n+1))/2)*blocksize

A lot smaller than factorial, but still (n^2+n)/2 blocks, i.e. quadratic
in the file size, which is nothing to discard lightly.
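A quick comparison of the two estimates for small n (n being the file size in blocks): the triangular sum n(n+1)/2 versus the (num_blocks + (num_blocks - 1)!) form. The helper names are hypothetical, and the last line assumes 4 KiB blocks purely as an example.

```python
from math import factorial


def triangular_blocks(n):
    """Sloped worst case: n + (n-1) + ... + 1 = n(n+1)/2 blocks."""
    return n * (n + 1) // 2


def factorial_blocks(n):
    """The (num_blocks + (num_blocks - 1)!) estimate, in blocks."""
    return n + factorial(n - 1)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, triangular_blocks(n), factorial_blocks(n))
    # A 1 MiB file with 4 KiB blocks has n = 256 blocks.
    n, block_size = 256, 4096
    print(triangular_blocks(n) * block_size / (1 << 20), "MiB")  # 128.5 MiB
```

Even the quadratic figure means a 1 MiB file could, in this pattern, keep about 128.5 MiB allocated.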

>
> in actual size usage.  Our extents are limited to 128MB in size, but
> still that ends up being pretty huge.  I'm actually going to do this
> locally and see what happens.  Thanks,
>
> Josef
>
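The 128MB extent limit changes the shape of the sloped case. A toy model (a simplified sketch, not btrfs code) can show how, assuming that a write longer than the limit is split into back-to-back extents of at most max_extent blocks and that an extent stays fully allocated while any of its blocks is referenced. Under those assumptions only one extent from each sloped rewrite stays pinned, and it is at most max_extent blocks, so the total grows roughly as n * (max_extent + 1) / 2 rather than n(n+1)/2. If 128MB means 128 MiB and blocks are 4 KiB, that is 32768 blocks per extent, which still allows a very large multiple of the file size. Function names are hypothetical.

```python
def allocated_blocks_capped(file_blocks, writes, max_extent):
    """Toy COW model with an extent size cap (simplified sketch).

    Assumptions: each (offset, length) write is split into consecutive
    extents of at most max_extent blocks, and an extent stays fully
    allocated while any of its blocks is still referenced by the file.
    """
    owner = [None] * file_blocks  # logical block -> extent id
    sizes = []                    # extent id -> extent length in blocks
    for offset, length in writes:
        start, end = offset, offset + length
        while start < end:
            chunk = min(max_extent, end - start)
            ext_id = len(sizes)
            sizes.append(chunk)
            for b in range(start, start + chunk):
                owner[b] = ext_id
            start += chunk
    live = {e for e in owner if e is not None}
    return sum(sizes[e] for e in live)


def sloped(n):
    """Write n blocks, then n-1, ... 1, always from offset 0."""
    return [(0, k) for k in range(n, 0, -1)]


if __name__ == "__main__":
    n = 12
    for cap in (4, 6, 100):
        print(cap, allocated_blocks_capped(n, sloped(n), cap))
```

For a 12-block file this gives 30, 42 and 78 blocks: with a cap smaller than the file, retention becomes linear in n; with a cap larger than the file (100 here), it falls back to the full triangular sum.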



Thread overview: 22+ messages
2014-12-18 14:59 btrfs is using 25% more disk than it should Daniele Testa
2014-12-19 18:53 ` Phillip Susi
2014-12-19 19:59   ` Daniele Testa
2014-12-19 20:35     ` Phillip Susi
2014-12-19 21:15     ` Josef Bacik
2014-12-19 21:53       ` Phillip Susi
2014-12-19 22:06         ` Josef Bacik
2014-12-20  1:33     ` Duncan
2014-12-19 21:10 ` Josef Bacik
2014-12-19 21:17   ` Josef Bacik
2014-12-20  1:38     ` Duncan
2014-12-20  5:52     ` Zygo Blaxell
2014-12-20  6:18       ` Daniele Testa
2014-12-20  6:59         ` Duncan
2014-12-20 11:02         ` Josef Bacik
2014-12-20 11:28       ` Josef Bacik
2014-12-23 21:51         ` Zygo Blaxell
2014-12-20  9:15     ` Daniele Testa
2014-12-20 11:23     ` Robert White
2014-12-20 11:39       ` Josef Bacik
2014-12-21  1:40         ` Robert White [this message]
2014-12-21  3:04   ` Robert White
