From: Robert White <rwhite@pobox.com>
To: Josef Bacik <jbacik@fb.com>,
Daniele Testa <daniele.testa@gmail.com>,
linux-btrfs@vger.kernel.org
Subject: Re: btrfs is using 25% more disk than it should
Date: Sat, 20 Dec 2014 17:40:57 -0800
Message-ID: <54962529.4070608@pobox.com>
In-Reply-To: <54955FE6.3030601@fb.com>
On 12/20/2014 03:39 AM, Josef Bacik wrote:
> On 12/20/2014 06:23 AM, Robert White wrote:
>> On 12/19/2014 01:17 PM, Josef Bacik wrote:
>>> tl;dr: Cow means you can in the worst case end up using 2 * filesize -
>>> blocksize of data on disk and the file will appear to be filesize.
>>> Thanks,
>>
>> Isn't the worst case more like N^log(N) (where N is the file size in
>> blocks) in the pernicious case?
>>
>> Staggered block overwrites can "peer down" through gaps to create more
>> than two layers of retention. The only real requirement is that each
>> layer gets smaller than the one before it so as to leave some of each of
>> its predecessors visible.
>>
>> So if I make a file of size N blocks, then overwrite it with N-1 blocks,
>> then overwrite it again with N-2 blocks (etc.), I can easily create a
>> deep slope of obscured data.
>>
>> [-----------------]
>> [----------------]
>> [---------------]
>> [--------------]
>> [-------------]
>> [------------]
>> [-----------]
>> [----------]
>> [---------]
>> (etc...)
>>
>>
>> Or would I have to bracket the front and back
>>
>> ----------
>>  --------
>>   ------
>>
>> Or could I bracket the sides
>>
>> ---------
>> ---- ----
>> ---   ---
>> --     --
>> -       -
>>
>> There's got to be pathological patterns like this that can end up with a
>> heck of a lot of "hidden" data.
>
> Just the sloped case would do it, the pathological case would result in
> way more used than you expect. So I guess the worst case would be
> something like
>
> (num_blocks + (num_blocks - 1)!) * blocksize
I think that for a single file it's not factorial but a consecutive sum
(Gauss' formula for 1 + 2 + ... + n), so

max = ((n * (n + 1)) / 2) * blocksize

That is a lot smaller than factorial, but it is still (n^2 + n)/2 blocks,
i.e. quadratic in the file size, which is nothing to discard lightly.
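
As a sanity check on that sum, here is a rough sketch in Python. It is a toy
model, not btrfs code: it assumes every write becomes exactly one new extent,
and that an extent stays fully allocated for as long as any one of its blocks
is still referenced by the file. The names allocated_blocks and sloped_writes
are made up for the illustration.

```python
def allocated_blocks(n_blocks, writes):
    """Return the number of blocks still allocated after a series of writes.

    Each write is (start, length) in blocks and becomes one new extent.
    Assumption (toy model): an extent stays fully allocated while any of
    its blocks is still referenced by the file.
    """
    owner = [None] * n_blocks   # extent id each file block currently points to
    extent_len = {}             # extent id -> extent length in blocks
    for ext_id, (start, length) in enumerate(writes):
        extent_len[ext_id] = length
        for blk in range(start, start + length):
            owner[blk] = ext_id
    live = set(owner) - {None}  # extents with at least one visible block
    return sum(extent_len[e] for e in live)


def sloped_writes(n_blocks):
    """Write N blocks, then N-1, then N-2, ..., all starting at block 0."""
    return [(0, length) for length in range(n_blocks, 0, -1)]


n = 8
print(allocated_blocks(n, sloped_writes(n)))      # 36, the sloped case
print(n * (n + 1) // 2)                           # 36, the consecutive sum
print(allocated_blocks(n, [(0, n), (0, n - 1)]))  # 15, i.e. 2 * n - 1
```

Under those assumptions the sloped pattern pins every extent, so the total is
the consecutive sum, and a single overwrite of all but one block gives the
2 * filesize - blocksize figure from the tl;dr.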
>
> in actual size usage. Our extents are limited to 128MB in size, but
> still that ends up being pretty huge. I'm actually going to do this
> locally and see what happens. Thanks,
>
> Josef
>