From: Liu Bo <liubo2009@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>, linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH v2] Btrfs: improve space count for files with fragments
Date: Fri, 27 Apr 2012 09:44:13 +0800 [thread overview]
Message-ID: <4F99F9ED.6030106@cn.fujitsu.com> (raw)
In-Reply-To: <20120426171430.GP22794@shiny>
On 04/27/2012 01:14 AM, Chris Mason wrote:
> On Thu, Apr 26, 2012 at 02:39:23PM +0800, Liu Bo wrote:
>> > Here is a simple scenario:
>> > $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k count=20;sync
>> > $ btrfs fi df /mnt/btrfs
>> >
>> > we get 20K used, but then
>> > $ dd if=/dev/zero of=/mnt/btrfs/foobar bs=1k count=4 seek=4 conv=notrunc;sync
>> > $ btrfs fi df /mnt/btrfs
>> >
>> > we get 24K used.
>> > Here is the problem, it is possible that an _unshared_ file with lots of
>> > fragments costs nearly double space than its i_size, like:
>> > 0k 20k
>> > | --- extent --- | turned to be on disk <--- extent ---> <-- A -->
>> > | - A - | | -------------- | | ----- |
>> > 1k 19k 20k + 18k = 38k
>> >
>> > but what users want is <--- extent ---> <-- A -->
>> > | --- | | -- | | ----- |
>> > 1k + 1k + 18k = 20k
>> > so 18k is wasted.
>> >
>> > With the current backref design, there is no easy way to fix this, because it
>> > needs to touch several subtle parts, such as delayed ref stuff, extent backref.
>> >
>> > So here I give it a try by splitting the extent which we're processing(the idea
>> > comes from Chris :)).
>> >
>> > The benifits:
>> > As the above example shows, we'll get three individual extents: 1k + 1k + 18k,
>> > with their checksums are well splitted.
>> >
>> > The defects:
>> > Yes, it makes the code much uglier. And since we've disabled the merging of
>> > delayed refs, we'll get some performance regression.
>> >
>> > NOTE:
>> > The patch may still have some bugs since we need more time to tune the subtle
>> > things.
>
> Thanks for working on this. Could you please explain in detail what the
> pinned extents do?
Sure.
Let's take the above case:
0k 20k
| --- extent --- |
| - A - |
1k 19k
And we assume that this extent starts from disk_bytenr on its FS logical offset.
By splitting the [0k, 20k) extent, we'll get three delayed refs into the delayed-ref rbtree:
a) [0k, 20k), in which only [disk_bytenr+1k, disk_bytenr+19k) will be freed at the end.
b) [0k, 1k), which will _not_ allocate a new extent but use the remained space of [0k, 20k).
c) [19k, 20k), ditto.
And another ref [1k,19k) will get a new allocated space by our normal endio routine.
What I want is
free [0k, 20k), set this range DIRTY in the pinned_extents tree.
alloc [0k, 1k), clear this range DIRTY in the pinned_extents tree.
alloc [19k, 20k), ditto.
However, in my stress test, this three refs may not be ordered by a)->b)->c), but b)->a)->c) instead.
That would be a problem, because it will confuse our space_info's counter: bytes_reserved, bytes_pinned.
So I use EXTENT_PINNED flag to indicate this range will use pinned space rather than reserved space.
As for b)->a)->c), the logic will be:
b) alloc [0k, 1k), since no range covers this, set it PINNED in the pinned_extents tree.
a) free [0k, 20k), set this range DIRTY in the pinned_extents tree, and find [0k, 1k) tagged with PINNED,
clear [0k, 1k) PINNED & DIRTY in the pinned_extents tree, and update our space_info's counter: bytes_pinned.
c) alloc [19k, 20k), clear this range DIRTY in the pinned_extents tree.
Maybe I'd better pick up a new name for EXTENT_PINNED. :)
Actually I'm not satisfied with the patch as I do not want to make the code any uglier at all, so I kept it as RFC...
Any thoughts?
thanks,
liubo
next prev parent reply other threads:[~2012-04-27 1:44 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-26 6:39 [RFC PATCH v2] Btrfs: improve space count for files with fragments Liu Bo
2012-04-26 17:14 ` Chris Mason
2012-04-27 1:44 ` Liu Bo [this message]
2012-05-09 17:29 ` David Sterba
2012-05-10 1:11 ` Liu Bo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F99F9ED.6030106@cn.fujitsu.com \
--to=liubo2009@cn.fujitsu.com \
--cc=chris.mason@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.