From: liubo <liubo2009@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>,
Linux Btrfs <linux-btrfs@vger.kernel.org>,
Josef Bacik <josef@redhat.com>
Subject: Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Date: Mon, 25 Apr 2011 17:58:05 +0800
Message-ID: <4DB545AD.5050908@cn.fujitsu.com>
In-Reply-To: <1303435579-sup-6101@think>
On 04/22/2011 09:28 AM, Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
>> Chris Mason wrote:
>>> Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
>>>> The current code relogs the entire inode on every fsync, which is
>>>> much better suited to small files than to large ones.
>>>>
>>>> In my performance tests, fsync performance on large files is poor.
>>>> We can attribute this to the huge number of csum items that large
>>>> files carry: we have to flush all of them into the log tree even
>>>> when only _one_ change has been made to the whole file's data. To
>>>> optimize fsync, we therefore need a filter that skips the unnecessary
>>>> csum items, that is, those whose corresponding file data has not
>>>> changed before this fsync.
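As a rough illustration of the filtering idea above, the C sketch below keeps only the extents written after the last logged generation, and computes how many checksum bytes a data range carries. It assumes each extent records the transid that last wrote it, and that btrfs uses one 4-byte crc32c checksum per 4 KiB sector. The names `demo_extent`, `select_changed_extents()` and `csum_bytes()` are hypothetical; this is not the code from the RFC patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a file extent (not a btrfs structure). */
struct demo_extent {
	uint64_t file_offset;
	uint64_t num_bytes;
	uint64_t generation;	/* transid that last wrote this extent */
};

/*
 * Copy into 'out' only the extents written after 'last_logged_gen'; only
 * these would need their csums copied into the log tree. 'out' must have
 * room for 'n' entries. Returns the number of extents selected.
 */
size_t select_changed_extents(const struct demo_extent *ext, size_t n,
			      uint64_t last_logged_gen,
			      struct demo_extent *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (ext[i].generation > last_logged_gen)
			out[count++] = ext[i];
	}
	return count;
}

/* Bytes of checksum covering 'num_bytes' of data: one csum per sector. */
uint64_t csum_bytes(uint64_t num_bytes, uint32_t sectorsize, uint32_t csum_size)
{
	assert(sectorsize > 0 && num_bytes % sectorsize == 0);
	return num_bytes / sectorsize * csum_size;
}
```

Under those assumptions, a 4 GiB file carries 2^20 checksums (4 MiB of csum data), so relogging all of them after a single 4 KiB write is what makes fsync on large files expensive.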
>>>>
>>>> Here are some test results; I used sysbench to run a "random write + fsync" workload.
>>>>
>>>> Sysbench args:
>>>> - Number of threads: 1
>>>> - Extra file open flags: 0
>>>> - 2 files, 4Gb each
>>>> - Block size 4Kb
>>>> - Number of random requests for random IO: 10000
>>>> - Read/Write ratio for combined random IO test: 1.50
>>>> - Periodic FSYNC enabled, calling fsync() each 100 requests.
>>>> - Calling fsync() at the end of test, Enabled.
>>>> - Using synchronous I/O mode
>>>> - Doing random write test
>>>>
>>>> Sysbench results:
>>>> ===
>>>> Operations performed: 0 Read, 10000 Write, 200 Other = 10200 Total
>>>> Read 0b Written 39.062Mb Total transferred 39.062Mb
>>>> ===
>>>> a) without patch: (*SPEED* : 451.01Kb/sec)
>>>> 112.75 Requests/sec executed
>>>>
>>>> b) with patch: (*SPEED* : 5.1537Mb/sec)
>>>> 1319.34 Requests/sec executed
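The reported figures are consistent with one another if sysbench's "Kb" and "Mb" are read as KiB and MiB. That reading is an assumption, but 10000 writes of 4 KiB give 39.0625 MiB, which matches the printed 39.062Mb. The worked arithmetic below also gives the speedup implied by the two runs.

```latex
\begin{aligned}
\text{data written} &= 10000 \times 4\,\text{KiB} = 40000\,\text{KiB} = 39.0625\,\text{MiB} \\
\text{throughput}_{a} &= 112.75\,\text{req/s} \times 4\,\text{KiB} \approx 451.0\,\text{KiB/s} \\
\text{throughput}_{b} &= 1319.34\,\text{req/s} \times 4\,\text{KiB} = 5277.36\,\text{KiB/s} \approx 5.1537\,\text{MiB/s} \\
\text{speedup} &= \frac{1319.34}{112.75} \approx 11.7
\end{aligned}
```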
>>> Really nice results! Especially considering the small size of the patch.
>>>
>>> But, I'd really like to look at using sub transaction ids for this, and
>>> then logging just the part of the inode that had changed since the last
>>> log commit. It's more complex, but will also help reduce tree searches
>>> for the file items.
>>>
>> And this patch forgot to mention that it has a compatibility issue.
>
> Right, at the very least we want to just use one bit of that field
> instead of all 8. But keeping a sub-transid and putting that in the
> generation field of the file extent instead can get us the same benefits
> without stealing the bits.
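A toy model of the sub-transid scheme described above might look like the C sketch below. Each write stamps its extent with the current sub-transid (standing in for the file extent's generation field), and fsync logs only the extents stamped after the inode's last log commit. All structures and functions here (`subtrans_fs`, `subtrans_inode`, `subtrans_write()`, `subtrans_fsync()`) are hypothetical, and the sketch only appends extents, ignoring overwrites and the merging that real logging code would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBTRANS_MAX_EXTENTS 16

/* Hypothetical toy model; none of these are btrfs structures. */
struct subtrans_fs {
	uint64_t sub_transid;	/* bumped at every log commit */
};

struct subtrans_extent {
	uint64_t file_offset;
	uint64_t num_bytes;
	uint64_t generation;	/* sub-transid stored at write time */
};

struct subtrans_inode {
	struct subtrans_extent ext[SUBTRANS_MAX_EXTENTS];
	size_t n;
	uint64_t last_logged_sub_transid;
};

/* A write stamps its new extent with the current sub-transid. */
void subtrans_write(const struct subtrans_fs *fs, struct subtrans_inode *ino,
		    uint64_t off, uint64_t len)
{
	assert(ino->n < SUBTRANS_MAX_EXTENTS);
	ino->ext[ino->n++] = (struct subtrans_extent){ off, len, fs->sub_transid };
}

/*
 * fsync: log only extents stamped after the inode's last log commit, then
 * advance the sub-transid so later writes get a newer stamp.
 * Returns how many extents would be copied into the log tree.
 */
size_t subtrans_fsync(struct subtrans_fs *fs, struct subtrans_inode *ino)
{
	size_t logged = 0;

	for (size_t i = 0; i < ino->n; i++)
		if (ino->ext[i].generation > ino->last_logged_sub_transid)
			logged++;	/* real code would copy the item and its csums */
	ino->last_logged_sub_transid = fs->sub_transid;
	fs->sub_transid++;
	return logged;
}
```

Because the stamp lives in an existing generation-style field rather than in borrowed flag bits, this approach avoids the on-disk compatibility concern raised above.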
>
Nice. This is the first step of my plan.
> As we push the sub transid into the btree blocks as well, we'll get much
> faster tree walks too. The penalty is in complexity in the logging
> code, since it will have to deal with finding extents in the log tree
> and merging in the new extents from the file.
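The merging step can be pictured with a simplified range model. The log already holds sorted, non-overlapping ranges, and a newly logged extent replaces whatever part of them it overlaps, trimming or splitting the old ranges. This is a hypothetical sketch (`log_range` and `merge_into_log()` are invented names) over an in-memory array, not the btree-based tree-log code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_MAX_RANGES 64

/* Hypothetical logged range [start, end) with the generation that wrote it. */
struct log_range {
	uint64_t start;
	uint64_t end;
	uint64_t gen;
};

/*
 * Merge 'nw' into 'ranges', which holds 'n' sorted, non-overlapping ranges
 * and has room for 'cap' entries. Parts of old ranges covered by 'nw' are
 * dropped (old ranges are trimmed or split); returns the new count.
 */
size_t merge_into_log(struct log_range *ranges, size_t n, size_t cap,
		      struct log_range nw)
{
	struct log_range tmp[LOG_MAX_RANGES];
	size_t m = 0;
	int inserted = 0;

	/* The result has at most n + 2 entries (one old range split in two). */
	assert(nw.start < nw.end && n + 2 <= cap && n + 2 <= LOG_MAX_RANGES);

	for (size_t i = 0; i < n; i++) {
		struct log_range r = ranges[i];

		if (r.end <= nw.start) {	/* entirely before nw */
			tmp[m++] = r;
			continue;
		}
		if (r.start >= nw.end) {	/* entirely after nw */
			if (!inserted) {
				tmp[m++] = nw;
				inserted = 1;
			}
			tmp[m++] = r;
			continue;
		}
		/* Overlap: keep only the parts of r outside nw. */
		if (r.start < nw.start)
			tmp[m++] = (struct log_range){ r.start, nw.start, r.gen };
		if (!inserted) {
			tmp[m++] = nw;
			inserted = 1;
		}
		if (r.end > nw.end)
			tmp[m++] = (struct log_range){ nw.end, r.end, r.gen };
	}
	if (!inserted)
		tmp[m++] = nw;

	for (size_t i = 0; i < m; i++)
		ranges[i] = tmp[i];
	return m;
}
```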
I've been thinking about putting sub transids into extent buffers for a while,
and will give it a try. :)
thanks,
liubo.
>
> -chris
>