Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog


From: liubo <liubo2009@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>,
	Linux Btrfs <linux-btrfs@vger.kernel.org>,
	Josef Bacik <josef@redhat.com>
Subject: Re: [RFC PATCH] Btrfs: do not flush csum items of unchanged file data during treelog
Date: Mon, 25 Apr 2011 17:58:05 +0800
Message-ID: <4DB545AD.5050908@cn.fujitsu.com>
In-Reply-To: <1303435579-sup-6101@think>

On 04/22/2011 09:28 AM, Chris Mason wrote:
> Excerpts from Li Zefan's message of 2011-04-21 20:55:40 -0400:
>> Chris Mason wrote:
>>> Excerpts from liubo's message of 2011-04-21 03:58:21 -0400:
>>>> The current code relogs the entire inode on every fsync log, so it is
>>>> much better suited to small files than to large ones.
>>>>
>>>> In my performance tests, fsync performance on large files is poor, and
>>>> we can attribute this to the huge number of csum items that large files
>>>> carry, because we have to flush all of them into the log tree even when
>>>> there is only _one_ change in the whole file's data.  To optimize fsync,
>>>> we need a filter that skips the unnecessary csum items, that is, those
>>>> whose corresponding file data is unchanged at the time of this fsync.
>>>>
>>>> Here are some test results, obtained with sysbench doing "random write + fsync".
>>>>
>>>> Sysbench args:
>>>>   - Number of threads: 1
>>>>   - Extra file open flags: 0
>>>>   - 2 files, 4Gb each
>>>>   - Block size 4Kb
>>>>   - Number of random requests for random IO: 10000
>>>>   - Read/Write ratio for combined random IO test: 1.50
>>>>   - Periodic FSYNC enabled, calling fsync() each 100 requests.
>>>>   - Calling fsync() at the end of test, Enabled.
>>>>   - Using synchronous I/O mode
>>>>   - Doing random write test
>>>>
>>>> Sysbench results:
>>>> ===
>>>>    Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
>>>>    Read 0b  Written 39.062Mb  Total transferred 39.062Mb
>>>> ===
>>>> a) without patch:  (*SPEED* : 451.01Kb/sec)
>>>>    112.75 Requests/sec executed
>>>>
>>>> b) with patch:     (*SPEED* : 5.1537Mb/sec)
>>>>    1319.34 Requests/sec executed
>>> Really nice results! Especially considering the small size of the patch.
>>>
>>> But, I'd really like to look at using sub transaction ids for this, and
>>> then logging just the part of the inode that had changed since the last
>>> log commit.  It's more complex, but will also help reduce tree searches
>>> for the file items.
>>>
>> And this patch forgot to mention that it has a compatibility issue.
> 
> Right, at the very least we want to just use one bit of that field
> instead of all 8.  But keeping a sub-transid and putting that in the
> generation field of the file extent instead can get us the same benefits
> without stealing the bits.
> 

Nice.  This is the first step of my plan.
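
To make the idea concrete, here is a minimal sketch of filtering by sub
transid, written as a small Python model rather than kernel code.  The
Extent class, its generation field holding a sub transid, and
extents_to_log() are hypothetical names for illustration only; this is not
the actual btrfs implementation.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    offset: int      # byte offset of the extent within the file
    length: int      # extent length in bytes
    generation: int  # assumption: sub transid of the last modification


def extents_to_log(extents, last_logged_sub_transid):
    """Return only the extents modified after the last log commit."""
    return [e for e in extents if e.generation > last_logged_sub_transid]


# Four 4 KiB extents; only the third was rewritten after the last
# log commit, which (by assumption) happened at sub transid 7.
extents = [Extent(i * 4096, 4096, 5) for i in range(4)]
extents[2].generation = 8

dirty = extents_to_log(extents, last_logged_sub_transid=7)
print([(e.offset, e.length) for e in dirty])  # [(8192, 4096)]
```

With a filter like this, only the csum items covering the returned extents
would need to be flushed to the log tree.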

> As we push the sub transid into the btree blocks as well, we'll get much
> faster tree walks too.  The penalty is in complexity in the logging
> code, since it will have to deal with finding extents in the log tree
> and merging in the new extents from the file.

I've been thinking about putting the sub transid into the extent buffers for
a while, and will give it a try. :)
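
As a rough illustration of the merge step, here is a minimal sketch,
assuming extents are modeled as (offset, length, generation) tuples and the
log tree as a sorted list of them.  merge_into_log() is a hypothetical
helper, and the model ignores on-disk details such as adjusting disk byte
offsets when an old extent is split.

```python
def merge_into_log(logged, new):
    """Overlay new extents on the already-logged ones.

    Where byte ranges overlap, the new extent wins and the old extent
    is trimmed or split around it.
    """
    result = list(logged)
    for n_off, n_len, n_gen in new:
        n_end = n_off + n_len
        kept = []
        for off, length, gen in result:
            end = off + length
            if end <= n_off or off >= n_end:
                # No overlap: keep the old extent untouched.
                kept.append((off, length, gen))
                continue
            if off < n_off:
                # Keep the part of the old extent before the new one.
                kept.append((off, n_off - off, gen))
            if end > n_end:
                # Keep the part of the old extent after the new one.
                kept.append((n_end, end - n_end, gen))
        kept.append((n_off, n_len, n_gen))
        result = sorted(kept)
    return result


logged = [(0, 16384, 5)]    # one 16 KiB extent already in the log
new = [(4096, 4096, 8)]     # a 4 KiB overwrite in the middle
merged = merge_into_log(logged, new)
print(merged)  # [(0, 4096, 5), (4096, 4096, 8), (8192, 8192, 5)]
```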

thanks,
liubo.

> 
> -chris
>
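
For reference, the sysbench figures quoted in this thread can be
cross-checked with a little arithmetic.  The sketch below assumes that
sysbench's "Kb" and "Mb" are 1024-based units; all other numbers are taken
directly from the quoted results.

```python
KIB_PER_MIB = 1024  # assumption: sysbench "Kb"/"Mb" are 1024-based

requests = 10000    # random write requests
block_kib = 4       # block size in Kb
total_mib = requests * block_kib / KIB_PER_MIB

# Speedup measured two ways: by requests/sec and by throughput.
req_speedup = 1319.34 / 112.75
throughput_speedup = (5.1537 * KIB_PER_MIB) / 451.01

print(f"total written: {total_mib:.4f} Mb")
print(f"requests/sec speedup: {req_speedup:.2f}x")
print(f"throughput speedup:   {throughput_speedup:.2f}x")
```

Under that assumption the total comes to 39.0625 Mb, in line with the
reported 39.062Mb, and both ratios work out to about 11.7x.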

