From: liubo <liubo2009@cn.fujitsu.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>, dave <dave@jikos.cz>,
josef <josef@redhat.com>
Subject: Re: [GIT PULL v3] Btrfs: improve write ahead log with sub transaction
Date: Sat, 06 Aug 2011 17:23:36 +0800 [thread overview]
Message-ID: <4E3D0818.1090309@cn.fujitsu.com> (raw)
In-Reply-To: <4E3CB89A.9010706@cn.fujitsu.com>
On 08/06/2011 11:44 AM, liubo wrote:
> Hi, Chris,
>
> On 08/04/2011 09:57 PM, Chris Mason wrote:
>> Excerpts from Liu Bo's message of 2011-06-21 04:49:41 -0400:
>>> I've been working to try to improve the write-ahead log's performance,
>>> and I found that the bottleneck addresses in the checksum items,
>>> especially when we want to make a random write on a large file, e.g a 4G file.
>> I spent some time last week on this code, because I really wanted to
>> be able to include it. But I hit two problems.
>>
>> Recording the transid of the log tree root doesn't completely solve
>> problems with later mounts expecting generation + 1. If an older kernel
>> were to try and mount a log created by our new code, it wouldn't
>> understand the transid and the mount would fail.
>>
>> I think we just need to force the transid of the root block to
>> generation + 1. It is slightly less optimal but still much better than
>> what we have.
>>
>
>
> ohh, I forgot to fix this, sorry.
>
>
>> The second problem was that I consistently hit crashes during log replay
>> after a crash. The test was just to use synctest:
>>
>> http://oss.oracle.com/~mason/synctest/
>>
>> synctest -t 32 -f -F -u -n 100 /mnt
>>
>> I waited about 45 seconds and reset the machine. Later mounts would
>> crash during log replay.
>>
>
>
> I've reproduced as your tips and hit a crash.
>
> But I'm not sure if the following bug is just what is happening on your box?
>
> And if it is, it is a bug from the original add_inode_ref() code, and I've been working it out.
I've just posted a patch to solve this:
[PATCH] Btrfs: fix an oops of log replay
This patch should be helpful then.
thanks,
liubo
> Otherwise, if your crash is _another_ bug, plz let me know ASAP.
>
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/inode.c:4580!
> Pid: 2124, comm: mount Not tainted 3.0.0-for-linus+ #13 LENOVO QiTianM7150/To be filled by O.E.M.
> RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs]
> [...]
> Call Trace:
> [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
> [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs]
> [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs]
> [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs]
> [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs]
> [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs]
> [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
> [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs]
> [<ffffffff811aad34>] ? disk_name+0x64/0xc0
> [<ffffffffa03afd9e>] btrfs_mount+0x3de/0x570 [btrfs]
> [<ffffffff81149293>] mount_fs+0x43/0x1a0
> [<ffffffff8110f920>] ? __alloc_percpu+0x10/0x20
> [<ffffffff81163873>] vfs_kern_mount+0x63/0xd0
> [<ffffffff81163952>] do_kern_mount+0x52/0x110
> [<ffffffff811d226a>] ? security_capable+0x2a/0x30
> [<ffffffff81165597>] do_mount+0x257/0x7e0
> [<ffffffff8110a25b>] ? memdup_user+0x4b/0x90
> [<ffffffff8110a2fb>] ? strndup_user+0x5b/0x80
> [<ffffffff81165bb0>] sys_mount+0x90/0xe0
> [<ffffffff8147142b>] system_call_fastpath+0x16/0x1b
>
>
> thanks,
> liubo
>
>> -chris
>>
>>> Then a idea for this suggested by Chris is to use sub transaction ids and just
>>> to log the part of inode that had changed since either the last log commit or
>>> the last transaction commit. And as we also push the sub transid into the btree
>>> blocks, we'll get much faster tree walks. As a result, we abandon the original
>>> brute force approach, which is "to delete all items of the inode in log",
>>> to making sure we get the most uptodate copies of everything, and instead
>>> we manage to "find and merge", i.e. finding extents in the log tree and merging
>>> in the new extents from the file.
>>>
>>> This patchset puts the above idea into code, and although the code is now more
>>> complex, it brings us a great deal of performance improvement:
>>>
>>> in my sysbench "write + fsync" test:
>>>
>>> 451.01Kb/sec -> 4.3621Mb/sec
>>>
>>> In v2, thanks to Chris, we worked together to solve 2 bugs, and after that it
>>> works as expected.
>>>
>>> Since there are some vital changes in recent rc, like "kill trans_mutex" and
>>> "use cur_trans", as David asked, I rebase the patchset to the latest for-linus
>>> branch.
>>>
>>> More tests are welcome!
>>>
>>> You can also get this patchset from:
>>>
>>> git://repo.or.cz/linux-btrfs-devel.git sub-trans
>>>
>>> Liu Bo (12):
>>> Btrfs: introduce sub transaction stuff
>>> Btrfs: update block generation if should_cow_block fails
>>> Btrfs: modify btrfs_drop_extents API
>>> Btrfs: introduce first sub trans
>>> Btrfs: still update inode trans stuff when size remains unchanged
>>> Btrfs: improve log with sub transaction
>>> Btrfs: add checksum check for log
>>> Btrfs: fix a bug of log check
>>> Btrfs: kick off useless code
>>> Btrfs: deal with EEXIST after iput
>>> Btrfs: use the right generation number to read log_root_tree
>>> Revert "Btrfs: do not flush csum items of unchanged file data during
>>> treelog"
>>>
>>> fs/btrfs/btrfs_inode.h | 12 ++-
>>> fs/btrfs/ctree.c | 69 +++++++++---
>>> fs/btrfs/ctree.h | 5 +-
>>> fs/btrfs/disk-io.c | 12 +-
>>> fs/btrfs/extent-tree.c | 10 +-
>>> fs/btrfs/file.c | 22 ++---
>>> fs/btrfs/inode.c | 33 ++++---
>>> fs/btrfs/ioctl.c | 6 +-
>>> fs/btrfs/relocation.c | 6 +-
>>> fs/btrfs/transaction.c | 14 ++-
>>> fs/btrfs/transaction.h | 19 +++-
>>> fs/btrfs/tree-defrag.c | 2 +-
>>> fs/btrfs/tree-log.c | 272 ++++++++++++++++++++++++++++++++++-------------
>>> 13 files changed, 331 insertions(+), 151 deletions(-)
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
prev parent reply other threads:[~2011-08-06 9:23 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-06-21 8:49 [GIT PULL v3] Btrfs: improve write ahead log with sub transaction Liu Bo
2011-06-21 8:49 ` [PATCH 01/12 v3] Btrfs: introduce sub transaction stuff Liu Bo
2011-06-21 8:49 ` [PATCH 02/12 v3] Btrfs: update block generation if should_cow_block fails Liu Bo
2011-06-21 8:49 ` [PATCH 03/12 v3] Btrfs: modify btrfs_drop_extents API Liu Bo
2011-06-21 8:49 ` [PATCH 04/12 v3] Btrfs: introduce first sub trans Liu Bo
2011-06-21 8:49 ` [PATCH 05/12 v3] Btrfs: still update inode trans stuff when size remains unchanged Liu Bo
2011-06-21 8:49 ` [PATCH 06/12 v3] Btrfs: improve log with sub transaction Liu Bo
2011-06-21 13:46 ` Josef Bacik
2011-06-21 8:49 ` [PATCH 07/12 v3] Btrfs: add checksum check for log Liu Bo
2011-06-21 13:49 ` Josef Bacik
2011-06-21 8:49 ` [PATCH 08/12 v3] Btrfs: fix a bug of log check Liu Bo
2011-06-21 8:49 ` [PATCH 09/12 v3] Btrfs: kick off useless code Liu Bo
2011-06-21 8:49 ` [PATCH 10/12 v3] Btrfs: deal with EEXIST after iput Liu Bo
2011-06-21 14:00 ` Josef Bacik
2011-06-22 1:57 ` liubo
2011-06-21 8:49 ` [PATCH 11/12 v3] Btrfs: use the right generation number to read log_root_tree Liu Bo
2011-06-21 8:49 ` [PATCH 12/12 v3] Revert "Btrfs: do not flush csum items of unchanged file data during treelog" Liu Bo
2011-08-04 13:57 ` [GIT PULL v3] Btrfs: improve write ahead log with sub transaction Chris Mason
2011-08-06 3:44 ` liubo
2011-08-06 9:23 ` liubo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E3D0818.1090309@cn.fujitsu.com \
--to=liubo2009@cn.fujitsu.com \
--cc=chris.mason@oracle.com \
--cc=dave@jikos.cz \
--cc=josef@redhat.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.