From: Arne Jansen <sensille@gmx.net>
To: Linux Btrfs <linux-btrfs@vger.kernel.org>
Cc: Chris Mason <chris.mason@oracle.com>
Subject: [RFC] Tree fragmentation and prefetching
Date: Wed, 23 Mar 2011 14:06:02 +0100 [thread overview]
Message-ID: <4D89F03A.9050306@gmx.net> (raw)
While looking into the performance of scrub I noticed that a significant
amount of time is being used for loading the extent tree and the csum
tree. While this is no surprise I did some prototyping on how to improve
on it.
The main idea is to load the tree (or parts of it) top-down, order the
needed blocks and distribute it over all disks.
To keep you interested, some results first.
a) by tree enumeration with reada=2
reading extent tree: 242s
reading csum tree: 140s
reading both trees: 324s
b) prefetch prototype
reading extent tree: 23.5s
reading csum tree: 20.4s
reading both trees: 25.7s
The test setup consists of a 7 Seagate ES.2 1TB disks filesystem, filled
28%. It is created with the current git tree + the round robin patch and
filled with
fs_mark -D 512 -t 16 -n 4096 -F -S0
The 'normal' read is done by enumerating the leaves by btrfs_next_leaf()
with path->reada=2. Both trees are being enumerated one after the other.
The prototype currently just uses raw bios, does not make use of the
page cache and does not enter the read pages into the cache. This will
probably add some overhead. It also does not check the crcs.
While it is very promising to implement it for scrub, I think a more
general interface which can be used for every enumeration would be
beneficial. Use cases that come to mind are rebalance, reflink, deletion
of large files, listing of large directories etc..
I'd imagine an interface along the lines of
int btrfs_readahead_init(struct btrfs_reada_ctx *reada);
int btrfs_readahead_add(struct btrfs_root *root,
struct btrfs_key *start,
struct btrfs_key *end,
struct btrfs_reada_ctx *reada);
void btrfs_readahead_wait(struct btrfs_reada_ctx *reada);
to trigger the readahead of parts of a tree. Multiple readahead
requests can be given before waiting. This would enable the very
beneficial folding seen above for 'reading both trees'.
Also it would be possible to add a cascading readahead, where the
content of leaves would trigger readaheads in other trees, maybe by
giving a callback for the decisions what to read instead of the fixed
start/end range.
For the implementation I'd need an interface which I haven't been able
to find yet. Currently I can trigger the read of several pages / tree
blocks and wait for the completion of each of them. What I'd need would
be an interface that gives me a callback on each completion or a waiting
function that wakes up on each completion with the information which
pages just completed.
One way to achieve this would be to add a hook, but I gladly take any
implementation hints.
--
Arne
next reply other threads:[~2011-03-23 13:06 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-23 13:06 Arne Jansen [this message]
2011-03-23 15:01 ` [RFC] Tree fragmentation and prefetching Arne Jansen
2011-03-23 19:26 ` Andrey Kuzmin
2011-03-23 20:28 ` Arne Jansen
2011-03-23 21:47 ` Andrey Kuzmin
2011-03-24 1:38 ` Miao Xie
2011-03-24 7:29 ` Arne Jansen
2011-03-24 12:45 ` Miao Xie
2011-03-23 19:32 ` Chris Mason
2011-03-25 20:14 ` Arne Jansen
2011-03-25 20:15 ` Chris Mason
2011-03-25 20:53 ` Arne Jansen
2011-03-25 21:01 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D89F03A.9050306@gmx.net \
--to=sensille@gmx.net \
--cc=chris.mason@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).