From: Liu Bo <liubo2009@cn.fujitsu.com>
To: David Sterba <dave@jikos.cz>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 3/4] Btrfs: use large extent range for read and its endio
Date: Fri, 15 Jun 2012 09:50:39 +0800 [thread overview]
Message-ID: <4FDA94EF.9030601@cn.fujitsu.com> (raw)
In-Reply-To: <20120614161239.GU32402@twin.jikos.cz>
On 06/15/2012 12:12 AM, David Sterba wrote:
> On Wed, Jun 13, 2012 at 06:19:10PM +0800, Liu Bo wrote:
>> we use larger extent state range for both readpages and read endio, so that
>> we can lock or unlock less and avoid most of split ops, then we'll reduce write
>> locks taken at endio time.
>>
>> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
>> ---
>> fs/btrfs/extent_io.c | 201 +++++++++++++++++++++++++++++++++++++++++++++-----
>> 1 files changed, 182 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 081fe13..bb66e3c 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2258,18 +2258,26 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>> struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
>> struct bio_vec *bvec = bio->bi_io_vec;
>> struct extent_io_tree *tree;
>> + struct extent_state *cached = NULL;
>> u64 start;
>> u64 end;
>> int whole_page;
>> int mirror;
>> int ret;
>> + u64 up_start, up_end, un_start, un_end;
>> + int up_first, un_first;
>> + int for_uptodate[bio->bi_vcnt];
>
> The array size depends on an argument, compiler will handle it by
> dynamically expanding the stackframe. What is the expected maximum value
> for bi_vcnt ? There's no hard limit AFAICS, the value gets incremented
> on each page in the bio. If the function is called from the worker
> thread, it's not much of a problem even for values like 128.
>
> Alternate way to avoid over-limit stack consumption is to declare an
> array to hold a fair number of items (eg. 16), and fall back to kmalloc
> otherwise.
>
hmm, you're right. I'm going to use kmalloc with KERNEL_ATOMIC.
>> + int i = 0;
>> +
>> + up_start = un_start = (u64)-1;
>> + up_end = un_end = 0;
>> + up_first = un_first = 1;
>>
>> if (err)
>> uptodate = 0;
>>
>> do {
>> struct page *page = bvec->bv_page;
>> - struct extent_state *cached = NULL;
>>
>> pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, "
>> "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err,
>> @@ -2352,7 +2412,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>> }
>> unlock_page(page);
>> } else {
>> - if (uptodate) {
>> + if (for_uptodate[i++]) {
>> check_page_uptodate(tree, page);
>> } else {
>> clearpageuptodate(page);
>> @@ -2360,6 +2420,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>> }
>> check_page_locked(tree, page);
>> }
>> + ++bvec;
>
> can you please move the i++ increments here? From reading the code it's
> not clear on first sight if the sideeffects are ok.
>
Sure, will update it.
>> } while (bvec <= bvec_end);
>>
>> bio_put(bio);
>> @@ -2554,6 +2615,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
>>
>> end = page_end;
>> while (1) {
>> + if (range_lock)
>> + break;
>
> with range_lock set, this is equivalent to calling
>
> 2896 set_page_extent_mapped(page);
>
> (plus the cleancache code), a few lines above. Is this inteded? It's
> actually called with '1' from a single place, process_batch_pages().
>
Yeah, I want to skip the lock part, since I've already locked this range of extent.
After that, we'll have a continuous range of extent for read.
>> lock_extent(tree, start, end);
>> ordered = btrfs_lookup_ordered_extent(inode, start);
>> if (!ordered)
>> @@ -3497,6 +3560,59 @@ int extent_writepages(struct extent_io_tree *tree,
>> return ret;
>> }
>>
>> +struct page_list {
>> + struct page *page;
>> + struct list_head list;
>> +};
>> +
>> +static int process_batch_pages(struct extent_io_tree *tree,
>> + struct address_space *mapping,
>> + struct list_head *lock_pages, int *page_cnt,
>> + u64 lock_start, u64 lock_end,
>> + get_extent_t get_extent, struct bio **bio,
>> + unsigned long *bio_flags)
>> +{
>> + u64 page_start;
>> + struct page_list *plist;
>> +
>> + while (1) {
>> + struct btrfs_ordered_extent *ordered = NULL;
>> +
>> + lock_extent(tree, lock_start, lock_end);
>> + page_start = lock_start;
>> + while (page_start < lock_end) {
>> + ordered = btrfs_lookup_ordered_extent(mapping->host,
>> + page_start);
>> + if (ordered) {
>> + page_start = ordered->file_offset;
>> + break;
>> + }
>> + page_start += PAGE_CACHE_SIZE;
>> + }
>> + if (!ordered)
>> + break;
>
> You're leaving the range [lock_start, lock_end) locked after taking this
> branch. Intended?
>
Yeah, this is what it should be.
Usually we do:
o lock the range that is going to be sent to read (at read_page)
o set the range uptodate and unlock it (at end_io)
>> + unlock_extent(tree, lock_start, lock_end);
>> + btrfs_start_ordered_extent(mapping->host, ordered, 1);
>> + btrfs_put_ordered_extent(ordered);
>> + }
>> +
>> + plist = NULL;
>> + while (!list_empty(lock_pages)) {
>> + plist = list_entry(lock_pages->prev, struct page_list, list);
>> +
>> + __extent_read_full_page(tree, plist->page, get_extent,
>> + bio, 0, bio_flags, 1);
>> + page_cache_release(plist->page);
>> + list_del(&plist->list);
>> + plist->page = NULL;
>
> you can drop this assignment
>
Sure.
Thanks a lot for reviewing! :)
thanks,
liubo
>> + kfree(plist);
>> + (*page_cnt)--;
>> + }
>> +
>> + WARN_ON((*page_cnt));
>> + return 0;
>> +}
>> +
>> int extent_readpages(struct extent_io_tree *tree,
>> struct address_space *mapping,
>> struct list_head *pages, unsigned nr_pages,
>
next prev parent reply other threads:[~2012-06-15 1:43 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-13 10:19 [PATCH 0/4 v2][RFC] apply rwlock for extent state Liu Bo
2012-06-13 10:19 ` [PATCH 1/4] Btrfs: use radix tree for checksum Liu Bo
2012-06-13 16:07 ` Zach Brown
2012-06-14 1:50 ` Liu Bo
2012-06-14 16:42 ` Zach Brown
2012-07-06 15:37 ` Chris Mason
2012-07-07 7:43 ` Liu Bo
2012-06-14 15:03 ` David Sterba
2012-06-13 10:19 ` [PATCH 2/4] Btrfs: merge adjacent states as much as possible Liu Bo
2012-06-13 10:19 ` [PATCH 3/4] Btrfs: use large extent range for read and its endio Liu Bo
2012-06-14 16:12 ` David Sterba
2012-06-15 1:50 ` Liu Bo [this message]
2012-06-13 10:19 ` [PATCH 4/4] Btrfs: apply rwlock for extent state Liu Bo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FDA94EF.9030601@cn.fujitsu.com \
--to=liubo2009@cn.fujitsu.com \
--cc=dave@jikos.cz \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).