Re: [PATCH 3/4] Btrfs: use large extent range for read and its endio

From: Liu Bo <liubo2009@cn.fujitsu.com>
To: David Sterba <dave@jikos.cz>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 3/4] Btrfs: use large extent range for read and its endio
Date: Fri, 15 Jun 2012 09:50:39 +0800
Message-ID: <4FDA94EF.9030601@cn.fujitsu.com>
In-Reply-To: <20120614161239.GU32402@twin.jikos.cz>

On 06/15/2012 12:12 AM, David Sterba wrote:

> On Wed, Jun 13, 2012 at 06:19:10PM +0800, Liu Bo wrote:
>> we use larger extent state range for both readpages and read endio, so that
>> we can lock or unlock less and avoid most of split ops, then we'll reduce write
>> locks taken at endio time.
>>
>> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
>> ---
>>  fs/btrfs/extent_io.c |  201 +++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 files changed, 182 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 081fe13..bb66e3c 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -2258,18 +2258,26 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>>  	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
>>  	struct bio_vec *bvec = bio->bi_io_vec;
>>  	struct extent_io_tree *tree;
>> +	struct extent_state *cached = NULL;
>>  	u64 start;
>>  	u64 end;
>>  	int whole_page;
>>  	int mirror;
>>  	int ret;
>> +	u64 up_start, up_end, un_start, un_end;
>> +	int up_first, un_first;
>> +	int for_uptodate[bio->bi_vcnt];
> 
> The array size depends on an argument, compiler will handle it by
> dynamically expanding the stackframe. What is the expected maximum value
> for bi_vcnt ? There's no hard limit AFAICS, the value gets incremented
> on each page in the bio. If the function is called from the worker
> thread, it's not much of a problem even for values like 128.
> 
> Alternate way to avoid over-limit stack consumption is to declare an
> array to hold a fair number of items (eg. 16), and fall back to kmalloc
> otherwise.
> 


Hmm, you're right.  I'm going to use kmalloc with GFP_ATOMIC.
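
For illustration only, the small-array-plus-fallback idea David describes could look roughly like the userspace sketch below. It is not the code from the patch: malloc()/free() stand in for an atomic kmalloc()/kfree(), and INLINE_FLAGS and count_uptodate() are hypothetical names chosen for this example.

```c
#include <stdlib.h>

#define INLINE_FLAGS 16	/* hypothetical on-stack capacity ("eg. 16") */

/*
 * Count how many of the nr per-page flags are set. A small on-stack
 * array is used when nr fits; otherwise the buffer comes from the heap.
 * Returns -1 if the fallback allocation fails.
 */
static int count_uptodate(const int *src, size_t nr)
{
	int inline_flags[INLINE_FLAGS];
	int *flags = inline_flags;
	int count = 0;
	size_t i;

	if (nr > INLINE_FLAGS) {
		/* stand-in for an atomic kmalloc() in the kernel */
		flags = malloc(nr * sizeof(*flags));
		if (!flags)
			return -1;
	}

	/* fill the per-page flags, then consume them */
	for (i = 0; i < nr; i++)
		flags[i] = src[i];
	for (i = 0; i < nr; i++)
		count += flags[i] ? 1 : 0;

	/* only free what was actually allocated */
	if (flags != inline_flags)
		free(flags);	/* kfree() in the kernel */
	return count;
}
```

The stack cost is bounded by INLINE_FLAGS no matter how large bi_vcnt gets, and the common small case never allocates.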


>> +	int i = 0;
>> +
>> +	up_start = un_start = (u64)-1;
>> +	up_end = un_end = 0;
>> +	up_first = un_first = 1;
>>  
>>  	if (err)
>>  		uptodate = 0;
>>  
>>  	do {
>>  		struct page *page = bvec->bv_page;
>> -		struct extent_state *cached = NULL;
>>  
>>  		pr_debug("end_bio_extent_readpage: bi_vcnt=%d, idx=%d, err=%d, "
>>  			 "mirror=%ld\n", bio->bi_vcnt, bio->bi_idx, err,
>> @@ -2352,7 +2412,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>>  			}
>>  			unlock_page(page);
>>  		} else {
>> -			if (uptodate) {
>> +			if (for_uptodate[i++]) {
>>  				check_page_uptodate(tree, page);
>>  			} else {
>>  				ClearPageUptodate(page);
>> @@ -2360,6 +2420,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>>  			}
>>  			check_page_locked(tree, page);
>>  		}
>> +		++bvec;
> 
> Can you please move the i++ increment here? From reading the code it's
> not clear at first sight whether the side effects are OK.
> 


Sure, will update it.
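
A rough sketch of the shape being asked for, with the index advanced at the bottom of the loop next to ++bvec rather than inside a condition. struct vec and count_partial_uptodate() are hypothetical stand-ins, and whether the real loop needs the index to advance for every vector is an assumption of this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bio_vec entry; only the field used here. */
struct vec {
	int whole_page;
};

/*
 * Walk the vectors and count the partial-page ones that are flagged
 * uptodate. The index i advances exactly once per iteration, next to
 * ++v, so it always matches the vector being processed.
 */
static int count_partial_uptodate(const struct vec *v,
				  const int *for_uptodate, size_t nr)
{
	const struct vec *end = v + nr;
	size_t i = 0;
	int count = 0;

	while (v < end) {
		if (!v->whole_page && for_uptodate[i])
			count++;
		++v;
		++i;	/* no side effect hidden in the condition above */
	}
	return count;
}
```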

>>  	} while (bvec <= bvec_end);
>>  
>>  	bio_put(bio);
>> @@ -2554,6 +2615,8 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
>>  
>>  	end = page_end;
>>  	while (1) {
>> +		if (range_lock)
>> +			break;
> 
> with range_lock set, this is equivalent to calling
> 
> 2896	set_page_extent_mapped(page);
> 
> (plus the cleancache code), a few lines above. Is this intended? It's
> actually called with '1' from a single place, process_batch_pages().
> 


Yeah, I want to skip the lock part, since process_batch_pages() has already
locked this extent range. That way we have one contiguous extent range for the read.
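
As a toy model of why the per-page lock can be skipped (all names here are hypothetical and nothing below is the btrfs API): the batch path takes one lock for the whole range and then reads each page with the lock step bypassed.

```c
#include <stdbool.h>

/* Toy tree that only counts how often a range lock is taken. */
struct toy_tree {
	int lock_calls;
};

static void toy_lock_extent(struct toy_tree *t)
{
	t->lock_calls++;
}

/* Read one page; lock only if the caller does not already hold the range. */
static void toy_read_page(struct toy_tree *t, bool range_lock)
{
	if (!range_lock)
		toy_lock_extent(t);
	/* ... map the page and queue it for read ... */
}

/* Batch path: one lock covers every page in the range. */
static void toy_read_batch(struct toy_tree *t, int nr_pages)
{
	int i;

	toy_lock_extent(t);
	for (i = 0; i < nr_pages; i++)
		toy_read_page(t, true);
}
```

In this model, reading eight pages through toy_read_batch() takes the lock once, while eight separate toy_read_page(t, false) calls take it eight times.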


>>  		lock_extent(tree, start, end);
>>  		ordered = btrfs_lookup_ordered_extent(inode, start);
>>  		if (!ordered)
>> @@ -3497,6 +3560,59 @@ int extent_writepages(struct extent_io_tree *tree,
>>  	return ret;
>>  }
>>  
>> +struct page_list {
>> +	struct page *page;
>> +	struct list_head list;
>> +};
>> +
>> +static int process_batch_pages(struct extent_io_tree *tree,
>> +			       struct address_space *mapping,
>> +			       struct list_head *lock_pages, int *page_cnt,
>> +			       u64 lock_start, u64 lock_end,
>> +				get_extent_t get_extent, struct bio **bio,
>> +				unsigned long *bio_flags)
>> +{
>> +	u64 page_start;
>> +	struct page_list *plist;
>> +
>> +	while (1) {
>> +		struct btrfs_ordered_extent *ordered = NULL;
>> +
>> +		lock_extent(tree, lock_start, lock_end);
>> +		page_start = lock_start;
>> +		while (page_start < lock_end) {
>> +			ordered = btrfs_lookup_ordered_extent(mapping->host,
>> +							      page_start);
>> +			if (ordered) {
>> +				page_start = ordered->file_offset;
>> +				break;
>> +			}
>> +			page_start += PAGE_CACHE_SIZE;
>> +		}
>> +		if (!ordered)
>> +			break;
> 
> You're leaving the range [lock_start, lock_end) locked after taking this
> branch. Intended?
> 


Yeah, this is what it should be.
Usually we do:

o  lock the range that is going to be sent to read (at read_page)
o  set the range uptodate and unlock it (at end_io)
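
The two steps above, together with the retry loop in process_batch_pages(), can be modelled in userspace roughly as follows. This is a simplified sketch with hypothetical names (toy_range and friends): a counter stands in for pending ordered extents, and no real locking or I/O takes place.

```c
#include <stdbool.h>

/* Toy model of one extent range. */
struct toy_range {
	bool locked;
	int pending_ordered;	/* ordered extents still overlapping the range */
	int lock_count;		/* how many times the range lock was taken */
};

static void toy_lock(struct toy_range *r)
{
	r->locked = true;
	r->lock_count++;
}

static void toy_unlock(struct toy_range *r)
{
	r->locked = false;
}

/* Stand-in for waiting until one ordered extent has completed. */
static void toy_wait_ordered(struct toy_range *r)
{
	r->pending_ordered--;
}

/* Read side: returns with the range still locked. */
static void toy_lock_for_read(struct toy_range *r)
{
	for (;;) {
		toy_lock(r);
		if (r->pending_ordered == 0)
			break;	/* keep the lock; end_io releases it */
		/* drop the lock while waiting, then retry */
		toy_unlock(r);
		toy_wait_ordered(r);
	}
}

/* End_io side: mark the data uptodate and release the range. */
static void toy_end_io(struct toy_range *r, bool *uptodate)
{
	*uptodate = true;
	toy_unlock(r);
}
```

The break with the lock held is the point of the design in this model: the range stays locked from submission until toy_end_io() runs.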


>> +		unlock_extent(tree, lock_start, lock_end);
>> +		btrfs_start_ordered_extent(mapping->host, ordered, 1);
>> +		btrfs_put_ordered_extent(ordered);
>> +	}
>> +
>> +	plist = NULL;
>> +	while (!list_empty(lock_pages)) {
>> +		plist = list_entry(lock_pages->prev, struct page_list, list);
>> +
>> +		__extent_read_full_page(tree, plist->page, get_extent,
>> +					bio, 0, bio_flags, 1);
>> +		page_cache_release(plist->page);
>> +		list_del(&plist->list);
>> +		plist->page = NULL;
> 
> you can drop this assignment
> 


Sure.

Thanks a lot for reviewing! :)


thanks,
liubo

>> +		kfree(plist);
>> +		(*page_cnt)--;
>> +	}
>> +
>> +	WARN_ON((*page_cnt));
>> +	return 0;
>> +}
>> +
>>  int extent_readpages(struct extent_io_tree *tree,
>>  		     struct address_space *mapping,
>>  		     struct list_head *pages, unsigned nr_pages,
>

