All of lore.kernel.org
 help / color / mirror / Atom feed
From: Liu Bo <liubo2009@cn.fujitsu.com>
To: David Sterba <dave@jikos.cz>
Cc: chris.mason@fusionio.com, linux-btrfs@vger.kernel.org,
	JBacik@fusionio.com
Subject: Re: [PATCH v3] Btrfs: improve multi-thread buffer read
Date: Fri, 20 Jul 2012 11:51:04 +0800	[thread overview]
Message-ID: <5008D5A8.4090000@cn.fujitsu.com> (raw)
In-Reply-To: <20120720033610.GF17430@twin.jikos.cz>

On 07/20/2012 11:36 AM, David Sterba wrote:

> On Thu, Jul 19, 2012 at 10:31:05AM +0800, Liu Bo wrote:
>>> 128 is too much, this would snip 128 * 8 = 1K off the stack.
>> That's why I give up 128. :)
> 
> It's good as a reference point, nobody says it should stay at 128.
> 
>>>> But as Chris suggested, my test is really a race case in practical use, half of improvement
>>>> is somehow enough, so we turn to use pagevec struct because it is closer to how we solve
>>>> similar problems in other parts of the kernel.
>>> Yes it's an optimization, nice and simple one, but I don't see the
>>> use of pagevec justified. By the other parts of kernel is probably meant
>>> memory management, and pagevec's are used along with lookups to inode
>>> mappings, plus there are other sideefects on pagecache (like calling
>>> lru_add_drain() from pagevec_release, as can be seen in your code).
>>>
>>> Filesystems can use pagevec_lookup instead of find_get_pages,
>>> like ext4 does, but btrfs uses simple arrays of 16 pages, in
>>> lock_delalloc_pages, end_compressed_writeback, __unlock_for_delalloc and
>>> extent_clear_unlock_delalloc (in connection with find_get_pages).
>>>
>>> I was specifically interested in benchmarking pagevec used as in V3
>>> against simple array with 16 elements, but now that I looked around
>>> while writing this mail, I think that pagevec is not the way to go.
>>>
>> Sorry, I see no difference between 16 pages array and pagevec(14 pages),
> 
> The difference is 2 pages, at least. Besides [quoting patch from the
> first post for reference]
> 
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3557,7 +3557,10 @@ int extent_readpages(struct extent_io_tree *tree,
>>         struct bio *bio = NULL;
>>         unsigned page_idx;
>>         unsigned long bio_flags = 0;
>> +       struct pagevec pvec;
>> +       int i = 0;
>>
>> +       pagevec_init(&pvec, 0);
>>         for (page_idx = 0; page_idx < nr_pages; page_idx++) {
>>                 struct page *page = list_entry(pages->prev, struct page, lru);
>>
>> @@ -3565,11 +3568,22 @@ int extent_readpages(struct extent_io_tree *tree,
>>                 list_del(&page->lru);
>>                 if (!add_to_page_cache_lru(page, mapping,
>>                                         page->index, GFP_NOFS)) {
>> -                       __extent_read_full_page(tree, page, get_extent,
>> +                       page_cache_get(page);
>> +                       if (pagevec_add(&pvec, page) == 0) {
>> +                               for (i = 0; i < pagevec_count(&pvec); i++)
>> +                                       __extent_read_full_page(tree,
>> +                                               pvec.pages[i], get_extent,
>>                                                 &bio, 0, &bio_flags);
>> +                               pagevec_release(&pvec);
>                                   ^^^^^^^^^^^^^^^^^^^^^^
> here
> 
>> +                       }
>>                 }
>>                 page_cache_release(page);
>>         }
>> +       for (i = 0; i < pagevec_count(&pvec); i++)
>> +               __extent_read_full_page(tree, pvec.pages[i], get_extent,
>> +                                       &bio, 0, &bio_flags);
>> +       pagevec_release(&pvec);
>           ^^^^^^^^^^^^^^^^^^^^^^
> and here
> 
>> +
>>         BUG_ON(!list_empty(pages));
>>         if (bio)
>>                 return submit_one_bio(READ, bio, 0, bio_flags);
> 
> you actually call pagevec_release. And I pointed out that this is not a
> simple operation (like the other pagevec_* functions just doing some
> arithmetics) -- it calls lru_add_drain(), this does lots of things with
> pagecache and LRU lists, follow the call chain from there if you don't
> believe me.
> 


well, you're totally right.  It does make some side effects.

>> and I have no idea why ext4 use 16 pages array(maybe historical
>> reasons),
> 
> sigh, I didn't say that ext4 uses 16 pointer array, quite the opposite:
> 


oh, sorry, I owe you.

>>> like ext4 does, but btrfs uses simple arrays of 16 pages, in
>>> lock_delalloc_pages, end_compressed_writeback, __unlock_for_delalloc and
>>> extent_clear_unlock_delalloc (in connection with find_get_pages).
> 
>> but IMO it is proper and natural to use pagevec to manage pages.
> 
> As you've benchmarked, the more pages one can batch here at once the
> better and I don't see why we should miss the opportunity for 2 another
> pages just because it's shorter/nicer to write it via pagevec's.
> 


Thanks for your explanation and review, I will give it a hit ASAP.

thanks,
liubo

> 
> david
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



  reply	other threads:[~2012-07-20  3:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-16 18:05 [PATCH v3] Btrfs: improve multi-thread buffer read Liu Bo
2012-07-18 11:57 ` David Sterba
2012-07-19  1:11   ` Liu Bo
2012-07-19  2:05     ` David Sterba
2012-07-19  2:31       ` Liu Bo
2012-07-20  3:36         ` David Sterba
2012-07-20  3:51           ` Liu Bo [this message]
2012-07-20 12:40         ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5008D5A8.4090000@cn.fujitsu.com \
    --to=liubo2009@cn.fujitsu.com \
    --cc=JBacik@fusionio.com \
    --cc=chris.mason@fusionio.com \
    --cc=dave@jikos.cz \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.