* [PATCH v2] Btrfs: improve multi-thread buffer read
@ 2012-07-12  2:13 Liu Bo
  2012-07-12 18:04 ` Chris Mason
  0 siblings, 1 reply; 4+ messages in thread
From: Liu Bo @ 2012-07-12  2:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: jbacik

While testing with my buffer read fio jobs[1], I found that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file,
and all of them race on add_to_page_cache_lru(); whichever thread
successfully puts its page into the page cache takes the responsibility
for reading that page's data.

What's more, reading a page takes some time to finish, during which the
other threads can slide in and process the rest of the pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page, since a bio has
to hold consecutive pages.  Thus, we can end up with far more bios than we
need.

Here we're going to
a) delay the real read-page step and
b) try to put more pages into the page cache first.

This way, we can make each bio hold more pages and reduce the number of
bios we need.

Here are some numbers taken from the fio results:
         w/o patch      delta      w patch
       -------------  --------  ---------------
READ:    745MB/s        +32%       987MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
---
v1->v2: if we fail to make an allocation, just fall back to the old way of
        reading the page.
 fs/btrfs/extent_io.c |   41 +++++++++++++++++++++++++++++++++++++++--
 1 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 01c21b6..5c8ab6c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
 	return ret;
 }
 
+struct pagelst {
+	struct page *page;
+	struct list_head lst;
+};
+
 int extent_readpages(struct extent_io_tree *tree,
 		     struct address_space *mapping,
 		     struct list_head *pages, unsigned nr_pages,
@@ -3557,19 +3562,51 @@ int extent_readpages(struct extent_io_tree *tree,
 	struct bio *bio = NULL;
 	unsigned page_idx;
 	unsigned long bio_flags = 0;
+	LIST_HEAD(page_pool);
+	struct pagelst *pagelst = NULL;
 
 	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
 		struct page *page = list_entry(pages->prev, struct page, lru);
+		bool delay_read = true;
 
 		prefetchw(&page->flags);
 		list_del(&page->lru);
+
+		if (!pagelst)
+			pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS);
+		if (!pagelst)
+			delay_read = false;
+
 		if (!add_to_page_cache_lru(page, mapping,
 					page->index, GFP_NOFS)) {
-			__extent_read_full_page(tree, page, get_extent,
-						&bio, 0, &bio_flags);
+			if (delay_read) {
+				pagelst->page = page;
+				list_add(&pagelst->lst, &page_pool);
+				page_cache_get(page);
+				pagelst = NULL;
+			} else {
+				__extent_read_full_page(tree, page, get_extent,
+							&bio, 0, &bio_flags);
+			}
 		}
 		page_cache_release(page);
 	}
+
+	while (!list_empty(&page_pool)) {
+		struct page *page;
+
+		pagelst = list_entry(page_pool.prev, struct pagelst, lst);
+		page = pagelst->page;
+
+		prefetchw(&page->flags);
+		__extent_read_full_page(tree, page, get_extent,
+					&bio, 0, &bio_flags);
+
+		page_cache_release(page);
+		list_del(&pagelst->lst);
+		kfree(pagelst);
+	}
+	BUG_ON(!list_empty(&page_pool));
 	BUG_ON(!list_empty(pages));
 	if (bio)
 		return submit_one_bio(READ, bio, 0, bio_flags);
-- 
1.6.5.2



* Re: [PATCH v2] Btrfs: improve multi-thread buffer read
  2012-07-12  2:13 [PATCH v2] Btrfs: improve multi-thread buffer read Liu Bo
@ 2012-07-12 18:04 ` Chris Mason
  2012-07-13  2:52   ` Liu Bo
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Mason @ 2012-07-12 18:04 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On Wed, Jul 11, 2012 at 08:13:51PM -0600, Liu Bo wrote:
> While testing with my buffer read fio jobs[1], I found that btrfs does not
> perform well enough.
> 
> Here is a scenario in fio jobs:
> 
> We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file,
> and all of them race on add_to_page_cache_lru(); whichever thread
> successfully puts its page into the page cache takes the responsibility
> for reading that page's data.
> 
> What's more, reading a page takes some time to finish, during which the
> other threads can slide in and process the rest of the pages:
> 
>      t1          t2          t3          t4
>    add Page1
>    read Page1  add Page2
>      |         read Page2  add Page3
>      |            |        read Page3  add Page4
>      |            |           |        read Page4
> -----|------------|-----------|-----------|--------
>      v            v           v           v
>     bio          bio         bio         bio
> 
> Now we have four bios, each of which holds only one page, since a bio has
> to hold consecutive pages.  Thus, we can end up with far more bios than we
> need.
> 
> Here we're going to
> a) delay the real read-page step and
> b) try to put more pages into the page cache first.
> 
> This way, we can make each bio hold more pages and reduce the number of
> bios we need.
> 
> Here are some numbers taken from the fio results:
>          w/o patch      delta      w patch
>        -------------  --------  ---------------
> READ:    745MB/s        +32%       987MB/s
> 
> [1]:
> [global]
> group_reporting
> thread
> numjobs=4
> bs=32k
> rw=read
> ioengine=sync
> directory=/mnt/btrfs/
> 
> [READ]
> filename=foobar
> size=2000M
> invalidate=1
> 
> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
> ---
> v1->v2: if we fail to make an allocation, just fall back to the old way of
>         reading the page.
>  fs/btrfs/extent_io.c |   41 +++++++++++++++++++++++++++++++++++++++--
>  1 files changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 01c21b6..5c8ab6c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
>  	return ret;
>  }
>  
> +struct pagelst {
> +	struct page *page;
> +	struct list_head lst;
> +};
> +

I like this patch, it's a big improvement for just a little timing
change.  Instead of doing the kmalloc of this struct, can you please
change it to put a pagevec on the stack?

The model would be:

add a page to the pagevec array
if pagevec full
	launch all the readpages

This lets you avoid the kmalloc, and it is closer to how we solve
similar problems in other parts of the kernel.
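
For illustration only, a rough sketch of that model, not an actual patch
from this thread: it assumes the extent_readpages() and
__extent_read_full_page() signatures visible in the diff above, and adds a
hypothetical __extent_readpages() helper that flushes one batch of pages.

/*
 * Illustrative sketch of the pagevec-on-stack model (not a real patch).
 * Needs <linux/pagevec.h>.
 */
static void __extent_readpages(struct extent_io_tree *tree,
			       struct pagevec *pvec,
			       get_extent_t get_extent,
			       struct bio **bio, unsigned long *bio_flags)
{
	int i;

	/* read every page batched so far, then empty the pagevec */
	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];

		__extent_read_full_page(tree, page, get_extent,
					bio, 0, bio_flags);
		page_cache_release(page);
	}
	pagevec_reinit(pvec);
}

int extent_readpages(struct extent_io_tree *tree,
		     struct address_space *mapping,
		     struct list_head *pages, unsigned nr_pages,
		     get_extent_t get_extent)
{
	struct bio *bio = NULL;
	unsigned long bio_flags = 0;
	struct pagevec pvec;
	unsigned page_idx;

	pagevec_init(&pvec, 0);
	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
		struct page *page = list_entry(pages->prev, struct page, lru);

		prefetchw(&page->flags);
		list_del(&page->lru);
		if (!add_to_page_cache_lru(page, mapping,
					page->index, GFP_NOFS)) {
			/* defer the read; keep a reference until it is issued */
			page_cache_get(page);
			if (!pagevec_add(&pvec, page))
				__extent_readpages(tree, &pvec, get_extent,
						   &bio, &bio_flags);
		}
		page_cache_release(page);
	}
	/* flush whatever is left over in the last, partial batch */
	__extent_readpages(tree, &pvec, get_extent, &bio, &bio_flags);

	BUG_ON(!list_empty(pages));
	if (bio)
		return submit_one_bio(READ, bio, 0, bio_flags);
	return 0;
}

pagevec_add() returns the number of free slots remaining, so each batch is
flushed exactly when the on-stack pagevec fills up; since the pages
collected back to back are usually consecutive in the file,
__extent_read_full_page() can then merge them into far fewer bios.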

Thanks!

-chris


* Re: [PATCH v2] Btrfs: improve multi-thread buffer read
  2012-07-12 18:04 ` Chris Mason
@ 2012-07-13  2:52   ` Liu Bo
  2012-07-13 10:32     ` Chris Mason
  0 siblings, 1 reply; 4+ messages in thread
From: Liu Bo @ 2012-07-13  2:52 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs@vger.kernel.org, Josef Bacik

On 07/12/2012 02:04 PM, Chris Mason wrote:

> On Wed, Jul 11, 2012 at 08:13:51PM -0600, Liu Bo wrote:
>> While testing with my buffer read fio jobs[1], I found that btrfs does not
>> perform well enough.
>>
>> Here is a scenario in fio jobs:
>>
>> We have 4 threads, "t1 t2 t3 t4", starting to buffer read the same file,
>> and all of them race on add_to_page_cache_lru(); whichever thread
>> successfully puts its page into the page cache takes the responsibility
>> for reading that page's data.
>>
>> What's more, reading a page takes some time to finish, during which the
>> other threads can slide in and process the rest of the pages:
>>
>>      t1          t2          t3          t4
>>    add Page1
>>    read Page1  add Page2
>>      |         read Page2  add Page3
>>      |            |        read Page3  add Page4
>>      |            |           |        read Page4
>> -----|------------|-----------|-----------|--------
>>      v            v           v           v
>>     bio          bio         bio         bio
>>
>> Now we have four bios, each of which holds only one page, since a bio has
>> to hold consecutive pages.  Thus, we can end up with far more bios than we
>> need.
>>
>> Here we're going to
>> a) delay the real read-page step and
>> b) try to put more pages into the page cache first.
>>
>> This way, we can make each bio hold more pages and reduce the number of
>> bios we need.
>>
>> Here are some numbers taken from the fio results:
>>          w/o patch      delta      w patch
>>        -------------  --------  ---------------
>> READ:    745MB/s        +32%       987MB/s
>>
>> [1]:
>> [global]
>> group_reporting
>> thread
>> numjobs=4
>> bs=32k
>> rw=read
>> ioengine=sync
>> directory=/mnt/btrfs/
>>
>> [READ]
>> filename=foobar
>> size=2000M
>> invalidate=1
>>
>> Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
>> ---
>> v1->v2: if we fail to make an allocation, just fall back to the old way of
>>         reading the page.
>>  fs/btrfs/extent_io.c |   41 +++++++++++++++++++++++++++++++++++++++--
>>  1 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 01c21b6..5c8ab6c 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
>>  	return ret;
>>  }
>>  
>> +struct pagelst {
>> +	struct page *page;
>> +	struct list_head lst;
>> +};
>> +
> 
> I like this patch, it's a big improvement for just a little timing
> change.  Instead of doing the kmalloc of this struct, can you please
> change it to put a pagevec on the stack?
> 
> The model would be:
> 
> add a page to the pagevec array
> if pagevec full
> 	launch all the readpages
> 
> This lets you avoid the kmalloc, and it is closer to how we solve
> similar problems in other parts of the kernel.
> 


Yeah, but there is a difference.

Actually my first attempt was to do this with struct pagevec, but a pagevec
is limited to PAGEVEC_SIZE entries, which is 14.

That means that in the worst case, we batch only 14 pages into a bio before
submitting it.

However, a bio is able to contain at most 128 pages with my devices; that's
the reason I turned to kmalloc'ing another struct.
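
(Assuming 4KB pages, that caps each batch at 14 * 4KB = 56KB, while a
128-page bio can carry 128 * 4KB = 512KB.)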

Here are some performance numbers:
          w/o patch      w pvec patch     w kmalloc patch
        -------------   --------------  ---------------
READ:      745MB/s         880MB/s          987MB/s


So what do you think about it?  I'm ok with both.

thanks,
liubo


* Re: [PATCH v2] Btrfs: improve multi-thread buffer read
  2012-07-13  2:52   ` Liu Bo
@ 2012-07-13 10:32     ` Chris Mason
  0 siblings, 0 replies; 4+ messages in thread
From: Chris Mason @ 2012-07-13 10:32 UTC (permalink / raw)
  To: Liu Bo; +Cc: Chris L. Mason, linux-btrfs@vger.kernel.org, Josef Bacik

On Thu, Jul 12, 2012 at 08:52:15PM -0600, Liu Bo wrote:
> On 07/12/2012 02:04 PM, Chris Mason wrote:
> > I like this patch, it's a big improvement for just a little timing
> > change.  Instead of doing the kmalloc of this struct, can you please
> > change it to put a pagevec on the stack?
> > 
> > The model would be:
> > 
> > add a page to the pagevec array
> > if pagevec full
> > 	launch all the readpages
> > 
> > This lets you avoid the kmalloc, and it is closer to how we solve
> > similar problems in other parts of the kernel.
> > 
> 
> 
> Yeah, but there is a difference.
> 
> Actually my first attempt was to do this with struct pagevec, but a pagevec
> is limited to PAGEVEC_SIZE entries, which is 14.
> 
> That means that in the worst case, we batch only 14 pages into a bio before
> submitting it.
> 
> However, a bio is able to contain at most 128 pages with my devices; that's
> the reason I turned to kmalloc'ing another struct.
> 
> Here are some performance numbers:
>           w/o patch      w pvec patch     w kmalloc patch
>         -------------   --------------  ---------------
> READ:      745MB/s         880MB/s          987MB/s
> 
> 
> So what do you think about it?  I'm ok with both.

Let's go with the pagevec.  This particular workload (multiple threads
hammering big reads on the same range) is very specific.  The pagevec
gets us halfway there, but won't add extra cost to any other workloads.

-chris


