From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from mx2.fusionio.com ([66.114.96.31]:34061 "EHLO mx2.fusionio.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751660Ab2GKRVZ
	(ORCPT ); Wed, 11 Jul 2012 13:21:25 -0400
Date: Wed, 11 Jul 2012 13:21:22 -0400
From: Josef Bacik 
To: Liu Bo 
CC: "linux-btrfs@vger.kernel.org" 
Subject: Re: [PATCH RFC] Btrfs: improve multi-thread buffer read
Message-ID: <20120711172122.GI7529@localhost.localdomain>
References: <1341919679-13792-1-git-send-email-liubo2009@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
In-Reply-To: <1341919679-13792-1-git-send-email-liubo2009@cn.fujitsu.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On Tue, Jul 10, 2012 at 05:27:59AM -0600, Liu Bo wrote:
> While testing with my buffer read fio jobs[1], I found that btrfs does not
> perform well enough.
> 
> Here is a scenario from the fio jobs:
> 
> We have 4 threads, "t1 t2 t3 t4", all starting buffered reads on the same
> file.  They all race on add_to_page_cache_lru(), and whichever thread
> successfully puts its page into the page cache takes the responsibility
> for reading that page's data.
> 
> What's more, reading a page takes some time to finish, during which the
> other threads can slide in and claim the remaining pages:
> 
>      t1            t2          t3          t4
>   add Page1
>   read Page1  add Page2
>      |        read Page2  add Page3
>      |            |       read Page3  add Page4
>      |            |           |       read Page4
> -----|------------|-----------|-----------|--------
>      v            v           v           v
>     bio          bio         bio         bio
> 
> Now we have four bios, each of which holds only one page, since a bio may
> only hold consecutive pages.  Thus, we can end up with far more bios than
> we need.
> 
> Here we're going to
> a) delay the real read-page section and
> b) try to put more pages into the page cache first.
> 
> That way, each bio can hold more pages, and we reduce the number of bios
> we need.
> 
> Here are some numbers taken from the fio results:
>          w/o patch             w patch
> -------------  --------  ---------------
> READ:    745MB/s    +32%    987MB/s
> 
> [1]:
> [global]
> group_reporting
> thread
> numjobs=4
> bs=32k
> rw=read
> ioengine=sync
> directory=/mnt/btrfs/
> 
> [READ]
> filename=foobar
> size=2000M
> invalidate=1
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/extent_io.c |   37 +++++++++++++++++++++++++++++++++++--
>  1 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 01c21b6..8f9c18d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
>  	return ret;
>  }
>  
> +struct pagelst {
> +	struct page *page;
> +	struct list_head lst;
> +};
> +
>  int extent_readpages(struct extent_io_tree *tree,
>  		     struct address_space *mapping,
>  		     struct list_head *pages, unsigned nr_pages,
> @@ -3557,19 +3562,47 @@ int extent_readpages(struct extent_io_tree *tree,
>  	struct bio *bio = NULL;
>  	unsigned page_idx;
>  	unsigned long bio_flags = 0;
> +	LIST_HEAD(page_pool);
> +	struct pagelst *pagelst = NULL;
>  
>  	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
>  		struct page *page = list_entry(pages->prev, struct page, lru);
>  
>  		prefetchw(&page->flags);
>  		list_del(&page->lru);
> +
> +		if (!pagelst)
> +			pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS);
> +
> +		if (!pagelst) {
> +			page_cache_release(page);
> +			continue;
> +		}

I'd rather not fail here if we can't make an allocation, since it's just an
optimization; just continue on like we normally would.
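Something like this, maybe (completely untested sketch, just to show what I
mean; it assumes the surrounding extent_readpages() context from your patch
and falls back to the add_to_page_cache_lru() + __extent_read_full_page()
path the unpatched loop already takes, so the page still gets read instead
of dropped):

		if (!pagelst)
			pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS);

		if (!pagelst) {
			/*
			 * No memory for the batching helper: read this page
			 * on its own, like the unpatched code does, instead
			 * of dropping it.
			 */
			if (!add_to_page_cache_lru(page, mapping,
						   page->index, GFP_NOFS))
				__extent_read_full_page(tree, page, get_extent,
							&bio, 0, &bio_flags);
			page_cache_release(page);
			continue;
		}

Thanks,

Josef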