From: Wu Fengguang <wfg@mail.ustc.edu.cn>
To: Andrew Morton <akpm@osdl.org>
Cc: Jens Axboe <axboe@suse.de>,
	torvalds@osdl.org, linux-kernel@vger.kernel.org, npiggin@suse.de,
	linux-mm@kvack.org
Subject: Re: Lockless page cache test results
Date: Fri, 28 Apr 2006 19:28:35 +0800
Message-ID: <346223668.21667@ustc.edu.cn>
Message-ID: <20060428112835.GA8072@mail.ustc.edu.cn>
In-Reply-To: <20060426131200.516cbabc.akpm@osdl.org>

On Wed, Apr 26, 2006 at 01:12:00PM -0700, Andrew Morton wrote:
> Jens Axboe <axboe@suse.de> wrote:
> >
> > With a 16-page gang lookup in splice, the top profile for the 4-client
> > case (which is now at 4GiB/sec instead of 3) are:
> > 
> > samples  %        symbol name
> > 30396    36.7217  __do_page_cache_readahead
> > 25843    31.2212  find_get_pages_contig
> > 9699     11.7174  default_idle
> 
> __do_page_cache_readahead() should use gang lookup.  We never got around to
> that, mainly because nothing really demonstrated a need.

I have been testing a patch for this for a while. The new function
looks like this:

static int
__do_page_cache_readahead(struct address_space *mapping, struct file *filp,
			pgoff_t offset, unsigned long nr_to_read)
{
	struct inode *inode = mapping->host;
	struct page *page;
	LIST_HEAD(page_pool);
	pgoff_t last_index;	/* The last page we want to read */
	pgoff_t hole_index;
	int ret = 0;
	loff_t isize = i_size_read(inode);

	last_index = ((isize - 1) >> PAGE_CACHE_SHIFT);

	if (unlikely(!isize || !nr_to_read))
		goto out;
	if (unlikely(last_index < offset))
		goto out;

	/* Clamp to the request; the second test guards against wraparound. */
	if (last_index > offset + nr_to_read - 1 &&
		offset < offset + nr_to_read)
		last_index = offset + nr_to_read - 1;

	/*
	 * Go through ranges of holes and preallocate all the absent pages.
	 */
next_hole_range:
	cond_resched();

	read_lock_irq(&mapping->tree_lock);
	hole_index = radix_tree_scan_hole(&mapping->page_tree,
					offset, last_index - offset + 1);

	if (hole_index > last_index) {	/* no more holes? */
		read_unlock_irq(&mapping->tree_lock);
		goto submit_io;
	}

	offset = radix_tree_scan_data(&mapping->page_tree, (void **)&page,
						hole_index, last_index);
	read_unlock_irq(&mapping->tree_lock);

	ddprintk("ra range %lu-%lu(%p)-%lu\n", hole_index, offset, page, last_index);

	for (;;) {
		page = page_cache_alloc_cold(mapping);
		if (!page)
			break;

		page->index = hole_index;
		list_add(&page->lru, &page_pool);
		ret++;
		BUG_ON(ret > nr_to_read);

		if (hole_index >= last_index)
			break;

		if (++hole_index >= offset)
			goto next_hole_range;
	}

submit_io:
	/*
	 * Now start the IO.  We ignore I/O errors - if the page is not
	 * uptodate then the caller will launch readpage again, and
	 * will then handle the error.
	 */
	if (ret)
		read_pages(mapping, filp, &page_pool, ret);
	BUG_ON(!list_empty(&page_pool));
out:
	return ret;
}
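
The range checks at the top of the function can be tried out in user
space. The following is only a sketch: ra_last_index(), the pgoff_t
typedef and MODEL_PAGE_SHIFT (4 KiB pages assumed) are hypothetical
stand-ins, not kernel code, and the empty-file test is done before the
shift rather than after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel type and constant. */
typedef unsigned long pgoff_t;
#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Compute the last page index a readahead request should cover.
 * Returns 1 and stores the index in *last, or returns 0 when there
 * is nothing to read (empty file, empty request, or start past EOF).
 */
static int ra_last_index(long long isize, pgoff_t offset,
			 unsigned long nr_to_read, pgoff_t *last)
{
	pgoff_t last_index;

	assert(last != NULL);

	if (!isize || !nr_to_read)
		return 0;

	/* Index of the page holding the last byte of the file. */
	last_index = (pgoff_t)((isize - 1) >> MODEL_PAGE_SHIFT);
	if (last_index < offset)
		return 0;

	/*
	 * Clamp to the end of the request. The second test is false only
	 * when offset + nr_to_read wrapped around, in which case the file
	 * end is kept as the limit.
	 */
	if (last_index > offset + nr_to_read - 1 &&
	    offset < offset + nr_to_read)
		last_index = offset + nr_to_read - 1;

	*last = last_index;
	return 1;
}
```

For a 10-page file, a 4-page request starting at index 2 ends at index
5, while a 16-page request starting at index 8 is limited to index 9 by
the file size.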

The radix_tree_scan_data()/radix_tree_scan_hole() functions called
above are more flexible than the original __lookup(). Perhaps we can
rebase radix_tree_gang_lookup() and find_get_pages_contig() on them.
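
To make the intended use of the two scan helpers concrete, here is a
user-space model of the hole-range walk. It is a sketch under
assumptions, not the patch itself: the page cache is modelled as a
plain byte array, and the return conventions of scan_hole_model() and
scan_data_model() are inferred from how the function above uses
radix_tree_scan_hole() and radix_tree_scan_data(). All names in the
block are hypothetical, and index overflow is not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: present[i] != 0 means page i is already cached. */

/*
 * Assumed contract of radix_tree_scan_hole(): return the first absent
 * index in [index, index + max_scan), or index + max_scan if none.
 */
static unsigned long scan_hole_model(const unsigned char *present,
				     unsigned long index,
				     unsigned long max_scan)
{
	unsigned long end = index + max_scan;

	while (index < end && present[index])
		index++;
	return index;
}

/*
 * Assumed contract of radix_tree_scan_data(): return the first present
 * index in [index, last_index], or last_index + 1 if none.
 */
static unsigned long scan_data_model(const unsigned char *present,
				     unsigned long index,
				     unsigned long last_index)
{
	while (index <= last_index && !present[index])
		index++;
	return index;
}

/*
 * Walk [offset, last_index] one hole range at a time and record every
 * absent index in out[]. Returns the number of indices recorded.
 */
static size_t collect_holes(const unsigned char *present,
			    unsigned long offset, unsigned long last_index,
			    unsigned long *out, size_t out_len)
{
	size_t n = 0;

	assert(offset <= last_index);
	for (;;) {
		unsigned long hole = scan_hole_model(present, offset,
						     last_index - offset + 1);
		if (hole > last_index)
			break;		/* no more holes */

		/* The hole range ends where the next cached page begins. */
		offset = scan_data_model(present, hole, last_index);
		while (hole < offset && n < out_len)
			out[n++] = hole++;
		if (offset > last_index || n == out_len)
			break;
	}
	return n;
}
```

With pages 0, 3, 4 and 6 cached out of indices 0-7, the walk finds the
hole ranges 1-2, 5 and 7, using two scans per range instead of one
lookup per page.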

If it is deemed ok, I'll clean it up and submit the patch asap.

Thanks,
Wu
