linux-fsdevel.vger.kernel.org archive mirror
From: David Howells <dhowells@redhat.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: dhowells@redhat.com, Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, nfsv4@linux-nfs.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 21/41] CacheFiles: Permit the page lock state to be monitored [ver #48]
Date: Sat, 04 Apr 2009 12:31:40 +0100
Message-ID: <32240.1238844700@redhat.com>
In-Reply-To: <200904041709.31651.nickpiggin@yahoo.com.au>

Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> I would really like to know exactly why it *has* to be asynchronous,
> and see what the diff looks like between a simple synchronous design
> and what you have here.

Okay.  See attached patch.

Three benchmarks, all with data preloaded into the cache, and all with all the
data in memory on the server:

 (1) dd of a 100MB file.

 (2) dd of a 200MB file.

 (3) Eight parallel tars of 340MB kernel trees.

	Benchmark	Without Patch	With Patch
	1		1.971s		3.371s
	2		3.750s		7.093s
	3		18 mins		22 mins
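
For scale, the table can be reduced to slowdown factors.  The sketch below is
only illustrative arithmetic on the figures above; the helper names
slowdown() and print_slowdowns() are hypothetical and not part of the patch,
and the tar timings are only given to the nearest minute, so that ratio is
coarse.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ratio of patched to unpatched elapsed time. */
double slowdown(double without_patch, double with_patch)
{
	assert(without_patch > 0.0);
	return with_patch / without_patch;
}

/* Print the slowdown factor for each of the three benchmarks. */
void print_slowdowns(void)
{
	printf("dd 100MB: %.2fx\n", slowdown(1.971, 3.371));
	printf("dd 200MB: %.2fx\n", slowdown(3.750, 7.093));
	printf("8x tar:   %.2fx\n", slowdown(18.0, 22.0));
}
```

Calling print_slowdowns() reports roughly 1.71x, 1.89x and 1.22x, i.e. the
synchronous variant costs the dd cases most and the parallel tars least.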

With this patch, we wait for the pages to be read from the backing fs and copy
them in fscache_read_or_alloc_pages().

I'm sure there are ways to optimise things, but whilst doing a synchronous
read at this point in fscache_read_or_alloc_pages() might help a start-to-end
read as done by most programs, it's less likely to help random-access reads,
such as page-in for mmapped sections.

Note also: (1) the caller of the netfs's readpages() has already done
readahead speculation, and we should probably make best use of that, rather
than doing it again; and (2) all the metadata for where the data blocks are on
the backing fs has already been read by dint of calling bmap() to determine
whether the data blocks exist or not.

What might be cute is to have cachefiles_read_or_alloc_pages() dentry_open()
the backing file with O_DIRECT|O_NOHOLE, use ->read() to load the data into
the netfs pages directly, and then fput() the file once it has finished.

What would be even cuter is to do readv() for all the netfs pages
simultaneously, but that's impractical as each page has to be kmap()'d whilst
it is being read.
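
As a purely user-space illustration of the readv() idea (not kernel code, and
not something the patch does), the sketch below fills several separate
page-sized buffers with a single readv() call.  The function name
read_pages_vectored() and the MAX_PAGES cap are assumptions made for the
example.  In user space the buffers are always mapped, so the kmap() cost
mentioned above does not arise.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_PAGES 16	/* arbitrary cap for this sketch */

/*
 * Hypothetical helper: read consecutive data from fd's current offset into
 * npages separate buffers of page_size bytes each, using one readv() call.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_pages_vectored(int fd, void **pages, int npages,
			    size_t page_size)
{
	struct iovec iov[MAX_PAGES];
	int i;

	assert(npages > 0 && npages <= MAX_PAGES);

	for (i = 0; i < npages; i++) {
		iov[i].iov_base = pages[i];
		iov[i].iov_len = page_size;
	}

	/* one system call scatters the data across all the buffers */
	return readv(fd, iov, npages);
}
```

A short return value is possible, as with any read, so a real caller would
have to work out which buffers were filled completely.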

David
---
From: David Howells <dhowells@redhat.com>
Subject: [PATCH] CacheFiles: Sync read demo for Nick Piggin


---

 fs/cachefiles/rdwr.c |  198 +++++++-------------------------------------------
 1 files changed, 28 insertions(+), 170 deletions(-)


diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index a69787e..d99e43d 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -14,53 +14,6 @@
 #include "internal.h"
 
 /*
- * detect wake up events generated by the unlocking of pages in which we're
- * interested
- * - we use this to detect read completion of backing pages
- * - the caller holds the waitqueue lock
- */
-static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode,
-				  int sync, void *_key)
-{
-	struct cachefiles_one_read *monitor =
-		container_of(wait, struct cachefiles_one_read, monitor);
-	struct cachefiles_object *object;
-	struct wait_bit_key *key = _key;
-	struct page *page = wait->private;
-
-	ASSERT(key);
-
-	_enter("{%lu},%u,%d,{%p,%u}",
-	       monitor->netfs_page->index, mode, sync,
-	       key->flags, key->bit_nr);
-
-	if (key->flags != &page->flags ||
-	    key->bit_nr != PG_locked)
-		return 0;
-
-	_debug("--- monitor %p %lx ---", page, page->flags);
-
-	if (!PageUptodate(page) && !PageError(page))
-		dump_stack();
-
-	/* remove from the waitqueue */
-	list_del(&wait->task_list);
-
-	/* move onto the action list and queue for FS-Cache thread pool */
-	ASSERT(monitor->op);
-
-	object = container_of(monitor->op->op.object,
-			      struct cachefiles_object, fscache);
-
-	spin_lock(&object->work_lock);
-	list_add_tail(&monitor->op_link, &monitor->op->to_do);
-	spin_unlock(&object->work_lock);
-
-	fscache_enqueue_retrieval(monitor->op);
-	return 0;
-}
-
-/*
  * copy data from backing pages to netfs pages to complete a read operation
  * - driven by FS-Cache's thread pool
  */
@@ -139,7 +92,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 					    struct page *netpage,
 					    struct pagevec *pagevec)
 {
-	struct cachefiles_one_read *monitor;
 	struct address_space *bmapping;
 	struct page *newpage, *backpage;
 	int ret;
@@ -151,15 +103,6 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 	_debug("read back %p{%lu,%d}",
 	       netpage, netpage->index, page_count(netpage));
 
-	monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
-	if (!monitor)
-		goto nomem;
-
-	monitor->netfs_page = netpage;
-	monitor->op = fscache_get_retrieval(op);
-
-	init_waitqueue_func_entry(&monitor->monitor, cachefiles_read_waiter);
-
 	/* attempt to get hold of the backing page */
 	bmapping = object->backer->d_inode->i_mapping;
 	newpage = NULL;
@@ -172,7 +115,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 		if (!newpage) {
 			newpage = page_cache_alloc_cold(bmapping);
 			if (!newpage)
-				goto nomem_monitor;
+				goto nomem;
 		}
 
 		ret = add_to_page_cache(newpage, bmapping,
@@ -200,29 +143,12 @@ read_backing_page:
 	if (ret < 0)
 		goto read_error;
 
-	/* set the monitor to transfer the data across */
-monitor_backing_page:
-	_debug("- monitor add");
-
-	/* install the monitor */
-	page_cache_get(monitor->netfs_page);
-	page_cache_get(backpage);
-	monitor->back_page = backpage;
-	monitor->monitor.private = backpage;
-	add_page_wait_queue(backpage, &monitor->monitor);
-	monitor = NULL;
-
-	/* but the page may have been read before the monitor was installed, so
-	 * the monitor may miss the event - so we have to ensure that we do get
-	 * one in such a case */
-	if (trylock_page(backpage)) {
-		_debug("jumpstart %p {%lx}", backpage, backpage->flags);
-		unlock_page(backpage);
-	}
-	goto success;
+	/* wait for the page read to be complete */
+	wait_on_page_locked(backpage);
+	goto use_backing_page;
 
-	/* if the backing page is already present, it can be in one of
-	 * three states: read in progress, read failed or read okay */
+	/* if the backing page is already present, it can be in one of three
+	 * states: read in progress, read failed or read okay */
 backing_page_already_present:
 	_debug("- present");
 
@@ -231,14 +157,19 @@ backing_page_already_present:
 		newpage = NULL;
 	}
 
+use_backing_page:
 	if (PageError(backpage))
 		goto io_error;
 
 	if (PageUptodate(backpage))
 		goto backing_page_already_uptodate;
 
-	if (!trylock_page(backpage))
-		goto monitor_backing_page;
+	lock_page(backpage);
+	if (PageError(backpage) || PageUptodate(backpage)) {
+		unlock_page(backpage);
+		goto use_backing_page;
+	}
+
 	_debug("read %p {%lx}", backpage, backpage->flags);
 	goto read_backing_page;
 
@@ -252,18 +183,12 @@ backing_page_already_uptodate:
 
 	copy_highpage(netpage, backpage);
 	fscache_end_io(op, netpage, 0);
-
-success:
 	_debug("success");
 	ret = 0;
 
 out:
 	if (backpage)
 		page_cache_release(backpage);
-	if (monitor) {
-		fscache_put_retrieval(monitor->op);
-		kfree(monitor);
-	}
 	_leave(" = %d", ret);
 	return ret;
 
@@ -278,9 +203,6 @@ io_error:
 
 nomem_page:
 	page_cache_release(newpage);
-nomem_monitor:
-	fscache_put_retrieval(monitor->op);
-	kfree(monitor);
 nomem:
 	_leave(" = -ENOMEM");
 	return -ENOMEM;
@@ -379,7 +301,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 					struct list_head *list,
 					struct pagevec *mark_pvec)
 {
-	struct cachefiles_one_read *monitor = NULL;
 	struct address_space *bmapping = object->backer->d_inode->i_mapping;
 	struct pagevec lru_pvec;
 	struct page *newpage = NULL, *netpage, *_n, *backpage = NULL;
@@ -395,16 +316,6 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		_debug("read back %p{%lu,%d}",
 		       netpage, netpage->index, page_count(netpage));
 
-		if (!monitor) {
-			monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
-			if (!monitor)
-				goto nomem;
-
-			monitor->op = fscache_get_retrieval(op);
-			init_waitqueue_func_entry(&monitor->monitor,
-						  cachefiles_read_waiter);
-		}
-
 		for (;;) {
 			backpage = find_get_page(bmapping, netpage->index);
 			if (backpage)
@@ -441,92 +352,44 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		if (ret < 0)
 			goto read_error;
 
-		/* add the netfs page to the pagecache and LRU, and set the
-		 * monitor to transfer the data across */
-	monitor_backing_page:
-		_debug("- monitor add");
-
-		ret = add_to_page_cache(netpage, op->mapping, netpage->index,
-					GFP_KERNEL);
-		if (ret < 0) {
-			if (ret == -EEXIST) {
-				page_cache_release(netpage);
-				continue;
-			}
-			goto nomem;
-		}
-
-		page_cache_get(netpage);
-		if (!pagevec_add(&lru_pvec, netpage))
-			__pagevec_lru_add_file(&lru_pvec);
-
-		/* install a monitor */
-		page_cache_get(netpage);
-		monitor->netfs_page = netpage;
-
-		page_cache_get(backpage);
-		monitor->back_page = backpage;
-		monitor->monitor.private = backpage;
-		add_page_wait_queue(backpage, &monitor->monitor);
-		monitor = NULL;
-
-		/* but the page may have been read before the monitor was
-		 * installed, so the monitor may miss the event - so we have to
-		 * ensure that we do get one in such a case */
-		if (trylock_page(backpage)) {
-			_debug("2unlock %p {%lx}", backpage, backpage->flags);
-			unlock_page(backpage);
-		}
-
-		page_cache_release(backpage);
-		backpage = NULL;
-
-		page_cache_release(netpage);
-		netpage = NULL;
-		continue;
+		wait_on_page_locked(backpage);
 
 		/* if the backing page is already present, it can be in one of
 		 * three states: read in progress, read failed or read okay */
 	backing_page_already_present:
 		_debug("- present %p", backpage);
 
+	check_backing_page:
 		if (PageError(backpage))
 			goto io_error;
 
 		if (PageUptodate(backpage))
-			goto backing_page_already_uptodate;
+			goto copy_backing_page;
 
 		_debug("- not ready %p{%lx}", backpage, backpage->flags);
 
-		if (!trylock_page(backpage))
-			goto monitor_backing_page;
-
-		if (PageError(backpage)) {
-			_debug("error %lx", backpage->flags);
+		lock_page(backpage);
+		if (PageError(backpage) || PageUptodate(backpage)) {
 			unlock_page(backpage);
-			goto io_error;
+			goto check_backing_page;
 		}
 
-		if (PageUptodate(backpage))
-			goto backing_page_already_uptodate_unlock;
-
 		/* we've locked a page that's neither up to date nor erroneous,
 		 * so we need to attempt to read it again */
 		goto reread_backing_page;
 
-		/* the backing page is already up to date, attach the netfs
-		 * page to the pagecache and LRU and copy the data across */
-	backing_page_already_uptodate_unlock:
-		_debug("uptodate %lx", backpage->flags);
-		unlock_page(backpage);
-	backing_page_already_uptodate:
-		_debug("- uptodate");
+	copy_backing_page:
+		_debug("- copy page");
 
+		/* add the netfs page to the pagecache and LRU */
 		ret = add_to_page_cache(netpage, op->mapping, netpage->index,
 					GFP_KERNEL);
 		if (ret < 0) {
 			if (ret == -EEXIST) {
 				page_cache_release(netpage);
+				page_cache_release(backpage);
+				netpage = NULL;
+				backpage = NULL;
 				continue;
 			}
 			goto nomem;
@@ -537,17 +400,16 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 		page_cache_release(backpage);
 		backpage = NULL;
 
-		if (!pagevec_add(mark_pvec, netpage))
-			fscache_mark_pages_cached(op, mark_pvec);
-
 		page_cache_get(netpage);
 		if (!pagevec_add(&lru_pvec, netpage))
 			__pagevec_lru_add_file(&lru_pvec);
 
+		if (!pagevec_add(mark_pvec, netpage))
+			fscache_mark_pages_cached(op, mark_pvec);
+
 		fscache_end_io(op, netpage, 0);
 		page_cache_release(netpage);
 		netpage = NULL;
-		continue;
 	}
 
 	netpage = NULL;
@@ -564,10 +426,6 @@ out:
 		page_cache_release(netpage);
 	if (backpage)
 		page_cache_release(backpage);
-	if (monitor) {
-		fscache_put_retrieval(op);
-		kfree(monitor);
-	}
 
 	list_for_each_entry_safe(netpage, _n, list, lru) {
 		list_del(&netpage->lru);


