Linux NFS development
* [PATCH 0/5] pgio: fix buffered write retry path
@ 2014-07-11 14:20 Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 1/5] nfs: mark nfs_page reqs with flag for extra ref Weston Andros Adamson
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

My recent pgio work added the ability to split requests into sub-page
regions, but didn't handle a few places in the writeback code where
requests are looked up by struct page and may already be split into
multiple requests.

This patchset adds a function, nfs_lock_and_join_requests, in patch
"nfs: handle multiple reqs in nfs_page_async_flush", which (see the
sketch after this list):
  - takes the page group (mutex) lock
  - looks up the head request
  - grabs the request lock for each subrequest
     - if unsuccessful, unrolls the locks taken so far and waits on that subrequest
  - removes all requests from the commit lists
  - merges the range of subrequests into the head request
  - unlinks and destroys the old subrequests.
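
In C, that flow condenses to roughly the following (a simplified sketch of
the code added in patch 4/5 - declarations, error paths, lock unrolling and
commit accounting elided):

	spin_lock(&inode->i_lock);
	head = nfs_page_find_head_request_locked(NFS_I(inode), page);
	nfs_page_group_lock(head);

	subreq = head;
	do {					/* lock every subrequest */
		if (!nfs_lock_request(subreq)) {
			/* unroll the locks taken so far, wait on subreq,
			 * then retry the lookup from the top */
		}
		subreq = subreq->wb_this_page;
	} while (subreq != head);

	subreq = head;
	do {					/* drop from write/commit lists */
		nfs_list_remove_request(subreq);
		subreq = subreq->wb_this_page;
	} while (subreq != head);

	destroy_list = head->wb_this_page;	/* unlink the subrequests */
	head->wb_this_page = head;
	head->wb_bytes = total_bytes;		/* head now covers the old range */

	nfs_page_group_unlock(head);
	spin_unlock(&inode->i_lock);
	nfs_destroy_unlinked_subrequests(destroy_list, head);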

The other patches are related fixes.

The problem showed up when mounting with wsize < PAGE_SIZE - this causes
multiple requests per page. If a commit failed, nfs_page_async_flush
would operate on just the head request, leading to a hang.
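
For a sense of the numbers (hypothetical values - any wsize below the page
size triggers the split):

	/* with 4K pages and a wsize=1024 mount, each dirty page is written
	 * as PAGE_SIZE / wsize = 4 nfs_page requests linked via wb_this_page,
	 * so a retry that only handles the head request strands the other 3 */
	unsigned int reqs_per_page = 4096 / 1024;	/* == 4 */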

The nfs_wb_page_cancel patch leverages the same function -
nfs_lock_and_join_requests cancels all operations on the page group.  I've had
a really hard time testing nfs_wb_page_cancel; I've only hit it once in weeks of
testing. Any ideas on how to reliably trigger it are appreciated - it's not
as easy as just kicking off a ton of writeback and then truncating. The one time I
did see it was with a ton of I/O on a VM with 256M of RAM that was swapping
like crazy, while restarting the server repeatedly (to get a commit verifier
mismatch).

Thanks,
 -dros


Weston Andros Adamson (5):
  nfs: mark nfs_page reqs with flag for extra ref
  nfs: nfs_page should take a ref on the head req
  nfs: change find_request to find_head_request
  nfs: handle multiple reqs in nfs_page_async_flush
  nfs: handle multiple reqs in nfs_wb_page_cancel

 fs/nfs/internal.h |   1 +
 fs/nfs/pagelist.c |  18 ++-
 fs/nfs/write.c    | 332 +++++++++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 296 insertions(+), 55 deletions(-)

-- 
1.8.5.2 (Apple Git-48)



* [PATCH 1/5] nfs: mark nfs_page reqs with flag for extra ref
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
@ 2014-07-11 14:20 ` Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 2/5] nfs: nfs_page should take a ref on the head req Weston Andros Adamson
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

Change the use of PG_INODE_REF: set it when taking an extra reference on
subrequests and take care to release it only once per request.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
---
 fs/nfs/pagelist.c | 4 +++-
 fs/nfs/write.c    | 8 ++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6e2c0bc..50e10bb 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -251,8 +251,10 @@ nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
 		/* grab extra ref if head request has extra ref from
 		 * the write/commit path to handle handoff between write
 		 * and commit lists */
-		if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags))
+		if (test_bit(PG_INODE_REF, &prev->wb_head->wb_flags)) {
+			set_bit(PG_INODE_REF, &req->wb_flags);
 			kref_get(&req->wb_kref);
+		}
 	}
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8534ee5..827b57b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -448,7 +448,9 @@ static void nfs_inode_add_request(struct inode *inode, struct nfs_page *req)
 		set_page_private(req->wb_page, (unsigned long)req);
 	}
 	nfsi->npages++;
-	set_bit(PG_INODE_REF, &req->wb_flags);
+	/* this is a head request for a page group - mark it as having an
+	 * extra reference so subrequests can follow suit */
+	WARN_ON(test_and_set_bit(PG_INODE_REF, &req->wb_flags));
 	kref_get(&req->wb_kref);
 	spin_unlock(&inode->i_lock);
 }
@@ -474,7 +476,9 @@ static void nfs_inode_remove_request(struct nfs_page *req)
 		nfsi->npages--;
 		spin_unlock(&inode->i_lock);
 	}
-	nfs_release_request(req);
+
+	if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags))
+		nfs_release_request(req);
 }
 
 static void
-- 
1.8.5.2 (Apple Git-48)



* [PATCH 2/5] nfs: nfs_page should take a ref on the head req
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 1/5] nfs: mark nfs_page reqs with flag for extra ref Weston Andros Adamson
@ 2014-07-11 14:20 ` Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 3/5] nfs: change find_request to find_head_request Weston Andros Adamson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

nfs_pages that aren't the head of a group must take a reference on the
head as long as ->wb_head is set to it. This stops the head from hitting
a refcount of 0 while there is still an active nfs_page in the page group.

This avoids kref warnings in the writeback code when the page group head
is found and referenced.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
---
 fs/nfs/pagelist.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 50e10bb..8b074da 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -239,15 +239,21 @@ nfs_page_group_init(struct nfs_page *req, struct nfs_page *prev)
 	WARN_ON_ONCE(prev == req);
 
 	if (!prev) {
+		/* a head request */
 		req->wb_head = req;
 		req->wb_this_page = req;
 	} else {
+		/* a subrequest */
 		WARN_ON_ONCE(prev->wb_this_page != prev->wb_head);
 		WARN_ON_ONCE(!test_bit(PG_HEADLOCK, &prev->wb_head->wb_flags));
 		req->wb_head = prev->wb_head;
 		req->wb_this_page = prev->wb_this_page;
 		prev->wb_this_page = req;
 
+		/* All subrequests take a ref on the head request until
+		 * nfs_page_group_destroy is called */
+		kref_get(&req->wb_head->wb_kref);
+
 		/* grab extra ref if head request has extra ref from
 		 * the write/commit path to handle handoff between write
 		 * and commit lists */
@@ -271,6 +277,10 @@ nfs_page_group_destroy(struct kref *kref)
 	struct nfs_page *req = container_of(kref, struct nfs_page, wb_kref);
 	struct nfs_page *tmp, *next;
 
+	/* subrequests must release the ref on the head request */
+	if (req->wb_head != req)
+		nfs_release_request(req->wb_head);
+
 	if (!nfs_page_group_sync_on_bit(req, PG_TEARDOWN))
 		return;
 
-- 
1.8.5.2 (Apple Git-48)



* [PATCH 3/5] nfs: change find_request to find_head_request
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 1/5] nfs: mark nfs_page reqs with flag for extra ref Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 2/5] nfs: nfs_page should take a ref on the head req Weston Andros Adamson
@ 2014-07-11 14:20 ` Weston Andros Adamson
  2014-07-11 14:20 ` [PATCH 4/5] nfs: handle multiple reqs in nfs_page_async_flush Weston Andros Adamson
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

The nfs_page_find_request_locked* functions should find the head request
for a given page. Rename the functions and add comments to make this clear,
and fix a bug where a subrequest could be returned when page_private isn't
set on the page.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
---
 fs/nfs/write.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 827b57b..2e2b9f1 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -91,8 +91,15 @@ static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
 }
 
+/*
+ * nfs_page_find_head_request_locked - find head request associated with @page
+ *
+ * must be called while holding the inode lock.
+ *
+ * returns matching head request with reference held, or NULL if not found.
+ */
 static struct nfs_page *
-nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
+nfs_page_find_head_request_locked(struct nfs_inode *nfsi, struct page *page)
 {
 	struct nfs_page *req = NULL;
 
@@ -104,25 +111,33 @@ nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
 		/* Linearly search the commit list for the correct req */
 		list_for_each_entry_safe(freq, t, &nfsi->commit_info.list, wb_list) {
 			if (freq->wb_page == page) {
-				req = freq;
+				req = freq->wb_head;
 				break;
 			}
 		}
 	}
 
-	if (req)
+	if (req) {
+		WARN_ON_ONCE(req->wb_head != req);
+
 		kref_get(&req->wb_kref);
+	}
 
 	return req;
 }
 
-static struct nfs_page *nfs_page_find_request(struct page *page)
+/*
+ * nfs_page_find_head_request - find head request associated with @page
+ *
+ * returns matching head request with reference held, or NULL if not found.
+ */
+static struct nfs_page *nfs_page_find_head_request(struct page *page)
 {
 	struct inode *inode = page_file_mapping(page)->host;
 	struct nfs_page *req = NULL;
 
 	spin_lock(&inode->i_lock);
-	req = nfs_page_find_request_locked(NFS_I(inode), page);
+	req = nfs_page_find_head_request_locked(NFS_I(inode), page);
 	spin_unlock(&inode->i_lock);
 	return req;
 }
@@ -282,7 +297,7 @@ static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblo
 
 	spin_lock(&inode->i_lock);
 	for (;;) {
-		req = nfs_page_find_request_locked(NFS_I(inode), page);
+		req = nfs_page_find_head_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			break;
 		if (nfs_lock_request(req))
@@ -767,7 +782,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 	spin_lock(&inode->i_lock);
 
 	for (;;) {
-		req = nfs_page_find_request_locked(NFS_I(inode), page);
+		req = nfs_page_find_head_request_locked(NFS_I(inode), page);
 		if (req == NULL)
 			goto out_unlock;
 
@@ -875,7 +890,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 	 * dropped page.
 	 */
 	do {
-		req = nfs_page_find_request(page);
+		req = nfs_page_find_head_request(page);
 		if (req == NULL)
 			return 0;
 		l_ctx = req->wb_lock_context;
@@ -1561,7 +1576,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 
 	for (;;) {
 		wait_on_page_writeback(page);
-		req = nfs_page_find_request(page);
+		req = nfs_page_find_head_request(page);
 		if (req == NULL)
 			break;
 		if (nfs_lock_request(req)) {
-- 
1.8.5.2 (Apple Git-48)



* [PATCH 4/5] nfs: handle multiple reqs in nfs_page_async_flush
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
                   ` (2 preceding siblings ...)
  2014-07-11 14:20 ` [PATCH 3/5] nfs: change find_request to find_head_request Weston Andros Adamson
@ 2014-07-11 14:20 ` Weston Andros Adamson
  2014-07-12 21:39   ` Trond Myklebust
  2014-07-11 14:20 ` [PATCH 5/5] nfs: handle multiple reqs in nfs_wb_page_cancel Weston Andros Adamson
  2015-06-19 10:11 ` [PATCH 0/5] pgio: fix buffered write retry path Benjamin Coddington
  5 siblings, 1 reply; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

Change nfs_find_and_lock_request so nfs_page_async_flush can handle multiple
requests in a page. There is only one request for a page the first time
nfs_page_async_flush is called, but if a write or commit fails, async_flush
is called again and there may be multiple requests associated with the page.
The solution is to merge all the requests in a page group into a single
request before calling nfs_pageio_add_request.

Rename nfs_find_and_lock_request to nfs_lock_and_join_requests and
change it to first lock all requests for the page, then cancel and merge
all subrequests into the head request.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
---
 fs/nfs/internal.h |   1 +
 fs/nfs/pagelist.c |   4 +-
 fs/nfs/write.c    | 254 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 234 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index da36257..2f19e83 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -244,6 +244,7 @@ void nfs_pgio_data_destroy(struct nfs_pgio_header *);
 int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
 int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
 		      const struct rpc_call_ops *, int, int);
+void nfs_free_request(struct nfs_page *req);
 
 static inline void nfs_iocounter_init(struct nfs_io_counter *c)
 {
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 8b074da..a22c130 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -29,8 +29,6 @@
 static struct kmem_cache *nfs_page_cachep;
 static const struct rpc_call_ops nfs_pgio_common_ops;
 
-static void nfs_free_request(struct nfs_page *);
-
 static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
 {
 	p->npages = pagecount;
@@ -406,7 +404,7 @@ static void nfs_clear_request(struct nfs_page *req)
  *
  * Note: Should never be called with the spinlock held!
  */
-static void nfs_free_request(struct nfs_page *req)
+void nfs_free_request(struct nfs_page *req)
 {
 	WARN_ON_ONCE(req->wb_this_page != req);
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2e2b9f1..4dab432 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -46,6 +46,7 @@ static const struct rpc_call_ops nfs_commit_ops;
 static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops;
 static const struct nfs_commit_completion_ops nfs_commit_completion_ops;
 static const struct nfs_rw_ops nfs_rw_write_ops;
+static void nfs_clear_request_commit(struct nfs_page *req);
 
 static struct kmem_cache *nfs_wdata_cachep;
 static mempool_t *nfs_wdata_mempool;
@@ -289,36 +290,245 @@ static void nfs_end_page_writeback(struct nfs_page *req)
 		clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC);
 }
 
-static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock)
+
+/* nfs_page_group_clear_bits
+ *   @req - an nfs request
+ * clears all page group related bits from @req
+ */
+static void
+nfs_page_group_clear_bits(struct nfs_page *req)
+{
+	clear_bit(PG_TEARDOWN, &req->wb_flags);
+	clear_bit(PG_UNLOCKPAGE, &req->wb_flags);
+	clear_bit(PG_UPTODATE, &req->wb_flags);
+	clear_bit(PG_WB_END, &req->wb_flags);
+	clear_bit(PG_REMOVE, &req->wb_flags);
+}
+
+
+/*
+ * nfs_unroll_locks_and_wait -  unlock all newly locked reqs and wait on @req
+ *
+ * this is a helper function for nfs_lock_and_join_requests
+ *
+ * @inode - inode associated with request page group, must be holding inode lock
+ * @head  - head request of page group, must be holding head lock
+ * @req   - request that couldn't lock and needs to wait on the req bit lock
+ * @nonblock - if true, don't actually wait
+ *
+ * NOTE: this must be called holding page_group bit lock and inode spin lock
+ *       and BOTH will be released before returning.
+ *
+ * returns 0 on success, < 0 on error.
+ */
+static int
+nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
+			  struct nfs_page *req, bool nonblock)
+{
+	struct nfs_page *tmp;
+	int ret;
+
+	/* relinquish all the locks successfully grabbed this run */
+	for (tmp = head ; tmp != req; tmp = tmp->wb_this_page)
+		nfs_unlock_request(tmp);
+
+	WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
+
+	/* grab a ref on the request that will be waited on */
+	kref_get(&req->wb_kref);
+
+	nfs_page_group_unlock(head);
+	spin_unlock(&inode->i_lock);
+
+	/* release ref from nfs_page_find_head_request_locked */
+	nfs_release_request(head);
+
+	if (!nonblock)
+		ret = nfs_wait_on_request(req);
+	else
+		ret = -EAGAIN;
+	nfs_release_request(req);
+
+	return ret;
+}
+
+/*
+ * nfs_destroy_unlinked_subrequests - destroy recently unlinked subrequests
+ *
+ * @destroy_list - request list (using wb_this_page) terminated by @old_head
+ * @old_head - the old head of the list
+ *
+ * All subrequests must be locked and removed from all lists, so at this point
+ * they are only "active" in this function, and possibly in nfs_wait_on_request
+ * with a reference held by some other context.
+ */
+static void
+nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
+				 struct nfs_page *old_head)
+{
+	while (destroy_list) {
+		struct nfs_page *subreq = destroy_list;
+
+		destroy_list = (subreq->wb_this_page == old_head) ?
+				   NULL : subreq->wb_this_page;
+
+		WARN_ON_ONCE(old_head != subreq->wb_head);
+
+		/* make sure old group is not used */
+		subreq->wb_head = subreq;
+		subreq->wb_this_page = subreq;
+
+		nfs_clear_request_commit(subreq);
+
+		/* subreq is now totally disconnected from page group or any
+		 * write / commit lists. last chance to wake any waiters */
+		nfs_unlock_request(subreq);
+
+		if (!test_bit(PG_TEARDOWN, &subreq->wb_flags)) {
+			/* release ref on old head request */
+			nfs_release_request(old_head);
+
+			nfs_page_group_clear_bits(subreq);
+
+			/* release the PG_INODE_REF reference */
+			if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags))
+				nfs_release_request(subreq);
+			else
+				WARN_ON_ONCE(1);
+		} else {
+			WARN_ON_ONCE(test_bit(PG_CLEAN, &subreq->wb_flags));
+			/* zombie requests have already released the last
+			 * reference and were waiting on the rest of the
+			 * group to complete. Since it's no longer part of a
+			 * group, simply free the request */
+			nfs_page_group_clear_bits(subreq);
+			nfs_free_request(subreq);
+		}
+	}
+}
+
+/*
+ * nfs_lock_and_join_requests - join all subreqs to the head req and return
+ *                              a locked reference, cancelling any pending
+ *                              operations for this page.
+ *
+ * @page - the page used to lookup the "page group" of nfs_page structures
+ * @nonblock - if true, don't block waiting for request locks
+ *
+ * This function joins all sub requests to the head request by first
+ * locking all requests in the group, cancelling any pending operations
+ * and finally updating the head request to cover the whole range covered by
+ * the (former) group.  All subrequests are removed from any write or commit
+ * lists, unlinked from the group and destroyed.
+ *
+ * Returns a locked, referenced pointer to the head request - which after
+ * this call is guaranteed to be the only request associated with the page.
+ * Returns NULL if no requests are found for @page, or a ERR_PTR if an
+ * error was encountered.
+ */
+static struct nfs_page *
+nfs_lock_and_join_requests(struct page *page, bool nonblock)
 {
 	struct inode *inode = page_file_mapping(page)->host;
-	struct nfs_page *req;
+	struct nfs_page *head, *subreq;
+	struct nfs_page *destroy_list = NULL;
+	unsigned int total_bytes;
 	int ret;
 
+try_again:
+	total_bytes = 0;
+
+	WARN_ON_ONCE(destroy_list);
+
 	spin_lock(&inode->i_lock);
-	for (;;) {
-		req = nfs_page_find_head_request_locked(NFS_I(inode), page);
-		if (req == NULL)
-			break;
-		if (nfs_lock_request(req))
-			break;
-		/* Note: If we hold the page lock, as is the case in nfs_writepage,
-		 *	 then the call to nfs_lock_request() will always
-		 *	 succeed provided that someone hasn't already marked the
-		 *	 request as dirty (in which case we don't care).
-		 */
+
+	/*
+	 * A reference is taken only on the head request which acts as a
+	 * reference to the whole page group - the group will not be destroyed
+	 * until the head reference is released.
+	 */
+	head = nfs_page_find_head_request_locked(NFS_I(inode), page);
+
+	if (!head) {
 		spin_unlock(&inode->i_lock);
-		if (!nonblock)
-			ret = nfs_wait_on_request(req);
-		else
-			ret = -EAGAIN;
-		nfs_release_request(req);
-		if (ret != 0)
+		return NULL;
+	}
+
+	/* lock each request in the page group */
+	nfs_page_group_lock(head);
+	subreq = head;
+	do {
+		/*
+		 * Subrequests are always contiguous, non overlapping
+		 * and in order. If not, it's a programming error.
+		 */
+		WARN_ON_ONCE(subreq->wb_offset !=
+		     (head->wb_offset + total_bytes));
+
+		/* keep track of how many bytes this group covers */
+		total_bytes += subreq->wb_bytes;
+
+		if (!nfs_lock_request(subreq)) {
+			/* releases page group bit lock and
+			 * inode spin lock and all references */
+			ret = nfs_unroll_locks_and_wait(inode, head,
+				subreq, nonblock);
+
+			if (ret == 0)
+				goto try_again;
+
 			return ERR_PTR(ret);
-		spin_lock(&inode->i_lock);
+		}
+
+		subreq = subreq->wb_this_page;
+	} while (subreq != head);
+
+	/* Now that all requests are locked, make sure they aren't on any list.
+	 * Commit list removal accounting is done after locks are dropped */
+	subreq = head;
+	do {
+		nfs_list_remove_request(subreq);
+		subreq = subreq->wb_this_page;
+	} while (subreq != head);
+
+	/* unlink subrequests from head, destroy them later */
+	if (head->wb_this_page != head) {
+		/* destroy list will be terminated by head */
+		destroy_list = head->wb_this_page;
+		head->wb_this_page = head;
+
+		/* change head request to cover whole range that
+		 * the former page group covered */
+		head->wb_bytes = total_bytes;
 	}
+
+	/*
+	 * prepare head request to be added to new pgio descriptor
+	 */
+	nfs_page_group_clear_bits(head);
+
+	/*
+	 * some part of the group was still on the inode list - otherwise
+	 * the group wouldn't be involved in async write.
+	 * grab a reference for the head request, iff it needs one.
+	 */
+	if (!test_and_set_bit(PG_INODE_REF, &head->wb_flags))
+		kref_get(&head->wb_kref);
+
+	nfs_page_group_unlock(head);
+
+	/* drop lock to clear_request_commit the head req and clean up
+	 * requests on destroy list */
 	spin_unlock(&inode->i_lock);
-	return req;
+
+	nfs_destroy_unlinked_subrequests(destroy_list, head);
+
+	/* clean up commit list state */
+	nfs_clear_request_commit(head);
+
+	/* still holds ref on head from nfs_page_find_head_request_locked
+	 * and still has lock on head from lock loop */
+	return head;
 }
 
 /*
@@ -331,7 +541,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 	struct nfs_page *req;
 	int ret = 0;
 
-	req = nfs_find_and_lock_request(page, nonblock);
+	req = nfs_lock_and_join_requests(page, nonblock);
 	if (!req)
 		goto out;
 	ret = PTR_ERR(req);
-- 
1.8.5.2 (Apple Git-48)



* [PATCH 5/5] nfs: handle multiple reqs in nfs_wb_page_cancel
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
                   ` (3 preceding siblings ...)
  2014-07-11 14:20 ` [PATCH 4/5] nfs: handle multiple reqs in nfs_page_async_flush Weston Andros Adamson
@ 2014-07-11 14:20 ` Weston Andros Adamson
  2015-06-19 10:11 ` [PATCH 0/5] pgio: fix buffered write retry path Benjamin Coddington
  5 siblings, 0 replies; 8+ messages in thread
From: Weston Andros Adamson @ 2014-07-11 14:20 UTC
  To: trond.myklebust; +Cc: linux-nfs, Weston Andros Adamson

Use nfs_lock_and_join_requests to merge all subrequests into the head request -
this cancels all subrequests and drops their references.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
---
 fs/nfs/write.c | 41 +++++++++++++++++++++--------------------
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4dab432..dac9b31c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1784,27 +1784,28 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
 	struct nfs_page *req;
 	int ret = 0;
 
-	for (;;) {
-		wait_on_page_writeback(page);
-		req = nfs_page_find_head_request(page);
-		if (req == NULL)
-			break;
-		if (nfs_lock_request(req)) {
-			nfs_clear_request_commit(req);
-			nfs_inode_remove_request(req);
-			/*
-			 * In case nfs_inode_remove_request has marked the
-			 * page as being dirty
-			 */
-			cancel_dirty_page(page, PAGE_CACHE_SIZE);
-			nfs_unlock_and_release_request(req);
-			break;
-		}
-		ret = nfs_wait_on_request(req);
-		nfs_release_request(req);
-		if (ret < 0)
-			break;
+	wait_on_page_writeback(page);
+
+	/* blocking call to cancel all requests and join to a single (head)
+	 * request */
+	req = nfs_lock_and_join_requests(page, false);
+
+	if (IS_ERR(req)) {
+		ret = PTR_ERR(req);
+	} else if (req) {
+		/* all requests from this page have been cancelled by
+		 * nfs_lock_and_join_requests, so just remove the head
+		 * request from the inode / page_private pointer and
+		 * release it */
+		nfs_inode_remove_request(req);
+		/*
+		 * In case nfs_inode_remove_request has marked the
+		 * page as being dirty
+		 */
+		cancel_dirty_page(page, PAGE_CACHE_SIZE);
+		nfs_unlock_and_release_request(req);
 	}
+
 	return ret;
 }
 
-- 
1.8.5.2 (Apple Git-48)



* Re: [PATCH 4/5] nfs: handle multiple reqs in nfs_page_async_flush
  2014-07-11 14:20 ` [PATCH 4/5] nfs: handle multiple reqs in nfs_page_async_flush Weston Andros Adamson
@ 2014-07-12 21:39   ` Trond Myklebust
  0 siblings, 0 replies; 8+ messages in thread
From: Trond Myklebust @ 2014-07-12 21:39 UTC
  To: Weston Andros Adamson; +Cc: linux-nfs

On Fri, 2014-07-11 at 10:20 -0400, Weston Andros Adamson wrote:
> Change nfs_find_and_lock_request so nfs_page_async_flush can handle multiple
> requests in a page. There is only one request for a page the first time
> nfs_page_async_flush is called, but if a write or commit fails, async_flush
> is called again and there may be multiple requests associated with the page.
> The solution is to merge all the requests in a page group into a single
> request before calling nfs_pageio_add_request.
> 
> Rename nfs_find_and_lock_request to nfs_lock_and_join_requests and
> change it to first lock all requests for the page, then cancel and merge
> all subrequests into the head request.
> 
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> ---
>  fs/nfs/internal.h |   1 +
>  fs/nfs/pagelist.c |   4 +-
>  fs/nfs/write.c    | 254 +++++++++++++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 234 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index da36257..2f19e83 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -244,6 +244,7 @@ void nfs_pgio_data_destroy(struct nfs_pgio_header *);
>  int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
>  int nfs_initiate_pgio(struct rpc_clnt *, struct nfs_pgio_header *,
>  		      const struct rpc_call_ops *, int, int);
> +void nfs_free_request(struct nfs_page *req);
>  
>  static inline void nfs_iocounter_init(struct nfs_io_counter *c)
>  {
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 8b074da..a22c130 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -29,8 +29,6 @@
>  static struct kmem_cache *nfs_page_cachep;
>  static const struct rpc_call_ops nfs_pgio_common_ops;
>  
> -static void nfs_free_request(struct nfs_page *);
> -
>  static bool nfs_pgarray_set(struct nfs_page_array *p, unsigned int pagecount)
>  {
>  	p->npages = pagecount;
> @@ -406,7 +404,7 @@ static void nfs_clear_request(struct nfs_page *req)
>   *
>   * Note: Should never be called with the spinlock held!
>   */
> -static void nfs_free_request(struct nfs_page *req)
> +void nfs_free_request(struct nfs_page *req)
>  {
>  	WARN_ON_ONCE(req->wb_this_page != req);
>  
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 2e2b9f1..4dab432 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -46,6 +46,7 @@ static const struct rpc_call_ops nfs_commit_ops;
>  static const struct nfs_pgio_completion_ops nfs_async_write_completion_ops;
>  static const struct nfs_commit_completion_ops nfs_commit_completion_ops;
>  static const struct nfs_rw_ops nfs_rw_write_ops;
> +static void nfs_clear_request_commit(struct nfs_page *req);
>  
>  static struct kmem_cache *nfs_wdata_cachep;
>  static mempool_t *nfs_wdata_mempool;
> @@ -289,36 +290,245 @@ static void nfs_end_page_writeback(struct nfs_page *req)
>  		clear_bdi_congested(&nfss->backing_dev_info, BLK_RW_ASYNC);
>  }
>  
> -static struct nfs_page *nfs_find_and_lock_request(struct page *page, bool nonblock)
> +
> +/* nfs_page_group_clear_bits
> + *   @req - an nfs request
> + * clears all page group related bits from @req
> + */
> +static void
> +nfs_page_group_clear_bits(struct nfs_page *req)
> +{
> +	clear_bit(PG_TEARDOWN, &req->wb_flags);
> +	clear_bit(PG_UNLOCKPAGE, &req->wb_flags);
> +	clear_bit(PG_UPTODATE, &req->wb_flags);
> +	clear_bit(PG_WB_END, &req->wb_flags);
> +	clear_bit(PG_REMOVE, &req->wb_flags);
> +}
> +
> +
> +/*
> + * nfs_unroll_locks_and_wait -  unlock all newly locked reqs and wait on @req
> + *
> + * this is a helper function for nfs_lock_and_join_requests
> + *
> + * @inode - inode associated with request page group, must be holding inode lock
> + * @head  - head request of page group, must be holding head lock
> + * @req   - request that couldn't lock and needs to wait on the req bit lock
> + * @nonblock - if true, don't actually wait
> + *
> + * NOTE: this must be called holding page_group bit lock and inode spin lock
> + *       and BOTH will be released before returning.
> + *
> + * returns 0 on success, < 0 on error.
> + */
> +static int
> +nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
> +			  struct nfs_page *req, bool nonblock)

Added a "__releases(&inode->i_lock)" in order to keep sparse happy.
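
For reference, the annotation presumably lands on the function definition,
kernel-style (illustrative placement only, not the exact committed hunk):

	static int
	nfs_unroll_locks_and_wait(struct inode *inode, struct nfs_page *head,
				  struct nfs_page *req, bool nonblock)
		__releases(&inode->i_lock)	/* sparse: lock is released inside */
	{
		/* body unchanged from the patch */
	}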

> +{
> +	struct nfs_page *tmp;
> +	int ret;
> +
> +	/* relinquish all the locks successfully grabbed this run */
> +	for (tmp = head ; tmp != req; tmp = tmp->wb_this_page)
> +		nfs_unlock_request(tmp);
> +
> +	WARN_ON_ONCE(test_bit(PG_TEARDOWN, &req->wb_flags));
> +
> +	/* grab a ref on the request that will be waited on */
> +	kref_get(&req->wb_kref);
> +
> +	nfs_page_group_unlock(head);
> +	spin_unlock(&inode->i_lock);
> +
> +	/* release ref from nfs_page_find_head_request_locked */
> +	nfs_release_request(head);
> +
> +	if (!nonblock)
> +		ret = nfs_wait_on_request(req);
> +	else
> +		ret = -EAGAIN;
> +	nfs_release_request(req);
> +
> +	return ret;
> +}
> +
> +/*
> + * nfs_destroy_unlinked_subrequests - destroy recently unlinked subrequests
> + *
> + * @destroy_list - request list (using wb_this_page) terminated by @old_head
> + * @old_head - the old head of the list
> + *
> + * All subrequests must be locked and removed from all lists, so at this point
> + * they are only "active" in this function, and possibly in nfs_wait_on_request
> + * with a reference held by some other context.
> + */
> +static void
> +nfs_destroy_unlinked_subrequests(struct nfs_page *destroy_list,
> +				 struct nfs_page *old_head)
> +{
> +	while (destroy_list) {
> +		struct nfs_page *subreq = destroy_list;
> +
> +		destroy_list = (subreq->wb_this_page == old_head) ?
> +				   NULL : subreq->wb_this_page;
> +
> +		WARN_ON_ONCE(old_head != subreq->wb_head);
> +
> +		/* make sure old group is not used */
> +		subreq->wb_head = subreq;
> +		subreq->wb_this_page = subreq;
> +
> +		nfs_clear_request_commit(subreq);
> +
> +		/* subreq is now totally disconnected from page group or any
> +		 * write / commit lists. last chance to wake any waiters */
> +		nfs_unlock_request(subreq);
> +
> +		if (!test_bit(PG_TEARDOWN, &subreq->wb_flags)) {
> +			/* release ref on old head request */
> +			nfs_release_request(old_head);
> +
> +			nfs_page_group_clear_bits(subreq);
> +
> +			/* release the PG_INODE_REF reference */
> +			if (test_and_clear_bit(PG_INODE_REF, &subreq->wb_flags))
> +				nfs_release_request(subreq);
> +			else
> +				WARN_ON_ONCE(1);
> +		} else {
> +			WARN_ON_ONCE(test_bit(PG_CLEAN, &subreq->wb_flags));
> +			/* zombie requests have already released the last
> +			 * reference and were waiting on the rest of the
> +			 * group to complete. Since it's no longer part of a
> +			 * group, simply free the request */
> +			nfs_page_group_clear_bits(subreq);
> +			nfs_free_request(subreq);
> +		}
> +	}
> +}
> +
> +/*
> + * nfs_lock_and_join_requests - join all subreqs to the head req and return
> + *                              a locked reference, cancelling any pending
> + *                              operations for this page.
> + *
> + * @page - the page used to lookup the "page group" of nfs_page structures
> + * @nonblock - if true, don't block waiting for request locks
> + *
> + * This function joins all sub requests to the head request by first
> + * locking all requests in the group, cancelling any pending operations
> + * and finally updating the head request to cover the whole range covered by
> + * the (former) group.  All subrequests are removed from any write or commit
> + * lists, unlinked from the group and destroyed.
> + *
> + * Returns a locked, referenced pointer to the head request - which after
> + * this call is guaranteed to be the only request associated with the page.
> + * Returns NULL if no requests are found for @page, or a ERR_PTR if an
> + * error was encountered.
> + */
> +static struct nfs_page *
> +nfs_lock_and_join_requests(struct page *page, bool nonblock)
>  {
>  	struct inode *inode = page_file_mapping(page)->host;
> -	struct nfs_page *req;
> +	struct nfs_page *head, *subreq;
> +	struct nfs_page *destroy_list = NULL;
> +	unsigned int total_bytes;
>  	int ret;
>  
> +try_again:
> +	total_bytes = 0;
> +
> +	WARN_ON_ONCE(destroy_list);
> +
>  	spin_lock(&inode->i_lock);
> -	for (;;) {
> -		req = nfs_page_find_head_request_locked(NFS_I(inode), page);
> -		if (req == NULL)
> -			break;
> -		if (nfs_lock_request(req))
> -			break;
> -		/* Note: If we hold the page lock, as is the case in nfs_writepage,
> -		 *	 then the call to nfs_lock_request() will always
> -		 *	 succeed provided that someone hasn't already marked the
> -		 *	 request as dirty (in which case we don't care).
> -		 */
> +
> +	/*
> +	 * A reference is taken only on the head request which acts as a
> +	 * reference to the whole page group - the group will not be destroyed
> +	 * until the head reference is released.
> +	 */
> +	head = nfs_page_find_head_request_locked(NFS_I(inode), page);
> +
> +	if (!head) {
>  		spin_unlock(&inode->i_lock);
> -		if (!nonblock)
> -			ret = nfs_wait_on_request(req);
> -		else
> -			ret = -EAGAIN;
> -		nfs_release_request(req);
> -		if (ret != 0)
> +		return NULL;
> +	}
> +
> +	/* lock each request in the page group */
> +	nfs_page_group_lock(head);
> +	subreq = head;
> +	do {
> +		/*
> +		 * Subrequests are always contiguous, non overlapping
> +		 * and in order. If not, it's a programming error.
> +		 */
> +		WARN_ON_ONCE(subreq->wb_offset !=
> +		     (head->wb_offset + total_bytes));
> +
> +		/* keep track of how many bytes this group covers */
> +		total_bytes += subreq->wb_bytes;
> +
> +		if (!nfs_lock_request(subreq)) {
> +			/* releases page group bit lock and
> +			 * inode spin lock and all references */
> +			ret = nfs_unroll_locks_and_wait(inode, head,
> +				subreq, nonblock);
> +
> +			if (ret == 0)
> +				goto try_again;
> +
>  			return ERR_PTR(ret);
> -		spin_lock(&inode->i_lock);
> +		}
> +
> +		subreq = subreq->wb_this_page;
> +	} while (subreq != head);
> +
> +	/* Now that all requests are locked, make sure they aren't on any list.
> +	 * Commit list removal accounting is done after locks are dropped */
> +	subreq = head;
> +	do {
> +		nfs_list_remove_request(subreq);
> +		subreq = subreq->wb_this_page;
> +	} while (subreq != head);
> +
> +	/* unlink subrequests from head, destroy them later */
> +	if (head->wb_this_page != head) {
> +		/* destroy list will be terminated by head */
> +		destroy_list = head->wb_this_page;
> +		head->wb_this_page = head;
> +
> +		/* change head request to cover whole range that
> +		 * the former page group covered */
> +		head->wb_bytes = total_bytes;
>  	}
> +
> +	/*
> +	 * prepare head request to be added to new pgio descriptor
> +	 */
> +	nfs_page_group_clear_bits(head);
> +
> +	/*
> +	 * some part of the group was still on the inode list - otherwise
> +	 * the group wouldn't be involved in async write.
> +	 * grab a reference for the head request, iff it needs one.
> +	 */
> +	if (!test_and_set_bit(PG_INODE_REF, &head->wb_flags))
> +		kref_get(&head->wb_kref);
> +
> +	nfs_page_group_unlock(head);
> +
> +	/* drop lock to clear_request_commit the head req and clean up
> +	 * requests on destroy list */
>  	spin_unlock(&inode->i_lock);
> -	return req;
> +
> +	nfs_destroy_unlinked_subrequests(destroy_list, head);
> +
> +	/* clean up commit list state */
> +	nfs_clear_request_commit(head);
> +
> +	/* still holds ref on head from nfs_page_find_head_request_locked
> +	 * and still has lock on head from lock loop */
> +	return head;
>  }
>  
>  /*
> @@ -331,7 +541,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
>  	struct nfs_page *req;
>  	int ret = 0;
>  
> -	req = nfs_find_and_lock_request(page, nonblock);
> +	req = nfs_lock_and_join_requests(page, nonblock);
>  	if (!req)
>  		goto out;
>  	ret = PTR_ERR(req);

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com




* Re: [PATCH 0/5] pgio: fix buffered write retry path
  2014-07-11 14:20 [PATCH 0/5] pgio: fix buffered write retry path Weston Andros Adamson
                   ` (4 preceding siblings ...)
  2014-07-11 14:20 ` [PATCH 5/5] nfs: handle multiple reqs in nfs_wb_page_cancel Weston Andros Adamson
@ 2015-06-19 10:11 ` Benjamin Coddington
  5 siblings, 0 replies; 8+ messages in thread
From: Benjamin Coddington @ 2015-06-19 10:11 UTC
  To: Weston Andros Adamson; +Cc: linux-nfs

On Fri, 11 Jul 2014, Weston Andros Adamson wrote:

> My recent pgio work added the ability to split requests into sub-page
> regions, but didn't handle a few places in the writeback code where
> requests are looked up by struct page and may already be split into
> multiple requests.
>
> This patchset adds a function, nfs_lock_and_join_requests, in patch
> "nfs: handle multiple reqs in nfs_page_async_flush", which:
>   - takes the page group (mutex) lock
>   - looks up the head request
>   - grabs the request lock for each subrequest
>      - if unsuccessful, unrolls the locks taken so far and waits on that subrequest
>   - removes all requests from the commit lists
>   - merges the range of subrequests into the head request
>   - unlinks and destroys the old subrequests.
>
> The other patches are related fixes.
>
> The problem showed up when mounting with wsize < PAGE_SIZE - this causes
> multiple requests per page. If a commit failed, nfs_page_async_flush
> would operate on just the head request, leading to a hang.
>
> The nfs_wb_page_cancel patch leverages the same function -
> nfs_lock_and_join_requests cancels all operations on the page group.  I've had
> a really hard time testing nfs_wb_page_cancel; I've only hit it once in weeks of
> testing. Any ideas on how to reliably trigger it are appreciated - it's not
> as easy as just kicking off a ton of writeback and then truncating. The one time I
> did see it was with a ton of I/O on a VM with 256M of RAM that was swapping
> like crazy, while restarting the server repeatedly (to get a commit verifier
> mismatch).

Hey Dros, it's a year later -- but I want to report that this set fixes a
rare race where nfs_wb_page_cancel() and nfs_commit_release_pages() both
remove the same request from an inode, resulting in an underflow in
nfs_inode->nrequests, and finally a BUG (now changed to a WARN) when the
inode is cleaned up.

The fix is that nfs_lock_and_join_requests() holds i_lock for the duration
of checking PagePrivate and trying to lock the nfs_page, and retrying or
returning NULL.  Previously, nfs_wb_page_cancel() held the i_lock while
checking PagePrivate, then released it, which allowed the request to be
removed and unlocked in nfs_commit_release_pages(); nfs_wb_page_cancel()
could then lock and remove it as well.
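
Schematically (reconstructed from the description above, timing simplified):

	/*
	 *  nfs_wb_page_cancel()             nfs_commit_release_pages()
	 *  --------------------             --------------------------
	 *  spin_lock(i_lock)
	 *  PagePrivate(page)? -> yes
	 *  spin_unlock(i_lock)
	 *                                   removes the req from the inode
	 *                                   unlocks the req
	 *  locks the req, removes it
	 *  again -> nrequests underflow
	 */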

I've had good luck exercising nfs_wb_page_cancel() by racing writes and
truncates on a mount with a small NFS page size, but there are lots of
wait_on_page_writeback() calls and friends in the path that close the race
window.  Better luck can be had using madvise with the hardware poison flag,
since that jumps you past a bunch of page invalidation checks, but I wonder
whether that is a realistic test of real-world conditions.
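
A minimal user-space sketch of that madvise trick (the flag is
MADV_HWPOISON, which needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE; the
path and size are made up and error checking is omitted):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/mnt/nfs/scratch", O_RDWR | O_CREAT, 0644);
		char *p;

		ftruncate(fd, 4096);
		p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		memset(p, 'x', 4096);		  /* dirty one page */
		madvise(p, 4096, MADV_HWPOISON);  /* poison it: invalidation skips
						     most of the writeback waits */
		munmap(p, 4096);
		close(fd);
		return 0;
	}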

Anyway, thanks for these.. I have to see what of this work I can move back
into the already long-in-the-tooth RHEL6.

Ben


