linux-nfs.vger.kernel.org archive mirror
* [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod
@ 2011-02-15 14:58 Jeff Layton
  2011-02-15 15:31 ` Trond Myklebust
  2011-02-15 15:53 ` Tigran Mkrtchyan
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff Layton @ 2011-02-15 14:58 UTC
  To: Trond Myklebust; +Cc: linux-nfs

I recently had some of our QA people report some connectathon test
failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe
scheduling differences that make the race more likely?) the problem
occurs more frequently on s390.

The problem generally manifests itself on NFSv4 as a race where an rmdir
fails because a silly-renamed file in the directory wasn't deleted in
time. Looking at traces, what you usually see is the rmdir attempt
failing, with the sillydelete of the file that prevented it following
very soon afterward.
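
For illustration, the userspace sequence that exposes this kind of race
can be sketched as below. This is a guess at the shape of the failing
test, not code taken from connectathon: the helper name
unlink_open_file_then_rmdir and the file name "victim" are made up, and
the directory passed in is assumed to live on an NFSv4 mount.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from connectathon): create dir and one file in
 * it, unlink the file while it is still open, close it, then immediately
 * rmdir the directory.
 * Returns 0 on success or a negative errno value from the failing step.
 */
static int unlink_open_file_then_rmdir(const char *dir)
{
	char file[4096];
	int fd, n, err;

	assert(dir != NULL);
	n = snprintf(file, sizeof(file), "%s/victim", dir);
	if (n < 0 || (size_t)n >= sizeof(file))
		return -ENAMETOOLONG;

	if (mkdir(dir, 0700) != 0)
		return -errno;

	fd = open(file, O_CREAT | O_EXCL | O_WRONLY, 0600);
	if (fd < 0) {
		err = errno;
		rmdir(dir);
		return -err;
	}

	/* On NFS, unlinking a file that is still open causes a silly rename. */
	if (unlink(file) != 0) {
		err = errno;
		close(fd);
		return -err;
	}

	/* Last close: the client should now delete the silly-renamed file. */
	if (close(fd) != 0)
		return -errno;

	/* This is the step that fails if the sillydelete has not happened yet. */
	if (rmdir(dir) != 0)
		return -errno;
	return 0;
}
```

On a local filesystem this is expected to return 0 every time. On an
affected NFSv4 client the final rmdir is the step that would
occasionally fail.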

Silly deletes are handled via dentry_iput and in the case of a close on
NFSv4, the last dentry reference is often held by the CLOSE RPC task.
nfs4_do_close does the close as an async RPC task that it conditionally
waits on depending on whether the close is synchronous or not.

It also sets the workqueue for the task to nfsiod_workqueue. When
tk_workqueue is set, the rpc_release operation is queued to that
workqueue. rpc_release is where the dentry reference held by the task is
put. The caller has no way to wait for that to complete, so the close(2)
syscall can easily return before the rpc_release call is ever done. In
some cases, that rpc_release is delayed long enough to make a
subsequent rmdir of the containing directory fail.
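
The ordering problem described above can be shown with a small
single-threaded toy model. This is not kernel code: every toy_* name is
invented for the sketch, the one-slot pending_release pointer stands in
for the workqueue, and -ENOTEMPTY is an assumed return value for the
failed rmdir.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are kernel types or APIs. */
struct toy_dentry {
	int refcount;
	bool silly_deleted;	/* set once the last reference is dropped */
};

struct toy_task {
	struct toy_dentry *dentry;
	bool has_workqueue;	/* models tk_workqueue being non-NULL */
};

static struct toy_task *pending_release;	/* one-slot "workqueue" */

/* Models rpc_release: drops the task's dentry reference. */
static void toy_release(struct toy_task *t)
{
	if (--t->dentry->refcount == 0)
		t->dentry->silly_deleted = true;
}

/* Models the final put: release inline, or defer it to the queue. */
static void toy_put_task(struct toy_task *t)
{
	if (t->has_workqueue)
		pending_release = t;	/* caller cannot wait for this */
	else
		toy_release(t);
}

/* Models the workqueue getting around to the deferred release. */
static void toy_run_workqueue(void)
{
	if (pending_release != NULL) {
		toy_release(pending_release);
		pending_release = NULL;
	}
}

/* Models rmdir of the parent: fails while the child still exists. */
static int toy_rmdir(const struct toy_dentry *child)
{
	return child->silly_deleted ? 0 : -ENOTEMPTY;
}
```

With has_workqueue set, toy_rmdir() called right after toy_put_task()
fails, and only succeeds once toy_run_workqueue() has run. With it
clear, the release happens inside toy_put_task() and the rmdir succeeds
immediately.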

I believe this is a bug, or at least not ideal behavior. We should try
not to have the close(2) call return in this situation until the
sillydelete is done.

I've been able to reproduce this more reliably by adding a 100ms sleep
at the top of nfs4_free_closedata. I've not seen it "in the wild" on
mainline kernels, but it seems quite possible when a machine is heavily
loaded.

This patch fixes this by not setting tk_workqueue in nfs4_do_close when
the wait flag is set. This makes the final rpc_put_task a synchronous
operation and should prevent close(2) from returning before the
dentry_iput is done.
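
The effect of making the workqueue assignment conditional on "wait" can
be modelled in a few lines of userspace C. This is only a sketch of the
idea: the model_* names are hypothetical, and a boolean stands in for
the release having run by the time the close returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the close task; not a kernel type. */
struct model_close {
	bool use_workqueue;	/* models task_setup_data.workqueue being set */
	bool release_done;	/* models rpc_release having already run */
};

/* Models the patched setup step: only async closes get the workqueue. */
static struct model_close model_setup_close(bool wait)
{
	struct model_close c = { .use_workqueue = false, .release_done = false };

	if (!wait)
		c.use_workqueue = true;
	return c;
}

/* Models the final put: the release runs inline unless it was queued. */
static void model_final_put(struct model_close *c)
{
	if (!c->use_workqueue)
		c->release_done = true;
}

/* True if the release has run by the time the close call returns. */
static bool model_close_returns_after_release(bool wait)
{
	struct model_close c = model_setup_close(wait);

	model_final_put(&c);
	return c.release_done;
}
```

In this model a synchronous close (wait set) always returns after the
release, while an asynchronous one still leaves it to the queue, which
matches the intent stated above.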

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfs/nfs4proc.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 78936a8..4cabfea 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1988,11 +1988,14 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i
 		.rpc_client = server->client,
 		.rpc_message = &msg,
 		.callback_ops = &nfs4_close_ops,
-		.workqueue = nfsiod_workqueue,
 		.flags = RPC_TASK_ASYNC,
 	};
 	int status = -ENOMEM;
 
+	/* rpc_release must be synchronous too if "wait" is set */
+	if (!wait)
+		task_setup_data.workqueue = nfsiod_workqueue;
+
 	calldata = kzalloc(sizeof(*calldata), gfp_mask);
 	if (calldata == NULL)
 		goto out;
-- 
1.7.4



end of thread, other threads:[~2011-02-23 20:13 UTC | newest]

Thread overview: 17+ messages
2011-02-15 14:58 [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod Jeff Layton
2011-02-15 15:31 ` Trond Myklebust
2011-02-15 16:30   ` Jeff Layton
2011-02-15 23:47     ` Trond Myklebust
2011-02-16 14:09       ` Trond Myklebust
2011-02-16 14:26         ` Trond Myklebust
2011-02-16 14:50           ` Jeff Layton
2011-02-16 15:21             ` Trond Myklebust
2011-02-16 18:13               ` Jeff Layton
2011-02-17 13:40                 ` Jeff Layton
2011-02-17 15:10                   ` Jeff Layton
2011-02-17 19:47                     ` Trond Myklebust
2011-02-17 21:37                       ` Jeff Layton
2011-02-18 20:04                         ` Jeff Layton
2011-02-18 20:54                           ` Trond Myklebust
2011-02-23 20:17                             ` Jeff Layton
2011-02-15 15:53 ` Tigran Mkrtchyan
