* [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod @ 2011-02-15 14:58 Jeff Layton 2011-02-15 15:31 ` Trond Myklebust 2011-02-15 15:53 ` Tigran Mkrtchyan 0 siblings, 2 replies; 17+ messages in thread From: Jeff Layton @ 2011-02-15 14:58 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs I recently had some of our QA people report some connectathon test failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe scheduling differences that make the race more likely?) the problem occurs more frequently on s390. The problem generally manifests itself on NFSv4 as a race where an rmdir fails because a silly-renamed file in the directory wasn't deleted in time. Looking at traces, what you usually see is the failing rmdir attempt that fails with the sillydelete of the file that prevented it very soon afterward. Silly deletes are handled via dentry_iput and in the case of a close on NFSv4, the last dentry reference is often held by the CLOSE RPC task. nfs4_do_close does the close as an async RPC task that it conditionally waits on depending on whether the close is synchronous or not. It also sets the workqueue for the task to nfsiod_workqueue. When tk_workqueue is set, the rpc_release operation is queued to that workqueue. rpc_release is where the dentry reference held by the task is put. The caller has no way to wait for that to complete, so the close(2) syscall can easily return before the rpc_release call is ever done. In some cases, that rpc_release is delayed for a long enough to prevent a subsequent rmdir of the containing directory. I believe this is a bug, or at least not ideal behavior. We should try not to have the close(2) call return in this situation until the sillydelete is done. I've been able to reproduce this more reliably by adding a 100ms sleep at the top of nfs4_free_closedata. I've not seen it "in the wild" on mainline kernels, but it seems quite possible when a machine is heavily loaded. This patch fixes this by not setting tk_workqueue in nfs4_do_close when the wait flag is set. This makes the final rpc_put_task a synchronous operation and should prevent close(2) from returning before the dentry_iput is done. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfs/nfs4proc.c | 5 ++++- 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 78936a8..4cabfea 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1988,11 +1988,14 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i .rpc_client = server->client, .rpc_message = &msg, .callback_ops = &nfs4_close_ops, - .workqueue = nfsiod_workqueue, .flags = RPC_TASK_ASYNC, }; int status = -ENOMEM; + /* rpc_release must be synchronous too if "wait" is set */ + if (!wait) + task_setup_data.workqueue = nfsiod_workqueue; + calldata = kzalloc(sizeof(*calldata), gfp_mask); if (calldata == NULL) goto out; -- 1.7.4 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-15 14:58 [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod Jeff Layton @ 2011-02-15 15:31 ` Trond Myklebust 2011-02-15 16:30 ` Jeff Layton 2011-02-15 15:53 ` Tigran Mkrtchyan 1 sibling, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-15 15:31 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Tue, 2011-02-15 at 09:58 -0500, Jeff Layton wrote: > I recently had some of our QA people report some connectathon test > failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe > scheduling differences that make the race more likely?) the problem > occurs more frequently on s390. > > The problem generally manifests itself on NFSv4 as a race where an rmdir > fails because a silly-renamed file in the directory wasn't deleted in > time. Looking at traces, what you usually see is the failing rmdir > attempt that fails with the sillydelete of the file that prevented it > very soon afterward. > > Silly deletes are handled via dentry_iput and in the case of a close on > NFSv4, the last dentry reference is often held by the CLOSE RPC task. > nfs4_do_close does the close as an async RPC task that it conditionally > waits on depending on whether the close is synchronous or not. > > It also sets the workqueue for the task to nfsiod_workqueue. When > tk_workqueue is set, the rpc_release operation is queued to that > workqueue. rpc_release is where the dentry reference held by the task is > put. The caller has no way to wait for that to complete, so the close(2) > syscall can easily return before the rpc_release call is ever done. In > some cases, that rpc_release is delayed for a long enough to prevent a > subsequent rmdir of the containing directory. > > I believe this is a bug, or at least not ideal behavior. We should try > not to have the close(2) call return in this situation until the > sillydelete is done. > > I've been able to reproduce this more reliably by adding a 100ms sleep > at the top of nfs4_free_closedata. I've not seen it "in the wild" on > mainline kernels, but it seems quite possible when a machine is heavily > loaded. > > This patch fixes this by not setting tk_workqueue in nfs4_do_close when > the wait flag is set. This makes the final rpc_put_task a synchronous > operation and should prevent close(2) from returning before the > dentry_iput is done. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfs/nfs4proc.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 78936a8..4cabfea 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -1988,11 +1988,14 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i > .rpc_client = server->client, > .rpc_message = &msg, > .callback_ops = &nfs4_close_ops, > - .workqueue = nfsiod_workqueue, > .flags = RPC_TASK_ASYNC, > }; > int status = -ENOMEM; > > + /* rpc_release must be synchronous too if "wait" is set */ > + if (!wait) > + task_setup_data.workqueue = nfsiod_workqueue; > + > calldata = kzalloc(sizeof(*calldata), gfp_mask); > if (calldata == NULL) > goto out; There is no guarantee that rpc_wait_for_completion_task() will wait until rpciod had called rpc_put_task(), in which case the above change ends up with a dput() on the rpciod workqueue, and potential deadlocks. Let's rather look at making rpc_put_task a little bit more safe, by ensuring that rpciod always calls queue_work(), and by allowing other callers to switch it off. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-15 15:31 ` Trond Myklebust @ 2011-02-15 16:30 ` Jeff Layton 2011-02-15 23:47 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-15 16:30 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Tue, 15 Feb 2011 10:31:38 -0500 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Tue, 2011-02-15 at 09:58 -0500, Jeff Layton wrote: > > I recently had some of our QA people report some connectathon test > > failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe > > scheduling differences that make the race more likely?) the problem > > occurs more frequently on s390. > > > > The problem generally manifests itself on NFSv4 as a race where an rmdir > > fails because a silly-renamed file in the directory wasn't deleted in > > time. Looking at traces, what you usually see is the failing rmdir > > attempt that fails with the sillydelete of the file that prevented it > > very soon afterward. > > > > Silly deletes are handled via dentry_iput and in the case of a close on > > NFSv4, the last dentry reference is often held by the CLOSE RPC task. > > nfs4_do_close does the close as an async RPC task that it conditionally > > waits on depending on whether the close is synchronous or not. > > > > It also sets the workqueue for the task to nfsiod_workqueue. When > > tk_workqueue is set, the rpc_release operation is queued to that > > workqueue. rpc_release is where the dentry reference held by the task is > > put. The caller has no way to wait for that to complete, so the close(2) > > syscall can easily return before the rpc_release call is ever done. In > > some cases, that rpc_release is delayed for a long enough to prevent a > > subsequent rmdir of the containing directory. > > > > I believe this is a bug, or at least not ideal behavior. We should try > > not to have the close(2) call return in this situation until the > > sillydelete is done. > > > > I've been able to reproduce this more reliably by adding a 100ms sleep > > at the top of nfs4_free_closedata. I've not seen it "in the wild" on > > mainline kernels, but it seems quite possible when a machine is heavily > > loaded. > > > > This patch fixes this by not setting tk_workqueue in nfs4_do_close when > > the wait flag is set. This makes the final rpc_put_task a synchronous > > operation and should prevent close(2) from returning before the > > dentry_iput is done. > > > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > --- > > fs/nfs/nfs4proc.c | 5 ++++- > > 1 files changed, 4 insertions(+), 1 deletions(-) > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > index 78936a8..4cabfea 100644 > > --- a/fs/nfs/nfs4proc.c > > +++ b/fs/nfs/nfs4proc.c > > @@ -1988,11 +1988,14 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i > > .rpc_client = server->client, > > .rpc_message = &msg, > > .callback_ops = &nfs4_close_ops, > > - .workqueue = nfsiod_workqueue, > > .flags = RPC_TASK_ASYNC, > > }; > > int status = -ENOMEM; > > > > + /* rpc_release must be synchronous too if "wait" is set */ > > + if (!wait) > > + task_setup_data.workqueue = nfsiod_workqueue; > > + > > calldata = kzalloc(sizeof(*calldata), gfp_mask); > > if (calldata == NULL) > > goto out; > > There is no guarantee that rpc_wait_for_completion_task() will wait > until rpciod had called rpc_put_task(), in which case the above change > ends up with a dput() on the rpciod workqueue, and potential deadlocks. > > Let's rather look at making rpc_put_task a little bit more safe, by > ensuring that rpciod always calls queue_work(), and by allowing other > callers to switch it off. > Ok, I see the potential deadlock. To make sure I understand correctly, is this what you'd like to see? 1) If tk_workqueue is set, then queue rpc_release to that. 2) If it's not, queue it to rpciod. 3) add a flag to rpc_put_task (or a variant of that function) that calls rpc_release synchronously, and have nfs4_do_close use that variant when it puts the task. ...does that sound correct? If so, then I'm not sure that it'll 100% close the race. It's still possible that the rpc_put_task call in rpc_release_task would be the last one. If so and we queue it to the workqueue, then we're back in the same situation. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-15 16:30 ` Jeff Layton @ 2011-02-15 23:47 ` Trond Myklebust 2011-02-16 14:09 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-15 23:47 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Tue, 2011-02-15 at 11:30 -0500, Jeff Layton wrote: > On Tue, 15 Feb 2011 10:31:38 -0500 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Tue, 2011-02-15 at 09:58 -0500, Jeff Layton wrote: > > > I recently had some of our QA people report some connectathon test > > > failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe > > > scheduling differences that make the race more likely?) the problem > > > occurs more frequently on s390. > > > > > > The problem generally manifests itself on NFSv4 as a race where an rmdir > > > fails because a silly-renamed file in the directory wasn't deleted in > > > time. Looking at traces, what you usually see is the failing rmdir > > > attempt that fails with the sillydelete of the file that prevented it > > > very soon afterward. > > > > > > Silly deletes are handled via dentry_iput and in the case of a close on > > > NFSv4, the last dentry reference is often held by the CLOSE RPC task. > > > nfs4_do_close does the close as an async RPC task that it conditionally > > > waits on depending on whether the close is synchronous or not. > > > > > > It also sets the workqueue for the task to nfsiod_workqueue. When > > > tk_workqueue is set, the rpc_release operation is queued to that > > > workqueue. rpc_release is where the dentry reference held by the task is > > > put. The caller has no way to wait for that to complete, so the close(2) > > > syscall can easily return before the rpc_release call is ever done. In > > > some cases, that rpc_release is delayed for a long enough to prevent a > > > subsequent rmdir of the containing directory. > > > > > > I believe this is a bug, or at least not ideal behavior. We should try > > > not to have the close(2) call return in this situation until the > > > sillydelete is done. > > > > > > I've been able to reproduce this more reliably by adding a 100ms sleep > > > at the top of nfs4_free_closedata. I've not seen it "in the wild" on > > > mainline kernels, but it seems quite possible when a machine is heavily > > > loaded. <snip> > > There is no guarantee that rpc_wait_for_completion_task() will wait > > until rpciod had called rpc_put_task(), in which case the above change > > ends up with a dput() on the rpciod workqueue, and potential deadlocks. > > > > Let's rather look at making rpc_put_task a little bit more safe, by > > ensuring that rpciod always calls queue_work(), and by allowing other > > callers to switch it off. > > > > Ok, I see the potential deadlock. To make sure I understand correctly, > is this what you'd like to see? > > 1) If tk_workqueue is set, then queue rpc_release to that. > > 2) If it's not, queue it to rpciod. > > 3) add a flag to rpc_put_task (or a variant of that function) that > calls rpc_release synchronously, and have nfs4_do_close use that > variant when it puts the task. > > ...does that sound correct? If so, then I'm not sure that it'll 100% > close the race. It's still possible that the rpc_put_task call in > rpc_release_task would be the last one. If so and we queue it to the > workqueue, then we're back in the same situation. Agreed. The problem is lack of atomicity between rpc_complete_task() and rpc_put_task(). How about something like the following? Cheers Trond 8<----------------------------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-15 23:47 ` Trond Myklebust @ 2011-02-16 14:09 ` Trond Myklebust 2011-02-16 14:26 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-16 14:09 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Tue, 2011-02-15 at 18:47 -0500, Trond Myklebust wrote: > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > index 243fc09..11b71b1 100644 > --- a/net/sunrpc/sched.c > +++ b/net/sunrpc/sched.c > @@ -252,23 +252,39 @@ static void rpc_set_active(struct rpc_task *task) > > /* > * Mark an RPC call as having completed by clearing the 'active' bit > + * and then waking up all tasks that were sleeping. > */ > -static void rpc_mark_complete_task(struct rpc_task *task) > +static int rpc_complete_task(struct rpc_task *task) > { > - smp_mb__before_clear_bit(); > + void *m = &task->tk_runstate; > + wait_queue_head_t *wq = bit_waitqueue(m, RPC_TASK_ACTIVE); > + struct wait_bit_key k = __WAIT_BIT_KEY_INITIALIZER(m, RPC_TASK_ACTIVE); > + unsigned long flags; > + int ret; > + > + spin_lock_irqsave(&wq->lock, flags); > clear_bit(RPC_TASK_ACTIVE, &task->tk_runstate); > - smp_mb__after_clear_bit(); > - wake_up_bit(&task->tk_runstate, RPC_TASK_ACTIVE); > + if (waitqueue_active(wq)) > + __wake_up_locked_key(wq, TASK_NORMAL, &k); > + ret = atomic_dec_and_test(&task->tk_count); > + spin_unlock_irqrestore(&wq->lock, flags); > + return ret; > } > > /* > * Allow callers to wait for completion of an RPC call > + * > + * Note the use of out_of_line_wait_on_bit() rather than wait_on_bit() > + * to enforce taking of the wq->lock and hence avoid races with > + * rpc_complete_task(). > */ > int __rpc_wait_for_completion_task(struct rpc_task *task, int (*action)(void *)) > { > + BUG_ON(!RPC_IS_ASYNC(task)); > + > if (action == NULL) > action = rpc_wait_bit_killable; > - return wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > + return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > action, TASK_KILLABLE); > } > EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task); Never mind. The above ordering scheme is broken by the fact that 'wake_bit_function' calls autoremove_wake_function, which again means that finish_wait optimises away the spin lock. Back to the drawing board... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-16 14:09 ` Trond Myklebust @ 2011-02-16 14:26 ` Trond Myklebust 2011-02-16 14:50 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-16 14:26 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Wed, 2011-02-16 at 09:09 -0500, Trond Myklebust wrote: > On Tue, 2011-02-15 at 18:47 -0500, Trond Myklebust wrote: > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > index 243fc09..11b71b1 100644 > > --- a/net/sunrpc/sched.c > > +++ b/net/sunrpc/sched.c > > @@ -252,23 +252,39 @@ static void rpc_set_active(struct rpc_task *task) > > > > /* > > * Mark an RPC call as having completed by clearing the 'active' bit > > + * and then waking up all tasks that were sleeping. > > */ > > -static void rpc_mark_complete_task(struct rpc_task *task) > > +static int rpc_complete_task(struct rpc_task *task) > > { > > - smp_mb__before_clear_bit(); > > + void *m = &task->tk_runstate; > > + wait_queue_head_t *wq = bit_waitqueue(m, RPC_TASK_ACTIVE); > > + struct wait_bit_key k = __WAIT_BIT_KEY_INITIALIZER(m, RPC_TASK_ACTIVE); > > + unsigned long flags; > > + int ret; > > + > > + spin_lock_irqsave(&wq->lock, flags); > > clear_bit(RPC_TASK_ACTIVE, &task->tk_runstate); > > - smp_mb__after_clear_bit(); > > - wake_up_bit(&task->tk_runstate, RPC_TASK_ACTIVE); > > + if (waitqueue_active(wq)) > > + __wake_up_locked_key(wq, TASK_NORMAL, &k); > > + ret = atomic_dec_and_test(&task->tk_count); > > + spin_unlock_irqrestore(&wq->lock, flags); > > + return ret; > > } > > > > /* > > * Allow callers to wait for completion of an RPC call > > + * > > + * Note the use of out_of_line_wait_on_bit() rather than wait_on_bit() > > + * to enforce taking of the wq->lock and hence avoid races with > > + * rpc_complete_task(). > > */ > > int __rpc_wait_for_completion_task(struct rpc_task *task, int (*action)(void *)) > > { > > + BUG_ON(!RPC_IS_ASYNC(task)); > > + > > if (action == NULL) > > action = rpc_wait_bit_killable; > > - return wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > > + return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > > action, TASK_KILLABLE); > > } > > EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task); > > Never mind. The above ordering scheme is broken by the fact that > 'wake_bit_function' calls autoremove_wake_function, which again means > that finish_wait optimises away the spin lock. > > Back to the drawing board... ...and after yet another cup of coffee the solution suggests itself: we just need to reorder the calls to __wake_up_locked_key and the atomic_dec_and_test in rpc_complete_task. The revised patch is appended. Cheers Trond 8<---------------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-16 14:26 ` Trond Myklebust @ 2011-02-16 14:50 ` Jeff Layton 2011-02-16 15:21 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-16 14:50 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Wed, 16 Feb 2011 09:26:13 -0500 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Wed, 2011-02-16 at 09:09 -0500, Trond Myklebust wrote: > > On Tue, 2011-02-15 at 18:47 -0500, Trond Myklebust wrote: > > > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > > > index 243fc09..11b71b1 100644 > > > --- a/net/sunrpc/sched.c > > > +++ b/net/sunrpc/sched.c > > > @@ -252,23 +252,39 @@ static void rpc_set_active(struct rpc_task *task) > > > > > > /* > > > * Mark an RPC call as having completed by clearing the 'active' bit > > > + * and then waking up all tasks that were sleeping. > > > */ > > > -static void rpc_mark_complete_task(struct rpc_task *task) > > > +static int rpc_complete_task(struct rpc_task *task) > > > { > > > - smp_mb__before_clear_bit(); > > > + void *m = &task->tk_runstate; > > > + wait_queue_head_t *wq = bit_waitqueue(m, RPC_TASK_ACTIVE); > > > + struct wait_bit_key k = __WAIT_BIT_KEY_INITIALIZER(m, RPC_TASK_ACTIVE); > > > + unsigned long flags; > > > + int ret; > > > + > > > + spin_lock_irqsave(&wq->lock, flags); > > > clear_bit(RPC_TASK_ACTIVE, &task->tk_runstate); > > > - smp_mb__after_clear_bit(); > > > - wake_up_bit(&task->tk_runstate, RPC_TASK_ACTIVE); > > > + if (waitqueue_active(wq)) > > > + __wake_up_locked_key(wq, TASK_NORMAL, &k); > > > + ret = atomic_dec_and_test(&task->tk_count); > > > + spin_unlock_irqrestore(&wq->lock, flags); > > > + return ret; > > > } > > > > > > /* > > > * Allow callers to wait for completion of an RPC call > > > + * > > > + * Note the use of out_of_line_wait_on_bit() rather than wait_on_bit() > > > + * to enforce taking of the wq->lock and hence avoid races with > > > + * rpc_complete_task(). > > > */ > > > int __rpc_wait_for_completion_task(struct rpc_task *task, int (*action)(void *)) > > > { > > > + BUG_ON(!RPC_IS_ASYNC(task)); > > > + > > > if (action == NULL) > > > action = rpc_wait_bit_killable; > > > - return wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > > > + return out_of_line_wait_on_bit(&task->tk_runstate, RPC_TASK_ACTIVE, > > > action, TASK_KILLABLE); > > > } > > > EXPORT_SYMBOL_GPL(__rpc_wait_for_completion_task); > > > > Never mind. The above ordering scheme is broken by the fact that > > 'wake_bit_function' calls autoremove_wake_function, which again means > > that finish_wait optimises away the spin lock. > > > > Back to the drawing board... > > ...and after yet another cup of coffee the solution suggests itself: we > just need to reorder the calls to __wake_up_locked_key and the > atomic_dec_and_test in rpc_complete_task. The revised patch is appended. > > Cheers > Trond > Thanks Trond, This builds, but I can't plug in the module: [ 103.540405] sunrpc: Unknown symbol __wake_up_locked_key (err 0) ...I think __wake_up_locked_key will need to be exported too. I'll do that and then test this out later today. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-16 14:50 ` Jeff Layton @ 2011-02-16 15:21 ` Trond Myklebust 2011-02-16 18:13 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-16 15:21 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Wed, 2011-02-16 at 09:50 -0500, Jeff Layton wrote: > Thanks Trond, > > This builds, but I can't plug in the module: > > [ 103.540405] sunrpc: Unknown symbol __wake_up_locked_key (err 0) > > ...I think __wake_up_locked_key will need to be exported too. I'll do > that and then test this out later today. Thanks! I've added an EXPORT_SYMBOL_GPL() for __wake_up_locked_key to the patch. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-16 15:21 ` Trond Myklebust @ 2011-02-16 18:13 ` Jeff Layton 2011-02-17 13:40 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-16 18:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Wed, 16 Feb 2011 10:21:17 -0500 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Wed, 2011-02-16 at 09:50 -0500, Jeff Layton wrote: > > Thanks Trond, > > > > This builds, but I can't plug in the module: > > > > [ 103.540405] sunrpc: Unknown symbol __wake_up_locked_key (err 0) > > > > ...I think __wake_up_locked_key will need to be exported too. I'll do > > that and then test this out later today. > > Thanks! I've added an EXPORT_SYMBOL_GPL() for __wake_up_locked_key to > the patch. > So far this patch looks good. I've been able to reproduce the problem much more reliably with this patch and running the cthon special tests: ---------------------------[snip]------------------------- diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 40381d2..0e3d75f 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1849,6 +1849,7 @@ static void nfs4_free_closedata(void *data) struct nfs4_closedata *calldata = data; struct nfs4_state_owner *sp = calldata->state->owner; + msleep(100); if (calldata->roc) pnfs_roc_release(calldata->state->inode); nfs4_put_open_state(calldata->state); ---------------------------[snip]------------------------- ...with your patch on top of that, I've not been able to reproduce the problem so far after around 20 passes. I'll plan to let the tests run this evening to make sure, but initial results are good. Cheers, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-16 18:13 ` Jeff Layton @ 2011-02-17 13:40 ` Jeff Layton 2011-02-17 15:10 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-17 13:40 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Wed, 16 Feb 2011 13:13:07 -0500 Jeff Layton <jlayton@redhat.com> wrote: > On Wed, 16 Feb 2011 10:21:17 -0500 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Wed, 2011-02-16 at 09:50 -0500, Jeff Layton wrote: > > > Thanks Trond, > > > > > > This builds, but I can't plug in the module: > > > > > > [ 103.540405] sunrpc: Unknown symbol __wake_up_locked_key (err 0) > > > > > > ...I think __wake_up_locked_key will need to be exported too. I'll do > > > that and then test this out later today. > > > > Thanks! I've added an EXPORT_SYMBOL_GPL() for __wake_up_locked_key to > > the patch. > > > > So far this patch looks good. I've been able to reproduce the problem > much more reliably with this patch and running the cthon special tests: > > ---------------------------[snip]------------------------- > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 40381d2..0e3d75f 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -1849,6 +1849,7 @@ static void nfs4_free_closedata(void *data) > struct nfs4_closedata *calldata = data; > struct nfs4_state_owner *sp = calldata->state->owner; > > + msleep(100); > if (calldata->roc) > pnfs_roc_release(calldata->state->inode); > nfs4_put_open_state(calldata->state); > ---------------------------[snip]------------------------- > > ...with your patch on top of that, I've not been able to reproduce the > problem so far after around 20 passes. I'll plan to let the tests run > this evening to make sure, but initial results are good. > Looks like it finally failed on the 39th pass: second check for lost reply on non-idempotent requests testing 50 idempotencies in directory "testdir" rmdir 1: Directory not empty special tests failed When I look in the directory (several hours after it failed), the silly-renamed file is still there: -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 ...so I'm not sure what exactly is wrong yet, but it looks like the silly delete just never happened. Maybe there's a dentry refcount leak of some sort? There are no queued RPC's. I'll keep looking at it but if you have ideas as to what it could be, let me know. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-17 13:40 ` Jeff Layton @ 2011-02-17 15:10 ` Jeff Layton 2011-02-17 19:47 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-17 15:10 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Thu, 17 Feb 2011 08:40:52 -0500 Jeff Layton <jlayton@redhat.com> wrote: > > Looks like it finally failed on the 39th pass: > > second check for lost reply on non-idempotent requests > testing 50 idempotencies in directory "testdir" > rmdir 1: Directory not empty > special tests failed > > When I look in the directory (several hours after it failed), the > silly-renamed file is still there: > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > ...so I'm not sure what exactly is wrong yet, but it looks like the > silly delete just never happened. Maybe there's a dentry refcount leak > of some sort? There are no queued RPC's. > > I'll keep looking at it but if you have ideas as to what it could be, > let me know. > I walked down the directory tree in crash on the live kernel and found the dentry. The d_count is 0x0, so I'm not clear on why it didn't get cleaned up: crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 d_flags = 0xc000, d_count = 0x0, d_name = { hash = 0xe08ab5c8, len = 0x1c, name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" }, The d_flags are: #define DCACHE_OP_REVALIDATE 0x4000 #define DCACHE_OP_DELETE 0x8000 ...very odd. I'd have expected to see this one set too: #define DCACHE_NFSFS_RENAMED 0x0002 I suppose the async sillyrename call could have failed and we ended up calling nfs_cancel_async_unlink? I'll stick in some printk's around that area and see if I can figure out what's going on... -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-17 15:10 ` Jeff Layton @ 2011-02-17 19:47 ` Trond Myklebust 2011-02-17 21:37 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-17 19:47 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Thu, 2011-02-17 at 10:10 -0500, Jeff Layton wrote: > On Thu, 17 Feb 2011 08:40:52 -0500 > Jeff Layton <jlayton@redhat.com> wrote: > > > > > Looks like it finally failed on the 39th pass: > > > > second check for lost reply on non-idempotent requests > > testing 50 idempotencies in directory "testdir" > > rmdir 1: Directory not empty > > special tests failed > > > > When I look in the directory (several hours after it failed), the > > silly-renamed file is still there: > > > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > > > ...so I'm not sure what exactly is wrong yet, but it looks like the > > silly delete just never happened. Maybe there's a dentry refcount leak > > of some sort? There are no queued RPC's. > > > > I'll keep looking at it but if you have ideas as to what it could be, > > let me know. > > > > I walked down the directory tree in crash on the live kernel and found > the dentry. The d_count is 0x0, so I'm not clear on why it didn't get > cleaned up: > > crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 > d_flags = 0xc000, > d_count = 0x0, > d_name = { > hash = 0xe08ab5c8, > len = 0x1c, > name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" > }, > > > The d_flags are: > > #define DCACHE_OP_REVALIDATE 0x4000 > #define DCACHE_OP_DELETE 0x8000 > > ...very odd. I'd have expected to see this one set too: > > #define DCACHE_NFSFS_RENAMED 0x0002 > > I suppose the async sillyrename call could have failed and we ended up > calling nfs_cancel_async_unlink? I'll stick in some printk's around > that area and see if I can figure out what's going on... > Perhaps I missed a call site that needs an rpc_put_task_async()? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-17 19:47 ` Trond Myklebust @ 2011-02-17 21:37 ` Jeff Layton 2011-02-18 20:04 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-17 21:37 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Thu, 17 Feb 2011 11:47:24 -0800 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Thu, 2011-02-17 at 10:10 -0500, Jeff Layton wrote: > > On Thu, 17 Feb 2011 08:40:52 -0500 > > Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > Looks like it finally failed on the 39th pass: > > > > > > second check for lost reply on non-idempotent requests > > > testing 50 idempotencies in directory "testdir" > > > rmdir 1: Directory not empty > > > special tests failed > > > > > > When I look in the directory (several hours after it failed), the > > > silly-renamed file is still there: > > > > > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > > > > > ...so I'm not sure what exactly is wrong yet, but it looks like the > > > silly delete just never happened. Maybe there's a dentry refcount leak > > > of some sort? There are no queued RPC's. > > > > > > I'll keep looking at it but if you have ideas as to what it could be, > > > let me know. > > > > > > > I walked down the directory tree in crash on the live kernel and found > > the dentry. The d_count is 0x0, so I'm not clear on why it didn't get > > cleaned up: > > > > crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 > > d_flags = 0xc000, > > d_count = 0x0, > > d_name = { > > hash = 0xe08ab5c8, > > len = 0x1c, > > name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" > > }, > > > > > > The d_flags are: > > > > #define DCACHE_OP_REVALIDATE 0x4000 > > #define DCACHE_OP_DELETE 0x8000 > > > > ...very odd. I'd have expected to see this one set too: > > > > #define DCACHE_NFSFS_RENAMED 0x0002 > > > > I suppose the async sillyrename call could have failed and we ended up > > calling nfs_cancel_async_unlink? I'll stick in some printk's around > > that area and see if I can figure out what's going on... > > > > Perhaps I missed a call site that needs an rpc_put_task_async()? > I'm not sure that would explain what we're seeing here, unless I'm just missing something. I do know that nfs_cancel_async_unlink was never called in my latest test run, so that does not seem to be it. Another anomaly -- d_fsdata is NULL. How do we get into that state with a dentry that has been silly-renamed? FWIW, here's the entire dentry struct with my latest reproducer. Let me know if anything stands out to you... ffff880017d68e70 struct dentry { d_flags = 0xc008, d_seq = { sequence = 0x2 }, d_hash = { next = 0x0, pprev = 0xffffc900000b45d0 }, d_parent = 0xffff880017e50e70, d_name = { hash = 0xe08a8918, len = 0x1c, name = 0xffff880017d68ea8 ".nfs000000000000002d00000031" }, d_inode = 0xffff88002cec1070, d_iname = ".nfs000000000000002d00000031\000kkk", d_count = 0x0, d_lock = { { rlock = { raw_lock = { slock = 0x180018 }, magic = 0xdead4ead, owner_cpu = 0xffffffff, owner = 0xffffffffffffffff, dep_map = { key = 0xffffffff827df728, class_cache = {0x0, 0x0}, name = 0xffffffff817cb493 "&(&dentry->d_lock)->rlock", cpu = 0x1, ip = 0xffffffff811467bc } }, { __padding = "\030\000\030\000\255N\255\336\377\377\377\377kkkk\377\377\377\377\377\377\377\377", dep_map = { key = 0xffffffff827df728, class_cache = {0x0, 0x0}, name = 0xffffffff817cb493 "&(&dentry->d_lock)->rlock", cpu = 0x1, ip = 0xffffffff811467bc } } } }, d_op = 0xffffffffa01402c0, d_sb = 0xffff88003436b9f8, d_time = 0x5a5a5a5a5a5a5a65, d_fsdata = 0x0, d_lru = { next = 0xffff880017e50f38, prev = 0xffff88003436bbd0 }, d_u = { d_child = { next = 0xffff880017e50f58, prev = 0xffff880017e50f58 }, d_rcu = { next = 0xffff880017e50f58, func = 0xffff880017e50f58 } }, d_subdirs = { next = 0xffff880017d68f58, prev = 0xffff880017d68f58 }, d_alias = { next = 0xffff88002cec11d8, prev = 0xffff88002cec11d8 } } -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-17 21:37 ` Jeff Layton @ 2011-02-18 20:04 ` Jeff Layton 2011-02-18 20:54 ` Trond Myklebust 0 siblings, 1 reply; 17+ messages in thread From: Jeff Layton @ 2011-02-18 20:04 UTC (permalink / raw) To: Jeff Layton; +Cc: Trond Myklebust, linux-nfs On Thu, 17 Feb 2011 16:37:41 -0500 Jeff Layton <jlayton@redhat.com> wrote: > On Thu, 17 Feb 2011 11:47:24 -0800 > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > On Thu, 2011-02-17 at 10:10 -0500, Jeff Layton wrote: > > > On Thu, 17 Feb 2011 08:40:52 -0500 > > > Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > > > > Looks like it finally failed on the 39th pass: > > > > > > > > second check for lost reply on non-idempotent requests > > > > testing 50 idempotencies in directory "testdir" > > > > rmdir 1: Directory not empty > > > > special tests failed > > > > > > > > When I look in the directory (several hours after it failed), the > > > > silly-renamed file is still there: > > > > > > > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > > > > > > > ...so I'm not sure what exactly is wrong yet, but it looks like the > > > > silly delete just never happened. Maybe there's a dentry refcount leak > > > > of some sort? There are no queued RPC's. > > > > > > > > I'll keep looking at it but if you have ideas as to what it could be, > > > > let me know. > > > > > > > > > > I walked down the directory tree in crash on the live kernel and found > > > the dentry. The d_count is 0x0, so I'm not clear on why it didn't get > > > cleaned up: > > > > > > crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 > > > d_flags = 0xc000, > > > d_count = 0x0, > > > d_name = { > > > hash = 0xe08ab5c8, > > > len = 0x1c, > > > name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" > > > }, > > > > > > > > > The d_flags are: > > > > > > #define DCACHE_OP_REVALIDATE 0x4000 > > > #define DCACHE_OP_DELETE 0x8000 > > > > > > ...very odd. I'd have expected to see this one set too: > > > > > > #define DCACHE_NFSFS_RENAMED 0x0002 > > > > > > I suppose the async sillyrename call could have failed and we ended up > > > calling nfs_cancel_async_unlink? I'll stick in some printk's around > > > that area and see if I can figure out what's going on... > > > > > > > Perhaps I missed a call site that needs an rpc_put_task_async()? > > > > I'm not sure that would explain what we're seeing here, unless I'm just > missing something. I do know that nfs_cancel_async_unlink was never > called in my latest test run, so that does not seem to be it. > > Another anomaly -- d_fsdata is NULL. How do we get into that state with > a dentry that has been silly-renamed? FWIW, here's the entire dentry > struct with my latest reproducer. Let me know if anything stands out to > you... > My current thinking is that the problems I'm seeing here are not directly related to the bug we're trying to fix. It looks like sillyrename is still just plain broken a'la your email entitled: 2.6.38-rc2... NFS sillyrename is broken... ...I'll plan to keep an eye out for the fix from Nick, and re-test this when it hits. Thanks, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-18 20:04 ` Jeff Layton @ 2011-02-18 20:54 ` Trond Myklebust 2011-02-23 20:17 ` Jeff Layton 0 siblings, 1 reply; 17+ messages in thread From: Trond Myklebust @ 2011-02-18 20:54 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs On Fri, 2011-02-18 at 15:04 -0500, Jeff Layton wrote: > On Thu, 17 Feb 2011 16:37:41 -0500 > Jeff Layton <jlayton@redhat.com> wrote: > > > On Thu, 17 Feb 2011 11:47:24 -0800 > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > On Thu, 2011-02-17 at 10:10 -0500, Jeff Layton wrote: > > > > On Thu, 17 Feb 2011 08:40:52 -0500 > > > > Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > > > > > > > Looks like it finally failed on the 39th pass: > > > > > > > > > > second check for lost reply on non-idempotent requests > > > > > testing 50 idempotencies in directory "testdir" > > > > > rmdir 1: Directory not empty > > > > > special tests failed > > > > > > > > > > When I look in the directory (several hours after it failed), the > > > > > silly-renamed file is still there: > > > > > > > > > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > > > > > > > > > ...so I'm not sure what exactly is wrong yet, but it looks like the > > > > > silly delete just never happened. Maybe there's a dentry refcount leak > > > > > of some sort? There are no queued RPC's. > > > > > > > > > > I'll keep looking at it but if you have ideas as to what it could be, > > > > > let me know. > > > > > > > > > > > > > I walked down the directory tree in crash on the live kernel and found > > > > the dentry. The d_count is 0x0, so I'm not clear on why it didn't get > > > > cleaned up: > > > > > > > > crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 > > > > d_flags = 0xc000, > > > > d_count = 0x0, > > > > d_name = { > > > > hash = 0xe08ab5c8, > > > > len = 0x1c, > > > > name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" > > > > }, > > > > > > > > > > > > The d_flags are: > > > > > > > > #define DCACHE_OP_REVALIDATE 0x4000 > > > > #define DCACHE_OP_DELETE 0x8000 > > > > > > > > ...very odd. I'd have expected to see this one set too: > > > > > > > > #define DCACHE_NFSFS_RENAMED 0x0002 > > > > > > > > I suppose the async sillyrename call could have failed and we ended up > > > > calling nfs_cancel_async_unlink? I'll stick in some printk's around > > > > that area and see if I can figure out what's going on... > > > > > > > > > > Perhaps I missed a call site that needs an rpc_put_task_async()? > > > > > > > I'm not sure that would explain what we're seeing here, unless I'm just > > missing something. I do know that nfs_cancel_async_unlink was never > > called in my latest test run, so that does not seem to be it. > > > > Another anomaly -- d_fsdata is NULL. How do we get into that state with > > a dentry that has been silly-renamed? FWIW, here's the entire dentry > > struct with my latest reproducer. Let me know if anything stands out to > > you... > > > > My current thinking is that the problems I'm seeing here are not > directly related to the bug we're trying to fix. It looks like > sillyrename is still just plain broken a'la your email entitled: > > 2.6.38-rc2... NFS sillyrename is broken... > > ...I'll plan to keep an eye out for the fix from Nick, and re-test this > when it hits. > > Thanks, Oh, I'm sorry. I wasn't aware you were testing on 2.6.38-rc... Yes, sillyrename is still broken there due to the NULL parent dentry being passed to the d_iput() callback. Nick promised us a patch, but hasn't delivered yet. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-18 20:54 ` Trond Myklebust @ 2011-02-23 20:17 ` Jeff Layton 0 siblings, 0 replies; 17+ messages in thread From: Jeff Layton @ 2011-02-23 20:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Fri, 18 Feb 2011 12:54:07 -0800 Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > On Fri, 2011-02-18 at 15:04 -0500, Jeff Layton wrote: > > On Thu, 17 Feb 2011 16:37:41 -0500 > > Jeff Layton <jlayton@redhat.com> wrote: > > > > > On Thu, 17 Feb 2011 11:47:24 -0800 > > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote: > > > > > > > On Thu, 2011-02-17 at 10:10 -0500, Jeff Layton wrote: > > > > > On Thu, 17 Feb 2011 08:40:52 -0500 > > > > > Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > > > > > > > > > > Looks like it finally failed on the 39th pass: > > > > > > > > > > > > second check for lost reply on non-idempotent requests > > > > > > testing 50 idempotencies in directory "testdir" > > > > > > rmdir 1: Directory not empty > > > > > > special tests failed > > > > > > > > > > > > When I look in the directory (several hours after it failed), the > > > > > > silly-renamed file is still there: > > > > > > > > > > > > -rw---x--x. 1 root root 30 Feb 16 15:04 .nfs000000000000002d00000090 > > > > > > > > > > > > ...so I'm not sure what exactly is wrong yet, but it looks like the > > > > > > silly delete just never happened. Maybe there's a dentry refcount leak > > > > > > of some sort? There are no queued RPC's. > > > > > > > > > > > > I'll keep looking at it but if you have ideas as to what it could be, > > > > > > let me know. > > > > > > > > > > > > > > > > I walked down the directory tree in crash on the live kernel and found > > > > > the dentry. The d_count is 0x0, so I'm not clear on why it didn't get > > > > > cleaned up: > > > > > > > > > > crash> struct dentry.d_flags,d_count,d_name 0xffff880017d46a80 > > > > > d_flags = 0xc000, > > > > > d_count = 0x0, > > > > > d_name = { > > > > > hash = 0xe08ab5c8, > > > > > len = 0x1c, > > > > > name = 0xffff880017d46ab8 ".nfs000000000000002d00000090" > > > > > }, > > > > > > > > > > > > > > > The d_flags are: > > > > > > > > > > #define DCACHE_OP_REVALIDATE 0x4000 > > > > > #define DCACHE_OP_DELETE 0x8000 > > > > > > > > > > ...very odd. I'd have expected to see this one set too: > > > > > > > > > > #define DCACHE_NFSFS_RENAMED 0x0002 > > > > > > > > > > I suppose the async sillyrename call could have failed and we ended up > > > > > calling nfs_cancel_async_unlink? I'll stick in some printk's around > > > > > that area and see if I can figure out what's going on... > > > > > > > > > > > > > Perhaps I missed a call site that needs an rpc_put_task_async()? > > > > > > > > > > I'm not sure that would explain what we're seeing here, unless I'm just > > > missing something. I do know that nfs_cancel_async_unlink was never > > > called in my latest test run, so that does not seem to be it. > > > > > > Another anomaly -- d_fsdata is NULL. How do we get into that state with > > > a dentry that has been silly-renamed? FWIW, here's the entire dentry > > > struct with my latest reproducer. Let me know if anything stands out to > > > you... > > > > > > > My current thinking is that the problems I'm seeing here are not > > directly related to the bug we're trying to fix. It looks like > > sillyrename is still just plain broken a'la your email entitled: > > > > 2.6.38-rc2... NFS sillyrename is broken... > > > > ...I'll plan to keep an eye out for the fix from Nick, and re-test this > > when it hits. > > > > Thanks, > > Oh, I'm sorry. I wasn't aware you were testing on 2.6.38-rc... > > Yes, sillyrename is still broken there due to the NULL parent dentry > being passed to the d_iput() callback. Nick promised us a patch, but > hasn't delivered yet. > Thanks. I went back and tested this patch along with my trivial patch that adds an artificial delay in the rpc_release callback. This patch seems to fix the race. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod 2011-02-15 14:58 [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod Jeff Layton 2011-02-15 15:31 ` Trond Myklebust @ 2011-02-15 15:53 ` Tigran Mkrtchyan 1 sibling, 0 replies; 17+ messages in thread From: Tigran Mkrtchyan @ 2011-02-15 15:53 UTC (permalink / raw) To: Jeff Layton, linux-nfs; +Cc: Trond Myklebust I have seen this as well, but was interpreting as problem in my server. Tigran. On 02/15/2011 03:58 PM, Jeff Layton wrote: > I recently had some of our QA people report some connectathon test > failures in RHEL5 (2.6.18-based kernel). For some odd reason (maybe > scheduling differences that make the race more likely?) the problem > occurs more frequently on s390. > > The problem generally manifests itself on NFSv4 as a race where an rmdir > fails because a silly-renamed file in the directory wasn't deleted in > time. Looking at traces, what you usually see is the failing rmdir > attempt that fails with the sillydelete of the file that prevented it > very soon afterward. > > Silly deletes are handled via dentry_iput and in the case of a close on > NFSv4, the last dentry reference is often held by the CLOSE RPC task. > nfs4_do_close does the close as an async RPC task that it conditionally > waits on depending on whether the close is synchronous or not. > > It also sets the workqueue for the task to nfsiod_workqueue. When > tk_workqueue is set, the rpc_release operation is queued to that > workqueue. rpc_release is where the dentry reference held by the task is > put. The caller has no way to wait for that to complete, so the close(2) > syscall can easily return before the rpc_release call is ever done. In > some cases, that rpc_release is delayed for a long enough to prevent a > subsequent rmdir of the containing directory. > > I believe this is a bug, or at least not ideal behavior. We should try > not to have the close(2) call return in this situation until the > sillydelete is done. > > I've been able to reproduce this more reliably by adding a 100ms sleep > at the top of nfs4_free_closedata. I've not seen it "in the wild" on > mainline kernels, but it seems quite possible when a machine is heavily > loaded. > > This patch fixes this by not setting tk_workqueue in nfs4_do_close when > the wait flag is set. This makes the final rpc_put_task a synchronous > operation and should prevent close(2) from returning before the > dentry_iput is done. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfs/nfs4proc.c | 5 ++++- > 1 files changed, 4 insertions(+), 1 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index 78936a8..4cabfea 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -1988,11 +1988,14 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, gfp_t gfp_mask, i > .rpc_client = server->client, > .rpc_message = &msg, > .callback_ops = &nfs4_close_ops, > - .workqueue = nfsiod_workqueue, > .flags = RPC_TASK_ASYNC, > }; > int status = -ENOMEM; > > + /* rpc_release must be synchronous too if "wait" is set */ > + if (!wait) > + task_setup_data.workqueue = nfsiod_workqueue; > + > calldata = kzalloc(sizeof(*calldata), gfp_mask); > if (calldata == NULL) > goto out; ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2011-02-23 20:13 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-15 14:58 [PATCH] nfs: don't queue synchronous NFSv4 close rpc_release to nfsiod Jeff Layton 2011-02-15 15:31 ` Trond Myklebust 2011-02-15 16:30 ` Jeff Layton 2011-02-15 23:47 ` Trond Myklebust 2011-02-16 14:09 ` Trond Myklebust 2011-02-16 14:26 ` Trond Myklebust 2011-02-16 14:50 ` Jeff Layton 2011-02-16 15:21 ` Trond Myklebust 2011-02-16 18:13 ` Jeff Layton 2011-02-17 13:40 ` Jeff Layton 2011-02-17 15:10 ` Jeff Layton 2011-02-17 19:47 ` Trond Myklebust 2011-02-17 21:37 ` Jeff Layton 2011-02-18 20:04 ` Jeff Layton 2011-02-18 20:54 ` Trond Myklebust 2011-02-23 20:17 ` Jeff Layton 2011-02-15 15:53 ` Tigran Mkrtchyan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).