* [PATCH 1/2] NLM/lockd: convert __nlm_async_call to use rpc_run_task()
[not found] ` <20080328201229.18158.52437.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-03-28 20:12 ` Trond Myklebust
2008-03-28 20:12 ` [PATCH 2/2] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel Trond Myklebust
2008-03-28 21:37 ` [PATCH 0/2] asynchronous unlock on exit Peter Staubach
2 siblings, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2008-03-28 20:12 UTC
To: Peter Staubach; +Cc: NFS list
Peter Staubach comments:
> In the course of investigating testing failures in the locking phase of
> the Connectathon testsuite, I discovered a couple of things. One was
> that one of the tests in the locking tests was racy when it didn't seem
> to need to be and two, that the NFS client asynchronously releases locks
> when a process is exiting.
...
> The Single UNIX Specification Version 3 specifies that: "All locks
> associated with a file for a given process shall be removed when a file
> descriptor for that file is closed by that process or the process holding
> that file descriptor terminates.".
>
> This does not specify whether those locks must be released prior to the
> completion of the exit processing for the process or not. However,
> general assumptions seem to be that those locks will be released. This
> leads to more deterministic behavior under normal circumstances.
The following patch converts the NFSv2/v3 locking code to use the same
mechanism as NFSv4 for sending asynchronous RPC calls and then waiting for
them to complete. This ensures that the UNLOCK and CANCEL RPC calls will
complete even if the user interrupts the call, yet satisfies the
above request for synchronous behaviour on process exit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/lockd/clntproc.c | 58 ++++++++++++++++++++++++++++++++-------------------
1 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index b6b74a6..82c9a27 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -343,10 +343,16 @@ in_grace_period:
/*
* Generic NLM call, async version.
*/
-static int __nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *msg, const struct rpc_call_ops *tk_ops)
+static struct rpc_task *__nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *msg, const struct rpc_call_ops *tk_ops)
{
struct nlm_host *host = req->a_host;
struct rpc_clnt *clnt;
+ struct rpc_task_setup task_setup_data = {
+ .rpc_message = msg,
+ .callback_ops = tk_ops,
+ .callback_data = req,
+ .flags = RPC_TASK_ASYNC,
+ };
dprintk("lockd: call procedure %d on %s (async)\n",
(int)proc, host->h_name);
@@ -356,21 +362,38 @@ static int __nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *
if (clnt == NULL)
goto out_err;
msg->rpc_proc = &clnt->cl_procinfo[proc];
+ task_setup_data.rpc_client = clnt;
/* bootstrap and kick off the async RPC call */
- return rpc_call_async(clnt, msg, RPC_TASK_ASYNC, tk_ops, req);
+ return rpc_run_task(&task_setup_data);
out_err:
tk_ops->rpc_release(req);
- return -ENOLCK;
+ return ERR_PTR(-ENOLCK);
}
+/*
+ * NLM asynchronous call.
+ *
+ * Note that although the calls are asynchronous, and are therefore
+ * guaranteed to complete, we still always attempt to wait for
+ * completion in order to be able to correctly track the lock
+ * state.
+ */
int nlm_async_call(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
{
struct rpc_message msg = {
.rpc_argp = &req->a_args,
.rpc_resp = &req->a_res,
};
- return __nlm_async_call(req, proc, &msg, tk_ops);
+ struct rpc_task *task;
+ int err;
+
+ task = __nlm_async_call(req, proc, &msg, tk_ops);
+ if (IS_ERR(task))
+ return PTR_ERR(task);
+ err = rpc_wait_for_completion_task(task);
+ rpc_put_task(task);
+ return err;
}
int nlm_async_reply(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
@@ -378,7 +401,13 @@ int nlm_async_reply(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *t
struct rpc_message msg = {
.rpc_argp = &req->a_res,
};
- return __nlm_async_call(req, proc, &msg, tk_ops);
+ struct rpc_task *task;
+
+ task = __nlm_async_call(req, proc, &msg, tk_ops);
+ if (IS_ERR(task))
+ return PTR_ERR(task);
+ rpc_put_task(task);
+ return 0;
}
/*
@@ -597,8 +626,6 @@ static int
nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
{
struct nlm_host *host = req->a_host;
- struct nlm_res *resp = &req->a_res;
- int status = 0;
/*
* Note: the server is supposed to either grant us the unlock
@@ -613,23 +640,10 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
}
up_read(&host->h_rwsem);
- if (req->a_flags & RPC_TASK_ASYNC)
- return nlm_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
-
- status = nlmclnt_call(req, NLMPROC_UNLOCK);
- if (status < 0)
- goto out;
-
- if (resp->status == nlm_granted)
- goto out;
-
- if (resp->status != nlm_lck_denied_nolocks)
- printk("lockd: unexpected unlock status: %d\n", resp->status);
- /* What to do now? I'm out of my depth... */
- status = -ENOLCK;
+ return nlm_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
out:
nlm_release_call(req);
- return status;
+ return 0;
}
static void nlmclnt_unlock_callback(struct rpc_task *task, void *data)
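
The return-type change in __nlm_async_call() above relies on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom, which folds a negative errno into the returned pointer so a single return value can carry either a task or an error. The following is only a userspace sketch of that idiom, not kernel code: the three helpers imitate the ones in the kernel's include/linux/err.h, and every demo_* name is hypothetical and stands in for the real RPC task machinery.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitation of the kernel helpers (assumption: the top
 * 4095 addresses are never valid object pointers). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct rpc_task. */
struct demo_task {
	int status;
};

/* Hypothetical stand-in for __nlm_async_call(): returns a task on
 * success, or an errno encoded as a pointer on failure. */
static struct demo_task *demo_start(int have_client)
{
	struct demo_task *task;

	if (!have_client)
		return ERR_PTR(-ENOLCK);

	task = malloc(sizeof(*task));
	if (task == NULL)
		return ERR_PTR(-ENOMEM);
	task->status = 0;
	return task;
}

/* Hypothetical stand-in for nlm_async_call(): unwrap the error or
 * collect the task's status and drop the task. */
int demo_call(int have_client)
{
	struct demo_task *task = demo_start(have_client);
	int err;

	if (IS_ERR(task))
		return (int)PTR_ERR(task);
	err = task->status;
	free(task);
	return err;
}
```

With these definitions, demo_call(1) yields the task's status and demo_call(0) yields -ENOLCK, mirroring how nlm_async_call() turns the ERR_PTR(-ENOLCK) from the out_err path back into an integer.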
* [PATCH 2/2] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
From: Trond Myklebust @ 2008-03-28 20:12 UTC
To: Peter Staubach; +Cc: NFS list
The signal masks have been rendered obsolete by the preceding patch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/lockd/clntproc.c | 42 +-----------------------------------------
1 files changed, 1 insertions(+), 41 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 82c9a27..a50122c 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -155,8 +155,6 @@ static void nlmclnt_release_lockargs(struct nlm_rqst *req)
int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
{
struct nlm_rqst *call;
- sigset_t oldset;
- unsigned long flags;
int status;
nlm_get_host(host);
@@ -168,22 +166,6 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
/* Set up the argument struct */
nlmclnt_setlockargs(call, fl);
- /* Keep the old signal mask */
- spin_lock_irqsave(&current->sighand->siglock, flags);
- oldset = current->blocked;
-
- /* If we're cleaning up locks because the process is exiting,
- * perform the RPC call asynchronously. */
- if ((IS_SETLK(cmd) || IS_SETLKW(cmd))
- && fl->fl_type == F_UNLCK
- && (current->flags & PF_EXITING)) {
- sigfillset(&current->blocked); /* Mask all signals */
- recalc_sigpending();
-
- call->a_flags = RPC_TASK_ASYNC;
- }
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
if (fl->fl_type != F_UNLCK) {
call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
@@ -198,11 +180,6 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
fl->fl_ops->fl_release_private(fl);
fl->fl_ops = NULL;
- spin_lock_irqsave(&current->sighand->siglock, flags);
- current->blocked = oldset;
- recalc_sigpending();
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
dprintk("lockd: clnt proc returns %d\n", status);
return status;
}
@@ -685,16 +662,6 @@ static const struct rpc_call_ops nlmclnt_unlock_ops = {
static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl)
{
struct nlm_rqst *req;
- unsigned long flags;
- sigset_t oldset;
- int status;
-
- /* Block all signals while setting up call */
- spin_lock_irqsave(&current->sighand->siglock, flags);
- oldset = current->blocked;
- sigfillset(&current->blocked);
- recalc_sigpending();
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
req = nlm_alloc_call(nlm_get_host(host));
if (!req)
@@ -704,14 +671,7 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
nlmclnt_setlockargs(req, fl);
req->a_args.block = block;
- status = nlm_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
-
- spin_lock_irqsave(&current->sighand->siglock, flags);
- current->blocked = oldset;
- recalc_sigpending();
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
- return status;
+ return nlm_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
}
static void nlmclnt_cancel_callback(struct rpc_task *task, void *data)
* Re: [PATCH 0/2] asynchronous unlock on exit
From: Peter Staubach @ 2008-03-28 21:37 UTC
To: Trond Myklebust; +Cc: NFS list
[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]
Trond Myklebust wrote:
> Hi Peter,
>
> The following patchset takes up the theme from the NLM patch that you
> sent me a couple of weeks ago, and re-implements your fix in terms
> that are closer to the existing NFSv4 implementation (which you
> said was fine).
Hi, Trond.
Thanx for doing this! I started recoding and got some stuff
working, but then got distracted by higher priority issues.
I wasn't pleased with the look and feel of the stuff that I
had developed, so I was waiting to finish it up.
Your patch is a bit more extensive than mine, which is good.
However, I think that nlmclnt_unlock() needs to wait until
the RPC is completed. The original problem was test12() in
the Connectathon testsuite, which would occasionally fail.
It would fail because the parent would kill the child process
(actually the child of the child) and immediately attempt to
grab the lock. This would fail because the child hadn't
completed releasing the lock yet. There were some timing
dependencies in test12() itself, which I eliminated, but then
discovered that this wouldn't solve the entire problem. (I
can send you the new version of test12(), if you wish.)
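
For readers without the Connectathon sources at hand, the scenario described above can be sketched in userspace roughly as follows. This is a simplified illustration and not the actual test12() code: the function names and the temporary file path are hypothetical, the lock holder is a direct child rather than a grandchild, and the parent reaps the child before retrying so that the only remaining question is whether the lock was released by the time the exit was observed. On a local filesystem that is expected to hold; the concern in this thread is an NFS mount where the unlock RPC may still be in flight.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to take a non-blocking write lock on the whole file. */
static int try_write_lock(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "through end of file" */
	};

	return fcntl(fd, F_SETLK, &fl);
}

/*
 * Child takes the lock and sleeps; parent kills it with SIGINT,
 * reaps it, then tries to take the same lock.
 * Returns 0 if the parent obtained the lock, -1 otherwise.
 */
int lock_after_child_killed(void)
{
	char path[] = "/tmp/nlm_demo_XXXXXX";	/* hypothetical scratch file */
	int ready[2];
	int ret = -1;
	char byte;
	pid_t pid;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	if (pipe(ready) < 0)
		goto out_file;

	pid = fork();
	if (pid < 0) {
		close(ready[0]);
		close(ready[1]);
		goto out_file;
	}
	if (pid == 0) {
		/* Make sure SIGINT is fatal even if the parent ignored it. */
		signal(SIGINT, SIG_DFL);
		close(ready[0]);
		if (try_write_lock(fd) < 0 || write(ready[1], "L", 1) != 1)
			_exit(1);
		for (;;)
			pause();
	}

	close(ready[1]);
	if (read(ready[0], &byte, 1) == 1) {
		/* The child holds the lock: kill it and retry after reaping. */
		kill(pid, SIGINT);
		if (waitpid(pid, NULL, 0) == pid && try_write_lock(fd) == 0)
			ret = 0;
	} else {
		waitpid(pid, NULL, 0);
	}
	close(ready[0]);
out_file:
	close(fd);
	unlink(path);
	return ret;
}
```

To exercise the NFS case, the scratch file would have to live on an NFS mount (for example by changing the path template), which this sketch does not attempt.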
I think that it was this need to wait in nlmclnt_unlock()
which made the patch less pleasing than I wanted. I have
attached the current version that I had worked on, just
for grins.
Thanx...
ps
[-- Attachment #2: nlmclnt_unlock.devel --]
[-- Type: text/plain, Size: 4096 bytes --]
--- linux-2.6.24.i686/fs/lockd/clntproc.c.org
+++ linux-2.6.24.i686/fs/lockd/clntproc.c
@@ -155,8 +155,6 @@ static void nlmclnt_release_lockargs(str
int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
{
struct nlm_rqst *call;
- sigset_t oldset;
- unsigned long flags;
int status;
nlm_get_host(host);
@@ -168,22 +166,6 @@ int nlmclnt_proc(struct nlm_host *host,
/* Set up the argument struct */
nlmclnt_setlockargs(call, fl);
- /* Keep the old signal mask */
- spin_lock_irqsave(&current->sighand->siglock, flags);
- oldset = current->blocked;
-
- /* If we're cleaning up locks because the process is exiting,
- * perform the RPC call asynchronously. */
- if ((IS_SETLK(cmd) || IS_SETLKW(cmd))
- && fl->fl_type == F_UNLCK
- && (current->flags & PF_EXITING)) {
- sigfillset(&current->blocked); /* Mask all signals */
- recalc_sigpending();
-
- call->a_flags = RPC_TASK_ASYNC;
- }
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
if (fl->fl_type != F_UNLCK) {
call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
@@ -192,17 +174,14 @@ int nlmclnt_proc(struct nlm_host *host,
status = nlmclnt_unlock(call, fl);
} else if (IS_GETLK(cmd))
status = nlmclnt_test(call, fl);
- else
+ else {
+ nlm_release_call(call);
status = -EINVAL;
+ }
fl->fl_ops->fl_release_private(fl);
fl->fl_ops = NULL;
- spin_lock_irqsave(&current->sighand->siglock, flags);
- current->blocked = oldset;
- recalc_sigpending();
- spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
dprintk("lockd: clnt proc returns %d\n", status);
return status;
}
@@ -596,9 +575,34 @@ nlmclnt_reclaim(struct nlm_host *host, s
static int
nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
{
- struct nlm_host *host = req->a_host;
- struct nlm_res *resp = &req->a_res;
+ struct nlm_host *host = req->a_host;
+ sigset_t oldset;
+ unsigned long flags;
int status = 0;
+ struct rpc_message msg = {
+ .rpc_argp = &req->a_args,
+ .rpc_resp = &req->a_res,
+ };
+ struct rpc_clnt *clnt;
+ struct rpc_task *task;
+ struct rpc_task_setup task_setup_data = {
+ .rpc_message = &msg,
+ .callback_ops = &nlmclnt_unlock_ops,
+ .callback_data = req,
+ .flags = RPC_TASK_ASYNC,
+ };
+
+ /* Keep the old signal mask */
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ oldset = current->blocked;
+
+ /* If we're cleaning up locks because the process is exiting,
+ * perform the RPC call asynchronously. */
+ if (current->flags & PF_EXITING) {
+ sigfillset(&current->blocked); /* Mask all signals */
+ recalc_sigpending();
+ }
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
/*
* Note: the server is supposed to either grant us the unlock
@@ -609,27 +613,38 @@ nlmclnt_unlock(struct nlm_rqst *req, str
down_read(&host->h_rwsem);
if (do_vfs_lock(fl) == -ENOENT) {
up_read(&host->h_rwsem);
- goto out;
+ goto err;
}
up_read(&host->h_rwsem);
- if (req->a_flags & RPC_TASK_ASYNC)
- return nlm_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+ /* If we have no RPC client yet, create one. */
+ clnt = nlm_bind_host(host);
+ if (clnt == NULL)
+ goto err;
- status = nlmclnt_call(req, NLMPROC_UNLOCK);
- if (status < 0)
- goto out;
+ msg.rpc_proc = &clnt->cl_procinfo[NLMPROC_UNLOCK];
+
+ task_setup_data.rpc_client = clnt;
- if (resp->status == nlm_granted)
+ task = rpc_run_task(&task_setup_data);
+ status = PTR_ERR(task);
+ if (IS_ERR(task))
goto out;
- if (resp->status != nlm_lck_denied_nolocks)
- printk("lockd: unexpected unlock status: %d\n", resp->status);
- /* What to do now? I'm out of my depth... */
- status = -ENOLCK;
+ status = rpc_wait_for_completion_task(task);
+ rpc_put_task(task);
+
out:
- nlm_release_call(req);
+ spin_lock_irqsave(&current->sighand->siglock, flags);
+ current->blocked = oldset;
+ recalc_sigpending();
+ spin_unlock_irqrestore(&current->sighand->siglock, flags);
+
return status;
+
+err:
+ nlm_release_call(req);
+ goto out;
}
static void nlmclnt_unlock_callback(struct rpc_task *task, void *data)
* Re: [PATCH 0/2] asynchronous unlock on exit
2008-03-28 21:37 ` [PATCH 0/2] asynchronous unlock on exit Peter Staubach
@ 2008-03-28 22:08 ` Trond Myklebust
[not found] ` <1206742095.15567.44.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2008-03-28 22:08 UTC
To: Peter Staubach; +Cc: NFS list
On Fri, 2008-03-28 at 17:37 -0400, Peter Staubach wrote:
> However, I think that nlmclnt_unlock() needs to wait until
> the RPC is completed.
It should do that now. See the call to rpc_wait_for_completion_task() in
nlm_async_call().
> The original problem was test12() in
> the Connectathon testsuite, which would occasionally fail.
> It would fail because the parent would kill the child process
> (actually the child of the child) and immediately attempt to
> grab the lock. This would fail because the child hadn't
> completed releasing the lock yet. There were some timing
> dependencies in test12() itself, which I eliminated, but then
> discovered that this wouldn't solve the entire problem. (I
> can send you the new version of test12(), if you wish.)
So, at least in 2.6.25, the call to rpc_wait_for_completion_task() will
exit only on a fatal signal. The problem in test12() is that there is a
'pre-existing condition', in that the parent signalled us with a SIGINT,
and so the signal is set upon entry to the function.
IOW: we might have to perform a similar trick to what do_coredump()
does, and clear the TIF_SIGPENDING flag. I'm not sure if that is
sufficient, but given that we're eliminating the calls to
recalc_sigpending(), and that there should be no such calls left in the
RPC code, I think we're OK.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com