[PATCH 00/33] More NFS and SUNRPC client side patches

Linux NFS development
 help / color / mirror / Atom feed

* [PATCH 00/33] More NFS and SUNRPC client side patches
@ 2008-04-19 20:40 Trond Myklebust
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs

Highlights include:

      * A fix for an RPCSEC_GSS hang, in which some servers would treat
        the client as being crazy if it tried to destroy an RPCSEC_GSS
        credential that had already been labelled as invalid by an
        RPCSEC_GSS_CREDPROBLEM error in an RPC reply.
      * I found a use-after-free case in call_decode() that was
        introduced by commit 220bcc2afd7011b3e0569fc178331fa983c92c1b.
      * I added code to ensure that we only break the connection _once_
        if we have to resend multiple NFSv4 requests.
      * I also tried to ensure that we don't invalidate a gss_cred which
        may be in use by other requests (basically, by forcing
        gss_refresh to always create a new credential if we find out
        that our current one has been invalidated).
      * I cleaned up xs_nospace() so that our use of the
        transport->sock->flags conforms to the expectations of the rest
        of the socket code.
      * I've reintroduced the use of machine creds when calling
        SETCLIENTID in NFSv4. If such a cred exists, then that will
        ensure that we always authenticate to the same principal name
        when rebooting, thus improving the ability to recover from
        network partitions.
        If there are no machine creds available, then we fall back to
        the existing scheme where we take a user cred.
      * NLM fixes to avoid creating 'phantom locks' on the client side.
      * Fix error handling in the read/write code if rpc_run_task() fails.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 02/33] SUNRPC: Fix up xprt_write_space()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204047.14124.5947.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 01/33] SUNRPC: Fix a bug in call_decode() Trond Myklebust
                     ` (31 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 include/linux/sunrpc/xprt.h |    2 +
 net/sunrpc/xprt.c           |    4 +--
 net/sunrpc/xprtsock.c       |   61 +++++++++++++++++++++++++++++--------------
 3 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b3ff9a8..8a0629a 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -232,7 +232,7 @@ int			xprt_unregister_transport(struct xprt_class *type);
 void			xprt_set_retrans_timeout_def(struct rpc_task *task);
 void			xprt_set_retrans_timeout_rtt(struct rpc_task *task);
 void			xprt_wake_pending_tasks(struct rpc_xprt *xprt, int status);
-void			xprt_wait_for_buffer_space(struct rpc_task *task);
+void			xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action);
 void			xprt_write_space(struct rpc_xprt *xprt);
 void			xprt_update_rtt(struct rpc_task *task);
 void			xprt_adjust_cwnd(struct rpc_task *task, int result);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 85199c6..3ba64f9 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -447,13 +447,13 @@ EXPORT_SYMBOL_GPL(xprt_wake_pending_tasks);
  * @task: task to be put to sleep
  *
  */
-void xprt_wait_for_buffer_space(struct rpc_task *task)
+void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action action)
 {
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = req->rq_xprt;
 
 	task->tk_timeout = req->rq_timeout;
-	rpc_sleep_on(&xprt->pending, task, NULL);
+	rpc_sleep_on(&xprt->pending, task, action);
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 8bd3b0f..4c2462e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -516,6 +516,14 @@ out:
 	return sent;
 }
 
+static void xs_nospace_callback(struct rpc_task *task)
+{
+	struct sock_xprt *transport = container_of(task->tk_rqstp->rq_xprt, struct sock_xprt, xprt);
+
+	transport->inet->sk_write_pending--;
+	clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
+}
+
 /**
  * xs_nospace - place task on wait queue if transmit was incomplete
  * @task: task to put to sleep
@@ -531,20 +539,27 @@ static void xs_nospace(struct rpc_task *task)
 			task->tk_pid, req->rq_slen - req->rq_bytes_sent,
 			req->rq_slen);
 
-	if (test_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags)) {
-		/* Protect against races with write_space */
-		spin_lock_bh(&xprt->transport_lock);
-
-		/* Don't race with disconnect */
-		if (!xprt_connected(xprt))
-			task->tk_status = -ENOTCONN;
-		else if (test_bit(SOCK_NOSPACE, &transport->sock->flags))
-			xprt_wait_for_buffer_space(task);
+	/* Protect against races with write_space */
+	spin_lock_bh(&xprt->transport_lock);
+
+	/* Don't race with disconnect */
+	if (xprt_connected(xprt)) {
+		if (test_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags)) {
+			/*
+			 * Notify TCP that we're limited by the application
+			 * window size
+			 */
+			set_bit(SOCK_NOSPACE, &transport->sock->flags);
+			transport->inet->sk_write_pending++;
+			/* ...and wait for more buffer space */
+			xprt_wait_for_buffer_space(task, xs_nospace_callback);
+		}
+	} else {
+		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
+		task->tk_status = -ENOTCONN;
+	}
 
-		spin_unlock_bh(&xprt->transport_lock);
-	} else
-		/* Keep holding the socket if it is blocked */
-		rpc_delay(task, HZ>>4);
+	spin_unlock_bh(&xprt->transport_lock);
 }
 
 /**
@@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task *task)
 	}
 
 	switch (status) {
+	case -EAGAIN:
+		xs_nospace(task);
+		break;
 	case -ENETUNREACH:
 	case -EPIPE:
 	case -ECONNREFUSED:
 		/* When the server has died, an ICMP port unreachable message
 		 * prompts ECONNREFUSED. */
-		break;
-	case -EAGAIN:
-		xs_nospace(task);
+		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 		break;
 	default:
+		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 		dprintk("RPC:       sendmsg returned unrecognized error %d\n",
 			-status);
-		break;
 	}
 
 	return status;
@@ -695,12 +711,13 @@ static int xs_tcp_send_request(struct rpc_task *task)
 	case -ENOTCONN:
 	case -EPIPE:
 		status = -ENOTCONN;
+		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 		break;
 	default:
 		dprintk("RPC:       sendmsg returned unrecognized error %d\n",
 			-status);
+		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 		xs_tcp_shutdown(xprt);
-		break;
 	}
 
 	return status;
@@ -1189,9 +1206,11 @@ static void xs_udp_write_space(struct sock *sk)
 
 		if (unlikely(!(sock = sk->sk_socket)))
 			goto out;
+		clear_bit(SOCK_NOSPACE, &sock->flags);
+
 		if (unlikely(!(xprt = xprt_from_sock(sk))))
 			goto out;
-		if (unlikely(!test_and_clear_bit(SOCK_NOSPACE, &sock->flags)))
+		if (test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sock->flags) == 0)
 			goto out;
 
 		xprt_write_space(xprt);
@@ -1222,9 +1241,11 @@ static void xs_tcp_write_space(struct sock *sk)
 
 		if (unlikely(!(sock = sk->sk_socket)))
 			goto out;
+		clear_bit(SOCK_NOSPACE, &sock->flags);
+
 		if (unlikely(!(xprt = xprt_from_sock(sk))))
 			goto out;
-		if (unlikely(!test_and_clear_bit(SOCK_NOSPACE, &sock->flags)))
+		if (test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sock->flags) == 0)
 			goto out;
 
 		xprt_write_space(xprt);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 01/33] SUNRPC: Fix a bug in call_decode()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 02/33] SUNRPC: Fix up xprt_write_space() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204047.14124.76946.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page Trond Myklebust
                     ` (30 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

call_verify() can, under certain circumstances, free the RPC slot. In that
case, our cached pointer 'req = task->tk_rqstp' is invalid. Bug was
introduced in commit 220bcc2afd7011b3e0569fc178331fa983c92c1b (SUNRPC:
Don't call xprt_release in call refresh).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/clnt.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d6701f7..0c29792 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1236,10 +1236,13 @@ call_decode(struct rpc_task *task)
 			task->tk_status);
 	return;
 out_retry:
-	req->rq_received = req->rq_private_buf.len = 0;
 	task->tk_status = 0;
-	if (task->tk_client->cl_discrtry)
-		xprt_force_disconnect(task->tk_xprt);
+	/* Note: call_verify() may have freed the RPC slot */
+	if (task->tk_rqstp == req) {
+		req->rq_received = req->rq_private_buf.len = 0;
+		if (task->tk_client->cl_discrtry)
+			xprt_force_disconnect(task->tk_xprt);
+	}
 }
 
 /*


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (4 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 06/33] NFS: Ensure that the write " Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204047.14124.64969.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 08/33] NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired() Trond Myklebust
                     ` (26 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

..and always destroy using a 'soft' RPC call. Destroying GSS credentials
isn't mandatory; the server can always cope with a few credentials not
getting destroyed in a timely fashion.

This actually fixes a hang situation. Basically, some servers will decide
that the client is crazy if it tries to destroy an RPC context for which
they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk to it
for a while.
The regression therefor probably was introduced by commit
0df7fb74fbb709591301871a38aac7735a1d6583.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/auth_gss/auth_gss.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index d34f6df..55948cd 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -710,7 +710,7 @@ gss_destroying_context(struct rpc_cred *cred)
 	struct rpc_task *task;
 
 	if (gss_cred->gc_ctx == NULL ||
-			gss_cred->gc_ctx->gc_proc == RPC_GSS_PROC_DESTROY)
+	    test_and_clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) == 0)
 		return 0;
 
 	gss_cred->gc_ctx->gc_proc = RPC_GSS_PROC_DESTROY;
@@ -720,7 +720,7 @@ gss_destroying_context(struct rpc_cred *cred)
 	 * by the RPC call or by the put_rpccred() below */
 	get_rpccred(cred);
 
-	task = rpc_call_null(gss_auth->client, cred, RPC_TASK_ASYNC);
+	task = rpc_call_null(gss_auth->client, cred, RPC_TASK_ASYNC|RPC_TASK_SOFT);
 	if (!IS_ERR(task))
 		rpc_put_task(task);
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 06/33] NFS: Ensure that the write code cleans up properly when rpc_run_task() fails
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 05/33] NFS: Ensure that the read code cleans up properly when rpc_run_task() fails Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials Trond Myklebust
                     ` (27 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/direct.c        |   50 ++++++++++++++++++++----------------
 fs/nfs/write.c         |   67 +++++++++++++++++++++++++++++++-----------------
 include/linux/nfs_fs.h |    4 +--
 3 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 9d9085b..abf8e02 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -508,27 +508,34 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 static void nfs_direct_commit_result(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
-	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
 
 	/* Call the NFS version-specific code */
-	if (NFS_PROTO(data->inode)->commit_done(task, data) != 0)
-		return;
-	if (unlikely(task->tk_status < 0)) {
+	NFS_PROTO(data->inode)->commit_done(task, data);
+}
+
+static void nfs_direct_commit_release(void *calldata)
+{
+	struct nfs_write_data *data = calldata;
+	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
+	int status = data->task.tk_status;
+
+	if (status < 0) {
 		dprintk("NFS: %5u commit failed with error %d.\n",
-				task->tk_pid, task->tk_status);
+				data->task.tk_pid, status);
 		dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
 	} else if (memcmp(&dreq->verf, &data->verf, sizeof(data->verf))) {
-		dprintk("NFS: %5u commit verify failed\n", task->tk_pid);
+		dprintk("NFS: %5u commit verify failed\n", data->task.tk_pid);
 		dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
 	}
 
-	dprintk("NFS: %5u commit returned %d\n", task->tk_pid, task->tk_status);
+	dprintk("NFS: %5u commit returned %d\n", data->task.tk_pid, status);
 	nfs_direct_write_complete(dreq, data->inode);
+	nfs_commitdata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_commit_direct_ops = {
 	.rpc_call_done = nfs_direct_commit_result,
-	.rpc_release = nfs_commit_release,
+	.rpc_release = nfs_direct_commit_release,
 };
 
 static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
@@ -596,7 +603,7 @@ static void nfs_direct_write_complete(struct nfs_direct_req *dreq, struct inode
 
 static void nfs_alloc_commit_data(struct nfs_direct_req *dreq)
 {
-	dreq->commit_data = nfs_commit_alloc();
+	dreq->commit_data = nfs_commitdata_alloc();
 	if (dreq->commit_data != NULL)
 		dreq->commit_data->req = (struct nfs_page *) dreq;
 }
@@ -617,11 +624,20 @@ static void nfs_direct_write_complete(struct nfs_direct_req *dreq, struct inode
 static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
-	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
-	int status = task->tk_status;
 
 	if (nfs_writeback_done(task, data) != 0)
 		return;
+}
+
+/*
+ * NB: Return the value of the first error return code.  Subsequent
+ *     errors after the first one are ignored.
+ */
+static void nfs_direct_write_release(void *calldata)
+{
+	struct nfs_write_data *data = calldata;
+	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
+	int status = data->task.tk_status;
 
 	spin_lock(&dreq->lock);
 
@@ -643,23 +659,13 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 				break;
 			case NFS_ODIRECT_DO_COMMIT:
 				if (memcmp(&dreq->verf, &data->verf, sizeof(dreq->verf))) {
-					dprintk("NFS: %5u write verify failed\n", task->tk_pid);
+					dprintk("NFS: %5u write verify failed\n", data->task.tk_pid);
 					dreq->flags = NFS_ODIRECT_RESCHED_WRITES;
 				}
 		}
 	}
 out_unlock:
 	spin_unlock(&dreq->lock);
-}
-
-/*
- * NB: Return the value of the first error return code.  Subsequent
- *     errors after the first one are ignored.
- */
-static void nfs_direct_write_release(void *calldata)
-{
-	struct nfs_write_data *data = calldata;
-	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
 
 	if (put_dreq(dreq))
 		nfs_direct_write_complete(dreq, data->inode);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 997b42a..31681bb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -48,7 +48,7 @@ static struct kmem_cache *nfs_wdata_cachep;
 static mempool_t *nfs_wdata_mempool;
 static mempool_t *nfs_commit_mempool;
 
-struct nfs_write_data *nfs_commit_alloc(void)
+struct nfs_write_data *nfs_commitdata_alloc(void)
 {
 	struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
 
@@ -973,7 +973,6 @@ static void nfs_writeback_done_partial(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data	*data = calldata;
 	struct nfs_page		*req = data->req;
-	struct page		*page = req->wb_page;
 
 	dprintk("NFS: write (%s/%Ld %d@%Ld)",
 		req->wb_context->path.dentry->d_inode->i_sb->s_id,
@@ -981,13 +980,20 @@ static void nfs_writeback_done_partial(struct rpc_task *task, void *calldata)
 		req->wb_bytes,
 		(long long)req_offset(req));
 
-	if (nfs_writeback_done(task, data) != 0)
-		return;
+	nfs_writeback_done(task, data);
+}
 
-	if (task->tk_status < 0) {
+static void nfs_writeback_release_partial(void *calldata)
+{
+	struct nfs_write_data	*data = calldata;
+	struct nfs_page		*req = data->req;
+	struct page		*page = req->wb_page;
+	int status = data->task.tk_status;
+
+	if (status < 0) {
 		nfs_set_pageerror(page);
-		nfs_context_set_write_error(req->wb_context, task->tk_status);
-		dprintk(", error = %d\n", task->tk_status);
+		nfs_context_set_write_error(req->wb_context, status);
+		dprintk(", error = %d\n", status);
 		goto out;
 	}
 
@@ -1012,11 +1018,12 @@ static void nfs_writeback_done_partial(struct rpc_task *task, void *calldata)
 out:
 	if (atomic_dec_and_test(&req->wb_complete))
 		nfs_writepage_release(req);
+	nfs_writedata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_write_partial_ops = {
 	.rpc_call_done = nfs_writeback_done_partial,
-	.rpc_release = nfs_writedata_release,
+	.rpc_release = nfs_writeback_release_partial,
 };
 
 /*
@@ -1029,17 +1036,21 @@ static const struct rpc_call_ops nfs_write_partial_ops = {
 static void nfs_writeback_done_full(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data	*data = calldata;
-	struct nfs_page		*req;
-	struct page		*page;
 
-	if (nfs_writeback_done(task, data) != 0)
-		return;
+	nfs_writeback_done(task, data);
+}
+
+static void nfs_writeback_release_full(void *calldata)
+{
+	struct nfs_write_data	*data = calldata;
+	int status = data->task.tk_status;
 
 	/* Update attributes as result of writeback. */
 	while (!list_empty(&data->pages)) {
-		req = nfs_list_entry(data->pages.next);
+		struct nfs_page *req = nfs_list_entry(data->pages.next);
+		struct page *page = req->wb_page;
+
 		nfs_list_remove_request(req);
-		page = req->wb_page;
 
 		dprintk("NFS: write (%s/%Ld %d@%Ld)",
 			req->wb_context->path.dentry->d_inode->i_sb->s_id,
@@ -1047,10 +1058,10 @@ static void nfs_writeback_done_full(struct rpc_task *task, void *calldata)
 			req->wb_bytes,
 			(long long)req_offset(req));
 
-		if (task->tk_status < 0) {
+		if (status < 0) {
 			nfs_set_pageerror(page);
-			nfs_context_set_write_error(req->wb_context, task->tk_status);
-			dprintk(", error = %d\n", task->tk_status);
+			nfs_context_set_write_error(req->wb_context, status);
+			dprintk(", error = %d\n", status);
 			goto remove_request;
 		}
 
@@ -1070,11 +1081,12 @@ remove_request:
 	next:
 		nfs_clear_page_tag_locked(req);
 	}
+	nfs_writedata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_write_full_ops = {
 	.rpc_call_done = nfs_writeback_done_full,
-	.rpc_release = nfs_writedata_release,
+	.rpc_release = nfs_writeback_release_full,
 };
 
 
@@ -1160,7 +1172,7 @@ int nfs_writeback_done(struct rpc_task *task, struct nfs_write_data *data)
 
 
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
-void nfs_commit_release(void *data)
+void nfs_commitdata_release(void *data)
 {
 	struct nfs_write_data *wdata = data;
 
@@ -1233,7 +1245,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 	struct nfs_write_data	*data;
 	struct nfs_page         *req;
 
-	data = nfs_commit_alloc();
+	data = nfs_commitdata_alloc();
 
 	if (!data)
 		goto out_bad;
@@ -1261,7 +1273,6 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 static void nfs_commit_done(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data	*data = calldata;
-	struct nfs_page		*req;
 
         dprintk("NFS: %5u nfs_commit_done (status %d)\n",
                                 task->tk_pid, task->tk_status);
@@ -1269,6 +1280,13 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 	/* Call the NFS version-specific code */
 	if (NFS_PROTO(data->inode)->commit_done(task, data) != 0)
 		return;
+}
+
+static void nfs_commit_release(void *calldata)
+{
+	struct nfs_write_data	*data = calldata;
+	struct nfs_page		*req;
+	int status = data->task.tk_status;
 
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
@@ -1283,10 +1301,10 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 			(long long)NFS_FILEID(req->wb_context->path.dentry->d_inode),
 			req->wb_bytes,
 			(long long)req_offset(req));
-		if (task->tk_status < 0) {
-			nfs_context_set_write_error(req->wb_context, task->tk_status);
+		if (status < 0) {
+			nfs_context_set_write_error(req->wb_context, status);
 			nfs_inode_remove_request(req);
-			dprintk(", error = %d\n", task->tk_status);
+			dprintk(", error = %d\n", status);
 			goto next;
 		}
 
@@ -1307,6 +1325,7 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 	next:
 		nfs_clear_page_tag_locked(req);
 	}
+	nfs_commitdata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_commit_ops = {
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index f4a0e4c..7f0602c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -466,9 +466,9 @@ extern int nfs_wb_page(struct inode *inode, struct page* page);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 #if defined(CONFIG_NFS_V3) || defined(CONFIG_NFS_V4)
 extern int  nfs_commit_inode(struct inode *, int);
-extern struct nfs_write_data *nfs_commit_alloc(void);
+extern struct nfs_write_data *nfs_commitdata_alloc(void);
 extern void nfs_commit_free(struct nfs_write_data *wdata);
-extern void nfs_commit_release(void *wdata);
+extern void nfs_commitdata_release(void *wdata);
 #else
 static inline int
 nfs_commit_inode(struct inode *inode, int how)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 02/33] SUNRPC: Fix up xprt_write_space() Trond Myklebust
  2008-04-19 20:40   ` [PATCH 01/33] SUNRPC: Fix a bug in call_decode() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204048.14124.46594.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 05/33] NFS: Ensure that the read code cleans up properly when rpc_run_task() fails Trond Myklebust
                     ` (29 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been writted out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/write.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce40cad..997b42a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1493,18 +1493,19 @@ static int nfs_wb_page_priority(struct inode *inode, struct page *page,
 	};
 	int ret;
 
-	BUG_ON(!PageLocked(page));
-	if (clear_page_dirty_for_io(page)) {
-		ret = nfs_writepage_locked(page, &wbc);
+	do {
+		if (clear_page_dirty_for_io(page)) {
+			ret = nfs_writepage_locked(page, &wbc);
+			if (ret < 0)
+				goto out_error;
+		} else if (!PagePrivate(page))
+			break;
+		ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
 		if (ret < 0)
-			goto out;
-	}
-	if (!PagePrivate(page))
-		return 0;
-	ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
-	if (ret >= 0)
-		return 0;
-out:
+			goto out_error;
+	} while (PagePrivate(page));
+	return 0;
+out_error:
 	__mark_inode_dirty(inode, I_DIRTY_PAGES);
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 05/33] NFS: Ensure that the read code cleans up properly when rpc_run_task() fails
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 06/33] NFS: Ensure that the write " Trond Myklebust
                     ` (28 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

In the case of readpage() we need to ensure that the pages get unlocked,
and that the error is flagged.

In the case of O_DIRECT, we need to ensure that the pages are all released.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/direct.c |   19 +++++++++++++------
 fs/nfs/read.c   |   53 +++++++++++++++++++++++++++++++++--------------------
 2 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e442005..9d9085b 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -229,14 +229,20 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq)
 static void nfs_direct_read_result(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
-	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
 
-	if (nfs_readpage_result(task, data) != 0)
-		return;
+	nfs_readpage_result(task, data);
+}
+
+static void nfs_direct_read_release(void *calldata)
+{
+
+	struct nfs_read_data *data = calldata;
+	struct nfs_direct_req *dreq = (struct nfs_direct_req *) data->req;
+	int status = data->task.tk_status;
 
 	spin_lock(&dreq->lock);
-	if (unlikely(task->tk_status < 0)) {
-		dreq->error = task->tk_status;
+	if (unlikely(status < 0)) {
+		dreq->error = status;
 		spin_unlock(&dreq->lock);
 	} else {
 		dreq->count += data->res.count;
@@ -249,11 +255,12 @@ static void nfs_direct_read_result(struct rpc_task *task, void *calldata)
 
 	if (put_dreq(dreq))
 		nfs_direct_complete(dreq);
+	nfs_readdata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_read_direct_ops = {
 	.rpc_call_done = nfs_direct_read_result,
-	.rpc_release = nfs_readdata_release,
+	.rpc_release = nfs_direct_read_release,
 };
 
 /*
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index d333f5f..6f9208a 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -338,26 +338,25 @@ int nfs_readpage_result(struct rpc_task *task, struct nfs_read_data *data)
 	return 0;
 }
 
-static int nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data)
+static void nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data)
 {
 	struct nfs_readargs *argp = &data->args;
 	struct nfs_readres *resp = &data->res;
 
 	if (resp->eof || resp->count == argp->count)
-		return 0;
+		return;
 
 	/* This is a short read! */
 	nfs_inc_stats(data->inode, NFSIOS_SHORTREAD);
 	/* Has the server at least made some progress? */
 	if (resp->count == 0)
-		return 0;
+		return;
 
 	/* Yes, so retry the read at the end of the data */
 	argp->offset += resp->count;
 	argp->pgbase += resp->count;
 	argp->count -= resp->count;
 	rpc_restart_call(task);
-	return -EAGAIN;
 }
 
 /*
@@ -366,29 +365,37 @@ static int nfs_readpage_retry(struct rpc_task *task, struct nfs_read_data *data)
 static void nfs_readpage_result_partial(struct rpc_task *task, void *calldata)
 {
 	struct nfs_read_data *data = calldata;
-	struct nfs_page *req = data->req;
-	struct page *page = req->wb_page;
  
 	if (nfs_readpage_result(task, data) != 0)
 		return;
+	if (task->tk_status < 0)
+		return;
 
-	if (likely(task->tk_status >= 0)) {
-		nfs_readpage_truncate_uninitialised_page(data);
-		if (nfs_readpage_retry(task, data) != 0)
-			return;
-	}
-	if (unlikely(task->tk_status < 0))
+	nfs_readpage_truncate_uninitialised_page(data);
+	nfs_readpage_retry(task, data);
+}
+
+static void nfs_readpage_release_partial(void *calldata)
+{
+	struct nfs_read_data *data = calldata;
+	struct nfs_page *req = data->req;
+	struct page *page = req->wb_page;
+	int status = data->task.tk_status;
+
+	if (status < 0)
 		SetPageError(page);
+
 	if (atomic_dec_and_test(&req->wb_complete)) {
 		if (!PageError(page))
 			SetPageUptodate(page);
 		nfs_readpage_release(req);
 	}
+	nfs_readdata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_read_partial_ops = {
 	.rpc_call_done = nfs_readpage_result_partial,
-	.rpc_release = nfs_readdata_release,
+	.rpc_release = nfs_readpage_release_partial,
 };
 
 static void nfs_readpage_set_pages_uptodate(struct nfs_read_data *data)
@@ -423,29 +430,35 @@ static void nfs_readpage_result_full(struct rpc_task *task, void *calldata)
 
 	if (nfs_readpage_result(task, data) != 0)
 		return;
+	if (task->tk_status < 0)
+		return;
 	/*
 	 * Note: nfs_readpage_retry may change the values of
 	 * data->args. In the multi-page case, we therefore need
 	 * to ensure that we call nfs_readpage_set_pages_uptodate()
 	 * first.
 	 */
-	if (likely(task->tk_status >= 0)) {
-		nfs_readpage_truncate_uninitialised_page(data);
-		nfs_readpage_set_pages_uptodate(data);
-		if (nfs_readpage_retry(task, data) != 0)
-			return;
-	}
+	nfs_readpage_truncate_uninitialised_page(data);
+	nfs_readpage_set_pages_uptodate(data);
+	nfs_readpage_retry(task, data);
+}
+
+static void nfs_readpage_release_full(void *calldata)
+{
+	struct nfs_read_data *data = calldata;
+
 	while (!list_empty(&data->pages)) {
 		struct nfs_page *req = nfs_list_entry(data->pages.next);
 
 		nfs_list_remove_request(req);
 		nfs_readpage_release(req);
 	}
+	nfs_readdata_release(calldata);
 }
 
 static const struct rpc_call_ops nfs_read_full_ops = {
 	.rpc_call_done = nfs_readpage_result_full,
-	.rpc_release = nfs_readdata_release,
+	.rpc_release = nfs_readpage_release_full,
 };
 
 /*


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 09/33] NFSv4: Only increment the sequence id if the server saw it
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (7 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller Trond Myklebust
                     ` (23 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
before the actual stateful operation has been executed (for instance in the
PUTFH call). There is no way to tell from the overall status result which
operations were executed from the COMPOUND.

The fix is to move incrementing of the sequence id into the XDR layer,
so that we do it as we process the results from the stateful operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c       |   12 ++++++------
 fs/nfs/nfs4xdr.c        |   18 +++++++++++++++++-
 include/linux/nfs_xdr.h |   10 ++++++++--
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 424aa20..9f2759d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -240,6 +240,8 @@ static void nfs4_init_opendata_res(struct nfs4_opendata *p)
 {
 	p->o_res.f_attr = &p->f_attr;
 	p->o_res.dir_attr = &p->dir_attr;
+	p->o_res.seqid = p->o_arg.seqid;
+	p->c_res.seqid = p->c_arg.seqid;
 	p->o_res.server = p->o_arg.server;
 	nfs_fattr_init(&p->f_attr);
 	nfs_fattr_init(&p->dir_attr);
@@ -730,7 +732,6 @@ static void nfs4_open_confirm_done(struct rpc_task *task, void *calldata)
 		renew_lease(data->o_res.server, data->timestamp);
 		data->rpc_done = 1;
 	}
-	nfs_increment_open_seqid(data->rpc_status, data->c_arg.seqid);
 }
 
 static void nfs4_open_confirm_release(void *calldata)
@@ -860,7 +861,6 @@ static void nfs4_open_done(struct rpc_task *task, void *calldata)
 		if (!(data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM))
 			nfs_confirm_seqid(&data->owner->so_seqid, 0);
 	}
-	nfs_increment_open_seqid(data->rpc_status, data->o_arg.seqid);
 	data->rpc_done = 1;
 }
 
@@ -1226,7 +1226,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
         /* hmm. we are done with the inode, and in the process of freeing
 	 * the state_owner. we keep this around to process errors
 	 */
-	nfs_increment_open_seqid(task->tk_status, calldata->arg.seqid);
 	switch (task->tk_status) {
 		case 0:
 			nfs_set_open_stateid(state, &calldata->res.stateid, 0);
@@ -1333,6 +1332,7 @@ int nfs4_do_close(struct path *path, struct nfs4_state *state, int wait)
 		goto out_free_calldata;
 	calldata->arg.bitmask = server->attr_bitmask;
 	calldata->res.fattr = &calldata->fattr;
+	calldata->res.seqid = calldata->arg.seqid;
 	calldata->res.server = server;
 	calldata->path.mnt = mntget(path->mnt);
 	calldata->path.dentry = dget(path->dentry);
@@ -3159,6 +3159,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
 	p->arg.fh = NFS_FH(inode);
 	p->arg.fl = &p->fl;
 	p->arg.seqid = seqid;
+	p->res.seqid = seqid;
 	p->arg.stateid = &lsp->ls_stateid;
 	p->lsp = lsp;
 	atomic_inc(&lsp->ls_count);
@@ -3184,7 +3185,6 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 
 	if (RPC_ASSASSINATED(task))
 		return;
-	nfs_increment_lock_seqid(task->tk_status, calldata->arg.seqid);
 	switch (task->tk_status) {
 		case 0:
 			memcpy(calldata->lsp->ls_stateid.data,
@@ -3322,6 +3322,7 @@ static struct nfs4_lockdata *nfs4_alloc_lockdata(struct file_lock *fl,
 	p->arg.lock_stateid = &lsp->ls_stateid;
 	p->arg.lock_owner.clientid = server->nfs_client->cl_clientid;
 	p->arg.lock_owner.id = lsp->ls_id.id;
+	p->res.lock_seqid = p->arg.lock_seqid;
 	p->lsp = lsp;
 	atomic_inc(&lsp->ls_count);
 	p->ctx = get_nfs_open_context(ctx);
@@ -3348,6 +3349,7 @@ static void nfs4_lock_prepare(struct rpc_task *task, void *calldata)
 			return;
 		data->arg.open_stateid = &state->stateid;
 		data->arg.new_lock_owner = 1;
+		data->res.open_seqid = data->arg.open_seqid;
 	} else
 		data->arg.new_lock_owner = 0;
 	data->timestamp = jiffies;
@@ -3365,7 +3367,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 	if (RPC_ASSASSINATED(task))
 		goto out;
 	if (data->arg.new_lock_owner != 0) {
-		nfs_increment_open_seqid(data->rpc_status, data->arg.open_seqid);
 		if (data->rpc_status == 0)
 			nfs_confirm_seqid(&data->lsp->ls_seqid, 0);
 		else
@@ -3377,7 +3378,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 		data->lsp->ls_flags |= NFS_LOCK_INITIALIZED;
 		renew_lease(NFS_SERVER(data->ctx->path.dentry->d_inode), data->timestamp);
 	}
-	nfs_increment_lock_seqid(data->rpc_status, data->arg.lock_seqid);
 out:
 	dprintk("%s: done, ret = %d!\n", __FUNCTION__, data->rpc_status);
 }
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 2b519f6..b8b2e39 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -3005,6 +3005,8 @@ static int decode_close(struct xdr_stream *xdr, struct nfs_closeres *res)
 	int status;
 
 	status = decode_op_hdr(xdr, OP_CLOSE);
+	if (status != -EIO)
+		nfs_increment_open_seqid(status, res->seqid);
 	if (status)
 		return status;
 	READ_BUF(NFS4_STATEID_SIZE);
@@ -3296,11 +3298,17 @@ static int decode_lock(struct xdr_stream *xdr, struct nfs_lock_res *res)
 	int status;
 
 	status = decode_op_hdr(xdr, OP_LOCK);
+	if (status == -EIO)
+		goto out;
 	if (status == 0) {
 		READ_BUF(NFS4_STATEID_SIZE);
 		COPYMEM(res->stateid.data, NFS4_STATEID_SIZE);
 	} else if (status == -NFS4ERR_DENIED)
-		return decode_lock_denied(xdr, NULL);
+		status = decode_lock_denied(xdr, NULL);
+	if (res->open_seqid != NULL)
+		nfs_increment_open_seqid(status, res->open_seqid);
+	nfs_increment_lock_seqid(status, res->lock_seqid);
+out:
 	return status;
 }
 
@@ -3319,6 +3327,8 @@ static int decode_locku(struct xdr_stream *xdr, struct nfs_locku_res *res)
 	int status;
 
 	status = decode_op_hdr(xdr, OP_LOCKU);
+	if (status != -EIO)
+		nfs_increment_lock_seqid(status, res->seqid);
 	if (status == 0) {
 		READ_BUF(NFS4_STATEID_SIZE);
 		COPYMEM(res->stateid.data, NFS4_STATEID_SIZE);
@@ -3384,6 +3394,8 @@ static int decode_open(struct xdr_stream *xdr, struct nfs_openres *res)
         int status;
 
         status = decode_op_hdr(xdr, OP_OPEN);
+	if (status != -EIO)
+		nfs_increment_open_seqid(status, res->seqid);
         if (status)
                 return status;
         READ_BUF(NFS4_STATEID_SIZE);
@@ -3416,6 +3428,8 @@ static int decode_open_confirm(struct xdr_stream *xdr, struct nfs_open_confirmre
 	int status;
 
         status = decode_op_hdr(xdr, OP_OPEN_CONFIRM);
+	if (status != -EIO)
+		nfs_increment_open_seqid(status, res->seqid);
         if (status)
                 return status;
         READ_BUF(NFS4_STATEID_SIZE);
@@ -3429,6 +3443,8 @@ static int decode_open_downgrade(struct xdr_stream *xdr, struct nfs_closeres *re
 	int status;
 
 	status = decode_op_hdr(xdr, OP_OPEN_DOWNGRADE);
+	if (status != -EIO)
+		nfs_increment_open_seqid(status, res->seqid);
 	if (status)
 		return status;
 	READ_BUF(NFS4_STATEID_SIZE);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index f301d0b..24263bb 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -140,6 +140,7 @@ struct nfs_openres {
 	__u32                   rflags;
 	struct nfs_fattr *      f_attr;
 	struct nfs_fattr *      dir_attr;
+	struct nfs_seqid *	seqid;
 	const struct nfs_server *server;
 	int			delegation_type;
 	nfs4_stateid		delegation;
@@ -159,6 +160,7 @@ struct nfs_open_confirmargs {
 
 struct nfs_open_confirmres {
 	nfs4_stateid            stateid;
+	struct nfs_seqid *	seqid;
 };
 
 /*
@@ -175,6 +177,7 @@ struct nfs_closeargs {
 struct nfs_closeres {
 	nfs4_stateid            stateid;
 	struct nfs_fattr *	fattr;
+	struct nfs_seqid *	seqid;
 	const struct nfs_server *server;
 };
 /*
@@ -199,7 +202,9 @@ struct nfs_lock_args {
 };
 
 struct nfs_lock_res {
-	nfs4_stateid			stateid;
+	nfs4_stateid		stateid;
+	struct nfs_seqid *	lock_seqid;
+	struct nfs_seqid *	open_seqid;
 };
 
 struct nfs_locku_args {
@@ -210,7 +215,8 @@ struct nfs_locku_args {
 };
 
 struct nfs_locku_res {
-	nfs4_stateid			stateid;
+	nfs4_stateid		stateid;
+	struct nfs_seqid *	seqid;
 };
 
 struct nfs_lockt_args {


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (6 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 08/33] NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204049.14124.11174.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 09/33] NFSv4: Only increment the sequence id if the server saw it Trond Myklebust
                     ` (24 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

We want to ensure that req->rq_private_buf.len is updated before
req->rq_received, so that call_decode() doesn't use an old value for
req->rq_rcv_buf.len.

In 'call_decode()' itself, instead of using task->tk_status (which is set
using req->rq_received) must use the actual value of
req->rq_private_buf.len when deciding whether or not the received RPC reply
is too short.

Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
request. A typo meant that we were resetting req->rq_private_buf.len in
call_decode(), and then clobbering that value with the old rq_rcv_buf.len
again in xprt_transmit().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/clnt.c |   26 +++++++++++++-------------
 net/sunrpc/xprt.c |    3 ++-
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 0c29792..57663a4 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1195,18 +1195,6 @@ call_decode(struct rpc_task *task)
 		task->tk_flags &= ~RPC_CALL_MAJORSEEN;
 	}
 
-	if (task->tk_status < 12) {
-		if (!RPC_IS_SOFT(task)) {
-			task->tk_action = call_bind;
-			clnt->cl_stats->rpcretrans++;
-			goto out_retry;
-		}
-		dprintk("RPC:       %s: too small RPC reply size (%d bytes)\n",
-				clnt->cl_protname, task->tk_status);
-		task->tk_action = call_timeout;
-		goto out_retry;
-	}
-
 	/*
 	 * Ensure that we see all writes made by xprt_complete_rqst()
 	 * before it changed req->rq_received.
@@ -1218,6 +1206,18 @@ call_decode(struct rpc_task *task)
 	WARN_ON(memcmp(&req->rq_rcv_buf, &req->rq_private_buf,
 				sizeof(req->rq_rcv_buf)) != 0);
 
+	if (req->rq_rcv_buf.len < 12) {
+		if (!RPC_IS_SOFT(task)) {
+			task->tk_action = call_bind;
+			clnt->cl_stats->rpcretrans++;
+			goto out_retry;
+		}
+		dprintk("RPC:       %s: too small RPC reply size (%d bytes)\n",
+				clnt->cl_protname, task->tk_status);
+		task->tk_action = call_timeout;
+		goto out_retry;
+	}
+
 	/* Verify the RPC header */
 	p = call_verify(task);
 	if (IS_ERR(p)) {
@@ -1239,7 +1239,7 @@ out_retry:
 	task->tk_status = 0;
 	/* Note: call_verify() may have freed the RPC slot */
 	if (task->tk_rqstp == req) {
-		req->rq_received = req->rq_private_buf.len = 0;
+		req->rq_received = req->rq_rcv_buf.len = 0;
 		if (task->tk_client->cl_discrtry)
 			xprt_force_disconnect(task->tk_xprt);
 	}
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3ba64f9..5110a4e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -757,9 +757,10 @@ void xprt_complete_rqst(struct rpc_task *task, int copied)
 	task->tk_rtt = (long)jiffies - req->rq_xtime;
 
 	list_del_init(&req->rq_list);
+	req->rq_private_buf.len = copied;
 	/* Ensure all writes are done before we update req->rq_received */
 	smp_wmb();
-	req->rq_received = req->rq_private_buf.len = copied;
+	req->rq_received = copied;
 	rpc_wake_up_queued_task(&xprt->pending, task);
 }
 EXPORT_SYMBOL_GPL(xprt_complete_rqst);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (8 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 09/33] NFSv4: Only increment the sequence id if the server saw it Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204048.14124.4335.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  2008-04-19 20:40   ` [PATCH 11/33] NLM/lockd: Ensure we don't corrupt fl->fl_flags in nlmclnt_unlock() Trond Myklebust
                     ` (22 subsequent siblings)
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/direct.c |   10 ++++++----
 fs/nfs/read.c   |   23 +++++++++++++++--------
 fs/nfs/write.c  |   33 +++++++++++++++++++--------------
 3 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index abf8e02..4757a2b 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -347,8 +347,9 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 		NFS_PROTO(inode)->read_setup(data, &msg);
 
 		task = rpc_run_task(&task_setup_data);
-		if (!IS_ERR(task))
-			rpc_put_task(task);
+		if (IS_ERR(task))
+			break;
+		rpc_put_task(task);
 
 		dprintk("NFS: %5u initiated direct read call "
 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
@@ -763,8 +764,9 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		NFS_PROTO(inode)->write_setup(data, &msg);
 
 		task = rpc_run_task(&task_setup_data);
-		if (!IS_ERR(task))
-			rpc_put_task(task);
+		if (IS_ERR(task))
+			break;
+		rpc_put_task(task);
 
 		dprintk("NFS: %5u initiated direct write call "
 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 6f9208a..16f57e0 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -153,7 +153,7 @@ static void nfs_readpage_release(struct nfs_page *req)
 /*
  * Set up the NFS read request struct
  */
-static void nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
+static int nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 		const struct rpc_call_ops *call_ops,
 		unsigned int count, unsigned int offset)
 {
@@ -202,8 +202,10 @@ static void nfs_read_rpcsetup(struct nfs_page *req, struct nfs_read_data *data,
 			(unsigned long long)data->args.offset);
 
 	task = rpc_run_task(&task_setup_data);
-	if (!IS_ERR(task))
-		rpc_put_task(task);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
 }
 
 static void
@@ -240,6 +242,7 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
 	size_t rsize = NFS_SERVER(inode)->rsize, nbytes;
 	unsigned int offset;
 	int requests = 0;
+	int ret = 0;
 	LIST_HEAD(list);
 
 	nfs_list_remove_request(req);
@@ -261,6 +264,8 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
 	offset = 0;
 	nbytes = count;
 	do {
+		int ret2;
+
 		data = list_entry(list.next, struct nfs_read_data, pages);
 		list_del_init(&data->pages);
 
@@ -268,13 +273,15 @@ static int nfs_pagein_multi(struct inode *inode, struct list_head *head, unsigne
 
 		if (nbytes < rsize)
 			rsize = nbytes;
-		nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
+		ret2 = nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
 				  rsize, offset);
+		if (ret == 0)
+			ret = ret2;
 		offset += rsize;
 		nbytes -= rsize;
 	} while (nbytes != 0);
 
-	return 0;
+	return ret;
 
 out_bad:
 	while (!list_empty(&list)) {
@@ -292,6 +299,7 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned
 	struct nfs_page		*req;
 	struct page		**pages;
 	struct nfs_read_data	*data;
+	int ret = -ENOMEM;
 
 	data = nfs_readdata_alloc(npages);
 	if (!data)
@@ -307,11 +315,10 @@ static int nfs_pagein_one(struct inode *inode, struct list_head *head, unsigned
 	}
 	req = nfs_list_entry(data->pages.next);
 
-	nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
-	return 0;
+	return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
 out_bad:
 	nfs_async_read_error(head);
-	return -ENOMEM;
+	return ret;
 }
 
 /*
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 31681bb..1ade11d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -778,7 +778,7 @@ static int flush_task_priority(int how)
 /*
  * Set up the argument/result storage required for the RPC call.
  */
-static void nfs_write_rpcsetup(struct nfs_page *req,
+static int nfs_write_rpcsetup(struct nfs_page *req,
 		struct nfs_write_data *data,
 		const struct rpc_call_ops *call_ops,
 		unsigned int count, unsigned int offset,
@@ -841,8 +841,10 @@ static void nfs_write_rpcsetup(struct nfs_page *req,
 		(unsigned long long)data->args.offset);
 
 	task = rpc_run_task(&task_setup_data);
-	if (!IS_ERR(task))
-		rpc_put_task(task);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
 }
 
 /* If a nfs_flush_* function fails, it should remove reqs from @head and
@@ -868,6 +870,7 @@ static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned
 	size_t wsize = NFS_SERVER(inode)->wsize, nbytes;
 	unsigned int offset;
 	int requests = 0;
+	int ret = 0;
 	LIST_HEAD(list);
 
 	nfs_list_remove_request(req);
@@ -889,6 +892,8 @@ static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned
 	offset = 0;
 	nbytes = count;
 	do {
+		int ret2;
+
 		data = list_entry(list.next, struct nfs_write_data, pages);
 		list_del_init(&data->pages);
 
@@ -896,13 +901,15 @@ static int nfs_flush_multi(struct inode *inode, struct list_head *head, unsigned
 
 		if (nbytes < wsize)
 			wsize = nbytes;
-		nfs_write_rpcsetup(req, data, &nfs_write_partial_ops,
+		ret2 = nfs_write_rpcsetup(req, data, &nfs_write_partial_ops,
 				   wsize, offset, how);
+		if (ret == 0)
+			ret = ret2;
 		offset += wsize;
 		nbytes -= wsize;
 	} while (nbytes != 0);
 
-	return 0;
+	return ret;
 
 out_bad:
 	while (!list_empty(&list)) {
@@ -943,9 +950,7 @@ static int nfs_flush_one(struct inode *inode, struct list_head *head, unsigned i
 	req = nfs_list_entry(data->pages.next);
 
 	/* Set up the argument struct */
-	nfs_write_rpcsetup(req, data, &nfs_write_full_ops, count, 0, how);
-
-	return 0;
+	return nfs_write_rpcsetup(req, data, &nfs_write_full_ops, count, 0, how);
  out_bad:
 	while (!list_empty(head)) {
 		req = nfs_list_entry(head->next);
@@ -1183,7 +1188,7 @@ void nfs_commitdata_release(void *data)
 /*
  * Set up the argument/result storage required for the RPC call.
  */
-static void nfs_commit_rpcsetup(struct list_head *head,
+static int nfs_commit_rpcsetup(struct list_head *head,
 		struct nfs_write_data *data,
 		int how)
 {
@@ -1232,8 +1237,10 @@ static void nfs_commit_rpcsetup(struct list_head *head,
 	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
 
 	task = rpc_run_task(&task_setup_data);
-	if (!IS_ERR(task))
-		rpc_put_task(task);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
 }
 
 /*
@@ -1251,9 +1258,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		goto out_bad;
 
 	/* Set up the argument struct */
-	nfs_commit_rpcsetup(head, data, how);
-
-	return 0;
+	return nfs_commit_rpcsetup(head, data, how);
  out_bad:
 	while (!list_empty(head)) {
 		req = nfs_list_entry(head->next);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 08/33] NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (5 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len Trond Myklebust
                     ` (25 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

There should be no need to invalidate a perfectly good state owner just
because of a stale filehandle. Doing so can cause the state recovery code
to break, since nfs4_get_renew_cred() and nfs4_get_setclientid_cred() rely
on finding active state owners.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f38d057..424aa20 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -982,11 +982,8 @@ static int _nfs4_open_expired(struct nfs_open_context *ctx, struct nfs4_state *s
 	if (IS_ERR(opendata))
 		return PTR_ERR(opendata);
 	ret = nfs4_open_recover(opendata, state);
-	if (ret == -ESTALE) {
-		/* Invalidate the state owner so we don't ever use it again */
-		nfs4_drop_state_owner(state->owner);
+	if (ret == -ESTALE)
 		d_drop(ctx->path.dentry);
-	}
 	nfs4_opendata_put(opendata);
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 11/33] NLM/lockd: Ensure we don't corrupt fl->fl_flags in nlmclnt_unlock()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (9 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 13/33] NLM/lockd: Add a reference counter to struct nlm_rqst Trond Myklebust
                     ` (21 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Also fix up nlmclnt_lock() so that it doesn't pass modified versions of
fl->fl_flags to nlmclnt_cancel() and other helpers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index b6b74a6..4e1c012 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -493,6 +493,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 	}
 	fl->fl_flags |= FL_ACCESS;
 	status = do_vfs_lock(fl);
+	fl->fl_flags = fl_flags;
 	if (status < 0)
 		goto out;
 
@@ -530,10 +531,11 @@ again:
 			goto again;
 		}
 		/* Ensure the resulting lock will get added to granted list */
-		fl->fl_flags = fl_flags | FL_SLEEP;
+		fl->fl_flags |= FL_SLEEP;
 		if (do_vfs_lock(fl) < 0)
 			printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __FUNCTION__);
 		up_read(&host->h_rwsem);
+		fl->fl_flags = fl_flags;
 	}
 	status = nlm_stat_to_errno(resp->status);
 out_unblock:
@@ -543,7 +545,6 @@ out_unblock:
 		nlmclnt_cancel(host, req->a_args.block, fl);
 out:
 	nlm_release_call(req);
-	fl->fl_flags = fl_flags;
 	return status;
 }
 
@@ -598,7 +599,8 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
 {
 	struct nlm_host	*host = req->a_host;
 	struct nlm_res	*resp = &req->a_res;
-	int status = 0;
+	int status;
+	unsigned char fl_flags = fl->fl_flags;
 
 	/*
 	 * Note: the server is supposed to either grant us the unlock
@@ -607,11 +609,13 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
 	 */
 	fl->fl_flags |= FL_EXISTS;
 	down_read(&host->h_rwsem);
-	if (do_vfs_lock(fl) == -ENOENT) {
-		up_read(&host->h_rwsem);
+	status = do_vfs_lock(fl);
+	up_read(&host->h_rwsem);
+	fl->fl_flags = fl_flags;
+	if (status == -ENOENT) {
+		status = 0;
 		goto out;
 	}
-	up_read(&host->h_rwsem);
 
 	if (req->a_flags & RPC_TASK_ASYNC)
 		return nlm_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 14/33] NLM/lockd: convert __nlm_async_call to use rpc_run_task()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (11 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 13/33] NLM/lockd: Add a reference counter to struct nlm_rqst Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 12/33] NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck Trond Myklebust
                     ` (19 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Peter Staubach comments:

> In the course of investigating testing failures in the locking phase of
> the Connectathon testsuite, I discovered a couple of things.  One was
> that one of the tests in the locking tests was racy when it didn't seem
> to need to be and two, that the NFS client asynchronously releases locks
> when a process is exiting.
...
> The Single UNIX Specification Version 3 specifies that:  "All locks
> associated with a file for a given process shall be removed when a file
> descriptor for that file is closed by that process or the process holding
> that file descriptor terminates.".
>
> This does not specify whether those locks must be released prior to the
> completion of the exit processing for the process or not.  However,
> general assumptions seem to be that those locks will be released.  This
> leads to more deterministic behavior under normal circumstances.

The following patch converts the NFSv2/v3 locking code to use the same
mechanism as NFSv4 for sending asynchronous RPC calls and then waiting for
them to complete. This ensures that the UNLOCK and CANCEL RPC calls will
complete even if the user interrupts the call, yet satisfies the
above request for synchronous behaviour on process exit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   64 +++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 749eb53..a34b709 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -346,10 +346,16 @@ in_grace_period:
 /*
  * Generic NLM call, async version.
  */
-static int __nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *msg, const struct rpc_call_ops *tk_ops)
+static struct rpc_task *__nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *msg, const struct rpc_call_ops *tk_ops)
 {
 	struct nlm_host	*host = req->a_host;
 	struct rpc_clnt	*clnt;
+	struct rpc_task_setup task_setup_data = {
+		.rpc_message = msg,
+		.callback_ops = tk_ops,
+		.callback_data = req,
+		.flags = RPC_TASK_ASYNC,
+	};
 
 	dprintk("lockd: call procedure %d on %s (async)\n",
 			(int)proc, host->h_name);
@@ -359,21 +365,36 @@ static int __nlm_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *
 	if (clnt == NULL)
 		goto out_err;
 	msg->rpc_proc = &clnt->cl_procinfo[proc];
+	task_setup_data.rpc_client = clnt;
 
         /* bootstrap and kick off the async RPC call */
-        return rpc_call_async(clnt, msg, RPC_TASK_ASYNC, tk_ops, req);
+	return rpc_run_task(&task_setup_data);
 out_err:
 	tk_ops->rpc_release(req);
-	return -ENOLCK;
+	return ERR_PTR(-ENOLCK);
+}
+
+static int nlm_do_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message *msg, const struct rpc_call_ops *tk_ops)
+{
+	struct rpc_task *task;
+
+	task = __nlm_async_call(req, proc, msg, tk_ops);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	rpc_put_task(task);
+	return 0;
 }
 
+/*
+ * NLM asynchronous call.
+ */
 int nlm_async_call(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
 {
 	struct rpc_message msg = {
 		.rpc_argp	= &req->a_args,
 		.rpc_resp	= &req->a_res,
 	};
-	return __nlm_async_call(req, proc, &msg, tk_ops);
+	return nlm_do_async_call(req, proc, &msg, tk_ops);
 }
 
 int nlm_async_reply(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
@@ -381,7 +402,32 @@ int nlm_async_reply(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *t
 	struct rpc_message msg = {
 		.rpc_argp	= &req->a_res,
 	};
-	return __nlm_async_call(req, proc, &msg, tk_ops);
+	return nlm_do_async_call(req, proc, &msg, tk_ops);
+}
+
+/*
+ * NLM client asynchronous call.
+ *
+ * Note that although the calls are asynchronous, and are therefore
+ *      guaranteed to complete, we still always attempt to wait for
+ *      completion in order to be able to correctly track the lock
+ *      state.
+ */
+static int nlmclnt_async_call(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
+{
+	struct rpc_message msg = {
+		.rpc_argp	= &req->a_args,
+		.rpc_resp	= &req->a_res,
+	};
+	struct rpc_task *task;
+	int err;
+
+	task = __nlm_async_call(req, proc, &msg, tk_ops);
+	if (IS_ERR(task))
+		return PTR_ERR(task);
+	err = rpc_wait_for_completion_task(task);
+	rpc_put_task(task);
+	return err;
 }
 
 /*
@@ -620,10 +666,8 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
 		goto out;
 	}
 
-	if (req->a_flags & RPC_TASK_ASYNC)
-		return nlm_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
-
-	status = nlmclnt_call(req, NLMPROC_UNLOCK);
+	atomic_inc(&req->a_count);
+	status = nlmclnt_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
 	if (status < 0)
 		goto out;
 
@@ -697,7 +741,7 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
 	nlmclnt_setlockargs(req, fl);
 	req->a_args.block = block;
 
-	status = nlm_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+	status = nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
 
 	spin_lock_irqsave(&current->sighand->siglock, flags);
 	current->blocked = oldset;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 13/33] NLM/lockd: Add a reference counter to struct nlm_rqst
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (10 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 11/33] NLM/lockd: Ensure we don't corrupt fl->fl_flags in nlmclnt_unlock() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 14/33] NLM/lockd: convert __nlm_async_call to use rpc_run_task() Trond Myklebust
                     ` (20 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

When we replace the existing synchronous RPC calls with asynchronous calls,
the reference count will be needed in order to allow us to examine the
result of the RPC call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c         |    3 +++
 include/linux/lockd/lockd.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 4e1c012..749eb53 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -221,6 +221,7 @@ struct nlm_rqst *nlm_alloc_call(struct nlm_host *host)
 	for(;;) {
 		call = kzalloc(sizeof(*call), GFP_KERNEL);
 		if (call != NULL) {
+			atomic_set(&call->a_count, 1);
 			locks_init_lock(&call->a_args.lock.fl);
 			locks_init_lock(&call->a_res.lock.fl);
 			call->a_host = host;
@@ -237,6 +238,8 @@ struct nlm_rqst *nlm_alloc_call(struct nlm_host *host)
 
 void nlm_release_call(struct nlm_rqst *call)
 {
+	if (!atomic_dec_and_test(&call->a_count))
+		return;
 	nlm_release_host(call->a_host);
 	nlmclnt_release_lockargs(call);
 	kfree(call);
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index acf39e1..94649a8 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -91,6 +91,7 @@ struct nlm_wait;
  */
 #define NLMCLNT_OHSIZE		((__NEW_UTS_LEN) + 10u)
 struct nlm_rqst {
+	atomic_t		a_count;
 	unsigned int		a_flags;	/* initial RPC task flags */
 	struct nlm_host *	a_host;		/* host handle */
 	struct nlm_args		a_args;		/* arguments */


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 12/33] NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (12 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 14/33] NLM/lockd: convert __nlm_async_call to use rpc_run_task() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 16/33] NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call Trond Myklebust
                     ` (18 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9f2759d..a106932 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3263,6 +3263,7 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
 	struct nfs4_lock_state *lsp;
 	struct rpc_task *task;
 	int status = 0;
+	unsigned char fl_flags = request->fl_flags;
 
 	status = nfs4_set_lock_state(state, request);
 	/* Unlock _before_ we do the RPC call */
@@ -3286,6 +3287,7 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
 	status = nfs4_wait_for_completion_rpc_task(task);
 	rpc_put_task(task);
 out:
+	request->fl_flags = fl_flags;
 	return status;
 }
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 15/33] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (14 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 16/33] NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 18/33] NFS: Remove the buggy lock-if-signalled case from do_setlk() Trond Myklebust
                     ` (16 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

The signal masks have been rendered obsolete by the preceding patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   42 +-----------------------------------------
 1 files changed, 1 insertions(+), 41 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index a34b709..5f13e03 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -155,8 +155,6 @@ static void nlmclnt_release_lockargs(struct nlm_rqst *req)
 int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
 {
 	struct nlm_rqst		*call;
-	sigset_t		oldset;
-	unsigned long		flags;
 	int			status;
 
 	nlm_get_host(host);
@@ -168,22 +166,6 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
 	/* Set up the argument struct */
 	nlmclnt_setlockargs(call, fl);
 
-	/* Keep the old signal mask */
-	spin_lock_irqsave(&current->sighand->siglock, flags);
-	oldset = current->blocked;
-
-	/* If we're cleaning up locks because the process is exiting,
-	 * perform the RPC call asynchronously. */
-	if ((IS_SETLK(cmd) || IS_SETLKW(cmd))
-	    && fl->fl_type == F_UNLCK
-	    && (current->flags & PF_EXITING)) {
-		sigfillset(&current->blocked);	/* Mask all signals */
-		recalc_sigpending();
-
-		call->a_flags = RPC_TASK_ASYNC;
-	}
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
 	if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
 		if (fl->fl_type != F_UNLCK) {
 			call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
@@ -198,11 +180,6 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
 	fl->fl_ops->fl_release_private(fl);
 	fl->fl_ops = NULL;
 
-	spin_lock_irqsave(&current->sighand->siglock, flags);
-	current->blocked = oldset;
-	recalc_sigpending();
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
 	dprintk("lockd: clnt proc returns %d\n", status);
 	return status;
 }
@@ -722,16 +699,6 @@ static const struct rpc_call_ops nlmclnt_unlock_ops = {
 static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl)
 {
 	struct nlm_rqst	*req;
-	unsigned long	flags;
-	sigset_t	oldset;
-	int		status;
-
-	/* Block all signals while setting up call */
-	spin_lock_irqsave(&current->sighand->siglock, flags);
-	oldset = current->blocked;
-	sigfillset(&current->blocked);
-	recalc_sigpending();
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
 
 	req = nlm_alloc_call(nlm_get_host(host));
 	if (!req)
@@ -741,14 +708,7 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
 	nlmclnt_setlockargs(req, fl);
 	req->a_args.block = block;
 
-	status = nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
-
-	spin_lock_irqsave(&current->sighand->siglock, flags);
-	current->blocked = oldset;
-	recalc_sigpending();
-	spin_unlock_irqrestore(&current->sighand->siglock, flags);
-
-	return status;
+	return nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
 }
 
 static void nlmclnt_cancel_callback(struct rpc_task *task, void *data)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 17/33] NLM/lockd: Fix a race when cancelling a blocking lock
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (16 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 18/33] NFS: Remove the buggy lock-if-signalled case from do_setlk() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 22/33] NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid() Trond Myklebust
                     ` (14 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

We shouldn't remove the lock from the list of blocked locks until the
CANCEL call has completed since we may be racing with a GRANTED callback.

Also ensure that we send an UNLOCK if the CANCEL request failed. Normally
that should only happen if the process gets hit with a fatal signal.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   43 ++++++++++++++++++++++++++++++++++---------
 1 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index ea1a694..37d1aa2 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -510,6 +510,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 	struct nlm_res	*resp = &req->a_res;
 	struct nlm_wait *block = NULL;
 	unsigned char fl_flags = fl->fl_flags;
+	unsigned char fl_type;
 	int status = -ENOLCK;
 
 	if (nsm_monitor(host) < 0) {
@@ -525,13 +526,16 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 
 	block = nlmclnt_prepare_block(host, fl);
 again:
+	/*
+	 * Initialise resp->status to a valid non-zero value,
+	 * since 0 == nlm_lck_granted
+	 */
+	resp->status = nlm_lck_blocked;
 	for(;;) {
 		/* Reboot protection */
 		fl->fl_u.nfs_fl.state = host->h_state;
 		status = nlmclnt_call(req, NLMPROC_LOCK);
 		if (status < 0)
-			goto out_unblock;
-		if (!req->a_args.block)
 			break;
 		/* Did a reclaimer thread notify us of a server reboot? */
 		if (resp->status ==  nlm_lck_denied_grace_period)
@@ -540,15 +544,22 @@ again:
 			break;
 		/* Wait on an NLM blocking lock */
 		status = nlmclnt_block(block, req, NLMCLNT_POLL_TIMEOUT);
-		/* if we were interrupted. Send a CANCEL request to the server
-		 * and exit
-		 */
 		if (status < 0)
-			goto out_unblock;
+			break;
 		if (resp->status != nlm_lck_blocked)
 			break;
 	}
 
+	/* if we were interrupted while blocking, then cancel the lock request
+	 * and exit
+	 */
+	if (resp->status == nlm_lck_blocked) {
+		if (!req->a_args.block)
+			goto out_unlock;
+		if (nlmclnt_cancel(host, req->a_args.block, fl) == 0)
+			goto out_unblock;
+	}
+
 	if (resp->status == nlm_granted) {
 		down_read(&host->h_rwsem);
 		/* Check whether or not the server has rebooted */
@@ -562,16 +573,30 @@ again:
 			printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __FUNCTION__);
 		up_read(&host->h_rwsem);
 		fl->fl_flags = fl_flags;
+		status = 0;
 	}
+	if (status < 0)
+		goto out_unlock;
 	status = nlm_stat_to_errno(resp->status);
 out_unblock:
 	nlmclnt_finish_block(block);
-	/* Cancel the blocked request if it is still pending */
-	if (resp->status == nlm_lck_blocked)
-		nlmclnt_cancel(host, req->a_args.block, fl);
 out:
 	nlm_release_call(req);
 	return status;
+out_unlock:
+	/* Fatal error: ensure that we remove the lock altogether */
+	dprintk("lockd: lock attempt ended in fatal error.\n"
+		"       Attempting to unlock.\n");
+	nlmclnt_finish_block(block);
+	fl_type = fl->fl_type;
+	fl->fl_type = F_UNLCK;
+	down_read(&host->h_rwsem);
+	do_vfs_lock(fl);
+	up_read(&host->h_rwsem);
+	fl->fl_type = fl_type;
+	fl->fl_flags = fl_flags;
+	nlmclnt_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+	return status;
 }
 
 /*


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 16/33] NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (13 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 12/33] NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 15/33] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel Trond Myklebust
                     ` (17 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Currently, it returns success as long as the RPC call was sent. We'd like
to know if the CANCEL operation succeeded on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 5f13e03..ea1a694 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -699,6 +699,10 @@ static const struct rpc_call_ops nlmclnt_unlock_ops = {
 static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl)
 {
 	struct nlm_rqst	*req;
+	int status;
+
+	dprintk("lockd: blocking lock attempt was interrupted by a signal.\n"
+		"       Attempting to cancel lock.\n");
 
 	req = nlm_alloc_call(nlm_get_host(host));
 	if (!req)
@@ -708,7 +712,12 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
 	nlmclnt_setlockargs(req, fl);
 	req->a_args.block = block;
 
-	return nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+	atomic_inc(&req->a_count);
+	status = nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+	if (status == 0 && req->a_res.status == nlm_lck_denied)
+		status = -ENOLCK;
+	nlm_release_call(req);
+	return status;
 }
 
 static void nlmclnt_cancel_callback(struct rpc_task *task, void *data)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 18/33] NFS: Remove the buggy lock-if-signalled case from do_setlk()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (15 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 15/33] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 17/33] NLM/lockd: Fix a race when cancelling a blocking lock Trond Myklebust
                     ` (15 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Both NLM and NFSv4 should be able to clean up adequately in the case where
the user interrupts the RPC call...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/file.c |   12 ++----------
 1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 8312b0f..3536b01 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -570,17 +570,9 @@ static int do_setlk(struct file *filp, int cmd, struct file_lock *fl)
 
 	lock_kernel();
 	/* Use local locking if mounted with "-onolock" */
-	if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM)) {
+	if (!(NFS_SERVER(inode)->flags & NFS_MOUNT_NONLM))
 		status = NFS_PROTO(inode)->lock(filp, cmd, fl);
-		/* If we were signalled we still need to ensure that
-		 * we clean up any state on the server. We therefore
-		 * record the lock call as having succeeded in order to
-		 * ensure that locks_remove_posix() cleans it out when
-		 * the process exits.
-		 */
-		if (status == -EINTR || status == -ERESTARTSYS)
-			do_vfs_lock(filp, fl);
-	} else
+	else
 		status = do_vfs_lock(filp, fl);
 	unlock_kernel();
 	if (status < 0)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 20/33] nfs: return negative error value from nfs{, 4}_stat_to_errno
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (20 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 19/33] NLM/lockd: Ensure client locking calls use correct credentials Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 25/33] SUNRPC: Protect creds against early garbage collection Trond Myklebust
                     ` (10 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Benny Halevy, Trond Myklebust

From: Benny Halevy <bhalevy@panasas.com>

All use sites for nfs{,4}_stat_to_errno negate their return value.
It's more efficient to return a negative error from the stat_to_errno convertors
rather than negating its return value everywhere. This also produces slightly
smaller code.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs2xdr.c |   76 ++++++++++++++++++++++++++-------------------------
 fs/nfs/nfs3xdr.c |   34 +++++++++++------------
 fs/nfs/nfs4xdr.c |   80 +++++++++++++++++++++++++++---------------------------
 3 files changed, 95 insertions(+), 95 deletions(-)

diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 86a80b3..28bab67 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -267,7 +267,7 @@ nfs_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
 	int status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	p = xdr_decode_fattr(p, res->fattr);
 
 	count = ntohl(*p++);
@@ -432,7 +432,7 @@ nfs_xdr_readdirres(struct rpc_rqst *req, __be32 *p, void *dummy)
 	__be32 *end, *entry, *kaddr;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	hdrlen = (u8 *) p - (u8 *) iov->iov_base;
 	if (iov->iov_len < hdrlen) {
@@ -537,7 +537,7 @@ nfs_xdr_stat(struct rpc_rqst *req, __be32 *p, void *dummy)
 	int	status;
 
 	if ((status = ntohl(*p++)) != 0)
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	return status;
 }
 
@@ -551,7 +551,7 @@ nfs_xdr_attrstat(struct rpc_rqst *req, __be32 *p, struct nfs_fattr *fattr)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	xdr_decode_fattr(p, fattr);
 	return 0;
 }
@@ -566,7 +566,7 @@ nfs_xdr_diropres(struct rpc_rqst *req, __be32 *p, struct nfs_diropok *res)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	p = xdr_decode_fhandle(p, res->fh);
 	xdr_decode_fattr(p, res->fattr);
 	return 0;
@@ -604,7 +604,7 @@ nfs_xdr_readlinkres(struct rpc_rqst *req, __be32 *p, void *dummy)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	/* Convert length of symlink */
 	len = ntohl(*p++);
 	if (len >= rcvbuf->page_len) {
@@ -653,7 +653,7 @@ nfs_xdr_statfsres(struct rpc_rqst *req, __be32 *p, struct nfs2_fsstat *res)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	res->tsize  = ntohl(*p++);
 	res->bsize  = ntohl(*p++);
@@ -672,39 +672,39 @@ static struct {
 	int errno;
 } nfs_errtbl[] = {
 	{ NFS_OK,		0		},
-	{ NFSERR_PERM,		EPERM		},
-	{ NFSERR_NOENT,		ENOENT		},
-	{ NFSERR_IO,		errno_NFSERR_IO	},
-	{ NFSERR_NXIO,		ENXIO		},
-/*	{ NFSERR_EAGAIN,	EAGAIN		}, */
-	{ NFSERR_ACCES,		EACCES		},
-	{ NFSERR_EXIST,		EEXIST		},
-	{ NFSERR_XDEV,		EXDEV		},
-	{ NFSERR_NODEV,		ENODEV		},
-	{ NFSERR_NOTDIR,	ENOTDIR		},
-	{ NFSERR_ISDIR,		EISDIR		},
-	{ NFSERR_INVAL,		EINVAL		},
-	{ NFSERR_FBIG,		EFBIG		},
-	{ NFSERR_NOSPC,		ENOSPC		},
-	{ NFSERR_ROFS,		EROFS		},
-	{ NFSERR_MLINK,		EMLINK		},
-	{ NFSERR_NAMETOOLONG,	ENAMETOOLONG	},
-	{ NFSERR_NOTEMPTY,	ENOTEMPTY	},
-	{ NFSERR_DQUOT,		EDQUOT		},
-	{ NFSERR_STALE,		ESTALE		},
-	{ NFSERR_REMOTE,	EREMOTE		},
+	{ NFSERR_PERM,		-EPERM		},
+	{ NFSERR_NOENT,		-ENOENT		},
+	{ NFSERR_IO,		-errno_NFSERR_IO},
+	{ NFSERR_NXIO,		-ENXIO		},
+/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
+	{ NFSERR_ACCES,		-EACCES		},
+	{ NFSERR_EXIST,		-EEXIST		},
+	{ NFSERR_XDEV,		-EXDEV		},
+	{ NFSERR_NODEV,		-ENODEV		},
+	{ NFSERR_NOTDIR,	-ENOTDIR	},
+	{ NFSERR_ISDIR,		-EISDIR		},
+	{ NFSERR_INVAL,		-EINVAL		},
+	{ NFSERR_FBIG,		-EFBIG		},
+	{ NFSERR_NOSPC,		-ENOSPC		},
+	{ NFSERR_ROFS,		-EROFS		},
+	{ NFSERR_MLINK,		-EMLINK		},
+	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
+	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
+	{ NFSERR_DQUOT,		-EDQUOT		},
+	{ NFSERR_STALE,		-ESTALE		},
+	{ NFSERR_REMOTE,	-EREMOTE	},
 #ifdef EWFLUSH
-	{ NFSERR_WFLUSH,	EWFLUSH		},
+	{ NFSERR_WFLUSH,	-EWFLUSH	},
 #endif
-	{ NFSERR_BADHANDLE,	EBADHANDLE	},
-	{ NFSERR_NOT_SYNC,	ENOTSYNC	},
-	{ NFSERR_BAD_COOKIE,	EBADCOOKIE	},
-	{ NFSERR_NOTSUPP,	ENOTSUPP	},
-	{ NFSERR_TOOSMALL,	ETOOSMALL	},
-	{ NFSERR_SERVERFAULT,	ESERVERFAULT	},
-	{ NFSERR_BADTYPE,	EBADTYPE	},
-	{ NFSERR_JUKEBOX,	EJUKEBOX	},
-	{ -1,			EIO		}
+	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
+	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
+	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
+	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
+	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
+	{ NFSERR_SERVERFAULT,	-ESERVERFAULT	},
+	{ NFSERR_BADTYPE,	-EBADTYPE	},
+	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
+	{ -1,			-EIO		}
 };
 
 /*
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index fb03048..11cddde 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -515,7 +515,7 @@ nfs3_xdr_readdirres(struct rpc_rqst *req, __be32 *p, struct nfs3_readdirres *res
 	/* Decode post_op_attrs */
 	p = xdr_decode_post_op_attr(p, res->dir_attr);
 	if (status)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	/* Decode verifier cookie */
 	if (res->verf) {
 		res->verf[0] = *p++;
@@ -751,7 +751,7 @@ nfs3_xdr_attrstat(struct rpc_rqst *req, __be32 *p, struct nfs_fattr *fattr)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	xdr_decode_fattr(p, fattr);
 	return 0;
 }
@@ -766,7 +766,7 @@ nfs3_xdr_wccstat(struct rpc_rqst *req, __be32 *p, struct nfs_fattr *fattr)
 	int	status;
 
 	if ((status = ntohl(*p++)))
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	xdr_decode_wcc_data(p, fattr);
 	return status;
 }
@@ -786,7 +786,7 @@ nfs3_xdr_lookupres(struct rpc_rqst *req, __be32 *p, struct nfs3_diropres *res)
 	int	status;
 
 	if ((status = ntohl(*p++))) {
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	} else {
 		if (!(p = xdr_decode_fhandle(p, res->fh)))
 			return -errno_NFSERR_IO;
@@ -806,7 +806,7 @@ nfs3_xdr_accessres(struct rpc_rqst *req, __be32 *p, struct nfs3_accessres *res)
 
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	if (status)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	res->access = ntohl(*p++);
 	return 0;
 }
@@ -843,7 +843,7 @@ nfs3_xdr_readlinkres(struct rpc_rqst *req, __be32 *p, struct nfs_fattr *fattr)
 	p = xdr_decode_post_op_attr(p, fattr);
 
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	/* Convert length of symlink */
 	len = ntohl(*p++);
@@ -891,7 +891,7 @@ nfs3_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
 	p = xdr_decode_post_op_attr(p, res->fattr);
 
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	/* Decode reply count and EOF flag. NFSv3 is somewhat redundant
 	 * in that it puts the count both in the res struct and in the
@@ -941,7 +941,7 @@ nfs3_xdr_writeres(struct rpc_rqst *req, __be32 *p, struct nfs_writeres *res)
 	p = xdr_decode_wcc_data(p, res->fattr);
 
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	res->count = ntohl(*p++);
 	res->verf->committed = (enum nfs3_stable_how)ntohl(*p++);
@@ -972,7 +972,7 @@ nfs3_xdr_createres(struct rpc_rqst *req, __be32 *p, struct nfs3_diropres *res)
 			res->fattr->valid = 0;
 		}
 	} else {
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	}
 	p = xdr_decode_wcc_data(p, res->dir_attr);
 	return status;
@@ -987,7 +987,7 @@ nfs3_xdr_renameres(struct rpc_rqst *req, __be32 *p, struct nfs3_renameres *res)
 	int	status;
 
 	if ((status = ntohl(*p++)) != 0)
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	p = xdr_decode_wcc_data(p, res->fromattr);
 	p = xdr_decode_wcc_data(p, res->toattr);
 	return status;
@@ -1002,7 +1002,7 @@ nfs3_xdr_linkres(struct rpc_rqst *req, __be32 *p, struct nfs3_linkres *res)
 	int	status;
 
 	if ((status = ntohl(*p++)) != 0)
-		status = -nfs_stat_to_errno(status);
+		status = nfs_stat_to_errno(status);
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	p = xdr_decode_wcc_data(p, res->dir_attr);
 	return status;
@@ -1020,7 +1020,7 @@ nfs3_xdr_fsstatres(struct rpc_rqst *req, __be32 *p, struct nfs_fsstat *res)
 
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	p = xdr_decode_hyper(p, &res->tbytes);
 	p = xdr_decode_hyper(p, &res->fbytes);
@@ -1045,7 +1045,7 @@ nfs3_xdr_fsinfores(struct rpc_rqst *req, __be32 *p, struct nfs_fsinfo *res)
 
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	res->rtmax  = ntohl(*p++);
 	res->rtpref = ntohl(*p++);
@@ -1073,7 +1073,7 @@ nfs3_xdr_pathconfres(struct rpc_rqst *req, __be32 *p, struct nfs_pathconf *res)
 
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	res->max_link = ntohl(*p++);
 	res->max_namelen = ntohl(*p++);
 
@@ -1092,7 +1092,7 @@ nfs3_xdr_commitres(struct rpc_rqst *req, __be32 *p, struct nfs_writeres *res)
 	status = ntohl(*p++);
 	p = xdr_decode_wcc_data(p, res->fattr);
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 
 	res->verf->verifier[0] = *p++;
 	res->verf->verifier[1] = *p++;
@@ -1114,7 +1114,7 @@ nfs3_xdr_getaclres(struct rpc_rqst *req, __be32 *p,
 	int err, base;
 
 	if (status != 0)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	p = xdr_decode_post_op_attr(p, res->fattr);
 	res->mask = ntohl(*p++);
 	if (res->mask & ~(NFS_ACL|NFS_ACLCNT|NFS_DFACL|NFS_DFACLCNT))
@@ -1141,7 +1141,7 @@ nfs3_xdr_setaclres(struct rpc_rqst *req, __be32 *p, struct nfs_fattr *fattr)
 	int status = ntohl(*p++);
 
 	if (status)
-		return -nfs_stat_to_errno(status);
+		return nfs_stat_to_errno(status);
 	xdr_decode_post_op_attr(p, fattr);
 	return 0;
 }
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index b8b2e39..809ef0d 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -2241,7 +2241,7 @@ static int decode_op_hdr(struct xdr_stream *xdr, enum nfs_opnum4 expected)
 	}
 	READ32(nfserr);
 	if (nfserr != NFS_OK)
-		return -nfs4_stat_to_errno(nfserr);
+		return nfs4_stat_to_errno(nfserr);
 	return 0;
 }
 
@@ -3760,7 +3760,7 @@ static int decode_setclientid(struct xdr_stream *xdr, struct nfs_client *clp)
 		READ_BUF(len);
 		return -NFSERR_CLID_INUSE;
 	} else
-		return -nfs4_stat_to_errno(nfserr);
+		return nfs4_stat_to_errno(nfserr);
 
 	return 0;
 }
@@ -4422,7 +4422,7 @@ static int nfs4_xdr_dec_fsinfo(struct rpc_rqst *req, __be32 *p, struct nfs_fsinf
 	if (!status)
 		status = decode_fsinfo(&xdr, fsinfo);
 	if (!status)
-		status = -nfs4_stat_to_errno(hdr.status);
+		status = nfs4_stat_to_errno(hdr.status);
 	return status;
 }
 
@@ -4512,7 +4512,7 @@ static int nfs4_xdr_dec_setclientid(struct rpc_rqst *req, __be32 *p,
 	if (!status)
 		status = decode_setclientid(&xdr, clp);
 	if (!status)
-		status = -nfs4_stat_to_errno(hdr.status);
+		status = nfs4_stat_to_errno(hdr.status);
 	return status;
 }
 
@@ -4534,7 +4534,7 @@ static int nfs4_xdr_dec_setclientid_confirm(struct rpc_rqst *req, __be32 *p, str
 	if (!status)
 		status = decode_fsinfo(&xdr, fsinfo);
 	if (!status)
-		status = -nfs4_stat_to_errno(hdr.status);
+		status = nfs4_stat_to_errno(hdr.status);
 	return status;
 }
 
@@ -4644,42 +4644,42 @@ static struct {
 	int errno;
 } nfs_errtbl[] = {
 	{ NFS4_OK,		0		},
-	{ NFS4ERR_PERM,		EPERM		},
-	{ NFS4ERR_NOENT,	ENOENT		},
-	{ NFS4ERR_IO,		errno_NFSERR_IO	},
-	{ NFS4ERR_NXIO,		ENXIO		},
-	{ NFS4ERR_ACCESS,	EACCES		},
-	{ NFS4ERR_EXIST,	EEXIST		},
-	{ NFS4ERR_XDEV,		EXDEV		},
-	{ NFS4ERR_NOTDIR,	ENOTDIR		},
-	{ NFS4ERR_ISDIR,	EISDIR		},
-	{ NFS4ERR_INVAL,	EINVAL		},
-	{ NFS4ERR_FBIG,		EFBIG		},
-	{ NFS4ERR_NOSPC,	ENOSPC		},
-	{ NFS4ERR_ROFS,		EROFS		},
-	{ NFS4ERR_MLINK,	EMLINK		},
-	{ NFS4ERR_NAMETOOLONG,	ENAMETOOLONG	},
-	{ NFS4ERR_NOTEMPTY,	ENOTEMPTY	},
-	{ NFS4ERR_DQUOT,	EDQUOT		},
-	{ NFS4ERR_STALE,	ESTALE		},
-	{ NFS4ERR_BADHANDLE,	EBADHANDLE	},
-	{ NFS4ERR_BADOWNER,	EINVAL		},
-	{ NFS4ERR_BADNAME,	EINVAL		},
-	{ NFS4ERR_BAD_COOKIE,	EBADCOOKIE	},
-	{ NFS4ERR_NOTSUPP,	ENOTSUPP	},
-	{ NFS4ERR_TOOSMALL,	ETOOSMALL	},
-	{ NFS4ERR_SERVERFAULT,	ESERVERFAULT	},
-	{ NFS4ERR_BADTYPE,	EBADTYPE	},
-	{ NFS4ERR_LOCKED,	EAGAIN		},
-	{ NFS4ERR_RESOURCE,	EREMOTEIO	},
-	{ NFS4ERR_SYMLINK,	ELOOP		},
-	{ NFS4ERR_OP_ILLEGAL,	EOPNOTSUPP	},
-	{ NFS4ERR_DEADLOCK,	EDEADLK		},
-	{ NFS4ERR_WRONGSEC,	EPERM		}, /* FIXME: this needs
+	{ NFS4ERR_PERM,		-EPERM		},
+	{ NFS4ERR_NOENT,	-ENOENT		},
+	{ NFS4ERR_IO,		-errno_NFSERR_IO},
+	{ NFS4ERR_NXIO,		-ENXIO		},
+	{ NFS4ERR_ACCESS,	-EACCES		},
+	{ NFS4ERR_EXIST,	-EEXIST		},
+	{ NFS4ERR_XDEV,		-EXDEV		},
+	{ NFS4ERR_NOTDIR,	-ENOTDIR	},
+	{ NFS4ERR_ISDIR,	-EISDIR		},
+	{ NFS4ERR_INVAL,	-EINVAL		},
+	{ NFS4ERR_FBIG,		-EFBIG		},
+	{ NFS4ERR_NOSPC,	-ENOSPC		},
+	{ NFS4ERR_ROFS,		-EROFS		},
+	{ NFS4ERR_MLINK,	-EMLINK		},
+	{ NFS4ERR_NAMETOOLONG,	-ENAMETOOLONG	},
+	{ NFS4ERR_NOTEMPTY,	-ENOTEMPTY	},
+	{ NFS4ERR_DQUOT,	-EDQUOT		},
+	{ NFS4ERR_STALE,	-ESTALE		},
+	{ NFS4ERR_BADHANDLE,	-EBADHANDLE	},
+	{ NFS4ERR_BADOWNER,	-EINVAL		},
+	{ NFS4ERR_BADNAME,	-EINVAL		},
+	{ NFS4ERR_BAD_COOKIE,	-EBADCOOKIE	},
+	{ NFS4ERR_NOTSUPP,	-ENOTSUPP	},
+	{ NFS4ERR_TOOSMALL,	-ETOOSMALL	},
+	{ NFS4ERR_SERVERFAULT,	-ESERVERFAULT	},
+	{ NFS4ERR_BADTYPE,	-EBADTYPE	},
+	{ NFS4ERR_LOCKED,	-EAGAIN		},
+	{ NFS4ERR_RESOURCE,	-EREMOTEIO	},
+	{ NFS4ERR_SYMLINK,	-ELOOP		},
+	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
+	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
+	{ NFS4ERR_WRONGSEC,	-EPERM		}, /* FIXME: this needs
 						    * to be handled by a
 						    * middle-layer.
 						    */
-	{ -1,			EIO		}
+	{ -1,			-EIO		}
 };
 
 /*
@@ -4696,14 +4696,14 @@ nfs4_stat_to_errno(int stat)
 	}
 	if (stat <= 10000 || stat > 10100) {
 		/* The server is looney tunes. */
-		return ESERVERFAULT;
+		return -ESERVERFAULT;
 	}
 	/* If we cannot translate the error, the recovery routines should
 	 * handle it.
 	 * Note: remaining NFSv4 error codes have values > 10000, so should
 	 * not conflict with native Linux error codes.
 	 */
-	return stat;
+	return -stat;
 }
 
 #define PROC(proc, argtype, restype)				\


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 19/33] NLM/lockd: Ensure client locking calls use correct credentials
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (19 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 21/33] nfs: fix printout of multiword bitfields Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 20/33] nfs: return negative error value from nfs{, 4}_stat_to_errno Trond Myklebust
                     ` (11 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Now that we've added the 'generic' credentials (that are independent of the
rpc_client) to the nfs_open_context, we can use those in the NLM client to
ensure that the lock/unlock requests are authenticated to whoever
originally opened the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c |   23 ++++++++++++++---------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 37d1aa2..40b16f2 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -247,7 +247,7 @@ static int nlm_wait_on_grace(wait_queue_head_t *queue)
  * Generic NLM call
  */
 static int
-nlmclnt_call(struct nlm_rqst *req, u32 proc)
+nlmclnt_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc)
 {
 	struct nlm_host	*host = req->a_host;
 	struct rpc_clnt	*clnt;
@@ -256,6 +256,7 @@ nlmclnt_call(struct nlm_rqst *req, u32 proc)
 	struct rpc_message msg = {
 		.rpc_argp	= argp,
 		.rpc_resp	= resp,
+		.rpc_cred	= cred,
 	};
 	int		status;
 
@@ -390,11 +391,12 @@ int nlm_async_reply(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *t
  *      completion in order to be able to correctly track the lock
  *      state.
  */
-static int nlmclnt_async_call(struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
+static int nlmclnt_async_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 proc, const struct rpc_call_ops *tk_ops)
 {
 	struct rpc_message msg = {
 		.rpc_argp	= &req->a_args,
 		.rpc_resp	= &req->a_res,
+		.rpc_cred	= cred,
 	};
 	struct rpc_task *task;
 	int err;
@@ -415,7 +417,7 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
 {
 	int	status;
 
-	status = nlmclnt_call(req, NLMPROC_TEST);
+	status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_TEST);
 	if (status < 0)
 		goto out;
 
@@ -506,6 +508,7 @@ static int do_vfs_lock(struct file_lock *fl)
 static int
 nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 {
+	struct rpc_cred *cred = nfs_file_cred(fl->fl_file);
 	struct nlm_host	*host = req->a_host;
 	struct nlm_res	*resp = &req->a_res;
 	struct nlm_wait *block = NULL;
@@ -534,7 +537,7 @@ again:
 	for(;;) {
 		/* Reboot protection */
 		fl->fl_u.nfs_fl.state = host->h_state;
-		status = nlmclnt_call(req, NLMPROC_LOCK);
+		status = nlmclnt_call(cred, req, NLMPROC_LOCK);
 		if (status < 0)
 			break;
 		/* Did a reclaimer thread notify us of a server reboot? */
@@ -595,7 +598,7 @@ out_unlock:
 	up_read(&host->h_rwsem);
 	fl->fl_type = fl_type;
 	fl->fl_flags = fl_flags;
-	nlmclnt_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+	nlmclnt_async_call(cred, req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
 	return status;
 }
 
@@ -619,8 +622,8 @@ nlmclnt_reclaim(struct nlm_host *host, struct file_lock *fl)
 	nlmclnt_setlockargs(req, fl);
 	req->a_args.reclaim = 1;
 
-	if ((status = nlmclnt_call(req, NLMPROC_LOCK)) >= 0
-	 && req->a_res.status == nlm_granted)
+	status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_LOCK);
+	if (status >= 0 && req->a_res.status == nlm_granted)
 		return 0;
 
 	printk(KERN_WARNING "lockd: failed to reclaim lock for pid %d "
@@ -669,7 +672,8 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
 	}
 
 	atomic_inc(&req->a_count);
-	status = nlmclnt_async_call(req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+	status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
+			NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
 	if (status < 0)
 		goto out;
 
@@ -738,7 +742,8 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
 	req->a_args.block = block;
 
 	atomic_inc(&req->a_count);
-	status = nlmclnt_async_call(req, NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+	status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
+			NLMPROC_CANCEL, &nlmclnt_cancel_ops);
 	if (status == 0 && req->a_res.status == nlm_lck_denied)
 		status = -ENOLCK;
 	nlm_release_call(req);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 21/33] nfs: fix printout of multiword bitfields
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (18 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 22/33] NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 19/33] NLM/lockd: Ensure client locking calls use correct credentials Trond Myklebust
                     ` (12 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Fred Isaman, Benny Halevy, Trond Myklebust

From: Fred Isaman <iisaman@citi.umich.edu>

Benny points out that zero-padding of multiword bitfields is necessary,
and that delimiting each word is nice to avoid endianess confusion.

bhalevy: without zero padding output can be ambiguous. Also,
since the printed array of two 32-bit unsigned integers is not a
64-bit number, delimiting the output with a semicolon makes more sense.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4xdr.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 809ef0d..5a2d649 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1191,8 +1191,8 @@ static int encode_readdir(struct xdr_stream *xdr, const struct nfs4_readdir_arg
 		attrs[1] &= ~FATTR4_WORD1_MOUNTED_ON_FILEID;
 	WRITE32(attrs[0] & readdir->bitmask[0]);
 	WRITE32(attrs[1] & readdir->bitmask[1]);
-	dprintk("%s: cookie = %Lu, verifier = 0x%x%x, bitmap = 0x%x%x\n",
-			__FUNCTION__,
+	dprintk("%s: cookie = %Lu, verifier = %08x:%08x, bitmap = %08x:%08x\n",
+			__func__,
 			(unsigned long long)readdir->cookie,
 			((u32 *)readdir->verifier.data)[0],
 			((u32 *)readdir->verifier.data)[1],
@@ -2291,7 +2291,7 @@ static int decode_attr_supported(struct xdr_stream *xdr, uint32_t *bitmap, uint3
 		bitmap[0] &= ~FATTR4_WORD0_SUPPORTED_ATTRS;
 	} else
 		bitmask[0] = bitmask[1] = 0;
-	dprintk("%s: bitmask=0x%x%x\n", __FUNCTION__, bitmask[0], bitmask[1]);
+	dprintk("%s: bitmask=%08x:%08x\n", __func__, bitmask[0], bitmask[1]);
 	return 0;
 }
 
@@ -3505,8 +3505,8 @@ static int decode_readdir(struct xdr_stream *xdr, struct rpc_rqst *req, struct n
 		return status;
 	READ_BUF(8);
 	COPYMEM(readdir->verifier.data, 8);
-	dprintk("%s: verifier = 0x%x%x\n",
-			__FUNCTION__,
+	dprintk("%s: verifier = %08x:%08x\n",
+			__func__,
 			((u32 *)readdir->verifier.data)[0],
 			((u32 *)readdir->verifier.data)[1]);
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 22/33] NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (17 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 17/33] NLM/lockd: Fix a race when cancelling a blocking lock Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 21/33] nfs: fix printout of multiword bitfields Trond Myklebust
                     ` (13 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

With the recent change to generic creds, we can no longer use
cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and
AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name
instead...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4proc.c              |    2 +-
 include/linux/sunrpc/auth.h    |    2 --
 net/sunrpc/auth_generic.c      |    2 --
 net/sunrpc/auth_gss/auth_gss.c |    2 --
 net/sunrpc/auth_null.c         |    2 --
 net/sunrpc/auth_unix.c         |    2 --
 6 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a106932..dbc0927 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2885,7 +2885,7 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program, unsigned short po
 							RPC_DISPLAY_ADDR),
 				rpc_peeraddr2str(clp->cl_rpcclient,
 							RPC_DISPLAY_PROTO),
-				cred->cr_ops->cr_name,
+				clp->cl_rpcclient->cl_auth->au_ops->au_name,
 				clp->cl_id_uniquifier);
 		setclientid.sc_netid_len = scnprintf(setclientid.sc_netid,
 				sizeof(setclientid.sc_netid),
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index e93cd8a..a19c3af 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -96,9 +96,7 @@ struct rpc_auth {
 struct rpc_authops {
 	struct module		*owner;
 	rpc_authflavor_t	au_flavor;	/* flavor (RPC_AUTH_*) */
-#ifdef RPC_DEBUG
 	char *			au_name;
-#endif
 	struct rpc_auth *	(*create)(struct rpc_clnt *, rpc_authflavor_t);
 	void			(*destroy)(struct rpc_auth *);
 
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index 6a3f77c..b6f124c 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -136,9 +136,7 @@ static struct rpc_cred_cache generic_cred_cache = {
 
 static const struct rpc_authops generic_auth_ops = {
 	.owner = THIS_MODULE,
-#ifdef RPC_DEBUG
 	.au_name = "Generic",
-#endif
 	.lookup_cred = generic_lookup_cred,
 	.crcreate = generic_create_cred,
 };
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 55948cd..7567eb9 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1287,9 +1287,7 @@ out:
 static const struct rpc_authops authgss_ops = {
 	.owner		= THIS_MODULE,
 	.au_flavor	= RPC_AUTH_GSS,
-#ifdef RPC_DEBUG
 	.au_name	= "RPCSEC_GSS",
-#endif
 	.create		= gss_create,
 	.destroy	= gss_destroy,
 	.lookup_cred	= gss_lookup_cred,
diff --git a/net/sunrpc/auth_null.c b/net/sunrpc/auth_null.c
index 3c26c18..c70dd7f 100644
--- a/net/sunrpc/auth_null.c
+++ b/net/sunrpc/auth_null.c
@@ -104,9 +104,7 @@ nul_validate(struct rpc_task *task, __be32 *p)
 const struct rpc_authops authnull_ops = {
 	.owner		= THIS_MODULE,
 	.au_flavor	= RPC_AUTH_NULL,
-#ifdef RPC_DEBUG
 	.au_name	= "NULL",
-#endif
 	.create		= nul_create,
 	.destroy	= nul_destroy,
 	.lookup_cred	= nul_lookup_cred,
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index 04e936a..44920b9 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -210,9 +210,7 @@ void __init rpc_init_authunix(void)
 const struct rpc_authops authunix_ops = {
 	.owner		= THIS_MODULE,
 	.au_flavor	= RPC_AUTH_UNIX,
-#ifdef RPC_DEBUG
 	.au_name	= "UNIX",
-#endif
 	.create		= unx_create,
 	.destroy	= unx_destroy,
 	.lookup_cred	= unx_lookup_cred,


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 23/33] NFSv4: Reintroduce machine creds
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (23 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 26/33] SUNRPC: remove XS_SENDMSG_RETRY Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 24/33] NFSv4: Attempt to use machine credentials in SETCLIENTID calls Trond Myklebust
                     ` (7 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

We need to try to ensure that we always use the same credentials whenever
we re-establish the clientid on the server. If not, the server won't
recognise that we're the same client, and so may not allow us to recover
state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/client.c                 |    7 +++++++
 include/linux/nfs_fs_sb.h       |    2 ++
 include/linux/sunrpc/auth.h     |    2 ++
 include/linux/sunrpc/auth_gss.h |    1 +
 net/sunrpc/auth_generic.c       |   26 ++++++++++++++++++++++++--
 net/sunrpc/auth_gss/auth_gss.c  |   12 +++++++++++-
 6 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 93dfd75..f2f3b28 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -112,6 +112,7 @@ struct nfs_client_initdata {
 static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
 {
 	struct nfs_client *clp;
+	struct rpc_cred *cred;
 
 	if ((clp = kzalloc(sizeof(*clp), GFP_KERNEL)) == NULL)
 		goto error_0;
@@ -150,6 +151,9 @@ static struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_
 	clp->cl_boot_time = CURRENT_TIME;
 	clp->cl_state = 1 << NFS4CLNT_LEASE_EXPIRED;
 #endif
+	cred = rpc_lookup_machine_cred();
+	if (!IS_ERR(cred))
+		clp->cl_machine_cred = cred;
 
 	return clp;
 
@@ -191,6 +195,9 @@ static void nfs_free_client(struct nfs_client *clp)
 	if (__test_and_clear_bit(NFS_CS_CALLBACK, &clp->cl_res_state))
 		nfs_callback_down();
 
+	if (clp->cl_machine_cred != NULL)
+		put_rpccred(clp->cl_machine_cred);
+
 	kfree(clp->cl_hostname);
 	kfree(clp);
 
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index ac7e4fb..c9beacd 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -32,6 +32,8 @@ struct nfs_client {
 	const struct nfs_rpc_ops *rpc_ops;	/* NFS protocol vector */
 	int			cl_proto;	/* Network transport protocol */
 
+	struct rpc_cred		*cl_machine_cred;
+
 #ifdef CONFIG_NFS_V4
 	u64			cl_clientid;	/* constant */
 	nfs4_verifier		cl_confirm;
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index a19c3af..3f63218 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -26,6 +26,7 @@ struct auth_cred {
 	uid_t	uid;
 	gid_t	gid;
 	struct group_info *group_info;
+	unsigned char machine_cred : 1;
 };
 
 /*
@@ -130,6 +131,7 @@ void __exit		rpcauth_remove_module(void);
 void __exit		rpc_destroy_generic_auth(void);
 
 struct rpc_cred *	rpc_lookup_cred(void);
+struct rpc_cred *	rpc_lookup_machine_cred(void);
 int			rpcauth_register(const struct rpc_authops *);
 int			rpcauth_unregister(const struct rpc_authops *);
 struct rpc_auth *	rpcauth_create(rpc_authflavor_t, struct rpc_clnt *);
diff --git a/include/linux/sunrpc/auth_gss.h b/include/linux/sunrpc/auth_gss.h
index 67658e1..fec6899 100644
--- a/include/linux/sunrpc/auth_gss.h
+++ b/include/linux/sunrpc/auth_gss.h
@@ -84,6 +84,7 @@ struct gss_cred {
 	enum rpc_gss_svc	gc_service;
 	struct gss_cl_ctx	*gc_ctx;
 	struct gss_upcall_msg	*gc_upcall;
+	unsigned char		gc_machine_cred : 1;
 };
 
 #endif /* __KERNEL__ */
diff --git a/net/sunrpc/auth_generic.c b/net/sunrpc/auth_generic.c
index b6f124c..d927d9f 100644
--- a/net/sunrpc/auth_generic.c
+++ b/net/sunrpc/auth_generic.c
@@ -17,6 +17,9 @@
 # define RPCDBG_FACILITY	RPCDBG_AUTH
 #endif
 
+#define RPC_ANONYMOUS_USERID	((uid_t)-2)
+#define RPC_ANONYMOUS_GROUPID	((gid_t)-2)
+
 struct generic_cred {
 	struct rpc_cred gc_base;
 	struct auth_cred acred;
@@ -35,6 +38,22 @@ struct rpc_cred *rpc_lookup_cred(void)
 }
 EXPORT_SYMBOL_GPL(rpc_lookup_cred);
 
+/*
+ * Public call interface for looking up machine creds.
+ */
+struct rpc_cred *rpc_lookup_machine_cred(void)
+{
+	struct auth_cred acred = {
+		.uid = RPC_ANONYMOUS_USERID,
+		.gid = RPC_ANONYMOUS_GROUPID,
+		.machine_cred = 1,
+	};
+
+	dprintk("RPC:       looking up machine cred\n");
+	return generic_auth.au_ops->lookup_cred(&generic_auth, &acred, 0);
+}
+EXPORT_SYMBOL_GPL(rpc_lookup_machine_cred);
+
 static void
 generic_bind_cred(struct rpc_task *task, struct rpc_cred *cred)
 {
@@ -75,8 +94,10 @@ generic_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 	gcred->acred.group_info = acred->group_info;
 	if (gcred->acred.group_info != NULL)
 		get_group_info(gcred->acred.group_info);
+	gcred->acred.machine_cred = acred->machine_cred;
 
-	dprintk("RPC:       allocated generic cred %p for uid %d gid %d\n",
+	dprintk("RPC:       allocated %s cred %p for uid %d gid %d\n",
+			gcred->acred.machine_cred ? "machine" : "generic",
 			gcred, acred->uid, acred->gid);
 	return &gcred->gc_base;
 }
@@ -115,7 +136,8 @@ generic_match(struct auth_cred *acred, struct rpc_cred *cred, int flags)
 
 	if (gcred->acred.uid != acred->uid ||
 	    gcred->acred.gid != acred->gid ||
-	    gcred->acred.group_info != acred->group_info)
+	    gcred->acred.group_info != acred->group_info ||
+	    gcred->acred.machine_cred != acred->machine_cred)
 		return 0;
 	return 1;
 }
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 7567eb9..46f7ec8 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -371,9 +371,16 @@ gss_alloc_msg(struct gss_auth *gss_auth, uid_t uid)
 static struct gss_upcall_msg *
 gss_setup_upcall(struct rpc_clnt *clnt, struct gss_auth *gss_auth, struct rpc_cred *cred)
 {
+	struct gss_cred *gss_cred = container_of(cred,
+			struct gss_cred, gc_base);
 	struct gss_upcall_msg *gss_new, *gss_msg;
+	uid_t uid = cred->cr_uid;
 
-	gss_new = gss_alloc_msg(gss_auth, cred->cr_uid);
+	/* Special case: rpc.gssd assumes that uid == 0 implies machine creds */
+	if (gss_cred->gc_machine_cred != 0)
+		uid = 0;
+
+	gss_new = gss_alloc_msg(gss_auth, uid);
 	if (gss_new == NULL)
 		return ERR_PTR(-ENOMEM);
 	gss_msg = gss_add_msg(gss_auth, gss_new);
@@ -818,6 +825,7 @@ gss_create_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 	 */
 	cred->gc_base.cr_flags = 1UL << RPCAUTH_CRED_NEW;
 	cred->gc_service = gss_auth->service;
+	cred->gc_machine_cred = acred->machine_cred;
 	kref_get(&gss_auth->kref);
 	return &cred->gc_base;
 
@@ -855,6 +863,8 @@ gss_match(struct auth_cred *acred, struct rpc_cred *rc, int flags)
 	if (gss_cred->gc_ctx && time_after(jiffies, gss_cred->gc_ctx->gc_expiry))
 		return 0;
 out:
+	if (acred->machine_cred != gss_cred->gc_machine_cred)
+		return 0;
 	return (rc->cr_uid == acred->uid);
 }
 


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 24/33] NFSv4: Attempt to use machine credentials in SETCLIENTID calls
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (24 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 23/33] NFSv4: Reintroduce machine creds Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 28/33] SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests Trond Myklebust
                     ` (6 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/nfs4state.c |   41 +++++++++++++++++++++++++++++++++++++----
 1 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 7775435..46eb624 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -71,6 +71,29 @@ static int nfs4_init_client(struct nfs_client *clp, struct rpc_cred *cred)
 	return status;
 }
 
+static struct rpc_cred *nfs4_get_machine_cred(struct nfs_client *clp)
+{
+	struct rpc_cred *cred = NULL;
+
+	spin_lock(&clp->cl_lock);
+	if (clp->cl_machine_cred != NULL)
+		cred = get_rpccred(clp->cl_machine_cred);
+	spin_unlock(&clp->cl_lock);
+	return cred;
+}
+
+static void nfs4_clear_machine_cred(struct nfs_client *clp)
+{
+	struct rpc_cred *cred;
+
+	spin_lock(&clp->cl_lock);
+	cred = clp->cl_machine_cred;
+	clp->cl_machine_cred = NULL;
+	spin_unlock(&clp->cl_lock);
+	if (cred != NULL)
+		put_rpccred(cred);
+}
+
 struct rpc_cred *nfs4_get_renew_cred(struct nfs_client *clp)
 {
 	struct nfs4_state_owner *sp;
@@ -91,13 +114,18 @@ static struct rpc_cred *nfs4_get_setclientid_cred(struct nfs_client *clp)
 {
 	struct nfs4_state_owner *sp;
 	struct rb_node *pos;
+	struct rpc_cred *cred;
 
+	cred = nfs4_get_machine_cred(clp);
+	if (cred != NULL)
+		goto out;
 	pos = rb_first(&clp->cl_state_owners);
 	if (pos != NULL) {
 		sp = rb_entry(pos, struct nfs4_state_owner, so_client_node);
-		return get_rpccred(sp->so_cred);
+		cred = get_rpccred(sp->so_cred);
 	}
-	return NULL;
+out:
+	return cred;
 }
 
 static void nfs_alloc_unique_id(struct rb_root *root, struct nfs_unique_id *new,
@@ -924,10 +952,10 @@ restart_loop:
 	if (cred != NULL) {
 		/* Yes there are: try to renew the old lease */
 		status = nfs4_proc_renew(clp, cred);
+		put_rpccred(cred);
 		switch (status) {
 			case 0:
 			case -NFS4ERR_CB_PATH_DOWN:
-				put_rpccred(cred);
 				goto out;
 			case -NFS4ERR_STALE_CLIENTID:
 			case -NFS4ERR_LEASE_MOVED:
@@ -936,14 +964,19 @@ restart_loop:
 	} else {
 		/* "reboot" to ensure we clear all state on the server */
 		clp->cl_boot_time = CURRENT_TIME;
-		cred = nfs4_get_setclientid_cred(clp);
 	}
 	/* We're going to have to re-establish a clientid */
 	nfs4_state_mark_reclaim(clp);
 	status = -ENOENT;
+	cred = nfs4_get_setclientid_cred(clp);
 	if (cred != NULL) {
 		status = nfs4_init_client(clp, cred);
 		put_rpccred(cred);
+		/* Handle case where the user hasn't set up machine creds */
+		if (status == -EACCES && cred == clp->cl_machine_cred) {
+			nfs4_clear_machine_cred(clp);
+			goto restart_loop;
+		}
 	}
 	if (status)
 		goto out_error;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 26/33] SUNRPC: remove XS_SENDMSG_RETRY
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (22 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 25/33] SUNRPC: Protect creds against early garbage collection Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 23/33] NFSv4: Reintroduce machine creds Trond Myklebust
                     ` (8 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

The condition for exiting from the loop in xs_tcp_send_request() should be
that we find we're not making progress (i.e. number of bytes sent is 0).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/xprtsock.c |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4c2462e..4a567a9 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -136,12 +136,6 @@ static ctl_table sunrpc_table[] = {
 #endif
 
 /*
- * How many times to try sending a request on a socket before waiting
- * for the socket buffer to clear.
- */
-#define XS_SENDMSG_RETRY	(10U)
-
-/*
  * Time out for an RPC UDP socket connect.  UDP socket connects are
  * synchronous, but we set a timeout anyway in case of resource
  * exhaustion on the local host.
@@ -666,7 +660,6 @@ static int xs_tcp_send_request(struct rpc_task *task)
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
 	struct xdr_buf *xdr = &req->rq_snd_buf;
 	int status;
-	unsigned int retry = 0;
 
 	xs_encode_tcp_record_marker(&req->rq_snd_buf);
 
@@ -697,9 +690,10 @@ static int xs_tcp_send_request(struct rpc_task *task)
 			return 0;
 		}
 
+		if (status != 0)
+			continue;
 		status = -EAGAIN;
-		if (retry++ > XS_SENDMSG_RETRY)
-			break;
+		break;
 	}
 
 	switch (status) {


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 25/33] SUNRPC: Protect creds against early garbage collection
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (21 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 20/33] nfs: return negative error value from nfs{, 4}_stat_to_errno Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 26/33] SUNRPC: remove XS_SENDMSG_RETRY Trond Myklebust
                     ` (9 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/auth.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 0632cd0..6bfea9e 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -220,6 +220,9 @@ rpcauth_destroy_credcache(struct rpc_auth *auth)
 }
 EXPORT_SYMBOL_GPL(rpcauth_destroy_credcache);
 
+
+#define RPC_AUTH_EXPIRY_MORATORIUM (60 * HZ)
+
 /*
  * Remove stale credentials. Avoid sleeping inside the loop.
  */
@@ -228,6 +231,7 @@ rpcauth_prune_expired(struct list_head *free, int nr_to_scan)
 {
 	spinlock_t *cache_lock;
 	struct rpc_cred *cred;
+	unsigned long expired = jiffies - RPC_AUTH_EXPIRY_MORATORIUM;
 
 	while (!list_empty(&cred_unused)) {
 		cred = list_entry(cred_unused.next, struct rpc_cred, cr_lru);
@@ -235,6 +239,10 @@ rpcauth_prune_expired(struct list_head *free, int nr_to_scan)
 		number_cred_unused--;
 		if (atomic_read(&cred->cr_count) != 0)
 			continue;
+		/* Enforce a 5 second garbage collection moratorium */
+		if (time_in_range(cred->cr_expire, expired, jiffies) &&
+		    test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) != 0)
+			continue;
 		cache_lock = &cred->cr_auth->au_credcache->lock;
 		spin_lock(cache_lock);
 		if (atomic_read(&cred->cr_count) == 0) {


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 30/33] SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (27 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 29/33] SUNRPC: Fix a race in gss_refresh_upcall() Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 27/33] SUNRPC: Remove the unused export of xprt_force_disconnect Trond Myklebust
                     ` (3 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

When a server rejects our credential with an AUTH_REJECTEDCRED or similar,
we need to refresh the credential and then retry the request.
However, we do want to allow any requests that are in flight to finish
executing, so that we can at least attempt to process the replies that
depend on this instance of the credential.

The solution is to ensure that gss_refresh() looks up an entirely new
RPCSEC_GSS credential instead of attempting to create a context for the
existing invalid credential.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/auth_gss/auth_gss.c |   69 ++++++++++++++++++++++++----------------
 1 files changed, 42 insertions(+), 27 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 6f1b4e2..621c07f 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -114,28 +114,14 @@ static void
 gss_cred_set_ctx(struct rpc_cred *cred, struct gss_cl_ctx *ctx)
 {
 	struct gss_cred *gss_cred = container_of(cred, struct gss_cred, gc_base);
-	struct gss_cl_ctx *old;
 
-	old = gss_cred->gc_ctx;
+	if (!test_bit(RPCAUTH_CRED_NEW, &cred->cr_flags))
+		return;
 	gss_get_ctx(ctx);
 	rcu_assign_pointer(gss_cred->gc_ctx, ctx);
 	set_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
+	smp_mb__before_clear_bit();
 	clear_bit(RPCAUTH_CRED_NEW, &cred->cr_flags);
-	if (old)
-		gss_put_ctx(old);
-}
-
-static int
-gss_cred_is_uptodate_ctx(struct rpc_cred *cred)
-{
-	struct gss_cred *gss_cred = container_of(cred, struct gss_cred, gc_base);
-	int res = 0;
-
-	rcu_read_lock();
-	if (test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) && gss_cred->gc_ctx)
-		res = 1;
-	rcu_read_unlock();
-	return res;
 }
 
 static const void *
@@ -857,15 +843,12 @@ gss_match(struct auth_cred *acred, struct rpc_cred *rc, int flags)
 {
 	struct gss_cred *gss_cred = container_of(rc, struct gss_cred, gc_base);
 
-	/*
-	 * If the searchflags have set RPCAUTH_LOOKUP_NEW, then
-	 * we don't really care if the credential has expired or not,
-	 * since the caller should be prepared to reinitialise it.
-	 */
-	if ((flags & RPCAUTH_LOOKUP_NEW) && test_bit(RPCAUTH_CRED_NEW, &rc->cr_flags))
+	if (test_bit(RPCAUTH_CRED_NEW, &rc->cr_flags))
 		goto out;
 	/* Don't match with creds that have expired. */
-	if (gss_cred->gc_ctx && time_after(jiffies, gss_cred->gc_ctx->gc_expiry))
+	if (time_after(jiffies, gss_cred->gc_ctx->gc_expiry))
+		return 0;
+	if (!test_bit(RPCAUTH_CRED_UPTODATE, &rc->cr_flags))
 		return 0;
 out:
 	if (acred->machine_cred != gss_cred->gc_machine_cred)
@@ -933,16 +916,48 @@ out_put_ctx:
 	return NULL;
 }
 
+static int gss_renew_cred(struct rpc_task *task)
+{
+	struct rpc_cred *oldcred = task->tk_msg.rpc_cred;
+	struct gss_cred *gss_cred = container_of(oldcred,
+						 struct gss_cred,
+						 gc_base);
+	struct rpc_auth *auth = oldcred->cr_auth;
+	struct auth_cred acred = {
+		.uid = oldcred->cr_uid,
+		.machine_cred = gss_cred->gc_machine_cred,
+	};
+	struct rpc_cred *new;
+
+	new = gss_lookup_cred(auth, &acred, RPCAUTH_LOOKUP_NEW);
+	if (IS_ERR(new))
+		return PTR_ERR(new);
+	task->tk_msg.rpc_cred = new;
+	put_rpccred(oldcred);
+	return 0;
+}
+
 /*
 * Refresh credentials. XXX - finish
 */
 static int
 gss_refresh(struct rpc_task *task)
 {
+	struct rpc_cred *cred = task->tk_msg.rpc_cred;
+	int ret = 0;
+
+	if (!test_bit(RPCAUTH_CRED_NEW, &cred->cr_flags) &&
+			!test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags)) {
+		ret = gss_renew_cred(task);
+		if (ret < 0)
+			goto out;
+		cred = task->tk_msg.rpc_cred;
+	}
 
-	if (!gss_cred_is_uptodate_ctx(task->tk_msg.rpc_cred))
-		return gss_refresh_upcall(task);
-	return 0;
+	if (test_bit(RPCAUTH_CRED_NEW, &cred->cr_flags))
+		ret = gss_refresh_upcall(task);
+out:
+	return ret;
 }
 
 /* Dummy refresh routine: used only when destroying the context */


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 27/33] SUNRPC: Remove the unused export of xprt_force_disconnect
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (28 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 30/33] SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 32/33] NFS: remove duplicate flags assignment from nfs_validate_mount_data Trond Myklebust
                     ` (2 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/xprt.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 5110a4e..a0646a3 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -605,7 +605,6 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
 	xprt_wake_pending_tasks(xprt, -ENOTCONN);
 	spin_unlock_bh(&xprt->transport_lock);
 }
-EXPORT_SYMBOL_GPL(xprt_force_disconnect);
 
 static void
 xprt_init_autodisconnect(unsigned long data)


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 29/33] SUNRPC: Fix a race in gss_refresh_upcall()
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (26 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 28/33] SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 30/33] SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use Trond Myklebust
                     ` (4 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

If the downcall completes before we get the spin_lock then we currently
fail to refresh the credential.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/auth_gss/auth_gss.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 46f7ec8..6f1b4e2 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -117,6 +117,7 @@ gss_cred_set_ctx(struct rpc_cred *cred, struct gss_cl_ctx *ctx)
 	struct gss_cl_ctx *old;
 
 	old = gss_cred->gc_ctx;
+	gss_get_ctx(ctx);
 	rcu_assign_pointer(gss_cred->gc_ctx, ctx);
 	set_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags);
 	clear_bit(RPCAUTH_CRED_NEW, &cred->cr_flags);
@@ -340,7 +341,7 @@ gss_upcall_callback(struct rpc_task *task)
 
 	spin_lock(&inode->i_lock);
 	if (gss_msg->ctx)
-		gss_cred_set_ctx(task->tk_msg.rpc_cred, gss_get_ctx(gss_msg->ctx));
+		gss_cred_set_ctx(task->tk_msg.rpc_cred, gss_msg->ctx);
 	else
 		task->tk_status = gss_msg->msg.errno;
 	gss_cred->gc_upcall = NULL;
@@ -417,7 +418,11 @@ gss_refresh_upcall(struct rpc_task *task)
 	spin_lock(&inode->i_lock);
 	if (gss_cred->gc_upcall != NULL)
 		rpc_sleep_on(&gss_cred->gc_upcall->rpc_waitqueue, task, NULL);
-	else if (gss_msg->ctx == NULL && gss_msg->msg.errno >= 0) {
+	else if (gss_msg->ctx != NULL) {
+		gss_cred_set_ctx(task->tk_msg.rpc_cred, gss_msg->ctx);
+		gss_cred->gc_upcall = NULL;
+		rpc_wake_up_status(&gss_msg->rpc_waitqueue, gss_msg->msg.errno);
+	} else if (gss_msg->msg.errno >= 0) {
 		task->tk_timeout = 0;
 		gss_cred->gc_upcall = gss_msg;
 		/* gss_upcall_callback will release the reference to gss_upcall_msg */
@@ -462,7 +467,7 @@ gss_create_upcall(struct gss_auth *gss_auth, struct gss_cred *gss_cred)
 		schedule();
 	}
 	if (gss_msg->ctx)
-		gss_cred_set_ctx(cred, gss_get_ctx(gss_msg->ctx));
+		gss_cred_set_ctx(cred, gss_msg->ctx);
 	else
 		err = gss_msg->msg.errno;
 	spin_unlock(&inode->i_lock);


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 28/33] SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (25 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 24/33] NFSv4: Attempt to use machine credentials in SETCLIENTID calls Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 29/33] SUNRPC: Fix a race in gss_refresh_upcall() Trond Myklebust
                     ` (5 subsequent siblings)
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Trond Myklebust

NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.

We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 include/linux/sunrpc/xprt.h |    8 ++++++++
 net/sunrpc/clnt.c           |    6 ++++--
 net/sunrpc/xprt.c           |   29 +++++++++++++++++++++++++++++
 net/sunrpc/xprtsock.c       |    2 ++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 8a0629a..4d80a11 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -86,6 +86,10 @@ struct rpc_rqst {
 	unsigned long		rq_majortimeo;	/* major timeout alarm */
 	unsigned long		rq_timeout;	/* Current timeout value */
 	unsigned int		rq_retries;	/* # of retries */
+	unsigned int		rq_connect_cookie;
+						/* A cookie used to track the
+						   state of the transport
+						   connection */
 	
 	/*
 	 * Partial send handling
@@ -152,6 +156,9 @@ struct rpc_xprt {
 	unsigned long		connect_timeout,
 				bind_timeout,
 				reestablish_timeout;
+	unsigned int		connect_cookie;	/* A cookie that gets bumped
+						   every time the transport
+						   is reconnected */
 
 	/*
 	 * Disconnection of idle transports
@@ -241,6 +248,7 @@ void			xprt_complete_rqst(struct rpc_task *task, int copied);
 void			xprt_release_rqst_cong(struct rpc_task *task);
 void			xprt_disconnect_done(struct rpc_xprt *xprt);
 void			xprt_force_disconnect(struct rpc_xprt *xprt);
+void			xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie);
 
 /*
  * Reserved bit positions in xprt->state
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 57663a4..2969e84 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1116,7 +1116,8 @@ call_status(struct rpc_task *task)
 	case -ETIMEDOUT:
 		task->tk_action = call_timeout;
 		if (task->tk_client->cl_discrtry)
-			xprt_force_disconnect(task->tk_xprt);
+			xprt_conditional_disconnect(task->tk_xprt,
+					req->rq_connect_cookie);
 		break;
 	case -ECONNREFUSED:
 	case -ENOTCONN:
@@ -1241,7 +1242,8 @@ out_retry:
 	if (task->tk_rqstp == req) {
 		req->rq_received = req->rq_rcv_buf.len = 0;
 		if (task->tk_client->cl_discrtry)
-			xprt_force_disconnect(task->tk_xprt);
+			xprt_conditional_disconnect(task->tk_xprt,
+					req->rq_connect_cookie);
 	}
 }
 
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a0646a3..75d748e 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -606,6 +606,34 @@ void xprt_force_disconnect(struct rpc_xprt *xprt)
 	spin_unlock_bh(&xprt->transport_lock);
 }
 
+/**
+ * xprt_conditional_disconnect - force a transport to disconnect
+ * @xprt: transport to disconnect
+ * @cookie: 'connection cookie'
+ *
+ * This attempts to break the connection if and only if 'cookie' matches
+ * the current transport 'connection cookie'. It ensures that we don't
+ * try to break the connection more than once when we need to retransmit
+ * a batch of RPC requests.
+ *
+ */
+void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie)
+{
+	/* Don't race with the test_bit() in xprt_clear_locked() */
+	spin_lock_bh(&xprt->transport_lock);
+	if (cookie != xprt->connect_cookie)
+		goto out;
+	if (test_bit(XPRT_CLOSING, &xprt->state) || !xprt_connected(xprt))
+		goto out;
+	set_bit(XPRT_CLOSE_WAIT, &xprt->state);
+	/* Try to schedule an autoclose RPC call */
+	if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0)
+		queue_work(rpciod_workqueue, &xprt->task_cleanup);
+	xprt_wake_pending_tasks(xprt, -ENOTCONN);
+out:
+	spin_unlock_bh(&xprt->transport_lock);
+}
+
 static void
 xprt_init_autodisconnect(unsigned long data)
 {
@@ -849,6 +877,7 @@ void xprt_transmit(struct rpc_task *task)
 	} else if (!req->rq_bytes_sent)
 		return;
 
+	req->rq_connect_cookie = xprt->connect_cookie;
 	status = xprt->ops->send_request(task);
 	if (status == 0) {
 		dprintk("RPC: %5u xmit complete\n", task->tk_pid);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4a567a9..63d79e3 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1142,6 +1142,7 @@ static void xs_tcp_state_change(struct sock *sk)
 		break;
 	case TCP_FIN_WAIT1:
 		/* The client initiated a shutdown of the socket */
+		xprt->connect_cookie++;
 		xprt->reestablish_timeout = 0;
 		set_bit(XPRT_CLOSING, &xprt->state);
 		smp_mb__before_clear_bit();
@@ -1154,6 +1155,7 @@ static void xs_tcp_state_change(struct sock *sk)
 		set_bit(XPRT_CLOSING, &xprt->state);
 		xprt_force_disconnect(xprt);
 	case TCP_SYN_SENT:
+		xprt->connect_cookie++;
 	case TCP_CLOSING:
 		/*
 		 * If the server closed down the connection, make sure that


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 32/33] NFS: remove duplicate flags assignment from nfs_validate_mount_data
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (29 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 27/33] SUNRPC: Remove the unused export of xprt_force_disconnect Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 33/33] make nfs_automount_list static Trond Myklebust
  2008-04-19 20:40   ` [PATCH 31/33] NFS - fix potential NULL pointer dereference v2 Trond Myklebust
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Jeff Layton, Trond Myklebust

From: Jeff Layton <jlayton@redhat.com>

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/super.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 7c13ce7..20a1cb1 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1273,7 +1273,6 @@ static int nfs_validate_mount_data(void *options,
 		args->flags		= data->flags;
 		args->rsize		= data->rsize;
 		args->wsize		= data->wsize;
-		args->flags		= data->flags;
 		args->timeo		= data->timeo;
 		args->retrans		= data->retrans;
 		args->acregmin		= data->acregmin;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 33/33] make nfs_automount_list static
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (30 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 32/33] NFS: remove duplicate flags assignment from nfs_validate_mount_data Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
  2008-04-19 20:40   ` [PATCH 31/33] NFS - fix potential NULL pointer dereference v2 Trond Myklebust
  32 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Adrian Bunk, Trond Myklebust

From: Adrian Bunk <bunk@kernel.org>

nfs_automount_list can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/namespace.c     |    2 +-
 include/linux/nfs_fs.h |    1 -
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index 607f6eb..af4d0f1 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -20,7 +20,7 @@
 
 static void nfs_expire_automounts(struct work_struct *work);
 
-LIST_HEAD(nfs_automount_list);
+static LIST_HEAD(nfs_automount_list);
 static DECLARE_DELAYED_WORK(nfs_automount_task, nfs_expire_automounts);
 int nfs_mountpoint_expiry_timeout = 500 * HZ;
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 7f0602c..27d6a8d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -430,7 +430,6 @@ extern void nfs_unregister_sysctl(void);
 /*
  * linux/fs/nfs/namespace.c
  */
-extern struct list_head nfs_automount_list;
 extern const struct inode_operations nfs_mountpoint_inode_operations;
 extern const struct inode_operations nfs_referral_inode_operations;
 extern int nfs_mountpoint_expiry_timeout;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
       [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
                     ` (31 preceding siblings ...)
  2008-04-19 20:40   ` [PATCH 33/33] make nfs_automount_list static Trond Myklebust
@ 2008-04-19 20:40   ` Trond Myklebust
       [not found]     ` <20080419204054.14124.59641.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
  32 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-19 20:40 UTC (permalink / raw)
  To: linux-nfs; +Cc: Cyrill Gorcunov, Trond Myklebust

From: Cyrill Gorcunov <gorcunov@gmail.com>

There is possible NULL pointer dereference if kstr[n]dup failed.
So fix them for safety.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/super.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 140174d..7c13ce7 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1295,6 +1295,8 @@ static int nfs_validate_mount_data(void *options,
 		args->namlen		= data->namlen;
 		args->bsize		= data->bsize;
 		args->auth_flavors[0]	= data->pseudoflavor;
+		if (!args->nfs_server.hostname)
+			goto out_nomem;
 
 		/*
 		 * The legacy version 6 binary mount data from userspace has a
@@ -1341,6 +1343,8 @@ static int nfs_validate_mount_data(void *options,
 		len = c - dev_name;
 		/* N.B. caller will free nfs_server.hostname in all cases */
 		args->nfs_server.hostname = kstrndup(dev_name, len, GFP_KERNEL);
+		if (!args->nfs_server.hostname)
+			goto out_nomem;
 
 		c++;
 		if (strlen(c) > NFS_MAXPATHLEN)
@@ -1384,6 +1388,10 @@ out_v3_not_compiled:
 	return -EPROTONOSUPPORT;
 #endif /* !CONFIG_NFS_V3 */
 
+out_nomem:
+	dfprintk(MOUNT, "NFS: not enough memory to handle mount options\n");
+	return -ENOMEM;
+
 out_no_address:
 	dfprintk(MOUNT, "NFS: mount program didn't pass remote address\n");
 	return -EINVAL;
@@ -1890,12 +1898,16 @@ static int nfs4_validate_mount_data(void *options,
 			return -ENAMETOOLONG;
 		/* N.B. caller will free nfs_server.hostname in all cases */
 		args->nfs_server.hostname = kstrndup(dev_name, len, GFP_KERNEL);
+		if (!args->nfs_server.hostname)
+			goto out_nomem;
 
 		c++;			/* step over the ':' */
 		len = strlen(c);
 		if (len > NFS4_MAXPATHLEN)
 			return -ENAMETOOLONG;
 		args->nfs_server.export_path = kstrndup(c, len, GFP_KERNEL);
+		if (!args->nfs_server.export_path)
+			goto out_nomem;
 
 		dprintk("NFS: MNTPATH: '%s'\n", args->nfs_server.export_path);
 
@@ -1917,6 +1929,10 @@ out_inval_auth:
 		 data->auth_flavourlen);
 	return -EINVAL;
 
+out_nomem:
+	dfprintk(MOUNT, "NFS4: not enough memory to handle mount options\n");
+	return -ENOMEM;
+
 out_no_address:
 	dfprintk(MOUNT, "NFS4: mount program didn't pass remote address\n");
 	return -EINVAL;


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 01/33] SUNRPC: Fix a bug in call_decode()
       [not found]     ` <20080419204047.14124.76946.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 16:22       ` Chuck Lever
  0 siblings, 0 replies; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 16:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> call_verify() can, under certain circumstances, free the RPC slot.  
> In that
> case, our cached pointer 'req = task->tk_rqstp' is invalid. Bug was
> introduced in commit 220bcc2afd7011b3e0569fc178331fa983c92c1b (SUNRPC:
> Don't call xprt_release in call refresh).

Since you still refer to this function as "call_verify" can I assume  
you rejected my patches that renamed it rpc_verify_header?

A better fix might instead move the logic in call_decode's out_retry:  
arm into call_verify.

A subsequent clean-up should refactor the spaghetti in call_verify  
into several simpler functions to reduce the likelihood of this kind  
of defect cropping up again.

> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/clnt.c |    9 ++++++---
> 1 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index d6701f7..0c29792 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1236,10 +1236,13 @@ call_decode(struct rpc_task *task)
> 			task->tk_status);
> 	return;
> out_retry:
> -	req->rq_received = req->rq_private_buf.len = 0;
> 	task->tk_status = 0;
> -	if (task->tk_client->cl_discrtry)
> -		xprt_force_disconnect(task->tk_xprt);
> +	/* Note: call_verify() may have freed the RPC slot */
> +	if (task->tk_rqstp == req) {
> +		req->rq_received = req->rq_private_buf.len = 0;
> +		if (task->tk_client->cl_discrtry)
> +			xprt_force_disconnect(task->tk_xprt);
> +	}
> }
>
> /*
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/33] SUNRPC: Fix up xprt_write_space()
       [not found]     ` <20080419204047.14124.5947.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 17:31       ` Chuck Lever
  2008-04-21 23:51         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 17:31 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal  
> whether
> or not we have someone waiting for buffer memory. Convert the SUNRPC  
> layer
> to use the same idiom.

As near as I can tell, SOCK_NOSPACE is used for this purpose.  It  
really isn't clear what SOCK_ASYNC_NOSPACE is used for.

In fact I found at least one comment that suggested these flags  
currently may be used inconsistently in the network layer.  Did you  
happen to find any unambiguous documentation explaining how the  
network layer uses these flags?  (I for one would like to understand  
this better).

I'm a little concerned about this patch overall because the  
SOCK_NOSPACE flags interface is well understood by only a handful of  
people in the universe, so it's difficult for us networking outsiders  
to evaluate this patch.

> Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space.  
> In
> fact, the most common case will be that there is nobody waiting for  
> buffer
> space.

The original purpose of the unlikely() here was to prevent the  
compiler from reordering the out: arm and xprt_write_space, thus  
generating unnecessarily complex object code.  I may even have tested  
this with oprofile on Pentium III.

Not sure it makes any difference at all these days... but you could  
move this change to a separate clean up patch, for clarity.

> SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd  
> was
> limited by the application window. Ensure that we follow the same  
> idiom as
> the rest of the networking layer here too.

My cursory review of the network layer suggests that this usage of  
SOCK_NOSPACE is an expediency, not the main purpose of SOCK_NOSPACE.

This could be a useful change though... it would be nice to have a  
clear explanation in the patch description of the actual benefit this  
brings to the RPC's TCP socket transport.

> Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so  
> that
> write_space() doesn't keep waking things up on xprt->pending.

This is probably the main fix introduced by this patch.  Is there a  
measureable performance gain?

You also moved the transport lock around in xs_nospace (maybe to  
additionally protect the xprt connection flags?), and introduced the  
use of sk_write_pending in the RPC socket transport, but those changes  
are not documented in the patch description.  Was the transport lock  
change a bug fix, or a necessity of the other changes you're making in  
this patch?

Lastly, if there is a palpable benefit to these changes, it would be  
great if the server side write space support could be updated as well.

> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> include/linux/sunrpc/xprt.h |    2 +
> net/sunrpc/xprt.c           |    4 +--
> net/sunrpc/xprtsock.c       |   61 ++++++++++++++++++++++++++++ 
> +--------------
> 3 files changed, 44 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
> index b3ff9a8..8a0629a 100644
> --- a/include/linux/sunrpc/xprt.h
> +++ b/include/linux/sunrpc/xprt.h
> @@ -232,7 +232,7 @@ int			xprt_unregister_transport(struct  
> xprt_class *type);
> void			xprt_set_retrans_timeout_def(struct rpc_task *task);
> void			xprt_set_retrans_timeout_rtt(struct rpc_task *task);
> void			xprt_wake_pending_tasks(struct rpc_xprt *xprt, int status);
> -void			xprt_wait_for_buffer_space(struct rpc_task *task);
> +void			xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action  
> action);
> void			xprt_write_space(struct rpc_xprt *xprt);
> void			xprt_update_rtt(struct rpc_task *task);
> void			xprt_adjust_cwnd(struct rpc_task *task, int result);
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 85199c6..3ba64f9 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -447,13 +447,13 @@ EXPORT_SYMBOL_GPL(xprt_wake_pending_tasks);
>  * @task: task to be put to sleep

+ * @action: function to call on wake up

>
>  *
>  */
> -void xprt_wait_for_buffer_space(struct rpc_task *task)
> +void xprt_wait_for_buffer_space(struct rpc_task *task, rpc_action  
> action)
> {
> 	struct rpc_rqst *req = task->tk_rqstp;
> 	struct rpc_xprt *xprt = req->rq_xprt;
>
> 	task->tk_timeout = req->rq_timeout;
> -	rpc_sleep_on(&xprt->pending, task, NULL);
> +	rpc_sleep_on(&xprt->pending, task, action);
> }
> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space);
>
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 8bd3b0f..4c2462e 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -516,6 +516,14 @@ out:
> 	return sent;
> }
>
> +static void xs_nospace_callback(struct rpc_task *task)
> +{
> +	struct sock_xprt *transport = container_of(task->tk_rqstp- 
> >rq_xprt, struct sock_xprt, xprt);
> +
> +	transport->inet->sk_write_pending--;
> +	clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> +}
> +
> /**
>  * xs_nospace - place task on wait queue if transmit was incomplete
>  * @task: task to put to sleep
> @@ -531,20 +539,27 @@ static void xs_nospace(struct rpc_task *task)
> 			task->tk_pid, req->rq_slen - req->rq_bytes_sent,
> 			req->rq_slen);
>
> -	if (test_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags)) {
> -		/* Protect against races with write_space */
> -		spin_lock_bh(&xprt->transport_lock);
> -
> -		/* Don't race with disconnect */
> -		if (!xprt_connected(xprt))
> -			task->tk_status = -ENOTCONN;
> -		else if (test_bit(SOCK_NOSPACE, &transport->sock->flags))
> -			xprt_wait_for_buffer_space(task);
> +	/* Protect against races with write_space */
> +	spin_lock_bh(&xprt->transport_lock);
> +
> +	/* Don't race with disconnect */
> +	if (xprt_connected(xprt)) {
> +		if (test_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags)) {
> +			/*
> +			 * Notify TCP that we're limited by the application
> +			 * window size
> +			 */
> +			set_bit(SOCK_NOSPACE, &transport->sock->flags);
> +			transport->inet->sk_write_pending++;
> +			/* ...and wait for more buffer space */
> +			xprt_wait_for_buffer_space(task, xs_nospace_callback);
> +		}
> +	} else {
> +		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> +		task->tk_status = -ENOTCONN;
> +	}
>
> -		spin_unlock_bh(&xprt->transport_lock);
> -	} else
> -		/* Keep holding the socket if it is blocked */
> -		rpc_delay(task, HZ>>4);
> +	spin_unlock_bh(&xprt->transport_lock);
> }
>
> /**
> @@ -588,19 +603,20 @@ static int xs_udp_send_request(struct rpc_task  
> *task)
> 	}
>
> 	switch (status) {
> +	case -EAGAIN:
> +		xs_nospace(task);
> +		break;
> 	case -ENETUNREACH:
> 	case -EPIPE:
> 	case -ECONNREFUSED:
> 		/* When the server has died, an ICMP port unreachable message
> 		 * prompts ECONNREFUSED. */
> -		break;
> -	case -EAGAIN:
> -		xs_nospace(task);
> +		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> 		break;
> 	default:
> +		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> 		dprintk("RPC:       sendmsg returned unrecognized error %d\n",
> 			-status);
> -		break;
> 	}
>
> 	return status;
> @@ -695,12 +711,13 @@ static int xs_tcp_send_request(struct rpc_task  
> *task)
> 	case -ENOTCONN:
> 	case -EPIPE:
> 		status = -ENOTCONN;
> +		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> 		break;
> 	default:
> 		dprintk("RPC:       sendmsg returned unrecognized error %d\n",
> 			-status);
> +		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
> 		xs_tcp_shutdown(xprt);
> -		break;
> 	}
>
> 	return status;
> @@ -1189,9 +1206,11 @@ static void xs_udp_write_space(struct sock *sk)
>
> 		if (unlikely(!(sock = sk->sk_socket)))
> 			goto out;
> +		clear_bit(SOCK_NOSPACE, &sock->flags);
> +
> 		if (unlikely(!(xprt = xprt_from_sock(sk))))
> 			goto out;
> -		if (unlikely(!test_and_clear_bit(SOCK_NOSPACE, &sock->flags)))
> +		if (test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sock->flags) == 0)
> 			goto out;
>
> 		xprt_write_space(xprt);
> @@ -1222,9 +1241,11 @@ static void xs_tcp_write_space(struct sock *sk)
>
> 		if (unlikely(!(sock = sk->sk_socket)))
> 			goto out;
> +		clear_bit(SOCK_NOSPACE, &sock->flags);
> +
> 		if (unlikely(!(xprt = xprt_from_sock(sk))))
> 			goto out;
> -		if (unlikely(!test_and_clear_bit(SOCK_NOSPACE, &sock->flags)))
> +		if (test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sock->flags) == 0)
> 			goto out;
>
> 		xprt_write_space(xprt);


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]     ` <20080419204047.14124.64969.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 17:50       ` Chuck Lever
  2008-04-22  0:00         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 17:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> ..and always destroy using a 'soft' RPC call. Destroying GSS  
> credentials
> isn't mandatory; the server can always cope with a few credentials not
> getting destroyed in a timely fashion.

These two changes may have different side-effects, but backing out  
this patch will revert both changes.  Should these be split into  
separate patches to facilitate git bisect?

> This actually fixes a hang situation. Basically, some servers will  
> decide
> that the client is crazy if it tries to destroy an RPC context for  
> which
> they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk  
> to it
> for a while.

That seems unfriendly.  Is this server-side behavior mandated?  Why  
wouldn't the server just treat an extraneous context destruction  
request as a successful no-op?

You definitely should include an explanation of this server-side  
behavior in the documenting comment for gss_destroying_context.

> The regression therefor probably was introduced by commit
> 0df7fb74fbb709591301871a38aac7735a1d6583.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/auth_gss/auth_gss.c |    4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/ 
> auth_gss.c
> index d34f6df..55948cd 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -710,7 +710,7 @@ gss_destroying_context(struct rpc_cred *cred)
> 	struct rpc_task *task;
>
> 	if (gss_cred->gc_ctx == NULL ||
> -			gss_cred->gc_ctx->gc_proc == RPC_GSS_PROC_DESTROY)
> +	    test_and_clear_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) == 0)
> 		return 0;
>
> 	gss_cred->gc_ctx->gc_proc = RPC_GSS_PROC_DESTROY;
> @@ -720,7 +720,7 @@ gss_destroying_context(struct rpc_cred *cred)
> 	 * by the RPC call or by the put_rpccred() below */
> 	get_rpccred(cred);
>
> -	task = rpc_call_null(gss_auth->client, cred, RPC_TASK_ASYNC);
> +	task = rpc_call_null(gss_auth->client, cred, RPC_TASK_ASYNC| 
> RPC_TASK_SOFT);
> 	if (!IS_ERR(task))
> 		rpc_put_task(task);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page
       [not found]     ` <20080419204048.14124.46594.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 18:53       ` Chuck Lever
  2008-04-22  0:14         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 18:53 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> It is possible for nfs_wb_page() to sometimes exit with 0 return  
> value, yet
> the page is left in a dirty state.
> For instance in the case where the server rebooted, and the COMMIT  
> request
> failed, then all the previously "clean" pages which were cached by the
> server, but were not guaranteed to have been writted out to disk,
> have to be redirtied and resent to the server.
> The fix is to have nfs_wb_page_priority() check that the page is clean
> before it exits...

We have somewhat similar logic in nfs_wb_page_cancel and  
__nfs_write_mapping.  How is it that these two are not affected by the  
"server reboot / commit failed" scenario?  Do they simply retry until  
the resent write succeeds?

> This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
> nfs_create_request() when we're in the nfs_readpage() path.

Seems like there's more at stake here than just triggering a BUG.

In the launder_page path, dirty data could be lost if nfs_wb_page  
returns zero, but hasn't successfully flushed the data to the server,  
right?

In the nfs_flush_incompatible path, you would lose write ordering at  
the least... couldn't that potentially result in data corruption?

I don't recall the test case that triggered the BUG.  Do we have a  
test that is run often (eg. fsx or fsx-odirect) that exercises this  
path?

> Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at  
> it. It
> turns out that clear_page_dirty_for_io() has the exact same test.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/write.c |   23 ++++++++++++-----------
> 1 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ce40cad..997b42a 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1493,18 +1493,19 @@ static int nfs_wb_page_priority(struct inode  
> *inode, struct page *page,
> 	};
> 	int ret;
>
> -	BUG_ON(!PageLocked(page));
> -	if (clear_page_dirty_for_io(page)) {
> -		ret = nfs_writepage_locked(page, &wbc);
> +	do {
> +		if (clear_page_dirty_for_io(page)) {
> +			ret = nfs_writepage_locked(page, &wbc);
> +			if (ret < 0)
> +				goto out_error;
> +		} else if (!PagePrivate(page))
> +			break;
> +		ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> 		if (ret < 0)
> -			goto out;
> -	}
> -	if (!PagePrivate(page))
> -		return 0;
> -	ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> -	if (ret >= 0)
> -		return 0;
> -out:
> +			goto out_error;
> +	} while (PagePrivate(page));
> +	return 0;
> +out_error:
> 	__mark_inode_dirty(inode, I_DIRTY_PAGES);
> 	return ret;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
       [not found]     ` <20080419204048.14124.4335.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 19:55       ` Chuck Lever
  2008-04-22  0:17         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 19:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/direct.c |   10 ++++++----
> fs/nfs/read.c   |   23 +++++++++++++++--------
> fs/nfs/write.c  |   33 +++++++++++++++++++--------------
> 3 files changed, 40 insertions(+), 26 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index abf8e02..4757a2b 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -347,8 +347,9 @@ static ssize_t  
> nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
> 		NFS_PROTO(inode)->read_setup(data, &msg);
>
> 		task = rpc_run_task(&task_setup_data);
> -		if (!IS_ERR(task))
> -			rpc_put_task(task);
> +		if (IS_ERR(task))
> +			break;
> +		rpc_put_task(task);
>
> 		dprintk("NFS: %5u initiated direct read call "
> 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",

My reading of this logic suggests that if the very first  
rpc_run_task() call in the loop fails, we'll return -EFAULT instead of  
something sensible, like -ENOMEM.

> @@ -763,8 +764,9 @@ static ssize_t  
> nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
> 		NFS_PROTO(inode)->write_setup(data, &msg);
>
> 		task = rpc_run_task(&task_setup_data);
> -		if (!IS_ERR(task))
> -			rpc_put_task(task);
> +		if (IS_ERR(task))
> +			break;
> +		rpc_put_task(task);
>
> 		dprintk("NFS: %5u initiated direct write call "
> 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",

Likewise.

> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index 6f9208a..16f57e0 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -153,7 +153,7 @@ static void nfs_readpage_release(struct nfs_page  
> *req)
> /*
>  * Set up the NFS read request struct
>  */
> -static void nfs_read_rpcsetup(struct nfs_page *req, struct  
> nfs_read_data *data,
> +static int nfs_read_rpcsetup(struct nfs_page *req, struct  
> nfs_read_data *data,
> 		const struct rpc_call_ops *call_ops,
> 		unsigned int count, unsigned int offset)
> {
> @@ -202,8 +202,10 @@ static void nfs_read_rpcsetup(struct nfs_page  
> *req, struct nfs_read_data *data,
> 			(unsigned long long)data->args.offset);
>
> 	task = rpc_run_task(&task_setup_data);
> -	if (!IS_ERR(task))
> -		rpc_put_task(task);
> +	if (IS_ERR(task))
> +		return PTR_ERR(task);
> +	rpc_put_task(task);
> +	return 0;
> }
>
> static void
> @@ -240,6 +242,7 @@ static int nfs_pagein_multi(struct inode *inode,  
> struct list_head *head, unsigne
> 	size_t rsize = NFS_SERVER(inode)->rsize, nbytes;
> 	unsigned int offset;
> 	int requests = 0;
> +	int ret = 0;
> 	LIST_HEAD(list);
>
> 	nfs_list_remove_request(req);
> @@ -261,6 +264,8 @@ static int nfs_pagein_multi(struct inode *inode,  
> struct list_head *head, unsigne
> 	offset = 0;
> 	nbytes = count;
> 	do {
> +		int ret2;
> +
> 		data = list_entry(list.next, struct nfs_read_data, pages);
> 		list_del_init(&data->pages);
>
> @@ -268,13 +273,15 @@ static int nfs_pagein_multi(struct inode  
> *inode, struct list_head *head, unsigne
>
> 		if (nbytes < rsize)
> 			rsize = nbytes;
> -		nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
> +		ret2 = nfs_read_rpcsetup(req, data, &nfs_read_partial_ops,
> 				  rsize, offset);
> +		if (ret == 0)
> +			ret = ret2;
> 		offset += rsize;
> 		nbytes -= rsize;
> 	} while (nbytes != 0);
>
> -	return 0;
> +	return ret;
>
> out_bad:
> 	while (!list_empty(&list)) {
> @@ -292,6 +299,7 @@ static int nfs_pagein_one(struct inode *inode,  
> struct list_head *head, unsigned
> 	struct nfs_page		*req;
> 	struct page		**pages;
> 	struct nfs_read_data	*data;
> +	int ret = -ENOMEM;
>
> 	data = nfs_readdata_alloc(npages);
> 	if (!data)
> @@ -307,11 +315,10 @@ static int nfs_pagein_one(struct inode *inode,  
> struct list_head *head, unsigned
> 	}
> 	req = nfs_list_entry(data->pages.next);
>
> -	nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
> -	return 0;
> +	return nfs_read_rpcsetup(req, data, &nfs_read_full_ops, count, 0);
> out_bad:
> 	nfs_async_read_error(head);
> -	return -ENOMEM;
> +	return ret;
> }
>
> /*
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 31681bb..1ade11d 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -778,7 +778,7 @@ static int flush_task_priority(int how)
> /*
>  * Set up the argument/result storage required for the RPC call.
>  */
> -static void nfs_write_rpcsetup(struct nfs_page *req,
> +static int nfs_write_rpcsetup(struct nfs_page *req,
> 		struct nfs_write_data *data,
> 		const struct rpc_call_ops *call_ops,
> 		unsigned int count, unsigned int offset,
> @@ -841,8 +841,10 @@ static void nfs_write_rpcsetup(struct nfs_page  
> *req,
> 		(unsigned long long)data->args.offset);
>
> 	task = rpc_run_task(&task_setup_data);
> -	if (!IS_ERR(task))
> -		rpc_put_task(task);
> +	if (IS_ERR(task))
> +		return PTR_ERR(task);
> +	rpc_put_task(task);
> +	return 0;
> }
>
> /* If a nfs_flush_* function fails, it should remove reqs from @head  
> and
> @@ -868,6 +870,7 @@ static int nfs_flush_multi(struct inode *inode,  
> struct list_head *head, unsigned
> 	size_t wsize = NFS_SERVER(inode)->wsize, nbytes;
> 	unsigned int offset;
> 	int requests = 0;
> +	int ret = 0;
> 	LIST_HEAD(list);
>
> 	nfs_list_remove_request(req);
> @@ -889,6 +892,8 @@ static int nfs_flush_multi(struct inode *inode,  
> struct list_head *head, unsigned
> 	offset = 0;
> 	nbytes = count;
> 	do {
> +		int ret2;
> +
> 		data = list_entry(list.next, struct nfs_write_data, pages);
> 		list_del_init(&data->pages);
>
> @@ -896,13 +901,15 @@ static int nfs_flush_multi(struct inode  
> *inode, struct list_head *head, unsigned
>
> 		if (nbytes < wsize)
> 			wsize = nbytes;
> -		nfs_write_rpcsetup(req, data, &nfs_write_partial_ops,
> +		ret2 = nfs_write_rpcsetup(req, data, &nfs_write_partial_ops,
> 				   wsize, offset, how);
> +		if (ret == 0)
> +			ret = ret2;
> 		offset += wsize;
> 		nbytes -= wsize;
> 	} while (nbytes != 0);
>
> -	return 0;
> +	return ret;
>
> out_bad:
> 	while (!list_empty(&list)) {
> @@ -943,9 +950,7 @@ static int nfs_flush_one(struct inode *inode,  
> struct list_head *head, unsigned i
> 	req = nfs_list_entry(data->pages.next);
>
> 	/* Set up the argument struct */
> -	nfs_write_rpcsetup(req, data, &nfs_write_full_ops, count, 0, how);
> -
> -	return 0;
> +	return nfs_write_rpcsetup(req, data, &nfs_write_full_ops, count,  
> 0, how);
>  out_bad:
> 	while (!list_empty(head)) {
> 		req = nfs_list_entry(head->next);
> @@ -1183,7 +1188,7 @@ void nfs_commitdata_release(void *data)
> /*
>  * Set up the argument/result storage required for the RPC call.
>  */
> -static void nfs_commit_rpcsetup(struct list_head *head,
> +static int nfs_commit_rpcsetup(struct list_head *head,
> 		struct nfs_write_data *data,
> 		int how)
> {
> @@ -1232,8 +1237,10 @@ static void nfs_commit_rpcsetup(struct  
> list_head *head,
> 	dprintk("NFS: %5u initiated commit call\n", data->task.tk_pid);
>
> 	task = rpc_run_task(&task_setup_data);
> -	if (!IS_ERR(task))
> -		rpc_put_task(task);
> +	if (IS_ERR(task))
> +		return PTR_ERR(task);
> +	rpc_put_task(task);
> +	return 0;
> }
>
> /*
> @@ -1251,9 +1258,7 @@ nfs_commit_list(struct inode *inode, struct  
> list_head *head, int how)
> 		goto out_bad;
>
> 	/* Set up the argument struct */
> -	nfs_commit_rpcsetup(head, data, how);
> -
> -	return 0;
> +	return nfs_commit_rpcsetup(head, data, how);
>  out_bad:
> 	while (!list_empty(head)) {
> 		req = nfs_list_entry(head->next);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
       [not found]     ` <20080419204054.14124.59641.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 21:13       ` Chuck Lever
  2008-04-22  0:21         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 21:13 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Cyrill Gorcunov

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> From: Cyrill Gorcunov <gorcunov@gmail.com>
>
> There is possible NULL pointer dereference if kstr[n]dup failed.

The logic in super.c and client.c shouldn't assume nfs_server.hostname  
is non-NULL.  Can you say where the NULL dereference might happen?

> So fix them for safety.

Note that mount_server.hostname, and nfs_server.export_path also use  
kstrdup without a safety net.

I see that nfs_mount and nfs4_path_walk might have a problem if a  
kstrdup failed earlier.

> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> fs/nfs/super.c |   16 ++++++++++++++++
> 1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/super.c b/fs/nfs/super.c
> index 140174d..7c13ce7 100644
> --- a/fs/nfs/super.c
> +++ b/fs/nfs/super.c
> @@ -1295,6 +1295,8 @@ static int nfs_validate_mount_data(void  
> *options,
> 		args->namlen		= data->namlen;
> 		args->bsize		= data->bsize;
> 		args->auth_flavors[0]	= data->pseudoflavor;
> +		if (!args->nfs_server.hostname)
> +			goto out_nomem;
>
> 		/*
> 		 * The legacy version 6 binary mount data from userspace has a
> @@ -1341,6 +1343,8 @@ static int nfs_validate_mount_data(void  
> *options,
> 		len = c - dev_name;
> 		/* N.B. caller will free nfs_server.hostname in all cases */
> 		args->nfs_server.hostname = kstrndup(dev_name, len, GFP_KERNEL);
> +		if (!args->nfs_server.hostname)
> +			goto out_nomem;
>
> 		c++;
> 		if (strlen(c) > NFS_MAXPATHLEN)
> @@ -1384,6 +1388,10 @@ out_v3_not_compiled:
> 	return -EPROTONOSUPPORT;
> #endif /* !CONFIG_NFS_V3 */
>
> +out_nomem:
> +	dfprintk(MOUNT, "NFS: not enough memory to handle mount options\n");
> +	return -ENOMEM;
> +
> out_no_address:
> 	dfprintk(MOUNT, "NFS: mount program didn't pass remote address\n");
> 	return -EINVAL;
> @@ -1890,12 +1898,16 @@ static int nfs4_validate_mount_data(void  
> *options,
> 			return -ENAMETOOLONG;
> 		/* N.B. caller will free nfs_server.hostname in all cases */
> 		args->nfs_server.hostname = kstrndup(dev_name, len, GFP_KERNEL);
> +		if (!args->nfs_server.hostname)
> +			goto out_nomem;
>
> 		c++;			/* step over the ':' */
> 		len = strlen(c);
> 		if (len > NFS4_MAXPATHLEN)
> 			return -ENAMETOOLONG;
> 		args->nfs_server.export_path = kstrndup(c, len, GFP_KERNEL);
> +		if (!args->nfs_server.export_path)
> +			goto out_nomem;
>
> 		dprintk("NFS: MNTPATH: '%s'\n", args->nfs_server.export_path);
>
> @@ -1917,6 +1929,10 @@ out_inval_auth:
> 		 data->auth_flavourlen);
> 	return -EINVAL;
>
> +out_nomem:
> +	dfprintk(MOUNT, "NFS4: not enough memory to handle mount options 
> \n");
> +	return -ENOMEM;
> +
> out_no_address:
> 	dfprintk(MOUNT, "NFS4: mount program didn't pass remote address\n");
> 	return -EINVAL;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len
       [not found]     ` <20080419204049.14124.11174.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
@ 2008-04-21 21:19       ` Chuck Lever
  2008-04-22  0:30         ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-21 21:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> We want to ensure that req->rq_private_buf.len is updated before
> req->rq_received, so that call_decode() doesn't use an old value for
> req->rq_rcv_buf.len.
>
> In 'call_decode()' itself, instead of using task->tk_status (which  
> is set
> using req->rq_received) must use the actual value of
> req->rq_private_buf.len when deciding whether or not the received  
> RPC reply
> is too short.
>
> Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
> request. A typo meant that we were resetting req->rq_private_buf.len  
> in
> call_decode(), and then clobbering that value with the old  
> rq_rcv_buf.len
> again in xprt_transmit().

After staring at this for a while, the interaction between  
xprt_complete_rqst and call_decode isn't clear to me.

I take it there is no guarantee that the xdr_buf fields and  
rq_received are completely updated before the task is awoken and  
call_decode runs?

> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/clnt.c |   26 +++++++++++++-------------
> net/sunrpc/xprt.c |    3 ++-
> 2 files changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 0c29792..57663a4 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1195,18 +1195,6 @@ call_decode(struct rpc_task *task)
> 		task->tk_flags &= ~RPC_CALL_MAJORSEEN;
> 	}
>
> -	if (task->tk_status < 12) {
> -		if (!RPC_IS_SOFT(task)) {
> -			task->tk_action = call_bind;
> -			clnt->cl_stats->rpcretrans++;
> -			goto out_retry;
> -		}
> -		dprintk("RPC:       %s: too small RPC reply size (%d bytes)\n",
> -				clnt->cl_protname, task->tk_status);
> -		task->tk_action = call_timeout;
> -		goto out_retry;
> -	}
> -
> 	/*
> 	 * Ensure that we see all writes made by xprt_complete_rqst()
> 	 * before it changed req->rq_received.
> @@ -1218,6 +1206,18 @@ call_decode(struct rpc_task *task)
> 	WARN_ON(memcmp(&req->rq_rcv_buf, &req->rq_private_buf,
> 				sizeof(req->rq_rcv_buf)) != 0);
>
> +	if (req->rq_rcv_buf.len < 12) {
> +		if (!RPC_IS_SOFT(task)) {
> +			task->tk_action = call_bind;
> +			clnt->cl_stats->rpcretrans++;
> +			goto out_retry;
> +		}
> +		dprintk("RPC:       %s: too small RPC reply size (%d bytes)\n",
> +				clnt->cl_protname, task->tk_status);
> +		task->tk_action = call_timeout;
> +		goto out_retry;
> +	}
> +
> 	/* Verify the RPC header */
> 	p = call_verify(task);
> 	if (IS_ERR(p)) {
> @@ -1239,7 +1239,7 @@ out_retry:
> 	task->tk_status = 0;
> 	/* Note: call_verify() may have freed the RPC slot */
> 	if (task->tk_rqstp == req) {
> -		req->rq_received = req->rq_private_buf.len = 0;
> +		req->rq_received = req->rq_rcv_buf.len = 0;
> 		if (task->tk_client->cl_discrtry)
> 			xprt_force_disconnect(task->tk_xprt);
> 	}
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 3ba64f9..5110a4e 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -757,9 +757,10 @@ void xprt_complete_rqst(struct rpc_task *task,  
> int copied)
> 	task->tk_rtt = (long)jiffies - req->rq_xtime;
>
> 	list_del_init(&req->rq_list);
> +	req->rq_private_buf.len = copied;
> 	/* Ensure all writes are done before we update req->rq_received */
> 	smp_wmb();
> -	req->rq_received = req->rq_private_buf.len = copied;
> +	req->rq_received = copied;
> 	rpc_wake_up_queued_task(&xprt->pending, task);
> }
> EXPORT_SYMBOL_GPL(xprt_complete_rqst);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 02/33] SUNRPC: Fix up xprt_write_space()
  2008-04-21 17:31       ` Chuck Lever
@ 2008-04-21 23:51         ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-21 23:51 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Mon, 2008-04-21 at 13:31 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal  
> > whether
> > or not we have someone waiting for buffer memory. Convert the SUNRPC  
> > layer
> > to use the same idiom.
> 
> As near as I can tell, SOCK_NOSPACE is used for this purpose.  It  
> really isn't clear what SOCK_ASYNC_NOSPACE is used for.

See sock_wake_async(): The SOCK_ASYNC_NOSPACE flag basically turns on
and off the mechanism for notifying userland that the socket send
congestion is over.

> In fact I found at least one comment that suggested these flags  
> currently may be used inconsistently in the network layer.  Did you  
> happen to find any unambiguous documentation explaining how the  
> network layer uses these flags?  (I for one would like to understand  
> this better).
> 
> I'm a little concerned about this patch overall because the  
> SOCK_NOSPACE flags interface is well understood by only a handful of  
> people in the universe, so it's difficult for us networking outsiders  
> to evaluate this patch.

See net/ipv4/tcp_input.c:tcp_cwnd_application_limited(). The
SOCK_NOSPACE flag is used to notify the code that regulates the TCP
congestion window that the application hit the sndbuf limit. It is
normally supposed to be cleared in sk_stream_write_space() if and only
if we're below the low waterline for the sndbuf.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
  2008-04-21 17:50       ` Chuck Lever
@ 2008-04-22  0:00         ` Trond Myklebust
       [not found]           ` <1208822443.7767.23.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22  0:00 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Mon, 2008-04-21 at 13:50 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > ..and always destroy using a 'soft' RPC call. Destroying GSS  
> > credentials
> > isn't mandatory; the server can always cope with a few credentials not
> > getting destroyed in a timely fashion.
> 
> These two changes may have different side-effects, but backing out  
> this patch will revert both changes.  Should these be split into  
> separate patches to facilitate git bisect?

See the comment above. Destroying GSS sessions is _not_ mandatory. If
there are any side-effects from not destroying them, then that would be
a server bug, not a client issue.

> > This actually fixes a hang situation. Basically, some servers will  
> > decide
> > that the client is crazy if it tries to destroy an RPC context for  
> > which
> > they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk  
> > to it
> > for a while.
> 
> That seems unfriendly.  Is this server-side behavior mandated?  Why  
> wouldn't the server just treat an extraneous context destruction  
> request as a successful no-op?
> 
> You definitely should include an explanation of this server-side  
> behavior in the documenting comment for gss_destroying_context.

RFC-2203 states that servers are supposed to silently discard requests
that they don't recognise (see section 5.3.3.1 - Context Management), so
it is correct server behaviour.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page
  2008-04-21 18:53       ` Chuck Lever
@ 2008-04-22  0:14         ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22  0:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Mon, 2008-04-21 at 14:53 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > It is possible for nfs_wb_page() to sometimes exit with 0 return  
> > value, yet
> > the page is left in a dirty state.
> > For instance in the case where the server rebooted, and the COMMIT  
> > request
> > failed, then all the previously "clean" pages which were cached by the
> > server, but were not guaranteed to have been writted out to disk,
> > have to be redirtied and resent to the server.
> > The fix is to have nfs_wb_page_priority() check that the page is clean
> > before it exits...
> 
> We have somewhat similar logic in nfs_wb_page_cancel and  
> __nfs_write_mapping.  How is it that these two are not affected by the  
> "server reboot / commit failed" scenario?  Do they simply retry until  
> the resent write succeeds?

Who said they're not affected?

The difference is that nfs_wb_page_cancel() will attempt to clear all
requests that are not committed instead of redirtying them.

OTOH, nfs_write_mapping() can and will leave dirty pages under certain
circumstances.

> > This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
> > nfs_create_request() when we're in the nfs_readpage() path.
> 
> Seems like there's more at stake here than just triggering a BUG.
> 
> In the launder_page path, dirty data could be lost if nfs_wb_page  
> returns zero, but hasn't successfully flushed the data to the server,  
> right?
> 
> In the nfs_flush_incompatible path, you would lose write ordering at  
> the least... couldn't that potentially result in data corruption?

Yes.

> I don't recall the test case that triggered the BUG.  Do we have a  
> test that is run often (eg. fsx or fsx-odirect) that exercises this  
> path?

You would have to either cause the RPC call to fail due to something
like an ENOMEM error, or you would have to have a server reboot at the
'right' moment.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
  2008-04-21 19:55       ` Chuck Lever
@ 2008-04-22  0:17         ` Trond Myklebust
       [not found]           ` <1208823472.7767.40.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22  0:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Mon, 2008-04-21 at 15:55 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > ---
> >
> > fs/nfs/direct.c |   10 ++++++----
> > fs/nfs/read.c   |   23 +++++++++++++++--------
> > fs/nfs/write.c  |   33 +++++++++++++++++++--------------
> > 3 files changed, 40 insertions(+), 26 deletions(-)
> >
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index abf8e02..4757a2b 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -347,8 +347,9 @@ static ssize_t  
> > nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
> > 		NFS_PROTO(inode)->read_setup(data, &msg);
> >
> > 		task = rpc_run_task(&task_setup_data);
> > -		if (!IS_ERR(task))
> > -			rpc_put_task(task);
> > +		if (IS_ERR(task))
> > +			break;
> > +		rpc_put_task(task);
> >
> > 		dprintk("NFS: %5u initiated direct read call "
> > 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
> 
> My reading of this logic suggests that if the very first  
> rpc_run_task() call in the loop fails, we'll return -EFAULT instead of  
> something sensible, like -ENOMEM.

I'm confused. How could ENOMEM be a sensible substitute for EFAULT?


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
  2008-04-21 21:13       ` Chuck Lever
@ 2008-04-22  0:21         ` Trond Myklebust
       [not found]           ` <1208823685.7767.43.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22  0:21 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Cyrill Gorcunov


On Mon, 2008-04-21 at 17:13 -0400, Chuck Lever wrote:
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > From: Cyrill Gorcunov <gorcunov@gmail.com>
> >
> > There is possible NULL pointer dereference if kstr[n]dup failed.
> 
> The logic in super.c and client.c shouldn't assume nfs_server.hostname  
> is non-NULL.  Can you say where the NULL dereference might happen?

Sure it does. See for instance all those dereferences of
nfs_client->cl_hostname. It has never been acceptable to set a null
hostname.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len
  2008-04-21 21:19       ` Chuck Lever
@ 2008-04-22  0:30         ` Trond Myklebust
       [not found]           ` <1208824201.7767.53.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22  0:30 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Mon, 2008-04-21 at 17:19 -0400, Chuck Lever wrote:
> Hi Trond-
> 
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > We want to ensure that req->rq_private_buf.len is updated before
> > req->rq_received, so that call_decode() doesn't use an old value for
> > req->rq_rcv_buf.len.
> >
> > In 'call_decode()' itself, instead of using task->tk_status (which  
> > is set
> > using req->rq_received) must use the actual value of
> > req->rq_private_buf.len when deciding whether or not the received  
> > RPC reply
> > is too short.
> >
> > Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
> > request. A typo meant that we were resetting req->rq_private_buf.len  
> > in
> > call_decode(), and then clobbering that value with the old  
> > rq_rcv_buf.len
> > again in xprt_transmit().
> 
> After staring at this for a while, the interaction between  
> xprt_complete_rqst and call_decode isn't clear to me.
> 
> I take it there is no guarantee that the xdr_buf fields and  
> rq_received are completely updated before the task is awoken and  
> call_decode runs?

The call could complete just as the RPC call is being woken up due to a
timeout. In any case, we need to ensure that the ordering of the update
is correct. We need to know that if a processor sees req->rq_received as
being non-zero, then the same processor will see req->rq_private_buf.len
as being updated: on something like an alpha processor or a PPC, we need
to use explicit read and write barriers to ensure this.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]           ` <1208822443.7767.23.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 13:38             ` Chuck Lever
  2008-04-22 15:06               ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 13:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Apr 21, 2008, at 8:00 PM, Trond Myklebust wrote:
> On Mon, 2008-04-21 at 13:50 -0400, Chuck Lever wrote:
>>> This actually fixes a hang situation. Basically, some servers will
>>> decide
>>> that the client is crazy if it tries to destroy an RPC context for
>>> which
>>> they have sent an RPCSEC_GSS_CREDPROBLEM, and so will refuse to talk
>>> to it
>>> for a while.
>>
>> That seems unfriendly.  Is this server-side behavior mandated?  Why
>> wouldn't the server just treat an extraneous context destruction
>> request as a successful no-op?
>>
>> You definitely should include an explanation of this server-side
>> behavior in the documenting comment for gss_destroying_context.
>
> RFC-2203 states that servers are supposed to silently discard requests
> that they don't recognise (see section 5.3.3.1 - Context  
> Management), so
> it is correct server behaviour.


Dropping the request to destroy a context is fine.  Temporarily  
fencing the client is what I was concerned about.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
       [not found]           ` <1208823472.7767.40.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 14:19             ` Chuck Lever
  2008-04-22 15:10               ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 14:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Apr 21, 2008, at 8:17 PM, Trond Myklebust wrote:
> On Mon, 2008-04-21 at 15:55 -0400, Chuck Lever wrote:
>> Hi Trond-
>>
>> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> ---
>>>
>>> fs/nfs/direct.c |   10 ++++++----
>>> fs/nfs/read.c   |   23 +++++++++++++++--------
>>> fs/nfs/write.c  |   33 +++++++++++++++++++--------------
>>> 3 files changed, 40 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>>> index abf8e02..4757a2b 100644
>>> --- a/fs/nfs/direct.c
>>> +++ b/fs/nfs/direct.c
>>> @@ -347,8 +347,9 @@ static ssize_t
>>> nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
>>> 		NFS_PROTO(inode)->read_setup(data, &msg);
>>>
>>> 		task = rpc_run_task(&task_setup_data);
>>> -		if (!IS_ERR(task))
>>> -			rpc_put_task(task);
>>> +		if (IS_ERR(task))
>>> +			break;
>>> +		rpc_put_task(task);
>>>
>>> 		dprintk("NFS: %5u initiated direct read call "
>>> 			"(req %s/%Ld, %zu bytes @ offset %Lu)\n",
>>
>> My reading of this logic suggests that if the very first
>> rpc_run_task() call in the loop fails, we'll return -EFAULT instead  
>> of
>> something sensible, like -ENOMEM.
>
> I'm confused. How could ENOMEM be a sensible substitute for EFAULT?


My point is that EFAULT isn't an appropriate return code if  
rpc_run_task() fails.  EFAULT is appropriate only when the I/O engine  
can't access the user's memory.

If no bytes were read, an error code that reflects the problem should  
be returned.

	task = rpc_run_task(&task_setup_data);
	if (IS_ERR(task) {
		result = PTR_ERR(task);
		break;
	}
	rpc_put_task(task);

The logic at the end of nfs_direct_read_schedule_segment() will take  
care to return the number of bytes read, or an error code if no bytes  
were read.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
       [not found]           ` <1208823685.7767.43.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 14:21             ` Chuck Lever
  2008-04-22 15:12               ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 14:21 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Cyrill Gorcunov

On Apr 21, 2008, at 8:21 PM, Trond Myklebust wrote:
> On Mon, 2008-04-21 at 17:13 -0400, Chuck Lever wrote:
>> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
>>> From: Cyrill Gorcunov <gorcunov@gmail.com>
>>>
>>> There is possible NULL pointer dereference if kstr[n]dup failed.
>>
>> The logic in super.c and client.c shouldn't assume  
>> nfs_server.hostname
>> is non-NULL.  Can you say where the NULL dereference might happen?
>
> Sure it does. See for instance all those dereferences of
> nfs_client->cl_hostname. It has never been acceptable to set a null
> hostname.

Whether or not cl_hostname is dereferenced, you need to fix  
export_path too.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len
       [not found]           ` <1208824201.7767.53.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 15:04             ` Chuck Lever
  2008-04-22 15:24               ` Trond Myklebust
  0 siblings, 1 reply; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 15:04 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Apr 21, 2008, at 8:30 PM, Trond Myklebust wrote:
> On Mon, 2008-04-21 at 17:19 -0400, Chuck Lever wrote:
>> Hi Trond-
>>
>> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
>>> We want to ensure that req->rq_private_buf.len is updated before
>>> req->rq_received, so that call_decode() doesn't use an old value for
>>> req->rq_rcv_buf.len.
>>>
>>> In 'call_decode()' itself, instead of using task->tk_status (which
>>> is set
>>> using req->rq_received) must use the actual value of
>>> req->rq_private_buf.len when deciding whether or not the received
>>> RPC reply
>>> is too short.
>>>
>>> Finally ensure that we set req->rq_rcv_buf.len to zero when  
>>> retrying a
>>> request. A typo meant that we were resetting req->rq_private_buf.len
>>> in
>>> call_decode(), and then clobbering that value with the old
>>> rq_rcv_buf.len
>>> again in xprt_transmit().
>>
>> After staring at this for a while, the interaction between
>> xprt_complete_rqst and call_decode isn't clear to me.
>>
>> I take it there is no guarantee that the xdr_buf fields and
>> rq_received are completely updated before the task is awoken and
>> call_decode runs?
>
> The call could complete just as the RPC call is being woken up due  
> to a
> timeout.

I assume we also have a problem with concurrently processing  
retransmitted replies to the same RPC request.

These races would be nice to document, or even prevent.

You could add a bit flag to, say, the rpc_rqst structure that signals  
that a reply is already being processed.  Clear the bit in  
xprt_prepare_transmit; then a test_and_set_bit in xprt_lookup_rqst()  
and xprt_timer() would allow only one thread of execution to access  
each rpc_rqst during reply processing.

> In any case, we need to ensure that the ordering of the update
> is correct. We need to know that if a processor sees req- 
> >rq_received as
> being non-zero, then the same processor will see req- 
> >rq_private_buf.len
> as being updated: on something like an alpha processor or a PPC, we  
> need
> to use explicit read and write barriers to ensure this.

Understood.

The problem I'm underscoring here is that we have three largish  
functions -- call_status, call_decode, and call_verify -- each of  
which access these fields.  There is no clear documentation that  
describes the data dependencies between the soft IRQ callbacks and  
these functions (just a couple of one sentence comments that describe  
what is done but not why).

If nothing else, this patch should improve the documentation of the  
ordering requirements in addition to addressing the problems you found.

Btw:

+	if (req->rq_rcv_buf.len < 12) {

Is there an appropriate symbolic constant you can use here instead of  
the naked 12?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
  2008-04-22 13:38             ` Chuck Lever
@ 2008-04-22 15:06               ` Trond Myklebust
       [not found]                 ` <1208876800.11982.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22 15:06 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
> > RFC-2203 states that servers are supposed to silently discard requests
> > that they don't recognise (see section 5.3.3.1 - Context  
> > Management), so
> > it is correct server behaviour.
> 
> 
> Dropping the request to destroy a context is fine.  Temporarily  
> fencing the client is what I was concerned about.

I'd agree that is somewhat drastic, and have passed the information on
to the server vendor, however that doesn't change the fact that we have
a client bug too: we should not be using expired creds.

The client side performance problem was compounded by the fact that the
RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
that we impose the NFSv4 rule that we need to drop the connection before
resending a request.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
  2008-04-22 14:19             ` Chuck Lever
@ 2008-04-22 15:10               ` Trond Myklebust
       [not found]                 ` <1208877058.11982.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22 15:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Tue, 2008-04-22 at 10:19 -0400, Chuck Lever wrote:

> 
> My point is that EFAULT isn't an appropriate return code if  
> rpc_run_task() fails.  EFAULT is appropriate only when the I/O engine  
> can't access the user's memory.
> 
> If no bytes were read, an error code that reflects the problem should  
> be returned.
> 
> 	task = rpc_run_task(&task_setup_data);
> 	if (IS_ERR(task) {
> 		result = PTR_ERR(task);
> 		break;
> 	}
> 	rpc_put_task(task);
> 
> The logic at the end of nfs_direct_read_schedule_segment() will take  
> care to return the number of bytes read, or an error code if no bytes  
> were read.

Then a fix should be applied to auth_gss.c: the EFAULT error message is
coming from the downcall code. If it is not appropriate to pass that on
to the RPC caller, then we should substitute an EACCES.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
  2008-04-22 14:21             ` Chuck Lever
@ 2008-04-22 15:12               ` Trond Myklebust
       [not found]                 ` <1208877133.11982.13.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22 15:12 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Cyrill Gorcunov


On Tue, 2008-04-22 at 10:21 -0400, Chuck Lever wrote:
> On Apr 21, 2008, at 8:21 PM, Trond Myklebust wrote:
> > On Mon, 2008-04-21 at 17:13 -0400, Chuck Lever wrote:
> >> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> >>> From: Cyrill Gorcunov <gorcunov@gmail.com>
> >>>
> >>> There is possible NULL pointer dereference if kstr[n]dup failed.
> >>
> >> The logic in super.c and client.c shouldn't assume  
> >> nfs_server.hostname
> >> is non-NULL.  Can you say where the NULL dereference might happen?
> >
> > Sure it does. See for instance all those dereferences of
> > nfs_client->cl_hostname. It has never been acceptable to set a null
> > hostname.
> 
> Whether or not cl_hostname is dereferenced, you need to fix  
> export_path too.

<confused>Which the patch does. What is your point?</confused>

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 31/33] NFS - fix potential NULL pointer dereference v2
       [not found]                 ` <1208877133.11982.13.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 15:22                   ` Chuck Lever
  0 siblings, 0 replies; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 15:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Cyrill Gorcunov

On Apr 22, 2008, at 11:12 AM, Trond Myklebust wrote:
> On Tue, 2008-04-22 at 10:21 -0400, Chuck Lever wrote:
>> On Apr 21, 2008, at 8:21 PM, Trond Myklebust wrote:
>>> On Mon, 2008-04-21 at 17:13 -0400, Chuck Lever wrote:
>>>> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
>>>>> From: Cyrill Gorcunov <gorcunov@gmail.com>
>>>>>
>>>>> There is possible NULL pointer dereference if kstr[n]dup failed.
>>>>
>>>> The logic in super.c and client.c shouldn't assume
>>>> nfs_server.hostname
>>>> is non-NULL.  Can you say where the NULL dereference might happen?
>>>
>>> Sure it does. See for instance all those dereferences of
>>> nfs_client->cl_hostname. It has never been acceptable to set a null
>>> hostname.
>>
>> Whether or not cl_hostname is dereferenced, you need to fix
>> export_path too.
>
> <confused>Which the patch does. What is your point?</confused>


So it does.  I missed that.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len
  2008-04-22 15:04             ` Chuck Lever
@ 2008-04-22 15:24               ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2008-04-22 15:24 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Tue, 2008-04-22 at 11:04 -0400, Chuck Lever wrote:
> On Apr 21, 2008, at 8:30 PM, Trond Myklebust wrote:
> > On Mon, 2008-04-21 at 17:19 -0400, Chuck Lever wrote:
> >> Hi Trond-
> >>
> >> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> >>> We want to ensure that req->rq_private_buf.len is updated before
> >>> req->rq_received, so that call_decode() doesn't use an old value for
> >>> req->rq_rcv_buf.len.
> >>>
> >>> In 'call_decode()' itself, instead of using task->tk_status (which
> >>> is set
> >>> using req->rq_received) must use the actual value of
> >>> req->rq_private_buf.len when deciding whether or not the received
> >>> RPC reply
> >>> is too short.
> >>>
> >>> Finally ensure that we set req->rq_rcv_buf.len to zero when  
> >>> retrying a
> >>> request. A typo meant that we were resetting req->rq_private_buf.len
> >>> in
> >>> call_decode(), and then clobbering that value with the old
> >>> rq_rcv_buf.len
> >>> again in xprt_transmit().
> >>
> >> After staring at this for a while, the interaction between
> >> xprt_complete_rqst and call_decode isn't clear to me.
> >>
> >> I take it there is no guarantee that the xdr_buf fields and
> >> rq_received are completely updated before the task is awoken and
> >> call_decode runs?
> >
> > The call could complete just as the RPC call is being woken up due  
> > to a
> > timeout.
> 
> I assume we also have a problem with concurrently processing  
> retransmitted replies to the same RPC request.

Not as far as I know. Retransmitting a request should only change the
contents of the send buffer; it should never change the contents of the
receive buffer.

> These races would be nice to document, or even prevent.
> 
> You could add a bit flag to, say, the rpc_rqst structure that signals  
> that a reply is already being processed.  Clear the bit in  
> xprt_prepare_transmit; then a test_and_set_bit in xprt_lookup_rqst()  
> and xprt_timer() would allow only one thread of execution to access  
> each rpc_rqst during reply processing.

We've already got the information we need in order to figure out if the
RPC call has completed or not. What new information would this extra bit
give us?

> > In any case, we need to ensure that the ordering of the update
> > is correct. We need to know that if a processor sees req- 
> > >rq_received as
> > being non-zero, then the same processor will see req- 
> > >rq_private_buf.len
> > as being updated: on something like an alpha processor or a PPC, we  
> > need
> > to use explicit read and write barriers to ensure this.
> 
> Understood.
> 
> The problem I'm underscoring here is that we have three largish  
> functions -- call_status, call_decode, and call_verify -- each of  
> which access these fields.  There is no clear documentation that  
> describes the data dependencies between the soft IRQ callbacks and  
> these functions (just a couple of one sentence comments that describe  
> what is done but not why).

There are no special data dependencies between soft IRQ callbacks and
call_verify or call_status.

The only data dependency with call_decode is the ordering requirement
that is dealt with by the write barrier in xprt_complete_rqst and the
read barrier in call_decode to ensure that the received data is visible
to the processor before it can see the req->rq_received update.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller
       [not found]                 ` <1208877058.11982.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-22 15:32                   ` Chuck Lever
  0 siblings, 0 replies; 62+ messages in thread
From: Chuck Lever @ 2008-04-22 15:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Apr 22, 2008, at 11:10 AM, Trond Myklebust wrote:
> On Tue, 2008-04-22 at 10:19 -0400, Chuck Lever wrote:
>> My point is that EFAULT isn't an appropriate return code if
>> rpc_run_task() fails.  EFAULT is appropriate only when the I/O engine
>> can't access the user's memory.
>>
>> If no bytes were read, an error code that reflects the problem should
>> be returned.
>>
>> 	task = rpc_run_task(&task_setup_data);
>> 	if (IS_ERR(task) {
>> 		result = PTR_ERR(task);

		^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You need to add this line to your patch.

>>
>> 		break;
>> 	}
>> 	rpc_put_task(task);
>>
>> The logic at the end of nfs_direct_read_schedule_segment() will take
>> care to return the number of bytes read, or an error code if no bytes
>> were read.
>
> Then a fix should be applied to auth_gss.c: the EFAULT error message  
> is
> coming from the downcall code. If it is not appropriate to pass that  
> on
> to the RPC caller, then we should substitute an EACCES.

The way I read your patch, PTR_ERR(task) is never returned to  
nfs_direct_read_schedule_segment's caller, so it's not possible for  
the downcall error code to be propagated upwards.

Instead, if rpc_run_task() fails the _first_ time through the loop  
(before any bytes have been started), the loop will exit, but "result"  
will already be set to the number of pages grabbed by get_user_pages.

In other words, if no bytes are in flight, "result" will be equal to  
or greater than zero, and the logic at the end of the function will  
return EFAULT.

Am I wrong about that?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]                 ` <1208876800.11982.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-23 14:20                   ` Trond Myklebust
       [not found]                     ` <1208960443.7459.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-23 14:20 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs


On Tue, 2008-04-22 at 11:11 -0400, Trond Myklebust wrote:
> On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
> > > RFC-2203 states that servers are supposed to silently discard requests
> > > that they don't recognise (see section 5.3.3.1 - Context  
> > > Management), so
> > > it is correct server behaviour.
> > 
> > 
> > Dropping the request to destroy a context is fine.  Temporarily  
> > fencing the client is what I was concerned about.
> 
> I'd agree that is somewhat drastic, and have passed the information on
> to the server vendor, however that doesn't change the fact that we have
> a client bug too: we should not be using expired creds.
> 
> The client side performance problem was compounded by the fact that the
> RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
> that we impose the NFSv4 rule that we need to drop the connection before
> resending a request.

Having thought a bit more about the consequences of this RFC, I think we
also need to drop the credential on (major) timeouts, since we need to
assume that the timeout may be due to the credential being out of
sequence.

---------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Tue, 22 Apr 2008 16:47:55 -0400
SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request

RFC 2203 requires the server to drop the request if it believes the
RPCSEC_GSS context is out of sequence. The problem is that we have no way
on the client to know why the server dropped the request. In order to avoid
spinning forever trying to resend the request, the safe approach is
therefore to always invalidate the RPCSEC_GSS context on every major
timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/clnt.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2969e84..eb813e9 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1169,6 +1169,11 @@ call_timeout(struct rpc_task *task)
 			clnt->cl_protname, clnt->cl_server);
 	}
 	rpc_force_rebind(clnt);
+	/*
+	 * Did our request time out due to an RPCSEC_GSS out-of-sequence
+	 * event? RFC2203 requires the server to drop all such requests.
+	 */
+	rpcauth_invalcred(task);
 
 retry:
 	clnt->cl_stats->rpcretrans++;


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]                     ` <1208960443.7459.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-23 17:40                       ` Chuck Lever
  2008-04-23 18:19                       ` J. Bruce Fields
  1 sibling, 0 replies; 62+ messages in thread
From: Chuck Lever @ 2008-04-23 17:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Apr 23, 2008, at 10:20 AM, Trond Myklebust wrote:
> On Tue, 2008-04-22 at 11:11 -0400, Trond Myklebust wrote:
>> On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
>>>> RFC-2203 states that servers are supposed to silently discard  
>>>> requests
>>>> that they don't recognise (see section 5.3.3.1 - Context
>>>> Management), so
>>>> it is correct server behaviour.
>>>
>>>
>>> Dropping the request to destroy a context is fine.  Temporarily
>>> fencing the client is what I was concerned about.
>>
>> I'd agree that is somewhat drastic, and have passed the information  
>> on
>> to the server vendor, however that doesn't change the fact that we  
>> have
>> a client bug too: we should not be using expired creds.
>>
>> The client side performance problem was compounded by the fact that  
>> the
>> RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
>> that we impose the NFSv4 rule that we need to drop the connection  
>> before
>> resending a request.
>
> Having thought a bit more about the consequences of this RFC, I  
> think we
> also need to drop the credential on (major) timeouts, since we need to
> assume that the timeout may be due to the credential being out of
> sequence.

I'm not an expert on this, but so we're on the same page, are you  
looking at RFC 2203 Section 5.3.3, or Section 7.2?

5.3.3.1 seems to suggest that clients will typically bump the sequence  
number and retry after the server drops a request.  In other words, it  
doesn't expect there to be much more to timeout recovery than that.

I wonder about the impact of frequent credential invalidation for  
datagram transports, where major timeouts are not so rare.

> ---------------------------------------------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Tue, 22 Apr 2008 16:47:55 -0400
> SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the  
> request
>
> RFC 2203 requires the server to drop the request if it believes the
> RPCSEC_GSS context is out of sequence. The problem is that we have  
> no way
> on the client to know why the server dropped the request. In order  
> to avoid
> spinning forever trying to resend the request, the safe approach is
> therefore to always invalidate the RPCSEC_GSS context on every major
> timeout.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
>
> net/sunrpc/clnt.c |    5 +++++
> 1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 2969e84..eb813e9 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1169,6 +1169,11 @@ call_timeout(struct rpc_task *task)
> 			clnt->cl_protname, clnt->cl_server);
> 	}
> 	rpc_force_rebind(clnt);
> +	/*
> +	 * Did our request time out due to an RPCSEC_GSS out-of-sequence
> +	 * event? RFC2203 requires the server to drop all such requests.
> +	 */
> +	rpcauth_invalcred(task);
>
> retry:
> 	clnt->cl_stats->rpcretrans++;
>
>
> -- 
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]                     ` <1208960443.7459.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  2008-04-23 17:40                       ` Chuck Lever
@ 2008-04-23 18:19                       ` J. Bruce Fields
  2008-04-24 17:48                         ` Trond Myklebust
  1 sibling, 1 reply; 62+ messages in thread
From: J. Bruce Fields @ 2008-04-23 18:19 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, linux-nfs

On Wed, Apr 23, 2008 at 10:20:43AM -0400, Trond Myklebust wrote:
> 
> On Tue, 2008-04-22 at 11:11 -0400, Trond Myklebust wrote:
> > On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
> > > > RFC-2203 states that servers are supposed to silently discard requests
> > > > that they don't recognise (see section 5.3.3.1 - Context  
> > > > Management), so
> > > > it is correct server behaviour.
> > > 
> > > 
> > > Dropping the request to destroy a context is fine.  Temporarily  
> > > fencing the client is what I was concerned about.
> > 
> > I'd agree that is somewhat drastic, and have passed the information on
> > to the server vendor, however that doesn't change the fact that we have
> > a client bug too: we should not be using expired creds.
> > 
> > The client side performance problem was compounded by the fact that the
> > RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
> > that we impose the NFSv4 rule that we need to drop the connection before
> > resending a request.
> 
> Having thought a bit more about the consequences of this RFC, I think we
> also need to drop the credential on (major) timeouts, since we need to
> assume that the timeout may be due to the credential being out of
> sequence.

I'm a little confused.  Each resend is done with a new gss sequence
number.

--b.

> 
> ---------------------------------------------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Tue, 22 Apr 2008 16:47:55 -0400
> SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
> 
> RFC 2203 requires the server to drop the request if it believes the
> RPCSEC_GSS context is out of sequence. The problem is that we have no way
> on the client to know why the server dropped the request. In order to avoid
> spinning forever trying to resend the request, the safe approach is
> therefore to always invalidate the RPCSEC_GSS context on every major
> timeout.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> 
>  net/sunrpc/clnt.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 2969e84..eb813e9 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1169,6 +1169,11 @@ call_timeout(struct rpc_task *task)
>  			clnt->cl_protname, clnt->cl_server);
>  	}
>  	rpc_force_rebind(clnt);
> +	/*
> +	 * Did our request time out due to an RPCSEC_GSS out-of-sequence
> +	 * event? RFC2203 requires the server to drop all such requests.
> +	 */
> +	rpcauth_invalcred(task);
>  
>  retry:
>  	clnt->cl_stats->rpcretrans++;
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
  2008-04-23 18:19                       ` J. Bruce Fields
@ 2008-04-24 17:48                         ` Trond Myklebust
       [not found]                           ` <1209059309.7619.2.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2008-04-24 17:48 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs


On Wed, 2008-04-23 at 14:19 -0400, J. Bruce Fields wrote:
> On Wed, Apr 23, 2008 at 10:20:43AM -0400, Trond Myklebust wrote:
> > 
> > On Tue, 2008-04-22 at 11:11 -0400, Trond Myklebust wrote:
> > > On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
> > > > > RFC-2203 states that servers are supposed to silently discard requests
> > > > > that they don't recognise (see section 5.3.3.1 - Context  
> > > > > Management), so
> > > > > it is correct server behaviour.
> > > > 
> > > > 
> > > > Dropping the request to destroy a context is fine.  Temporarily  
> > > > fencing the client is what I was concerned about.
> > > 
> > > I'd agree that is somewhat drastic, and have passed the information on
> > > to the server vendor, however that doesn't change the fact that we have
> > > a client bug too: we should not be using expired creds.
> > > 
> > > The client side performance problem was compounded by the fact that the
> > > RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
> > > that we impose the NFSv4 rule that we need to drop the connection before
> > > resending a request.
> > 
> > Having thought a bit more about the consequences of this RFC, I think we
> > also need to drop the credential on (major) timeouts, since we need to
> > assume that the timeout may be due to the credential being out of
> > sequence.
> 
> I'm a little confused.  Each resend is done with a new gss sequence
> number.

The point is that if the _server_ gets confused, then it may not tell us
that our context is invalid: it will just start dropping all the
requests that we send it.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials..
       [not found]                           ` <1209059309.7619.2.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2008-04-24 17:55                             ` J. Bruce Fields
  0 siblings, 0 replies; 62+ messages in thread
From: J. Bruce Fields @ 2008-04-24 17:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Chuck Lever, linux-nfs

On Thu, Apr 24, 2008 at 01:48:29PM -0400, Trond Myklebust wrote:
> 
> On Wed, 2008-04-23 at 14:19 -0400, J. Bruce Fields wrote:
> > On Wed, Apr 23, 2008 at 10:20:43AM -0400, Trond Myklebust wrote:
> > > 
> > > On Tue, 2008-04-22 at 11:11 -0400, Trond Myklebust wrote:
> > > > On Tue, 2008-04-22 at 09:38 -0400, Chuck Lever wrote:
> > > > > > RFC-2203 states that servers are supposed to silently discard requests
> > > > > > that they don't recognise (see section 5.3.3.1 - Context  
> > > > > > Management), so
> > > > > > it is correct server behaviour.
> > > > > 
> > > > > 
> > > > > Dropping the request to destroy a context is fine.  Temporarily  
> > > > > fencing the client is what I was concerned about.
> > > > 
> > > > I'd agree that is somewhat drastic, and have passed the information on
> > > > to the server vendor, however that doesn't change the fact that we have
> > > > a client bug too: we should not be using expired creds.
> > > > 
> > > > The client side performance problem was compounded by the fact that the
> > > > RPCSEC_GSS destruction call was sent as a hard RPC call, and the fact
> > > > that we impose the NFSv4 rule that we need to drop the connection before
> > > > resending a request.
> > > 
> > > Having thought a bit more about the consequences of this RFC, I think we
> > > also need to drop the credential on (major) timeouts, since we need to
> > > assume that the timeout may be due to the credential being out of
> > > sequence.
> > 
> > I'm a little confused.  Each resend is done with a new gss sequence
> > number.
> 
> The point is that if the _server_ gets confused, then it may not tell us
> that our context is invalid: it will just start dropping all the
> requests that we send it.

So the server miscalculates and thinks the next sequence number should
be millions higher than what we think it should be, for example?  OK,
sure.

--b.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2008-04-24 17:56 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-19 20:40 [PATCH 00/33] More NFS and SUNRPC client side patches Trond Myklebust
     [not found] ` <20080419204047.14124.49490.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-19 20:40   ` [PATCH 02/33] SUNRPC: Fix up xprt_write_space() Trond Myklebust
     [not found]     ` <20080419204047.14124.5947.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 17:31       ` Chuck Lever
2008-04-21 23:51         ` Trond Myklebust
2008-04-19 20:40   ` [PATCH 01/33] SUNRPC: Fix a bug in call_decode() Trond Myklebust
     [not found]     ` <20080419204047.14124.76946.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 16:22       ` Chuck Lever
2008-04-19 20:40   ` [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page Trond Myklebust
     [not found]     ` <20080419204048.14124.46594.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 18:53       ` Chuck Lever
2008-04-22  0:14         ` Trond Myklebust
2008-04-19 20:40   ` [PATCH 05/33] NFS: Ensure that the read code cleans up properly when rpc_run_task() fails Trond Myklebust
2008-04-19 20:40   ` [PATCH 06/33] NFS: Ensure that the write " Trond Myklebust
2008-04-19 20:40   ` [PATCH 03/33] SUNRPC: Don't attempt to destroy expired RPCSEC_GSS credentials Trond Myklebust
     [not found]     ` <20080419204047.14124.64969.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 17:50       ` Chuck Lever
2008-04-22  0:00         ` Trond Myklebust
     [not found]           ` <1208822443.7767.23.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 13:38             ` Chuck Lever
2008-04-22 15:06               ` Trond Myklebust
     [not found]                 ` <1208876800.11982.6.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-23 14:20                   ` Trond Myklebust
     [not found]                     ` <1208960443.7459.9.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-23 17:40                       ` Chuck Lever
2008-04-23 18:19                       ` J. Bruce Fields
2008-04-24 17:48                         ` Trond Myklebust
     [not found]                           ` <1209059309.7619.2.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-24 17:55                             ` J. Bruce Fields
2008-04-19 20:40   ` [PATCH 08/33] NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired() Trond Myklebust
2008-04-19 20:40   ` [PATCH 10/33] SUNRPC: Fix read ordering problems with req->rq_private_buf.len Trond Myklebust
     [not found]     ` <20080419204049.14124.11174.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 21:19       ` Chuck Lever
2008-04-22  0:30         ` Trond Myklebust
     [not found]           ` <1208824201.7767.53.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 15:04             ` Chuck Lever
2008-04-22 15:24               ` Trond Myklebust
2008-04-19 20:40   ` [PATCH 09/33] NFSv4: Only increment the sequence id if the server saw it Trond Myklebust
2008-04-19 20:40   ` [PATCH 07/33] NFS: Ensure that rpc_run_task() errors are propagated back to the caller Trond Myklebust
     [not found]     ` <20080419204048.14124.4335.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 19:55       ` Chuck Lever
2008-04-22  0:17         ` Trond Myklebust
     [not found]           ` <1208823472.7767.40.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 14:19             ` Chuck Lever
2008-04-22 15:10               ` Trond Myklebust
     [not found]                 ` <1208877058.11982.11.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 15:32                   ` Chuck Lever
2008-04-19 20:40   ` [PATCH 11/33] NLM/lockd: Ensure we don't corrupt fl->fl_flags in nlmclnt_unlock() Trond Myklebust
2008-04-19 20:40   ` [PATCH 13/33] NLM/lockd: Add a reference counter to struct nlm_rqst Trond Myklebust
2008-04-19 20:40   ` [PATCH 14/33] NLM/lockd: convert __nlm_async_call to use rpc_run_task() Trond Myklebust
2008-04-19 20:40   ` [PATCH 12/33] NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck Trond Myklebust
2008-04-19 20:40   ` [PATCH 16/33] NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call Trond Myklebust
2008-04-19 20:40   ` [PATCH 15/33] NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel Trond Myklebust
2008-04-19 20:40   ` [PATCH 18/33] NFS: Remove the buggy lock-if-signalled case from do_setlk() Trond Myklebust
2008-04-19 20:40   ` [PATCH 17/33] NLM/lockd: Fix a race when cancelling a blocking lock Trond Myklebust
2008-04-19 20:40   ` [PATCH 22/33] NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid() Trond Myklebust
2008-04-19 20:40   ` [PATCH 21/33] nfs: fix printout of multiword bitfields Trond Myklebust
2008-04-19 20:40   ` [PATCH 19/33] NLM/lockd: Ensure client locking calls use correct credentials Trond Myklebust
2008-04-19 20:40   ` [PATCH 20/33] nfs: return negative error value from nfs{, 4}_stat_to_errno Trond Myklebust
2008-04-19 20:40   ` [PATCH 25/33] SUNRPC: Protect creds against early garbage collection Trond Myklebust
2008-04-19 20:40   ` [PATCH 26/33] SUNRPC: remove XS_SENDMSG_RETRY Trond Myklebust
2008-04-19 20:40   ` [PATCH 23/33] NFSv4: Reintroduce machine creds Trond Myklebust
2008-04-19 20:40   ` [PATCH 24/33] NFSv4: Attempt to use machine credentials in SETCLIENTID calls Trond Myklebust
2008-04-19 20:40   ` [PATCH 28/33] SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests Trond Myklebust
2008-04-19 20:40   ` [PATCH 29/33] SUNRPC: Fix a race in gss_refresh_upcall() Trond Myklebust
2008-04-19 20:40   ` [PATCH 30/33] SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use Trond Myklebust
2008-04-19 20:40   ` [PATCH 27/33] SUNRPC: Remove the unused export of xprt_force_disconnect Trond Myklebust
2008-04-19 20:40   ` [PATCH 32/33] NFS: remove duplicate flags assignment from nfs_validate_mount_data Trond Myklebust
2008-04-19 20:40   ` [PATCH 33/33] make nfs_automount_list static Trond Myklebust
2008-04-19 20:40   ` [PATCH 31/33] NFS - fix potential NULL pointer dereference v2 Trond Myklebust
     [not found]     ` <20080419204054.14124.59641.stgit-KPEdlmqt5P7XOazzY/2fV4TcuzvYVacciM950cveMlzk1uMJSBkQmQ@public.gmane.org>
2008-04-21 21:13       ` Chuck Lever
2008-04-22  0:21         ` Trond Myklebust
     [not found]           ` <1208823685.7767.43.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 14:21             ` Chuck Lever
2008-04-22 15:12               ` Trond Myklebust
     [not found]                 ` <1208877133.11982.13.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2008-04-22 15:22                   ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox