[PATCH 00/19] Proposed server-side patches for 2.6.31

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/19] Proposed server-side patches for 2.6.31
@ 2009-04-23 23:31 Chuck Lever
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Hi Bruce -

Here are some patches I'd like to propose for 2.6.31.  Please review
and consider them for linux-next.  They've seen a little testing here
on x86_64, but they should get some public exercise.

The first 17 are updated versions of patches I submitted for 2.6.30,
but were dropped at the last minute to simplify things.  They provide
some clean-up for the nfsctl API, and a little more IPv6 support in
that area as well, in preparation for full-on IPv6-related changes to
come.  I think I've addressed earlier review comments.

As usual, it would be most helpful if you would drop the whole series
if you have issues with any of these patches.

Patch 18 addresses a long-standing problem with our NLM/NSM
implementation.  If lockd.ko isn't loaded before statd starts up
(which is the typical case for both our server and client) then
lockd's NSM state number is never set.  We then end up sending a zero
state number in all NLM requests.  This is also the case if lockd.ko
is unloaded and reloaded during normal operation -- the NSM state
number is reset to zero.  See the patch description for further
detail.

Patch 19 provides a little more 64-bit alignment clean up in
nsm_init_private() .

These two are independent of all the others and either or both can be
applied as you see fit.

---

Chuck Lever (19):
      lockd: clean up 64-bit alignment fix in nsm_init_private()
      lockd: Update NSM state from SM_MON replies
      NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
      NFSD: Stricter buffer size checking in write_versions()
      NFSD: Stricter buffer size checking in write_recoverydir()
      SUNRPC: Clean up one_sock_name()
      SUNRPC: Support PF_INET6 in one_sock_name()
      SUNRPC: Switch one_sock_name() to use snprintf()
      SUNRPC: pass buffer size to svc_sock_names()
      SUNRPC: pass buffer size to svc_addsock()
      NFSD: Prevent a buffer overflow in svc_xprt_names()
      NFSD: move lockd_up() before svc_addsock()
      NFSD: Finish refactoring __write_ports()
      NFSD: Note an additional requirement when passing TCP sockets to portlist
      NFSD: Refactor socket creation out of __write_ports()
      NFSD: Refactor portlist socket closing into a helper
      NFSD: Refactor transport addition out of __write_ports()
      NFSD: Refactor transport removal out of __write_ports()
      SUNRPC: Fix error return value of svc_addr_len()

 fs/lockd/clntproc.c             |    2 
 fs/lockd/mon.c                  |   18 +-
 fs/nfsd/nfsctl.c                |  284 ++++++++++++++++++++++++---------------
 include/linux/sunrpc/svc_xprt.h |    7 +
 include/linux/sunrpc/svcsock.h  |    7 +
 net/sunrpc/svc_xprt.c           |   56 +++++---
 net/sunrpc/svcsock.c            |   87 +++++++++---
 7 files changed, 299 insertions(+), 162 deletions(-)

-- 
Chuck Lever <chuck.lever@oracle.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-23 23:31   ` Chuck Lever
       [not found]     ` <20090423233124.17283.40252.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:31   ` [PATCH 02/19] NFSD: Refactor transport removal out of __write_ports() Chuck Lever
                     ` (18 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't
recognize the address family of the passed-in socket address.  However,
the return type of this function is size_t, which means -EAFNOSUPPORT
is turned into a very large positive value in this case.

The check in svc_udp_recvfrom() to see if the return value is less
than zero therefore won't work at all.

Additionally, handle_connect_req() passes this value directly to
memset().  This could cause memset() to clobber a large chunk of memory
if svc_addr_len() has returned an error.  Currently the address family
of these addresses, however, is known to be supported long before
handle_connect_req() is called, so this isn't a real risk.

Change the error return value of svc_addr_len() to zero, which fits in
the range of size_t, and is safer to pass to memset() directly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 include/linux/sunrpc/svc_xprt.h |    5 +++--
 net/sunrpc/svcsock.c            |    7 ++++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 0d9cb6e..d790c52 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -118,7 +118,7 @@ static inline unsigned short svc_addr_port(const struct sockaddr *sa)
 	return 0;
 }
 
-static inline size_t svc_addr_len(struct sockaddr *sa)
+static inline size_t svc_addr_len(const struct sockaddr *sa)
 {
 	switch (sa->sa_family) {
 	case AF_INET:
@@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct sockaddr *sa)
 	case AF_INET6:
 		return sizeof(struct sockaddr_in6);
 	}
-	return -EAFNOSUPPORT;
+
+	return 0;
 }
 
 static inline unsigned short svc_xprt_local_port(const struct svc_xprt *xprt)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index af31988..8b08328 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -426,13 +426,14 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
 	} buffer;
 	struct cmsghdr *cmh = &buffer.hdr;
-	int		err, len;
 	struct msghdr msg = {
 		.msg_name = svc_addr(rqstp),
 		.msg_control = cmh,
 		.msg_controllen = sizeof(buffer),
 		.msg_flags = MSG_DONTWAIT,
 	};
+	size_t len;
+	int err;
 
 	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
 	    /* udp sockets need large rcvbuf as all pending
@@ -464,8 +465,8 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
 		return -EAGAIN;
 	}
 	len = svc_addr_len(svc_addr(rqstp));
-	if (len < 0)
-		return len;
+	if (len == 0)
+		return -EAFNOSUPPORT;
 	rqstp->rq_addrlen = len;
 	if (skb->tstamp.tv64 == 0) {
 		skb->tstamp = ktime_get_real();


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 02/19] NFSD: Refactor transport removal out of __write_ports()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:31   ` [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len() Chuck Lever
@ 2009-04-23 23:31   ` Chuck Lever
  2009-04-23 23:31   ` [PATCH 03/19] NFSD: Refactor transport addition " Chuck Lever
                     ` (17 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Refactor transport removal out of __write_ports() to make it
easier to understand and maintain.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   53 +++++++++++++++++++++++++++++------------------------
 1 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index af16849..2c1dce8 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -910,6 +910,31 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
 	return rv;
 }
 
+/*
+ * A transport listener is removed by writing a "-", it's transport
+ * name, and it's port number.
+ */
+static ssize_t __write_ports_delxprt(char *buf)
+{
+	struct svc_xprt *xprt;
+	char transport[16];
+	int port;
+
+	if (sscanf(&buf[1], "%15s %4u", transport, &port) != 2)
+		return -EINVAL;
+
+	if (port < 1 || port > USHORT_MAX || nfsd_serv == NULL)
+		return -EINVAL;
+
+	xprt = svc_find_xprt(nfsd_serv, transport, AF_UNSPEC, port);
+	if (xprt == NULL)
+		return -ENOTCONN;
+
+	svc_close_xprt(xprt);
+	svc_xprt_put(xprt);
+	return 0;
+}
+
 static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 {
 	if (size == 0) {
@@ -984,30 +1009,10 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 			return err < 0 ? err : 0;
 		}
 	}
-	/*
-	 * Remove a transport by writing it's transport name and port number
-	 */
-	if (buf[0] == '-' && isalpha(buf[1])) {
-		struct svc_xprt *xprt;
-		int err = -EINVAL;
-		char transport[16];
-		int port;
-		if (sscanf(&buf[1], "%15s %4d", transport, &port) == 2) {
-			if (port < 1 || port > 65535)
-				return -EINVAL;
-			if (nfsd_serv) {
-				xprt = svc_find_xprt(nfsd_serv, transport,
-						     AF_UNSPEC, port);
-				if (xprt) {
-					svc_close_xprt(xprt);
-					svc_xprt_put(xprt);
-					err = 0;
-				} else
-					err = -ENOTCONN;
-			}
-			return err < 0 ? err : 0;
-		}
-	}
+
+	if (buf[0] == '-' && isalpha(buf[1]))
+		return __write_ports_delxprt(buf);
+
 	return -EINVAL;
 }
 


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 03/19] NFSD: Refactor transport addition out of __write_ports()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:31   ` [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len() Chuck Lever
  2009-04-23 23:31   ` [PATCH 02/19] NFSD: Refactor transport removal out of __write_ports() Chuck Lever
@ 2009-04-23 23:31   ` Chuck Lever
  2009-04-23 23:31   ` [PATCH 04/19] NFSD: Refactor portlist socket closing into a helper Chuck Lever
                     ` (16 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Refactor transport addition out of __write_ports() to make
it easier to understand and maintain.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   56 ++++++++++++++++++++++++++++++++----------------------
 1 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 2c1dce8..748532b 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -911,6 +911,36 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
 }
 
 /*
+ * A transport listener is added by writing it's transport name and
+ * a port number.
+ */
+static ssize_t __write_ports_addxprt(char *buf)
+{
+	char transport[16];
+	int port, err;
+
+	if (sscanf(buf, "%15s %4u", transport, &port) != 2)
+		return -EINVAL;
+
+	if (port < 1 || port > USHORT_MAX)
+		return -EINVAL;
+
+	err = nfsd_create_serv();
+	if (err != 0)
+		return err;
+
+	err = svc_create_xprt(nfsd_serv, transport,
+				PF_INET, port, SVC_SOCK_ANONYMOUS);
+	if (err < 0) {
+		/* Give a reasonable perror msg for bad transport string */
+		if (err == -ENOENT)
+			err = -EPROTONOSUPPORT;
+		return err;
+	}
+	return 0;
+}
+
+/*
  * A transport listener is removed by writing a "-", it's transport
  * name, and it's port number.
  */
@@ -986,29 +1016,9 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 		kfree(toclose);
 		return len;
 	}
-	/*
-	 * Add a transport listener by writing it's transport name
-	 */
-	if (isalpha(buf[0])) {
-		int err;
-		char transport[16];
-		int port;
-		if (sscanf(buf, "%15s %4d", transport, &port) == 2) {
-			if (port < 1 || port > 65535)
-				return -EINVAL;
-			err = nfsd_create_serv();
-			if (!err) {
-				err = svc_create_xprt(nfsd_serv,
-						      transport, PF_INET, port,
-						      SVC_SOCK_ANONYMOUS);
-				if (err == -ENOENT)
-					/* Give a reasonable perror msg for
-					 * bad transport string */
-					err = -EPROTONOSUPPORT;
-			}
-			return err < 0 ? err : 0;
-		}
-	}
+
+	if (isalpha(buf[0]))
+		return __write_ports_addxprt(buf);
 
 	if (buf[0] == '-' && isalpha(buf[1]))
 		return __write_ports_delxprt(buf);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 04/19] NFSD: Refactor portlist socket closing into a helper
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (2 preceding siblings ...)
  2009-04-23 23:31   ` [PATCH 03/19] NFSD: Refactor transport addition " Chuck Lever
@ 2009-04-23 23:31   ` Chuck Lever
  2009-04-23 23:31   ` [PATCH 05/19] NFSD: Refactor socket creation out of __write_ports() Chuck Lever
                     ` (15 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Refactor the socket closing logic out of __write_ports() to
make it easier to understand and maintain.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   36 ++++++++++++++++++++++++------------
 1 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 748532b..fa268d1 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -911,6 +911,27 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
 }
 
 /*
+ * A '-' followed by the 'name' of a socket means we close the socket.
+ */
+static ssize_t __write_ports_delfd(char *buf)
+{
+	char *toclose;
+	int len = 0;
+
+	toclose = kstrdup(buf + 1, GFP_KERNEL);
+	if (toclose == NULL)
+		return -ENOMEM;
+
+	if (nfsd_serv != NULL)
+		len = svc_sock_names(buf, nfsd_serv, toclose);
+	if (len >= 0)
+		lockd_down();
+
+	kfree(toclose);
+	return len;
+}
+
+/*
  * A transport listener is added by writing it's transport name and
  * a port number.
  */
@@ -1004,18 +1025,9 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 		}
 		return err < 0 ? err : 0;
 	}
-	if (buf[0] == '-' && isdigit(buf[1])) {
-		char *toclose = kstrdup(buf+1, GFP_KERNEL);
-		int len = 0;
-		if (!toclose)
-			return -ENOMEM;
-		if (nfsd_serv)
-			len = svc_sock_names(buf, nfsd_serv, toclose);
-		if (len >= 0)
-			lockd_down();
-		kfree(toclose);
-		return len;
-	}
+
+	if (buf[0] == '-' && isdigit(buf[1]))
+		return __write_ports_delfd(buf);
 
 	if (isalpha(buf[0]))
 		return __write_ports_addxprt(buf);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 05/19] NFSD: Refactor socket creation out of __write_ports()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (3 preceding siblings ...)
  2009-04-23 23:31   ` [PATCH 04/19] NFSD: Refactor portlist socket closing into a helper Chuck Lever
@ 2009-04-23 23:31   ` Chuck Lever
       [not found]     ` <20090423233155.17283.37345.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:32   ` [PATCH 06/19] NFSD: Note an additional requirement when passing TCP sockets to portlist Chuck Lever
                     ` (14 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:31 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Refactor the socket creation logic out of __write_ports() to
make it easier to understand and maintain.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   64 +++++++++++++++++++++++++++++-------------------------
 1 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index fa268d1..b6a847f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -911,6 +911,37 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
 }
 
 /*
+ * A single 'fd' number was written, in which case it must be for
+ * a socket of a supported family/protocol, and we use it as an
+ * nfsd listener.
+ */
+static ssize_t __write_ports_addfd(char *buf)
+{
+	char *mesg = buf;
+	int fd, err;
+
+	err = get_int(&mesg, &fd);
+	if (err != 0 || fd < 0)
+		return -EINVAL;
+
+	err = nfsd_create_serv();
+	if (err != 0)
+		return err;
+
+	err = svc_addsock(nfsd_serv, fd, buf);
+	if (err >= 0) {
+		err = lockd_up();
+		if (err < 0)
+			svc_sock_names(buf + strlen(buf) + 1, nfsd_serv, buf);
+
+		/* Decrease the count, but don't shut down the service */
+		nfsd_serv->sv_nrthreads--;
+	}
+
+	return err < 0 ? err : 0;
+}
+
+/*
  * A '-' followed by the 'name' of a socket means we close the socket.
  */
 static ssize_t __write_ports_delfd(char *buf)
@@ -995,36 +1026,9 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 			len = svc_xprt_names(nfsd_serv, buf, 0);
 		return len;
 	}
-	/* Either a single 'fd' number is written, in which
-	 * case it must be for a socket of a supported family/protocol,
-	 * and we use it as an nfsd socket, or
-	 * A '-' followed by the 'name' of a socket in which case
-	 * we close the socket.
-	 */
-	if (isdigit(buf[0])) {
-		char *mesg = buf;
-		int fd;
-		int err;
-		err = get_int(&mesg, &fd);
-		if (err)
-			return -EINVAL;
-		if (fd < 0)
-			return -EINVAL;
-		err = nfsd_create_serv();
-		if (!err) {
-			err = svc_addsock(nfsd_serv, fd, buf);
-			if (err >= 0) {
-				err = lockd_up();
-				if (err < 0)
-					svc_sock_names(buf+strlen(buf)+1, nfsd_serv, buf);
-			}
-			/* Decrease the count, but don't shutdown the
-			 * the service
-			 */
-			nfsd_serv->sv_nrthreads--;
-		}
-		return err < 0 ? err : 0;
-	}
+
+	if (isdigit(buf[0]))
+		return __write_ports_addfd(buf);
 
 	if (buf[0] == '-' && isdigit(buf[1]))
 		return __write_ports_delfd(buf);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 06/19] NFSD: Note an additional requirement when passing TCP sockets to portlist
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (4 preceding siblings ...)
  2009-04-23 23:31   ` [PATCH 05/19] NFSD: Refactor socket creation out of __write_ports() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 07/19] NFSD: Finish refactoring __write_ports() Chuck Lever
                     ` (13 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

User space must call listen(3) on SOCK_STREAM sockets passed into
/proc/fs/nfsd/portlist, otherwise that listener is ignored.  Document
this.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b6a847f..d491fa9 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1061,7 +1061,9 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
  *			buf:		C string containing an unsigned
  *					integer value representing a bound
  *					but unconnected socket that is to be
- *					used as an NFSD listener
+ *					used as an NFSD listener; listen(3)
+ *					must be called for a SOCK_STREAM
+ *					socket, otherwise it is ignored
  *			size:		non-zero length of C string in @buf
  * Output:
  *	On success:	NFS service is started;


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 07/19] NFSD: Finish refactoring __write_ports()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (5 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 06/19] NFSD: Note an additional requirement when passing TCP sockets to portlist Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 08/19] NFSD: move lockd_up() before svc_addsock() Chuck Lever
                     ` (12 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: Refactor transport name listing out of __write_ports() to
make it easier to understand and maintain.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index d491fa9..caf4fdc 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -911,6 +911,17 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
 }
 
 /*
+ * Zero-length write.  Return a list of NFSD's current listener
+ * transports.
+ */
+static ssize_t __write_ports_names(char *buf)
+{
+	if (nfsd_serv == NULL)
+		return 0;
+	return svc_xprt_names(nfsd_serv, buf, 0);
+}
+
+/*
  * A single 'fd' number was written, in which case it must be for
  * a socket of a supported family/protocol, and we use it as an
  * nfsd listener.
@@ -1019,13 +1030,8 @@ static ssize_t __write_ports_delxprt(char *buf)
 
 static ssize_t __write_ports(struct file *file, char *buf, size_t size)
 {
-	if (size == 0) {
-		int len = 0;
-
-		if (nfsd_serv)
-			len = svc_xprt_names(nfsd_serv, buf, 0);
-		return len;
-	}
+	if (size == 0)
+		return __write_ports_names(buf);
 
 	if (isdigit(buf[0]))
 		return __write_ports_addfd(buf);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 08/19] NFSD: move lockd_up() before svc_addsock()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (6 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 07/19] NFSD: Finish refactoring __write_ports() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 09/19] NFSD: Prevent a buffer overflow in svc_xprt_names() Chuck Lever
                     ` (11 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up.

A couple of years ago, a series of commits, finishing with commit
5680c446, swapped the order of the lockd_up() and svc_addsock() calls
in __write_ports().  At that time lockd_up() needed to know the
transport protocol of the passed-in socket to start a listener on the
same transport protocol.

These days, lockd_up() doesn't take a protocol argument; it always
starts both a UDP and TCP listener.  It's now more straightforward to
try the lockd_up() first, then do a lockd_down() if the svc_addsock()
fails.

Careful review of this code shows that the svc_sock_names() call is
used only to close the just-opened socket in case lockd_up() fails.
So it is no longer needed if lockd_up() is done first.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index caf4fdc..e051847 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -939,17 +939,18 @@ static ssize_t __write_ports_addfd(char *buf)
 	if (err != 0)
 		return err;

-	err = svc_addsock(nfsd_serv, fd, buf);
-	if (err >= 0) {
-		err = lockd_up();
-		if (err < 0)
-			svc_sock_names(buf + strlen(buf) + 1, nfsd_serv, buf);
+	err = lockd_up();
+	if (err != 0)
+		goto out;

-		/* Decrease the count, but don't shut down the service */
-		nfsd_serv->sv_nrthreads--;
-	}
+	err = svc_addsock(nfsd_serv, fd, buf);
+	if (err < 0)
+		lockd_down();

-	return err < 0 ? err : 0;
+out:
+	/* Decrease the count, but don't shut down the service */
+	nfsd_serv->sv_nrthreads--;
+	return err;
 }

 /*

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 09/19] NFSD: Prevent a buffer overflow in svc_xprt_names()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (7 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 08/19] NFSD: move lockd_up() before svc_addsock() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
       [not found]     ` <20090423233225.17283.10176.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:32   ` [PATCH 10/19] SUNRPC: pass buffer size to svc_addsock() Chuck Lever
                     ` (10 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

The svc_xprt_names() function can overflow its buffer if it's so near
the end of the passed in buffer that the "name too long" string still
doesn't fit.  Of course, it could never tell if it was near the end
of the passed in buffer, since its only caller passes in zero as the
buffer length.

Let's make this API a little safer.

Change svc_xprt_names() so it *always* checks for a buffer overflow,
and change its only caller to pass in the correct buffer length.

If svc_xprt_names() does overflow its buffer, it now fails with an
ENAMETOOLONG errno, instead of trying to write a message at the end
of the buffer.  I don't like this much, but I can't figure out a clean
way that's always safe to return some of the names, *and* an
indication that the buffer was not long enough.

The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
"File name too long".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c                |    2 +
 include/linux/sunrpc/svc_xprt.h |    2 +
 net/sunrpc/svc_xprt.c           |   56 +++++++++++++++++++++++++++------------
 3 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e051847..6a1cd90 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -918,7 +918,7 @@ static ssize_t __write_ports_names(char *buf)
 {
 	if (nfsd_serv == NULL)
 		return 0;
-	return svc_xprt_names(nfsd_serv, buf, 0);
+	return svc_xprt_names(nfsd_serv, buf, SIMPLE_TRANSACTION_LIMIT);
 }
 
 /*
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index d790c52..2223ae0 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -83,7 +83,7 @@ int	svc_port_is_privileged(struct sockaddr *sin);
 int	svc_print_xprts(char *buf, int maxlen);
 struct	svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name,
 			const sa_family_t af, const unsigned short port);
-int	svc_xprt_names(struct svc_serv *serv, char *buf, int buflen);
+int	svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
 
 static inline void svc_xprt_get(struct svc_xprt *xprt)
 {
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index c200d92..96eb873 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -1097,36 +1097,58 @@ struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name,
 }
 EXPORT_SYMBOL_GPL(svc_find_xprt);
 
-/*
- * Format a buffer with a list of the active transports. A zero for
- * the buflen parameter disables target buffer overflow checking.
+static int svc_one_xprt_name(const struct svc_xprt *xprt,
+			     char *pos, int remaining)
+{
+	int len;
+
+	len = snprintf(pos, remaining, "%s %u\n",
+			xprt->xpt_class->xcl_name,
+			svc_xprt_local_port(xprt));
+	if (len >= remaining)
+		return -ENAMETOOLONG;
+	return len;
+}
+
+/**
+ * svc_xprt_names - format a buffer with a list of transport names
+ * @serv: pointer to an RPC service
+ * @buf: pointer to a buffer to be filled in
+ * @buflen: length of buffer to be filled in
+ *
+ * Fills in @buf with a string containing a list of transport names,
+ * each name terminated with '\n'.
+ *
+ * Returns positive length of the filled-in string on success; otherwise
+ * a negative errno value is returned if an error occurs.
  */
-int svc_xprt_names(struct svc_serv *serv, char *buf, int buflen)
+int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen)
 {
 	struct svc_xprt *xprt;
-	char xprt_str[64];
-	int totlen = 0;
-	int len;
+	int len, totlen;
+	char *pos;
 
 	/* Sanity check args */
 	if (!serv)
 		return 0;
 
 	spin_lock_bh(&serv->sv_lock);
+
+	pos = buf;
+	totlen = 0;
 	list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list) {
-		len = snprintf(xprt_str, sizeof(xprt_str),
-			       "%s %d\n", xprt->xpt_class->xcl_name,
-			       svc_xprt_local_port(xprt));
-		/* If the string was truncated, replace with error string */
-		if (len >= sizeof(xprt_str))
-			strcpy(xprt_str, "name-too-long\n");
-		/* Don't overflow buffer */
-		len = strlen(xprt_str);
-		if (buflen && (len + totlen >= buflen))
+		len = svc_one_xprt_name(xprt, pos, buflen - totlen);
+		if (len < 0) {
+			*buf = '\0';
+			totlen = len;
+		}
+		if (len <= 0)
 			break;
-		strcpy(buf+totlen, xprt_str);
+
+		pos += len;
 		totlen += len;
 	}
+
 	spin_unlock_bh(&serv->sv_lock);
 	return totlen;
 }


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 10/19] SUNRPC: pass buffer size to svc_addsock()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (8 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 09/19] NFSD: Prevent a buffer overflow in svc_xprt_names() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 11/19] SUNRPC: pass buffer size to svc_sock_names() Chuck Lever
                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Adjust the synopsis of svc_addsock() to pass in the size of the output
buffer.  Add a documenting comment.

This is a cosmetic change for now.  A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c               |    2 +-
 include/linux/sunrpc/svcsock.h |    3 ++-
 net/sunrpc/svcsock.c           |   16 +++++++++++++---
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 6a1cd90..1f1c215 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -943,7 +943,7 @@ static ssize_t __write_ports_addfd(char *buf)
 	if (err != 0)
 		goto out;
 
-	err = svc_addsock(nfsd_serv, fd, buf);
+	err = svc_addsock(nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT);
 	if (err < 0)
 		lockd_down();
 
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 483e103..e23241c 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -39,7 +39,8 @@ int		svc_send(struct svc_rqst *);
 void		svc_drop(struct svc_rqst *);
 void		svc_sock_update_bufs(struct svc_serv *serv);
 int		svc_sock_names(char *buf, struct svc_serv *serv, char *toclose);
-int		svc_addsock(struct svc_serv *serv, int fd, char *name_return);
+int		svc_addsock(struct svc_serv *serv, const int fd,
+					char *name_return, const size_t len);
 void		svc_init_xprt_sock(void);
 void		svc_cleanup_xprt_sock(void);
 
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 8b08328..6bec1e2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1128,9 +1128,19 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	return svsk;
 }
 
-int svc_addsock(struct svc_serv *serv,
-		int fd,
-		char *name_return)
+/**
+ * svc_addsock - add a listener socket to an RPC service
+ * @serv: pointer to RPC service to which to add a new listener
+ * @fd: file descriptor of the new listener
+ * @name_return: pointer to buffer to fill in with name of listener
+ * @len: size of the buffer
+ *
+ * Fills in socket name and returns positive length of name if successful.
+ * Name is terminated with '\n'.  On error, returns a negative errno
+ * value.
+ */
+int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
+		const size_t len)
 {
 	int err = 0;
 	struct socket *so = sockfd_lookup(fd, &err);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 11/19] SUNRPC: pass buffer size to svc_sock_names()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (9 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 10/19] SUNRPC: pass buffer size to svc_addsock() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 12/19] SUNRPC: Switch one_sock_name() to use snprintf() Chuck Lever
                     ` (8 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Adjust the synopsis of svc_sock_names() to pass in the size of the
output buffer.  Add a documenting comment.

This is a cosmetic change for now.  A subsequent patch will make sure
the buffer length is passed to one_sock_name(), where the length will
actually be useful.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c               |    3 ++-
 include/linux/sunrpc/svcsock.h |    4 +++-
 net/sunrpc/svcsock.c           |   19 +++++++++++++++++--
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 1f1c215..b64a7fb 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -966,7 +966,8 @@ static ssize_t __write_ports_delfd(char *buf)
 		return -ENOMEM;
 
 	if (nfsd_serv != NULL)
-		len = svc_sock_names(buf, nfsd_serv, toclose);
+		len = svc_sock_names(nfsd_serv, buf,
+					SIMPLE_TRANSACTION_LIMIT, toclose);
 	if (len >= 0)
 		lockd_down();
 
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index e23241c..8271631 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -38,7 +38,9 @@ int		svc_recv(struct svc_rqst *, long);
 int		svc_send(struct svc_rqst *);
 void		svc_drop(struct svc_rqst *);
 void		svc_sock_update_bufs(struct svc_serv *serv);
-int		svc_sock_names(char *buf, struct svc_serv *serv, char *toclose);
+int		svc_sock_names(struct svc_serv *serv, char *buf,
+					const size_t buflen,
+					const char *toclose);
 int		svc_addsock(struct svc_serv *serv, const int fd,
 					char *name_return, const size_t len);
 void		svc_init_xprt_sock(void);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 6bec1e2..032b52e 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -259,8 +259,23 @@ static int one_sock_name(char *buf, struct svc_sock *svsk)
 	return len;
 }
 
-int
-svc_sock_names(char *buf, struct svc_serv *serv, char *toclose)
+/**
+ * svc_sock_names - construct a list of listener names in a string
+ * @serv: pointer to RPC service
+ * @buf: pointer to a buffer to fill in with socket names
+ * @buflen: size of the buffer to be filled
+ * @toclose: pointer to '\0'-terminated C string containing the name
+ *		of a listener to be closed
+ *
+ * Fills in @buf with a '\n'-separated list of names of listener
+ * sockets.  If @toclose is not NULL, the socket named by @toclose
+ * is closed, and is not included in the output list.
+ *
+ * Returns positive length of the socket name string, or a negative
+ * errno value on error.
+ */
+int svc_sock_names(struct svc_serv *serv, char *buf, const size_t buflen,
+		   const char *toclose)
 {
 	struct svc_sock *svsk, *closesk = NULL;
 	int len = 0;


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 12/19] SUNRPC: Switch one_sock_name() to use snprintf()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (10 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 11/19] SUNRPC: pass buffer size to svc_sock_names() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:32   ` [PATCH 13/19] SUNRPC: Support PF_INET6 in one_sock_name() Chuck Lever
                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Use snprintf() in one_sock_name() to prevent overflowing the output
buffer.  If the name doesn't fit in the buffer, the buffer is filled
in with an empty string, and -ENAMETOOLONG is returned.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/svcsock.c |   25 ++++++++++++++++++-------
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 032b52e..61d4a32 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -240,22 +240,27 @@ out:
 /*
  * Report socket names for nfsdfs
  */
-static int one_sock_name(char *buf, struct svc_sock *svsk)
+static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
 {
 	int len;
 
 	switch(svsk->sk_sk->sk_family) {
-	case AF_INET:
-		len = sprintf(buf, "ipv4 %s %pI4 %d\n",
+	case PF_INET:
+		len = snprintf(buf, remaining, "ipv4 %s %pI4 %d\n",
 			      svsk->sk_sk->sk_protocol == IPPROTO_UDP ?
 			      "udp" : "tcp",
 			      &inet_sk(svsk->sk_sk)->rcv_saddr,
 			      inet_sk(svsk->sk_sk)->num);
 		break;
 	default:
-		len = sprintf(buf, "*unknown-%d*\n",
+		len = snprintf(buf, remaining, "*unknown-%d*\n",
 			       svsk->sk_sk->sk_family);
 	}
+
+	if (len >= remaining) {
+		*buf = '\0';
+		return -ENAMETOOLONG;
+	}
 	return len;
 }
 
@@ -282,15 +287,21 @@ int svc_sock_names(struct svc_serv *serv, char *buf, const size_t buflen,
 
 	if (!serv)
 		return 0;
+
 	spin_lock_bh(&serv->sv_lock);
 	list_for_each_entry(svsk, &serv->sv_permsocks, sk_xprt.xpt_list) {
-		int onelen = one_sock_name(buf+len, svsk);
-		if (toclose && strcmp(toclose, buf+len) == 0)
+		int onelen = svc_one_sock_name(svsk, buf + len, buflen - len);
+		if (onelen < 0) {
+			len = onelen;
+			break;
+		}
+		if (toclose && strcmp(toclose, buf + len) == 0)
 			closesk = svsk;
 		else
 			len += onelen;
 	}
 	spin_unlock_bh(&serv->sv_lock);
+
 	if (closesk)
 		/* Should unregister with portmap, but you cannot
 		 * unregister just one protocol...
@@ -1195,7 +1206,7 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
 		sockfd_put(so);
 		return err;
 	}
-	return one_sock_name(name_return, svsk);
+	return svc_one_sock_name(svsk, name_return, len);
 }
 EXPORT_SYMBOL_GPL(svc_addsock);
 


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 13/19] SUNRPC: Support PF_INET6 in one_sock_name()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (11 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 12/19] SUNRPC: Switch one_sock_name() to use snprintf() Chuck Lever
@ 2009-04-23 23:32   ` Chuck Lever
  2009-04-23 23:33   ` [PATCH 14/19] SUNRPC: Clean up one_sock_name() Chuck Lever
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:32 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Add an arm to the switch statement in svc_one_sock_name() so it can
construct the name of PF_INET6 sockets properly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aime Le Rouzic <aime.le-rouzic@bull.net>
---

 net/sunrpc/svcsock.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 61d4a32..983bfa9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -252,6 +252,13 @@ static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
 			      &inet_sk(svsk->sk_sk)->rcv_saddr,
 			      inet_sk(svsk->sk_sk)->num);
 		break;
+	case PF_INET6:
+		len = snprintf(buf, remaining, "ipv6 %s %pI6 %d\n",
+				svsk->sk_sk->sk_protocol == IPPROTO_UDP ?
+				"udp" : "tcp",
+				&inet6_sk(svsk->sk_sk)->rcv_saddr,
+				inet_sk(svsk->sk_sk)->num);
+		break;
 	default:
 		len = snprintf(buf, remaining, "*unknown-%d*\n",
 			       svsk->sk_sk->sk_family);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 14/19] SUNRPC: Clean up one_sock_name()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (12 preceding siblings ...)
  2009-04-23 23:32   ` [PATCH 13/19] SUNRPC: Support PF_INET6 in one_sock_name() Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
  2009-04-23 23:33   ` [PATCH 15/19] NFSD: Stricter buffer size checking in write_recoverydir() Chuck Lever
                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up svc_one_sock_name() by setting up automatic variables for
frequently used expressions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/svcsock.c |   21 +++++++++++----------
 1 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 983bfa9..4e6d406 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -242,26 +242,27 @@ out:
  */
 static int svc_one_sock_name(struct svc_sock *svsk, char *buf, int remaining)
 {
+	const struct sock *sk = svsk->sk_sk;
+	const char *proto_name = sk->sk_protocol == IPPROTO_UDP ?
+							"udp" : "tcp";
 	int len;
 
-	switch(svsk->sk_sk->sk_family) {
+	switch (sk->sk_family) {
 	case PF_INET:
 		len = snprintf(buf, remaining, "ipv4 %s %pI4 %d\n",
-			      svsk->sk_sk->sk_protocol == IPPROTO_UDP ?
-			      "udp" : "tcp",
-			      &inet_sk(svsk->sk_sk)->rcv_saddr,
-			      inet_sk(svsk->sk_sk)->num);
+				proto_name,
+				&inet_sk(sk)->rcv_saddr,
+				inet_sk(sk)->num);
 		break;
 	case PF_INET6:
 		len = snprintf(buf, remaining, "ipv6 %s %pI6 %d\n",
-				svsk->sk_sk->sk_protocol == IPPROTO_UDP ?
-				"udp" : "tcp",
-				&inet6_sk(svsk->sk_sk)->rcv_saddr,
-				inet_sk(svsk->sk_sk)->num);
+				proto_name,
+				&inet6_sk(sk)->rcv_saddr,
+				inet_sk(sk)->num);
 		break;
 	default:
 		len = snprintf(buf, remaining, "*unknown-%d*\n",
-			       svsk->sk_sk->sk_family);
+				sk->sk_family);
 	}
 
 	if (len >= remaining) {


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 15/19] NFSD: Stricter buffer size checking in write_recoverydir()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (13 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 14/19] SUNRPC: Clean up one_sock_name() Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
  2009-04-23 23:33   ` [PATCH 16/19] NFSD: Stricter buffer size checking in write_versions() Chuck Lever
                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

While it's not likely a pathname will be longer than
SIMPLE_TRANSACTION_SIZE, we should be more careful about just
plopping it into the output buffer without bounds checking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b64a7fb..c484346 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1260,8 +1260,9 @@ static ssize_t __write_recoverydir(struct file *file, char *buf, size_t size)
 
 		status = nfs4_reset_recoverydir(recdir);
 	}
-	sprintf(buf, "%s\n", nfs4_recoverydir());
-	return strlen(buf);
+
+	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%s\n",
+							nfs4_recoverydir());
 }
 
 /**


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 16/19] NFSD: Stricter buffer size checking in write_versions()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (14 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 15/19] NFSD: Stricter buffer size checking in write_recoverydir() Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
  2009-04-23 23:33   ` [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c Chuck Lever
                     ` (3 subsequent siblings)
  19 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

While it's not likely today that there are enough NFS versions to
overflow the output buffer in write_versions(), we should be more
careful about detecting the end of the buffer.

The number of NFS versions will only increase as NFSv4 minor versions
are added.

Note that this API doesn't behave the same as portlist.  Here we
attempt to display as many versions as will fit in the buffer, and do
not provide any indication that an overflow would have occurred.  I
don't have any good rationale for that.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   32 +++++++++++++++++++++++++-------
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index c484346..a152694 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -793,7 +793,7 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
 {
 	char *mesg = buf;
 	char *vers, *minorp, sign;
-	int len, num;
+	int len, num, remaining;
 	unsigned minor;
 	ssize_t tlen = 0;
 	char *sep;
@@ -840,32 +840,50 @@ static ssize_t __write_versions(struct file *file, char *buf, size_t size)
 			}
 		next:
 			vers += len + 1;
-			tlen += len;
 		} while ((len = qword_get(&mesg, vers, size)) > 0);
 		/* If all get turned off, turn them back on, as
 		 * having no versions is BAD
 		 */
 		nfsd_reset_versions();
 	}
+
 	/* Now write current state into reply buffer */
 	len = 0;
 	sep = "";
+	remaining = SIMPLE_TRANSACTION_LIMIT;
 	for (num=2 ; num <= 4 ; num++)
 		if (nfsd_vers(num, NFSD_AVAIL)) {
-			len += sprintf(buf+len, "%s%c%d", sep,
+			len = snprintf(buf, remaining, "%s%c%d", sep,
 				       nfsd_vers(num, NFSD_TEST)?'+':'-',
 				       num);
 			sep = " ";
+
+			if (len > remaining)
+				break;
+			remaining -= len;
+			buf += len;
+			tlen += len;
 		}
 	if (nfsd_vers(4, NFSD_AVAIL))
-		for (minor = 1; minor <= NFSD_SUPPORTED_MINOR_VERSION; minor++)
-			len += sprintf(buf+len, " %c4.%u",
+		for (minor = 1; minor <= NFSD_SUPPORTED_MINOR_VERSION;
+		     minor++) {
+			len = snprintf(buf, remaining, " %c4.%u",
 					(nfsd_vers(4, NFSD_TEST) &&
 					 nfsd_minorversion(minor, NFSD_TEST)) ?
 						'+' : '-',
 					minor);
-	len += sprintf(buf+len, "\n");
-	return len;
+
+			if (len > remaining)
+				break;
+			remaining -= len;
+			buf += len;
+			tlen += len;
+		}
+
+	len = snprintf(buf, remaining, "\n");
+	if (len > remaining)
+		return -EINVAL;
+	return tlen + len;
 }
 
 /**


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (15 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 16/19] NFSD: Stricter buffer size checking in write_versions() Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
       [not found]     ` <20090423233325.17283.71127.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:33   ` [PATCH 18/19] lockd: Update NSM state from SM_MON replies Chuck Lever
                     ` (2 subsequent siblings)
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Clean up: For consistency, handle output buffer size checking in a
other nfsctl functions the same way it's done for write_versions().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfsd/nfsctl.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index a152694..877e713 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -695,8 +695,9 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
 		if (rv)
 			return rv;
 	}
-	sprintf(buf, "%d\n", nfsd_nrthreads());
-	return strlen(buf);
+
+	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
+							nfsd_nrthreads());
 }
 
 /**
@@ -1197,7 +1198,9 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
 		nfsd_max_blksize = bsize;
 		mutex_unlock(&nfsd_mutex);
 	}
-	return sprintf(buf, "%d\n", nfsd_max_blksize);
+
+	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
+							nfsd_max_blksize);
 }
 
 #ifdef CONFIG_NFSD_V4
@@ -1221,8 +1224,9 @@ static ssize_t __write_leasetime(struct file *file, char *buf, size_t size)
 			return -EINVAL;
 		nfs4_reset_lease(lease);
 	}
-	sprintf(buf, "%ld\n", nfs4_lease_time());
-	return strlen(buf);
+
+	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%ld\n",
+							nfs4_lease_time());
 }
 
 /**


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 18/19] lockd: Update NSM state from SM_MON replies
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (16 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
       [not found]     ` <20090423233332.17283.23011.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-23 23:33   ` [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private() Chuck Lever
  2009-04-25 22:14   ` [PATCH 00/19] Proposed server-side patches for 2.6.31 J. Bruce Fields
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

When rpc.statd starts up in user space at boot time, it attempts to
write the latest NSM local state number into
/proc/sys/fs/nfs/nsm_local_state.

If lockd.ko isn't loaded yet (as is the case in most configurations),
that file doesn't exist, thus the kernel's NSM state remains set to
its initial value of zero during lockd operation.

This is a problem because rpc.statd and lockd use the NSM state number
to prevent repeated lock recovery on rebooted hosts.  If lockd sends
a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
number is received, there is no way for lockd or rpc.statd to
distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
recovery could be performed after the rebooted host has already
started reclaiming locks, and those locks will be lost.

We could change /etc/init.d/nfslock so it always modprobes lockd.ko
before starting rpc.statd.  However, if lockd.ko is ever unloaded
and reloaded, we are back at square one, since the NSM state is not
preserved across an unload/reload cycle.  This may happen frequently
on clients that use automounter.  A period of NFS inactivity causes
lockd.ko to be unloaded, and the kernel loses its NSM state setting.

Instead, let's use the fact that rpc.statd plants the local system's
NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
synchronous SM_MON upcall to the local rpc.statd _before_ sending its
first NLM request to a new remote.  This would permit rpc.statd to
provide the current NSM state to lockd, even after lockd.ko had been
unloaded and reloaded.

Note that NLMPROC_LOCK arguments are constructed before the
nsm_monitor() call, so we have to rearrange argument construction very
slightly to make this all work out.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/lockd/clntproc.c |    2 +-
 fs/lockd/mon.c      |    6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index dd79570..f55b900 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
 	struct nlm_lock	*lock = &argp->lock;
 
 	nlmclnt_next_cookie(&argp->cookie);
-	argp->state   = nsm_local_state;
 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), sizeof(struct nfs_fh));
 	lock->caller  = utsname()->nodename;
 	lock->oh.data = req->a_owner;
@@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 
 	if (nsm_monitor(host) < 0)
 		goto out;
+	req->a_args.state = nsm_local_state;
 
 	fl->fl_flags |= FL_ACCESS;
 	status = do_vfs_lock(fl);
diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index 6d5d4a4..5017d50 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
 		status = -EIO;
 	if (status < 0)
 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
-	else
+	else {
 		nsm->sm_monitored = 1;
+		nsm_local_state = res.state;
+		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
+				nsm_local_state);
+	}
 	return status;
 }
 


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (17 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 18/19] lockd: Update NSM state from SM_MON replies Chuck Lever
@ 2009-04-23 23:33   ` Chuck Lever
       [not found]     ` <20090423233340.17283.29580.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
  2009-04-25 22:14   ` [PATCH 00/19] Proposed server-side patches for 2.6.31 J. Bruce Fields
  19 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-23 23:33 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Recently, commit ad5b365c fixed a data type alignment issue in
nsm_init_private() by introducing put_unaligned().  We don't actually
_require_ an unaligned access to nsm_private here.

Instead, we should always use memcpy/memcmp to access the contents of
RPC opaque data types.  This permits us to continue to define these as
simple character arrays, as most legacy RPC code does, and to rely on
gcc to pack the fields in in-core structures optimally without breaking
on platforms that require strict alignment.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/lockd/mon.c |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
index 5017d50..763509a 100644
--- a/fs/lockd/mon.c
+++ b/fs/lockd/mon.c
@@ -16,8 +16,6 @@
 #include <linux/sunrpc/svc.h>
 #include <linux/lockd/lockd.h>
 
-#include <asm/unaligned.h>
-
 #define NLMDBG_FACILITY		NLMDBG_MONITOR
 #define NSM_PROGRAM		100024
 #define NSM_VERSION		1
@@ -278,14 +276,14 @@ static struct nsm_handle *nsm_lookup_priv(const struct nsm_private *priv)
  */
 static void nsm_init_private(struct nsm_handle *nsm)
 {
-	u64 *p = (u64 *)&nsm->sm_priv.data;
 	struct timespec ts;
-	s64 ns;
+	u64 buf[2];
 
 	ktime_get_ts(&ts);
-	ns = timespec_to_ns(&ts);
-	put_unaligned(ns, p);
-	put_unaligned((unsigned long)nsm, p + 1);
+	buf[0] = timespec_to_ns(&ts);
+	buf[1] = (unsigned long)nsm;
+
+	memcpy(nsm->sm_priv.data, buf, SM_PRIV_SIZE);
 }
 
 static struct nsm_handle *nsm_create_handle(const struct sockaddr *sap,


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 00/19] Proposed server-side patches for 2.6.31
       [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
                     ` (18 preceding siblings ...)
  2009-04-23 23:33   ` [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private() Chuck Lever
@ 2009-04-25 22:14   ` J. Bruce Fields
  19 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-25 22:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:31:16PM -0400, Chuck Lever wrote:
> Hi Bruce -
> 
> Here are some patches I'd like to propose for 2.6.31.  Please review
> and consider them for linux-next.  They've seen a little testing here
> on x86_64, but they should get some public exercise.

I'm a little behind, apologies for any delay--I'll try to give these a
good reading soon.  Meanwhile--could you put these in a public git tree
and give me the url?  That'll make it trivial to pull them into -next,
at least.

--b.

> 
> The first 17 are updated versions of patches I submitted for 2.6.30,
> but were dropped at the last minute to simplify things.  They provide
> some clean-up for the nfsctl API, and a little more IPv6 support in
> that area as well, in preparation for full-on IPv6-related changes to
> come.  I think I've addressed earlier review comments.
> 
> As usual, it would be most helpful if you would drop the whole series
> if you have issues with any of these patches.
> 
> Patch 18 addresses a long-standing problem with our NLM/NSM
> implementation.  If lockd.ko isn't loaded before statd starts up
> (which is the typical case for both our server and client) then
> lockd's NSM state number is never set.  We then end up sending a zero
> state number in all NLM requests.  This is also the case if lockd.ko
> is unloaded and reloaded during normal operation -- the NSM state
> number is reset to zero.  See the patch description for further
> detail.
> 
> Patch 19 provides a little more 64-bit alignment clean up in
> nsm_init_private() .
> 
> These two are independent of all the others and either or both can be
> applied as you see fit.
> 
> ---
> 
> Chuck Lever (19):
>       lockd: clean up 64-bit alignment fix in nsm_init_private()
>       lockd: Update NSM state from SM_MON replies
>       NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
>       NFSD: Stricter buffer size checking in write_versions()
>       NFSD: Stricter buffer size checking in write_recoverydir()
>       SUNRPC: Clean up one_sock_name()
>       SUNRPC: Support PF_INET6 in one_sock_name()
>       SUNRPC: Switch one_sock_name() to use snprintf()
>       SUNRPC: pass buffer size to svc_sock_names()
>       SUNRPC: pass buffer size to svc_addsock()
>       NFSD: Prevent a buffer overflow in svc_xprt_names()
>       NFSD: move lockd_up() before svc_addsock()
>       NFSD: Finish refactoring __write_ports()
>       NFSD: Note an additional requirement when passing TCP sockets to portlist
>       NFSD: Refactor socket creation out of __write_ports()
>       NFSD: Refactor portlist socket closing into a helper
>       NFSD: Refactor transport addition out of __write_ports()
>       NFSD: Refactor transport removal out of __write_ports()
>       SUNRPC: Fix error return value of svc_addr_len()
> 
> 
>  fs/lockd/clntproc.c             |    2 
>  fs/lockd/mon.c                  |   18 +-
>  fs/nfsd/nfsctl.c                |  284 ++++++++++++++++++++++++---------------
>  include/linux/sunrpc/svc_xprt.h |    7 +
>  include/linux/sunrpc/svcsock.h  |    7 +
>  net/sunrpc/svc_xprt.c           |   56 +++++---
>  net/sunrpc/svcsock.c            |   87 +++++++++---
>  7 files changed, 299 insertions(+), 162 deletions(-)
> 
> -- 
> Chuck Lever <chuck.lever@oracle.com>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
       [not found]     ` <20090423233124.17283.40252.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-25 22:17       ` J. Bruce Fields
  2009-04-27 16:49         ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-25 22:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:31:25PM -0400, Chuck Lever wrote:
> The svc_addr_len() helper function returns -EAFNOSUPPORT if it doesn't
> recognize the address family of the passed-in socket address.  However,
> the return type of this function is size_t, which means -EAFNOSUPPORT
> is turned into a very large positive value in this case.
> 
> The check in svc_udp_recvfrom() to see if the return value is less
> than zero therefore won't work at all.
> 
> Additionally, handle_connect_req() passes this value directly to
> memset().  This could cause memset() to clobber a large chunk of memory
> if svc_addr_len() has returned an error.  Currently the address family
> of these addresses, however, is known to be supported long before
> handle_connect_req() is called, so this isn't a real risk.
> 
> Change the error return value of svc_addr_len() to zero, which fits in
> the range of size_t, and is safer to pass to memset() directly.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  include/linux/sunrpc/svc_xprt.h |    5 +++--
>  net/sunrpc/svcsock.c            |    7 ++++---
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index 0d9cb6e..d790c52 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -118,7 +118,7 @@ static inline unsigned short svc_addr_port(const struct sockaddr *sa)
>  	return 0;
>  }
>  
> -static inline size_t svc_addr_len(struct sockaddr *sa)
> +static inline size_t svc_addr_len(const struct sockaddr *sa)
>  {
>  	switch (sa->sa_family) {
>  	case AF_INET:
> @@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct sockaddr *sa)
>  	case AF_INET6:
>  		return sizeof(struct sockaddr_in6);
>  	}
> -	return -EAFNOSUPPORT;
> +

May as well stick a WARN() here too if only as a shorthand way of
documenting that this isn't meant to happen.

--b.

> +	return 0;
>  }
>  
>  static inline unsigned short svc_xprt_local_port(const struct svc_xprt *xprt)
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index af31988..8b08328 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -426,13 +426,14 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
>  	} buffer;
>  	struct cmsghdr *cmh = &buffer.hdr;
> -	int		err, len;
>  	struct msghdr msg = {
>  		.msg_name = svc_addr(rqstp),
>  		.msg_control = cmh,
>  		.msg_controllen = sizeof(buffer),
>  		.msg_flags = MSG_DONTWAIT,
>  	};
> +	size_t len;
> +	int err;
>  
>  	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>  	    /* udp sockets need large rcvbuf as all pending
> @@ -464,8 +465,8 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  		return -EAGAIN;
>  	}
>  	len = svc_addr_len(svc_addr(rqstp));
> -	if (len < 0)
> -		return len;
> +	if (len == 0)
> +		return -EAFNOSUPPORT;
>  	rqstp->rq_addrlen = len;
>  	if (skb->tstamp.tv64 == 0) {
>  		skb->tstamp = ktime_get_real();
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 05/19] NFSD: Refactor socket creation out of __write_ports()
       [not found]     ` <20090423233155.17283.37345.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-25 22:40       ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-25 22:40 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:31:55PM -0400, Chuck Lever wrote:
> Clean up: Refactor the socket creation logic out of __write_ports() to
> make it easier to understand and maintain.

Yep, these look good, thanks.--b.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  fs/nfsd/nfsctl.c |   64 +++++++++++++++++++++++++++++-------------------------
>  1 files changed, 34 insertions(+), 30 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index fa268d1..b6a847f 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -911,6 +911,37 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
>  }
>  
>  /*
> + * A single 'fd' number was written, in which case it must be for
> + * a socket of a supported family/protocol, and we use it as an
> + * nfsd listener.
> + */
> +static ssize_t __write_ports_addfd(char *buf)
> +{
> +	char *mesg = buf;
> +	int fd, err;
> +
> +	err = get_int(&mesg, &fd);
> +	if (err != 0 || fd < 0)
> +		return -EINVAL;
> +
> +	err = nfsd_create_serv();
> +	if (err != 0)
> +		return err;
> +
> +	err = svc_addsock(nfsd_serv, fd, buf);
> +	if (err >= 0) {
> +		err = lockd_up();
> +		if (err < 0)
> +			svc_sock_names(buf + strlen(buf) + 1, nfsd_serv, buf);
> +
> +		/* Decrease the count, but don't shut down the service */
> +		nfsd_serv->sv_nrthreads--;
> +	}
> +
> +	return err < 0 ? err : 0;
> +}
> +
> +/*
>   * A '-' followed by the 'name' of a socket means we close the socket.
>   */
>  static ssize_t __write_ports_delfd(char *buf)
> @@ -995,36 +1026,9 @@ static ssize_t __write_ports(struct file *file, char *buf, size_t size)
>  			len = svc_xprt_names(nfsd_serv, buf, 0);
>  		return len;
>  	}
> -	/* Either a single 'fd' number is written, in which
> -	 * case it must be for a socket of a supported family/protocol,
> -	 * and we use it as an nfsd socket, or
> -	 * A '-' followed by the 'name' of a socket in which case
> -	 * we close the socket.
> -	 */
> -	if (isdigit(buf[0])) {
> -		char *mesg = buf;
> -		int fd;
> -		int err;
> -		err = get_int(&mesg, &fd);
> -		if (err)
> -			return -EINVAL;
> -		if (fd < 0)
> -			return -EINVAL;
> -		err = nfsd_create_serv();
> -		if (!err) {
> -			err = svc_addsock(nfsd_serv, fd, buf);
> -			if (err >= 0) {
> -				err = lockd_up();
> -				if (err < 0)
> -					svc_sock_names(buf+strlen(buf)+1, nfsd_serv, buf);
> -			}
> -			/* Decrease the count, but don't shutdown the
> -			 * the service
> -			 */
> -			nfsd_serv->sv_nrthreads--;
> -		}
> -		return err < 0 ? err : 0;
> -	}
> +
> +	if (isdigit(buf[0]))
> +		return __write_ports_addfd(buf);
>  
>  	if (buf[0] == '-' && isdigit(buf[1]))
>  		return __write_ports_delfd(buf);
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
  2009-04-25 22:17       ` J. Bruce Fields
@ 2009-04-27 16:49         ` Chuck Lever
  2009-04-27 23:51           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-27 16:49 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Apr 25, 2009, at 6:17 PM, J. Bruce Fields wrote:
> On Thu, Apr 23, 2009 at 07:31:25PM -0400, Chuck Lever wrote:
>> The svc_addr_len() helper function returns -EAFNOSUPPORT if it  
>> doesn't
>> recognize the address family of the passed-in socket address.   
>> However,
>> the return type of this function is size_t, which means -EAFNOSUPPORT
>> is turned into a very large positive value in this case.
>>
>> The check in svc_udp_recvfrom() to see if the return value is less
>> than zero therefore won't work at all.
>>
>> Additionally, handle_connect_req() passes this value directly to
>> memset().  This could cause memset() to clobber a large chunk of  
>> memory
>> if svc_addr_len() has returned an error.  Currently the address  
>> family
>> of these addresses, however, is known to be supported long before
>> handle_connect_req() is called, so this isn't a real risk.
>>
>> Change the error return value of svc_addr_len() to zero, which fits  
>> in
>> the range of size_t, and is safer to pass to memset() directly.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> include/linux/sunrpc/svc_xprt.h |    5 +++--
>> net/sunrpc/svcsock.c            |    7 ++++---
>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/ 
>> svc_xprt.h
>> index 0d9cb6e..d790c52 100644
>> --- a/include/linux/sunrpc/svc_xprt.h
>> +++ b/include/linux/sunrpc/svc_xprt.h
>> @@ -118,7 +118,7 @@ static inline unsigned short  
>> svc_addr_port(const struct sockaddr *sa)
>> 	return 0;
>> }
>>
>> -static inline size_t svc_addr_len(struct sockaddr *sa)
>> +static inline size_t svc_addr_len(const struct sockaddr *sa)
>> {
>> 	switch (sa->sa_family) {
>> 	case AF_INET:
>> @@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct  
>> sockaddr *sa)
>> 	case AF_INET6:
>> 		return sizeof(struct sockaddr_in6);
>> 	}
>> -	return -EAFNOSUPPORT;
>> +
>
> May as well stick a WARN() here too if only as a shorthand way of
> documenting that this isn't meant to happen.

Sounds reasonable, but:

1.  I thought BUG() was the way to document "shouldn't happen," and

2.  What about all the other static inliney places this "shouldn't  
happen" ?

I guess we kind of gave up on BUG'ing in all these places.

> --b.
>
>> +	return 0;
>> }
>>
>> static inline unsigned short svc_xprt_local_port(const struct  
>> svc_xprt *xprt)
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index af31988..8b08328 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -426,13 +426,14 @@ static int svc_udp_recvfrom(struct svc_rqst  
>> *rqstp)
>> 		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
>> 	} buffer;
>> 	struct cmsghdr *cmh = &buffer.hdr;
>> -	int		err, len;
>> 	struct msghdr msg = {
>> 		.msg_name = svc_addr(rqstp),
>> 		.msg_control = cmh,
>> 		.msg_controllen = sizeof(buffer),
>> 		.msg_flags = MSG_DONTWAIT,
>> 	};
>> +	size_t len;
>> +	int err;
>>
>> 	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>> 	    /* udp sockets need large rcvbuf as all pending
>> @@ -464,8 +465,8 @@ static int svc_udp_recvfrom(struct svc_rqst  
>> *rqstp)
>> 		return -EAGAIN;
>> 	}
>> 	len = svc_addr_len(svc_addr(rqstp));
>> -	if (len < 0)
>> -		return len;
>> +	if (len == 0)
>> +		return -EAFNOSUPPORT;
>> 	rqstp->rq_addrlen = len;
>> 	if (skb->tstamp.tv64 == 0) {
>> 		skb->tstamp = ktime_get_real();
>>

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
  2009-04-27 16:49         ` Chuck Lever
@ 2009-04-27 23:51           ` J. Bruce Fields
  2009-04-28 15:28             ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-27 23:51 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Mon, Apr 27, 2009 at 12:49:07PM -0400, Chuck Lever wrote:
> On Apr 25, 2009, at 6:17 PM, J. Bruce Fields wrote:
>> On Thu, Apr 23, 2009 at 07:31:25PM -0400, Chuck Lever wrote:
>>> The svc_addr_len() helper function returns -EAFNOSUPPORT if it  
>>> doesn't
>>> recognize the address family of the passed-in socket address.   
>>> However,
>>> the return type of this function is size_t, which means -EAFNOSUPPORT
>>> is turned into a very large positive value in this case.
>>>
>>> The check in svc_udp_recvfrom() to see if the return value is less
>>> than zero therefore won't work at all.
>>>
>>> Additionally, handle_connect_req() passes this value directly to
>>> memset().  This could cause memset() to clobber a large chunk of  
>>> memory
>>> if svc_addr_len() has returned an error.  Currently the address  
>>> family
>>> of these addresses, however, is known to be supported long before
>>> handle_connect_req() is called, so this isn't a real risk.
>>>
>>> Change the error return value of svc_addr_len() to zero, which fits  
>>> in
>>> the range of size_t, and is safer to pass to memset() directly.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>> include/linux/sunrpc/svc_xprt.h |    5 +++--
>>> net/sunrpc/svcsock.c            |    7 ++++---
>>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/ 
>>> svc_xprt.h
>>> index 0d9cb6e..d790c52 100644
>>> --- a/include/linux/sunrpc/svc_xprt.h
>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>> @@ -118,7 +118,7 @@ static inline unsigned short svc_addr_port(const 
>>> struct sockaddr *sa)
>>> 	return 0;
>>> }
>>>
>>> -static inline size_t svc_addr_len(struct sockaddr *sa)
>>> +static inline size_t svc_addr_len(const struct sockaddr *sa)
>>> {
>>> 	switch (sa->sa_family) {
>>> 	case AF_INET:
>>> @@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct  
>>> sockaddr *sa)
>>> 	case AF_INET6:
>>> 		return sizeof(struct sockaddr_in6);
>>> 	}
>>> -	return -EAFNOSUPPORT;
>>> +
>>
>> May as well stick a WARN() here too if only as a shorthand way of
>> documenting that this isn't meant to happen.
>
> Sounds reasonable, but:
>
> 1.  I thought BUG() was the way to document "shouldn't happen," and

That'd be OK too.

>
> 2.  What about all the other static inliney places this "shouldn't  
> happen" ?

Such as?

--b.

>
> I guess we kind of gave up on BUG'ing in all these places.
>
>> --b.
>>
>>> +	return 0;
>>> }
>>>
>>> static inline unsigned short svc_xprt_local_port(const struct  
>>> svc_xprt *xprt)
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index af31988..8b08328 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -426,13 +426,14 @@ static int svc_udp_recvfrom(struct svc_rqst  
>>> *rqstp)
>>> 		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
>>> 	} buffer;
>>> 	struct cmsghdr *cmh = &buffer.hdr;
>>> -	int		err, len;
>>> 	struct msghdr msg = {
>>> 		.msg_name = svc_addr(rqstp),
>>> 		.msg_control = cmh,
>>> 		.msg_controllen = sizeof(buffer),
>>> 		.msg_flags = MSG_DONTWAIT,
>>> 	};
>>> +	size_t len;
>>> +	int err;
>>>
>>> 	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>>> 	    /* udp sockets need large rcvbuf as all pending
>>> @@ -464,8 +465,8 @@ static int svc_udp_recvfrom(struct svc_rqst  
>>> *rqstp)
>>> 		return -EAGAIN;
>>> 	}
>>> 	len = svc_addr_len(svc_addr(rqstp));
>>> -	if (len < 0)
>>> -		return len;
>>> +	if (len == 0)
>>> +		return -EAFNOSUPPORT;
>>> 	rqstp->rq_addrlen = len;
>>> 	if (skb->tstamp.tv64 == 0) {
>>> 		skb->tstamp = ktime_get_real();
>>>
>
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/19] NFSD: Prevent a buffer overflow in svc_xprt_names()
       [not found]     ` <20090423233225.17283.10176.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-27 23:56       ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-27 23:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:32:25PM -0400, Chuck Lever wrote:
> The svc_xprt_names() function can overflow its buffer if it's so near
> the end of the passed in buffer that the "name too long" string still
> doesn't fit.  Of course, it could never tell if it was near the end
> of the passed in buffer, since its only caller passes in zero as the
> buffer length.
> 
> Let's make this API a little safer.
> 
> Change svc_xprt_names() so it *always* checks for a buffer overflow,
> and change its only caller to pass in the correct buffer length.
> 
> If svc_xprt_names() does overflow its buffer, it now fails with an
> ENAMETOOLONG errno, instead of trying to write a message at the end
> of the buffer.  I don't like this much, but I can't figure out a clean
> way that's always safe to return some of the names, *and* an
> indication that the buffer was not long enough.
> 
> The displayed error when doing a 'cat /proc/fs/nfsd/portlist' is
> "File name too long".

Also worth noting that the buffer in question is nearly a page, which
makes this a bit academic for now.  (Not an objection to the patch, just
something to note.)

--b.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  fs/nfsd/nfsctl.c                |    2 +
>  include/linux/sunrpc/svc_xprt.h |    2 +
>  net/sunrpc/svc_xprt.c           |   56 +++++++++++++++++++++++++++------------
>  3 files changed, 41 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index e051847..6a1cd90 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -918,7 +918,7 @@ static ssize_t __write_ports_names(char *buf)
>  {
>  	if (nfsd_serv == NULL)
>  		return 0;
> -	return svc_xprt_names(nfsd_serv, buf, 0);
> +	return svc_xprt_names(nfsd_serv, buf, SIMPLE_TRANSACTION_LIMIT);
>  }
>  
>  /*
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index d790c52..2223ae0 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -83,7 +83,7 @@ int	svc_port_is_privileged(struct sockaddr *sin);
>  int	svc_print_xprts(char *buf, int maxlen);
>  struct	svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name,
>  			const sa_family_t af, const unsigned short port);
> -int	svc_xprt_names(struct svc_serv *serv, char *buf, int buflen);
> +int	svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
>  
>  static inline void svc_xprt_get(struct svc_xprt *xprt)
>  {
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index c200d92..96eb873 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -1097,36 +1097,58 @@ struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name,
>  }
>  EXPORT_SYMBOL_GPL(svc_find_xprt);
>  
> -/*
> - * Format a buffer with a list of the active transports. A zero for
> - * the buflen parameter disables target buffer overflow checking.
> +static int svc_one_xprt_name(const struct svc_xprt *xprt,
> +			     char *pos, int remaining)
> +{
> +	int len;
> +
> +	len = snprintf(pos, remaining, "%s %u\n",
> +			xprt->xpt_class->xcl_name,
> +			svc_xprt_local_port(xprt));
> +	if (len >= remaining)
> +		return -ENAMETOOLONG;
> +	return len;
> +}
> +
> +/**
> + * svc_xprt_names - format a buffer with a list of transport names
> + * @serv: pointer to an RPC service
> + * @buf: pointer to a buffer to be filled in
> + * @buflen: length of buffer to be filled in
> + *
> + * Fills in @buf with a string containing a list of transport names,
> + * each name terminated with '\n'.
> + *
> + * Returns positive length of the filled-in string on success; otherwise
> + * a negative errno value is returned if an error occurs.
>   */
> -int svc_xprt_names(struct svc_serv *serv, char *buf, int buflen)
> +int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen)
>  {
>  	struct svc_xprt *xprt;
> -	char xprt_str[64];
> -	int totlen = 0;
> -	int len;
> +	int len, totlen;
> +	char *pos;
>  
>  	/* Sanity check args */
>  	if (!serv)
>  		return 0;
>  
>  	spin_lock_bh(&serv->sv_lock);
> +
> +	pos = buf;
> +	totlen = 0;
>  	list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list) {
> -		len = snprintf(xprt_str, sizeof(xprt_str),
> -			       "%s %d\n", xprt->xpt_class->xcl_name,
> -			       svc_xprt_local_port(xprt));
> -		/* If the string was truncated, replace with error string */
> -		if (len >= sizeof(xprt_str))
> -			strcpy(xprt_str, "name-too-long\n");
> -		/* Don't overflow buffer */
> -		len = strlen(xprt_str);
> -		if (buflen && (len + totlen >= buflen))
> +		len = svc_one_xprt_name(xprt, pos, buflen - totlen);
> +		if (len < 0) {
> +			*buf = '\0';
> +			totlen = len;
> +		}
> +		if (len <= 0)
>  			break;
> -		strcpy(buf+totlen, xprt_str);
> +
> +		pos += len;
>  		totlen += len;
>  	}
> +
>  	spin_unlock_bh(&serv->sv_lock);
>  	return totlen;
>  }
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
  2009-04-27 23:51           ` J. Bruce Fields
@ 2009-04-28 15:28             ` Chuck Lever
  2009-04-28 15:31               ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 15:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs


On Apr 27, 2009, at 7:51 PM, J. Bruce Fields wrote:

> On Mon, Apr 27, 2009 at 12:49:07PM -0400, Chuck Lever wrote:
>> On Apr 25, 2009, at 6:17 PM, J. Bruce Fields wrote:
>>> On Thu, Apr 23, 2009 at 07:31:25PM -0400, Chuck Lever wrote:
>>>> The svc_addr_len() helper function returns -EAFNOSUPPORT if it
>>>> doesn't
>>>> recognize the address family of the passed-in socket address.
>>>> However,
>>>> the return type of this function is size_t, which means - 
>>>> EAFNOSUPPORT
>>>> is turned into a very large positive value in this case.
>>>>
>>>> The check in svc_udp_recvfrom() to see if the return value is less
>>>> than zero therefore won't work at all.
>>>>
>>>> Additionally, handle_connect_req() passes this value directly to
>>>> memset().  This could cause memset() to clobber a large chunk of
>>>> memory
>>>> if svc_addr_len() has returned an error.  Currently the address
>>>> family
>>>> of these addresses, however, is known to be supported long before
>>>> handle_connect_req() is called, so this isn't a real risk.
>>>>
>>>> Change the error return value of svc_addr_len() to zero, which fits
>>>> in
>>>> the range of size_t, and is safer to pass to memset() directly.
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>
>>>> include/linux/sunrpc/svc_xprt.h |    5 +++--
>>>> net/sunrpc/svcsock.c            |    7 ++++---
>>>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/ 
>>>> sunrpc/
>>>> svc_xprt.h
>>>> index 0d9cb6e..d790c52 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -118,7 +118,7 @@ static inline unsigned short  
>>>> svc_addr_port(const
>>>> struct sockaddr *sa)
>>>> 	return 0;
>>>> }
>>>>
>>>> -static inline size_t svc_addr_len(struct sockaddr *sa)
>>>> +static inline size_t svc_addr_len(const struct sockaddr *sa)
>>>> {
>>>> 	switch (sa->sa_family) {
>>>> 	case AF_INET:
>>>> @@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct
>>>> sockaddr *sa)
>>>> 	case AF_INET6:
>>>> 		return sizeof(struct sockaddr_in6);
>>>> 	}
>>>> -	return -EAFNOSUPPORT;
>>>> +
>>>
>>> May as well stick a WARN() here too if only as a shorthand way of
>>> documenting that this isn't meant to happen.
>>
>> Sounds reasonable, but:
>>
>> 1.  I thought BUG() was the way to document "shouldn't happen," and
>
> That'd be OK too.
>
>>
>> 2.  What about all the other static inliney places this "shouldn't
>> happen" ?
>
> Such as?

nlm_cmp_addr(), nlm_privileged_requester(), svc_addr_port(),  
nfs_sockaddr_match_ipaddr(), nfs_sockaddr_cmp(), nfs_server_get_key(),  
nfs_set_port(), nfs_verify_server_address(), and so on.

> --b.
>
>>
>> I guess we kind of gave up on BUG'ing in all these places.
>>
>>> --b.
>>>
>>>> +	return 0;
>>>> }
>>>>
>>>> static inline unsigned short svc_xprt_local_port(const struct
>>>> svc_xprt *xprt)
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index af31988..8b08328 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -426,13 +426,14 @@ static int svc_udp_recvfrom(struct svc_rqst
>>>> *rqstp)
>>>> 		long		all[SVC_PKTINFO_SPACE / sizeof(long)];
>>>> 	} buffer;
>>>> 	struct cmsghdr *cmh = &buffer.hdr;
>>>> -	int		err, len;
>>>> 	struct msghdr msg = {
>>>> 		.msg_name = svc_addr(rqstp),
>>>> 		.msg_control = cmh,
>>>> 		.msg_controllen = sizeof(buffer),
>>>> 		.msg_flags = MSG_DONTWAIT,
>>>> 	};
>>>> +	size_t len;
>>>> +	int err;
>>>>
>>>> 	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>>>> 	    /* udp sockets need large rcvbuf as all pending
>>>> @@ -464,8 +465,8 @@ static int svc_udp_recvfrom(struct svc_rqst
>>>> *rqstp)
>>>> 		return -EAGAIN;
>>>> 	}
>>>> 	len = svc_addr_len(svc_addr(rqstp));
>>>> -	if (len < 0)
>>>> -		return len;
>>>> +	if (len == 0)
>>>> +		return -EAFNOSUPPORT;
>>>> 	rqstp->rq_addrlen = len;
>>>> 	if (skb->tstamp.tv64 == 0) {
>>>> 		skb->tstamp = ktime_get_real();
>>>>
>>
>> -- 
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len()
  2009-04-28 15:28             ` Chuck Lever
@ 2009-04-28 15:31               ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 15:31 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Tue, Apr 28, 2009 at 11:28:47AM -0400, Chuck Lever wrote:
>
> On Apr 27, 2009, at 7:51 PM, J. Bruce Fields wrote:
>
>> On Mon, Apr 27, 2009 at 12:49:07PM -0400, Chuck Lever wrote:
>>> On Apr 25, 2009, at 6:17 PM, J. Bruce Fields wrote:
>>>> On Thu, Apr 23, 2009 at 07:31:25PM -0400, Chuck Lever wrote:
>>>>> The svc_addr_len() helper function returns -EAFNOSUPPORT if it
>>>>> doesn't
>>>>> recognize the address family of the passed-in socket address.
>>>>> However,
>>>>> the return type of this function is size_t, which means - 
>>>>> EAFNOSUPPORT
>>>>> is turned into a very large positive value in this case.
>>>>>
>>>>> The check in svc_udp_recvfrom() to see if the return value is less
>>>>> than zero therefore won't work at all.
>>>>>
>>>>> Additionally, handle_connect_req() passes this value directly to
>>>>> memset().  This could cause memset() to clobber a large chunk of
>>>>> memory
>>>>> if svc_addr_len() has returned an error.  Currently the address
>>>>> family
>>>>> of these addresses, however, is known to be supported long before
>>>>> handle_connect_req() is called, so this isn't a real risk.
>>>>>
>>>>> Change the error return value of svc_addr_len() to zero, which fits
>>>>> in
>>>>> the range of size_t, and is safer to pass to memset() directly.
>>>>>
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>>>
>>>>> include/linux/sunrpc/svc_xprt.h |    5 +++--
>>>>> net/sunrpc/svcsock.c            |    7 ++++---
>>>>> 2 files changed, 7 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/ 
>>>>> sunrpc/
>>>>> svc_xprt.h
>>>>> index 0d9cb6e..d790c52 100644
>>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>>> @@ -118,7 +118,7 @@ static inline unsigned short  
>>>>> svc_addr_port(const
>>>>> struct sockaddr *sa)
>>>>> 	return 0;
>>>>> }
>>>>>
>>>>> -static inline size_t svc_addr_len(struct sockaddr *sa)
>>>>> +static inline size_t svc_addr_len(const struct sockaddr *sa)
>>>>> {
>>>>> 	switch (sa->sa_family) {
>>>>> 	case AF_INET:
>>>>> @@ -126,7 +126,8 @@ static inline size_t svc_addr_len(struct
>>>>> sockaddr *sa)
>>>>> 	case AF_INET6:
>>>>> 		return sizeof(struct sockaddr_in6);
>>>>> 	}
>>>>> -	return -EAFNOSUPPORT;
>>>>> +
>>>>
>>>> May as well stick a WARN() here too if only as a shorthand way of
>>>> documenting that this isn't meant to happen.
>>>
>>> Sounds reasonable, but:
>>>
>>> 1.  I thought BUG() was the way to document "shouldn't happen," and
>>
>> That'd be OK too.
>>
>>>
>>> 2.  What about all the other static inliney places this "shouldn't
>>> happen" ?
>>
>> Such as?
>
> nlm_cmp_addr(), nlm_privileged_requester(), svc_addr_port(),  
> nfs_sockaddr_match_ipaddr(), nfs_sockaddr_cmp(), nfs_server_get_key(),  
> nfs_set_port(), nfs_verify_server_address(), and so on.

I see.  OK, I'm fine with leaving it as is.--b.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
       [not found]     ` <20090423233332.17283.23011.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-28 16:25       ` J. Bruce Fields
  2009-04-28 16:34         ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 16:25 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
> When rpc.statd starts up in user space at boot time, it attempts to
> write the latest NSM local state number into
> /proc/sys/fs/nfs/nsm_local_state.
> 
> If lockd.ko isn't loaded yet (as is the case in most configurations),
> that file doesn't exist, thus the kernel's NSM state remains set to
> its initial value of zero during lockd operation.
> 
> This is a problem because rpc.statd and lockd use the NSM state number
> to prevent repeated lock recovery on rebooted hosts.  If lockd sends
> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
> number is received, there is no way for lockd or rpc.statd to
> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
> recovery could be performed after the rebooted host has already
> started reclaiming locks, and those locks will be lost.
> 
> We could change /etc/init.d/nfslock so it always modprobes lockd.ko
> before starting rpc.statd.  However, if lockd.ko is ever unloaded
> and reloaded, we are back at square one, since the NSM state is not
> preserved across an unload/reload cycle.  This may happen frequently
> on clients that use automounter.  A period of NFS inactivity causes
> lockd.ko to be unloaded, and the kernel loses its NSM state setting.

Aie.  Can we also fix the automounter or some other part of the
userspace configuration?

> Instead, let's use the fact that rpc.statd plants the local system's
> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
> synchronous SM_MON upcall to the local rpc.statd _before_ sending its
> first NLM request to a new remote.  This would permit rpc.statd to
> provide the current NSM state to lockd, even after lockd.ko had been
> unloaded and reloaded.
> 
> Note that NLMPROC_LOCK arguments are constructed before the
> nsm_monitor() call, so we have to rearrange argument construction very
> slightly to make this all work out.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  fs/lockd/clntproc.c |    2 +-
>  fs/lockd/mon.c      |    6 +++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index dd79570..f55b900 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
>  	struct nlm_lock	*lock = &argp->lock;
>  
>  	nlmclnt_next_cookie(&argp->cookie);
> -	argp->state   = nsm_local_state;
>  	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), sizeof(struct nfs_fh));
>  	lock->caller  = utsname()->nodename;
>  	lock->oh.data = req->a_owner;
> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
>  
>  	if (nsm_monitor(host) < 0)
>  		goto out;
> +	req->a_args.state = nsm_local_state;

Hm.  It looks like a_args.state is never used, except in ifdef'd-out
code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
Something's wrong there.  (Not your fault; but needs looking into.)

>  
>  	fl->fl_flags |= FL_ACCESS;
>  	status = do_vfs_lock(fl);
> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
> index 6d5d4a4..5017d50 100644
> --- a/fs/lockd/mon.c
> +++ b/fs/lockd/mon.c
> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>  		status = -EIO;
>  	if (status < 0)
>  		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
> -	else
> +	else {
>  		nsm->sm_monitored = 1;
> +		nsm_local_state = res.state;
> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
> +				nsm_local_state);

Could we make that a dprintk in the case where this changes nsm_local
state from something other than zero (nsm_lock_state && nsm_local_state
!= res.state)?

(Just to make sure no statd is returning inconsistent nsm_local_stats
here.)

> +	}
>  	return status;
>  }

--b.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
       [not found]     ` <20090423233340.17283.29580.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-28 16:31       ` J. Bruce Fields
  2009-04-28 16:35         ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 16:31 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Mans Rullgard

On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
> Recently, commit ad5b365c fixed a data type alignment issue in
> nsm_init_private() by introducing put_unaligned().  We don't actually
> _require_ an unaligned access to nsm_private here.
> 
> Instead, we should always use memcpy/memcmp to access the contents of
> RPC opaque data types.  This permits us to continue to define these as
> simple character arrays, as most legacy RPC code does, and to rely on
> gcc to pack the fields in in-core structures optimally without breaking
> on platforms that require strict alignment.

OK, I'm confused.  What structures will get packed differently?

--b.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  fs/lockd/mon.c |   12 +++++-------
>  1 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
> index 5017d50..763509a 100644
> --- a/fs/lockd/mon.c
> +++ b/fs/lockd/mon.c
> @@ -16,8 +16,6 @@
>  #include <linux/sunrpc/svc.h>
>  #include <linux/lockd/lockd.h>
>  
> -#include <asm/unaligned.h>
> -
>  #define NLMDBG_FACILITY		NLMDBG_MONITOR
>  #define NSM_PROGRAM		100024
>  #define NSM_VERSION		1
> @@ -278,14 +276,14 @@ static struct nsm_handle *nsm_lookup_priv(const struct nsm_private *priv)
>   */
>  static void nsm_init_private(struct nsm_handle *nsm)
>  {
> -	u64 *p = (u64 *)&nsm->sm_priv.data;
>  	struct timespec ts;
> -	s64 ns;
> +	u64 buf[2];
>  
>  	ktime_get_ts(&ts);
> -	ns = timespec_to_ns(&ts);
> -	put_unaligned(ns, p);
> -	put_unaligned((unsigned long)nsm, p + 1);
> +	buf[0] = timespec_to_ns(&ts);
> +	buf[1] = (unsigned long)nsm;
> +
> +	memcpy(nsm->sm_priv.data, buf, SM_PRIV_SIZE);
>  }
>  
>  static struct nsm_handle *nsm_create_handle(const struct sockaddr *sap,
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
       [not found]     ` <20090423233325.17283.71127.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
@ 2009-04-28 16:31       ` J. Bruce Fields
  2009-04-28 16:36         ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 16:31 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Thu, Apr 23, 2009 at 07:33:25PM -0400, Chuck Lever wrote:
> Clean up: For consistency, handle output buffer size checking in a
> other nfsctl functions the same way it's done for write_versions().

These all look fine to me, thanks.  Let me know when you want them
applied.

--b.

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  fs/nfsd/nfsctl.c |   14 +++++++++-----
>  1 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index a152694..877e713 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -695,8 +695,9 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size)
>  		if (rv)
>  			return rv;
>  	}
> -	sprintf(buf, "%d\n", nfsd_nrthreads());
> -	return strlen(buf);
> +
> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
> +							nfsd_nrthreads());
>  }
>  
>  /**
> @@ -1197,7 +1198,9 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
>  		nfsd_max_blksize = bsize;
>  		mutex_unlock(&nfsd_mutex);
>  	}
> -	return sprintf(buf, "%d\n", nfsd_max_blksize);
> +
> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
> +							nfsd_max_blksize);
>  }
>  
>  #ifdef CONFIG_NFSD_V4
> @@ -1221,8 +1224,9 @@ static ssize_t __write_leasetime(struct file *file, char *buf, size_t size)
>  			return -EINVAL;
>  		nfs4_reset_lease(lease);
>  	}
> -	sprintf(buf, "%ld\n", nfs4_lease_time());
> -	return strlen(buf);
> +
> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%ld\n",
> +							nfs4_lease_time());
>  }
>  
>  /**
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
  2009-04-28 16:25       ` J. Bruce Fields
@ 2009-04-28 16:34         ` Chuck Lever
  2009-04-28 16:38           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 16:34 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote:
> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
>> When rpc.statd starts up in user space at boot time, it attempts to
>> write the latest NSM local state number into
>> /proc/sys/fs/nfs/nsm_local_state.
>>
>> If lockd.ko isn't loaded yet (as is the case in most configurations),
>> that file doesn't exist, thus the kernel's NSM state remains set to
>> its initial value of zero during lockd operation.
>>
>> This is a problem because rpc.statd and lockd use the NSM state  
>> number
>> to prevent repeated lock recovery on rebooted hosts.  If lockd sends
>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
>> number is received, there is no way for lockd or rpc.statd to
>> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
>> recovery could be performed after the rebooted host has already
>> started reclaiming locks, and those locks will be lost.
>>
>> We could change /etc/init.d/nfslock so it always modprobes lockd.ko
>> before starting rpc.statd.  However, if lockd.ko is ever unloaded
>> and reloaded, we are back at square one, since the NSM state is not
>> preserved across an unload/reload cycle.  This may happen frequently
>> on clients that use automounter.  A period of NFS inactivity causes
>> lockd.ko to be unloaded, and the kernel loses its NSM state setting.
>
> Aie.  Can we also fix the automounter or some other part of the
> userspace configuration?

User space isn't the problem here... it's the fact that lockd can get  
unloaded after a period of inactivity.  IMO lockd should be pinned in  
the kernel after it is loaded with /etc/init.d/nfslock.

>> Instead, let's use the fact that rpc.statd plants the local system's
>> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
>> synchronous SM_MON upcall to the local rpc.statd _before_ sending its
>> first NLM request to a new remote.  This would permit rpc.statd to
>> provide the current NSM state to lockd, even after lockd.ko had been
>> unloaded and reloaded.
>>
>> Note that NLMPROC_LOCK arguments are constructed before the
>> nsm_monitor() call, so we have to rearrange argument construction  
>> very
>> slightly to make this all work out.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> fs/lockd/clntproc.c |    2 +-
>> fs/lockd/mon.c      |    6 +++++-
>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>> index dd79570..f55b900 100644
>> --- a/fs/lockd/clntproc.c
>> +++ b/fs/lockd/clntproc.c
>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst  
>> *req, struct file_lock *fl)
>> 	struct nlm_lock	*lock = &argp->lock;
>>
>> 	nlmclnt_next_cookie(&argp->cookie);
>> -	argp->state   = nsm_local_state;
>> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),  
>> sizeof(struct nfs_fh));
>> 	lock->caller  = utsname()->nodename;
>> 	lock->oh.data = req->a_owner;
>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct  
>> file_lock *fl)
>>
>> 	if (nsm_monitor(host) < 0)
>> 		goto out;
>> +	req->a_args.state = nsm_local_state;
>
> Hm.  It looks like a_args.state is never used, except in ifdef'd-out
> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
> Something's wrong there.  (Not your fault; but needs looking into.)

This isn't a big deal on the server side (I guess I should give this  
patch to Trond instead of you, in that case).

The client passes its NSM state number to the server in NLMPROC_LOCK  
calls.  There is no mechanism for the server to pass its NSM state  
number to the client via the NLM protocol.  So the first the client is  
aware of the server's NSM state number is after the server reboots  
(via SM_NOTIFY).  If the server never reboots, the client will never  
know the server's NSM state number.

>> 	fl->fl_flags |= FL_ACCESS;
>> 	status = do_vfs_lock(fl);
>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>> index 6d5d4a4..5017d50 100644
>> --- a/fs/lockd/mon.c
>> +++ b/fs/lockd/mon.c
>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>> 		status = -EIO;
>> 	if (status < 0)
>> 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
>> -	else
>> +	else {
>> 		nsm->sm_monitored = 1;
>> +		nsm_local_state = res.state;
>> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
>> +				nsm_local_state);
>
> Could we make that a dprintk in the case where this changes nsm_local
> state from something other than zero (nsm_lock_state &&  
> nsm_local_state
> != res.state)?
>
> (Just to make sure no statd is returning inconsistent nsm_local_stats
> here.)

I'm not sure that's a big deal, but...

Note that the XNFS version 3 spec suggests the local lockd should  
request the NSM state number when it starts up by posting an  
SM_UNMON_ALL to the local statd.  That might be safer than loading it  
after every SM_MON.

>> +	}
>> 	return status;
>> }
>
> --b.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 16:31       ` J. Bruce Fields
@ 2009-04-28 16:35         ` Chuck Lever
  2009-04-28 16:40           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 16:35 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Mans Rullgard

On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>> Recently, commit ad5b365c fixed a data type alignment issue in
>> nsm_init_private() by introducing put_unaligned().  We don't actually
>> _require_ an unaligned access to nsm_private here.
>>
>> Instead, we should always use memcpy/memcmp to access the contents of
>> RPC opaque data types.  This permits us to continue to define these  
>> as
>> simple character arrays, as most legacy RPC code does, and to rely on
>> gcc to pack the fields in in-core structures optimally without  
>> breaking
>> on platforms that require strict alignment.
>
> OK, I'm confused.  What structures will get packed differently?

Any struct that has an nsm_private embedded in it, such as struct  
nlm_reboot.

> --b.
>
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> fs/lockd/mon.c |   12 +++++-------
>> 1 files changed, 5 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>> index 5017d50..763509a 100644
>> --- a/fs/lockd/mon.c
>> +++ b/fs/lockd/mon.c
>> @@ -16,8 +16,6 @@
>> #include <linux/sunrpc/svc.h>
>> #include <linux/lockd/lockd.h>
>>
>> -#include <asm/unaligned.h>
>> -
>> #define NLMDBG_FACILITY		NLMDBG_MONITOR
>> #define NSM_PROGRAM		100024
>> #define NSM_VERSION		1
>> @@ -278,14 +276,14 @@ static struct nsm_handle  
>> *nsm_lookup_priv(const struct nsm_private *priv)
>>  */
>> static void nsm_init_private(struct nsm_handle *nsm)
>> {
>> -	u64 *p = (u64 *)&nsm->sm_priv.data;
>> 	struct timespec ts;
>> -	s64 ns;
>> +	u64 buf[2];
>>
>> 	ktime_get_ts(&ts);
>> -	ns = timespec_to_ns(&ts);
>> -	put_unaligned(ns, p);
>> -	put_unaligned((unsigned long)nsm, p + 1);
>> +	buf[0] = timespec_to_ns(&ts);
>> +	buf[1] = (unsigned long)nsm;
>> +
>> +	memcpy(nsm->sm_priv.data, buf, SM_PRIV_SIZE);
>> }
>>
>> static struct nsm_handle *nsm_create_handle(const struct sockaddr  
>> *sap,
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
  2009-04-28 16:31       ` J. Bruce Fields
@ 2009-04-28 16:36         ` Chuck Lever
  2009-04-28 21:30           ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 16:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
> On Thu, Apr 23, 2009 at 07:33:25PM -0400, Chuck Lever wrote:
>> Clean up: For consistency, handle output buffer size checking in a
>> other nfsctl functions the same way it's done for write_versions().
>
> These all look fine to me, thanks.  Let me know when you want them
> applied.

Just to confirm that 1/19 looks OK to you too... I think you can apply  
1 - 17 now.

> --b.
>
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>
>> fs/nfsd/nfsctl.c |   14 +++++++++-----
>> 1 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>> index a152694..877e713 100644
>> --- a/fs/nfsd/nfsctl.c
>> +++ b/fs/nfsd/nfsctl.c
>> @@ -695,8 +695,9 @@ static ssize_t write_threads(struct file *file,  
>> char *buf, size_t size)
>> 		if (rv)
>> 			return rv;
>> 	}
>> -	sprintf(buf, "%d\n", nfsd_nrthreads());
>> -	return strlen(buf);
>> +
>> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
>> +							nfsd_nrthreads());
>> }
>>
>> /**
>> @@ -1197,7 +1198,9 @@ static ssize_t write_maxblksize(struct file  
>> *file, char *buf, size_t size)
>> 		nfsd_max_blksize = bsize;
>> 		mutex_unlock(&nfsd_mutex);
>> 	}
>> -	return sprintf(buf, "%d\n", nfsd_max_blksize);
>> +
>> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%d\n",
>> +							nfsd_max_blksize);
>> }
>>
>> #ifdef CONFIG_NFSD_V4
>> @@ -1221,8 +1224,9 @@ static ssize_t __write_leasetime(struct file  
>> *file, char *buf, size_t size)
>> 			return -EINVAL;
>> 		nfs4_reset_lease(lease);
>> 	}
>> -	sprintf(buf, "%ld\n", nfs4_lease_time());
>> -	return strlen(buf);
>> +
>> +	return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%ld\n",
>> +							nfs4_lease_time());
>> }
>>
>> /**
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
  2009-04-28 16:34         ` Chuck Lever
@ 2009-04-28 16:38           ` J. Bruce Fields
  2009-04-28 19:11             ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 16:38 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Tue, Apr 28, 2009 at 12:34:19PM -0400, Chuck Lever wrote:
> On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote:
>> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
>>> When rpc.statd starts up in user space at boot time, it attempts to
>>> write the latest NSM local state number into
>>> /proc/sys/fs/nfs/nsm_local_state.
>>>
>>> If lockd.ko isn't loaded yet (as is the case in most configurations),
>>> that file doesn't exist, thus the kernel's NSM state remains set to
>>> its initial value of zero during lockd operation.
>>>
>>> This is a problem because rpc.statd and lockd use the NSM state  
>>> number
>>> to prevent repeated lock recovery on rebooted hosts.  If lockd sends
>>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM state
>>> number is received, there is no way for lockd or rpc.statd to
>>> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
>>> recovery could be performed after the rebooted host has already
>>> started reclaiming locks, and those locks will be lost.
>>>
>>> We could change /etc/init.d/nfslock so it always modprobes lockd.ko
>>> before starting rpc.statd.  However, if lockd.ko is ever unloaded
>>> and reloaded, we are back at square one, since the NSM state is not
>>> preserved across an unload/reload cycle.  This may happen frequently
>>> on clients that use automounter.  A period of NFS inactivity causes
>>> lockd.ko to be unloaded, and the kernel loses its NSM state setting.
>>
>> Aie.  Can we also fix the automounter or some other part of the
>> userspace configuration?
>
> User space isn't the problem here... it's the fact that lockd can get  
> unloaded after a period of inactivity.  IMO lockd should be pinned in  
> the kernel after it is loaded with /etc/init.d/nfslock.
>
>>> Instead, let's use the fact that rpc.statd plants the local system's
>>> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
>>> synchronous SM_MON upcall to the local rpc.statd _before_ sending its
>>> first NLM request to a new remote.  This would permit rpc.statd to
>>> provide the current NSM state to lockd, even after lockd.ko had been
>>> unloaded and reloaded.
>>>
>>> Note that NLMPROC_LOCK arguments are constructed before the
>>> nsm_monitor() call, so we have to rearrange argument construction  
>>> very
>>> slightly to make this all work out.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>> fs/lockd/clntproc.c |    2 +-
>>> fs/lockd/mon.c      |    6 +++++-
>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>>> index dd79570..f55b900 100644
>>> --- a/fs/lockd/clntproc.c
>>> +++ b/fs/lockd/clntproc.c
>>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst  
>>> *req, struct file_lock *fl)
>>> 	struct nlm_lock	*lock = &argp->lock;
>>>
>>> 	nlmclnt_next_cookie(&argp->cookie);
>>> -	argp->state   = nsm_local_state;
>>> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),  
>>> sizeof(struct nfs_fh));
>>> 	lock->caller  = utsname()->nodename;
>>> 	lock->oh.data = req->a_owner;
>>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct  
>>> file_lock *fl)
>>>
>>> 	if (nsm_monitor(host) < 0)
>>> 		goto out;
>>> +	req->a_args.state = nsm_local_state;
>>
>> Hm.  It looks like a_args.state is never used, except in ifdef'd-out
>> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
>> Something's wrong there.  (Not your fault; but needs looking into.)
>
> This isn't a big deal on the server side (I guess I should give this  
> patch to Trond instead of you, in that case).
>
> The client passes its NSM state number to the server in NLMPROC_LOCK  
> calls.  There is no mechanism for the server to pass its NSM state  
> number to the client via the NLM protocol.  So the first the client is  
> aware of the server's NSM state number is after the server reboots (via 
> SM_NOTIFY).  If the server never reboots, the client will never know the 
> server's NSM state number.

So the #if 0'd code should just be deleted?

--b.

>
>>> 	fl->fl_flags |= FL_ACCESS;
>>> 	status = do_vfs_lock(fl);
>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>> index 6d5d4a4..5017d50 100644
>>> --- a/fs/lockd/mon.c
>>> +++ b/fs/lockd/mon.c
>>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>>> 		status = -EIO;
>>> 	if (status < 0)
>>> 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
>>> -	else
>>> +	else {
>>> 		nsm->sm_monitored = 1;
>>> +		nsm_local_state = res.state;
>>> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
>>> +				nsm_local_state);
>>
>> Could we make that a dprintk in the case where this changes nsm_local
>> state from something other than zero (nsm_lock_state &&  
>> nsm_local_state
>> != res.state)?
>>
>> (Just to make sure no statd is returning inconsistent nsm_local_stats
>> here.)
>
> I'm not sure that's a big deal, but...
>
> Note that the XNFS version 3 spec suggests the local lockd should  
> request the NSM state number when it starts up by posting an  
> SM_UNMON_ALL to the local statd.  That might be safer than loading it  
> after every SM_MON.
>
>>> +	}
>>> 	return status;
>>> }
>>
>> --b.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 16:35         ` Chuck Lever
@ 2009-04-28 16:40           ` J. Bruce Fields
  2009-04-28 17:24             ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 16:40 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Mans Rullgard

On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>> nsm_init_private() by introducing put_unaligned().  We don't actually
>>> _require_ an unaligned access to nsm_private here.
>>>
>>> Instead, we should always use memcpy/memcmp to access the contents of
>>> RPC opaque data types.  This permits us to continue to define these  
>>> as
>>> simple character arrays, as most legacy RPC code does, and to rely on
>>> gcc to pack the fields in in-core structures optimally without  
>>> breaking
>>> on platforms that require strict alignment.
>>
>> OK, I'm confused.  What structures will get packed differently?
>
> Any struct that has an nsm_private embedded in it, such as struct  
> nlm_reboot.

I don't see how that or any structure is changed by this patch.

--b.

>
>> --b.
>>
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>> fs/lockd/mon.c |   12 +++++-------
>>> 1 files changed, 5 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>> index 5017d50..763509a 100644
>>> --- a/fs/lockd/mon.c
>>> +++ b/fs/lockd/mon.c
>>> @@ -16,8 +16,6 @@
>>> #include <linux/sunrpc/svc.h>
>>> #include <linux/lockd/lockd.h>
>>>
>>> -#include <asm/unaligned.h>
>>> -
>>> #define NLMDBG_FACILITY		NLMDBG_MONITOR
>>> #define NSM_PROGRAM		100024
>>> #define NSM_VERSION		1
>>> @@ -278,14 +276,14 @@ static struct nsm_handle  
>>> *nsm_lookup_priv(const struct nsm_private *priv)
>>>  */
>>> static void nsm_init_private(struct nsm_handle *nsm)
>>> {
>>> -	u64 *p = (u64 *)&nsm->sm_priv.data;
>>> 	struct timespec ts;
>>> -	s64 ns;
>>> +	u64 buf[2];
>>>
>>> 	ktime_get_ts(&ts);
>>> -	ns = timespec_to_ns(&ts);
>>> -	put_unaligned(ns, p);
>>> -	put_unaligned((unsigned long)nsm, p + 1);
>>> +	buf[0] = timespec_to_ns(&ts);
>>> +	buf[1] = (unsigned long)nsm;
>>> +
>>> +	memcpy(nsm->sm_priv.data, buf, SM_PRIV_SIZE);
>>> }
>>>
>>> static struct nsm_handle *nsm_create_handle(const struct sockaddr  
>>> *sap,
>>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 16:40           ` J. Bruce Fields
@ 2009-04-28 17:24             ` Chuck Lever
  2009-04-28 21:36               ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 17:24 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Mans Rullgard

On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>> nsm_init_private() by introducing put_unaligned().  We don't  
>>>> actually
>>>> _require_ an unaligned access to nsm_private here.
>>>>
>>>> Instead, we should always use memcpy/memcmp to access the  
>>>> contents of
>>>> RPC opaque data types.  This permits us to continue to define these
>>>> as
>>>> simple character arrays, as most legacy RPC code does, and to  
>>>> rely on
>>>> gcc to pack the fields in in-core structures optimally without
>>>> breaking
>>>> on platforms that require strict alignment.
>>>
>>> OK, I'm confused.  What structures will get packed differently?
>>
>> Any struct that has an nsm_private embedded in it, such as struct
>> nlm_reboot.
>
> I don't see how that or any structure is changed by this patch.

It's not.  Note the phrase above in the description: "permits us to  
_continue_ to define these" -- meaning, I'm not changing the structures.

It has been suggested that we use a union or __aligned attribute for  
RPC opaques.  The problem with that is that it would cause structs  
like nlm_reboot to balloon in size; char x[] is aligned on bytes, but  
a union of u8, u32, and u64 would be aligned on a double-word  
boundary.  That would make such structures larger.

Legacy RPC code, and any code generated by rpcgen, generally defines  
an opaque as a character array.  So again, using a union would be  
inconsistent with other uses of RPC opaques.

The point is we want to define and access RPC opaque's consistently,  
using memset() and memcpy(), since these are nothing more than byte  
arrays.  The code in nsm_init_private() was an exception to this, for  
no reason.  We don't really need special alignment macros in there, as  
long as we access the RPC opaque field with the byte sets and copies.

Would it help if I described this patch as a "clean up" ?

> --b.
>
>>
>>> --b.
>>>
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>
>>>> fs/lockd/mon.c |   12 +++++-------
>>>> 1 files changed, 5 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>>> index 5017d50..763509a 100644
>>>> --- a/fs/lockd/mon.c
>>>> +++ b/fs/lockd/mon.c
>>>> @@ -16,8 +16,6 @@
>>>> #include <linux/sunrpc/svc.h>
>>>> #include <linux/lockd/lockd.h>
>>>>
>>>> -#include <asm/unaligned.h>
>>>> -
>>>> #define NLMDBG_FACILITY		NLMDBG_MONITOR
>>>> #define NSM_PROGRAM		100024
>>>> #define NSM_VERSION		1
>>>> @@ -278,14 +276,14 @@ static struct nsm_handle
>>>> *nsm_lookup_priv(const struct nsm_private *priv)
>>>> */
>>>> static void nsm_init_private(struct nsm_handle *nsm)
>>>> {
>>>> -	u64 *p = (u64 *)&nsm->sm_priv.data;
>>>> 	struct timespec ts;
>>>> -	s64 ns;
>>>> +	u64 buf[2];
>>>>
>>>> 	ktime_get_ts(&ts);
>>>> -	ns = timespec_to_ns(&ts);
>>>> -	put_unaligned(ns, p);
>>>> -	put_unaligned((unsigned long)nsm, p + 1);
>>>> +	buf[0] = timespec_to_ns(&ts);
>>>> +	buf[1] = (unsigned long)nsm;
>>>> +
>>>> +	memcpy(nsm->sm_priv.data, buf, SM_PRIV_SIZE);
>>>> }
>>>>
>>>> static struct nsm_handle *nsm_create_handle(const struct sockaddr
>>>> *sap,
>>>>
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
  2009-04-28 16:38           ` J. Bruce Fields
@ 2009-04-28 19:11             ` Chuck Lever
  2009-05-08 15:19               ` Chuck Lever
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 19:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs

On Apr 28, 2009, at 12:38 PM, J. Bruce Fields wrote:
> On Tue, Apr 28, 2009 at 12:34:19PM -0400, Chuck Lever wrote:
>> On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote:
>>> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
>>>> When rpc.statd starts up in user space at boot time, it attempts to
>>>> write the latest NSM local state number into
>>>> /proc/sys/fs/nfs/nsm_local_state.
>>>>
>>>> If lockd.ko isn't loaded yet (as is the case in most  
>>>> configurations),
>>>> that file doesn't exist, thus the kernel's NSM state remains set to
>>>> its initial value of zero during lockd operation.
>>>>
>>>> This is a problem because rpc.statd and lockd use the NSM state
>>>> number
>>>> to prevent repeated lock recovery on rebooted hosts.  If lockd  
>>>> sends
>>>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM  
>>>> state
>>>> number is received, there is no way for lockd or rpc.statd to
>>>> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
>>>> recovery could be performed after the rebooted host has already
>>>> started reclaiming locks, and those locks will be lost.
>>>>
>>>> We could change /etc/init.d/nfslock so it always modprobes lockd.ko
>>>> before starting rpc.statd.  However, if lockd.ko is ever unloaded
>>>> and reloaded, we are back at square one, since the NSM state is not
>>>> preserved across an unload/reload cycle.  This may happen  
>>>> frequently
>>>> on clients that use automounter.  A period of NFS inactivity causes
>>>> lockd.ko to be unloaded, and the kernel loses its NSM state  
>>>> setting.
>>>
>>> Aie.  Can we also fix the automounter or some other part of the
>>> userspace configuration?
>>
>> User space isn't the problem here... it's the fact that lockd can get
>> unloaded after a period of inactivity.  IMO lockd should be pinned in
>> the kernel after it is loaded with /etc/init.d/nfslock.
>>
>>>> Instead, let's use the fact that rpc.statd plants the local  
>>>> system's
>>>> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
>>>> synchronous SM_MON upcall to the local rpc.statd _before_ sending  
>>>> its
>>>> first NLM request to a new remote.  This would permit rpc.statd to
>>>> provide the current NSM state to lockd, even after lockd.ko had  
>>>> been
>>>> unloaded and reloaded.
>>>>
>>>> Note that NLMPROC_LOCK arguments are constructed before the
>>>> nsm_monitor() call, so we have to rearrange argument construction
>>>> very
>>>> slightly to make this all work out.
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>
>>>> fs/lockd/clntproc.c |    2 +-
>>>> fs/lockd/mon.c      |    6 +++++-
>>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>>>> index dd79570..f55b900 100644
>>>> --- a/fs/lockd/clntproc.c
>>>> +++ b/fs/lockd/clntproc.c
>>>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct nlm_rqst
>>>> *req, struct file_lock *fl)
>>>> 	struct nlm_lock	*lock = &argp->lock;
>>>>
>>>> 	nlmclnt_next_cookie(&argp->cookie);
>>>> -	argp->state   = nsm_local_state;
>>>> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),
>>>> sizeof(struct nfs_fh));
>>>> 	lock->caller  = utsname()->nodename;
>>>> 	lock->oh.data = req->a_owner;
>>>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct
>>>> file_lock *fl)
>>>>
>>>> 	if (nsm_monitor(host) < 0)
>>>> 		goto out;
>>>> +	req->a_args.state = nsm_local_state;
>>>
>>> Hm.  It looks like a_args.state is never used, except in ifdef'd-out
>>> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
>>> Something's wrong there.  (Not your fault; but needs looking into.)
>>
>> This isn't a big deal on the server side (I guess I should give this
>> patch to Trond instead of you, in that case).
>>
>> The client passes its NSM state number to the server in NLMPROC_LOCK
>> calls.  There is no mechanism for the server to pass its NSM state
>> number to the client via the NLM protocol.  So the first the client  
>> is
>> aware of the server's NSM state number is after the server reboots  
>> (via
>> SM_NOTIFY).  If the server never reboots, the client will never  
>> know the
>> server's NSM state number.
>
> So the #if 0'd code should just be deleted?

OK, I misread your question before.

As I read the code, our server does not appear to utilize the client's  
NSM state number, except for gating SM_NOTIFY requests with a  
previously-seen NSM state number.  The #ifdef'd code would potentially  
deny lock requests if it detected the state number going backwards.

It would be nicer if the server actually tracked the client's state  
number, but it doesn't appear to do that today.  The #ifdef'd code  
serves to remind us that we should consider this.  This would also  
prevent a delayed SM_NOTIFY from causing the server to drop locks  
reacquired during the grace period accidentally.

So I think it would be good to leave it, or replace it with a FIXME  
comment, for now.  Eventually we should add a little extra logic to  
handle this case.

> --b.
>
>>
>>>> 	fl->fl_flags |= FL_ACCESS;
>>>> 	status = do_vfs_lock(fl);
>>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>>> index 6d5d4a4..5017d50 100644
>>>> --- a/fs/lockd/mon.c
>>>> +++ b/fs/lockd/mon.c
>>>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>>>> 		status = -EIO;
>>>> 	if (status < 0)
>>>> 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
>>>> -	else
>>>> +	else {
>>>> 		nsm->sm_monitored = 1;
>>>> +		nsm_local_state = res.state;
>>>> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
>>>> +				nsm_local_state);
>>>
>>> Could we make that a dprintk in the case where this changes  
>>> nsm_local
>>> state from something other than zero (nsm_lock_state &&
>>> nsm_local_state
>>> != res.state)?
>>>
>>> (Just to make sure no statd is returning inconsistent  
>>> nsm_local_stats
>>> here.)
>>
>> I'm not sure that's a big deal, but...
>>
>> Note that the XNFS version 3 spec suggests the local lockd should
>> request the NSM state number when it starts up by posting an
>> SM_UNMON_ALL to the local statd.  That might be safer than loading it
>> after every SM_MON.
>>
>>>> +	}
>>>> 	return status;
>>>> }
>>>
>>> --b.
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c
  2009-04-28 16:36         ` Chuck Lever
@ 2009-04-28 21:30           ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 21:30 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Tue, Apr 28, 2009 at 12:36:38PM -0400, Chuck Lever wrote:
> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>> On Thu, Apr 23, 2009 at 07:33:25PM -0400, Chuck Lever wrote:
>>> Clean up: For consistency, handle output buffer size checking in a
>>> other nfsctl functions the same way it's done for write_versions().
>>
>> These all look fine to me, thanks.  Let me know when you want them
>> applied.
>
> Just to confirm that 1/19 looks OK to you too... I think you can apply 1 
> - 17 now.

OK, applied for 2.6.31.

--b.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 17:24             ` Chuck Lever
@ 2009-04-28 21:36               ` J. Bruce Fields
  2009-04-28 22:03                 ` Måns Rullgård
  2009-04-28 22:11                 ` Chuck Lever
  0 siblings, 2 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 21:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Mans Rullgard

On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>> nsm_init_private() by introducing put_unaligned().  We don't  
>>>>> actually
>>>>> _require_ an unaligned access to nsm_private here.
>>>>>
>>>>> Instead, we should always use memcpy/memcmp to access the  
>>>>> contents of
>>>>> RPC opaque data types.  This permits us to continue to define these
>>>>> as
>>>>> simple character arrays, as most legacy RPC code does, and to  
>>>>> rely on
>>>>> gcc to pack the fields in in-core structures optimally without
>>>>> breaking
>>>>> on platforms that require strict alignment.
>>>>
>>>> OK, I'm confused.  What structures will get packed differently?
>>>
>>> Any struct that has an nsm_private embedded in it, such as struct
>>> nlm_reboot.
>>
>> I don't see how that or any structure is changed by this patch.
>
> It's not.  Note the phrase above in the description: "permits us to  
> _continue_ to define these" -- meaning, I'm not changing the structures.

Err, but that's not right either, is it?:  We don't need to apply this
patch in order to continue to define the structures as they're currently
defined.

Help! I'm confused!

> It has been suggested that we use a union or __aligned attribute for RPC 
> opaques.  The problem with that is that it would cause structs like 
> nlm_reboot to balloon in size; char x[] is aligned on bytes, but a union 
> of u8, u32, and u64 would be aligned on a double-word boundary.  That 
> would make such structures larger.

OK, I agree, so let's not do that.

--b.

> Legacy RPC code, and any code generated by rpcgen, generally defines an 
> opaque as a character array.  So again, using a union would be  
> inconsistent with other uses of RPC opaques.
>
> The point is we want to define and access RPC opaque's consistently,  
> using memset() and memcpy(), since these are nothing more than byte  
> arrays.  The code in nsm_init_private() was an exception to this, for no 
> reason.  We don't really need special alignment macros in there, as long 
> as we access the RPC opaque field with the byte sets and copies.
>
> Would it help if I described this patch as a "clean up" ?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 21:36               ` J. Bruce Fields
@ 2009-04-28 22:03                 ` Måns Rullgård
       [not found]                   ` <yw1x63gozb9f.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
  2009-04-28 22:11                 ` Chuck Lever
  1 sibling, 1 reply; 51+ messages in thread
From: Måns Rullgård @ 2009-04-28 22:03 UTC (permalink / raw)
  To: linux-nfs

"J. Bruce Fields" <bfields@fieldses.org>
writes:

> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>
>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>> contents of RPC opaque data types.  This permits us to continue
>>>>>> to define these as simple character arrays, as most legacy RPC
>>>>>> code does, and to rely on gcc to pack the fields in in-core
>>>>>> structures optimally without breaking on platforms that require
>>>>>> strict alignment.
>>>>>
>>>>> OK, I'm confused.  What structures will get packed differently?
>>>>
>>>> Any struct that has an nsm_private embedded in it, such as struct
>>>> nlm_reboot.
>>>
>>> I don't see how that or any structure is changed by this patch.
>>
>> It's not.  Note the phrase above in the description: "permits us to =
=20
>> _continue_ to define these" -- meaning, I'm not changing the structu=
res.
>
> Err, but that's not right either, is it?:  We don't need to apply thi=
s
> patch in order to continue to define the structures as they're curren=
tly
> defined.
>
> Help! I'm confused!

I too fail to see the reasoning with the memcpy().  It achieves
exactly the same end result using twice as much work.  Also, what if
SM_PRIV_SIZE is ever increased?  Then it will be copying "random" data
from the stack into the nsm_private data.  This strikes me as a little
fragile.

--=20
M=E5ns Rullg=E5rd
mans-2StjZFpD7GcAvxtiuMwx3w@public.gmane.org


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 21:36               ` J. Bruce Fields
  2009-04-28 22:03                 ` Måns Rullgård
@ 2009-04-28 22:11                 ` Chuck Lever
  2009-04-28 22:23                   ` J. Bruce Fields
  2009-04-28 22:31                   ` Måns Rullgård
  1 sibling, 2 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 22:11 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: linux-nfs, Mans Rullgard

On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:

> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>> actually
>>>>>> _require_ an unaligned access to nsm_private here.
>>>>>>
>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>> contents of
>>>>>> RPC opaque data types.  This permits us to continue to define  
>>>>>> these
>>>>>> as
>>>>>> simple character arrays, as most legacy RPC code does, and to
>>>>>> rely on
>>>>>> gcc to pack the fields in in-core structures optimally without
>>>>>> breaking
>>>>>> on platforms that require strict alignment.
>>>>>
>>>>> OK, I'm confused.  What structures will get packed differently?
>>>>
>>>> Any struct that has an nsm_private embedded in it, such as struct
>>>> nlm_reboot.
>>>
>>> I don't see how that or any structure is changed by this patch.
>>
>> It's not.  Note the phrase above in the description: "permits us to
>> _continue_ to define these" -- meaning, I'm not changing the  
>> structures.
>
> Err, but that's not right either, is it?:  We don't need to apply this
> patch in order to continue to define the structures as they're  
> currently
> defined.
>
> Help! I'm confused!

This patch is simply a clean up.  We don't need to use put_unaligned  
in nsm_init_private.  There is absolutely nothing special about the  
nsm_private data type that would require this.  It should be accessed  
and modified the way all other RPC opaques are, via memset/memcpy.

Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and  
commit ad5b365c.

The controversy is over how to define opaques so they are accessible  
on both 32- and 64-bit hardware platforms.  My first pass at  
nsm_init_private worked on 32-bit systems, but broke on 64-bit  
systems.  An expedient fix for this was to add the put_unaligned in  
there so 64-bit systems could access the field no matter how it was  
aligned.  I argue this is unneeded complexity, and inconsistent with  
the way most other RPC opaques are treated in the kernel.

Andrew Morton proposed making RPC opaques a union of u8, u32 (or  
__be32), and u64 -- the u8 would allow us to treat an opaque as a byte  
array when needed, the u32 would allow access via XDR quads, and the  
u64 would force 64-bit alignment.  The issues with this are:

1.  Defined this way, opaque fields in data structures will force the  
encompassing structures to be large enough to honor the alignment  
requirements of the fields, and

2.  Most other RPC opaques are already defined as character arrays, so  
we would have to visit all of them to see if there were issues.

If we insist on accessing opaques only via memset() and memcpy()  
problem 1 goes away and we remain compatible with the traditional  
definition of an RPC opaque as an array of bytes, on both 64- and 32- 
bit systems.

The description for this patch can be rewritten this way:

"Clean up: There is nothing special about the nsm_private data type  
that requires the use of put_unaligned() to access it.  Rewrite  
nsm_init_private so it accesses nsm_private the way other code  
accesses other RPC opaques.

See kernel bugzilla 12995."

>> It has been suggested that we use a union or __aligned attribute  
>> for RPC
>> opaques.  The problem with that is that it would cause structs like
>> nlm_reboot to balloon in size; char x[] is aligned on bytes, but a  
>> union
>> of u8, u32, and u64 would be aligned on a double-word boundary.  That
>> would make such structures larger.
>
> OK, I agree, so let's not do that.
>
>
> --b.
>
>> Legacy RPC code, and any code generated by rpcgen, generally  
>> defines an
>> opaque as a character array.  So again, using a union would be
>> inconsistent with other uses of RPC opaques.
>>
>> The point is we want to define and access RPC opaque's consistently,
>> using memset() and memcpy(), since these are nothing more than byte
>> arrays.  The code in nsm_init_private() was an exception to this,  
>> for no
>> reason.  We don't really need special alignment macros in there, as  
>> long
>> as we access the RPC opaque field with the byte sets and copies.
>>
>> Would it help if I described this patch as a "clean up" ?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"  
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
       [not found]                   ` <yw1x63gozb9f.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
@ 2009-04-28 22:14                     ` Chuck Lever
  0 siblings, 0 replies; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 22:14 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-nfs


On Apr 28, 2009, at 6:03 PM, M=E5ns Rullg=E5rd wrote:

> "J. Bruce Fields" <bfields@fieldses.org>
> writes:
>
>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>
>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>> contents of RPC opaque data types.  This permits us to continue
>>>>>>> to define these as simple character arrays, as most legacy RPC
>>>>>>> code does, and to rely on gcc to pack the fields in in-core
>>>>>>> structures optimally without breaking on platforms that require
>>>>>>> strict alignment.
>>>>>>
>>>>>> OK, I'm confused.  What structures will get packed differently?
>>>>>
>>>>> Any struct that has an nsm_private embedded in it, such as struct
>>>>> nlm_reboot.
>>>>
>>>> I don't see how that or any structure is changed by this patch.
>>>
>>> It's not.  Note the phrase above in the description: "permits us to
>>> _continue_ to define these" -- meaning, I'm not changing the =20
>>> structures.
>>
>> Err, but that's not right either, is it?:  We don't need to apply =20
>> this
>> patch in order to continue to define the structures as they're =20
>> currently
>> defined.
>>
>> Help! I'm confused!
>
> I too fail to see the reasoning with the memcpy().  It achieves
> exactly the same end result using twice as much work.

Well, OK, but then you might want to review all the other places we =20
use memcpy() to access an RPC opaque field.  There's no reason to =20
treat nsm_private any differently than, say, an NFS readdir cookie.  =20
If there is, then it should be documented here.

> Also, what if
> SM_PRIV_SIZE is ever increased?  Then it will be copying "random" dat=
a
> from the stack into the nsm_private data.  This strikes me as a littl=
e
> fragile.

SM_PRIV_SIZE is a size defined by the NSM network protocol.  It will =20
never change.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 22:11                 ` Chuck Lever
@ 2009-04-28 22:23                   ` J. Bruce Fields
  2009-04-28 22:31                   ` Måns Rullgård
  1 sibling, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-04-28 22:23 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, Mans Rullgard

On Tue, Apr 28, 2009 at 06:11:06PM -0400, Chuck Lever wrote:
> This patch is simply a clean up.  We don't need to use put_unaligned in 
> nsm_init_private.  There is absolutely nothing special about the  
> nsm_private data type that would require this.  It should be accessed  
> and modified the way all other RPC opaques are, via memset/memcpy.
>
> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and commit 
> ad5b365c.
>
> The controversy is over how to define opaques so they are accessible on 
> both 32- and 64-bit hardware platforms.  My first pass at  
> nsm_init_private worked on 32-bit systems, but broke on 64-bit systems.  
> An expedient fix for this was to add the put_unaligned in there so 64-bit 
> systems could access the field no matter how it was aligned.  I argue 
> this is unneeded complexity, and inconsistent with the way most other RPC 
> opaques are treated in the kernel.
>
> Andrew Morton proposed making RPC opaques a union of u8, u32 (or  
> __be32), and u64 -- the u8 would allow us to treat an opaque as a byte  
> array when needed, the u32 would allow access via XDR quads, and the u64 
> would force 64-bit alignment.  The issues with this are:
>
> 1.  Defined this way, opaque fields in data structures will force the  
> encompassing structures to be large enough to honor the alignment  
> requirements of the fields, and
>
> 2.  Most other RPC opaques are already defined as character arrays, so  
> we would have to visit all of them to see if there were issues.
>
> If we insist on accessing opaques only via memset() and memcpy() problem 
> 1 goes away and we remain compatible with the traditional definition of 
> an RPC opaque as an array of bytes, on both 64- and 32-bit systems.
>
> The description for this patch can be rewritten this way:
>
> "Clean up: There is nothing special about the nsm_private data type that 
> requires the use of put_unaligned() to access it.  Rewrite  
> nsm_init_private so it accesses nsm_private the way other code accesses 
> other RPC opaques.
>
> See kernel bugzilla 12995."

OK.--b.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 22:11                 ` Chuck Lever
  2009-04-28 22:23                   ` J. Bruce Fields
@ 2009-04-28 22:31                   ` Måns Rullgård
       [not found]                     ` <yw1xws94xved.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
  1 sibling, 1 reply; 51+ messages in thread
From: Måns Rullgård @ 2009-04-28 22:31 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs

Chuck Lever <chuck.lever@oracle.com> writes:

> On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:
>
>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>
>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>> contents of RPC opaque data types.  This permits us to
>>>>>>> continue to define these as simple character arrays, as most
>>>>>>> legacy RPC code does, and to rely on gcc to pack the fields in
>>>>>>> in-core structures optimally without breaking on platforms
>>>>>>> that require strict alignment.
>>>>>>
>>>>>> OK, I'm confused.  What structures will get packed differently?
>>>>>
>>>>> Any struct that has an nsm_private embedded in it, such as struct
>>>>> nlm_reboot.
>>>>
>>>> I don't see how that or any structure is changed by this patch.
>>>
>>> It's not.  Note the phrase above in the description: "permits us to
>>> _continue_ to define these" -- meaning, I'm not changing the
>>> structures.
>>
>> Err, but that's not right either, is it?:  We don't need to apply th=
is
>> patch in order to continue to define the structures as they're
>> currently
>> defined.
>>
>> Help! I'm confused!
>
> This patch is simply a clean up.  We don't need to use put_unaligned
> in nsm_init_private.  There is absolutely nothing special about the
> nsm_private data type that would require this.  It should be accessed

The "special" thing is has not guaranteed alignment.  Hence, any
access to it must use unaligned-safe methods.

> and modified the way all other RPC opaques are, via memset/memcpy.

What is so special about put_unaligned() that you insist on replacing i=
t?

> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and
> commit ad5b365c.
>
> The controversy is over how to define opaques so they are accessible
> on both 32- and 64-bit hardware platforms.  My first pass at
> nsm_init_private worked on 32-bit systems, but broke on 64-bit
> systems.  An expedient fix for this was to add the put_unaligned in
> there so 64-bit systems could access the field no matter how it was
> aligned.  I argue this is unneeded complexity, and inconsistent with
> the way most other RPC opaques are treated in the kernel.
>
> Andrew Morton proposed making RPC opaques a union of u8, u32 (or
> __be32), and u64 -- the u8 would allow us to treat an opaque as a byt=
e
> array when needed, the u32 would allow access via XDR quads, and the
> u64 would force 64-bit alignment.  The issues with this are:
>
> 1.  Defined this way, opaque fields in data structures will force the
> encompassing structures to be large enough to honor the alignment
> requirements of the fields, and
>
> 2.  Most other RPC opaques are already defined as character arrays, s=
o
> we would have to visit all of them to see if there were issues.
>
> If we insist on accessing opaques only via memset() and memcpy()
> problem 1 goes away and we remain compatible with the traditional
> definition of an RPC opaque as an array of bytes, on both 64- and 32-
> bit systems.

I still don't see what problem put_unaligned() poses.  Think of it as
a more efficient memcpy().  We don't want the code to be larger and
slower than necessary, do we?

--=20
M=E5ns Rullg=E5rd
mans-2StjZFpD7GcAvxtiuMwx3w@public.gmane.org

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
       [not found]                     ` <yw1xws94xved.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
@ 2009-04-28 22:43                       ` Chuck Lever
  2009-04-28 22:52                         ` Måns Rullgård
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-28 22:43 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: J. Bruce Fields, linux-nfs

On Apr 28, 2009, at 6:31 PM, M=E5ns Rullg=E5rd wrote:
> Chuck Lever <chuck.lever@oracle.com> writes:
>> On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:
>>
>>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue in
>>>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>>
>>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>>> contents of RPC opaque data types.  This permits us to
>>>>>>>> continue to define these as simple character arrays, as most
>>>>>>>> legacy RPC code does, and to rely on gcc to pack the fields in
>>>>>>>> in-core structures optimally without breaking on platforms
>>>>>>>> that require strict alignment.
>>>>>>>
>>>>>>> OK, I'm confused.  What structures will get packed differently?
>>>>>>
>>>>>> Any struct that has an nsm_private embedded in it, such as struc=
t
>>>>>> nlm_reboot.
>>>>>
>>>>> I don't see how that or any structure is changed by this patch.
>>>>
>>>> It's not.  Note the phrase above in the description: "permits us t=
o
>>>> _continue_ to define these" -- meaning, I'm not changing the
>>>> structures.
>>>
>>> Err, but that's not right either, is it?:  We don't need to apply =20
>>> this
>>> patch in order to continue to define the structures as they're
>>> currently
>>> defined.
>>>
>>> Help! I'm confused!
>>
>> This patch is simply a clean up.  We don't need to use put_unaligned
>> in nsm_init_private.  There is absolutely nothing special about the
>> nsm_private data type that would require this.  It should be accesse=
d
>
> The "special" thing is has not guaranteed alignment.  Hence, any
> access to it must use unaligned-safe methods.
>
>> and modified the way all other RPC opaques are, via memset/memcpy.
>
> What is so special about put_unaligned() that you insist on =20
> replacing it?
>
>> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and
>> commit ad5b365c.
>>
>> The controversy is over how to define opaques so they are accessible
>> on both 32- and 64-bit hardware platforms.  My first pass at
>> nsm_init_private worked on 32-bit systems, but broke on 64-bit
>> systems.  An expedient fix for this was to add the put_unaligned in
>> there so 64-bit systems could access the field no matter how it was
>> aligned.  I argue this is unneeded complexity, and inconsistent with
>> the way most other RPC opaques are treated in the kernel.
>>
>> Andrew Morton proposed making RPC opaques a union of u8, u32 (or
>> __be32), and u64 -- the u8 would allow us to treat an opaque as a =20
>> byte
>> array when needed, the u32 would allow access via XDR quads, and the
>> u64 would force 64-bit alignment.  The issues with this are:
>>
>> 1.  Defined this way, opaque fields in data structures will force th=
e
>> encompassing structures to be large enough to honor the alignment
>> requirements of the fields, and
>>
>> 2.  Most other RPC opaques are already defined as character arrays, =
=20
>> so
>> we would have to visit all of them to see if there were issues.
>>
>> If we insist on accessing opaques only via memset() and memcpy()
>> problem 1 goes away and we remain compatible with the traditional
>> definition of an RPC opaque as an array of bytes, on both 64- and 32=
-
>> bit systems.
>
> I still don't see what problem put_unaligned() poses.  Think of it as
> a more efficient memcpy().  We don't want the code to be larger and
> slower than necessary, do we?

The problem put_unaligned() poses is that it's harder to read and =20
comprehend this code.  Why call put_unaligned() here, but not for =20
other RPC opaques, such as nfs4_verifier?  With put_unaligned() =20
readers have to chase down asm/unaligned.h.  With memcpy() it is =20
precisely clear what is going on.

nsm_init_private() is called infrequently, and SM_PRIV_SIZE is just 16 =
=20
bytes.  And, memcpy() on most platforms is careful to do only the =20
unaligned parts with single byte moves.  The rest is usually performed =
=20
by 64-bit copies.  So I think we're talking about a handful of =20
instructions difference here.  It's hardly worth the effort of making =20
this different than how the other opaques are handled.

Again, this is just a clean up.  The current code works.  But I think =20
it's inconsistent with other areas of the code, and harder to =20
understand.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-28 22:43                       ` Chuck Lever
@ 2009-04-28 22:52                         ` Måns Rullgård
       [not found]                           ` <yw1xskjsxuff.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
  0 siblings, 1 reply; 51+ messages in thread
From: Måns Rullgård @ 2009-04-28 22:52 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs

Chuck Lever <chuck.lever@oracle.com> writes:

> On Apr 28, 2009, at 6:31 PM, M=E5ns Rullg=E5rd wrote:
>> Chuck Lever <chuck.lever@oracle.com> writes:
>>> On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:
>>>
>>>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue i=
n
>>>>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>>>
>>>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>>>> contents of RPC opaque data types.  This permits us to
>>>>>>>>> continue to define these as simple character arrays, as most
>>>>>>>>> legacy RPC code does, and to rely on gcc to pack the fields i=
n
>>>>>>>>> in-core structures optimally without breaking on platforms
>>>>>>>>> that require strict alignment.
>>>>>>>>
>>>>>>>> OK, I'm confused.  What structures will get packed differently=
?
>>>>>>>
>>>>>>> Any struct that has an nsm_private embedded in it, such as stru=
ct
>>>>>>> nlm_reboot.
>>>>>>
>>>>>> I don't see how that or any structure is changed by this patch.
>>>>>
>>>>> It's not.  Note the phrase above in the description: "permits us =
to
>>>>> _continue_ to define these" -- meaning, I'm not changing the
>>>>> structures.
>>>>
>>>> Err, but that's not right either, is it?:  We don't need to apply
>>>> this
>>>> patch in order to continue to define the structures as they're
>>>> currently
>>>> defined.
>>>>
>>>> Help! I'm confused!
>>>
>>> This patch is simply a clean up.  We don't need to use put_unaligne=
d
>>> in nsm_init_private.  There is absolutely nothing special about the
>>> nsm_private data type that would require this.  It should be access=
ed
>>
>> The "special" thing is has not guaranteed alignment.  Hence, any
>> access to it must use unaligned-safe methods.
>>
>>> and modified the way all other RPC opaques are, via memset/memcpy.
>>
>> What is so special about put_unaligned() that you insist on
>> replacing it?
>>
>>> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and
>>> commit ad5b365c.
>>>
>>> The controversy is over how to define opaques so they are accessibl=
e
>>> on both 32- and 64-bit hardware platforms.  My first pass at
>>> nsm_init_private worked on 32-bit systems, but broke on 64-bit
>>> systems.  An expedient fix for this was to add the put_unaligned in
>>> there so 64-bit systems could access the field no matter how it was
>>> aligned.  I argue this is unneeded complexity, and inconsistent wit=
h
>>> the way most other RPC opaques are treated in the kernel.
>>>
>>> Andrew Morton proposed making RPC opaques a union of u8, u32 (or
>>> __be32), and u64 -- the u8 would allow us to treat an opaque as a
>>> byte
>>> array when needed, the u32 would allow access via XDR quads, and th=
e
>>> u64 would force 64-bit alignment.  The issues with this are:
>>>
>>> 1.  Defined this way, opaque fields in data structures will force t=
he
>>> encompassing structures to be large enough to honor the alignment
>>> requirements of the fields, and
>>>
>>> 2.  Most other RPC opaques are already defined as character
>>> arrays, so we would have to visit all of them to see if there were
>>> issues.
>>>
>>> If we insist on accessing opaques only via memset() and memcpy()
>>> problem 1 goes away and we remain compatible with the traditional
>>> definition of an RPC opaque as an array of bytes, on both 64- and 3=
2-
>>> bit systems.
>>
>> I still don't see what problem put_unaligned() poses.  Think of it a=
s
>> a more efficient memcpy().  We don't want the code to be larger and
>> slower than necessary, do we?
>
> The problem put_unaligned() poses is that it's harder to read and
> comprehend this code.  Why call put_unaligned() here, but not for
> other RPC opaques, such as nfs4_verifier?  With put_unaligned()
> readers have to chase down asm/unaligned.h.  With memcpy() it is
> precisely clear what is going on.

I thought put_unaligned() was precisely clear when I wrote that code.
I would have thought anyone working on the kernel would be familiar
with it, or at the very least be able to guess what it did.

Your reasoning above is the kind of thing I'd expect in some silly
corporate environment, not from Linux kernel developers.

> Again, this is just a clean up.  The current code works.  But I think
> it's inconsistent with other areas of the code, and harder to
> understand.

I'll leave the decision to the maintainers of the code.

--=20
M=E5ns Rullg=E5rd
mans-2StjZFpD7GcAvxtiuMwx3w@public.gmane.org

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
       [not found]                           ` <yw1xskjsxuff.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
@ 2009-04-29 15:16                             ` Chuck Lever
  2009-04-29 18:02                               ` Måns Rullgård
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-04-29 15:16 UTC (permalink / raw)
  To: Måns Rullgård, J. Bruce Fields; +Cc: Linux NFS Mailing List

On Apr 28, 2009, at 6:52 PM, M=E5ns Rullg=E5rd wrote:
> Chuck Lever <chuck.lever@oracle.com> writes:
>> On Apr 28, 2009, at 6:31 PM, M=E5ns Rullg=E5rd wrote:
>>> Chuck Lever <chuck.lever@oracle.com> writes:
>>>> On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:
>>>>
>>>>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>>>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue =
=20
>>>>>>>>>> in
>>>>>>>>>> nsm_init_private() by introducing put_unaligned().  We don't
>>>>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>>>>
>>>>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>>>>> contents of RPC opaque data types.  This permits us to
>>>>>>>>>> continue to define these as simple character arrays, as most
>>>>>>>>>> legacy RPC code does, and to rely on gcc to pack the fields =
=20
>>>>>>>>>> in
>>>>>>>>>> in-core structures optimally without breaking on platforms
>>>>>>>>>> that require strict alignment.
>>>>>>>>>
>>>>>>>>> OK, I'm confused.  What structures will get packed =20
>>>>>>>>> differently?
>>>>>>>>
>>>>>>>> Any struct that has an nsm_private embedded in it, such as =20
>>>>>>>> struct
>>>>>>>> nlm_reboot.
>>>>>>>
>>>>>>> I don't see how that or any structure is changed by this patch.
>>>>>>
>>>>>> It's not.  Note the phrase above in the description: "permits =20
>>>>>> us to
>>>>>> _continue_ to define these" -- meaning, I'm not changing the
>>>>>> structures.
>>>>>
>>>>> Err, but that's not right either, is it?:  We don't need to apply
>>>>> this
>>>>> patch in order to continue to define the structures as they're
>>>>> currently
>>>>> defined.
>>>>>
>>>>> Help! I'm confused!
>>>>
>>>> This patch is simply a clean up.  We don't need to use =20
>>>> put_unaligned
>>>> in nsm_init_private.  There is absolutely nothing special about th=
e
>>>> nsm_private data type that would require this.  It should be =20
>>>> accessed
>>>
>>> The "special" thing is has not guaranteed alignment.  Hence, any
>>> access to it must use unaligned-safe methods.
>>>
>>>> and modified the way all other RPC opaques are, via memset/memcpy.
>>>
>>> What is so special about put_unaligned() that you insist on
>>> replacing it?
>>>
>>>> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and
>>>> commit ad5b365c.
>>>>
>>>> The controversy is over how to define opaques so they are =20
>>>> accessible
>>>> on both 32- and 64-bit hardware platforms.  My first pass at
>>>> nsm_init_private worked on 32-bit systems, but broke on 64-bit
>>>> systems.  An expedient fix for this was to add the put_unaligned i=
n
>>>> there so 64-bit systems could access the field no matter how it wa=
s
>>>> aligned.  I argue this is unneeded complexity, and inconsistent =20
>>>> with
>>>> the way most other RPC opaques are treated in the kernel.
>>>>
>>>> Andrew Morton proposed making RPC opaques a union of u8, u32 (or
>>>> __be32), and u64 -- the u8 would allow us to treat an opaque as a
>>>> byte
>>>> array when needed, the u32 would allow access via XDR quads, and =20
>>>> the
>>>> u64 would force 64-bit alignment.  The issues with this are:
>>>>
>>>> 1.  Defined this way, opaque fields in data structures will force =
=20
>>>> the
>>>> encompassing structures to be large enough to honor the alignment
>>>> requirements of the fields, and
>>>>
>>>> 2.  Most other RPC opaques are already defined as character
>>>> arrays, so we would have to visit all of them to see if there were
>>>> issues.
>>>>
>>>> If we insist on accessing opaques only via memset() and memcpy()
>>>> problem 1 goes away and we remain compatible with the traditional
>>>> definition of an RPC opaque as an array of bytes, on both 64- and =
=20
>>>> 32-
>>>> bit systems.
>>>
>>> I still don't see what problem put_unaligned() poses.  Think of it =
=20
>>> as
>>> a more efficient memcpy().  We don't want the code to be larger and
>>> slower than necessary, do we?
>>
>> The problem put_unaligned() poses is that it's harder to read and
>> comprehend this code.  Why call put_unaligned() here, but not for
>> other RPC opaques, such as nfs4_verifier?  With put_unaligned()
>> readers have to chase down asm/unaligned.h.  With memcpy() it is
>> precisely clear what is going on.
>
> I thought put_unaligned() was precisely clear when I wrote that code.

In fact it wasn't.  Your patch description and code were so unclear =20
that Andrew Morton chastised all of us for not recognizing that this =20
was a _fix_ that also belonged in -stable:

   --- Quote ---
 > A patch just went upstream for 2.6.30 that addresses that
[ ... commit ad5b365c snipped ... ]

Is this the best way of fixing it?  We'll crash again if some other
code starts to access .data with non-byte-sized accesses.  What makes
this especially risky is that the code will pass testing on x86.

__aligned would be one way.  But it would be far far cleaner to just do


--- a/include/linux/lockd/xdr.h~a
+++ a/include/linux/lockd/xdr.h
@@ -17,7 +17,7 @@
#define SM_PRIV_SIZE		16

struct nsm_private {
-	unsigned char		data[SM_PRIV_SIZE];
+	u64 data[SM_PRIV_SIZE/sizeof(u64)];
};

struct svc_rqst;

all the typecasting and the put_unaligned()s just go away then.
 > but did not mention that it fixed an oops.
That was a fairly significant failing. Guys, please please do remember =
=20
that we're also maintaining the stable kernels. When merging a patch =20
please always think whether it needs to be backported.

   --- End Quote ---

It appears that Andrew did not find your patch to be sufficient.

The fact that you have repeated the "what if SM_PRIV_SIZE changes" =20
argument suggests that you still haven't become familiar with the body =
=20
of this code to see what it does, how it works, and what would be a =20
consistent and correct fix.

> I would have thought anyone working on the kernel would be familiar
> with it, or at the very least be able to guess what it did.

> Your reasoning above is the kind of thing I'd expect in some silly
> corporate environment, not from Linux kernel developers.

How adorable!

We are striving for a code base that is easier to review by people who =
=20
are outside our small community.  I find that goal to be perfectly =20
aligned with the stated goals of Linux kernel developers.

>> Again, this is just a clean up.  The current code works.  But I thin=
k
>> it's inconsistent with other areas of the code, and harder to
>> understand.
>
> I'll leave the decision to the maintainers of the code.

Bruce, please drop patch 19.  A simple clean up like this is not worth =
=20
the heartburn.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private()
  2009-04-29 15:16                             ` Chuck Lever
@ 2009-04-29 18:02                               ` Måns Rullgård
  0 siblings, 0 replies; 51+ messages in thread
From: Måns Rullgård @ 2009-04-29 18:02 UTC (permalink / raw)
  To: Chuck Lever; +Cc: J. Bruce Fields, Linux NFS Mailing List, akpm

Chuck Lever <chuck.lever@oracle.com> writes:

> On Apr 28, 2009, at 6:52 PM, M=E5ns Rullg=E5rd wrote:
>> Chuck Lever <chuck.lever@oracle.com> writes:
>>> On Apr 28, 2009, at 6:31 PM, M=E5ns Rullg=E5rd wrote:
>>>> Chuck Lever <chuck.lever@oracle.com> writes:
>>>>> On Apr 28, 2009, at 5:36 PM, J. Bruce Fields wrote:
>>>>>
>>>>>> On Tue, Apr 28, 2009 at 01:24:24PM -0400, Chuck Lever wrote:
>>>>>>> On Apr 28, 2009, at 12:40 PM, J. Bruce Fields wrote:
>>>>>>>> On Tue, Apr 28, 2009 at 12:35:50PM -0400, Chuck Lever wrote:
>>>>>>>>> On Apr 28, 2009, at 12:31 PM, J. Bruce Fields wrote:
>>>>>>>>>> On Thu, Apr 23, 2009 at 07:33:40PM -0400, Chuck Lever wrote:
>>>>>>>>>>> Recently, commit ad5b365c fixed a data type alignment issue
>>>>>>>>>>> in
>>>>>>>>>>> nsm_init_private() by introducing put_unaligned().  We don'=
t
>>>>>>>>>>> actually _require_ an unaligned access to nsm_private here.
>>>>>>>>>>>
>>>>>>>>>>> Instead, we should always use memcpy/memcmp to access the
>>>>>>>>>>> contents of RPC opaque data types.  This permits us to
>>>>>>>>>>> continue to define these as simple character arrays, as mos=
t
>>>>>>>>>>> legacy RPC code does, and to rely on gcc to pack the fields
>>>>>>>>>>> in
>>>>>>>>>>> in-core structures optimally without breaking on platforms
>>>>>>>>>>> that require strict alignment.
>>>>>>>>>>
>>>>>>>>>> OK, I'm confused.  What structures will get packed
>>>>>>>>>> differently?
>>>>>>>>>
>>>>>>>>> Any struct that has an nsm_private embedded in it, such as
>>>>>>>>> struct
>>>>>>>>> nlm_reboot.
>>>>>>>>
>>>>>>>> I don't see how that or any structure is changed by this patch=
=2E
>>>>>>>
>>>>>>> It's not.  Note the phrase above in the description: "permits
>>>>>>> us to
>>>>>>> _continue_ to define these" -- meaning, I'm not changing the
>>>>>>> structures.
>>>>>>
>>>>>> Err, but that's not right either, is it?:  We don't need to appl=
y
>>>>>> this
>>>>>> patch in order to continue to define the structures as they're
>>>>>> currently
>>>>>> defined.
>>>>>>
>>>>>> Help! I'm confused!
>>>>>
>>>>> This patch is simply a clean up.  We don't need to use
>>>>> put_unaligned
>>>>> in nsm_init_private.  There is absolutely nothing special about t=
he
>>>>> nsm_private data type that would require this.  It should be
>>>>> accessed
>>>>
>>>> The "special" thing is has not guaranteed alignment.  Hence, any
>>>> access to it must use unaligned-safe methods.
>>>>
>>>>> and modified the way all other RPC opaques are, via memset/memcpy=
=2E
>>>>
>>>> What is so special about put_unaligned() that you insist on
>>>> replacing it?
>>>>
>>>>> Take a look at http://thread.gmane.org/gmane.linux.nfs/25607 and
>>>>> commit ad5b365c.
>>>>>
>>>>> The controversy is over how to define opaques so they are
>>>>> accessible
>>>>> on both 32- and 64-bit hardware platforms.  My first pass at
>>>>> nsm_init_private worked on 32-bit systems, but broke on 64-bit
>>>>> systems.  An expedient fix for this was to add the put_unaligned =
in
>>>>> there so 64-bit systems could access the field no matter how it w=
as
>>>>> aligned.  I argue this is unneeded complexity, and inconsistent
>>>>> with
>>>>> the way most other RPC opaques are treated in the kernel.
>>>>>
>>>>> Andrew Morton proposed making RPC opaques a union of u8, u32 (or
>>>>> __be32), and u64 -- the u8 would allow us to treat an opaque as a
>>>>> byte
>>>>> array when needed, the u32 would allow access via XDR quads, and
>>>>> the
>>>>> u64 would force 64-bit alignment.  The issues with this are:
>>>>>
>>>>> 1.  Defined this way, opaque fields in data structures will force
>>>>> the
>>>>> encompassing structures to be large enough to honor the alignment
>>>>> requirements of the fields, and
>>>>>
>>>>> 2.  Most other RPC opaques are already defined as character
>>>>> arrays, so we would have to visit all of them to see if there wer=
e
>>>>> issues.
>>>>>
>>>>> If we insist on accessing opaques only via memset() and memcpy()
>>>>> problem 1 goes away and we remain compatible with the traditional
>>>>> definition of an RPC opaque as an array of bytes, on both 64- and
>>>>> 32-
>>>>> bit systems.
>>>>
>>>> I still don't see what problem put_unaligned() poses.  Think of it
>>>> as
>>>> a more efficient memcpy().  We don't want the code to be larger an=
d
>>>> slower than necessary, do we?
>>>
>>> The problem put_unaligned() poses is that it's harder to read and
>>> comprehend this code.  Why call put_unaligned() here, but not for
>>> other RPC opaques, such as nfs4_verifier?  With put_unaligned()
>>> readers have to chase down asm/unaligned.h.  With memcpy() it is
>>> precisely clear what is going on.
>>
>> I thought put_unaligned() was precisely clear when I wrote that code=
=2E
>
> In fact it wasn't.  Your patch description and code were so unclear

I demand an apology for that.  There is absolutely nothing about the
patch or the description that is the slightest bit unclear to anyone
with a decent knowledge of C and half a brain.

> that Andrew Morton chastised all of us for not recognizing that this
> was a _fix_ that also belonged in -stable:

I didn't realise it fixed an oops on some machines.  I only saw a
warning since the trap handler fixed the unaligned on access on both
ARM and Alpha.

>   --- Quote ---
>> A patch just went upstream for 2.6.30 that addresses that
> [ ... commit ad5b365c snipped ... ]
>
> Is this the best way of fixing it?  We'll crash again if some other
> code starts to access .data with non-byte-sized accesses.  What makes
> this especially risky is that the code will pass testing on x86.
>
> __aligned would be one way.  But it would be far far cleaner to just =
do
>
> --- a/include/linux/lockd/xdr.h~a
> +++ a/include/linux/lockd/xdr.h
> @@ -17,7 +17,7 @@
> #define SM_PRIV_SIZE		16
>
> struct nsm_private {
> -	unsigned char		data[SM_PRIV_SIZE];
> +	u64 data[SM_PRIV_SIZE/sizeof(u64)];
> };
>
> struct svc_rqst;
>
> all the typecasting and the put_unaligned()s just go away then.

You already rejected this due to the inflation of struct sizes it
would cause, a viewpoint I am inclined to agree with unless the
structs in question can be rearranged to avoid it.  Depending on the
number of these objects involved, and the frequency of access to them,
I could probably be persuaded either way.

>> but did not mention that it fixed an oops.
> That was a fairly significant failing. Guys, please please do remembe=
r
> that we're also maintaining the stable kernels. When merging a patch
> please always think whether it needs to be backported.

I did consider this, and since I did not experience an oops, I did not
think it would be necessary.

> It appears that Andrew did not find your patch to be sufficient.

That is not how I interpret what he said.  The patch clearly fixes the
*current* issue.  Andrew was concerned that similar errors might show
up again in the future unless the field was made aligned.  The fact
that it happened once is evidence that these concerns are valid and
should be addressed.  This does not imply that an immediate fix for an
oops (as it turns out to be) should be delayed while a broader fix is
developed.  Even less does it imply that a "cleanup" of the working
code is warranted before a final decision has been reached.

> The fact that you have repeated the "what if SM_PRIV_SIZE changes"
> argument suggests that you still haven't become familiar with the bod=
y

No, I did not read hundreds of pages of protocol specifications before
fixing one simple unaligned access in the most obvious way I could
imagine.  Besides, a future revision of the standard could change the
SM_PRIV_SIZE value.

> of this code to see what it does, how it works, and what would be a
> consistent and correct fix.

Your arguments about consistency are nothing short of hypocritical.
Where was consistency when YOU wrote the broken code using pointer
casts and direct assignments rather than the memcpy() you now seem to
cherish so?

=46urthermore, YOU yourself approved my original patch.  How could you
do that if you, as you now claim, did not understand what it did?  Why
did you not request a rewrite using memcpy() then?

>>> Again, this is just a clean up.  The current code works.  But I thi=
nk
>>> it's inconsistent with other areas of the code, and harder to
>>> understand.
>>
>> I'll leave the decision to the maintainers of the code.
>
> Bruce, please drop patch 19.  A simple clean up like this is not wort=
h
> the heartburn.

One man's cleanup is another man's obfuscation, apparently.

--=20
M=E5ns Rullg=E5rd
mans-2StjZFpD7GcAvxtiuMwx3w@public.gmane.org

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
  2009-04-28 19:11             ` Chuck Lever
@ 2009-05-08 15:19               ` Chuck Lever
  2009-05-08 15:33                 ` J. Bruce Fields
  0 siblings, 1 reply; 51+ messages in thread
From: Chuck Lever @ 2009-05-08 15:19 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing List

On Apr 28, 2009, at 3:11 PM, Chuck Lever wrote:
> On Apr 28, 2009, at 12:38 PM, J. Bruce Fields wrote:
>> On Tue, Apr 28, 2009 at 12:34:19PM -0400, Chuck Lever wrote:
>>> On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote:
>>>> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
>>>>> When rpc.statd starts up in user space at boot time, it attempts  
>>>>> to
>>>>> write the latest NSM local state number into
>>>>> /proc/sys/fs/nfs/nsm_local_state.
>>>>>
>>>>> If lockd.ko isn't loaded yet (as is the case in most  
>>>>> configurations),
>>>>> that file doesn't exist, thus the kernel's NSM state remains set  
>>>>> to
>>>>> its initial value of zero during lockd operation.
>>>>>
>>>>> This is a problem because rpc.statd and lockd use the NSM state
>>>>> number
>>>>> to prevent repeated lock recovery on rebooted hosts.  If lockd  
>>>>> sends
>>>>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM  
>>>>> state
>>>>> number is received, there is no way for lockd or rpc.statd to
>>>>> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
>>>>> recovery could be performed after the rebooted host has already
>>>>> started reclaiming locks, and those locks will be lost.
>>>>>
>>>>> We could change /etc/init.d/nfslock so it always modprobes  
>>>>> lockd.ko
>>>>> before starting rpc.statd.  However, if lockd.ko is ever unloaded
>>>>> and reloaded, we are back at square one, since the NSM state is  
>>>>> not
>>>>> preserved across an unload/reload cycle.  This may happen  
>>>>> frequently
>>>>> on clients that use automounter.  A period of NFS inactivity  
>>>>> causes
>>>>> lockd.ko to be unloaded, and the kernel loses its NSM state  
>>>>> setting.
>>>>
>>>> Aie.  Can we also fix the automounter or some other part of the
>>>> userspace configuration?
>>>
>>> User space isn't the problem here... it's the fact that lockd can  
>>> get
>>> unloaded after a period of inactivity.  IMO lockd should be pinned  
>>> in
>>> the kernel after it is loaded with /etc/init.d/nfslock.
>>>
>>>>> Instead, let's use the fact that rpc.statd plants the local  
>>>>> system's
>>>>> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
>>>>> synchronous SM_MON upcall to the local rpc.statd _before_  
>>>>> sending its
>>>>> first NLM request to a new remote.  This would permit rpc.statd to
>>>>> provide the current NSM state to lockd, even after lockd.ko had  
>>>>> been
>>>>> unloaded and reloaded.
>>>>>
>>>>> Note that NLMPROC_LOCK arguments are constructed before the
>>>>> nsm_monitor() call, so we have to rearrange argument construction
>>>>> very
>>>>> slightly to make this all work out.
>>>>>
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>>>
>>>>> fs/lockd/clntproc.c |    2 +-
>>>>> fs/lockd/mon.c      |    6 +++++-
>>>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>>>>> index dd79570..f55b900 100644
>>>>> --- a/fs/lockd/clntproc.c
>>>>> +++ b/fs/lockd/clntproc.c
>>>>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct  
>>>>> nlm_rqst
>>>>> *req, struct file_lock *fl)
>>>>> 	struct nlm_lock	*lock = &argp->lock;
>>>>>
>>>>> 	nlmclnt_next_cookie(&argp->cookie);
>>>>> -	argp->state   = nsm_local_state;
>>>>> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),
>>>>> sizeof(struct nfs_fh));
>>>>> 	lock->caller  = utsname()->nodename;
>>>>> 	lock->oh.data = req->a_owner;
>>>>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct
>>>>> file_lock *fl)
>>>>>
>>>>> 	if (nsm_monitor(host) < 0)
>>>>> 		goto out;
>>>>> +	req->a_args.state = nsm_local_state;
>>>>
>>>> Hm.  It looks like a_args.state is never used, except in ifdef'd- 
>>>> out
>>>> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
>>>> Something's wrong there.  (Not your fault; but needs looking into.)
>>>
>>> This isn't a big deal on the server side (I guess I should give this
>>> patch to Trond instead of you, in that case).

Since this is a client-side only patch, should I pass this to Trond  
instead?

[ more below ]

>>> The client passes its NSM state number to the server in NLMPROC_LOCK
>>> calls.  There is no mechanism for the server to pass its NSM state
>>> number to the client via the NLM protocol.  So the first the  
>>> client is
>>> aware of the server's NSM state number is after the server reboots  
>>> (via
>>> SM_NOTIFY).  If the server never reboots, the client will never  
>>> know the
>>> server's NSM state number.
>>
>> So the #if 0'd code should just be deleted?
>
> OK, I misread your question before.
>
> As I read the code, our server does not appear to utilize the  
> client's NSM state number, except for gating SM_NOTIFY requests with  
> a previously-seen NSM state number.  The #ifdef'd code would  
> potentially deny lock requests if it detected the state number going  
> backwards.
>
> It would be nicer if the server actually tracked the client's state  
> number, but it doesn't appear to do that today.  The #ifdef'd code  
> serves to remind us that we should consider this.  This would also  
> prevent a delayed SM_NOTIFY from causing the server to drop locks  
> reacquired during the grace period accidentally.
>
> So I think it would be good to leave it, or replace it with a FIXME  
> comment, for now.  Eventually we should add a little extra logic to  
> handle this case.
>
>> --b.
>>
>>>
>>>>> 	fl->fl_flags |= FL_ACCESS;
>>>>> 	status = do_vfs_lock(fl);
>>>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>>>> index 6d5d4a4..5017d50 100644
>>>>> --- a/fs/lockd/mon.c
>>>>> +++ b/fs/lockd/mon.c
>>>>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>>>>> 		status = -EIO;
>>>>> 	if (status < 0)
>>>>> 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
>>>>> -	else
>>>>> +	else {
>>>>> 		nsm->sm_monitored = 1;
>>>>> +		nsm_local_state = res.state;
>>>>> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
>>>>> +				nsm_local_state);
>>>>
>>>> Could we make that a dprintk in the case where this changes  
>>>> nsm_local
>>>> state from something other than zero (nsm_lock_state &&
>>>> nsm_local_state
>>>> != res.state)?
>>>>
>>>> (Just to make sure no statd is returning inconsistent  
>>>> nsm_local_stats
>>>> here.)

Having the kernel limit changes to the state number is probably not a  
good idea.  Certain statd operations such as SM_SIMU_CRASH will modify  
that state number.  We don't use SM_SIMU_CRASH today, but handling  
server failover and such will likely require something like it.

In any event, servers that are careful enough to track a client's NSM  
state number will tell us pretty quickly if this is not working right.

>>> I'm not sure that's a big deal, but...
>>>
>>> Note that the XNFS version 3 spec suggests the local lockd should
>>> request the NSM state number when it starts up by posting an
>>> SM_UNMON_ALL to the local statd.  That might be safer than loading  
>>> it
>>> after every SM_MON.

So, the problem with using SM_UNMON_ALL when lockd starts up is that  
it introduces yet another start-up ordering dependency.  In order for  
this solution to work, statd is required to be running before lockd  
starts up.  I think we discussed a few weeks ago how, on the server,  
lockd needs to start first so that it is available before reboot  
notifications are sent.

Even though this patch is for the client, I'm loathe to add yet  
another start-up ordering dependency in this area.  Theoretically this  
stuff should work correctly no matter what order you start it  
(especially since we don't package NFS init scripts with nfs-utils).   
The current proposal (using the result of SM_MON) provides adequate  
NSM state number updates without introducing new ordering constraints.

>>>>> +	}
>>>>> 	return status;
>>>>> }
>>>>
>>>> --b.
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 18/19] lockd: Update NSM state from SM_MON replies
  2009-05-08 15:19               ` Chuck Lever
@ 2009-05-08 15:33                 ` J. Bruce Fields
  0 siblings, 0 replies; 51+ messages in thread
From: J. Bruce Fields @ 2009-05-08 15:33 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Fri, May 08, 2009 at 11:19:07AM -0400, Chuck Lever wrote:
> On Apr 28, 2009, at 3:11 PM, Chuck Lever wrote:
>> On Apr 28, 2009, at 12:38 PM, J. Bruce Fields wrote:
>>> On Tue, Apr 28, 2009 at 12:34:19PM -0400, Chuck Lever wrote:
>>>> On Apr 28, 2009, at 12:25 PM, J. Bruce Fields wrote:
>>>>> On Thu, Apr 23, 2009 at 07:33:33PM -0400, Chuck Lever wrote:
>>>>>> When rpc.statd starts up in user space at boot time, it 
>>>>>> attempts to
>>>>>> write the latest NSM local state number into
>>>>>> /proc/sys/fs/nfs/nsm_local_state.
>>>>>>
>>>>>> If lockd.ko isn't loaded yet (as is the case in most  
>>>>>> configurations),
>>>>>> that file doesn't exist, thus the kernel's NSM state remains 
>>>>>> set to
>>>>>> its initial value of zero during lockd operation.
>>>>>>
>>>>>> This is a problem because rpc.statd and lockd use the NSM state
>>>>>> number
>>>>>> to prevent repeated lock recovery on rebooted hosts.  If lockd  
>>>>>> sends
>>>>>> a zero NSM state, but then a delayed SM_NOTIFY with a real NSM  
>>>>>> state
>>>>>> number is received, there is no way for lockd or rpc.statd to
>>>>>> distinguish that stale SM_NOTIFY from an actual reboot.  Thus lock
>>>>>> recovery could be performed after the rebooted host has already
>>>>>> started reclaiming locks, and those locks will be lost.
>>>>>>
>>>>>> We could change /etc/init.d/nfslock so it always modprobes  
>>>>>> lockd.ko
>>>>>> before starting rpc.statd.  However, if lockd.ko is ever unloaded
>>>>>> and reloaded, we are back at square one, since the NSM state is 
>>>>>> not
>>>>>> preserved across an unload/reload cycle.  This may happen  
>>>>>> frequently
>>>>>> on clients that use automounter.  A period of NFS inactivity  
>>>>>> causes
>>>>>> lockd.ko to be unloaded, and the kernel loses its NSM state  
>>>>>> setting.
>>>>>
>>>>> Aie.  Can we also fix the automounter or some other part of the
>>>>> userspace configuration?
>>>>
>>>> User space isn't the problem here... it's the fact that lockd can  
>>>> get
>>>> unloaded after a period of inactivity.  IMO lockd should be pinned  
>>>> in
>>>> the kernel after it is loaded with /etc/init.d/nfslock.
>>>>
>>>>>> Instead, let's use the fact that rpc.statd plants the local  
>>>>>> system's
>>>>>> NSM state in every SM_MON (and SM_UNMON) reply.  lockd performs a
>>>>>> synchronous SM_MON upcall to the local rpc.statd _before_  
>>>>>> sending its
>>>>>> first NLM request to a new remote.  This would permit rpc.statd to
>>>>>> provide the current NSM state to lockd, even after lockd.ko had 
>>>>>> been
>>>>>> unloaded and reloaded.
>>>>>>
>>>>>> Note that NLMPROC_LOCK arguments are constructed before the
>>>>>> nsm_monitor() call, so we have to rearrange argument construction
>>>>>> very
>>>>>> slightly to make this all work out.
>>>>>>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>> ---
>>>>>>
>>>>>> fs/lockd/clntproc.c |    2 +-
>>>>>> fs/lockd/mon.c      |    6 +++++-
>>>>>> 2 files changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
>>>>>> index dd79570..f55b900 100644
>>>>>> --- a/fs/lockd/clntproc.c
>>>>>> +++ b/fs/lockd/clntproc.c
>>>>>> @@ -126,7 +126,6 @@ static void nlmclnt_setlockargs(struct  
>>>>>> nlm_rqst
>>>>>> *req, struct file_lock *fl)
>>>>>> 	struct nlm_lock	*lock = &argp->lock;
>>>>>>
>>>>>> 	nlmclnt_next_cookie(&argp->cookie);
>>>>>> -	argp->state   = nsm_local_state;
>>>>>> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),
>>>>>> sizeof(struct nfs_fh));
>>>>>> 	lock->caller  = utsname()->nodename;
>>>>>> 	lock->oh.data = req->a_owner;
>>>>>> @@ -519,6 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct
>>>>>> file_lock *fl)
>>>>>>
>>>>>> 	if (nsm_monitor(host) < 0)
>>>>>> 		goto out;
>>>>>> +	req->a_args.state = nsm_local_state;
>>>>>
>>>>> Hm.  It looks like a_args.state is never used, except in ifdef'd- 
>>>>> out
>>>>> code in nlm4svc_proc_lock() and nlmsvc_proc_lock() ifdef'd out.
>>>>> Something's wrong there.  (Not your fault; but needs looking into.)
>>>>
>>>> This isn't a big deal on the server side (I guess I should give this
>>>> patch to Trond instead of you, in that case).
>
> Since this is a client-side only patch, should I pass this to Trond  
> instead?

OK.

>
> [ more below ]
>
>>>> The client passes its NSM state number to the server in NLMPROC_LOCK
>>>> calls.  There is no mechanism for the server to pass its NSM state
>>>> number to the client via the NLM protocol.  So the first the  
>>>> client is
>>>> aware of the server's NSM state number is after the server reboots  
>>>> (via
>>>> SM_NOTIFY).  If the server never reboots, the client will never  
>>>> know the
>>>> server's NSM state number.
>>>
>>> So the #if 0'd code should just be deleted?
>>
>> OK, I misread your question before.
>>
>> As I read the code, our server does not appear to utilize the client's 
>> NSM state number, except for gating SM_NOTIFY requests with a 
>> previously-seen NSM state number.  The #ifdef'd code would potentially 
>> deny lock requests if it detected the state number going backwards.
>>
>> It would be nicer if the server actually tracked the client's state  
>> number, but it doesn't appear to do that today.  The #ifdef'd code  
>> serves to remind us that we should consider this.  This would also  
>> prevent a delayed SM_NOTIFY from causing the server to drop locks  
>> reacquired during the grace period accidentally.
>>
>> So I think it would be good to leave it, or replace it with a FIXME  
>> comment, for now.  Eventually we should add a little extra logic to  
>> handle this case.
>>
>>> --b.
>>>
>>>>
>>>>>> 	fl->fl_flags |= FL_ACCESS;
>>>>>> 	status = do_vfs_lock(fl);
>>>>>> diff --git a/fs/lockd/mon.c b/fs/lockd/mon.c
>>>>>> index 6d5d4a4..5017d50 100644
>>>>>> --- a/fs/lockd/mon.c
>>>>>> +++ b/fs/lockd/mon.c
>>>>>> @@ -188,8 +188,12 @@ int nsm_monitor(const struct nlm_host *host)
>>>>>> 		status = -EIO;
>>>>>> 	if (status < 0)
>>>>>> 		printk(KERN_NOTICE "lockd: cannot monitor %s\n", nsm->sm_name);
>>>>>> -	else
>>>>>> +	else {
>>>>>> 		nsm->sm_monitored = 1;
>>>>>> +		nsm_local_state = res.state;
>>>>>> +		dprintk("lockd: nsm_monitor: NSM state is now %d\n",
>>>>>> +				nsm_local_state);
>>>>>
>>>>> Could we make that a dprintk in the case where this changes  
>>>>> nsm_local
>>>>> state from something other than zero (nsm_lock_state &&
>>>>> nsm_local_state
>>>>> != res.state)?
>>>>>
>>>>> (Just to make sure no statd is returning inconsistent  
>>>>> nsm_local_stats
>>>>> here.)
>
> Having the kernel limit changes to the state number is probably not a  
> good idea.  Certain statd operations such as SM_SIMU_CRASH will modify  
> that state number.  We don't use SM_SIMU_CRASH today, but handling  
> server failover and such will likely require something like it.
>
> In any event, servers that are careful enough to track a client's NSM  
> state number will tell us pretty quickly if this is not working right.
>
>>>> I'm not sure that's a big deal, but...
>>>>
>>>> Note that the XNFS version 3 spec suggests the local lockd should
>>>> request the NSM state number when it starts up by posting an
>>>> SM_UNMON_ALL to the local statd.  That might be safer than loading  
>>>> it
>>>> after every SM_MON.
>
> So, the problem with using SM_UNMON_ALL when lockd starts up is that it 
> introduces yet another start-up ordering dependency.  In order for this 
> solution to work, statd is required to be running before lockd starts up. 
>  I think we discussed a few weeks ago how, on the server, lockd needs to 
> start first so that it is available before reboot notifications are sent.

You can start statd without sending notifications.

Note a few years ago Neil added a very detailed discussion of server and
client startup order to nfs-utils/README, worth reading.

--b.

> Even though this patch is for the client, I'm loathe to add yet another 
> start-up ordering dependency in this area.  Theoretically this stuff 
> should work correctly no matter what order you start it (especially since 
> we don't package NFS init scripts with nfs-utils).  The current proposal 
> (using the result of SM_MON) provides adequate NSM state number updates 
> without introducing new ordering constraints.

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2009-05-08 15:33 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-04-23 23:31 [PATCH 00/19] Proposed server-side patches for 2.6.31 Chuck Lever
     [not found] ` <20090423231550.17283.24432.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-23 23:31   ` [PATCH 01/19] SUNRPC: Fix error return value of svc_addr_len() Chuck Lever
     [not found]     ` <20090423233124.17283.40252.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-25 22:17       ` J. Bruce Fields
2009-04-27 16:49         ` Chuck Lever
2009-04-27 23:51           ` J. Bruce Fields
2009-04-28 15:28             ` Chuck Lever
2009-04-28 15:31               ` J. Bruce Fields
2009-04-23 23:31   ` [PATCH 02/19] NFSD: Refactor transport removal out of __write_ports() Chuck Lever
2009-04-23 23:31   ` [PATCH 03/19] NFSD: Refactor transport addition " Chuck Lever
2009-04-23 23:31   ` [PATCH 04/19] NFSD: Refactor portlist socket closing into a helper Chuck Lever
2009-04-23 23:31   ` [PATCH 05/19] NFSD: Refactor socket creation out of __write_ports() Chuck Lever
     [not found]     ` <20090423233155.17283.37345.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-25 22:40       ` J. Bruce Fields
2009-04-23 23:32   ` [PATCH 06/19] NFSD: Note an additional requirement when passing TCP sockets to portlist Chuck Lever
2009-04-23 23:32   ` [PATCH 07/19] NFSD: Finish refactoring __write_ports() Chuck Lever
2009-04-23 23:32   ` [PATCH 08/19] NFSD: move lockd_up() before svc_addsock() Chuck Lever
2009-04-23 23:32   ` [PATCH 09/19] NFSD: Prevent a buffer overflow in svc_xprt_names() Chuck Lever
     [not found]     ` <20090423233225.17283.10176.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-27 23:56       ` J. Bruce Fields
2009-04-23 23:32   ` [PATCH 10/19] SUNRPC: pass buffer size to svc_addsock() Chuck Lever
2009-04-23 23:32   ` [PATCH 11/19] SUNRPC: pass buffer size to svc_sock_names() Chuck Lever
2009-04-23 23:32   ` [PATCH 12/19] SUNRPC: Switch one_sock_name() to use snprintf() Chuck Lever
2009-04-23 23:32   ` [PATCH 13/19] SUNRPC: Support PF_INET6 in one_sock_name() Chuck Lever
2009-04-23 23:33   ` [PATCH 14/19] SUNRPC: Clean up one_sock_name() Chuck Lever
2009-04-23 23:33   ` [PATCH 15/19] NFSD: Stricter buffer size checking in write_recoverydir() Chuck Lever
2009-04-23 23:33   ` [PATCH 16/19] NFSD: Stricter buffer size checking in write_versions() Chuck Lever
2009-04-23 23:33   ` [PATCH 17/19] NFSD: Stricter buffer size checking in fs/nfsd/nfsctl.c Chuck Lever
     [not found]     ` <20090423233325.17283.71127.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-28 16:31       ` J. Bruce Fields
2009-04-28 16:36         ` Chuck Lever
2009-04-28 21:30           ` J. Bruce Fields
2009-04-23 23:33   ` [PATCH 18/19] lockd: Update NSM state from SM_MON replies Chuck Lever
     [not found]     ` <20090423233332.17283.23011.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-28 16:25       ` J. Bruce Fields
2009-04-28 16:34         ` Chuck Lever
2009-04-28 16:38           ` J. Bruce Fields
2009-04-28 19:11             ` Chuck Lever
2009-05-08 15:19               ` Chuck Lever
2009-05-08 15:33                 ` J. Bruce Fields
2009-04-23 23:33   ` [PATCH 19/19] lockd: clean up 64-bit alignment fix in nsm_init_private() Chuck Lever
     [not found]     ` <20090423233340.17283.29580.stgit-07a7zB5ZJzbwdl/1UfZZQIVfYA8g3rJ/@public.gmane.org>
2009-04-28 16:31       ` J. Bruce Fields
2009-04-28 16:35         ` Chuck Lever
2009-04-28 16:40           ` J. Bruce Fields
2009-04-28 17:24             ` Chuck Lever
2009-04-28 21:36               ` J. Bruce Fields
2009-04-28 22:03                 ` Måns Rullgård
     [not found]                   ` <yw1x63gozb9f.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
2009-04-28 22:14                     ` Chuck Lever
2009-04-28 22:11                 ` Chuck Lever
2009-04-28 22:23                   ` J. Bruce Fields
2009-04-28 22:31                   ` Måns Rullgård
     [not found]                     ` <yw1xws94xved.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
2009-04-28 22:43                       ` Chuck Lever
2009-04-28 22:52                         ` Måns Rullgård
     [not found]                           ` <yw1xskjsxuff.fsf-O+uoZmgXk1l54TAoqtyWWQ@public.gmane.org>
2009-04-29 15:16                             ` Chuck Lever
2009-04-29 18:02                               ` Måns Rullgård
2009-04-25 22:14   ` [PATCH 00/19] Proposed server-side patches for 2.6.31 J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox