public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH 0/4] utsns: RPC/NFS bug rework
@ 2009-01-06  1:13 Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 1/4] Remove useless utsname.h includes Matt Helsley
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06  1:13 UTC (permalink / raw)
  To: Linux Containers
  Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
	linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman

This series replaces the workaround for a bug reported by Cedric Le Goater 
<clg@fr.ibm.com> back in September:

> On a system with nfs mounts, if a task unshares its mount namespace,
> a oops can occur when the system is rebooted if the task is the last
> to unreference the nfs mount. It will try to create a rpc request
> using utsname() which has been invalidated by free_nsproxy().

Cedric worked around this bug by always using the initial uts namespace's
nodename (see commit: 63ffc23d307c9534c732edd87895e37b223004a3).

This is a non-ideal solution because uts namespace nodenames are reported
as the hostname during RPC authentication. Consider a machine configured
to export directories via NFS from a parent container to designated "trusted"
child containers. It would be unable to rely on the hostname during RPC
authentication -- forcing the administration of more advanced authentication
systems than might otherwise be necessary.

The goal of this series is to report a namespace's UTS nodename rather than 
the initial UTS namespace's nodename during RPC-call authentication -- much as
before Cedric's workaround. By changing the way that the nodename is cached and
fetched we can simultaneously avoid the NULL dereference during shutdown and
ensure that amalgamated RPC services (such as statd, lockd, mountd for NFS) see
a consistent nodename.

Cheers,
	-Matt Helsley

-- 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC][PATCH 1/4] Remove useless utsname.h includes
  2009-01-06  1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley
@ 2009-01-06  1:13 ` Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06  1:13 UTC (permalink / raw)
  To: Linux Containers
  Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
	linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman,
	Matt Helsley

[-- Attachment #1: remove-useless-utsname-includes.patch --]
[-- Type: text/plain, Size: 3351 bytes --]

As best I can tell these #includes aren't needed in these files.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>

---
 fs/lockd/xdr.c      |    1 -
 fs/lockd/xdr4.c     |    1 -
 fs/nfs/nfs2xdr.c    |    1 -
 fs/nfs/nfs3proc.c   |    1 -
 fs/nfs/nfs4proc.c   |    1 -
 fs/nfs/nfs4xdr.c    |    1 -
 fs/nfs/proc.c       |    1 -
 fs/nfsd/nfs4idmap.c |    1 -
 8 files changed, 8 deletions(-)

Index: linux-2.6.28/fs/lockd/xdr.c
===================================================================
--- linux-2.6.28.orig/fs/lockd/xdr.c
+++ linux-2.6.28/fs/lockd/xdr.c
@@ -8,7 +8,6 @@
 
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/utsname.h>
 #include <linux/nfs.h>
 
 #include <linux/sunrpc/xdr.h>
Index: linux-2.6.28/fs/lockd/xdr4.c
===================================================================
--- linux-2.6.28.orig/fs/lockd/xdr4.c
+++ linux-2.6.28/fs/lockd/xdr4.c
@@ -9,7 +9,6 @@
 
 #include <linux/types.h>
 #include <linux/sched.h>
-#include <linux/utsname.h>
 #include <linux/nfs.h>
 
 #include <linux/sunrpc/xdr.h>
Index: linux-2.6.28/fs/nfs/proc.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/proc.c
+++ linux-2.6.28/fs/nfs/proc.c
@@ -32,7 +32,6 @@
 #include <linux/slab.h>
 #include <linux/time.h>
 #include <linux/mm.h>
-#include <linux/utsname.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/in.h>
Index: linux-2.6.28/fs/nfs/nfs2xdr.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/nfs2xdr.c
+++ linux-2.6.28/fs/nfs/nfs2xdr.c
@@ -13,7 +13,6 @@
 #include <linux/time.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
-#include <linux/utsname.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/in.h>
Index: linux-2.6.28/fs/nfs/nfs3proc.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/nfs3proc.c
+++ linux-2.6.28/fs/nfs/nfs3proc.c
@@ -7,7 +7,6 @@
  */
 
 #include <linux/mm.h>
-#include <linux/utsname.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/sunrpc/clnt.h>
Index: linux-2.6.28/fs/nfs/nfs4proc.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/nfs4proc.c
+++ linux-2.6.28/fs/nfs/nfs4proc.c
@@ -36,7 +36,6 @@
  */
 
 #include <linux/mm.h>
-#include <linux/utsname.h>
 #include <linux/delay.h>
 #include <linux/errno.h>
 #include <linux/string.h>
Index: linux-2.6.28/fs/nfs/nfs4xdr.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/nfs4xdr.c
+++ linux-2.6.28/fs/nfs/nfs4xdr.c
@@ -39,7 +39,6 @@
 #include <linux/time.h>
 #include <linux/mm.h>
 #include <linux/slab.h>
-#include <linux/utsname.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/in.h>
Index: linux-2.6.28/fs/nfsd/nfs4idmap.c
===================================================================
--- linux-2.6.28.orig/fs/nfsd/nfs4idmap.c
+++ linux-2.6.28/fs/nfsd/nfs4idmap.c
@@ -38,7 +38,6 @@
 #include <linux/init.h>
 
 #include <linux/mm.h>
-#include <linux/utsname.h>
 #include <linux/errno.h>
 #include <linux/string.h>
 #include <linux/sunrpc/clnt.h>

-- 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06  1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 1/4] Remove useless utsname.h includes Matt Helsley
@ 2009-01-06  1:13 ` Matt Helsley
  2009-01-06 20:02   ` Serge E. Hallyn
  2009-01-06  1:13 ` [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 4/4] Represent RPC Callers Matt Helsley
  3 siblings, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-06  1:13 UTC (permalink / raw)
  To: Linux Containers
  Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
	linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman,
	Matt Helsley, Linux Containers

[-- Attachment #1: reenable-uts-ns-for-most-rpc-clients.patch --]
[-- Type: text/plain, Size: 1636 bytes --]

We can often specify the UTS namespace to use when starting an RPC client.
However sometimes no UTS namespace is available (specifically during system
shutdown as the last NFS mount in a container is unmounted) so fall
back to the initial UTS namespace.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linux Containers <containers@lists.osdl.org>

---
 net/sunrpc/clnt.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: linux-2.6.28/net/sunrpc/clnt.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/clnt.c
+++ linux-2.6.28/net/sunrpc/clnt.c
@@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
 	struct rpc_version	*version;
 	struct rpc_clnt		*clnt = NULL;
 	struct rpc_auth		*auth;
+	struct new_utsname	*uts_ns = init_utsname();
 	int err;
 	size_t len;
 
@@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
 	}
 
 	/* save the nodename */
-	clnt->cl_nodelen = strlen(init_utsname()->nodename);
+	if (current->nsproxy != NULL)
+		uts_ns = utsname();
+	clnt->cl_nodelen = strlen(uts_ns->nodename);
 	if (clnt->cl_nodelen > UNX_MAXNODENAME)
 		clnt->cl_nodelen = UNX_MAXNODENAME;
-	memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
+	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
 	rpc_register_client(clnt);
 	return clnt;
 

-- 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround
  2009-01-06  1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 1/4] Remove useless utsname.h includes Matt Helsley
  2009-01-06  1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley
@ 2009-01-06  1:13 ` Matt Helsley
  2009-01-06 16:02   ` Chuck Lever
  2009-01-06  1:13 ` [RFC][PATCH 4/4] Represent RPC Callers Matt Helsley
  3 siblings, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-06  1:13 UTC (permalink / raw)
  To: Linux Containers
  Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
	linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman,
	Matt Helsley, Linux Containers

[-- Attachment #1: reenable-uts-ns-for-nfs-clients.patch --]
[-- Type: text/plain, Size: 8868 bytes --]

We can improve upon a workaround applied in commit 
63ffc23d307c9534c732edd87895e37b223004a3 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63ffc23d307c9534c732edd87895e37b223004a3 )

The original problem was:

"On a system with nfs mounts, if a task unshares its mount namespace,
a oops can occur when the system is rebooted if the task is the last
to unreference the nfs mount. It will try to create a rpc request
using utsname() which has been invalidated by free_nsproxy()."

Cedric worked around this by always using the initial uts namespace for RPC.
Critically this workaround meant that RPC clients in uts namespaces can never
report the changed nodename when utilizing RPC. 

Fix that by storing the nodename in the NFS server structure (part of the NFS
super block) and, when an RPC client is operating on behalf of NFS, reporting
that nodename. This solves the problem for NFS clients but leaves any other
RPC users out in the cold.

Rather than caching the nodename in the client structure RPC should obtain the
nodename from RPC callers. It would then be up to those services making RPC
calls to cache the nodename for as long as necessary -- somewhat like this patch
does with NFS.

NOTE: Part of Cedric's workaround -- use of the initial uts namespace -- is
still necessary because non-NFS RPC callers still rely on the nodename cached
with the RPC client struct.

Thoughts?

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linux Containers <containers@lists.osdl.org>

---
 fs/lockd/clntproc.c         |   16 +++++++++++++---
 fs/nfs/client.c             |    6 ++++++
 fs/nfs/super.c              |    6 ++++++
 include/linux/nfs_fs.h      |    2 ++
 include/linux/nfs_fs_sb.h   |    3 +++
 include/linux/sunrpc/clnt.h |    2 ++
 net/sunrpc/auth_unix.c      |   13 ++++++++++++-
 7 files changed, 44 insertions(+), 4 deletions(-)

Index: linux-2.6.28/include/linux/sunrpc/clnt.h
===================================================================
--- linux-2.6.28.orig/include/linux/sunrpc/clnt.h
+++ linux-2.6.28/include/linux/sunrpc/clnt.h
@@ -19,6 +19,7 @@
 #include <asm/signal.h>
 
 struct rpc_inode;
+struct nfs_server;
 
 /*
  * The high-level client handle
@@ -50,6 +51,7 @@ struct rpc_clnt {
 
 	int			cl_nodelen;	/* nodename length */
 	char 			cl_nodename[UNX_MAXNODENAME];
+	struct nfs_server	*cl_nfs_server;
 	char			cl_pathname[30];/* Path in rpc_pipe_fs */
 	struct vfsmount *	cl_vfsmnt;
 	struct dentry *		cl_dentry;	/* inode */
Index: linux-2.6.28/net/sunrpc/auth_unix.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/auth_unix.c
+++ linux-2.6.28/net/sunrpc/auth_unix.c
@@ -12,6 +12,13 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/auth.h>
 
+#include <linux/nfs.h>
+#include <linux/nfs2.h>
+#include <linux/nfs3.h>
+#include <linux/nfs4.h>
+#include <linux/nfs_xdr.h>
+#include <linux/nfs_fs_sb.h>
+
 #define NFS_NGROUPS	16
 
 struct unx_cred {
@@ -151,7 +158,11 @@ unx_marshal(struct rpc_task *task, __be3
 	/*
 	 * Copy the UTS nodename captured when the client was created.
 	 */
-	p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
+	if (clnt->cl_nfs_server)
+		p = xdr_encode_array(p, clnt->cl_nfs_server->nodename,
+				     clnt->cl_nfs_server->nodelen);
+	else
+		p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
 
 	*p++ = htonl((u32) cred->uc_uid);
 	*p++ = htonl((u32) cred->uc_gid);
Index: linux-2.6.28/include/linux/nfs_fs_sb.h
===================================================================
--- linux-2.6.28.orig/include/linux/nfs_fs_sb.h
+++ linux-2.6.28/include/linux/nfs_fs_sb.h
@@ -126,6 +126,9 @@ struct nfs_server {
 	u32			mountd_version;
 	unsigned short		mountd_port;
 	unsigned short		mountd_protocol;
+
+	int			nodelen;	/* nodename length */
+	char 			nodename[UNX_MAXNODENAME];
 };
 
 /* Server capabilities */
Index: linux-2.6.28/fs/nfs/super.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/super.c
+++ linux-2.6.28/fs/nfs/super.c
@@ -51,6 +51,7 @@
 #include <linux/nfs_xdr.h>
 #include <linux/magic.h>
 #include <linux/parser.h>
+#include <linux/utsname.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -1830,6 +1831,11 @@ static void nfs_fill_super(struct super_
 		sb->s_time_gran = 1;
 	}
 
+	server->nodelen = strlen(utsname()->nodename);
+	if (server->nodelen > UNX_MAXNODENAME)
+		server->nodelen = UNX_MAXNODENAME;
+	memcpy(server->nodename, utsname()->nodename, server->nodelen);
+
 	sb->s_op = &nfs_sops;
  	nfs_initialise_sb(sb);
 }
Index: linux-2.6.28/fs/nfs/client.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/client.c
+++ linux-2.6.28/fs/nfs/client.c
@@ -25,10 +25,12 @@
 #include <linux/sunrpc/metrics.h>
 #include <linux/sunrpc/xprtsock.h>
 #include <linux/sunrpc/xprtrdma.h>
+#include <linux/sunrpc/svc.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_mount.h>
 #include <linux/nfs4_mount.h>
 #include <linux/lockd/bind.h>
+#include <linux/lockd/lockd.h>
 #include <linux/seq_file.h>
 #include <linux/mount.h>
 #include <linux/nfs_idmap.h>
@@ -47,6 +49,7 @@
 #include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
+#define NLMDBG_FACILITY		NLMDBG_CLIENT
 
 static DEFINE_SPINLOCK(nfs_client_lock);
 static LIST_HEAD(nfs_client_list);
@@ -555,6 +558,7 @@ static void nfs_init_server_aclclient(st
 
 	/* No errors! Assume that Sun nfsacls are supported */
 	server->caps |= NFS_CAP_ACLS;
+	server->client_acl->cl_nfs_server = server;
 	return;
 
 out_noacl:
@@ -673,6 +677,7 @@ static int nfs_init_server(struct nfs_se
 		goto error;
 
 	server->nfs_client = clp;
+ 	clp->cl_rpcclient->cl_nfs_server = server;
 
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags;
@@ -1035,6 +1040,7 @@ static int nfs4_set_client(struct nfs_se
 		goto error_put;
 
 	server->nfs_client = clp;
+	clp->cl_rpcclient->cl_nfs_server = server;
 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
 	return 0;
 
Index: linux-2.6.28/fs/lockd/clntproc.c
===================================================================
--- linux-2.6.28.orig/fs/lockd/clntproc.c
+++ linux-2.6.28/fs/lockd/clntproc.c
@@ -10,8 +10,8 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/fs.h>
+#include <linux/mount.h>
 #include <linux/nfs_fs.h>
-#include <linux/utsname.h>
 #include <linux/freezer.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svc.h>
@@ -118,6 +118,11 @@ static struct nlm_lockowner *nlm_find_lo
 	return res;
 }
 
+struct nfs_server *fl_nfs_server(struct file_lock *fl)
+{
+	return NFS_SB(fl->fl_file->f_path.mnt->mnt_sb);
+}
+
 /*
  * Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls
  */
@@ -125,15 +130,18 @@ static void nlmclnt_setlockargs(struct n
 {
 	struct nlm_args	*argp = &req->a_args;
 	struct nlm_lock	*lock = &argp->lock;
+	char *nodename;
 
 	nlmclnt_next_cookie(&argp->cookie);
 	argp->state   = nsm_local_state;
 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), sizeof(struct nfs_fh));
-	lock->caller  = utsname()->nodename;
+
+	nodename = fl_nfs_server(fl)->nodename;
+	lock->caller  = nodename;
 	lock->oh.data = req->a_owner;
 	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
 				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
-				utsname()->nodename);
+				nodename);
 	lock->svid = fl->fl_u.nfs_fl.owner->pid;
 	lock->fl.fl_start = fl->fl_start;
 	lock->fl.fl_end = fl->fl_end;
@@ -272,6 +280,7 @@ nlmclnt_call(struct rpc_cred *cred, stru
 		/* If we have no RPC client yet, create one. */
 		if ((clnt = nlm_bind_host(host)) == NULL)
 			return -ENOLCK;
+		clnt->cl_nfs_server = fl_nfs_server(&argp->lock.fl);
 		msg.rpc_proc = &clnt->cl_procinfo[proc];
 
 		/* Perform the RPC call. If an error occurs, try again */
@@ -344,6 +353,7 @@ static struct rpc_task *__nlm_async_call
 	clnt = nlm_bind_host(host);
 	if (clnt == NULL)
 		goto out_err;
+	clnt->cl_nfs_server = fl_nfs_server(&req->a_args.lock.fl);
 	msg->rpc_proc = &clnt->cl_procinfo[proc];
 	task_setup_data.rpc_client = clnt;
 
Index: linux-2.6.28/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.28.orig/include/linux/nfs_fs.h
+++ linux-2.6.28/include/linux/nfs_fs.h
@@ -262,6 +262,8 @@ static inline __u64 NFS_FILEID(const str
 	return NFS_I(inode)->fileid;
 }
 
+struct nfs_server *fl_nfs_server(struct file_lock *fl);
+
 static inline void set_nfs_fileid(struct inode *inode, __u64 fileid)
 {
 	NFS_I(inode)->fileid = fileid;

-- 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [RFC][PATCH 4/4] Represent RPC Callers
  2009-01-06  1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley
                   ` (2 preceding siblings ...)
  2009-01-06  1:13 ` [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround Matt Helsley
@ 2009-01-06  1:13 ` Matt Helsley
  2009-01-06 13:04   ` Trond Myklebust
  3 siblings, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-06  1:13 UTC (permalink / raw)
  To: Linux Containers
  Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
	linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman,
	Matt Helsley, Linux Containers

[-- Attachment #1: move-rpc-client-nodename-cache.patch --]
[-- Type: text/plain, Size: 13549 bytes --]

Currently RPC needs to know the nodename (often the same as the hostname) which
should be used for UNIX-style authentication and file-lock tracking. Because
hostname can change between RPC calls and some sequences of RPC calls may
require consistent names between calls RPC currently saves the nodename with
the RPC client structure.

This is doesn't always work because RPC clients may be discarded over the
lifetime of a higher level service -- like those that compose NFS. Specifically
this is known to happen during shutdown.

Hence RPC should expect the nodename to be saved by the caller when sequences
of RPC calls requiring consistent nodenames may be needed (e.g. NFS). To enable
this we introduce an RPC caller structure that allows RPC to query the caller
for this information.

This patch is not complete but is meant to indicate the direction I'm planning
on going. I'd like to know if there are any objections or if anyone sees a
better way to handle this.

Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: linux-nfs@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Linux Containers <containers@lists.osdl.org>

---
 fs/lockd/clntproc.c         |   24 ++++++---------
 fs/nfs/client.c             |    6 ---
 fs/nfs/super.c              |    8 +----
 include/linux/lockd/xdr.h   |    3 -
 include/linux/nfs_fs.h      |    2 -
 include/linux/nfs_fs_sb.h   |    3 -
 include/linux/sunrpc/clnt.h |   13 +++++---
 net/sunrpc/auth_unix.c      |   17 ++--------
 net/sunrpc/clnt.c           |   70 +++++++++++++++++++++++++++++++++++++-------
 9 files changed, 88 insertions(+), 58 deletions(-)

Index: linux-2.6.28/include/linux/sunrpc/clnt.h
===================================================================
--- linux-2.6.28.orig/include/linux/sunrpc/clnt.h
+++ linux-2.6.28/include/linux/sunrpc/clnt.h
@@ -19,7 +19,14 @@
 #include <asm/signal.h>
 
 struct rpc_inode;
-struct nfs_server;
+
+struct rpc_caller {
+	struct kref	caller_kref;	/* Number of references from rpc_clnt */
+	void		(*fill_nodename) (struct rpc_caller *caller,
+			     		  char nodename[UNX_MAXNODENAME],
+			     		  int *nodelen);
+	void		(*destroy) (struct kref *caller_kref);
+};
 
 /*
  * The high-level client handle
@@ -49,9 +56,7 @@ struct rpc_clnt {
 	struct rpc_rtt *	cl_rtt;		/* RTO estimator data */
 	const struct rpc_timeout *cl_timeout;	/* Timeout strategy */
 
-	int			cl_nodelen;	/* nodename length */
-	char 			cl_nodename[UNX_MAXNODENAME];
-	struct nfs_server	*cl_nfs_server;
+	struct rpc_caller	*cl_local_caller;
 	char			cl_pathname[30];/* Path in rpc_pipe_fs */
 	struct vfsmount *	cl_vfsmnt;
 	struct dentry *		cl_dentry;	/* inode */
Index: linux-2.6.28/net/sunrpc/clnt.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/clnt.c
+++ linux-2.6.28/net/sunrpc/clnt.c
@@ -122,13 +122,63 @@ rpc_setup_pipedir(struct rpc_clnt *clnt,
 	}
 }
 
-static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args, struct rpc_xprt *xprt)
+/* This RPC caller does not supply a means for RPC to query it. */
+struct rpc_anonymous_caller {
+	struct rpc_caller caller;
+	int nodelen;
+	char nodename[UNX_MAXNODENAME]
+};
+
+static void rpc_anonymous_nodename(struct rpc_caller *caller,
+			  	   char nodename[UNX_MAXNODENAME],
+				   int *nodelen)
+{
+	struct rpc_anonymous_caller *anon;
+
+	anon = container_of(caller, struct rpc_anonymous, caller);
+	memcpy(nodename, anon->nodename, anon->nodelen);
+	*nodelen = anon->nodelen;
+}
+
+static void rpc_anon_caller_destroy(struct kref *caller_ref)
+{
+	struct rpc_anonymous_caller *anon;
+	struct rpc_caller *caller = container_of(caller_ref, struct rpc_caller,
+					 	 caller_ref);
+
+	anon = container_of(caller, struct rpc_anonymous_caller, caller);
+	kfree(anon);
+}
+
+static struct rpc_caller *rpc_new_anon_caller(void)
+{
+	struct rpc_anonymous_caller	*anon;
+	struct new_utsname		*uts_ns;
+
+       	anon = kmalloc(GFP_KERNEL, sizeof(*anon));
+	if (anon == NULL)
+		return NULL;
+	uts_ns = init_utsname();
+	if (current->nsproxy != NULL)
+		uts_ns = utsname();
+	anon->nodelen = strlen(uts_ns->nodename);
+	if (anon->nodelen > UNX_MAXNODENAME)
+		anon->nodelen = UNX_MAXNODENAME;
+	memcpy(anon->nodename, uts_ns->nodename, anon->nodelen);
+	anon->caller.nodename = rpc_anonymous_nodename;
+	anon->caller.destroy = rpc_anon_caller_destroy;
+	kref_init(&anon->caller.caller_kref);
+	return &anon->caller;
+}
+
+static struct rpc_clnt * rpc_new_client(const struct rpc_create_args *args,
+					struct rpc_xprt *xprt,
+					struct rpc_caller *local_caller)
 {
 	struct rpc_program	*program = args->program;
 	struct rpc_version	*version;
 	struct rpc_clnt		*clnt = NULL;
 	struct rpc_auth		*auth;
-	struct new_utsname	*uts_ns = init_utsname();
 	int err;
 	size_t len;
 
@@ -213,13 +263,8 @@ static struct rpc_clnt * rpc_new_client(
 		goto out_no_auth;
 	}
 
-	/* save the nodename */
-	if (current->nsproxy != NULL)
-		uts_ns = utsname();
-	clnt->cl_nodelen = strlen(uts_ns->nodename);
-	if (clnt->cl_nodelen > UNX_MAXNODENAME)
-		clnt->cl_nodelen = UNX_MAXNODENAME;
-	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
+	kref_get(&local_caller->caller_kref);
+	clnt->local_caller = local_caller;
 	rpc_register_client(clnt);
 	return clnt;
 
@@ -252,7 +297,8 @@ out_no_rpciod:
  * it supports this program and version.  RPC_CLNT_CREATE_NOPING disables
  * this behavior so asynchronous tasks can also use rpc_create.
  */
-struct rpc_clnt *rpc_create(struct rpc_create_args *args)
+struct rpc_clnt *rpc_create(struct rpc_create_args *args,
+			    struct rpc_caller *local_caller)
 {
 	struct rpc_xprt *xprt;
 	struct rpc_clnt *clnt;
@@ -307,7 +353,7 @@ struct rpc_clnt *rpc_create(struct rpc_c
 	if (args->flags & RPC_CLNT_CREATE_NONPRIVPORT)
 		xprt->resvport = 0;
 
-	clnt = rpc_new_client(args, xprt);
+	clnt = rpc_new_client(args, xprt, local_caller);
 	if (IS_ERR(clnt))
 		return clnt;
 
@@ -365,6 +411,7 @@ rpc_clone_client(struct rpc_clnt *clnt)
 		atomic_inc(&new->cl_auth->au_count);
 	xprt_get(clnt->cl_xprt);
 	kref_get(&clnt->cl_kref);
+	kref_get(&clnt->local_caller->caller_kref);
 	rpc_register_client(new);
 	rpciod_up();
 	return new;
@@ -422,6 +469,7 @@ out_free:
 	rpc_free_iostats(clnt->cl_metrics);
 	clnt->cl_metrics = NULL;
 	xprt_put(clnt->cl_xprt);
+	kref_put(&clnt->local_caller->caller_kref, clnt->local_caller->destroy);
 	rpciod_down();
 	kfree(clnt);
 }
Index: linux-2.6.28/net/sunrpc/auth_unix.c
===================================================================
--- linux-2.6.28.orig/net/sunrpc/auth_unix.c
+++ linux-2.6.28/net/sunrpc/auth_unix.c
@@ -12,13 +12,6 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/auth.h>
 
-#include <linux/nfs.h>
-#include <linux/nfs2.h>
-#include <linux/nfs3.h>
-#include <linux/nfs4.h>
-#include <linux/nfs_xdr.h>
-#include <linux/nfs_fs_sb.h>
-
 #define NFS_NGROUPS	16
 
 struct unx_cred {
@@ -150,6 +143,8 @@ unx_marshal(struct rpc_task *task, __be3
 	struct unx_cred	*cred = container_of(task->tk_msg.rpc_cred, struct unx_cred, uc_base);
 	__be32		*base, *hold;
 	int		i;
+	int cl_nodelen;
+	char cl_nodename[UNX_MAXNODENAME]
 
 	*p++ = htonl(RPC_AUTH_UNIX);
 	base = p++;
@@ -158,12 +153,8 @@ unx_marshal(struct rpc_task *task, __be3
 	/*
 	 * Copy the UTS nodename captured when the client was created.
 	 */
-	if (clnt->cl_nfs_server)
-		p = xdr_encode_array(p, clnt->cl_nfs_server->nodename,
-				     clnt->cl_nfs_server->nodelen);
-	else
-		p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
-
+	clnt->caller->nodename(cl_nodename, &cl_nodelen);
+	p = xdr_encode_array(p, cl_nodename, cl_nodelen);
 	*p++ = htonl((u32) cred->uc_uid);
 	*p++ = htonl((u32) cred->uc_gid);
 	hold = p++;
Index: linux-2.6.28/fs/nfs/super.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/super.c
+++ linux-2.6.28/fs/nfs/super.c
@@ -1831,11 +1831,9 @@ static void nfs_fill_super(struct super_
 		sb->s_time_gran = 1;
 	}
 
-	server->nodelen = strlen(utsname()->nodename);
-	if (server->nodelen > UNX_MAXNODENAME)
-		server->nodelen = UNX_MAXNODENAME;
-	memcpy(server->nodename, utsname()->nodename, server->nodelen);
-
+	server->rpc_caller = rpc_new_anon_caller();
+	if (server->rpc_caller)
+		kref_get(&server->rpc_caller->caller_kref);
 	sb->s_op = &nfs_sops;
  	nfs_initialise_sb(sb);
 }
Index: linux-2.6.28/fs/lockd/clntproc.c
===================================================================
--- linux-2.6.28.orig/fs/lockd/clntproc.c
+++ linux-2.6.28/fs/lockd/clntproc.c
@@ -118,11 +118,6 @@ static struct nlm_lockowner *nlm_find_lo
 	return res;
 }
 
-struct nfs_server *fl_nfs_server(struct file_lock *fl)
-{
-	return NFS_SB(fl->fl_file->f_path.mnt->mnt_sb);
-}
-
 /*
  * Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls
  */
@@ -130,18 +125,23 @@ static void nlmclnt_setlockargs(struct n
 {
 	struct nlm_args	*argp = &req->a_args;
 	struct nlm_lock	*lock = &argp->lock;
-	char *nodename;
+	struct rpc_clnt *clp;
+	unsigned int nodename_len;
 
 	nlmclnt_next_cookie(&argp->cookie);
 	argp->state   = nsm_local_state;
 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode), sizeof(struct nfs_fh));
 
-	nodename = fl_nfs_server(fl)->nodename;
-	lock->caller  = nodename;
 	lock->oh.data = req->a_owner;
-	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
-				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
-				nodename);
+	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@",
+				 (unsigned int)fl->fl_u.nfs_fl.owner->pid);
+	lock->caller = &lock->oh.data[lock->oh.len];
+	clp = req->a_host->h_rpcclnt;
+	clp->cl_local_caller->nodename(clp->cl_local_caller, lock->caller,
+	   			       &nodename_len);
+	lock->oh.len += nodename_len;
+	lock->oh.data[lock->oh.len] = '\0';
+
 	lock->svid = fl->fl_u.nfs_fl.owner->pid;
 	lock->fl.fl_start = fl->fl_start;
 	lock->fl.fl_end = fl->fl_end;
@@ -280,7 +280,6 @@ nlmclnt_call(struct rpc_cred *cred, stru
 		/* If we have no RPC client yet, create one. */
 		if ((clnt = nlm_bind_host(host)) == NULL)
 			return -ENOLCK;
-		clnt->cl_nfs_server = fl_nfs_server(&argp->lock.fl);
 		msg.rpc_proc = &clnt->cl_procinfo[proc];
 
 		/* Perform the RPC call. If an error occurs, try again */
@@ -353,7 +352,6 @@ static struct rpc_task *__nlm_async_call
 	clnt = nlm_bind_host(host);
 	if (clnt == NULL)
 		goto out_err;
-	clnt->cl_nfs_server = fl_nfs_server(&req->a_args.lock.fl);
 	msg->rpc_proc = &clnt->cl_procinfo[proc];
 	task_setup_data.rpc_client = clnt;
 
Index: linux-2.6.28/fs/nfs/client.c
===================================================================
--- linux-2.6.28.orig/fs/nfs/client.c
+++ linux-2.6.28/fs/nfs/client.c
@@ -25,12 +25,10 @@
 #include <linux/sunrpc/metrics.h>
 #include <linux/sunrpc/xprtsock.h>
 #include <linux/sunrpc/xprtrdma.h>
-#include <linux/sunrpc/svc.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfs_mount.h>
 #include <linux/nfs4_mount.h>
 #include <linux/lockd/bind.h>
-#include <linux/lockd/lockd.h>
 #include <linux/seq_file.h>
 #include <linux/mount.h>
 #include <linux/nfs_idmap.h>
@@ -49,7 +47,6 @@
 #include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
-#define NLMDBG_FACILITY		NLMDBG_CLIENT
 
 static DEFINE_SPINLOCK(nfs_client_lock);
 static LIST_HEAD(nfs_client_list);
@@ -558,7 +555,6 @@ static void nfs_init_server_aclclient(st
 
 	/* No errors! Assume that Sun nfsacls are supported */
 	server->caps |= NFS_CAP_ACLS;
-	server->client_acl->cl_nfs_server = server;
 	return;
 
 out_noacl:
@@ -677,7 +673,6 @@ static int nfs_init_server(struct nfs_se
 		goto error;
 
 	server->nfs_client = clp;
- 	clp->cl_rpcclient->cl_nfs_server = server;
 
 	/* Initialise the client representation from the mount data */
 	server->flags = data->flags;
@@ -1040,7 +1035,6 @@ static int nfs4_set_client(struct nfs_se
 		goto error_put;
 
 	server->nfs_client = clp;
-	clp->cl_rpcclient->cl_nfs_server = server;
 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
 	return 0;
 
Index: linux-2.6.28/include/linux/lockd/xdr.h
===================================================================
--- linux-2.6.28.orig/include/linux/lockd/xdr.h
+++ linux-2.6.28/include/linux/lockd/xdr.h
@@ -28,8 +28,7 @@ struct svc_rqst;
 
 /* Lock info passed via NLM */
 struct nlm_lock {
-	char *			caller;
-	unsigned int		len; 	/* length of "caller" */
+	struct rpc_caller	*caller;
 	struct nfs_fh		fh;
 	struct xdr_netobj	oh;
 	u32			svid;
Index: linux-2.6.28/include/linux/nfs_fs.h
===================================================================
--- linux-2.6.28.orig/include/linux/nfs_fs.h
+++ linux-2.6.28/include/linux/nfs_fs.h
@@ -262,8 +262,6 @@ static inline __u64 NFS_FILEID(const str
 	return NFS_I(inode)->fileid;
 }
 
-struct nfs_server *fl_nfs_server(struct file_lock *fl);
-
 static inline void set_nfs_fileid(struct inode *inode, __u64 fileid)
 {
 	NFS_I(inode)->fileid = fileid;
Index: linux-2.6.28/include/linux/nfs_fs_sb.h
===================================================================
--- linux-2.6.28.orig/include/linux/nfs_fs_sb.h
+++ linux-2.6.28/include/linux/nfs_fs_sb.h
@@ -127,8 +127,7 @@ struct nfs_server {
 	unsigned short		mountd_port;
 	unsigned short		mountd_protocol;
 
-	int			nodelen;	/* nodename length */
-	char 			nodename[UNX_MAXNODENAME];
+	struct rpc_caller	*rpc_caller;
 };
 
 /* Server capabilities */

-- 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 4/4] Represent RPC Callers
  2009-01-06  1:13 ` [RFC][PATCH 4/4] Represent RPC Callers Matt Helsley
@ 2009-01-06 13:04   ` Trond Myklebust
  2009-01-06 23:05     ` Matt Helsley
  0 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 13:04 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Linux Containers, J. Bruce Fields, Cedric Le Goater,
	Linux Kernel Mailing List, linux-nfs, Chuck Lever,
	Eric W. Biederman, Linux Containers

On Mon, 2009-01-05 at 17:13 -0800, Matt Helsley wrote:
> plain text document attachment (move-rpc-client-nodename-cache.patch)
> Currently RPC needs to know the nodename (often the same as the hostname) which
> should be used for UNIX-style authentication and file-lock tracking. Because
> hostname can change between RPC calls and some sequences of RPC calls may
> require consistent names between calls RPC currently saves the nodename with
> the RPC client structure.
> 
> This is doesn't always work because RPC clients may be discarded over the
> lifetime of a higher level service -- like those that compose NFS. Specifically
> this is known to happen during shutdown.
> 
> Hence RPC should expect the nodename to be saved by the caller when sequences
> of RPC calls requiring consistent nodenames may be needed (e.g. NFS). To enable
> this we introduce an RPC caller structure that allows RPC to query the caller
> for this information.
> 
> This patch is not complete but is meant to indicate the direction I'm planning
> on going. I'd like to know if there are any objections or if anyone sees a
> better way to handle this.

You're planning on slowing down every RPC call in order to fix a problem
on client shutdown? Why?

Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround
  2009-01-06  1:13 ` [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround Matt Helsley
@ 2009-01-06 16:02   ` Chuck Lever
  2009-01-07  0:28     ` Matt Helsley
  2009-01-07  3:02     ` Matt Helsley
  0 siblings, 2 replies; 47+ messages in thread
From: Chuck Lever @ 2009-01-06 16:02 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Linux Containers, J. Bruce Fields, Cedric Le Goater,
	Linux Kernel Mailing List, linux-nfs, Trond Myklebust,
	Eric W. Biederman, Linux Containers

Matt-

Thanks for pursuing a permanent fix for this.

On Jan 5, 2009, at Jan 5, 2009, 8:13 PM, Matt Helsley wrote:

> We can improve upon a workaround applied in commit
> 63ffc23d307c9534c732edd87895e37b223004a3 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63ffc23d307c9534c732edd87895e37b223004a3 
>  )
>
> The original problem was:
>
> "On a system with nfs mounts, if a task unshares its mount namespace,
> a oops can occur when the system is rebooted if the task is the last
> to unreference the nfs mount. It will try to create a rpc request
> using utsname() which has been invalidated by free_nsproxy()."
>
> Cedric worked around this by always using the initial uts namespace  
> for RPC.
> Critically this workaround meant that RPC clients in uts namespaces  
> can never
> report the changed nodename when utilizing RPC.
>
> Fix that by storing the nodename in the NFS server structure (part  
> of the NFS
> super block) and, when an RPC client is operating on behalf of NFS,  
> reporting
> that nodename. This solves the problem for NFS clients but leaves  
> any other
> RPC users out in the cold.
>
> Rather than caching the nodename in the client structure RPC should  
> obtain the
> nodename from RPC callers. It would then be up to those services  
> making RPC
> calls to cache the nodename for as long as necessary -- somewhat  
> like this patch
> does with NFS.

Instead of having the RPC client call the consumer back, why can't you  
pass the nodename as an argument to each RPC call; say, via the  
rpc_message structure?

> NOTE: Part of Cedric's workaround -- use of the initial uts  
> namespace -- is
> still necessary because non-NFS RPC callers still rely on the  
> nodename cached
> with the RPC client struct.

In the long run I think it would be more useful to spell out where  
each consumer gets its nodename value, rather than having a convenient  
default value.  A default would encourage exposing nodenames  
inappropriately due to sloppy coding and incorrect assumptions (on the  
developer's part) about a complex API.

Where would NFSv4 callbacks get a nodename, for example?  And what  
about rpcbind requests?

> Thoughts?


More comments below.

> Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> Cc: Cedric Le Goater <clg@fr.ibm.com>
> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Cc: linux-nfs@vger.kernel.org
> Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Linux Containers <containers@lists.osdl.org>
>
> ---
> fs/lockd/clntproc.c         |   16 +++++++++++++---
> fs/nfs/client.c             |    6 ++++++
> fs/nfs/super.c              |    6 ++++++
> include/linux/nfs_fs.h      |    2 ++
> include/linux/nfs_fs_sb.h   |    3 +++
> include/linux/sunrpc/clnt.h |    2 ++
> net/sunrpc/auth_unix.c      |   13 ++++++++++++-
> 7 files changed, 44 insertions(+), 4 deletions(-)
>
> Index: linux-2.6.28/include/linux/sunrpc/clnt.h
> ===================================================================
> --- linux-2.6.28.orig/include/linux/sunrpc/clnt.h
> +++ linux-2.6.28/include/linux/sunrpc/clnt.h
> @@ -19,6 +19,7 @@
> #include <asm/signal.h>
>
> struct rpc_inode;
> +struct nfs_server;
>
> /*
>  * The high-level client handle
> @@ -50,6 +51,7 @@ struct rpc_clnt {
>
> 	int			cl_nodelen;	/* nodename length */
> 	char 			cl_nodename[UNX_MAXNODENAME];
> +	struct nfs_server	*cl_nfs_server;

This is a layering violation...  I would rather avoid introducing new  
strong data structure dependencies on one of RPC's consumers.

> 	char			cl_pathname[30];/* Path in rpc_pipe_fs */
> 	struct vfsmount *	cl_vfsmnt;
> 	struct dentry *		cl_dentry;	/* inode */
> Index: linux-2.6.28/net/sunrpc/auth_unix.c
> ===================================================================
> --- linux-2.6.28.orig/net/sunrpc/auth_unix.c
> +++ linux-2.6.28/net/sunrpc/auth_unix.c
> @@ -12,6 +12,13 @@
> #include <linux/sunrpc/clnt.h>
> #include <linux/sunrpc/auth.h>
>
> +#include <linux/nfs.h>
> +#include <linux/nfs2.h>
> +#include <linux/nfs3.h>
> +#include <linux/nfs4.h>
> +#include <linux/nfs_xdr.h>
> +#include <linux/nfs_fs_sb.h>
> +
> #define NFS_NGROUPS	16
>
> struct unx_cred {
> @@ -151,7 +158,11 @@ unx_marshal(struct rpc_task *task, __be3
> 	/*
> 	 * Copy the UTS nodename captured when the client was created.
> 	 */
> -	p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
> +	if (clnt->cl_nfs_server)
> +		p = xdr_encode_array(p, clnt->cl_nfs_server->nodename,
> +				     clnt->cl_nfs_server->nodelen);
> +	else
> +		p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
>
> 	*p++ = htonl((u32) cred->uc_uid);
> 	*p++ = htonl((u32) cred->uc_gid);
> Index: linux-2.6.28/include/linux/nfs_fs_sb.h
> ===================================================================
> --- linux-2.6.28.orig/include/linux/nfs_fs_sb.h
> +++ linux-2.6.28/include/linux/nfs_fs_sb.h
> @@ -126,6 +126,9 @@ struct nfs_server {
> 	u32			mountd_version;
> 	unsigned short		mountd_port;
> 	unsigned short		mountd_protocol;
> +
> +	int			nodelen;	/* nodename length */
> +	char 			nodename[UNX_MAXNODENAME];
> };
>
> /* Server capabilities */
> Index: linux-2.6.28/fs/nfs/super.c
> ===================================================================
> --- linux-2.6.28.orig/fs/nfs/super.c
> +++ linux-2.6.28/fs/nfs/super.c
> @@ -51,6 +51,7 @@
> #include <linux/nfs_xdr.h>
> #include <linux/magic.h>
> #include <linux/parser.h>
> +#include <linux/utsname.h>
>
> #include <asm/system.h>
> #include <asm/uaccess.h>
> @@ -1830,6 +1831,11 @@ static void nfs_fill_super(struct super_
> 		sb->s_time_gran = 1;
> 	}
>
> +	server->nodelen = strlen(utsname()->nodename);
> +	if (server->nodelen > UNX_MAXNODENAME)
> +		server->nodelen = UNX_MAXNODENAME;
> +	memcpy(server->nodename, utsname()->nodename, server->nodelen);
> +
> 	sb->s_op = &nfs_sops;
>  	nfs_initialise_sb(sb);
> }
> Index: linux-2.6.28/fs/nfs/client.c
> ===================================================================
> --- linux-2.6.28.orig/fs/nfs/client.c
> +++ linux-2.6.28/fs/nfs/client.c
> @@ -25,10 +25,12 @@
> #include <linux/sunrpc/metrics.h>
> #include <linux/sunrpc/xprtsock.h>
> #include <linux/sunrpc/xprtrdma.h>
> +#include <linux/sunrpc/svc.h>
> #include <linux/nfs_fs.h>
> #include <linux/nfs_mount.h>
> #include <linux/nfs4_mount.h>
> #include <linux/lockd/bind.h>
> +#include <linux/lockd/lockd.h>
> #include <linux/seq_file.h>
> #include <linux/mount.h>
> #include <linux/nfs_idmap.h>
> @@ -47,6 +49,7 @@
> #include "internal.h"
>
> #define NFSDBG_FACILITY		NFSDBG_CLIENT
> +#define NLMDBG_FACILITY		NLMDBG_CLIENT
>
> static DEFINE_SPINLOCK(nfs_client_lock);
> static LIST_HEAD(nfs_client_list);
> @@ -555,6 +558,7 @@ static void nfs_init_server_aclclient(st
>
> 	/* No errors! Assume that Sun nfsacls are supported */
> 	server->caps |= NFS_CAP_ACLS;
> +	server->client_acl->cl_nfs_server = server;
> 	return;
>
> out_noacl:
> @@ -673,6 +677,7 @@ static int nfs_init_server(struct nfs_se
> 		goto error;
>
> 	server->nfs_client = clp;
> + 	clp->cl_rpcclient->cl_nfs_server = server;
>
> 	/* Initialise the client representation from the mount data */
> 	server->flags = data->flags;
> @@ -1035,6 +1040,7 @@ static int nfs4_set_client(struct nfs_se
> 		goto error_put;
>
> 	server->nfs_client = clp;
> +	clp->cl_rpcclient->cl_nfs_server = server;
> 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
> 	return 0;
>
> Index: linux-2.6.28/fs/lockd/clntproc.c
> ===================================================================
> --- linux-2.6.28.orig/fs/lockd/clntproc.c
> +++ linux-2.6.28/fs/lockd/clntproc.c
> @@ -10,8 +10,8 @@
> #include <linux/types.h>
> #include <linux/errno.h>
> #include <linux/fs.h>
> +#include <linux/mount.h>
> #include <linux/nfs_fs.h>
> -#include <linux/utsname.h>
> #include <linux/freezer.h>
> #include <linux/sunrpc/clnt.h>
> #include <linux/sunrpc/svc.h>
> @@ -118,6 +118,11 @@ static struct nlm_lockowner *nlm_find_lo
> 	return res;
> }
>
> +struct nfs_server *fl_nfs_server(struct file_lock *fl)
> +{
> +	return NFS_SB(fl->fl_file->f_path.mnt->mnt_sb);
> +}
> +
> /*
>  * Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls
>  */
> @@ -125,15 +130,18 @@ static void nlmclnt_setlockargs(struct n
> {
> 	struct nlm_args	*argp = &req->a_args;
> 	struct nlm_lock	*lock = &argp->lock;
> +	char *nodename;
>
> 	nlmclnt_next_cookie(&argp->cookie);
> 	argp->state   = nsm_local_state;
> 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),  
> sizeof(struct nfs_fh));
> -	lock->caller  = utsname()->nodename;
> +
> +	nodename = fl_nfs_server(fl)->nodename;
> +	lock->caller  = nodename;
> 	lock->oh.data = req->a_owner;
> 	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> 				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
> -				utsname()->nodename);
> +				nodename);
> 	lock->svid = fl->fl_u.nfs_fl.owner->pid;
> 	lock->fl.fl_start = fl->fl_start;
> 	lock->fl.fl_end = fl->fl_end;
> @@ -272,6 +280,7 @@ nlmclnt_call(struct rpc_cred *cred, stru
> 		/* If we have no RPC client yet, create one. */
> 		if ((clnt = nlm_bind_host(host)) == NULL)
> 			return -ENOLCK;
> +		clnt->cl_nfs_server = fl_nfs_server(&argp->lock.fl);
> 		msg.rpc_proc = &clnt->cl_procinfo[proc];
>
> 		/* Perform the RPC call. If an error occurs, try again */
> @@ -344,6 +353,7 @@ static struct rpc_task *__nlm_async_call
> 	clnt = nlm_bind_host(host);
> 	if (clnt == NULL)
> 		goto out_err;
> +	clnt->cl_nfs_server = fl_nfs_server(&req->a_args.lock.fl);

This change takes care of fixing NFS client locking requests, but the  
NFS server also makes lockd callbacks to the client.  You won't have  
an nfs_server handle on the NFS server side.

Generally, you ought to pass just the nodename where needed, not a  
pointer to the nfs_server.

> 	msg->rpc_proc = &clnt->cl_procinfo[proc];
> 	task_setup_data.rpc_client = clnt;
>
> Index: linux-2.6.28/include/linux/nfs_fs.h
> ===================================================================
> --- linux-2.6.28.orig/include/linux/nfs_fs.h
> +++ linux-2.6.28/include/linux/nfs_fs.h
> @@ -262,6 +262,8 @@ static inline __u64 NFS_FILEID(const str
> 	return NFS_I(inode)->fileid;
> }
>
> +struct nfs_server *fl_nfs_server(struct file_lock *fl);
> +
> static inline void set_nfs_fileid(struct inode *inode, __u64 fileid)
> {
> 	NFS_I(inode)->fileid = fileid;
>
> -- 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06  1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley
@ 2009-01-06 20:02   ` Serge E. Hallyn
  2009-01-06 20:20     ` J. Bruce Fields
                       ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Serge E. Hallyn @ 2009-01-06 20:02 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Trond Myklebust, J. Bruce Fields, Chuck Lever, Eric W. Biederman,
	Linux Containers, Cedric Le Goater

Quoting Matt Helsley (matthltc@us.ibm.com):
> We can often specify the UTS namespace to use when starting an RPC client.
> However sometimes no UTS namespace is available (specifically during system
> shutdown as the last NFS mount in a container is unmounted) so fall
> back to the initial UTS namespace.

So what happens if we take this patch and do nothing else?

The only potential problem situation will be rpc requests
made on behalf of a container in which the last task has
exited, right?  So let's say a container did an nfs mount
and then exits, causing an nfs umount request.

That umount request will now be sent with the wrong nodename.
Does that actually cause problems, will the server use the
nodename to try and determine the client sending the request?

thanks,
-serge

> Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> Cc: Cedric Le Goater <clg@fr.ibm.com>
> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Cc: linux-nfs@vger.kernel.org
> Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> Cc: Chuck Lever <chuck.lever@oracle.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Linux Containers <containers@lists.osdl.org>
> 
> ---
>  net/sunrpc/clnt.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6.28/net/sunrpc/clnt.c
> ===================================================================
> --- linux-2.6.28.orig/net/sunrpc/clnt.c
> +++ linux-2.6.28/net/sunrpc/clnt.c
> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
>  	struct rpc_version	*version;
>  	struct rpc_clnt		*clnt = NULL;
>  	struct rpc_auth		*auth;
> +	struct new_utsname	*uts_ns = init_utsname();
>  	int err;
>  	size_t len;
> 
> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
>  	}
> 
>  	/* save the nodename */
> -	clnt->cl_nodelen = strlen(init_utsname()->nodename);
> +	if (current->nsproxy != NULL)
> +		uts_ns = utsname();
> +	clnt->cl_nodelen = strlen(uts_ns->nodename);
>  	if (clnt->cl_nodelen > UNX_MAXNODENAME)
>  		clnt->cl_nodelen = UNX_MAXNODENAME;
> -	memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> +	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
>  	rpc_register_client(clnt);
>  	return clnt;
> 
> 
> -- 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:02   ` Serge E. Hallyn
@ 2009-01-06 20:20     ` J. Bruce Fields
  2009-01-06 21:53       ` Serge E. Hallyn
  2009-01-06 22:43       ` Matt Helsley
  2009-01-06 20:44     ` Trond Myklebust
  2009-01-06 22:29     ` Chuck Lever
  2 siblings, 2 replies; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-06 20:20 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc@us.ibm.com):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
> 
> So what happens if we take this patch and do nothing else?
> 
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right?  So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
> 
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?

This is just the machine name in the auth_unix credential?  The linux
server ignores that completely (for the purpose of auth_unix
authenication, it identifies clients only by source ip address).  I
suspect other servers also ignore it, but I don't know.

--b.

> 
> thanks,
> -serge
> 
> > Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> > Cc: Cedric Le Goater <clg@fr.ibm.com>
> > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: linux-nfs@vger.kernel.org
> > Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> > Cc: Chuck Lever <chuck.lever@oracle.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Linux Containers <containers@lists.osdl.org>
> > 
> > ---
> >  net/sunrpc/clnt.c |    7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6.28/net/sunrpc/clnt.c
> > ===================================================================
> > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > +++ linux-2.6.28/net/sunrpc/clnt.c
> > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> >  	struct rpc_version	*version;
> >  	struct rpc_clnt		*clnt = NULL;
> >  	struct rpc_auth		*auth;
> > +	struct new_utsname	*uts_ns = init_utsname();
> >  	int err;
> >  	size_t len;
> > 
> > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> >  	}
> > 
> >  	/* save the nodename */
> > -	clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > +	if (current->nsproxy != NULL)
> > +		uts_ns = utsname();
> > +	clnt->cl_nodelen = strlen(uts_ns->nodename);
> >  	if (clnt->cl_nodelen > UNX_MAXNODENAME)
> >  		clnt->cl_nodelen = UNX_MAXNODENAME;
> > -	memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > +	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> >  	rpc_register_client(clnt);
> >  	return clnt;
> > 
> > 
> > -- 
> > _______________________________________________
> > Containers mailing list
> > Containers@lists.linux-foundation.org
> > https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:02   ` Serge E. Hallyn
  2009-01-06 20:20     ` J. Bruce Fields
@ 2009-01-06 20:44     ` Trond Myklebust
  2009-01-06 21:58       ` Serge E. Hallyn
  2009-01-06 23:18       ` Matt Helsley
  2009-01-06 22:29     ` Chuck Lever
  2 siblings, 2 replies; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 20:44 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc@us.ibm.com):
> > We can often specify the UTS namespace to use when starting an RPC client.
> > However sometimes no UTS namespace is available (specifically during system
> > shutdown as the last NFS mount in a container is unmounted) so fall
> > back to the initial UTS namespace.
> 
> So what happens if we take this patch and do nothing else?
> 
> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right?  So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
> 
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?

The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
userspace, not the kernel. The problem here is that because lazy mounts
exist, the lifetime of the RPC client may be longer than that of the
container. In addition, it may be shared among more than 1 container,
because superblocks can be shared.

One thing you need to be aware of here is that inode dirty data
writebacks may be initiated by completely different processes than the
one that dirtied the inode.
IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
rely on being able to determine the container-specific node name at RPC
generation time are therefore going to return incorrect values.

  Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:20     ` J. Bruce Fields
@ 2009-01-06 21:53       ` Serge E. Hallyn
  2009-01-06 23:35         ` Matt Helsley
  2009-01-06 22:43       ` Matt Helsley
  1 sibling, 1 reply; 47+ messages in thread
From: Serge E. Hallyn @ 2009-01-06 21:53 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

Quoting J. Bruce Fields (bfields@fieldses.org):
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> > 
> > So what happens if we take this patch and do nothing else?
> > 
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right?  So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> > 
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
> 
> This is just the machine name in the auth_unix credential?  The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address).  I
> suspect other servers also ignore it, but I don't know.

Thanks, that's what i was hoping...

Matt, have you audited the other rpc-based services?  Do any
of them care?

-serge

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:44     ` Trond Myklebust
@ 2009-01-06 21:58       ` Serge E. Hallyn
  2009-01-06 22:42         ` Trond Myklebust
                           ` (2 more replies)
  2009-01-06 23:18       ` Matt Helsley
  1 sibling, 3 replies; 47+ messages in thread
From: Serge E. Hallyn @ 2009-01-06 21:58 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

Quoting Trond Myklebust (trond.myklebust@fys.uio.no):
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> > 
> > So what happens if we take this patch and do nothing else?
> > 
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right?  So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> > 
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
> 
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts
> exist, the lifetime of the RPC client may be longer than that of the

Right that was what i was referring to.

> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.

Good point.  And in that case what do we care about (even though
apparently we just might not care at all :) - who did the mount,
or who is using it?

In fact one thing I noticed in Matt's patch 3 was that he copied
in the nodename verbatim, so a future hostname() by the container
wouldn't be reflected, again not sure if that would matter.

> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.

Right, but I *was* thinking that we wanted to associate the nodename
on the rpc calls with the hostname of the mounter, not the actor.  Maybe
you'll tell me above that that is bogus.

> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.

So should we use patch 2/4, plus (as someone - was it you? - suggested)
using a DEFAULT instead of init_utsname()->nodename when
current->utsname() == NULL?

-serge

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:02   ` Serge E. Hallyn
  2009-01-06 20:20     ` J. Bruce Fields
  2009-01-06 20:44     ` Trond Myklebust
@ 2009-01-06 22:29     ` Chuck Lever
  2009-01-07  0:01       ` Serge E. Hallyn
  2 siblings, 1 reply; 47+ messages in thread
From: Chuck Lever @ 2009-01-06 22:29 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, J. Bruce Fields,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote:
> Quoting Matt Helsley (matthltc@us.ibm.com):
>> We can often specify the UTS namespace to use when starting an RPC  
>> client.
>> However sometimes no UTS namespace is available (specifically  
>> during system
>> shutdown as the last NFS mount in a container is unmounted) so fall
>> back to the initial UTS namespace.
>
> So what happens if we take this patch and do nothing else?

I thought the point of this was to prevent incorrect container  
nodenames from leaking onto the network.

> The only potential problem situation will be rpc requests
> made on behalf of a container in which the last task has
> exited, right?  So let's say a container did an nfs mount
> and then exits, causing an nfs umount request.
>
> That umount request will now be sent with the wrong nodename.
> Does that actually cause problems, will the server use the
> nodename to try and determine the client sending the request?
>
> thanks,
> -serge
>
>> Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
>> Cc: Cedric Le Goater <clg@fr.ibm.com>
>> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
>> Cc: linux-nfs@vger.kernel.org
>> Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
>> Cc: Chuck Lever <chuck.lever@oracle.com>
>> Cc: Eric W. Biederman <ebiederm@xmission.com>
>> Cc: Linux Containers <containers@lists.osdl.org>
>>
>> ---
>> net/sunrpc/clnt.c |    7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6.28/net/sunrpc/clnt.c
>> ===================================================================
>> --- linux-2.6.28.orig/net/sunrpc/clnt.c
>> +++ linux-2.6.28/net/sunrpc/clnt.c
>> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
>> 	struct rpc_version	*version;
>> 	struct rpc_clnt		*clnt = NULL;
>> 	struct rpc_auth		*auth;
>> +	struct new_utsname	*uts_ns = init_utsname();
>> 	int err;
>> 	size_t len;
>>
>> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
>> 	}
>>
>> 	/* save the nodename */
>> -	clnt->cl_nodelen = strlen(init_utsname()->nodename);
>> +	if (current->nsproxy != NULL)
>> +		uts_ns = utsname();
>> +	clnt->cl_nodelen = strlen(uts_ns->nodename);
>> 	if (clnt->cl_nodelen > UNX_MAXNODENAME)
>> 		clnt->cl_nodelen = UNX_MAXNODENAME;
>> -	memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt- 
>> >cl_nodelen);
>> +	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
>> 	rpc_register_client(clnt);
>> 	return clnt;
>>
>>
>> -- 
>> _______________________________________________
>> Containers mailing list
>> Containers@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 21:58       ` Serge E. Hallyn
@ 2009-01-06 22:42         ` Trond Myklebust
  2009-01-07  0:08           ` Matt Helsley
  2009-01-06 23:04         ` Eric W. Biederman
  2009-01-06 23:30         ` Matt Helsley
  2 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 22:42 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:

> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?

No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
that the container that first mounts the filesystem 'owns' it. However
at the same time we know that the lifetime of the filesystem is in no
way bounded by the lifetime of the container, and that's what gets you
into trouble with 'umount' in the first place.

IMO, the current code is the most correct approach, in that it assumes
that the filesystems are owned by the 'init' namespace.

Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:20     ` J. Bruce Fields
  2009-01-06 21:53       ` Serge E. Hallyn
@ 2009-01-06 22:43       ` Matt Helsley
  1 sibling, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 22:43 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:20 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> > 
> > So what happens if we take this patch and do nothing else?
> > 
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right?  So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> > 
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
> 
> This is just the machine name in the auth_unix credential?  The linux
> server ignores that completely (for the purpose of auth_unix
> authenication, it identifies clients only by source ip address).  I
> suspect other servers also ignore it, but I don't know.
> 
> --b.

I was wondering about this because I kept coming back to the question of
how the server could trust the name the client supplies (seems it
can't). This is very useful information -- it certainly suggests that
this patch would be sufficient.

Thanks!

Cheers,
	-Matt Helsley

> > 
> > thanks,
> > -serge
> > 
> > > Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> > > Cc: Cedric Le Goater <clg@fr.ibm.com>
> > > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > > Cc: linux-nfs@vger.kernel.org
> > > Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> > > Cc: Chuck Lever <chuck.lever@oracle.com>
> > > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > > Cc: Linux Containers <containers@lists.osdl.org>
> > > 
> > > ---
> > >  net/sunrpc/clnt.c |    7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6.28/net/sunrpc/clnt.c
> > > ===================================================================
> > > --- linux-2.6.28.orig/net/sunrpc/clnt.c
> > > +++ linux-2.6.28/net/sunrpc/clnt.c
> > > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client(
> > >  	struct rpc_version	*version;
> > >  	struct rpc_clnt		*clnt = NULL;
> > >  	struct rpc_auth		*auth;
> > > +	struct new_utsname	*uts_ns = init_utsname();
> > >  	int err;
> > >  	size_t len;
> > > 
> > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client(
> > >  	}
> > > 
> > >  	/* save the nodename */
> > > -	clnt->cl_nodelen = strlen(init_utsname()->nodename);
> > > +	if (current->nsproxy != NULL)
> > > +		uts_ns = utsname();
> > > +	clnt->cl_nodelen = strlen(uts_ns->nodename);
> > >  	if (clnt->cl_nodelen > UNX_MAXNODENAME)
> > >  		clnt->cl_nodelen = UNX_MAXNODENAME;
> > > -	memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen);
> > > +	memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen);
> > >  	rpc_register_client(clnt);
> > >  	return clnt;
> > > 
> > > 
> > > -- 
> > > _______________________________________________
> > > Containers mailing list
> > > Containers@lists.linux-foundation.org
> > > https://lists.linux-foundation.org/mailman/listinfo/containers


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 21:58       ` Serge E. Hallyn
  2009-01-06 22:42         ` Trond Myklebust
@ 2009-01-06 23:04         ` Eric W. Biederman
  2009-01-06 23:15           ` Trond Myklebust
  2009-01-06 23:30         ` Matt Helsley
  2 siblings, 1 reply; 47+ messages in thread
From: Eric W. Biederman @ 2009-01-06 23:04 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Trond Myklebust, Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Trond Myklebust (trond.myklebust@fys.uio.no):
>> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
>> > Quoting Matt Helsley (matthltc@us.ibm.com):
>> > > We can often specify the UTS namespace to use when starting an RPC client.
>> > > However sometimes no UTS namespace is available (specifically during
> system
>> > > shutdown as the last NFS mount in a container is unmounted) so fall
>> > > back to the initial UTS namespace.
>> > 
>> > So what happens if we take this patch and do nothing else?
>> > 
>> > The only potential problem situation will be rpc requests
>> > made on behalf of a container in which the last task has
>> > exited, right?  So let's say a container did an nfs mount
>> > and then exits, causing an nfs umount request.
>> > 
>> > That umount request will now be sent with the wrong nodename.
>> > Does that actually cause problems, will the server use the
>> > nodename to try and determine the client sending the request?
>> 
>> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
>> userspace, not the kernel. The problem here is that because lazy mounts
>> exist, the lifetime of the RPC client may be longer than that of the
>
> Right that was what i was referring to.
>
>> container. In addition, it may be shared among more than 1 container,
>> because superblocks can be shared.
>
> Good point.  And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
>
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.
>
>> One thing you need to be aware of here is that inode dirty data
>> writebacks may be initiated by completely different processes than the
>> one that dirtied the inode.
>
> Right, but I *was* thinking that we wanted to associate the nodename
> on the rpc calls with the hostname of the mounter, not the actor.  Maybe
> you'll tell me above that that is bogus.
>
>> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
>> rely on being able to determine the container-specific node name at RPC
>> generation time are therefore going to return incorrect values.
>
> So should we use patch 2/4, plus (as someone - was it you? - suggested)
> using a DEFAULT instead of init_utsname()->nodename when
> current->utsname() == NULL?

Is there any reason to believe that the kernel helper threads will ever
have a useful namespace value?  I don't think so.

That implies to me you want to capture the value at mount time, and to
pass it in to the rpc_call creation, and only at very specific well
defined points where we interact with user space should we examine
current->utsname().  At which point there should be no question
of current->utsname() is valid as the user space process is alive.

Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 4/4] Represent RPC Callers
  2009-01-06 13:04   ` Trond Myklebust
@ 2009-01-06 23:05     ` Matt Helsley
  0 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:05 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Linux Containers, J. Bruce Fields, Cedric Le Goater,
	Linux Kernel Mailing List, linux-nfs, Chuck Lever,
	Eric W. Biederman, Linux Containers

On Tue, 2009-01-06 at 08:04 -0500, Trond Myklebust wrote:
> On Mon, 2009-01-05 at 17:13 -0800, Matt Helsley wrote:
> > plain text document attachment (move-rpc-client-nodename-cache.patch)
> > Currently RPC needs to know the nodename (often the same as the hostname) which
> > should be used for UNIX-style authentication and file-lock tracking. Because
> > hostname can change between RPC calls and some sequences of RPC calls may
> > require consistent names between calls RPC currently saves the nodename with
> > the RPC client structure.
> > 
> > This is doesn't always work because RPC clients may be discarded over the
> > lifetime of a higher level service -- like those that compose NFS. Specifically
> > this is known to happen during shutdown.
> > 
> > Hence RPC should expect the nodename to be saved by the caller when sequences
> > of RPC calls requiring consistent nodenames may be needed (e.g. NFS). To enable
> > this we introduce an RPC caller structure that allows RPC to query the caller
> > for this information.
> > 
> > This patch is not complete but is meant to indicate the direction I'm planning
> > on going. I'd like to know if there are any objections or if anyone sees a
> > better way to handle this.
> 
> You're planning on slowing down every RPC call in order to fix a problem
> on client shutdown? Why?

	I figured that the network latencies would be much larger than the
amount of time it takes to make a function call which, in the cases I
know of, would reduce to a strcpy().

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:04         ` Eric W. Biederman
@ 2009-01-06 23:15           ` Trond Myklebust
  2009-01-06 23:32             ` J. Bruce Fields
  0 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 23:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> That implies to me you want to capture the value at mount time, and to
> pass it in to the rpc_call creation, and only at very specific well
> defined points where we interact with user space should we examine
> current->utsname().  At which point there should be no question
> of current->utsname() is valid as the user space process is alive.

Why pretend that the filesystem is owned by a particular namespace? It
can, and will be shared among many containers...

Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 20:44     ` Trond Myklebust
  2009-01-06 21:58       ` Serge E. Hallyn
@ 2009-01-06 23:18       ` Matt Helsley
  2009-01-06 23:43         ` Trond Myklebust
  1 sibling, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:44 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > We can often specify the UTS namespace to use when starting an RPC client.
> > > However sometimes no UTS namespace is available (specifically during system
> > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > back to the initial UTS namespace.
> > 
> > So what happens if we take this patch and do nothing else?
> > 
> > The only potential problem situation will be rpc requests
> > made on behalf of a container in which the last task has
> > exited, right?  So let's say a container did an nfs mount
> > and then exits, causing an nfs umount request.
> > 
> > That umount request will now be sent with the wrong nodename.
> > Does that actually cause problems, will the server use the
> > nodename to try and determine the client sending the request?
> 
> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> userspace, not the kernel. The problem here is that because lazy mounts

Ahh, that's news to me. I thought userspace originated the umount but
the kernel constructed and sent the corresponding RPC call.

> exist, the lifetime of the RPC client may be longer than that of the
> container. In addition, it may be shared among more than 1 container,
> because superblocks can be shared.

Right.

> One thing you need to be aware of here is that inode dirty data
> writebacks may be initiated by completely different processes than the
> one that dirtied the inode.
> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which
> rely on being able to determine the container-specific node name at RPC
> generation time are therefore going to return incorrect values.

Yes, I was aware that the inode might be dirtied by another container. I
was thinking that, at least in the case of NFS, it makes sense to report
the node name of the container that did the original mount. Of course
this doesn't address the general RPC client case and, like patch 3, it
makes the superblock solution rather NFS-specific. That brings me to a
basic question: Are there any RPC clients in the kernel that do not
operate on behalf of NFS?

Thanks!

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 21:58       ` Serge E. Hallyn
  2009-01-06 22:42         ` Trond Myklebust
  2009-01-06 23:04         ` Eric W. Biederman
@ 2009-01-06 23:30         ` Matt Helsley
  2 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:30 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Trond Myklebust, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> Quoting Trond Myklebust (trond.myklebust@fys.uio.no):
> > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > > 
> > > So what happens if we take this patch and do nothing else?
> > > 
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right?  So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > > 
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> > 
> > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from
> > userspace, not the kernel. The problem here is that because lazy mounts
> > exist, the lifetime of the RPC client may be longer than that of the
> 
> Right that was what i was referring to.
> 
> > container. In addition, it may be shared among more than 1 container,
> > because superblocks can be shared.
> 
> Good point.  And in that case what do we care about (even though
> apparently we just might not care at all :) - who did the mount,
> or who is using it?
> 
> In fact one thing I noticed in Matt's patch 3 was that he copied
> in the nodename verbatim, so a future hostname() by the container
> wouldn't be reflected, again not sure if that would matter.

I thought of this. I found the patches that added the nodename in the
RPC client struct and the stale nodename was intentional. That's why I
preserved the copy rather than called utsname() each time.

Cheers,
	-Matt


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:15           ` Trond Myklebust
@ 2009-01-06 23:32             ` J. Bruce Fields
  2009-01-06 23:35               ` Trond Myklebust
  0 siblings, 1 reply; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-06 23:32 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > That implies to me you want to capture the value at mount time, and to
> > pass it in to the rpc_call creation, and only at very specific well
> > defined points where we interact with user space should we examine
> > current->utsname().  At which point there should be no question
> > of current->utsname() is valid as the user space process is alive.
> 
> Why pretend that the filesystem is owned by a particular namespace? It
> can, and will be shared among many containers...

If the only purpose of this is to fill in the auth_unix cred then
shouldn't it be part of whatever cred structures are passed around?

But I doubt it's that important to get it right.

--b.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 21:53       ` Serge E. Hallyn
@ 2009-01-06 23:35         ` Matt Helsley
  0 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:35 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: J. Bruce Fields, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:53 -0600, Serge E. Hallyn wrote:
> Quoting J. Bruce Fields (bfields@fieldses.org):
> > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc@us.ibm.com):
> > > > We can often specify the UTS namespace to use when starting an RPC client.
> > > > However sometimes no UTS namespace is available (specifically during system
> > > > shutdown as the last NFS mount in a container is unmounted) so fall
> > > > back to the initial UTS namespace.
> > > 
> > > So what happens if we take this patch and do nothing else?
> > > 
> > > The only potential problem situation will be rpc requests
> > > made on behalf of a container in which the last task has
> > > exited, right?  So let's say a container did an nfs mount
> > > and then exits, causing an nfs umount request.
> > > 
> > > That umount request will now be sent with the wrong nodename.
> > > Does that actually cause problems, will the server use the
> > > nodename to try and determine the client sending the request?
> > 
> > This is just the machine name in the auth_unix credential?  The linux
> > server ignores that completely (for the purpose of auth_unix
> > authenication, it identifies clients only by source ip address).  I
> > suspect other servers also ignore it, but I don't know.
> 
> Thanks, that's what i was hoping...
> 
> Matt, have you audited the other rpc-based services?  Do any
> of them care?

Frankly, I did not audit any of the RPC-based services to see if any
truly cared about the node name. That's actually a rather large scope
when you consider it -- I'd have to look at much more than just "Linux"
RPC clients. It seemed unsafe to assume _nobody_ would care after they
bothered to put it into the spec.

Hopefully I'm wrong though :).

Cheers,
	-Matt


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:32             ` J. Bruce Fields
@ 2009-01-06 23:35               ` Trond Myklebust
  2009-01-06 23:48                 ` Matt Helsley
                                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 23:35 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > That implies to me you want to capture the value at mount time, and to
> > > pass it in to the rpc_call creation, and only at very specific well
> > > defined points where we interact with user space should we examine
> > > current->utsname().  At which point there should be no question
> > > of current->utsname() is valid as the user space process is alive.
> > 
> > Why pretend that the filesystem is owned by a particular namespace? It
> > can, and will be shared among many containers...
> 
> If the only purpose of this is to fill in the auth_unix cred then
> shouldn't it be part of whatever cred structures are passed around?

So how does tracking it in a shared structure like the rpc_client help?
If you consider it to be part of the cred, then it needs to be tracked
in the cred...

Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:18       ` Matt Helsley
@ 2009-01-06 23:43         ` Trond Myklebust
  2009-01-06 23:58           ` Matt Helsley
  0 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-06 23:43 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> Yes, I was aware that the inode might be dirtied by another container. I
> was thinking that, at least in the case of NFS, it makes sense to report
> the node name of the container that did the original mount. Of course
> this doesn't address the general RPC client case and, like patch 3, it
> makes the superblock solution rather NFS-specific. That brings me to a
> basic question: Are there any RPC clients in the kernel that do not
> operate on behalf of NFS?

Yes. There is, for instance, the portmapper/rpcbind client. The NFS and
lockd servers also sometime needs to send RPC callbacks. Assuming that
you can convert the RPC client into something NFS-specific is therefore
not an option.

Trond


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:35               ` Trond Myklebust
@ 2009-01-06 23:48                 ` Matt Helsley
  2009-01-06 23:51                 ` Chuck Lever
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:48 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, Eric W. Biederman, Serge E. Hallyn,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname().  At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > > 
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> > 
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
> 
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...
> 
> Trond

I originally discarded the idea of relating it to the credentials
because it looked like the credential cache could be flushed at any time
(according to the small portion of the RFC I read) so I was worried we'd
have even worse lifetime issues to deal with. 

Cheers,
	-Matt


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:35               ` Trond Myklebust
  2009-01-06 23:48                 ` Matt Helsley
@ 2009-01-06 23:51                 ` Chuck Lever
  2009-01-06 23:53                 ` J. Bruce Fields
  2009-01-07  0:20                 ` Trond Myklebust
  3 siblings, 0 replies; 47+ messages in thread
From: Chuck Lever @ 2009-01-06 23:51 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Linux Containers, Cedric Le Goater

On Jan 6, 2009, at Jan 6, 2009, 6:35 PM, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>>> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>>>> That implies to me you want to capture the value at mount time,  
>>>> and to
>>>> pass it in to the rpc_call creation, and only at very specific well
>>>> defined points where we interact with user space should we examine
>>>> current->utsname().  At which point there should be no question
>>>> of current->utsname() is valid as the user space process is alive.
>>>
>>> Why pretend that the filesystem is owned by a particular  
>>> namespace? It
>>> can, and will be shared among many containers...
>>
>> If the only purpose of this is to fill in the auth_unix cred then
>> shouldn't it be part of whatever cred structures are passed around?
>
> So how does tracking it in a shared structure like the rpc_client  
> help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

I think generating a proper AUTH_SYS cred, given the existence of  
containers, is the essential question here.

However, we use nodename for lock owners too... perhaps that deserves  
a separate solution.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:35               ` Trond Myklebust
  2009-01-06 23:48                 ` Matt Helsley
  2009-01-06 23:51                 ` Chuck Lever
@ 2009-01-06 23:53                 ` J. Bruce Fields
  2009-01-07  0:07                   ` Matt Helsley
  2009-01-07  0:20                   ` Trond Myklebust
  2009-01-07  0:20                 ` Trond Myklebust
  3 siblings, 2 replies; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-06 23:53 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > That implies to me you want to capture the value at mount time, and to
> > > > pass it in to the rpc_call creation, and only at very specific well
> > > > defined points where we interact with user space should we examine
> > > > current->utsname().  At which point there should be no question
> > > > of current->utsname() is valid as the user space process is alive.
> > > 
> > > Why pretend that the filesystem is owned by a particular namespace? It
> > > can, and will be shared among many containers...
> > 
> > If the only purpose of this is to fill in the auth_unix cred then
> > shouldn't it be part of whatever cred structures are passed around?
> 
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

Right, that's what I meant.

It seems like overkill, though.  Does anyone actually care whether these
names are right?

--b.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:43         ` Trond Myklebust
@ 2009-01-06 23:58           ` Matt Helsley
  0 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-06 23:58 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:43 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote:
> > Yes, I was aware that the inode might be dirtied by another container. I
> > was thinking that, at least in the case of NFS, it makes sense to report
> > the node name of the container that did the original mount. Of course
> > this doesn't address the general RPC client case and, like patch 3, it
> > makes the superblock solution rather NFS-specific. That brings me to a
> > basic question: Are there any RPC clients in the kernel that do not
> > operate on behalf of NFS?
> 
> Yes. There is, for instance, the portmapper/rpcbind client.

<snip>

OK that's a good example. Point taken.

>  The NFS and
> lockd servers also sometime needs to send RPC callbacks. 

Yes, these won't have an NFS superblock handy and hence they use the
node name cached in the RPC client struct. That means that these patches
have only addressed NFS client-side namespace issues. Have you seen J.
Bruce Fields post on the nfsd side of things?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 22:29     ` Chuck Lever
@ 2009-01-07  0:01       ` Serge E. Hallyn
  0 siblings, 0 replies; 47+ messages in thread
From: Serge E. Hallyn @ 2009-01-07  0:01 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Matt Helsley, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Trond Myklebust, J. Bruce Fields,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

Quoting Chuck Lever (chuck.lever@oracle.com):
> On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote:
>> Quoting Matt Helsley (matthltc@us.ibm.com):
>>> We can often specify the UTS namespace to use when starting an RPC  
>>> client.
>>> However sometimes no UTS namespace is available (specifically during 
>>> system
>>> shutdown as the last NFS mount in a container is unmounted) so fall
>>> back to the initial UTS namespace.
>>
>> So what happens if we take this patch and do nothing else?
>
> I thought the point of this was to prevent incorrect container nodenames 
> from leaking onto the network.

But define incorrect.  If container B does an nfs mount, container c
is launched with a tree in that mount, container B dies, and container C
umounts it.  Should the umount belong to container B (for having
mounted it), container C (for having umount it), or the init_utsname
(for being the physical host/kernel)?

I get the feeling that consensus on this thread is that init_utsname
is actually the best choice, but OTOH if I have 3 containers on my
host, for apache, mysql, and postfix servers, and each is doing
nfs mounts from a physically remote machine, maybe I care about
having them report separate nodenames?

(that's a question, I really don't know...)

-serge

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:53                 ` J. Bruce Fields
@ 2009-01-07  0:07                   ` Matt Helsley
  2009-01-07  0:55                     ` Eric W. Biederman
  2009-01-07  0:20                   ` Trond Myklebust
  1 sibling, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  0:07 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Eric W. Biederman, Serge E. Hallyn,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
> > > > > That implies to me you want to capture the value at mount time, and to
> > > > > pass it in to the rpc_call creation, and only at very specific well
> > > > > defined points where we interact with user space should we examine
> > > > > current->utsname().  At which point there should be no question
> > > > > of current->utsname() is valid as the user space process is alive.
> > > > 
> > > > Why pretend that the filesystem is owned by a particular namespace? It
> > > > can, and will be shared among many containers...
> > > 
> > > If the only purpose of this is to fill in the auth_unix cred then
> > > shouldn't it be part of whatever cred structures are passed around?
> > 
> > So how does tracking it in a shared structure like the rpc_client help?
> > If you consider it to be part of the cred, then it needs to be tracked
> > in the cred...
> 
> Right, that's what I meant.
> 
> It seems like overkill, though.  Does anyone actually care whether these
> names are right?

That's certainly a tempting angle. However we may not "control" the
server code -- couldn't there be some oddball (maybe even proprietary)
NFS servers out there that users do care about interacting with?

Cheers,
	-Matt Helsley



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 22:42         ` Trond Myklebust
@ 2009-01-07  0:08           ` Matt Helsley
  2009-01-07  0:20             ` Trond Myklebust
  2009-01-07  0:20             ` J. Bruce Fields
  0 siblings, 2 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  0:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> 
> > So should we use patch 2/4, plus (as someone - was it you? - suggested)
> > using a DEFAULT instead of init_utsname()->nodename when
> > current->utsname() == NULL?
> 
> No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
> that the container that first mounts the filesystem 'owns' it. However
> at the same time we know that the lifetime of the filesystem is in no
> way bounded by the lifetime of the container, and that's what gets you
> into trouble with 'umount' in the first place.
>
> IMO, the current code is the most correct approach, in that it assumes
> that the filesystems are owned by the 'init' namespace.

IMHO This seems more incorrect than trying to use a more proximal
namespace.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:35               ` Trond Myklebust
                                   ` (2 preceding siblings ...)
  2009-01-06 23:53                 ` J. Bruce Fields
@ 2009-01-07  0:20                 ` Trond Myklebust
  2009-01-07  0:26                   ` J. Bruce Fields
  3 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  0:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote:
> So how does tracking it in a shared structure like the rpc_client help?
> If you consider it to be part of the cred, then it needs to be tracked
> in the cred...

OK. If people really want to track this, then you could add a reference
to the current->nsproxy to the struct rpc_cred and struct auth_cred, and
ensure that the unx_marshal and unx_match routines do the right thing
w.r.t. that reference (if it exists).
However such a patch had better be accompanied with a _really_
convincing argument for why containers need this...

As for the NFSv4 clientid, I can't see how you would ever want to use
anything other than the init->utsname(), since the requirement is only
that the clientid string be unique and preserved across reboots. The
server isn't allowed to interpret the contents of the clientid string.
Ditto for the RPCSEC_GSS machine creds that are used to establish the
clientid.

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-06 23:53                 ` J. Bruce Fields
  2009-01-07  0:07                   ` Matt Helsley
@ 2009-01-07  0:20                   ` Trond Myklebust
  1 sibling, 0 replies; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  0:20 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
> It seems like overkill, though.  Does anyone actually care whether these
> names are right?

I very much doubt it. Particularly since the actual sockets used by the
RPC client are always owned by the init namespace (they are established
from a workqueue process), and so the source IP address will always be
that of the init namespace.

Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:08           ` Matt Helsley
@ 2009-01-07  0:20             ` Trond Myklebust
  2009-01-07  0:43               ` Matt Helsley
  2009-01-07  0:20             ` J. Bruce Fields
  1 sibling, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  0:20 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> IMHO This seems more incorrect than trying to use a more proximal
> namespace.

You have yet to explain why.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:08           ` Matt Helsley
  2009-01-07  0:20             ` Trond Myklebust
@ 2009-01-07  0:20             ` J. Bruce Fields
  2009-01-07  0:23               ` Trond Myklebust
  1 sibling, 1 reply; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-07  0:20 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Trond Myklebust, Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman,
	Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 04:08:50PM -0800, Matt Helsley wrote:
> On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote:
> > 
> > > So should we use patch 2/4, plus (as someone - was it you? - suggested)
> > > using a DEFAULT instead of init_utsname()->nodename when
> > > current->utsname() == NULL?
> > 
> > No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying
> > that the container that first mounts the filesystem 'owns' it. However
> > at the same time we know that the lifetime of the filesystem is in no
> > way bounded by the lifetime of the container, and that's what gets you
> > into trouble with 'umount' in the first place.
> >
> > IMO, the current code is the most correct approach, in that it assumes
> > that the filesystems are owned by the 'init' namespace.
> 
> IMHO This seems more incorrect than trying to use a more proximal
> namespace.

If it would be possible, for example, for the 'init' namespace to have
no network interfaces at all, then it would be nicer to use a name
that's at least been used with nfs at *some* point--just on the general
principle of not leaking information to a domain that the user wouldn't
expect it to.

(Assuming it's unlikely anyone would consider init's utsname to be
sensitive information, that's a minor point.)

--b.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:20             ` J. Bruce Fields
@ 2009-01-07  0:23               ` Trond Myklebust
  2009-01-07  3:44                 ` Matt Helsley
  0 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  0:23 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Matt Helsley, Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman,
	Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote:
> If it would be possible, for example, for the 'init' namespace to have
> no network interfaces at all, then it would be nicer to use a name
> that's at least been used with nfs at *some* point--just on the general
> principle of not leaking information to a domain that the user wouldn't
> expect it to.

Then RPC would fail. Thanks to the limitations imposed by selinux &
friends, all RPC sockets have to be owned by the init process.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:20                 ` Trond Myklebust
@ 2009-01-07  0:26                   ` J. Bruce Fields
  2009-01-07  0:38                     ` Trond Myklebust
  0 siblings, 1 reply; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-07  0:26 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> As for the NFSv4 clientid, I can't see how you would ever want to use
> anything other than the init->utsname(), since the requirement is only
> that the clientid string be unique and preserved across reboots. The
> server isn't allowed to interpret the contents of the clientid string.
> Ditto for the RPCSEC_GSS machine creds that are used to establish the
> clientid.

If people eventually expect to be able to, say, migrate a container to
another host while using an nfs mount as their storage, then they'd need
the name to be that of the container, not of the host.

Obviously we'd also need to ensure the container got its own nfsv4
client state, etc., etc., and it sounds like we're far from that.

--b.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround
  2009-01-06 16:02   ` Chuck Lever
@ 2009-01-07  0:28     ` Matt Helsley
  2009-01-07  3:02     ` Matt Helsley
  1 sibling, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  0:28 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Linux Containers, J. Bruce Fields, Cedric Le Goater,
	Linux Kernel Mailing List, linux-nfs, Trond Myklebust,
	Eric W. Biederman, Linux Containers

On Tue, 2009-01-06 at 11:02 -0500, Chuck Lever wrote:
> Matt-
> 
> Thanks for pursuing a permanent fix for this.
> 
> On Jan 5, 2009, at Jan 5, 2009, 8:13 PM, Matt Helsley wrote:
> 
> > We can improve upon a workaround applied in commit
> > 63ffc23d307c9534c732edd87895e37b223004a3 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63ffc23d307c9534c732edd87895e37b223004a3 
> >  )
> >
> > The original problem was:
> >
> > "On a system with nfs mounts, if a task unshares its mount namespace,
> > a oops can occur when the system is rebooted if the task is the last
> > to unreference the nfs mount. It will try to create a rpc request
> > using utsname() which has been invalidated by free_nsproxy()."
> >
> > Cedric worked around this by always using the initial uts namespace  
> > for RPC.
> > Critically this workaround meant that RPC clients in uts namespaces  
> > can never
> > report the changed nodename when utilizing RPC.
> >
> > Fix that by storing the nodename in the NFS server structure (part  
> > of the NFS
> > super block) and, when an RPC client is operating on behalf of NFS,  
> > reporting
> > that nodename. This solves the problem for NFS clients but leaves  
> > any other
> > RPC users out in the cold.
> >
> > Rather than caching the nodename in the client structure RPC should  
> > obtain the
> > nodename from RPC callers. It would then be up to those services  
> > making RPC
> > calls to cache the nodename for as long as necessary -- somewhat  
> > like this patch
> > does with NFS.
> 
> Instead of having the RPC client call the consumer back, why can't you  
> pass the nodename as an argument to each RPC call; say, via the  
> rpc_message structure?
> 
> > NOTE: Part of Cedric's workaround -- use of the initial uts  
> > namespace -- is
> > still necessary because non-NFS RPC callers still rely on the  
> > nodename cached
> > with the RPC client struct.
> 
> In the long run I think it would be more useful to spell out where  
> each consumer gets its nodename value, rather than having a convenient  
> default value.  A default would encourage exposing nodenames  
> inappropriately due to sloppy coding and incorrect assumptions (on the  
> developer's part) about a complex API.
> 
> Where would NFSv4 callbacks get a nodename, for example?  And what  
> about rpcbind requests?
> 
> > Thoughts?
> 
> 
> More comments below.
> 
> > Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
> > Cc: Cedric Le Goater <clg@fr.ibm.com>
> > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> > Cc: linux-nfs@vger.kernel.org
> > Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
> > Cc: Chuck Lever <chuck.lever@oracle.com>
> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Cc: Linux Containers <containers@lists.osdl.org>
> >
> > ---
> > fs/lockd/clntproc.c         |   16 +++++++++++++---
> > fs/nfs/client.c             |    6 ++++++
> > fs/nfs/super.c              |    6 ++++++
> > include/linux/nfs_fs.h      |    2 ++
> > include/linux/nfs_fs_sb.h   |    3 +++
> > include/linux/sunrpc/clnt.h |    2 ++
> > net/sunrpc/auth_unix.c      |   13 ++++++++++++-
> > 7 files changed, 44 insertions(+), 4 deletions(-)
> >
> > Index: linux-2.6.28/include/linux/sunrpc/clnt.h
> > ===================================================================
> > --- linux-2.6.28.orig/include/linux/sunrpc/clnt.h
> > +++ linux-2.6.28/include/linux/sunrpc/clnt.h
> > @@ -19,6 +19,7 @@
> > #include <asm/signal.h>
> >
> > struct rpc_inode;
> > +struct nfs_server;
> >
> > /*
> >  * The high-level client handle
> > @@ -50,6 +51,7 @@ struct rpc_clnt {
> >
> > 	int			cl_nodelen;	/* nodename length */
> > 	char 			cl_nodename[UNX_MAXNODENAME];
> > +	struct nfs_server	*cl_nfs_server;
> 
> This is a layering violation...  I would rather avoid introducing new  
> strong data structure dependencies on one of RPC's consumers.

Good point. I was hoping the RPC caller structure I introduced in patch
4 would be, if not pretty, at least more-acceptable replacement for this
hack. 

However it seems this series has more fundamental issues that need to be
answered before I can proceed..

> > 	char			cl_pathname[30];/* Path in rpc_pipe_fs */
> > 	struct vfsmount *	cl_vfsmnt;
> > 	struct dentry *		cl_dentry;	/* inode */
> > Index: linux-2.6.28/net/sunrpc/auth_unix.c
> > ===================================================================
> > --- linux-2.6.28.orig/net/sunrpc/auth_unix.c
> > +++ linux-2.6.28/net/sunrpc/auth_unix.c
> > @@ -12,6 +12,13 @@
> > #include <linux/sunrpc/clnt.h>
> > #include <linux/sunrpc/auth.h>
> >
> > +#include <linux/nfs.h>
> > +#include <linux/nfs2.h>
> > +#include <linux/nfs3.h>
> > +#include <linux/nfs4.h>
> > +#include <linux/nfs_xdr.h>
> > +#include <linux/nfs_fs_sb.h>
> > +
> > #define NFS_NGROUPS	16
> >
> > struct unx_cred {
> > @@ -151,7 +158,11 @@ unx_marshal(struct rpc_task *task, __be3
> > 	/*
> > 	 * Copy the UTS nodename captured when the client was created.
> > 	 */
> > -	p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
> > +	if (clnt->cl_nfs_server)
> > +		p = xdr_encode_array(p, clnt->cl_nfs_server->nodename,
> > +				     clnt->cl_nfs_server->nodelen);
> > +	else
> > +		p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
> >
> > 	*p++ = htonl((u32) cred->uc_uid);
> > 	*p++ = htonl((u32) cred->uc_gid);
> > Index: linux-2.6.28/include/linux/nfs_fs_sb.h
> > ===================================================================
> > --- linux-2.6.28.orig/include/linux/nfs_fs_sb.h
> > +++ linux-2.6.28/include/linux/nfs_fs_sb.h
> > @@ -126,6 +126,9 @@ struct nfs_server {
> > 	u32			mountd_version;
> > 	unsigned short		mountd_port;
> > 	unsigned short		mountd_protocol;
> > +
> > +	int			nodelen;	/* nodename length */
> > +	char 			nodename[UNX_MAXNODENAME];
> > };
> >
> > /* Server capabilities */
> > Index: linux-2.6.28/fs/nfs/super.c
> > ===================================================================
> > --- linux-2.6.28.orig/fs/nfs/super.c
> > +++ linux-2.6.28/fs/nfs/super.c
> > @@ -51,6 +51,7 @@
> > #include <linux/nfs_xdr.h>
> > #include <linux/magic.h>
> > #include <linux/parser.h>
> > +#include <linux/utsname.h>
> >
> > #include <asm/system.h>
> > #include <asm/uaccess.h>
> > @@ -1830,6 +1831,11 @@ static void nfs_fill_super(struct super_
> > 		sb->s_time_gran = 1;
> > 	}
> >
> > +	server->nodelen = strlen(utsname()->nodename);
> > +	if (server->nodelen > UNX_MAXNODENAME)
> > +		server->nodelen = UNX_MAXNODENAME;
> > +	memcpy(server->nodename, utsname()->nodename, server->nodelen);
> > +
> > 	sb->s_op = &nfs_sops;
> >  	nfs_initialise_sb(sb);
> > }
> > Index: linux-2.6.28/fs/nfs/client.c
> > ===================================================================
> > --- linux-2.6.28.orig/fs/nfs/client.c
> > +++ linux-2.6.28/fs/nfs/client.c
> > @@ -25,10 +25,12 @@
> > #include <linux/sunrpc/metrics.h>
> > #include <linux/sunrpc/xprtsock.h>
> > #include <linux/sunrpc/xprtrdma.h>
> > +#include <linux/sunrpc/svc.h>
> > #include <linux/nfs_fs.h>
> > #include <linux/nfs_mount.h>
> > #include <linux/nfs4_mount.h>
> > #include <linux/lockd/bind.h>
> > +#include <linux/lockd/lockd.h>
> > #include <linux/seq_file.h>
> > #include <linux/mount.h>
> > #include <linux/nfs_idmap.h>
> > @@ -47,6 +49,7 @@
> > #include "internal.h"
> >
> > #define NFSDBG_FACILITY		NFSDBG_CLIENT
> > +#define NLMDBG_FACILITY		NLMDBG_CLIENT
> >
> > static DEFINE_SPINLOCK(nfs_client_lock);
> > static LIST_HEAD(nfs_client_list);
> > @@ -555,6 +558,7 @@ static void nfs_init_server_aclclient(st
> >
> > 	/* No errors! Assume that Sun nfsacls are supported */
> > 	server->caps |= NFS_CAP_ACLS;
> > +	server->client_acl->cl_nfs_server = server;
> > 	return;
> >
> > out_noacl:
> > @@ -673,6 +677,7 @@ static int nfs_init_server(struct nfs_se
> > 		goto error;
> >
> > 	server->nfs_client = clp;
> > + 	clp->cl_rpcclient->cl_nfs_server = server;
> >
> > 	/* Initialise the client representation from the mount data */
> > 	server->flags = data->flags;
> > @@ -1035,6 +1040,7 @@ static int nfs4_set_client(struct nfs_se
> > 		goto error_put;
> >
> > 	server->nfs_client = clp;
> > +	clp->cl_rpcclient->cl_nfs_server = server;
> > 	dprintk("<-- nfs4_set_client() = 0 [new %p]\n", clp);
> > 	return 0;
> >
> > Index: linux-2.6.28/fs/lockd/clntproc.c
> > ===================================================================
> > --- linux-2.6.28.orig/fs/lockd/clntproc.c
> > +++ linux-2.6.28/fs/lockd/clntproc.c
> > @@ -10,8 +10,8 @@
> > #include <linux/types.h>
> > #include <linux/errno.h>
> > #include <linux/fs.h>
> > +#include <linux/mount.h>
> > #include <linux/nfs_fs.h>
> > -#include <linux/utsname.h>
> > #include <linux/freezer.h>
> > #include <linux/sunrpc/clnt.h>
> > #include <linux/sunrpc/svc.h>
> > @@ -118,6 +118,11 @@ static struct nlm_lockowner *nlm_find_lo
> > 	return res;
> > }
> >
> > +struct nfs_server *fl_nfs_server(struct file_lock *fl)
> > +{
> > +	return NFS_SB(fl->fl_file->f_path.mnt->mnt_sb);
> > +}
> > +
> > /*
> >  * Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls
> >  */
> > @@ -125,15 +130,18 @@ static void nlmclnt_setlockargs(struct n
> > {
> > 	struct nlm_args	*argp = &req->a_args;
> > 	struct nlm_lock	*lock = &argp->lock;
> > +	char *nodename;
> >
> > 	nlmclnt_next_cookie(&argp->cookie);
> > 	argp->state   = nsm_local_state;
> > 	memcpy(&lock->fh, NFS_FH(fl->fl_file->f_path.dentry->d_inode),  
> > sizeof(struct nfs_fh));
> > -	lock->caller  = utsname()->nodename;
> > +
> > +	nodename = fl_nfs_server(fl)->nodename;
> > +	lock->caller  = nodename;
> > 	lock->oh.data = req->a_owner;
> > 	lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
> > 				(unsigned int)fl->fl_u.nfs_fl.owner->pid,
> > -				utsname()->nodename);
> > +				nodename);
> > 	lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > 	lock->fl.fl_start = fl->fl_start;
> > 	lock->fl.fl_end = fl->fl_end;
> > @@ -272,6 +280,7 @@ nlmclnt_call(struct rpc_cred *cred, stru
> > 		/* If we have no RPC client yet, create one. */
> > 		if ((clnt = nlm_bind_host(host)) == NULL)
> > 			return -ENOLCK;
> > +		clnt->cl_nfs_server = fl_nfs_server(&argp->lock.fl);
> > 		msg.rpc_proc = &clnt->cl_procinfo[proc];
> >
> > 		/* Perform the RPC call. If an error occurs, try again */
> > @@ -344,6 +353,7 @@ static struct rpc_task *__nlm_async_call
> > 	clnt = nlm_bind_host(host);
> > 	if (clnt == NULL)
> > 		goto out_err;
> > +	clnt->cl_nfs_server = fl_nfs_server(&req->a_args.lock.fl);
> 
> This change takes care of fixing NFS client locking requests, but the  
> NFS server also makes lockd callbacks to the client.  You won't have  
> an nfs_server handle on the NFS server side.

True -- this series doesn't address the server side. Depending on how
this discussion goes I may end up visiting that code too.

> Generally, you ought to pass just the nodename where needed, not a  
> pointer to the nfs_server.

Passing the nfs server around is a temporary measure. It's meant to make
the following patch more palatable because patch 4 removes the code
passing around the nfs server pointer and adds the RPC caller structure
in its place.

Cheers,
	-Matt




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:26                   ` J. Bruce Fields
@ 2009-01-07  0:38                     ` Trond Myklebust
  2009-01-07  1:44                       ` J. Bruce Fields
  0 siblings, 1 reply; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  0:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote:
> On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> > As for the NFSv4 clientid, I can't see how you would ever want to use
> > anything other than the init->utsname(), since the requirement is only
> > that the clientid string be unique and preserved across reboots. The
> > server isn't allowed to interpret the contents of the clientid string.
> > Ditto for the RPCSEC_GSS machine creds that are used to establish the
> > clientid.
> 
> If people eventually expect to be able to, say, migrate a container to
> another host while using an nfs mount as their storage, then they'd need
> the name to be that of the container, not of the host.

Why?

> Obviously we'd also need to ensure the container got its own nfsv4
> client state, etc., etc., and it sounds like we're far from that.

Again, why? Are you seriously proposing a plan to transport all NFS and
locking state directly from one kernel to another?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:20             ` Trond Myklebust
@ 2009-01-07  0:43               ` Matt Helsley
  2009-01-07  1:10                 ` Trond Myklebust
  0 siblings, 1 reply; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  0:43 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > IMHO This seems more incorrect than trying to use a more proximal
> > namespace.
> 
> You have yet to explain why.

It's "more proximal" -- it's closer to the container that we expect to
cause (directly or otherwise) the bulk of the RPC calls for that mount.
If the container does not wind up sharing that mount with other
containers then the reported node name matches. If the container winds
up sharing the mount with other containers then at least we can learn
which container originated the mount.

I imagine an NFS administrator trying to determine the source of a bunch
of RPC calls. If we just report the initial namespace then that
administrator has to do lots more digging to determine which container
sent the calls (assuming they aren't in different network namespaces).
By not always reporting the initial namespace we may give the
administrator one way to narrow down the search. Even if the reported
node name does not perfectly match the source of all RPC traffic related
to the mount at least the administrator gets something more specific.

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:07                   ` Matt Helsley
@ 2009-01-07  0:55                     ` Eric W. Biederman
  0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2009-01-07  0:55 UTC (permalink / raw)
  To: Matt Helsley
  Cc: J. Bruce Fields, Trond Myklebust, Serge E. Hallyn,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

Matt Helsley <matthltc@us.ibm.com> writes:

> On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote:
>> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote:
>> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote:
>> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote:
>> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote:
>> > > > > That implies to me you want to capture the value at mount time, and to
>> > > > > pass it in to the rpc_call creation, and only at very specific well
>> > > > > defined points where we interact with user space should we examine
>> > > > > current->utsname().  At which point there should be no question
>> > > > > of current->utsname() is valid as the user space process is alive.
>> > > > 
>> > > > Why pretend that the filesystem is owned by a particular namespace? It
>> > > > can, and will be shared among many containers...

Sounds right.  Still like the owner of a file it can happen that some
containers are more correct than others.  Especially in the context of
mount merging and the other sophisticated caching that happens in NFS
this increasingly sounds like something that belongs in the cred as
that is where it is used.

>> > > If the only purpose of this is to fill in the auth_unix cred then
>> > > shouldn't it be part of whatever cred structures are passed around?
>> > 
>> > So how does tracking it in a shared structure like the rpc_client help?
>> > If you consider it to be part of the cred, then it needs to be tracked
>> > in the cred...
>> 
>> Right, that's what I meant.
>> 
>> It seems like overkill, though.  Does anyone actually care whether these
>> names are right?
>
> That's certainly a tempting angle. However we may not "control" the
> server code -- couldn't there be some oddball (maybe even proprietary)
> NFS servers out there that users do care about interacting with?

Matt could you look at what it will take to do the right thing from
the network namespace side of things as well?  I believe it is going
to require the same level of understanding of the interactions in the code
to get there.

For the network namespace we should cache it at mount or server
startup and use it until we are done.  In a network namespace context
there are good reasons for that because talking to 10.0.0.1 on one
network may not be the same machine as talking to 10.0.0.1 on another
network.  NFS reestablishes tcp connections if the connection to the
server breaks doesn't it?  Or is that left to user space?

Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:43               ` Matt Helsley
@ 2009-01-07  1:10                 ` Trond Myklebust
  0 siblings, 0 replies; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  1:10 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
	Eric W. Biederman, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 16:43 -0800, Matt Helsley wrote:
> On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote:
> > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> > > IMHO This seems more incorrect than trying to use a more proximal
> > > namespace.
> > 
> > You have yet to explain why.
> 
> It's "more proximal" -- it's closer to the container that we expect to
> cause (directly or otherwise) the bulk of the RPC calls for that mount.
> If the container does not wind up sharing that mount with other
> containers then the reported node name matches. If the container winds
> up sharing the mount with other containers then at least we can learn
> which container originated the mount.
> 
> I imagine an NFS administrator trying to determine the source of a bunch
> of RPC calls. If we just report the initial namespace then that
> administrator has to do lots more digging to determine which container
> sent the calls (assuming they aren't in different network namespaces).

As I said elsewhere, the network namespaces do not matter, since we
always create the sockets via the rpciod workqueue.

> By not always reporting the initial namespace we may give the
> administrator one way to narrow down the search. Even if the reported
> node name does not perfectly match the source of all RPC traffic related
> to the mount at least the administrator gets something more specific.

The administrator would have to be running wireshark or something in
order to track the rpc calls. That's hardly a convenient interface, or a
compelling reason to add namespace info into RPC credentials. We
certainly haven't added any other information to the RPC calls in order
to allow the administrator to identify which process it originated from.

  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:38                     ` Trond Myklebust
@ 2009-01-07  1:44                       ` J. Bruce Fields
  2009-01-07  1:50                         ` Trond Myklebust
  2009-01-07  2:37                         ` Eric W. Biederman
  0 siblings, 2 replies; 47+ messages in thread
From: J. Bruce Fields @ 2009-01-07  1:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, Jan 06, 2009 at 07:38:26PM -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote:
> > On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote:
> > > As for the NFSv4 clientid, I can't see how you would ever want to use
> > > anything other than the init->utsname(), since the requirement is only
> > > that the clientid string be unique and preserved across reboots. The
> > > server isn't allowed to interpret the contents of the clientid string.
> > > Ditto for the RPCSEC_GSS machine creds that are used to establish the
> > > clientid.
> > 
> > If people eventually expect to be able to, say, migrate a container to
> > another host while using an nfs mount as their storage, then they'd need
> > the name to be that of the container, not of the host.
> 
> Why?
>
> > Obviously we'd also need to ensure the container got its own nfsv4
> > client state, etc., etc., and it sounds like we're far from that.
> 
> Again, why? Are you seriously proposing a plan to transport all NFS and
> locking state directly from one kernel to another?

If people seem to think they can do live process and container
migration:

	http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf

then moving the NFS state strikes me as not the greatest of their
troubles.

I'm not volunteering!

--b.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  1:44                       ` J. Bruce Fields
@ 2009-01-07  1:50                         ` Trond Myklebust
  2009-01-07  2:37                         ` Eric W. Biederman
  1 sibling, 0 replies; 47+ messages in thread
From: Trond Myklebust @ 2009-01-07  1:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley,
	Linux Containers, linux-nfs, Linux Kernel Mailing List,
	Chuck Lever, Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 20:44 -0500, J. Bruce Fields wrote:
> > Again, why? Are you seriously proposing a plan to transport all NFS and
> > locking state directly from one kernel to another?
> 
> If people seem to think they can do live process and container
> migration:
> 
> 	http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf
> 
> then moving the NFS state strikes me as not the greatest of their
> troubles.

ROTFL...


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  1:44                       ` J. Bruce Fields
  2009-01-07  1:50                         ` Trond Myklebust
@ 2009-01-07  2:37                         ` Eric W. Biederman
  1 sibling, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2009-01-07  2:37 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Trond Myklebust, Serge E. Hallyn, Matt Helsley, Linux Containers,
	linux-nfs, Linux Kernel Mailing List, Chuck Lever,
	Linux Containers, Cedric Le Goater

"J. Bruce Fields" <bfields@fieldses.org> writes:

> If people seem to think they can do live process and container
> migration:
>
> 	http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf
>
> then moving the NFS state strikes me as not the greatest of their
> troubles.
>
> I'm not volunteering!

Reasonable.  There are a lot easier cases than NFS, and migration is
not the biggest use case.

A lot of this work we can do incrementally and simply not support
doing everything in a container.  Which is where we are today.

The whole hostname in NFS packets isn't something that makes much sense
to me in the first place, so I can't comment on a good use case for that.

Sorting out the details so we can get NFS working in multiple network namespaces
makes a lot of sense to me.   It allows things like VPNing into work isolating the
VPN process in a network namespace, and still being able to use all of the kernel
networking including NFS like normal sounds fun to me.

There may also be some fun things you can do with testing having both an NFS
server and the client on the same box and not go through the loopback device.

Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround
  2009-01-06 16:02   ` Chuck Lever
  2009-01-07  0:28     ` Matt Helsley
@ 2009-01-07  3:02     ` Matt Helsley
  1 sibling, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  3:02 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Linux Containers, J. Bruce Fields, Cedric Le Goater,
	Linux Kernel Mailing List, linux-nfs, Trond Myklebust,
	Eric W. Biederman, Linux Containers

Argh, missed some questions I should've answered earlier...

On Tue, 2009-01-06 at 11:02 -0500, Chuck Lever wrote:
> Matt-
> 
> Thanks for pursuing a permanent fix for this.
> 
> On Jan 5, 2009, at Jan 5, 2009, 8:13 PM, Matt Helsley wrote:
> 
> > We can improve upon a workaround applied in commit
> > 63ffc23d307c9534c732edd87895e37b223004a3 ( http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=63ffc23d307c9534c732edd87895e37b223004a3 
> >  )
> >
> > The original problem was:
> >
> > "On a system with nfs mounts, if a task unshares its mount namespace,
> > a oops can occur when the system is rebooted if the task is the last
> > to unreference the nfs mount. It will try to create a rpc request
> > using utsname() which has been invalidated by free_nsproxy()."
> >
> > Cedric worked around this by always using the initial uts namespace  
> > for RPC.
> > Critically this workaround meant that RPC clients in uts namespaces  
> > can never
> > report the changed nodename when utilizing RPC.
> >
> > Fix that by storing the nodename in the NFS server structure (part  
> > of the NFS
> > super block) and, when an RPC client is operating on behalf of NFS,  
> > reporting
> > that nodename. This solves the problem for NFS clients but leaves  
> > any other
> > RPC users out in the cold.
> >
> > Rather than caching the nodename in the client structure RPC should  
> > obtain the
> > nodename from RPC callers. It would then be up to those services  
> > making RPC
> > calls to cache the nodename for as long as necessary -- somewhat  
> > like this patch
> > does with NFS.
> 
> Instead of having the RPC client call the consumer back, why can't you  
> pass the nodename as an argument to each RPC call; say, via the  
> rpc_message structure?

I suspect it would work and it would avoid the layering violation you
pointed out.

> > NOTE: Part of Cedric's workaround -- use of the initial uts  
> > namespace -- is
> > still necessary because non-NFS RPC callers still rely on the  
> > nodename cached
> > with the RPC client struct.
> 
> In the long run I think it would be more useful to spell out where  
> each consumer gets its nodename value, rather than having a convenient  
> default value.  A default would encourage exposing nodenames  
> inappropriately due to sloppy coding and incorrect assumptions (on the  
> developer's part) about a complex API.

You make some good points here. I'll add your idea to my list of patches
to try writing:
	1. Store a reference to the uts namespace with the credentials
	2. Pass the nodename directly for each RPC call (via rpc_message
		perhaps)

<snip>

> > Index: linux-2.6.28/include/linux/sunrpc/clnt.h
> > ===================================================================
> > --- linux-2.6.28.orig/include/linux/sunrpc/clnt.h
> > +++ linux-2.6.28/include/linux/sunrpc/clnt.h
> > @@ -19,6 +19,7 @@
> > #include <asm/signal.h>
> >
> > struct rpc_inode;
> > +struct nfs_server;
> >
> > /*
> >  * The high-level client handle
> > @@ -50,6 +51,7 @@ struct rpc_clnt {
> >
> > 	int			cl_nodelen;	/* nodename length */
> > 	char 			cl_nodename[UNX_MAXNODENAME];
> > +	struct nfs_server	*cl_nfs_server;
> 
> This is a layering violation...  I would rather avoid introducing new  
> strong data structure dependencies on one of RPC's consumers.

<snip>

I addressed your other comments in an earlier reply.

Thanks!

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
  2009-01-07  0:23               ` Trond Myklebust
@ 2009-01-07  3:44                 ` Matt Helsley
  0 siblings, 0 replies; 47+ messages in thread
From: Matt Helsley @ 2009-01-07  3:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: J. Bruce Fields, Serge E. Hallyn, Linux Containers, linux-nfs,
	Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman,
	Linux Containers, Cedric Le Goater

On Tue, 2009-01-06 at 19:23 -0500, Trond Myklebust wrote:
> On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote:
> > If it would be possible, for example, for the 'init' namespace to have
> > no network interfaces at all, then it would be nicer to use a name
> > that's at least been used with nfs at *some* point--just on the general
> > principle of not leaking information to a domain that the user wouldn't
> > expect it to.
> 
> Then RPC would fail. Thanks to the limitations imposed by selinux &
> friends, all RPC sockets have to be owned by the init process.

Interesting -- I'm not familiar with this requirement of selinux. Must
it be the init process of the initial pid namespace or could any pid
namespace's init process own it?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2009-01-07  3:45 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-06  1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley
2009-01-06  1:13 ` [RFC][PATCH 1/4] Remove useless utsname.h includes Matt Helsley
2009-01-06  1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley
2009-01-06 20:02   ` Serge E. Hallyn
2009-01-06 20:20     ` J. Bruce Fields
2009-01-06 21:53       ` Serge E. Hallyn
2009-01-06 23:35         ` Matt Helsley
2009-01-06 22:43       ` Matt Helsley
2009-01-06 20:44     ` Trond Myklebust
2009-01-06 21:58       ` Serge E. Hallyn
2009-01-06 22:42         ` Trond Myklebust
2009-01-07  0:08           ` Matt Helsley
2009-01-07  0:20             ` Trond Myklebust
2009-01-07  0:43               ` Matt Helsley
2009-01-07  1:10                 ` Trond Myklebust
2009-01-07  0:20             ` J. Bruce Fields
2009-01-07  0:23               ` Trond Myklebust
2009-01-07  3:44                 ` Matt Helsley
2009-01-06 23:04         ` Eric W. Biederman
2009-01-06 23:15           ` Trond Myklebust
2009-01-06 23:32             ` J. Bruce Fields
2009-01-06 23:35               ` Trond Myklebust
2009-01-06 23:48                 ` Matt Helsley
2009-01-06 23:51                 ` Chuck Lever
2009-01-06 23:53                 ` J. Bruce Fields
2009-01-07  0:07                   ` Matt Helsley
2009-01-07  0:55                     ` Eric W. Biederman
2009-01-07  0:20                   ` Trond Myklebust
2009-01-07  0:20                 ` Trond Myklebust
2009-01-07  0:26                   ` J. Bruce Fields
2009-01-07  0:38                     ` Trond Myklebust
2009-01-07  1:44                       ` J. Bruce Fields
2009-01-07  1:50                         ` Trond Myklebust
2009-01-07  2:37                         ` Eric W. Biederman
2009-01-06 23:30         ` Matt Helsley
2009-01-06 23:18       ` Matt Helsley
2009-01-06 23:43         ` Trond Myklebust
2009-01-06 23:58           ` Matt Helsley
2009-01-06 22:29     ` Chuck Lever
2009-01-07  0:01       ` Serge E. Hallyn
2009-01-06  1:13 ` [RFC][PATCH 3/4] sunrpc: Improve UTS namespace workaround Matt Helsley
2009-01-06 16:02   ` Chuck Lever
2009-01-07  0:28     ` Matt Helsley
2009-01-07  3:02     ` Matt Helsley
2009-01-06  1:13 ` [RFC][PATCH 4/4] Represent RPC Callers Matt Helsley
2009-01-06 13:04   ` Trond Myklebust
2009-01-06 23:05     ` Matt Helsley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox