* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces
@ 2009-01-07 0:39 trond.myklebust
2009-01-07 0:57 ` Matt Helsley
0 siblings, 1 reply; 42+ messages in thread
From: trond.myklebust @ 2009-01-07 0:39 UTC (permalink / raw)
To: Matt Helsley
Cc: Serge E. Hallyn, Linux Containers, linux-nfs,
Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever,
Eric W. Biederman, Linux Containers, Cedric Le Goater
On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote:
> IMHO This seems more incorrect than trying to use a more proximal
namespace.
You have yet to explain why.
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:39 [RFC][PATCH 2/4] sunrpc: Use utsnamespaces trond.myklebust @ 2009-01-07 0:57 ` Matt Helsley 2009-01-07 1:02 ` Trond Myklebust 0 siblings, 1 reply; 42+ messages in thread From: Matt Helsley @ 2009-01-07 0:57 UTC (permalink / raw) To: trond.myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 19:39 -0500, trond.myklebust@fys.uio.no wrote: > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > > IMHO This seems more incorrect than trying to use a more proximal > namespace. > > You have yet to explain why. Yes, I know I have to explain why. If you would please have a little more patience you might see I have replied to your original request for an explanation -- I *just* sent it out. If I don't see it show up soon I'll resend it as a reply here. OK? Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:57 ` Matt Helsley @ 2009-01-07 1:02 ` Trond Myklebust 2009-01-07 1:22 ` Matt Helsley 0 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 1:02 UTC (permalink / raw) To: Matt Helsley Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 16:57 -0800, Matt Helsley wrote: > On Tue, 2009-01-06 at 19:39 -0500, trond.myklebust@fys.uio.no wrote: > > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > > > IMHO This seems more incorrect than trying to use a more proximal > > namespace. > > > > You have yet to explain why. > > Yes, I know I have to explain why. If you would please have a little > more patience you might see I have replied to your original request for > an explanation -- I *just* sent it out. If I don't see it show up soon > I'll resend it as a reply here. OK? Sorry if you saw the request twice. My mail server at uio.no appears to be on the blink, so I switched to the netapp account... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 1:02 ` Trond Myklebust @ 2009-01-07 1:22 ` Matt Helsley 0 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-07 1:22 UTC (permalink / raw) To: Trond Myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 20:02 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 16:57 -0800, Matt Helsley wrote: > > On Tue, 2009-01-06 at 19:39 -0500, trond.myklebust@fys.uio.no wrote: > > > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > > > > IMHO This seems more incorrect than trying to use a more proximal > > > namespace. > > > > > > You have yet to explain why. > > > > Yes, I know I have to explain why. If you would please have a little > > more patience you might see I have replied to your original request for > > an explanation -- I *just* sent it out. If I don't see it show up soon > > I'll resend it as a reply here. OK? > > Sorry if you saw the request twice. My mail server at uio.no appears to > be on the blink, so I switched to the netapp account... Ahhh, ok. Sorry I misinterpretted that.. Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* [RFC][PATCH 0/4] utsns: RPC/NFS bug rework
@ 2009-01-06 1:13 Matt Helsley
2009-01-06 1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley
0 siblings, 1 reply; 42+ messages in thread
From: Matt Helsley @ 2009-01-06 1:13 UTC (permalink / raw)
To: Linux Containers
Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List,
linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman
This series replaces the workaround for a bug reported by Cedric Le Goater
<clg@fr.ibm.com> back in September:
> On a system with nfs mounts, if a task unshares its mount namespace,
> a oops can occur when the system is rebooted if the task is the last
> to unreference the nfs mount. It will try to create a rpc request
> using utsname() which has been invalidated by free_nsproxy().
Cedric worked around this bug by always using the initial uts namespace's
nodename (see commit: 63ffc23d307c9534c732edd87895e37b223004a3).
This is a non-ideal solution because uts namespace nodenames are reported
as the hostname during RPC authentication. Consider a machine configured
to export directories via NFS from a parent container to designated "trusted"
child containers. It would be unable to rely on the hostname during RPC
authentication -- forcing the administration of more advanced authentication
systems than might otherwise be necessary.
The goal of this series is to report a namespace's UTS nodename rather than
the initial UTS namespace's nodename during RPC-call authentication -- much as
before Cedric's workaround. By changing the way that the nodename is cached and
fetched we can simultaneously avoid the NULL dereference during shutdown and
ensure that amalgamated RPC services (such as statd, lockd, mountd for NFS) see
a consistent nodename.
Cheers,
-Matt Helsley
--
^ permalink raw reply [flat|nested] 42+ messages in thread* [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley @ 2009-01-06 1:13 ` Matt Helsley 2009-01-06 20:02 ` Serge E. Hallyn 0 siblings, 1 reply; 42+ messages in thread From: Matt Helsley @ 2009-01-06 1:13 UTC (permalink / raw) To: Linux Containers Cc: J. Bruce Fields, Cedric Le Goater, Linux Kernel Mailing List, linux-nfs, Trond Myklebust, Chuck Lever, Eric W. Biederman, Matt Helsley, Linux Containers [-- Attachment #1: reenable-uts-ns-for-most-rpc-clients.patch --] [-- Type: text/plain, Size: 1636 bytes --] We can often specify the UTS namespace to use when starting an RPC client. However sometimes no UTS namespace is available (specifically during system shutdown as the last NFS mount in a container is unmounted) so fall back to the initial UTS namespace. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Cc: linux-nfs@vger.kernel.org Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Linux Containers <containers@lists.osdl.org> --- net/sunrpc/clnt.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) Index: linux-2.6.28/net/sunrpc/clnt.c =================================================================== --- linux-2.6.28.orig/net/sunrpc/clnt.c +++ linux-2.6.28/net/sunrpc/clnt.c @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client( struct rpc_version *version; struct rpc_clnt *clnt = NULL; struct rpc_auth *auth; + struct new_utsname *uts_ns = init_utsname(); int err; size_t len; @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client( } /* save the nodename */ - clnt->cl_nodelen = strlen(init_utsname()->nodename); + if (current->nsproxy != NULL) + uts_ns = utsname(); + clnt->cl_nodelen = strlen(uts_ns->nodename); if (clnt->cl_nodelen > UNX_MAXNODENAME) clnt->cl_nodelen = UNX_MAXNODENAME; - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen); rpc_register_client(clnt); return clnt; -- ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley @ 2009-01-06 20:02 ` Serge E. Hallyn 2009-01-06 20:20 ` J. Bruce Fields ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Serge E. Hallyn @ 2009-01-06 20:02 UTC (permalink / raw) To: Matt Helsley Cc: Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater Quoting Matt Helsley (matthltc@us.ibm.com): > We can often specify the UTS namespace to use when starting an RPC client. > However sometimes no UTS namespace is available (specifically during system > shutdown as the last NFS mount in a container is unmounted) so fall > back to the initial UTS namespace. So what happens if we take this patch and do nothing else? The only potential problem situation will be rpc requests made on behalf of a container in which the last task has exited, right? So let's say a container did an nfs mount and then exits, causing an nfs umount request. That umount request will now be sent with the wrong nodename. Does that actually cause problems, will the server use the nodename to try and determine the client sending the request? thanks, -serge > Signed-off-by: Matt Helsley <matthltc@us.ibm.com> > Cc: Cedric Le Goater <clg@fr.ibm.com> > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> > Cc: linux-nfs@vger.kernel.org > Cc: Trond Myklebust <trond.myklebust@fys.uio.no> > Cc: Chuck Lever <chuck.lever@oracle.com> > Cc: Eric W. Biederman <ebiederm@xmission.com> > Cc: Linux Containers <containers@lists.osdl.org> > > --- > net/sunrpc/clnt.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > Index: linux-2.6.28/net/sunrpc/clnt.c > =================================================================== > --- linux-2.6.28.orig/net/sunrpc/clnt.c > +++ linux-2.6.28/net/sunrpc/clnt.c > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client( > struct rpc_version *version; > struct rpc_clnt *clnt = NULL; > struct rpc_auth *auth; > + struct new_utsname *uts_ns = init_utsname(); > int err; > size_t len; > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client( > } > > /* save the nodename */ > - clnt->cl_nodelen = strlen(init_utsname()->nodename); > + if (current->nsproxy != NULL) > + uts_ns = utsname(); > + clnt->cl_nodelen = strlen(uts_ns->nodename); > if (clnt->cl_nodelen > UNX_MAXNODENAME) > clnt->cl_nodelen = UNX_MAXNODENAME; > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen); > rpc_register_client(clnt); > return clnt; > > > -- > _______________________________________________ > Containers mailing list > Containers@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:02 ` Serge E. Hallyn @ 2009-01-06 20:20 ` J. Bruce Fields 2009-01-06 21:53 ` Serge E. Hallyn 2009-01-06 22:43 ` Matt Helsley 2009-01-06 20:44 ` Trond Myklebust 2009-01-06 22:29 ` Chuck Lever 2 siblings, 2 replies; 42+ messages in thread From: J. Bruce Fields @ 2009-01-06 20:20 UTC (permalink / raw) To: Serge E. Hallyn Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote: > Quoting Matt Helsley (matthltc@us.ibm.com): > > We can often specify the UTS namespace to use when starting an RPC client. > > However sometimes no UTS namespace is available (specifically during system > > shutdown as the last NFS mount in a container is unmounted) so fall > > back to the initial UTS namespace. > > So what happens if we take this patch and do nothing else? > > The only potential problem situation will be rpc requests > made on behalf of a container in which the last task has > exited, right? So let's say a container did an nfs mount > and then exits, causing an nfs umount request. > > That umount request will now be sent with the wrong nodename. > Does that actually cause problems, will the server use the > nodename to try and determine the client sending the request? This is just the machine name in the auth_unix credential? The linux server ignores that completely (for the purpose of auth_unix authenication, it identifies clients only by source ip address). I suspect other servers also ignore it, but I don't know. --b. > > thanks, > -serge > > > Signed-off-by: Matt Helsley <matthltc@us.ibm.com> > > Cc: Cedric Le Goater <clg@fr.ibm.com> > > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> > > Cc: linux-nfs@vger.kernel.org > > Cc: Trond Myklebust <trond.myklebust@fys.uio.no> > > Cc: Chuck Lever <chuck.lever@oracle.com> > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > Cc: Linux Containers <containers@lists.osdl.org> > > > > --- > > net/sunrpc/clnt.c | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > Index: linux-2.6.28/net/sunrpc/clnt.c > > =================================================================== > > --- linux-2.6.28.orig/net/sunrpc/clnt.c > > +++ linux-2.6.28/net/sunrpc/clnt.c > > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client( > > struct rpc_version *version; > > struct rpc_clnt *clnt = NULL; > > struct rpc_auth *auth; > > + struct new_utsname *uts_ns = init_utsname(); > > int err; > > size_t len; > > > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client( > > } > > > > /* save the nodename */ > > - clnt->cl_nodelen = strlen(init_utsname()->nodename); > > + if (current->nsproxy != NULL) > > + uts_ns = utsname(); > > + clnt->cl_nodelen = strlen(uts_ns->nodename); > > if (clnt->cl_nodelen > UNX_MAXNODENAME) > > clnt->cl_nodelen = UNX_MAXNODENAME; > > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen); > > rpc_register_client(clnt); > > return clnt; > > > > > > -- > > _______________________________________________ > > Containers mailing list > > Containers@lists.linux-foundation.org > > https://lists.linux-foundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:20 ` J. Bruce Fields @ 2009-01-06 21:53 ` Serge E. Hallyn 2009-01-06 23:35 ` Matt Helsley 2009-01-06 22:43 ` Matt Helsley 1 sibling, 1 reply; 42+ messages in thread From: Serge E. Hallyn @ 2009-01-06 21:53 UTC (permalink / raw) To: J. Bruce Fields Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater Quoting J. Bruce Fields (bfields@fieldses.org): > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote: > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > We can often specify the UTS namespace to use when starting an RPC client. > > > However sometimes no UTS namespace is available (specifically during system > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > back to the initial UTS namespace. > > > > So what happens if we take this patch and do nothing else? > > > > The only potential problem situation will be rpc requests > > made on behalf of a container in which the last task has > > exited, right? So let's say a container did an nfs mount > > and then exits, causing an nfs umount request. > > > > That umount request will now be sent with the wrong nodename. > > Does that actually cause problems, will the server use the > > nodename to try and determine the client sending the request? > > This is just the machine name in the auth_unix credential? The linux > server ignores that completely (for the purpose of auth_unix > authenication, it identifies clients only by source ip address). I > suspect other servers also ignore it, but I don't know. Thanks, that's what i was hoping... Matt, have you audited the other rpc-based services? Do any of them care? -serge ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 21:53 ` Serge E. Hallyn @ 2009-01-06 23:35 ` Matt Helsley 0 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-06 23:35 UTC (permalink / raw) To: Serge E. Hallyn Cc: J. Bruce Fields, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:53 -0600, Serge E. Hallyn wrote: > Quoting J. Bruce Fields (bfields@fieldses.org): > > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote: > > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > > We can often specify the UTS namespace to use when starting an RPC client. > > > > However sometimes no UTS namespace is available (specifically during system > > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > > back to the initial UTS namespace. > > > > > > So what happens if we take this patch and do nothing else? > > > > > > The only potential problem situation will be rpc requests > > > made on behalf of a container in which the last task has > > > exited, right? So let's say a container did an nfs mount > > > and then exits, causing an nfs umount request. > > > > > > That umount request will now be sent with the wrong nodename. > > > Does that actually cause problems, will the server use the > > > nodename to try and determine the client sending the request? > > > > This is just the machine name in the auth_unix credential? The linux > > server ignores that completely (for the purpose of auth_unix > > authenication, it identifies clients only by source ip address). I > > suspect other servers also ignore it, but I don't know. > > Thanks, that's what i was hoping... > > Matt, have you audited the other rpc-based services? Do any > of them care? Frankly, I did not audit any of the RPC-based services to see if any truly cared about the node name. That's actually a rather large scope when you consider it -- I'd have to look at much more than just "Linux" RPC clients. It seemed unsafe to assume _nobody_ would care after they bothered to put it into the spec. Hopefully I'm wrong though :). Cheers, -Matt ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:20 ` J. Bruce Fields 2009-01-06 21:53 ` Serge E. Hallyn @ 2009-01-06 22:43 ` Matt Helsley 1 sibling, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-06 22:43 UTC (permalink / raw) To: J. Bruce Fields Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:20 -0500, J. Bruce Fields wrote: > On Tue, Jan 06, 2009 at 02:02:29PM -0600, Serge E. Hallyn wrote: > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > We can often specify the UTS namespace to use when starting an RPC client. > > > However sometimes no UTS namespace is available (specifically during system > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > back to the initial UTS namespace. > > > > So what happens if we take this patch and do nothing else? > > > > The only potential problem situation will be rpc requests > > made on behalf of a container in which the last task has > > exited, right? So let's say a container did an nfs mount > > and then exits, causing an nfs umount request. > > > > That umount request will now be sent with the wrong nodename. > > Does that actually cause problems, will the server use the > > nodename to try and determine the client sending the request? > > This is just the machine name in the auth_unix credential? The linux > server ignores that completely (for the purpose of auth_unix > authenication, it identifies clients only by source ip address). I > suspect other servers also ignore it, but I don't know. > > --b. I was wondering about this because I kept coming back to the question of how the server could trust the name the client supplies (seems it can't). This is very useful information -- it certainly suggests that this patch would be sufficient. Thanks! Cheers, -Matt Helsley > > > > thanks, > > -serge > > > > > Signed-off-by: Matt Helsley <matthltc@us.ibm.com> > > > Cc: Cedric Le Goater <clg@fr.ibm.com> > > > Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> > > > Cc: linux-nfs@vger.kernel.org > > > Cc: Trond Myklebust <trond.myklebust@fys.uio.no> > > > Cc: Chuck Lever <chuck.lever@oracle.com> > > > Cc: Eric W. Biederman <ebiederm@xmission.com> > > > Cc: Linux Containers <containers@lists.osdl.org> > > > > > > --- > > > net/sunrpc/clnt.c | 7 +++++-- > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6.28/net/sunrpc/clnt.c > > > =================================================================== > > > --- linux-2.6.28.orig/net/sunrpc/clnt.c > > > +++ linux-2.6.28/net/sunrpc/clnt.c > > > @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client( > > > struct rpc_version *version; > > > struct rpc_clnt *clnt = NULL; > > > struct rpc_auth *auth; > > > + struct new_utsname *uts_ns = init_utsname(); > > > int err; > > > size_t len; > > > > > > @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client( > > > } > > > > > > /* save the nodename */ > > > - clnt->cl_nodelen = strlen(init_utsname()->nodename); > > > + if (current->nsproxy != NULL) > > > + uts_ns = utsname(); > > > + clnt->cl_nodelen = strlen(uts_ns->nodename); > > > if (clnt->cl_nodelen > UNX_MAXNODENAME) > > > clnt->cl_nodelen = UNX_MAXNODENAME; > > > - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt->cl_nodelen); > > > + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen); > > > rpc_register_client(clnt); > > > return clnt; > > > > > > > > > -- > > > _______________________________________________ > > > Containers mailing list > > > Containers@lists.linux-foundation.org > > > https://lists.linux-foundation.org/mailman/listinfo/containers ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:02 ` Serge E. Hallyn 2009-01-06 20:20 ` J. Bruce Fields @ 2009-01-06 20:44 ` Trond Myklebust 2009-01-06 21:58 ` Serge E. Hallyn 2009-01-06 23:18 ` Matt Helsley 2009-01-06 22:29 ` Chuck Lever 2 siblings, 2 replies; 42+ messages in thread From: Trond Myklebust @ 2009-01-06 20:44 UTC (permalink / raw) To: Serge E. Hallyn Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote: > Quoting Matt Helsley (matthltc@us.ibm.com): > > We can often specify the UTS namespace to use when starting an RPC client. > > However sometimes no UTS namespace is available (specifically during system > > shutdown as the last NFS mount in a container is unmounted) so fall > > back to the initial UTS namespace. > > So what happens if we take this patch and do nothing else? > > The only potential problem situation will be rpc requests > made on behalf of a container in which the last task has > exited, right? So let's say a container did an nfs mount > and then exits, causing an nfs umount request. > > That umount request will now be sent with the wrong nodename. > Does that actually cause problems, will the server use the > nodename to try and determine the client sending the request? The NFSv2/v3 umount rpc call will be sent by the 'umount' program from userspace, not the kernel. The problem here is that because lazy mounts exist, the lifetime of the RPC client may be longer than that of the container. In addition, it may be shared among more than 1 container, because superblocks can be shared. One thing you need to be aware of here is that inode dirty data writebacks may be initiated by completely different processes than the one that dirtied the inode. IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which rely on being able to determine the container-specific node name at RPC generation time are therefore going to return incorrect values. Trond ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:44 ` Trond Myklebust @ 2009-01-06 21:58 ` Serge E. Hallyn 2009-01-06 22:42 ` Trond Myklebust ` (2 more replies) 2009-01-06 23:18 ` Matt Helsley 1 sibling, 3 replies; 42+ messages in thread From: Serge E. Hallyn @ 2009-01-06 21:58 UTC (permalink / raw) To: Trond Myklebust Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater Quoting Trond Myklebust (trond.myklebust@fys.uio.no): > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote: > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > We can often specify the UTS namespace to use when starting an RPC client. > > > However sometimes no UTS namespace is available (specifically during system > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > back to the initial UTS namespace. > > > > So what happens if we take this patch and do nothing else? > > > > The only potential problem situation will be rpc requests > > made on behalf of a container in which the last task has > > exited, right? So let's say a container did an nfs mount > > and then exits, causing an nfs umount request. > > > > That umount request will now be sent with the wrong nodename. > > Does that actually cause problems, will the server use the > > nodename to try and determine the client sending the request? > > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from > userspace, not the kernel. The problem here is that because lazy mounts > exist, the lifetime of the RPC client may be longer than that of the Right that was what i was referring to. > container. In addition, it may be shared among more than 1 container, > because superblocks can be shared. Good point. And in that case what do we care about (even though apparently we just might not care at all :) - who did the mount, or who is using it? In fact one thing I noticed in Matt's patch 3 was that he copied in the nodename verbatim, so a future hostname() by the container wouldn't be reflected, again not sure if that would matter. > One thing you need to be aware of here is that inode dirty data > writebacks may be initiated by completely different processes than the > one that dirtied the inode. Right, but I *was* thinking that we wanted to associate the nodename on the rpc calls with the hostname of the mounter, not the actor. Maybe you'll tell me above that that is bogus. > IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which > rely on being able to determine the container-specific node name at RPC > generation time are therefore going to return incorrect values. So should we use patch 2/4, plus (as someone - was it you? - suggested) using a DEFAULT instead of init_utsname()->nodename when current->utsname() == NULL? -serge ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 21:58 ` Serge E. Hallyn @ 2009-01-06 22:42 ` Trond Myklebust 2009-01-07 0:08 ` Matt Helsley 2009-01-06 23:04 ` Eric W. Biederman 2009-01-06 23:30 ` Matt Helsley 2 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-06 22:42 UTC (permalink / raw) To: Serge E. Hallyn Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote: > So should we use patch 2/4, plus (as someone - was it you? - suggested) > using a DEFAULT instead of init_utsname()->nodename when > current->utsname() == NULL? No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying that the container that first mounts the filesystem 'owns' it. However at the same time we know that the lifetime of the filesystem is in no way bounded by the lifetime of the container, and that's what gets you into trouble with 'umount' in the first place. IMO, the current code is the most correct approach, in that it assumes that the filesystems are owned by the 'init' namespace. Trond ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 22:42 ` Trond Myklebust @ 2009-01-07 0:08 ` Matt Helsley 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:20 ` J. Bruce Fields 0 siblings, 2 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-07 0:08 UTC (permalink / raw) To: Trond Myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote: > > > So should we use patch 2/4, plus (as someone - was it you? - suggested) > > using a DEFAULT instead of init_utsname()->nodename when > > current->utsname() == NULL? > > No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying > that the container that first mounts the filesystem 'owns' it. However > at the same time we know that the lifetime of the filesystem is in no > way bounded by the lifetime of the container, and that's what gets you > into trouble with 'umount' in the first place. > > IMO, the current code is the most correct approach, in that it assumes > that the filesystems are owned by the 'init' namespace. IMHO This seems more incorrect than trying to use a more proximal namespace. Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:08 ` Matt Helsley @ 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:43 ` Matt Helsley 2009-01-07 0:20 ` J. Bruce Fields 1 sibling, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 0:20 UTC (permalink / raw) To: Matt Helsley Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > IMHO This seems more incorrect than trying to use a more proximal > namespace. You have yet to explain why. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:20 ` Trond Myklebust @ 2009-01-07 0:43 ` Matt Helsley 2009-01-07 1:10 ` Trond Myklebust 0 siblings, 1 reply; 42+ messages in thread From: Matt Helsley @ 2009-01-07 0:43 UTC (permalink / raw) To: Trond Myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > > IMHO This seems more incorrect than trying to use a more proximal > > namespace. > > You have yet to explain why. It's "more proximal" -- it's closer to the container that we expect to cause (directly or otherwise) the bulk of the RPC calls for that mount. If the container does not wind up sharing that mount with other containers then the reported node name matches. If the container winds up sharing the mount with other containers then at least we can learn which container originated the mount. I imagine an NFS administrator trying to determine the source of a bunch of RPC calls. If we just report the initial namespace then that administrator has to do lots more digging to determine which container sent the calls (assuming they aren't in different network namespaces). By not always reporting the initial namespace we may give the administrator one way to narrow down the search. Even if the reported node name does not perfectly match the source of all RPC traffic related to the mount at least the administrator gets something more specific. Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:43 ` Matt Helsley @ 2009-01-07 1:10 ` Trond Myklebust 0 siblings, 0 replies; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 1:10 UTC (permalink / raw) To: Matt Helsley Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 16:43 -0800, Matt Helsley wrote: > On Tue, 2009-01-06 at 19:20 -0500, Trond Myklebust wrote: > > On Tue, 2009-01-06 at 16:08 -0800, Matt Helsley wrote: > > > IMHO This seems more incorrect than trying to use a more proximal > > > namespace. > > > > You have yet to explain why. > > It's "more proximal" -- it's closer to the container that we expect to > cause (directly or otherwise) the bulk of the RPC calls for that mount. > If the container does not wind up sharing that mount with other > containers then the reported node name matches. If the container winds > up sharing the mount with other containers then at least we can learn > which container originated the mount. > > I imagine an NFS administrator trying to determine the source of a bunch > of RPC calls. If we just report the initial namespace then that > administrator has to do lots more digging to determine which container > sent the calls (assuming they aren't in different network namespaces). As I said elsewhere, the network namespaces do not matter, since we always create the sockets via the rpciod workqueue. > By not always reporting the initial namespace we may give the > administrator one way to narrow down the search. Even if the reported > node name does not perfectly match the source of all RPC traffic related > to the mount at least the administrator gets something more specific. The administrator would have to be running wireshark or something in order to track the rpc calls. That's hardly a convenient interface, or a compelling reason to add namespace info into RPC credentials. We certainly haven't added any other information to the RPC calls in order to allow the administrator to identify which process it originated from. Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:08 ` Matt Helsley 2009-01-07 0:20 ` Trond Myklebust @ 2009-01-07 0:20 ` J. Bruce Fields 2009-01-07 0:23 ` Trond Myklebust 1 sibling, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2009-01-07 0:20 UTC (permalink / raw) To: Matt Helsley Cc: Trond Myklebust, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 04:08:50PM -0800, Matt Helsley wrote: > On Tue, 2009-01-06 at 17:42 -0500, Trond Myklebust wrote: > > On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote: > > > > > So should we use patch 2/4, plus (as someone - was it you? - suggested) > > > using a DEFAULT instead of init_utsname()->nodename when > > > current->utsname() == NULL? > > > > No. I'm don't think that 2/4 is correct either. Basically, 2/4 is saying > > that the container that first mounts the filesystem 'owns' it. However > > at the same time we know that the lifetime of the filesystem is in no > > way bounded by the lifetime of the container, and that's what gets you > > into trouble with 'umount' in the first place. > > > > IMO, the current code is the most correct approach, in that it assumes > > that the filesystems are owned by the 'init' namespace. > > IMHO This seems more incorrect than trying to use a more proximal > namespace. If it would be possible, for example, for the 'init' namespace to have no network interfaces at all, then it would be nicer to use a name that's at least been used with nfs at *some* point--just on the general principle of not leaking information to a domain that the user wouldn't expect it to. (Assuming it's unlikely anyone would consider init's utsname to be sensitive information, that's a minor point.) --b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:20 ` J. Bruce Fields @ 2009-01-07 0:23 ` Trond Myklebust 2009-01-07 3:44 ` Matt Helsley 0 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 0:23 UTC (permalink / raw) To: J. Bruce Fields Cc: Matt Helsley, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote: > If it would be possible, for example, for the 'init' namespace to have > no network interfaces at all, then it would be nicer to use a name > that's at least been used with nfs at *some* point--just on the general > principle of not leaking information to a domain that the user wouldn't > expect it to. Then RPC would fail. Thanks to the limitations imposed by selinux & friends, all RPC sockets have to be owned by the init process. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:23 ` Trond Myklebust @ 2009-01-07 3:44 ` Matt Helsley 0 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-07 3:44 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 19:23 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 19:20 -0500, J. Bruce Fields wrote: > > If it would be possible, for example, for the 'init' namespace to have > > no network interfaces at all, then it would be nicer to use a name > > that's at least been used with nfs at *some* point--just on the general > > principle of not leaking information to a domain that the user wouldn't > > expect it to. > > Then RPC would fail. Thanks to the limitations imposed by selinux & > friends, all RPC sockets have to be owned by the init process. Interesting -- I'm not familiar with this requirement of selinux. Must it be the init process of the initial pid namespace or could any pid namespace's init process own it? Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 21:58 ` Serge E. Hallyn 2009-01-06 22:42 ` Trond Myklebust @ 2009-01-06 23:04 ` Eric W. Biederman 2009-01-06 23:15 ` Trond Myklebust 2009-01-06 23:30 ` Matt Helsley 2 siblings, 1 reply; 42+ messages in thread From: Eric W. Biederman @ 2009-01-06 23:04 UTC (permalink / raw) To: Serge E. Hallyn Cc: Trond Myklebust, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater "Serge E. Hallyn" <serue@us.ibm.com> writes: > Quoting Trond Myklebust (trond.myklebust@fys.uio.no): >> On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote: >> > Quoting Matt Helsley (matthltc@us.ibm.com): >> > > We can often specify the UTS namespace to use when starting an RPC client. >> > > However sometimes no UTS namespace is available (specifically during > system >> > > shutdown as the last NFS mount in a container is unmounted) so fall >> > > back to the initial UTS namespace. >> > >> > So what happens if we take this patch and do nothing else? >> > >> > The only potential problem situation will be rpc requests >> > made on behalf of a container in which the last task has >> > exited, right? So let's say a container did an nfs mount >> > and then exits, causing an nfs umount request. >> > >> > That umount request will now be sent with the wrong nodename. >> > Does that actually cause problems, will the server use the >> > nodename to try and determine the client sending the request? >> >> The NFSv2/v3 umount rpc call will be sent by the 'umount' program from >> userspace, not the kernel. The problem here is that because lazy mounts >> exist, the lifetime of the RPC client may be longer than that of the > > Right that was what i was referring to. > >> container. In addition, it may be shared among more than 1 container, >> because superblocks can be shared. > > Good point. And in that case what do we care about (even though > apparently we just might not care at all :) - who did the mount, > or who is using it? > > In fact one thing I noticed in Matt's patch 3 was that he copied > in the nodename verbatim, so a future hostname() by the container > wouldn't be reflected, again not sure if that would matter. > >> One thing you need to be aware of here is that inode dirty data >> writebacks may be initiated by completely different processes than the >> one that dirtied the inode. > > Right, but I *was* thinking that we wanted to associate the nodename > on the rpc calls with the hostname of the mounter, not the actor. Maybe > you'll tell me above that that is bogus. > >> IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which >> rely on being able to determine the container-specific node name at RPC >> generation time are therefore going to return incorrect values. > > So should we use patch 2/4, plus (as someone - was it you? - suggested) > using a DEFAULT instead of init_utsname()->nodename when > current->utsname() == NULL? Is there any reason to believe that the kernel helper threads will ever have a useful namespace value? I don't think so. That implies to me you want to capture the value at mount time, and to pass it in to the rpc_call creation, and only at very specific well defined points where we interact with user space should we examine current->utsname(). At which point there should be no question of current->utsname() is valid as the user space process is alive. Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:04 ` Eric W. Biederman @ 2009-01-06 23:15 ` Trond Myklebust 2009-01-06 23:32 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-06 23:15 UTC (permalink / raw) To: Eric W. Biederman Cc: Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > That implies to me you want to capture the value at mount time, and to > pass it in to the rpc_call creation, and only at very specific well > defined points where we interact with user space should we examine > current->utsname(). At which point there should be no question > of current->utsname() is valid as the user space process is alive. Why pretend that the filesystem is owned by a particular namespace? It can, and will be shared among many containers... Trond ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:15 ` Trond Myklebust @ 2009-01-06 23:32 ` J. Bruce Fields 2009-01-06 23:35 ` Trond Myklebust 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2009-01-06 23:32 UTC (permalink / raw) To: Trond Myklebust Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > > That implies to me you want to capture the value at mount time, and to > > pass it in to the rpc_call creation, and only at very specific well > > defined points where we interact with user space should we examine > > current->utsname(). At which point there should be no question > > of current->utsname() is valid as the user space process is alive. > > Why pretend that the filesystem is owned by a particular namespace? It > can, and will be shared among many containers... If the only purpose of this is to fill in the auth_unix cred then shouldn't it be part of whatever cred structures are passed around? But I doubt it's that important to get it right. --b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:32 ` J. Bruce Fields @ 2009-01-06 23:35 ` Trond Myklebust 2009-01-06 23:48 ` Matt Helsley ` (3 more replies) 0 siblings, 4 replies; 42+ messages in thread From: Trond Myklebust @ 2009-01-06 23:35 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > > > That implies to me you want to capture the value at mount time, and to > > > pass it in to the rpc_call creation, and only at very specific well > > > defined points where we interact with user space should we examine > > > current->utsname(). At which point there should be no question > > > of current->utsname() is valid as the user space process is alive. > > > > Why pretend that the filesystem is owned by a particular namespace? It > > can, and will be shared among many containers... > > If the only purpose of this is to fill in the auth_unix cred then > shouldn't it be part of whatever cred structures are passed around? So how does tracking it in a shared structure like the rpc_client help? If you consider it to be part of the cred, then it needs to be tracked in the cred... Trond ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:35 ` Trond Myklebust @ 2009-01-06 23:48 ` Matt Helsley 2009-01-06 23:51 ` Chuck Lever ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-06 23:48 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Eric W. Biederman, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > > > > That implies to me you want to capture the value at mount time, and to > > > > pass it in to the rpc_call creation, and only at very specific well > > > > defined points where we interact with user space should we examine > > > > current->utsname(). At which point there should be no question > > > > of current->utsname() is valid as the user space process is alive. > > > > > > Why pretend that the filesystem is owned by a particular namespace? It > > > can, and will be shared among many containers... > > > > If the only purpose of this is to fill in the auth_unix cred then > > shouldn't it be part of whatever cred structures are passed around? > > So how does tracking it in a shared structure like the rpc_client help? > If you consider it to be part of the cred, then it needs to be tracked > in the cred... > > Trond I originally discarded the idea of relating it to the credentials because it looked like the credential cache could be flushed at any time (according to the small portion of the RFC I read) so I was worried we'd have even worse lifetime issues to deal with. Cheers, -Matt ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:35 ` Trond Myklebust 2009-01-06 23:48 ` Matt Helsley @ 2009-01-06 23:51 ` Chuck Lever 2009-01-06 23:53 ` J. Bruce Fields 2009-01-07 0:20 ` Trond Myklebust 3 siblings, 0 replies; 42+ messages in thread From: Chuck Lever @ 2009-01-06 23:51 UTC (permalink / raw) To: Trond Myklebust Cc: J. Bruce Fields, Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Linux Containers, Cedric Le Goater On Jan 6, 2009, at Jan 6, 2009, 6:35 PM, Trond Myklebust wrote: > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: >> On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: >>> On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: >>>> That implies to me you want to capture the value at mount time, >>>> and to >>>> pass it in to the rpc_call creation, and only at very specific well >>>> defined points where we interact with user space should we examine >>>> current->utsname(). At which point there should be no question >>>> of current->utsname() is valid as the user space process is alive. >>> >>> Why pretend that the filesystem is owned by a particular >>> namespace? It >>> can, and will be shared among many containers... >> >> If the only purpose of this is to fill in the auth_unix cred then >> shouldn't it be part of whatever cred structures are passed around? > > So how does tracking it in a shared structure like the rpc_client > help? > If you consider it to be part of the cred, then it needs to be tracked > in the cred... I think generating a proper AUTH_SYS cred, given the existence of containers, is the essential question here. However, we use nodename for lock owners too... perhaps that deserves a separate solution. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:35 ` Trond Myklebust 2009-01-06 23:48 ` Matt Helsley 2009-01-06 23:51 ` Chuck Lever @ 2009-01-06 23:53 ` J. Bruce Fields 2009-01-07 0:07 ` Matt Helsley 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:20 ` Trond Myklebust 3 siblings, 2 replies; 42+ messages in thread From: J. Bruce Fields @ 2009-01-06 23:53 UTC (permalink / raw) To: Trond Myklebust Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > > > > That implies to me you want to capture the value at mount time, and to > > > > pass it in to the rpc_call creation, and only at very specific well > > > > defined points where we interact with user space should we examine > > > > current->utsname(). At which point there should be no question > > > > of current->utsname() is valid as the user space process is alive. > > > > > > Why pretend that the filesystem is owned by a particular namespace? It > > > can, and will be shared among many containers... > > > > If the only purpose of this is to fill in the auth_unix cred then > > shouldn't it be part of whatever cred structures are passed around? > > So how does tracking it in a shared structure like the rpc_client help? > If you consider it to be part of the cred, then it needs to be tracked > in the cred... Right, that's what I meant. It seems like overkill, though. Does anyone actually care whether these names are right? --b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:53 ` J. Bruce Fields @ 2009-01-07 0:07 ` Matt Helsley 2009-01-07 0:55 ` Eric W. Biederman 2009-01-07 0:20 ` Trond Myklebust 1 sibling, 1 reply; 42+ messages in thread From: Matt Helsley @ 2009-01-07 0:07 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Eric W. Biederman, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote: > On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote: > > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: > > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: > > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: > > > > > That implies to me you want to capture the value at mount time, and to > > > > > pass it in to the rpc_call creation, and only at very specific well > > > > > defined points where we interact with user space should we examine > > > > > current->utsname(). At which point there should be no question > > > > > of current->utsname() is valid as the user space process is alive. > > > > > > > > Why pretend that the filesystem is owned by a particular namespace? It > > > > can, and will be shared among many containers... > > > > > > If the only purpose of this is to fill in the auth_unix cred then > > > shouldn't it be part of whatever cred structures are passed around? > > > > So how does tracking it in a shared structure like the rpc_client help? > > If you consider it to be part of the cred, then it needs to be tracked > > in the cred... > > Right, that's what I meant. > > It seems like overkill, though. Does anyone actually care whether these > names are right? That's certainly a tempting angle. However we may not "control" the server code -- couldn't there be some oddball (maybe even proprietary) NFS servers out there that users do care about interacting with? Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:07 ` Matt Helsley @ 2009-01-07 0:55 ` Eric W. Biederman 0 siblings, 0 replies; 42+ messages in thread From: Eric W. Biederman @ 2009-01-07 0:55 UTC (permalink / raw) To: Matt Helsley Cc: J. Bruce Fields, Trond Myklebust, Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater Matt Helsley <matthltc@us.ibm.com> writes: > On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote: >> On Tue, Jan 06, 2009 at 06:35:43PM -0500, Trond Myklebust wrote: >> > On Tue, 2009-01-06 at 18:32 -0500, J. Bruce Fields wrote: >> > > On Tue, Jan 06, 2009 at 06:15:34PM -0500, Trond Myklebust wrote: >> > > > On Tue, 2009-01-06 at 15:04 -0800, Eric W. Biederman wrote: >> > > > > That implies to me you want to capture the value at mount time, and to >> > > > > pass it in to the rpc_call creation, and only at very specific well >> > > > > defined points where we interact with user space should we examine >> > > > > current->utsname(). At which point there should be no question >> > > > > of current->utsname() is valid as the user space process is alive. >> > > > >> > > > Why pretend that the filesystem is owned by a particular namespace? It >> > > > can, and will be shared among many containers... Sounds right. Still like the owner of a file it can happen that some containers are more correct than others. Especially in the context of mount merging and the other sophisticated caching that happens in NFS this increasingly sounds like something that belongs in the cred as that is where it is used. >> > > If the only purpose of this is to fill in the auth_unix cred then >> > > shouldn't it be part of whatever cred structures are passed around? >> > >> > So how does tracking it in a shared structure like the rpc_client help? >> > If you consider it to be part of the cred, then it needs to be tracked >> > in the cred... >> >> Right, that's what I meant. >> >> It seems like overkill, though. Does anyone actually care whether these >> names are right? > > That's certainly a tempting angle. However we may not "control" the > server code -- couldn't there be some oddball (maybe even proprietary) > NFS servers out there that users do care about interacting with? Matt could you look at what it will take to do the right thing from the network namespace side of things as well? I believe it is going to require the same level of understanding of the interactions in the code to get there. For the network namespace we should cache it at mount or server startup and use it until we are done. In a network namespace context there are good reasons for that because talking to 10.0.0.1 on one network may not be the same machine as talking to 10.0.0.1 on another network. NFS reestablishes tcp connections if the connection to the server breaks doesn't it? Or is that left to user space? Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:53 ` J. Bruce Fields 2009-01-07 0:07 ` Matt Helsley @ 2009-01-07 0:20 ` Trond Myklebust 1 sibling, 0 replies; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 0:20 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:53 -0500, J. Bruce Fields wrote: > It seems like overkill, though. Does anyone actually care whether these > names are right? I very much doubt it. Particularly since the actual sockets used by the RPC client are always owned by the init namespace (they are established from a workqueue process), and so the source IP address will always be that of the init namespace. Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:35 ` Trond Myklebust ` (2 preceding siblings ...) 2009-01-06 23:53 ` J. Bruce Fields @ 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:26 ` J. Bruce Fields 3 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 0:20 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:35 -0500, Trond Myklebust wrote: > So how does tracking it in a shared structure like the rpc_client help? > If you consider it to be part of the cred, then it needs to be tracked > in the cred... OK. If people really want to track this, then you could add a reference to the current->nsproxy to the struct rpc_cred and struct auth_cred, and ensure that the unx_marshal and unx_match routines do the right thing w.r.t. that reference (if it exists). However such a patch had better be accompanied with a _really_ convincing argument for why containers need this... As for the NFSv4 clientid, I can't see how you would ever want to use anything other than the init->utsname(), since the requirement is only that the clientid string be unique and preserved across reboots. The server isn't allowed to interpret the contents of the clientid string. Ditto for the RPCSEC_GSS machine creds that are used to establish the clientid. Trond -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:20 ` Trond Myklebust @ 2009-01-07 0:26 ` J. Bruce Fields 2009-01-07 0:38 ` Trond Myklebust 0 siblings, 1 reply; 42+ messages in thread From: J. Bruce Fields @ 2009-01-07 0:26 UTC (permalink / raw) To: Trond Myklebust Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote: > As for the NFSv4 clientid, I can't see how you would ever want to use > anything other than the init->utsname(), since the requirement is only > that the clientid string be unique and preserved across reboots. The > server isn't allowed to interpret the contents of the clientid string. > Ditto for the RPCSEC_GSS machine creds that are used to establish the > clientid. If people eventually expect to be able to, say, migrate a container to another host while using an nfs mount as their storage, then they'd need the name to be that of the container, not of the host. Obviously we'd also need to ensure the container got its own nfsv4 client state, etc., etc., and it sounds like we're far from that. --b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:26 ` J. Bruce Fields @ 2009-01-07 0:38 ` Trond Myklebust 2009-01-07 1:44 ` J. Bruce Fields 0 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 0:38 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote: > On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote: > > As for the NFSv4 clientid, I can't see how you would ever want to use > > anything other than the init->utsname(), since the requirement is only > > that the clientid string be unique and preserved across reboots. The > > server isn't allowed to interpret the contents of the clientid string. > > Ditto for the RPCSEC_GSS machine creds that are used to establish the > > clientid. > > If people eventually expect to be able to, say, migrate a container to > another host while using an nfs mount as their storage, then they'd need > the name to be that of the container, not of the host. Why? > Obviously we'd also need to ensure the container got its own nfsv4 > client state, etc., etc., and it sounds like we're far from that. Again, why? Are you seriously proposing a plan to transport all NFS and locking state directly from one kernel to another? -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 0:38 ` Trond Myklebust @ 2009-01-07 1:44 ` J. Bruce Fields 2009-01-07 1:50 ` Trond Myklebust 2009-01-07 2:37 ` Eric W. Biederman 0 siblings, 2 replies; 42+ messages in thread From: J. Bruce Fields @ 2009-01-07 1:44 UTC (permalink / raw) To: Trond Myklebust Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, Jan 06, 2009 at 07:38:26PM -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 19:26 -0500, J. Bruce Fields wrote: > > On Tue, Jan 06, 2009 at 07:20:07PM -0500, Trond Myklebust wrote: > > > As for the NFSv4 clientid, I can't see how you would ever want to use > > > anything other than the init->utsname(), since the requirement is only > > > that the clientid string be unique and preserved across reboots. The > > > server isn't allowed to interpret the contents of the clientid string. > > > Ditto for the RPCSEC_GSS machine creds that are used to establish the > > > clientid. > > > > If people eventually expect to be able to, say, migrate a container to > > another host while using an nfs mount as their storage, then they'd need > > the name to be that of the container, not of the host. > > Why? > > > Obviously we'd also need to ensure the container got its own nfsv4 > > client state, etc., etc., and it sounds like we're far from that. > > Again, why? Are you seriously proposing a plan to transport all NFS and > locking state directly from one kernel to another? If people seem to think they can do live process and container migration: http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf then moving the NFS state strikes me as not the greatest of their troubles. I'm not volunteering! --b. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 1:44 ` J. Bruce Fields @ 2009-01-07 1:50 ` Trond Myklebust 2009-01-07 2:37 ` Eric W. Biederman 1 sibling, 0 replies; 42+ messages in thread From: Trond Myklebust @ 2009-01-07 1:50 UTC (permalink / raw) To: J. Bruce Fields Cc: Eric W. Biederman, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 20:44 -0500, J. Bruce Fields wrote: > > Again, why? Are you seriously proposing a plan to transport all NFS and > > locking state directly from one kernel to another? > > If people seem to think they can do live process and container > migration: > > http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf > > then moving the NFS state strikes me as not the greatest of their > troubles. ROTFL... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-07 1:44 ` J. Bruce Fields 2009-01-07 1:50 ` Trond Myklebust @ 2009-01-07 2:37 ` Eric W. Biederman 1 sibling, 0 replies; 42+ messages in thread From: Eric W. Biederman @ 2009-01-07 2:37 UTC (permalink / raw) To: J. Bruce Fields Cc: Trond Myklebust, Serge E. Hallyn, Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Chuck Lever, Linux Containers, Cedric Le Goater "J. Bruce Fields" <bfields@fieldses.org> writes: > If people seem to think they can do live process and container > migration: > > http://ols.fedoraproject.org/OLS/Reprints-2008/mirkin-reprint.pdf > > then moving the NFS state strikes me as not the greatest of their > troubles. > > I'm not volunteering! Reasonable. There are a lot easier cases than NFS, and migration is not the biggest use case. A lot of this work we can do incrementally and simply not support doing everything in a container. Which is where we are today. The whole hostname in NFS packets isn't something that makes much sense to me in the first place, so I can't comment on a good use case for that. Sorting out the details so we can get NFS working in multiple network namespaces makes a lot of sense to me. It allows things like VPNing into work isolating the VPN process in a network namespace, and still being able to use all of the kernel networking including NFS like normal sounds fun to me. There may also be some fun things you can do with testing having both an NFS server and the client on the same box and not go through the loopback device. Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 21:58 ` Serge E. Hallyn 2009-01-06 22:42 ` Trond Myklebust 2009-01-06 23:04 ` Eric W. Biederman @ 2009-01-06 23:30 ` Matt Helsley 2 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-06 23:30 UTC (permalink / raw) To: Serge E. Hallyn Cc: Trond Myklebust, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:58 -0600, Serge E. Hallyn wrote: > Quoting Trond Myklebust (trond.myklebust@fys.uio.no): > > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote: > > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > > We can often specify the UTS namespace to use when starting an RPC client. > > > > However sometimes no UTS namespace is available (specifically during system > > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > > back to the initial UTS namespace. > > > > > > So what happens if we take this patch and do nothing else? > > > > > > The only potential problem situation will be rpc requests > > > made on behalf of a container in which the last task has > > > exited, right? So let's say a container did an nfs mount > > > and then exits, causing an nfs umount request. > > > > > > That umount request will now be sent with the wrong nodename. > > > Does that actually cause problems, will the server use the > > > nodename to try and determine the client sending the request? > > > > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from > > userspace, not the kernel. The problem here is that because lazy mounts > > exist, the lifetime of the RPC client may be longer than that of the > > Right that was what i was referring to. > > > container. In addition, it may be shared among more than 1 container, > > because superblocks can be shared. > > Good point. And in that case what do we care about (even though > apparently we just might not care at all :) - who did the mount, > or who is using it? > > In fact one thing I noticed in Matt's patch 3 was that he copied > in the nodename verbatim, so a future hostname() by the container > wouldn't be reflected, again not sure if that would matter. I thought of this. I found the patches that added the nodename in the RPC client struct and the stale nodename was intentional. That's why I preserved the copy rather than called utsname() each time. Cheers, -Matt ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:44 ` Trond Myklebust 2009-01-06 21:58 ` Serge E. Hallyn @ 2009-01-06 23:18 ` Matt Helsley 2009-01-06 23:43 ` Trond Myklebust 1 sibling, 1 reply; 42+ messages in thread From: Matt Helsley @ 2009-01-06 23:18 UTC (permalink / raw) To: Trond Myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:44 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 14:02 -0600, Serge E. Hallyn wrote: > > Quoting Matt Helsley (matthltc@us.ibm.com): > > > We can often specify the UTS namespace to use when starting an RPC client. > > > However sometimes no UTS namespace is available (specifically during system > > > shutdown as the last NFS mount in a container is unmounted) so fall > > > back to the initial UTS namespace. > > > > So what happens if we take this patch and do nothing else? > > > > The only potential problem situation will be rpc requests > > made on behalf of a container in which the last task has > > exited, right? So let's say a container did an nfs mount > > and then exits, causing an nfs umount request. > > > > That umount request will now be sent with the wrong nodename. > > Does that actually cause problems, will the server use the > > nodename to try and determine the client sending the request? > > The NFSv2/v3 umount rpc call will be sent by the 'umount' program from > userspace, not the kernel. The problem here is that because lazy mounts Ahh, that's news to me. I thought userspace originated the umount but the kernel constructed and sent the corresponding RPC call. > exist, the lifetime of the RPC client may be longer than that of the > container. In addition, it may be shared among more than 1 container, > because superblocks can be shared. Right. > One thing you need to be aware of here is that inode dirty data > writebacks may be initiated by completely different processes than the > one that dirtied the inode. > IOW: Aside from being extremely ugly, approaches like [PATCH 4/4] which > rely on being able to determine the container-specific node name at RPC > generation time are therefore going to return incorrect values. Yes, I was aware that the inode might be dirtied by another container. I was thinking that, at least in the case of NFS, it makes sense to report the node name of the container that did the original mount. Of course this doesn't address the general RPC client case and, like patch 3, it makes the superblock solution rather NFS-specific. That brings me to a basic question: Are there any RPC clients in the kernel that do not operate on behalf of NFS? Thanks! Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:18 ` Matt Helsley @ 2009-01-06 23:43 ` Trond Myklebust 2009-01-06 23:58 ` Matt Helsley 0 siblings, 1 reply; 42+ messages in thread From: Trond Myklebust @ 2009-01-06 23:43 UTC (permalink / raw) To: Matt Helsley Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote: > Yes, I was aware that the inode might be dirtied by another container. I > was thinking that, at least in the case of NFS, it makes sense to report > the node name of the container that did the original mount. Of course > this doesn't address the general RPC client case and, like patch 3, it > makes the superblock solution rather NFS-specific. That brings me to a > basic question: Are there any RPC clients in the kernel that do not > operate on behalf of NFS? Yes. There is, for instance, the portmapper/rpcbind client. The NFS and lockd servers also sometime needs to send RPC callbacks. Assuming that you can convert the RPC client into something NFS-specific is therefore not an option. Trond ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 23:43 ` Trond Myklebust @ 2009-01-06 23:58 ` Matt Helsley 0 siblings, 0 replies; 42+ messages in thread From: Matt Helsley @ 2009-01-06 23:58 UTC (permalink / raw) To: Trond Myklebust Cc: Serge E. Hallyn, Linux Containers, linux-nfs, Linux Kernel Mailing List, J. Bruce Fields, Chuck Lever, Eric W. Biederman, Linux Containers, Cedric Le Goater On Tue, 2009-01-06 at 18:43 -0500, Trond Myklebust wrote: > On Tue, 2009-01-06 at 15:18 -0800, Matt Helsley wrote: > > Yes, I was aware that the inode might be dirtied by another container. I > > was thinking that, at least in the case of NFS, it makes sense to report > > the node name of the container that did the original mount. Of course > > this doesn't address the general RPC client case and, like patch 3, it > > makes the superblock solution rather NFS-specific. That brings me to a > > basic question: Are there any RPC clients in the kernel that do not > > operate on behalf of NFS? > > Yes. There is, for instance, the portmapper/rpcbind client. <snip> OK that's a good example. Point taken. > The NFS and > lockd servers also sometime needs to send RPC callbacks. Yes, these won't have an NFS superblock handy and hence they use the node name cached in the RPC client struct. That means that these patches have only addressed NFS client-side namespace issues. Have you seen J. Bruce Fields post on the nfsd side of things? Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 20:02 ` Serge E. Hallyn 2009-01-06 20:20 ` J. Bruce Fields 2009-01-06 20:44 ` Trond Myklebust @ 2009-01-06 22:29 ` Chuck Lever 2009-01-07 0:01 ` Serge E. Hallyn 2 siblings, 1 reply; 42+ messages in thread From: Chuck Lever @ 2009-01-06 22:29 UTC (permalink / raw) To: Serge E. Hallyn Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, J. Bruce Fields, Eric W. Biederman, Linux Containers, Cedric Le Goater On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote: > Quoting Matt Helsley (matthltc@us.ibm.com): >> We can often specify the UTS namespace to use when starting an RPC >> client. >> However sometimes no UTS namespace is available (specifically >> during system >> shutdown as the last NFS mount in a container is unmounted) so fall >> back to the initial UTS namespace. > > So what happens if we take this patch and do nothing else? I thought the point of this was to prevent incorrect container nodenames from leaking onto the network. > The only potential problem situation will be rpc requests > made on behalf of a container in which the last task has > exited, right? So let's say a container did an nfs mount > and then exits, causing an nfs umount request. > > That umount request will now be sent with the wrong nodename. > Does that actually cause problems, will the server use the > nodename to try and determine the client sending the request? > > thanks, > -serge > >> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> >> Cc: Cedric Le Goater <clg@fr.ibm.com> >> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> >> Cc: linux-nfs@vger.kernel.org >> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> >> Cc: Chuck Lever <chuck.lever@oracle.com> >> Cc: Eric W. Biederman <ebiederm@xmission.com> >> Cc: Linux Containers <containers@lists.osdl.org> >> >> --- >> net/sunrpc/clnt.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> Index: linux-2.6.28/net/sunrpc/clnt.c >> =================================================================== >> --- linux-2.6.28.orig/net/sunrpc/clnt.c >> +++ linux-2.6.28/net/sunrpc/clnt.c >> @@ -128,6 +128,7 @@ static struct rpc_clnt * rpc_new_client( >> struct rpc_version *version; >> struct rpc_clnt *clnt = NULL; >> struct rpc_auth *auth; >> + struct new_utsname *uts_ns = init_utsname(); >> int err; >> size_t len; >> >> @@ -213,10 +214,12 @@ static struct rpc_clnt * rpc_new_client( >> } >> >> /* save the nodename */ >> - clnt->cl_nodelen = strlen(init_utsname()->nodename); >> + if (current->nsproxy != NULL) >> + uts_ns = utsname(); >> + clnt->cl_nodelen = strlen(uts_ns->nodename); >> if (clnt->cl_nodelen > UNX_MAXNODENAME) >> clnt->cl_nodelen = UNX_MAXNODENAME; >> - memcpy(clnt->cl_nodename, init_utsname()->nodename, clnt- >> >cl_nodelen); >> + memcpy(clnt->cl_nodename, uts_ns->nodename, clnt->cl_nodelen); >> rpc_register_client(clnt); >> return clnt; >> >> >> -- >> _______________________________________________ >> Containers mailing list >> Containers@lists.linux-foundation.org >> https://lists.linux-foundation.org/mailman/listinfo/containers -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [RFC][PATCH 2/4] sunrpc: Use utsnamespaces 2009-01-06 22:29 ` Chuck Lever @ 2009-01-07 0:01 ` Serge E. Hallyn 0 siblings, 0 replies; 42+ messages in thread From: Serge E. Hallyn @ 2009-01-07 0:01 UTC (permalink / raw) To: Chuck Lever Cc: Matt Helsley, Linux Containers, linux-nfs, Linux Kernel Mailing List, Trond Myklebust, J. Bruce Fields, Eric W. Biederman, Linux Containers, Cedric Le Goater Quoting Chuck Lever (chuck.lever@oracle.com): > On Jan 6, 2009, at Jan 6, 2009, 3:02 PM, Serge E. Hallyn wrote: >> Quoting Matt Helsley (matthltc@us.ibm.com): >>> We can often specify the UTS namespace to use when starting an RPC >>> client. >>> However sometimes no UTS namespace is available (specifically during >>> system >>> shutdown as the last NFS mount in a container is unmounted) so fall >>> back to the initial UTS namespace. >> >> So what happens if we take this patch and do nothing else? > > I thought the point of this was to prevent incorrect container nodenames > from leaking onto the network. But define incorrect. If container B does an nfs mount, container c is launched with a tree in that mount, container B dies, and container C umounts it. Should the umount belong to container B (for having mounted it), container C (for having umount it), or the init_utsname (for being the physical host/kernel)? I get the feeling that consensus on this thread is that init_utsname is actually the best choice, but OTOH if I have 3 containers on my host, for apache, mysql, and postfix servers, and each is doing nfs mounts from a physically remote machine, maybe I care about having them report separate nodenames? (that's a question, I really don't know...) -serge ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2009-01-07 3:45 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-01-07 0:39 [RFC][PATCH 2/4] sunrpc: Use utsnamespaces trond.myklebust 2009-01-07 0:57 ` Matt Helsley 2009-01-07 1:02 ` Trond Myklebust 2009-01-07 1:22 ` Matt Helsley -- strict thread matches above, loose matches on Subject: below -- 2009-01-06 1:13 [RFC][PATCH 0/4] utsns: RPC/NFS bug rework Matt Helsley 2009-01-06 1:13 ` [RFC][PATCH 2/4] sunrpc: Use utsnamespaces Matt Helsley 2009-01-06 20:02 ` Serge E. Hallyn 2009-01-06 20:20 ` J. Bruce Fields 2009-01-06 21:53 ` Serge E. Hallyn 2009-01-06 23:35 ` Matt Helsley 2009-01-06 22:43 ` Matt Helsley 2009-01-06 20:44 ` Trond Myklebust 2009-01-06 21:58 ` Serge E. Hallyn 2009-01-06 22:42 ` Trond Myklebust 2009-01-07 0:08 ` Matt Helsley 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:43 ` Matt Helsley 2009-01-07 1:10 ` Trond Myklebust 2009-01-07 0:20 ` J. Bruce Fields 2009-01-07 0:23 ` Trond Myklebust 2009-01-07 3:44 ` Matt Helsley 2009-01-06 23:04 ` Eric W. Biederman 2009-01-06 23:15 ` Trond Myklebust 2009-01-06 23:32 ` J. Bruce Fields 2009-01-06 23:35 ` Trond Myklebust 2009-01-06 23:48 ` Matt Helsley 2009-01-06 23:51 ` Chuck Lever 2009-01-06 23:53 ` J. Bruce Fields 2009-01-07 0:07 ` Matt Helsley 2009-01-07 0:55 ` Eric W. Biederman 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:20 ` Trond Myklebust 2009-01-07 0:26 ` J. Bruce Fields 2009-01-07 0:38 ` Trond Myklebust 2009-01-07 1:44 ` J. Bruce Fields 2009-01-07 1:50 ` Trond Myklebust 2009-01-07 2:37 ` Eric W. Biederman 2009-01-06 23:30 ` Matt Helsley 2009-01-06 23:18 ` Matt Helsley 2009-01-06 23:43 ` Trond Myklebust 2009-01-06 23:58 ` Matt Helsley 2009-01-06 22:29 ` Chuck Lever 2009-01-07 0:01 ` Serge E. Hallyn
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox