* nfsd and containers
From: J. Bruce Fields @ 2009-01-04 2:54 UTC
To: containers-qjLDD68F18O7TbgM5vRIOg
Does anyone have any ideas about how the kernel's nfsd should interact
(if at all) with network namespaces?
I'm initially interested because I've been experimenting with modifying
the server to allow it to present different exported filesystems
depending on which ip address it's accessed through. One way to do that
might be by modifying the kernel to behave as though there's a separate
nfsd service per network namespace; then we'd need little or no
modification of the userspace support daemons (statd, the portmapper,
etc.)--just start multiple instances of them in separate network
namespaces and teach the kernel to route requests to them to the
corresponding loopback interface. (That would work at least for daemons
that communicate with the kernel exclusively using rpc over loopback.
We could perhaps do something similar with the various /proc and nfsctl
interfaces.)
I'm also curious more generally whether anyone's thought about how nfsd
should behave in the presence of containers.
(Also, I take it the sysfs problem described in
http://lwn.net/Articles/295587/ is still unsolved?)
--b.
* Re: nfsd and containers
From: Serge E. Hallyn @ 2009-01-05 16:40 UTC
To: J. Bruce Fields; +Cc: containers-qjLDD68F18O7TbgM5vRIOg
Quoting J. Bruce Fields (bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org):
> Does anyone have any ideas about how the kernel's nfsd should interact
> (if at all) with network namespaces?
>
> I'm initially interested because I've been experimenting with modifying
> the server to allow it to present different exported filesystems
> depending on which ip address it's accessed through. One way to do that
> might be by modifying the kernel to behave as though there's a separate
> nfsd service per network namespace; then we'd need little or no
> modification of the userspace support daemons (statd, the portmapper,
> etc.)--just start multiple instances of them in separate network
> namespaces and teach the kernel to route requests to them to the
> corresponding loopback interface. (That would work at least for daemons
> that communicate with the kernel exclusively using rpc over loopback.
> We could perhaps do something similar with the various /proc and nfsctl
> interfaces.)
>
> I'm also curious more generally whether anyone's thought about how nfsd
> should behave in the presence of containers.
I suspect Eric has had more detailed thoughts than I so I'm waiting
to see his response. Matt sent a patchset to deal with sunrpc/nfs/uts
namespaces which I haven't yet had a chance to look at, so he might
also have some good comments at this point.
> (Also, I take it the sysfs problem described in
> http://lwn.net/Articles/295587/ is still unsolved?)
Sysfs tagging is not yet implemented, but 2.6.29 is supposed to
have the netdev hack-fix which simply doesn't show /sys/class/net
entries for tasks which are not in the initial network namespace.
That allows network namespaces to be used.
Checking gitweb real quick, yes, CONFIG_NET_NS is an option in
Linus' current git tree.
-serge
* Re: nfsd and containers
From: Matt Helsley @ 2009-01-05 22:55 UTC
To: Serge E. Hallyn; +Cc: J. Bruce Fields, containers-qjLDD68F18O7TbgM5vRIOg
On Mon, 2009-01-05 at 10:40 -0600, Serge E. Hallyn wrote:
> Quoting J. Bruce Fields (bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org):
> > Does anyone have any ideas about how the kernel's nfsd should interact
> > (if at all) with network namespaces?
> >
> > I'm initially interested because I've been experimenting with modifying
> > the server to allow it to present different exported filesystems
> > depending on which ip address it's accessed through. One way to do that
> > might be by modifying the kernel to behave as though there's a separate
> > nfsd service per network namespace; then we'd need little or no
> > modification of the userspace support daemons (statd, the portmapper,
> > etc.)--just start multiple instances of them in separate network
> > namespaces and teach the kernel to route requests to them to the
> > corresponding loopback interface. (That would work at least for daemons
> > that communicate with the kernel exclusively using rpc over loopback.
> > We could perhaps do something similar with the various /proc and nfsctl
> > interfaces.)
This sounds good. It is somewhat related to UTS namespaces because the
hostname reported from the UTS namespace and the DNS name might not
match. I haven't thoroughly explored all the combinations but I suspect
the use of network namespaces could play a part in that depending on
what choices the administrator(s) make.
> > I'm also curious more generally whether anyone's thought about how nfsd
> > should behave in the presence of containers.
I have only thought about how nfsd should see clients in different UTS
and mount namespaces. The conclusion I came to was NFS should use
whatever name was used with the original mount. So if we mounted an NFS
export and then create a container that uses that mount then it should
use the hostname of the original container. However if the child
container then does another NFS mount then the child's hostname ought to
be used for the new mount.
> I suspect Eric has had more detailed thoughts than I so I'm waiting
> to see his response. Matt sent a patchset to deal with sunrpc/nfs/uts
> namespaces which I haven't yet had a chance to look at, so he might
> also have some good comments at this point.
Seems there's some confusion -- that patchset went out privately during
the holidays. Now is a good time to repost it publicly though. I'll cc
folks on this thread. I'm also planning on cc'ing nfs-devel to get their
thoughts.
Cheers,
-Matt Helsley
* Re: nfsd and containers
From: J. Bruce Fields @ 2009-01-06 15:41 UTC
To: Matt Helsley; +Cc: containers-qjLDD68F18O7TbgM5vRIOg
On Mon, Jan 05, 2009 at 02:55:18PM -0800, Matt Helsley wrote:
> On Mon, 2009-01-05 at 10:40 -0600, Serge E. Hallyn wrote:
> > Quoting J. Bruce Fields (bfields-uC3wQj2KruNg9hUCZPvPmw@public.gmane.org):
> > > Does anyone have any ideas about how the kernel's nfsd should interact
> > > (if at all) with network namespaces?
> > >
> > > I'm initially interested because I've been experimenting with modifying
> > > the server to allow it to present different exported filesystems
> > > depending on which ip address it's accessed through. One way to do that
> > > might be by modifying the kernel to behave as though there's a separate
> > > nfsd service per network namespace; then we'd need little or no
> > > modification of the userspace support daemons (statd, the portmapper,
> > > etc.)--just start multiple instances of them in separate network
> > > namespaces and teach the kernel to route requests to them to the
> > > corresponding loopback interface. (That would work at least for daemons
> > > that communicate with the kernel exclusively using rpc over loopback.
> > > We could perhaps do something similar with the various /proc and nfsctl
> > > interfaces.)
>
> This sounds good. It is somewhat related to UTS namespaces because the
> hostname reported from the UTS namespace and the DNS name might not
> match. I haven't thoroughly explored all the combinations but I suspect
> the use of network namespaces could play a part in that depending on
> what choices the administrator(s) make.
>
> > > I'm also curious more generally whether anyone's thought about how nfsd
> > > should behave in the presence of containers.
>
> I have only thought about how nfsd should see clients in different UTS
> and mount namespaces. The conclusion I came to was NFS should use
> whatever name was used with the original mount. So if we mounted an NFS
> export and then create a container that uses that mount then it should
> use the hostname of the original container. However if the child
> container then does another NFS mount then the child's hostname ought to
> be used for the new mount.
I'm interested in what needs to be done on the server side rather than
on the client side. The server is perhaps less of a natural fit for the
containers project since, unlike the rest of the kernel, it doesn't
exist to perform services on behalf of local processes--it's more like a
full-blown application that happens to run inside the kernel.
Still, it might make sense to use network namespaces to implement
something like Apache's ip-based virtual hosts, by presenting the
processes in each network namespace with the illusion that they control
their own private nfs server.
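As a toy illustration of that analogy (not kernel code, and not how nfsd is implemented), the idea amounts to keying the export table on the server-side address a request arrived on. The table name, the function name, the paths, and the documentation-range addresses below are all hypothetical:

```python
# Toy model only: each server-side address gets its own export table, much as
# an IP-based virtual host gets its own document root. All values are made up.
EXPORTS_BY_LOCAL_ADDR = {
    "192.0.2.10": ["/srv/projects"],
    "192.0.2.11": ["/srv/scratch", "/srv/home"],
}


def exports_for(local_addr: str) -> list[str]:
    """Return the exports visible to requests that arrived on local_addr."""
    # An address with no table of its own sees no exports at all.
    return EXPORTS_BY_LOCAL_ADDR.get(local_addr, [])
```

With one nfsd instance per network namespace, each namespace would in effect own one row of such a table.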
On the other hand, requiring administrators to set up multiple network
namespaces just to configure this kind of nfs service may be cumbersome.
--b.
* nfsd and containers
From: Kjetil Jørgensen @ 2016-02-06 0:19 UTC
To: linux-nfs
Hi,
Trying to fit everything into the same mold, we're trying to run the
in-kernel NFS server inside Docker containers. It works great, with
one exception: an unclean shutdown of the container leaves behind the
knfsd threads, which hold on to references to, e.g., the mount
namespace that the exported filesystem lives in.
We've done some patching of Docker, so we use Ceph RBD devices
mounted into the container and a veth pair for networking. The
"init" process in the container's pid-namespace has a notion of
graceful shutdown, in which it echoes 0 into /proc/fs/nfsd/threads.
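A minimal sketch of that graceful-shutdown step, written in Python so it can be exercised against a scratch file. The function name `stop_nfsd_threads` and its `threads_path` parameter are illustrative assumptions; only the write of 0 to /proc/fs/nfsd/threads comes from the description above. On a real system it would need root and a mounted nfsd filesystem.

```python
from pathlib import Path

# Control file named in the message above.
NFSD_THREADS = Path("/proc/fs/nfsd/threads")


def stop_nfsd_threads(threads_path: Path = NFSD_THREADS) -> None:
    """Ask knfsd to stop its threads by writing 0 to the control file.

    Intended as the equivalent of `echo 0 > /proc/fs/nfsd/threads`.
    """
    threads_path.write_text("0\n")
```

An init process would call this from its shutdown handler, which is exactly the step that never runs when init is killed by a signal it cannot trap.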
In the case where the container's init process gets an untrappable
signal, the kernel threads, not really being part of the
pid-namespace, are left behind. The knfsd threads hold on to
references to the mount-namespace, which leaves the filesystem mounted.
Yes, we can signal the kernel nfsd threads from the outside, which
does let them terminate, but it's not ideal.
A simple-ish test case: unshare -n -p -m -f --mount-proc -- /usr/sbin/rpc.nfsd
Wishful thinking: the kernel nfsd threads that were spawned by
rpc.nfsd go away with the pid-namespace
Actual outcome: the kernel nfsd threads stick around until signalled
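One way to check for leftover threads after the test case exits is to scan /proc for tasks whose command name is "nfsd". This is only a sketch: the function name `find_nfsd_threads` and the `proc_root` parameter are assumptions made for illustration, a userspace process that happens to be named "nfsd" would match too, and it presumably has to run against the host's /proc rather than one mounted inside the container's pid-namespace. It lists the threads but does not say which namespaces they hold references to.

```python
from pathlib import Path


def find_nfsd_threads(proc_root: Path = Path("/proc")) -> list[int]:
    """Return the PIDs of tasks whose command name is exactly "nfsd"."""
    pids = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/fs
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the task exited while we were scanning
        if comm == "nfsd":
            pids.append(int(entry.name))
    return sorted(pids)


if __name__ == "__main__":
    print(find_nfsd_threads())
```

A non-empty result after the unshare'd process tree has gone would match the "actual outcome" described above.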
The actual question(s):
- Am I missing something?
- Is this folly that should be abandoned post haste? (In essence, go
find a userspace NFS implementation.)
- In the case where this is folly but we still decide to plow ahead,
is there any way I can determine which namespaces the kernel nfsd
threads hold references to? (That would make signalling the "right"
nfsd threads easier, as I lost my reference to the correct
/proc/fs/nfsd along with the process I had in the corresponding
pid-namespace.)
Cheers,
--
Kjetil Joergensen <kjetil@medallia.com>
Medallia Inc