From: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: James Bottomley
<James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>,
David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
trondmy-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org,
Linux Containers
<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
Subject: Re: [RFC][PATCH 0/9] Make containers kernel objects
Date: Mon, 22 May 2017 14:34:52 -0400
Message-ID: <1495478092.2816.17.camel@redhat.com>
In-Reply-To: <1495472039.2757.19.camel-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org>

On Mon, 2017-05-22 at 09:53 -0700, James Bottomley wrote:
> [Added missing cc to containers list]
> On Mon, 2017-05-22 at 17:22 +0100, David Howells wrote:
> > Here are a set of patches to define a container object for the kernel
> > and to provide some methods to create and manipulate them.
> >
> > The reason I think this is necessary is that the kernel has no idea
> > how to direct upcalls to what userspace considers to be a container -
> > current Linux practice appears to make a "container" just an
> > arbitrarily chosen junction of namespaces, control groups and files,
> > which may be changed individually within the "container".
>
> This sounds like a step in the wrong direction: the strength of the
> current container interfaces in Linux is that people who set up
> containers don't have to agree what they look like. So I can set up a
> user namespace without a mount namespace or an architecture emulation
> container with only a mount namespace.
>
Does this really mandate what they look like though? AFAICT, you can
still spawn disconnected namespaces to your heart's content. What this
does is provide a container for several different namespaces so that the
kernel can actually be aware of the association between them. The way
you populate the different namespaces looks to be pretty flexible.

> But ignoring my fun foibles with containers and to give a concrete
> example in terms of a popular orchestration system: in kubernetes,
> where certain namespaces are shared across pods, do you imagine the
> kernel's view of the "container" to be the pod or what kubernetes
> thinks of as the container? This is important, because half the
> examples you give below are network related and usually pods share a
> network namespace.
>
> > The kernel upcall mechanism then needs to decide in which set of
> > namespaces, etc., it must exec the appropriate upcall program.
> > Examples of this include:
> >
> > (1) The DNS resolver. The DNS cache in the kernel should probably
> > be per-network namespace, but in userspace the program, its
> > libraries and its config data are associated with a mount tree and a
> > user namespace and it gets run in a particular pid namespace.
>
> All persistent data (written to the fs) has to be mount ns associated;
> there are no ifs, ands and buts to that. I agree this implies that if
> you want to run a separate network namespace, you either take DNS from
> the parent (a lot of containers do) or you set up a daemon to run
> within the mount namespace. I agree the latter is a slightly fiddly
> operation you have to get right, but that's why we have orchestration
> systems.
>
> What is it we could do with the above that we cannot do today?
>
Spawn a task directly from the kernel, already set up in the correct
namespaces, à la call_usermodehelper. There is currently no way to do
that, and it is something we'd very much like to have. Ian Kent has made
several passes at it recently.

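For comparison, this is roughly what a userspace helper has to do today to approximate that. It runs in the initial namespaces, picks some process known to live in the target "container", and joins that process's namespaces with setns(2) before exec'ing the real program. This is only a sketch. It assumes Python 3.12+ (for os.setns), sufficient privilege, and that the caller already knows a PID inside the container, which is exactly the association the kernel has no way to express. The function and file names are made up for illustration.

```python
import os
import sys

# Namespaces to join, in order. This choice is an assumption for the sketch:
# knfsd-style upcalls care mostly about the network and mount namespaces.
NS_KINDS = ("net", "mnt")


def ns_paths(pid, kinds=NS_KINDS):
    """Return the /proc paths of the given namespaces of process `pid`."""
    return [f"/proc/{pid}/ns/{kind}" for kind in kinds]


def join_namespaces(pid, kinds=NS_KINDS):
    """Join pid's namespaces via setns(2).

    Needs CAP_SYS_ADMIN, plus CAP_SYS_CHROOT for the mount namespace.
    """
    # Open every namespace file before switching anything: once we are in
    # the target mount namespace, /proc may be a different instance.
    fds = [os.open(path, os.O_RDONLY) for path in ns_paths(pid, kinds)]
    try:
        for fd in fds:
            os.setns(fd, 0)  # nstype 0: accept whatever namespace fd refers to
    finally:
        for fd in fds:
            os.close(fd)


def run_helper_in(pid, argv):
    """Fork, join pid's namespaces in the child, then exec argv there."""
    child = os.fork()
    if child == 0:
        try:
            join_namespaces(pid)
            os.execvp(argv[0], argv)
        except OSError as exc:
            print(f"helper setup failed: {exc}", file=sys.stderr)
        os._exit(127)
    _, status = os.waitpid(child, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage (hypothetical script name): python3 nsrun.py <pid> <program> [args...]
    sys.exit(run_helper_in(int(sys.argv[1]), sys.argv[2:]))
```

Nothing ties the chosen PID to the knfsd instance making the upcall. If that PID is reused after the process exits, or if the process calls setns() itself, the helper silently ends up in the wrong namespaces.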
> > (2) NFS ID mapper. The NFS ID mapping cache should also probably be
> > per-network namespace.
>
> I think this is a view but not the only one: Right at the moment, NFS
> ID mapping is used as one of the ways we can get the user namespace
> ID mapping writes to file problems fixed ... that makes it a property
> of the mount namespace for a lot of containers. There are many other
> instances where they do exactly as you say, but what I'm saying is that
> we don't want to lose the flexibility we currently have.
>
> > (3) nfsdcltrack. A way for NFSD to access stable storage for
> > tracking of persistent state. Again, network-namespace dependent,
> > but also perhaps mount-namespace dependent.
Definitely mount-namespace dependent.
>
> So again, given we can set this up to work today, this sounds like more
> a restriction that will bite us than an enhancement that gives us extra
> features.
>
How do you set this up to work today?

AFAIK, if you want to run knfsd in a container today, you're out of luck
for any non-trivial configuration. The main reason is that most of knfsd
is namespace-ized in the network namespace, but there is no clear way to
associate that with a mount namespace, which is what we need to do this
properly inside a container. I think David's patches would get us there.

> > (4) General request-key upcalls. Not particularly namespace
> > dependent, apart from keyrings being somewhat governed by the user
> > namespace and the upcall being configured by the mount namespace.
>
> All mount namespaces have an owning user namespace, so the data
> relations are already there in the kernel. Is the problem simply
> finding them?
>
> > These patches are built on top of the mount context patchset so that
> > namespaces can be properly propagated over submounts/automounts.
>
> I'll stop here ... you get the idea that I think this is imposing a set
> of restrictions that will come back to bite us later. If this is just
> for the sake of figuring out how to get keyring upcalls to work, then
> I'm sure we can come up with something.
>
--
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>