From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>,
    "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
    Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
    Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
    Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
    cgroups mailinglist <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
    lkml <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field
Date: Fri, 15 Apr 2016 11:02:51 -0500
Message-ID: <20160415160251.GA32508@mail.hallyn.com>
In-Reply-To: <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Quoting Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
> On Thu, Apr 14, 2016 at 8:27 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> >>
> >> > This is so that userspace can distinguish a mount made in a cgroup
> >> > namespace from a bind mount from a cgroup subdirectory.
> >>
> >> To do that do you need to print the path, or is an extra option that
> >> reveals nothing except that it was a cgroup mount sufficient?
> >>
> >> Is there any practical difference between a mount in a namespace and a
> >> bind mount?
> >>
> >> Given the way the conversation has been going I think it would be good
> >> to see the answers to these questions. Perhaps I missed it but I
> >> haven't seen the answers to those questions.
> >
> > Yup, I tried to answer those in my last email, let me try again.
> >
> > Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts
> > freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1,
> > and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In
> > that container, I start another container x1, not using cgroup namespaces.
> > It also wants a cgroup mount, and a common way to handle that (to prevent
> > the container from rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup,
> > create /sys/fs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from
> > the parent container onto /sys/fs/cgroup/lxc/x1 in the child container.
> > Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1,
> > with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task
> > in that container will show '/lxc/x1'. Unless it has been moved into
> > /lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)...
> > Every time I've thought "maybe we can just..." I've found a case where it
> > wouldn't work.
> >
> > At first in lxc we simply said if /proc/self/ns/cgroup exists assume that
> > the cgroupfs mounts are not bind mounts. However, old userspace (and
> > container drivers) on new kernels is certainly possible, especially an
> > older distro in a container on a newer distro on the host. That completely
> > breaks with this approach.
> >
>
> My main concern regarding making this a new kernel API is that it's too
> generic and exposes information about all system cgroups to every
> process on the system, not just the container or the process inside it
> that needs it. Not all containers need this information and not all
> processes running inside the container need this. I haven't put too
> much thought into it, but it seems you will still need to update the
> container userspace to read this extra mount option. So seems like a
> simpler approach where the host "cgroup manager" provides this
> information to specific container cgroup manager via other user-space
> channels (a config file, command-line args, environment vars, proper
> container mounts, etc.) may also work, right?

No, because existing legacy userspace would need to be taught about
these new channels.

I'm testing a new patch which simply fixes the root dentry field in
mountinfo, which should also serve to fix this problem without adding
the nsroot= option field.
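
To make the ambiguity in the nested-container example above concrete, here
is a rough Python sketch of what userspace sees. It is only an illustration,
not the patch under test. The sample lines are made up to match the /lxc/x1
scenario (mount IDs, device numbers and options are assumptions), and the
helper names are hypothetical. "Field three" in the example is the root
field, which is index 3 when counting from zero in a mountinfo line.

```python
# Hypothetical mountinfo line for the bind mount in the nested container.
SAMPLE_MOUNTINFO = (
    "100 90 0:25 /lxc/x1/lxc/x1 /sys/fs/cgroup/lxc/x1 rw,relatime - "
    "cgroup cgroup rw,freezer\n"
)

# Hypothetical /proc/self/cgroup line for a task in that container.
SAMPLE_PROC_CGROUP = "5:freezer:/lxc/x1\n"


def parse_mountinfo_line(line):
    """Return the root, mount point and fs type from one mountinfo line."""
    # Fields before " - " are positional; optional fields may follow the
    # mount options, so only the first five positions are relied upon.
    left, _, right = line.rstrip("\n").partition(" - ")
    pre = left.split(" ")
    post = right.split(" ")
    return {"root": pre[3], "mount_point": pre[4], "fstype": post[0]}


def cgroup_path(proc_cgroup_text, controller):
    """Return the cgroup path listed for a controller, or None."""
    for line in proc_cgroup_text.splitlines():
        _hierarchy_id, controllers, path = line.split(":", 2)
        if controller in controllers.split(","):
            return path
    return None


def root_matches_task(mount, task_path):
    """True if the mount's root field equals the task's cgroup path."""
    return mount["root"] == task_path


if __name__ == "__main__":
    mount = parse_mountinfo_line(SAMPLE_MOUNTINFO)
    task_path = cgroup_path(SAMPLE_PROC_CGROUP, "freezer")
    print(mount["root"], mount["mount_point"], task_path)
    print("root matches task path:", root_matches_task(mount, task_path))
```

With these sample values the root field and the task's cgroup path disagree,
which is the case where userspace cannot tell from the two files alone how
the mount relates to its own cgroup.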