From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
Linux Containers
<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
lkml <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field
Date: Thu, 14 Apr 2016 10:27:47 -0500 [thread overview]
Message-ID: <20160414152747.GA12700@mail.hallyn.com> (raw)
In-Reply-To: <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>
> > This is so that userspace can distinguish a mount made in a cgroup
> > namespace from a bind mount from a cgroup subdirectory.
>
> To do that do you need to print the path, or is an extra option that
> reveals nothing except that it was a cgroup mount sufficient?
>
> Is there any practical difference between a mount in a namespace and a
> bind mount?
>
> Given the way the conversation has been going I think it would be good
> to see the answers to these questions. Perhaps I missed it but I
> haven't seen the answers to those questions.
Yup, I tried to answer those in my last email, let me try again.
Let's say I start a container using cgroup namespaces, /lxc/x1. It mounts
freezer at /sys/fs/cgroup so it has field three of mountinfo as /lxc/x1,
and /sys/fs/cgroup/ is the path to the container's cgroup (/lxc/x1). In
that container, I start another container x1, not using cgroup namespaces.
It also wants a cgroup mount, and a common way to handle that (to prevent
container rewriting its limits) is to mount a tmpfs at /sys/fs/cgroup,
create /sysfs/cgroup/lxc/x1, and bind mount /sys/fs/cgroup/lxc/x1 from
the parent container onto /sys/fs/cgroup/lxc/x1 in the child container.
Now for that bind mount, the mountinfo field 3 will show /lxc/x1/lxc/x1,
with mount target /sys/fs/cgroup/lxc/x1, while /proc/self/cgroup for a task
in that container will show '/lxc/x1'. Unless it has been moved into
/lxc/x1/lxc/x1 in the container (/lxc/x1/lxc/x1/lxc/x1 on the host)...
Every time I've thought "maybe we can just..." I've found a case where it
wouldn't work.
At first in lxc we simply said if /proc/self/ns/cgroup exists assume that
the cgroupfs mounts are not bind mounts. However, old userspace (and
container drivers) on new kernels is certainly possible, especially an
older distro in a container on a newer distro on the host. That completely
breaks with this approach.
I also personally think there *is* value in letting a task know its
place on the system, so hiding the full cgroup path is imo not only not
a valid goal, it's counter-productive. Part of making for better
virtualization is to give userspace all the info it needs about its
current limits. Consider that with the unified hierarchy, you cannot
have tasks in a cgroup that also has child cgroups - except for the
root. Cgroup namespaces do not make an exception for this, so knowing
that you are not in the absolute cgroup root actually can prevent you
from trying something that cannot work. Or, I suppose, at least
understanding why you're unable to do what you're trying to do (namely
your container manager messed up). I point this out because finding
a way to only show the namespaced root in field 3 of mountinfo would
fix the base problem, but at the cost of hiding useful information
from a container.
> Eric
>
>
> >
> > Signed-off-by: Serge Hallyn <serge.hallyn-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
> > ---
> > Changelog: 2016-04-13: pass kernfs_node rather than dentry to show_options
> > ---
> > fs/kernfs/mount.c | 2 +-
> > include/linux/kernfs.h | 3 ++-
> > kernel/cgroup.c | 28 +++++++++++++++++++++++++++-
> > 3 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > index f73541f..58e8a86 100644
> > --- a/fs/kernfs/mount.c
> > +++ b/fs/kernfs/mount.c
> > @@ -36,7 +36,7 @@ static int kernfs_sop_show_options(struct seq_file *sf, struct dentry *dentry)
> > struct kernfs_syscall_ops *scops = root->syscall_ops;
> >
> > if (scops && scops->show_options)
> > - return scops->show_options(sf, root);
> > + return scops->show_options(sf, dentry->d_fsdata, root);
> > return 0;
> > }
> >
> > diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
> > index c06c442..72b4081 100644
> > --- a/include/linux/kernfs.h
> > +++ b/include/linux/kernfs.h
> > @@ -145,7 +145,8 @@ struct kernfs_node {
> > */
> > struct kernfs_syscall_ops {
> > int (*remount_fs)(struct kernfs_root *root, int *flags, char *data);
> > - int (*show_options)(struct seq_file *sf, struct kernfs_root *root);
> > + int (*show_options)(struct seq_file *sf, struct kernfs_node *kn,
> > + struct kernfs_root *root);
> >
> > int (*mkdir)(struct kernfs_node *parent, const char *name,
> > umode_t mode);
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index 671dc05..4d26d07 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -1593,7 +1593,31 @@ static int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
> > return 0;
> > }
> >
> > -static int cgroup_show_options(struct seq_file *seq,
> > +static void cgroup_show_nsroot(struct seq_file *seq, struct kernfs_node *knode,
> > + struct kernfs_root *kroot)
> > +{
> > + char *nsroot;
> > + int len, ret;
> > +
> > + if (!kroot)
> > + return;
> > + len = kernfs_path_from_node(knode, kroot->kn, NULL, 0);
> > + if (len <= 0)
> > + return;
> > + nsroot = kzalloc(len + 1, GFP_ATOMIC);
> > + if (!nsroot)
> > + return;
> > + ret = kernfs_path_from_node(knode, kroot->kn, nsroot, len + 1);
> > + if (ret <= 0 || ret > len)
> > + goto out;
> > +
> > + seq_show_option(seq, "nsroot", nsroot);
> > +
> > +out:
> > + kfree(nsroot);
> > +}
> > +
> > +static int cgroup_show_options(struct seq_file *seq, struct kernfs_node *kn,
> > struct kernfs_root *kf_root)
> > {
> > struct cgroup_root *root = cgroup_root_from_kf(kf_root);
> > @@ -1619,6 +1643,8 @@ static int cgroup_show_options(struct seq_file *seq,
> > seq_puts(seq, ",clone_children");
> > if (strlen(root->name))
> > seq_show_option(seq, "name", root->name);
> > + cgroup_show_nsroot(seq, kn, kf_root);
> > +
> > return 0;
> > }
next prev parent reply other threads:[~2016-04-14 15:27 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-03-21 23:41 [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Serge E. Hallyn
[not found] ` <20160321234133.GA22463-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-03-29 1:12 ` Serge E. Hallyn
[not found] ` <20160329011203.GA8974-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-03-30 18:58 ` Tejun Heo
2016-03-29 13:58 ` Tycho Andersen
2016-03-29 20:00 ` Serge E. Hallyn
[not found] ` <20160329200018.GA21908-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-03-30 17:21 ` [PATCH] cgroup mount: ignore nsroot= Serge E. Hallyn
[not found] ` <20160330172100.GA11373-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-03-30 18:09 ` Tycho Andersen
2016-04-13 17:57 ` [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Tejun Heo
[not found] ` <20160413175736.GC3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-04-13 18:46 ` Serge E. Hallyn
[not found] ` <20160413184639.GA29483-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-04-13 18:50 ` Tejun Heo
[not found] ` <20160413185033.GH3676-piEFEHQLUPpN0TnZuCh8vA@public.gmane.org>
2016-04-13 19:01 ` Serge E. Hallyn
[not found] ` <20160413190152.GA29753-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-04-13 19:12 ` Tejun Heo
2016-04-13 23:31 ` Aditya Kali
[not found] ` <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-13 23:52 ` Serge E. Hallyn
2016-04-14 4:04 ` [PATCH] " Serge E. Hallyn
[not found] ` <20160414040436.GA3739-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-04-14 14:42 ` Eric W. Biederman
[not found] ` <87oa9c6ymf.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-04-14 15:27 ` Serge E. Hallyn [this message]
[not found] ` <20160414152747.GA12700-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2016-04-14 16:12 ` Eric W. Biederman
[not found] ` <877fg06uf9.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
2016-04-14 16:38 ` Serge E. Hallyn
2016-04-14 16:43 ` Eric W. Biederman
2016-04-15 15:50 ` Aditya Kali
[not found] ` <CAGr1F2EZtts38SPDc9cuH1prc6NfUJiwUQmqyRp-RpNYM5UzxA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-04-15 16:02 ` Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160414152747.GA12700@mail.hallyn.com \
--to=serge-a9i7lubdfnhqt0dzr+alfa@public.gmane.org \
--cc=adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).