cgroups.vger.kernel.org archive mirror
From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: Aditya Kali <adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux Containers
	<containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
	cgroups mailinglist
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	lkml <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field
Date: Wed, 13 Apr 2016 18:52:40 -0500
Message-ID: <20160413235240.GA921@mail.hallyn.com>
In-Reply-To: <CAGr1F2HXJ1BdMFY+vF40O_khE+4S7OnbQPv-h1Q_AmGGhL7mzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Quoting Aditya Kali (adityakali-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org):
> On Wed, Apr 13, 2016 at 12:01 PM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > Quoting Tejun Heo (tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org):
> >> Hello, Serge.
> >>
> >> On Wed, Apr 13, 2016 at 01:46:39PM -0500, Serge E. Hallyn wrote:
> >> > It's not a leak of any information we're trying to hide.  I realize
> >> > something like 8 years have passed, but I still basically go by the
> >> > ksummit guidance that containers are ok, and that the kernel's first priority
> >> > is to facilitate containers, not to trick them into thinking
> >> > they're not containerized.  So long as the container is properly set
> >> > up, I don't think there's anything the workload could do with the
> >> > nsroot= info other than *know* that it is in a ns cgroup.
> >> >
> >> > If we did change that guidance, there's a slew of proc info that we
> >> > could better virtualize :)
> >>
> >> I see.  I'm just wondering because the information here seems a bit
> >> gratuitous.  Isn't the only thing necessary telling whether the root
> >> is bind mounted or namescoped?  Wouldn't simple "nsroot" work for that
> >> purpose?
> >
> > I don't think so - we could be in a cgroup namespace but still have
> > access only to bind-mounted cgroups.  So we need to compare the
> > superblock dentry root field to the nsroot= value.
> 
> Umm, I don't think this is such a good idea. The main purpose of
> cgroup namespace was to prevent this exposure of system cgroup
> hierarchy that used to happen because of /proc/self/cgroup. Wouldn't
> showing that information in /proc/self/mountinfo defeat the purpose?

I disagree.  The primary purpose was to simplify init's job and to keep
cgroup mounts in sync with /proc/self/cgroup, so that userspace doesn't
have to look at /proc/self/cgroup and then try to figure out how that
relates to its actual cgroup mountpoints.  It was not to *hide* the
information.

Field 3 already gives us the path; nsroot just tells us what part of
it we are namespaced under.
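A minimal Python sketch of how a consumer might read both values from one /proc/self/mountinfo line. It rests on two assumptions: that nsroot= is emitted among the comma-separated super options after the " - " separator (the patch text itself is not quoted in this message), and that "field 3" is the zero-based index of the root field, which proc(5) numbers as (4). The names MountInfo and parse_mountinfo_line are hypothetical, chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def unescape(field: str) -> str:
    # mountinfo escapes space, tab, newline and backslash as octal (e.g. "\040").
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


@dataclass
class MountInfo:
    root: str                 # zero-based field 3: root of the mount within the fs
    mount_point: str          # zero-based field 4
    fstype: str
    super_options: List[str]
    nsroot: Optional[str]     # proposed nsroot= value (assumed super option); None if absent


def parse_mountinfo_line(line: str) -> MountInfo:
    # Optional fields (e.g. "shared:12") end at a lone "-"; after it come
    # "fstype source super_options".
    left, right = line.rstrip("\n").split(" - ", 1)
    fields = left.split(" ")
    fstype, _source, super_opts = right.split(" ", 2)
    opts = super_opts.split(",")
    nsroot = next((o[len("nsroot="):] for o in opts if o.startswith("nsroot=")), None)
    return MountInfo(
        root=unescape(fields[3]),
        mount_point=unescape(fields[4]),
        fstype=fstype,
        super_options=opts,
        nsroot=None if nsroot is None else unescape(nsroot),
    )


if __name__ == "__main__":
    # Hypothetical line in the proposed format; the values are made up.
    example = ("30 25 0:26 /lxc/x1 /dev/cgroup rw,relatime shared:12"
               " - cgroup cgroup rw,freezer,nsroot=/lxc/x1")
    m = parse_mountinfo_line(example)
    print(m.root, m.mount_point, m.nsroot)
```

A line without the option simply yields nsroot=None, which is the case where a reader cannot distinguish a namespaced mount from a bind mount.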

> > One practical problem I've found with cgroup namespaces is that there
> > is no way to disambiguate between a cgroupfs mount which was done in
> > a cgroup namespace, and a bind mount of a cgroupfs directory.
> 
> That's actually by design, no? Namespaced apps should not know/care if
> they are running inside a namespace. If they can find it out today, it's

No.  If a workload isn't allowed to mount its own cgroups, and can only
see that freezer /lxc/x1 was mounted at /dev/cgroup (poorly done, but we
don't get to pass judgement or choose mountpoints for userspace), and
it sees /lxc/x1 in its freezer entry for /proc/self/cgroup, then it
cannot tell whether it should be using /dev/cgroup/tasks or
/dev/cgroup/lxc/tasks or /dev/cgroup/lxc/x1/tasks.  That's a problem.
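To make the ambiguity concrete: a common userspace approach is to strip the mount's root field from the /proc/self/cgroup path and join the remainder onto the mount point. The hypothetical helper cgroup_dir_for below sketches that approach. It assumes both paths are expressed relative to the same cgroup root, and that assumption is exactly what the workload cannot check on its own.

```python
import posixpath


def cgroup_dir_for(proc_cgroup_path: str, mount_root: str, mount_point: str) -> str:
    # Strip the mount's root from the process's cgroup path and join the rest
    # onto the mount point. Assumes both paths share the same cgroup root.
    rel = posixpath.relpath(proc_cgroup_path, mount_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{proc_cgroup_path} is not below mount root {mount_root}")
    return posixpath.normpath(posixpath.join(mount_point, rel))


if __name__ == "__main__":
    # /proc/self/cgroup shows freezer at /lxc/x1; the hierarchy is mounted at /dev/cgroup.
    for root in ("/lxc/x1", "/"):
        tasks = posixpath.join(cgroup_dir_for("/lxc/x1", root, "/dev/cgroup"), "tasks")
        print(root, "->", tasks)
    # prints:
    # /lxc/x1 -> /dev/cgroup/tasks
    # / -> /dev/cgroup/lxc/x1/tasks
```

The mapping itself is mechanical once the mount root is trustworthy; the open question is whether the workload can trust it, which is what the proposed nsroot= field is meant to help with.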

> just because of certain side-effects. I fear adding explicit "nsroot"
> or something in /proc/self/mountinfo now becomes an API making it hard
> to virtualize user-apps again.

It doesn't make it hard to virtualize.  The only complication would be
if you wanted to checkpoint/restart and reproduce the exact
/proc/self/mountinfo output.  That's a bogus goal anyway, since the
restart could be in a different cgroup and field 3 would be different.

In contrast, not providing this makes it impossible for software to
deal with both cgroup namespaces and bind-mounted cgroups.  That
means any new docker (say) which can run in cgroup namespaces will
not be able to run under old (that is, anything currently released
except lxc 2.0) container managers.  We're breaking all container
managers.

Now the other thing we could do would be to tweak field 3 in the
mountinfo output.  That had been my first inclination, but the way
the mountinfo code is currently done makes that ... challenging.

-serge

Thread overview: 22+ messages
2016-03-21 23:41 [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Serge E. Hallyn
2016-03-29  1:12   ` Serge E. Hallyn
2016-03-30 18:58       ` Tejun Heo
2016-03-29 13:58   ` Tycho Andersen
2016-03-29 20:00     ` Serge E. Hallyn
2016-03-30 17:21         ` [PATCH] cgroup mount: ignore nsroot= Serge E. Hallyn
2016-03-30 18:09             ` Tycho Andersen
2016-04-13 17:57   ` [RFC PATCH] cgroup namespaces: add a 'nsroot=' mountinfo field Tejun Heo
2016-04-13 18:46       ` Serge E. Hallyn
2016-04-13 18:50           ` Tejun Heo
2016-04-13 19:01               ` Serge E. Hallyn
2016-04-13 19:12                   ` Tejun Heo
2016-04-13 23:31                   ` Aditya Kali
2016-04-13 23:52                       ` Serge E. Hallyn [this message]
2016-04-14  4:04       ` [PATCH] " Serge E. Hallyn
2016-04-14 14:42           ` Eric W. Biederman
2016-04-14 15:27               ` Serge E. Hallyn
2016-04-14 16:12                   ` Eric W. Biederman
2016-04-14 16:38                       ` Serge E. Hallyn
2016-04-14 16:43                         ` Eric W. Biederman
2016-04-15 15:50                 ` Aditya Kali
2016-04-15 16:02                     ` Serge E. Hallyn
