linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Richard Weinberger <richard@nod.at>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aditya Kali <adityakali@google.com>, Tejun Heo <tj@kernel.org>,
	Li Zefan <lizefan@huawei.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>,
	Andy Lutomirski <luto@amacapital.net>,
	cgroups mailinglist <cgroups@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Rohit Jnagal <jnagal@google.com>, Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces
Date: Tue, 06 Jan 2015 01:07:04 +0100	[thread overview]
Message-ID: <54AB2728.60403@nod.at> (raw)
In-Reply-To: <87lhlgpyxk.fsf@x220.int.ebiederm.org>

Am 06.01.2015 um 00:53 schrieb Eric W. Biederman:
> Richard Weinberger <richard@nod.at> writes:
> 
>> Am 05.01.2015 um 23:48 schrieb Aditya Kali:
>>> On Sun, Dec 14, 2014 at 3:05 PM, Richard Weinberger <richard@nod.at> wrote:
>>>> Aditya,
>>>>
>>>> I gave your patch set a try but it does not work for me.
>>>> Maybe you can bring some light into the issues I'm facing.
>>>> Sadly I still had no time to dig into your code.
>>>>
>>>> Am 05.12.2014 um 02:55 schrieb Aditya Kali:
>>>>> Signed-off-by: Aditya Kali <adityakali@google.com>
>>>>> ---
>>>>>  Documentation/cgroups/namespace.txt | 147 ++++++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 147 insertions(+)
>>>>>  create mode 100644 Documentation/cgroups/namespace.txt
>>>>>
>>>>> diff --git a/Documentation/cgroups/namespace.txt b/Documentation/cgroups/namespace.txt
>>>>> new file mode 100644
>>>>> index 0000000..6480379
>>>>> --- /dev/null
>>>>> +++ b/Documentation/cgroups/namespace.txt
>>>>> @@ -0,0 +1,147 @@
>>>>> +                     CGroup Namespaces
>>>>> +
>>>>> +CGroup Namespace provides a mechanism to virtualize the view of the
>>>>> +/proc/<pid>/cgroup file. The CLONE_NEWCGROUP clone-flag can be used with
>>>>> +clone() and unshare() syscalls to create a new cgroup namespace.
>>>>> +The process running inside the cgroup namespace will have its /proc/<pid>/cgroup
>>>>> +output restricted to cgroupns-root. cgroupns-root is the cgroup of the process
>>>>> +at the time of creation of the cgroup namespace.
>>>>> +
>>>>> +Prior to CGroup Namespace, the /proc/<pid>/cgroup file used to show complete
>>>>> +path of the cgroup of a process. In a container setup (where a set of cgroups
>>>>> +and namespaces are intended to isolate processes), the /proc/<pid>/cgroup file
>>>>> +may leak potential system level information to the isolated processes.
>>>>> +
>>>>> +For Example:
>>>>> +  $ cat /proc/self/cgroup
>>>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>>>> +
>>>>> +The path '/batchjobs/container_id1' can generally be considered as system-data
>>>>> +and its desirable to not expose it to the isolated process.
>>>>> +
>>>>> +CGroup Namespaces can be used to restrict visibility of this path.
>>>>> +For Example:
>>>>> +  # Before creating cgroup namespace
>>>>> +  $ ls -l /proc/self/ns/cgroup
>>>>> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
>>>>> +  $ cat /proc/self/cgroup
>>>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>>>> +
>>>>> +  # unshare(CLONE_NEWCGROUP) and exec /bin/bash
>>>>> +  $ ~/unshare -c
>>>>> +  [ns]$ ls -l /proc/self/ns/cgroup
>>>>> +  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
>>>>> +  # From within new cgroupns, process sees that its in the root cgroup
>>>>> +  [ns]$ cat /proc/self/cgroup
>>>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/
>>>>> +
>>>>> +  # From global cgroupns:
>>>>> +  $ cat /proc/<pid>/cgroup
>>>>> +  0:cpuset,cpu,cpuacct,memory,devices,freezer,hugetlb:/batchjobs/container_id1
>>>>> +
>>>>> +  # Unshare cgroupns along with userns and mountns
>>>>> +  # Following calls unshare(CLONE_NEWCGROUP|CLONE_NEWUSER|CLONE_NEWNS), then
>>>>> +  # sets up uid/gid map and execs /bin/bash
>>>>> +  $ ~/unshare -c -u -m
>>>>
>>>> This command does not issue CLONE_NEWUSER, -U does.
>>>>
>>> I was using a custom unshare binary. But I will update the command
>>> line to be similar to the one in util-linux.
>>>
>>>>> +  # Originally, we were in /batchjobs/container_id1 cgroup. Mount our own cgroup
>>>>> +  # hierarchy.
>>>>> +  [ns]$ mount -t cgroup cgroup /tmp/cgroup
>>>>> +  [ns]$ ls -l /tmp/cgroup
>>>>> +  total 0
>>>>> +  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.controllers
>>>>> +  -r--r--r-- 1 root root 0 2014-10-13 09:32 cgroup.populated
>>>>> +  -rw-r--r-- 1 root root 0 2014-10-13 09:25 cgroup.procs
>>>>> +  -rw-r--r-- 1 root root 0 2014-10-13 09:32 cgroup.subtree_control
>>>>
>>>> I've patched libvirt-lxc to issue CLONE_NEWCGROUP and not bind mount cgroupfs into a container.
>>>> But I'm unable to mount cgroupfs within the container, mount(2) is failing with EINVAL.
>>>> And /proc/self/cgroup still shows the cgroup from outside.
>>>>
>>>> ---cut---
>>>> container:/ # ls /sys/fs/cgroup/
>>>> container:/ # mount -t cgroup none /sys/fs/cgroup/
>>>
>>> You need to provide "-o __DEVEL_sane_behavior" flag. Inside the
>>> container, only unified hierarchy can be mounted. So, for now, that
>>> flag is needed. I will fix the documentation too.
>>>
>>>> mount: wrong fs type, bad option, bad superblock on none,
>>>>        missing codepage or helper program, or other error
>>>>
>>>>        In some cases useful info is found in syslog - try
>>>>        dmesg | tail or so.
>>>> container:/ # cat /proc/self/cgroup
>>>> 8:memory:/machine/test00.libvirt-lxc
>>>> 7:devices:/machine/test00.libvirt-lxc
>>>> 6:hugetlb:/
>>>> 5:cpuset:/machine/test00.libvirt-lxc
>>>> 4:blkio:/machine/test00.libvirt-lxc
>>>> 3:cpu,cpuacct:/machine/test00.libvirt-lxc
>>>> 2:freezer:/machine/test00.libvirt-lxc
>>>> 1:name=systemd:/user.slice/user-0.slice/session-c2.scope
>>>> container:/ # ls -la /proc/self/ns
>>>> total 0
>>>> dr-x--x--x 2 root root 0 Dec 14 23:02 .
>>>> dr-xr-xr-x 8 root root 0 Dec 14 23:02 ..
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 cgroup -> cgroup:[4026532240]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 ipc -> ipc:[4026532238]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 mnt -> mnt:[4026532235]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 net -> net:[4026532242]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 pid -> pid:[4026532239]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 user -> user:[4026532234]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:02 uts -> uts:[4026532236]
>>>> container:/ #
>>>>
>>>> #host side
>>>> lxc-os132:~ # ls -la /proc/self/ns
>>>> total 0
>>>> dr-x--x--x 2 root root 0 Dec 14 23:56 .
>>>> dr-xr-xr-x 8 root root 0 Dec 14 23:56 ..
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 cgroup -> cgroup:[4026531835]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 ipc -> ipc:[4026531839]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 mnt -> mnt:[4026531840]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 net -> net:[4026531957]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 pid -> pid:[4026531836]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 user -> user:[4026531837]
>>>> lrwxrwxrwx 1 root root 0 Dec 14 23:56 uts -> uts:[4026531838]
>>>> ---cut---
>>>>
>>>> Any ideas?
>>>>
>>>
>>> Please try with "-o __DEVEL_sane_behavior" flag to the mount command.
>>
>> Ohh, this renders the whole patch useless for me as systemd needs the "old/default" behavior of cgroups. :-(
>> I really hoped that cgroup namespaces will help me running systemd in a sane way within Linux containers.
> 
> Ugh.  It sounds like there is a real mess here.  At the very least there
> is misunderstanding.
> 
> I have a memory that systemd should have been able to use a unified
> hierarchy.  As you could still mount the different controllers
> independently (they just use the same directory structure on each
> mount).

Luckily systemd folks want to move to the unified but as of now it does not work.
Please see this mail from Lennart:
https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html

Maybe the porting is easy. Dunno.
I had no time yet to look into that.

> That said from a practical standpoint I am not certain that a cgroup
> namespace is viable if it can not support the behavior of cgroupsfs
> that everyone is using.

Yep.

systemd *really* wants to own cgroupfs, so it has to mount it within the container.
Currently libvirt does nasty hacks using bind mounts which are also problematic.
My hope was that with cgroup namespaces I can simply cheat systemd and give it
a cgroupfs to mess with.

Thanks,
//richard

  reply	other threads:[~2015-01-06  0:07 UTC|newest]

Thread overview: 157+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <adityakali-cgroupns>
2014-07-17 19:52 ` [PATCH 0/5] RFC: CGroup Namespaces Aditya Kali
2014-07-17 19:52   ` [PATCH 1/5] kernfs: Add API to get generate relative kernfs path Aditya Kali
2014-07-24 15:10     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 2/5] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-07-24 17:01     ` Serge Hallyn
2014-07-31 19:48       ` Aditya Kali
2014-08-04 23:12         ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 3/5] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-07-24 16:59     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 4/5] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-07-24 17:03     ` Serge Hallyn
2014-07-17 19:52   ` [PATCH 5/5] cgroup: introduce cgroup namespaces Aditya Kali
2014-07-17 19:57     ` Andy Lutomirski
2014-07-17 20:55       ` Aditya Kali
2014-07-18 16:51         ` Andy Lutomirski
2014-07-18 18:51           ` Aditya Kali
2014-07-18 18:57             ` Andy Lutomirski
2014-07-21 22:11               ` Aditya Kali
2014-07-21 22:16                 ` Andy Lutomirski
2014-07-23 19:52                   ` Aditya Kali
2014-07-18 16:00   ` [PATCH 0/5] RFC: CGroup Namespaces Serge Hallyn
2014-07-24 16:10   ` Serge Hallyn
2014-07-24 16:36   ` Serge Hallyn
2014-07-25 19:29     ` Aditya Kali
2014-07-25 20:27       ` Andy Lutomirski
2014-07-29  4:51       ` Serge E. Hallyn
2014-07-29 15:08         ` Andy Lutomirski
2014-07-29 16:06           ` Serge E. Hallyn
2014-10-13 21:23 ` [PATCHv1 0/8] " Aditya Kali
2014-10-13 21:23   ` [PATCHv1 1/8] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-10-16 16:07     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-10-16 16:08     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 3/8] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-10-16 16:13     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 4/8] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-10-16 16:14     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 5/8] cgroup: introduce cgroup namespaces Aditya Kali
2014-10-16 16:37     ` Serge E. Hallyn
2014-10-24  1:03       ` Aditya Kali
2014-10-25  3:16         ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 6/8] cgroup: restrict cgroup operations within task's cgroupns Aditya Kali
2014-10-17  9:28     ` Serge E. Hallyn
2014-10-22 19:06       ` Aditya Kali
2014-10-19  4:57     ` Eric W. Biederman
2014-10-13 21:23   ` [PATCHv1 7/8] cgroup: cgroup namespace setns support Aditya Kali
2014-10-16 21:12     ` Serge E. Hallyn
2014-10-16 21:17       ` Andy Lutomirski
2014-10-16 21:22       ` Aditya Kali
2014-10-16 21:47         ` Serge E. Hallyn
2014-10-19  5:23           ` Eric W. Biederman
2014-10-19 18:26             ` Andy Lutomirski
2014-10-20  4:55               ` Eric W.Biederman
2014-10-21  0:20                 ` Andy Lutomirski
2014-10-21  4:49                   ` Eric W. Biederman
2014-10-21  5:03                     ` Andy Lutomirski
2014-10-21  5:42                       ` Eric W. Biederman
2014-10-21  5:49                         ` Andy Lutomirski
2014-10-21 18:49                           ` Aditya Kali
2014-10-21 19:02                             ` Andy Lutomirski
2014-10-21 22:33                               ` Aditya Kali
2014-10-21 22:42                                 ` Andy Lutomirski
2014-10-22  0:46                                   ` Aditya Kali
2014-10-22  0:58                                     ` Andy Lutomirski
2014-10-22 18:37                                       ` Aditya Kali
2014-10-22 18:50                                         ` Andy Lutomirski
2014-10-22 19:42                                         ` Tejun Heo
2014-10-17  9:52     ` Serge E. Hallyn
2014-10-13 21:23   ` [PATCHv1 8/8] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-10-17 12:19     ` Serge E. Hallyn
2014-10-14 22:42   ` [PATCHv1 0/8] CGroup Namespaces Andy Lutomirski
2014-10-14 23:33     ` Aditya Kali
2014-10-19  4:54   ` Eric W. Biederman
2015-07-22 18:10     ` Vincent Batts
2014-10-31 19:18 ` [PATCHv2 0/7] " Aditya Kali
2014-10-31 19:18   ` [PATCHv2 1/7] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-10-31 19:18   ` [PATCHv2 2/7] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-10-31 19:18   ` [PATCHv2 3/7] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-10-31 19:18   ` [PATCHv2 4/7] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-10-31 19:18   ` [PATCHv2 5/7] cgroup: introduce cgroup namespaces Aditya Kali
2014-11-01  0:02     ` Andy Lutomirski
2014-11-01  0:58       ` Eric W. Biederman
2014-11-03 23:42         ` Aditya Kali
2014-11-03 23:40       ` Aditya Kali
2014-11-04  1:56     ` Aditya Kali
2014-10-31 19:19   ` [PATCHv2 6/7] cgroup: cgroup namespace setns support Aditya Kali
2014-10-31 19:19   ` [PATCHv2 7/7] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-11-01  0:07     ` Andy Lutomirski
2014-11-01  2:59       ` Eric W. Biederman
2014-11-01  3:29         ` Andy Lutomirski
2014-11-03 23:12       ` Aditya Kali
2014-11-03 23:15         ` Andy Lutomirski
2014-11-03 23:23           ` Aditya Kali
2014-11-03 23:48             ` Andy Lutomirski
2014-11-04  0:12               ` Aditya Kali
2014-11-04  0:17                 ` Andy Lutomirski
2014-11-04  0:49                   ` Aditya Kali
2014-11-04 13:57         ` Tejun Heo
2014-11-06 17:28           ` Aditya Kali
2014-11-01  1:09     ` Eric W. Biederman
2014-11-03 22:46       ` Aditya Kali
     [not found]       ` <CAGr1F2Hd_PS_AscBGMXdZC9qkHGRUp-MeQvJksDOQkRBB3RGoA@mail.gmail.com>
2014-11-03 22:56         ` Andy Lutomirski
2014-11-04 13:46         ` Tejun Heo
2014-11-04 15:00           ` Andy Lutomirski
2014-11-04 15:50             ` Serge E. Hallyn
2014-11-12 17:48               ` Aditya Kali
2014-11-04  1:59     ` Aditya Kali
2014-11-04 13:10   ` [PATCHv2 0/7] CGroup Namespaces Vivek Goyal
2014-11-06 17:33     ` Aditya Kali
2014-11-26 22:58       ` Richard Weinberger
2014-12-02 19:14         ` Aditya Kali
2014-12-05  1:55 ` [PATCHv3 0/8] " Aditya Kali
2014-12-05  1:55   ` [PATCHv3 1/8] kernfs: Add API to generate relative kernfs path Aditya Kali
2014-12-05  1:55   ` [PATCHv3 2/8] sched: new clone flag CLONE_NEWCGROUP for cgroup namespace Aditya Kali
2014-12-05  1:55   ` [PATCHv3 3/8] cgroup: add function to get task's cgroup on default hierarchy Aditya Kali
2014-12-05  1:55   ` [PATCHv3 4/8] cgroup: export cgroup_get() and cgroup_put() Aditya Kali
2014-12-05  1:55   ` [PATCHv3 5/8] cgroup: introduce cgroup namespaces Aditya Kali
2014-12-12  8:54     ` Zefan Li
2014-12-05  1:55   ` [PATCHv3 6/8] cgroup: cgroup namespace setns support Aditya Kali
2014-12-05  1:55   ` [PATCHv3 7/8] cgroup: mount cgroupns-root when inside non-init cgroupns Aditya Kali
2014-12-12  8:55     ` Zefan Li
2014-12-05  1:55   ` [PATCHv3 8/8] cgroup: Add documentation for cgroup namespaces Aditya Kali
2014-12-12  8:54     ` Zefan Li
2015-01-05 22:54       ` Aditya Kali
2014-12-14 23:05     ` Richard Weinberger
2015-01-05 22:48       ` Aditya Kali
2015-01-05 22:52         ` Richard Weinberger
2015-01-05 23:53           ` Eric W. Biederman
2015-01-06  0:07             ` Richard Weinberger [this message]
2015-01-06  0:10             ` Aditya Kali
2015-01-06  0:17               ` Richard Weinberger
2015-01-06 23:20                 ` Aditya Kali
2015-01-06 23:39                   ` Richard Weinberger
2015-01-07  9:28                   ` Richard Weinberger
2015-01-07 14:45                     ` Eric W. Biederman
2015-01-07 19:30                       ` Serge E. Hallyn
2015-01-07 22:14                         ` Eric W. Biederman
2015-01-07 22:45                           ` Tejun Heo
2015-01-07 23:02                             ` Eric W. Biederman
2015-01-07 23:06                               ` Tejun Heo
2015-01-07 23:09                                 ` Eric W. Biederman
2015-01-07 23:16                                   ` Tejun Heo
2015-01-07 23:27                                   ` Eric W. Biederman
2015-01-07 23:35                                     ` Tejun Heo
2015-02-11  3:46                                       ` Serge E. Hallyn
2015-02-11  4:09                                         ` Tejun Heo
2015-02-11  4:29                                           ` Serge E. Hallyn
2015-02-11  5:02                                             ` Eric W. Biederman
2015-02-11  5:17                                               ` Tejun Heo
2015-02-11  6:29                                                 ` Eric W. Biederman
2015-02-11 14:36                                                   ` Tejun Heo
2015-02-11 16:00                                                 ` Serge E. Hallyn
2015-02-11 16:03                                                   ` Tejun Heo
2015-02-11 16:18                                                     ` Serge E. Hallyn
2015-02-11  5:10                                             ` Tejun Heo
2015-01-07 18:57                     ` Aditya Kali
2014-12-05  3:20   ` [PATCHv3 0/8] CGroup Namespaces Aditya Kali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54AB2728.60403@nod.at \
    --to=richard@nod.at \
    --cc=adityakali@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=jnagal@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=serge.hallyn@ubuntu.com \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).