* 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-08-31 11:02 Michael Tokarev
From: Michael Tokarev @ 2010-08-31 11:02 UTC
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
I just noticed a regression - immediately after updating
kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
Namely, unshare(CLONE_NEWNS) stopped working from within
a container, like this:
unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
There's no other fancy stuff going on around, just plain
unshare and exec a new shell.
What's wrong with 2.6.35 in this context?
Thanks.
/mjt
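
For readers who want to try the same call, here is a minimal sketch of such a test in C. It is not the clone.c program referenced later in this thread; the helper name try_unshare_newns and its message format are assumptions made for illustration only.

```c
#define _GNU_SOURCE             /* exposes unshare() and CLONE_NEWNS */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: try to move the calling process into a new mount
 * namespace.  Returns 0 on success or the errno value set by unshare(2),
 * and writes an strace-like one-line summary into msg.
 */
int try_unshare_newns(char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);

    if (unshare(CLONE_NEWNS) == -1) {
        int err = errno;        /* save before snprintf can touch errno */
        snprintf(msg, msglen, "unshare(CLONE_NEWNS) = -1 (%s)",
                 strerror(err));
        return err;
    }
    snprintf(msg, msglen, "unshare(CLONE_NEWNS) = 0");
    return 0;
}
```

Run as root inside the container, the failure described above would correspond to a return value of EINVAL. A caller without CAP_SYS_ADMIN normally gets EPERM instead, so the test is only meaningful with root privileges.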
* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 16:28 Serge E. Hallyn
From: Serge E. Hallyn @ 2010-09-01 16:28 UTC
To: Michael Tokarev; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> I just noticed a regression - immediately after updating
> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> Namely, unshare(CLONE_NEWNS) stopped working from within
> a container, like this:
>
>  unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
>
> There's no other fancy stuff going on around, just plain
> unshare and exec a new shell.

I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
maverick), created a lucid container with the standard template,
and tested with ns_exec
(git clone git://git.sr71.net/~hallyn/cr_tests.git;
 git checkout ns_exec; make ns_exec;
 ns_exec -m /bin/bash; play with mounts; exit)

Can you give us /proc/self/status and capsh --print output
from inside the container before you try to unshare, and
maybe strace output from the program you were using?

> What's wrong with 2.6.35 in this context?
>
> Thanks.
>
> /mjt
> _______________________________________________
> Containers mailing list
> Containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 17:27 Michael Tokarev
From: Michael Tokarev @ 2010-09-01 17:27 UTC
To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

01.09.2010 20:28, Serge E. Hallyn wrote:
> Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
>> I just noticed a regression - immediately after updating
>> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
>> Namely, unshare(CLONE_NEWNS) stopped working from within
>> a container, like this:
>>
>>  unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
>>
>> There's no other fancy stuff going on around, just plain
>> unshare and exec a new shell.
>
> I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
> maverick), created a lucid container with the standard template,
> and tested with ns_exec
> (git clone git://git.sr71.net/~hallyn/cr_tests.git;
>  git checkout ns_exec; make ns_exec;
>  ns_exec -m /bin/bash; play with mounts; exit)

This one is not using unshare(2), it is using clone(2) syscall.
I asked about unshare.  In particular, lxc-unshare fails within
the container the same way too -- it too uses unshare().

> Can you give us /proc/self/status and capsh --print output
> from inside the container before you try to unshare, and
> maybe strace output from the program you were using?

Sure.

# cat /proc/self/status
Name: cat
State: R (running)
Tgid: 2663
Pid: 2663
PPid: 2660
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 4944 kB
VmSize: 4944 kB
VmLck: 0 kB
VmHWM: 232 kB
VmRSS: 232 kB
VmData: 160 kB
VmStk: 136 kB
VmExe: 40 kB
VmLib: 1388 kB
VmPTE: 24 kB
VmSwap: 0 kB
Threads: 1
SigQ: 4/63178
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: ffffffffffbfffff
CapEff: ffffffffffbfffff
CapBnd: ffffffffffbfffff
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 3
nonvoluntary_ctxt_switches: 2

# capsh --print
Current: =ep cap_sys_boot-ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin
Securebits: 00/0x0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0

# strace clone --fs bash
execve("/usr/sbin/clone", ["clone", "--fs", "bash"], [/* 15 vars */]) = 0
brk(0) = 0x834c000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf76f1000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=18528, ...}) = 0
mmap2(NULL, 18528, PROT_READ, MAP_PRIVATE, 3, 0) = 0xf76ec000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320m\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1327556, ...}) = 0
mmap2(NULL, 1337704, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xf75a5000
mprotect(0xf76e5000, 4096, PROT_NONE) = 0
mmap2(0xf76e6000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x140) = 0xf76e6000
mmap2(0xf76e9000, 10600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xf76e9000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xf75a4000
set_thread_area({entry_number:-1 -> 12, base_addr:0xf75a46c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xf76e6000, 8192, PROT_READ) = 0
mprotect(0xf770f000, 4096, PROT_READ) = 0
munmap(0xf76ec000, 18528) = 0
unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
write(2, "clone: unshare: Invalid argument"..., 33clone: unshare: Invalid argument
) = 33
exit_group(1) = ?

The source of this clone program is available at
http://www.corpit.ru/mjt/clone.c - I have used it for
a long time, it works on this same machine
outside of containers, and it worked in 2.6.32.

Thanks!

/mjt
* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 19:41 Serge E. Hallyn
From: Serge E. Hallyn @ 2010-09-01 19:41 UTC
To: Michael Tokarev; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> 01.09.2010 20:28, Serge E. Hallyn wrote:
> > Quoting Michael Tokarev (mjt-XAri/EZa3C4vJsYlp49lxw@public.gmane.org):
> >> I just noticed a regression - immediately after updating
> >> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> >> Namely, unshare(CLONE_NEWNS) stopped working from within
> >> a container, like this:
> >>
> >>  unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
> >>
> >> There's no other fancy stuff going on around, just plain
> >> unshare and exec a new shell.
> >
> > I'm not seeing this behavior.  I'm on 2.6.35-19-generic (ubuntu
> > maverick), created a lucid container with the standard template,
> > and tested with ns_exec
> > (git clone git://git.sr71.net/~hallyn/cr_tests.git;
> >  git checkout ns_exec; make ns_exec;
> >  ns_exec -m /bin/bash; play with mounts; exit)
>
> This one is not using unshare(2), it is using clone(2) syscall.

That's only the case if you do 'ns_exec -cm'.

> I asked about unshare.  In particular, lxc-unshare fails within
> the container the same way too -- it too uses unshare().

lxc-unshare -s MOUNT /bin/bash passes here too.

> > Can you give us /proc/self/status and capsh --print output
> > from inside the container before you try to unshare, and
> > maybe strace output from the program you were using?
>
> Sure.
>
> [/proc/self/status, capsh --print and strace output snipped;
>  identical to the previous message]
>
> The source of this clone program is available at
> http://www.corpit.ru/mjt/clone.c - I use it for
> a long time, it works on this same machine
> outside of containers, and it worked in 2.6.32.

Hm, is working for me.  You're on a plain upstream 2.6.35, as in commitid
9fe6206f400646a2322096b56c59891d530e8d51 ?

I see nothing obvious in your output, unfortunately.

-serge
* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-01 19:53 Michael Tokarev
From: Michael Tokarev @ 2010-09-01 19:53 UTC
To: Serge E. Hallyn; +Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

01.09.2010 23:41, Serge E. Hallyn wrote:
[]
>>>>  unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
[]
>>> ns_exec -m /bin/bash; play with mounts; exit)
>> This one is not using unshare(2), it is using clone(2) syscall.
>
> That's only the case if you do 'ns_exec -cm'.

Oh.  I missed that.

[]
>> The source of this clone program is available at
>> http://www.corpit.ru/mjt/clone.c - I use it for
>> a long time, it works on this same machine
>> outside of containers, and it worked in 2.6.32.
>
> Hm, is working for me.  You're on a plain upstream 2.6.35, as in commitid
> 9fe6206f400646a2322096b56c59891d530e8d51 ?

No, it's 2.6.35.4 - the latest stable.  Plain 2.6.35 works (or fails)
the same for me as 2.6.35.4 - this one:
http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.35.tar.bz2

But I see at least one possible difference: I run a 64bit kernel
and a 32bit userspace, including lxc tools and unshare code.
Lemme check with 64bit (native) userspace....

/mjt
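
The 32bit-userspace-on-64bit-kernel setup mentioned in this message can be detected from a program. The following is a rough sketch only; the function names are hypothetical, and the "machine name contains 64" rule is a heuristic assumption (it can be fooled, for example, by a changed personality such as linux32, and it does not cover every architecture). Nothing at this point in the thread establishes that the mismatch is related to the EINVAL.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Pointer width of the running binary in bits (32 or 64 on common ABIs). */
int userspace_bits(void)
{
    return (int)(sizeof(void *) * 8);
}

/*
 * Heuristic (assumption): 64-bit kernels usually report a machine name
 * containing "64", e.g. x86_64, aarch64, ppc64.  Returns 1 or 0.
 */
int machine_looks_64bit(const char *machine)
{
    assert(machine != NULL);
    return strstr(machine, "64") != NULL;
}

/*
 * Returns 1 if a 32-bit binary appears to run on a 64-bit kernel,
 * 0 if not, and -1 if uname(2) fails.
 */
int is_compat_userspace(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return machine_looks_64bit(u.machine) && userspace_bits() == 32;
}
```

Building the same test program once natively and once with the compiler's 32-bit mode, then comparing the result of the unshare call, is one way to separate the two cases.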
* Re: 2.6.35: unshare(NEWNS) does not work inside a container anymore?
@ 2010-09-02 9:20 Michael Tokarev
From: Michael Tokarev @ 2010-09-02 9:20 UTC
To: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

31.08.2010 15:02, Michael Tokarev wrote:
> I just noticed a regression - immediately after updating
> kernel from 2.6.32 to 2.6.35 (I skipped .33 and .34).
> Namely, unshare(CLONE_NEWNS) stopped working from within
> a container, like this:
>
>  unshare(CLONE_NEWNS) = -1 EINVAL (Invalid argument)
>
> There's no other fancy stuff going on around, just plain
> unshare and exec a new shell.
>
> What's wrong with 2.6.35 in this context?

So, after discussing this on IRC and doing some discovery,
it turned out to be the new (in 2.6.35) cgroup subsystem --
the block I/O controller (CONFIG_BLK_CGROUP).  It does not
allow more than 1 level of nesting, so, for example, it is
impossible to create a subdirectory in another cgroup dir
in cgroupfs:

 mkdir /dev/cgroup/foo      -- this one succeeds, but
 mkdir /dev/cgroup/foo/bar  -- this fails

as long as the blkio mount option is enabled.  Once it is
disabled, it works again.

In 2.6.35 block/blk-cgroup.c, blkiocg_create() contains the
following code:

	/* Currently we do not support hierarchy deeper than two level (0,1) */
	if (parent != cgroup->top_cgroup)
		return ERR_PTR(-EINVAL);

In 2.6.36-to-be it was changed to

		return ERR_PTR(-EPERM);

but the issue remains anyway.

What is problematic here is that blkio is different from
all other cgroups in this very respect (not allowing
nesting), but there's no indication of this fact anywhere.
At least, the above quoted place warrants a WARN() or
WARN_ONCE() to tell the user what's going on - or else
it's very difficult to debug.

Speaking of a real solution, it looks like disallowing nesting
should be done in a different way.  Maybe allow creation
of a subcontainer but reset the limits in there and catch
attempts to set them, - I dunno.  Or, don't clone the whole
cgroup hierarchy on CLONE_NEWNS only.

The current situation is too restrictive IMHO - the blkio
controller is useful for a container like LXC, but currently
it implies that one can't create even a new filesystem
namespace within it.

Thanks.

/mjt
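
The two mkdir commands in this message can also be run as a small programmatic probe. The following is a sketch under assumptions: the function name probe_nested_mkdir is hypothetical, and the directory names foo and foo/bar simply mirror the example above. It reports which errno the nested mkdir returns and removes whatever it created.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: create <base>/foo and then <base>/foo/bar.
 * Returns 0 if both levels could be created, otherwise the errno of
 * the first mkdir(2) that failed.  Directories created by the probe
 * are removed again before returning.
 */
int probe_nested_mkdir(const char *base)
{
    char outer[4096], inner[4096];
    int err = 0;

    assert(base != NULL);

    /* Build both paths and refuse names that do not fit the buffers. */
    if (snprintf(outer, sizeof outer, "%s/foo", base) >= (int)sizeof outer ||
        snprintf(inner, sizeof inner, "%s/foo/bar", base) >= (int)sizeof inner)
        return ENAMETOOLONG;

    if (mkdir(outer, 0755) == -1)
        return errno;           /* the first level already fails */

    if (mkdir(inner, 0755) == -1)
        err = errno;            /* nesting refused */
    else
        rmdir(inner);

    rmdir(outer);
    return err;
}
```

Going by the report above, calling probe_nested_mkdir("/dev/cgroup") on a 2.6.35 system with blkio in the mount options would be expected to return EINVAL from the second mkdir, and EPERM on the 2.6.36 development kernel; on an ordinary filesystem it returns 0.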