From mboxrd@z Thu Jan 1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.
Date: Mon, 08 Mar 2010 13:25:03 -0800
Message-ID:
References: <4B88E431.6040609@parallels.com> <4B8D28CF.8060304@parallels.com>
 <20100302211942.GA17816@us.ibm.com> <20100303000743.GA13744@us.ibm.com>
 <4B8E9370.3050300@parallels.com> <4B9158F5.5040205@parallels.com>
 <4B926B1B.5070207@free.fr> <4B92C886.9020507@free.fr>
 <4B952BBE.6070507@free.fr> <4B9556A9.60206@free.fr>
 <4B95611C.5060403@free.fr> <4B956852.7050804@free.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Pavel Emelyanov, Sukadev Bhattiprolu, Serge Hallyn,
 Linux Netdev List, containers@lists.linux-foundation.org,
 Netfilter Development Mailinglist, Ben Greear
To: Daniel Lezcano
Return-path:
Received: from out01.mta.xmission.com ([166.70.13.231]:54250
 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1755600Ab0CHVZJ (ORCPT );
 Mon, 8 Mar 2010 16:25:09 -0500
In-Reply-To: <4B956852.7050804@free.fr> (Daniel Lezcano's message of "Mon\, 08 Mar 2010 22\:12\:50 +0100")
Sender: netdev-owner@vger.kernel.org
List-ID:

Daniel Lezcano writes:

> Eric W. Biederman wrote:
>> Daniel Lezcano writes:
>>
>>> Eric W. Biederman wrote:
>>>
>>>> Daniel Lezcano writes:
>>>>
>>>>> Eric W. Biederman wrote:
>>>>>
>>>>>> Daniel Lezcano writes:
>>>>>>
>>>>>>> Eric W. Biederman wrote:
>>>>>>>
>>>>>>>> I have taken a snapshot of my development tree and placed it at:
>>>>>>>>
>>>>>>>> git://git.kernel.org/pub/scm/linux/people/ebiederm/linux-2.6.33-nsfd-v5.git
>>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> thanks for the pointer.
>>>>>>>
>>>>>>> I tried to boot the kernel under qemu and I got this oops:
>>>>>>>
>>>>>> I am clearly running an old userspace on my test machine. No udev.
>>>>>> It looks like udev has a long standing netlink misfeature, where
>>>>>> it does not initialize NETLINK_CB....
>>>>>>
>>>>>> >From 8d85e3ab88718eda3d94cf8e1be14b69dae2b8f1 Mon Sep 17 00:00:00 2001
>>>>>> From: Eric W. Biederman
>>>>>> Date: Mon, 8 Mar 2010 09:25:20 -0800
>>>>>> Subject: [PATCH] kobject_uevent: Use the netlink allocator helper...
>>>>>>
>>>>>> Signed-off-by: Eric W. Biederman
>>>>>>
>>>>> Thanks.
>>>>>
>>>>> I was able to boot but I have the following warning:
>>>>>
>>>> Thanks for the bug report.
>>>>
>>> Thanks to you for the patchset :)
>>>
>>>> For the moment you might want to drop:
>>>> af_netlink: Allow credentials to work across namespaces.
>>>> af_netlink: Debugging in case I have missed something.
>>>>
>>>> Although I am curious if you hit my debugging messages in
>>>> netlink recv.
>>>>
>>> No, it does not appear (looked for "missing NETLINK_CB proto").
>>>
>>>> I guess if the goal is to test my nsfd bits you can drop everything
>>>> starting with my 'scm: Reorder scm_cookie.' commit. The rest is what
>>>> it takes to get uids, gids and pids translated when they cross
>>>> namespaces on an af_unix or an af_netlink socket.
>>>>
>>>> At least in the af_netlink case it appears clear I have missed
>>>> something.
>>>>
>>>> This is a warning that netlink throws when the packet accounting is
>>>> messed up. So it sounds like you are exercising another path that I
>>>> failed to exercise and fix.
>>>>
>>> I will look further to see if I find more clues for this warning.
>>>
>>> In the meantime I was able to enter the container with the following
>>> ugly program:
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>> #include <fcntl.h>
>>> #include <sys/param.h>
>>> #include <sys/stat.h>
>>> #include <sys/syscall.h>
>>> #include <sys/types.h>
>>>
>>> #define __NR_setns 300
>>>
>>> int setns(int nstype, int fd)
>>> {
>>> 	return syscall(__NR_setns, nstype, fd);
>>> }
>>>
>>> int main(int argc, char *argv[])
>>> {
>>> 	char path[MAXPATHLEN];
>>> 	char *ns[] = { "pid", "mnt", "net", "pid", "uts" };
>>> 	const int size = sizeof(ns) / sizeof(char *);
>>> 	int fd[size];
>>> 	int i;
>>>
>>> 	if (argc != 3) {
>>> 		fprintf(stderr, "mynsenter <pid> <command>\n");
>>> 		exit(1);
>>> 	}
>>>
>>> 	for (i = 0; i < size; i++) {
>>> 		sprintf(path, "/proc/%s/ns/%s", argv[1], ns[i]);
>>>
>>> 		fd[i] = open(path, O_RDONLY);
>>> 		if (fd[i] < 0) {
>>> 			perror("open");
>>> 			return -1;
>>> 		}
>>> 	}
>>>
>>> 	for (i = 0; i < size; i++) {
>>> 		if (setns(0, fd[i])) {
>>> 			perror("setns");
>>> 			return -1;
>>> 		}
>>> 	}
>>>
>>> 	execve(argv[2], &argv[2], NULL);
>>> 	perror("execve");
>>>
>>> 	return 0;
>>> }
>>>
>>> At first glance, no problem :)
>>>
>>
>> No fork() so your process is completely in the pid namespace?
>>
> What I do is to attach "/bin/sh" to the container with this program.
> The container is a VPS running busybox with the full isolation.
>
> echo $$ gives the real pid.
> All the forked processes appear in the pid namespace; they are visible
> through /proc with the virtual pid.
> I am not able to change to the /proc/self directory (I assume this is normal).

I guess what I meant is that I was expecting:

	child = fork();
	if (child == 0) {
		execve(...);
	}
	waitpid(child, NULL, 0);

This puts /bin/sh in the container as well.

I'm not certain about the /proc/self thing; I have never encountered that.
But I guess if your pid is outside of the pid namespace of that instance
of proc, /proc/self will be a broken symlink.

Eric
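
For reference, the fork()/execve()/waitpid() sequence sketched above can be
written out as a small helper. This is only a sketch under stated
assumptions: spawn_and_wait is a made-up name that does not appear in the
patchset, and the helper assumes the caller has already done the setns()
calls, so that the forked child is the process that lands in the pid
namespace.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not from the patchset): run `path` in a forked
 * child and wait for it. Assumes the caller has already joined the
 * target namespaces before calling.
 * Returns the child's exit status, or -1 on error or abnormal exit.
 */
int spawn_and_wait(const char *path, char *const argv[])
{
	int status;
	pid_t child = fork();

	if (child < 0) {
		perror("fork");
		return -1;
	}

	if (child == 0) {
		/* Child: replace this process with the target program. */
		execve(path, argv, environ);
		perror("execve");
		_exit(127);
	}

	/* Parent: block until the child terminates. */
	if (waitpid(child, &status, 0) < 0) {
		perror("waitpid");
		return -1;
	}

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the test program above, the final execve(argv[2], &argv[2], NULL) would
become spawn_and_wait(argv[2], &argv[2]), with the parent staying behind
only to collect the child's exit status.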