correct usage of unshare+nsenter for persistent namespaces?

From: Assaf Gordon <assafgordon@gmail.com>
To: util-linux@vger.kernel.org
Subject: correct usage of unshare+nsenter for persistent namespaces?
Date: Fri, 10 Mar 2017 17:51:57 +0000
Message-ID: <20170310175156.GB21783@gmail.com>

Hello Karel and all,

I'd like to ask your advice regarding proper usage of unshare+nsenter
to create persistent containers. I understand unshare(1) is rather
low-level, but I would still like to understand how to use it.

Apologies in advance for the long email, but I hope it will
result in better documentation (or at least a better understanding
for me).

There are many bits and pieces of information
around (man pages, blogs, Stack Overflow, etc.),
but I haven't been able to find an authoritative example
of using these tools to create a contained, re-entrant, persistent
environment. (If I missed it, please do point me to it.)

Step 1: preparations
--------------------

All my testing was done on stock Debian 8.7,
with kernel 3.16.39-1+deb8u1,
and util-linux 2.29.2 compiled from source.
All commands run as 'root'.

Extrapolating from unshare's man page about creating
a persistent environment:

    basedir=/var/namespaces/ns1
    mkdir -p $basedir
    mount --bind $basedir $basedir
    mount --make-private $basedir
    for i in uts mnt pid net ipc user ;
    do
     touch $basedir/$i
    done

Are these correct?
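
For repeated testing, the preparation steps above can be wrapped in a
small helper. This is only a sketch: the names run, prepare_ns_dir and
DRY_RUN are made up for illustration and are not part of util-linux.
With DRY_RUN set, the commands are printed instead of executed, so the
sequence can be inspected without root.

```shell
#!/bin/sh
# Sketch only: run, prepare_ns_dir and DRY_RUN are illustrative names.

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the private bind mount and one empty file per namespace type.
prepare_ns_dir() {
    basedir=$1
    run mkdir -p "$basedir" || return 1
    # Skip the bind mount if the directory is already a mount point
    # (the check itself is skipped in dry-run mode).
    if [ -n "${DRY_RUN:-}" ] || ! mountpoint -q "$basedir"; then
        run mount --bind "$basedir" "$basedir" || return 1
        run mount --make-private "$basedir" || return 1
    fi
    for i in uts mnt pid net ipc user; do
        run touch "$basedir/$i" || return 1
    done
}

# Usage (as root):
#   prepare_ns_dir /var/namespaces/ns1
```

The mountpoint(1) check is there so that running the helper twice does
not stack a second bind mount on the same directory.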

Step 2: creating shared namespace
---------------------------------

(for now, I'm ignoring user-namespace, as it brings
its own complications.)

Starting a new environment using the following:

    unshare --uts=$basedir/uts \
            --mount=$basedir/mnt \
            --ipc=$basedir/ipc \
            --pid=$basedir/pid \
            --net=$basedir/net \
            --mount-proc \
            --fork \
            sh -c 'hostname foobar ; exec /bin/bash -il'

And indeed I get a prompt inside the container:

    root@foobar# ps ax
    PID TTY      STAT   TIME COMMAND
     1 pts/2    S      0:00 /bin/bash -il
     8 pts/2    R+     0:00 ps ax

    root@foobar# ifconfig -a
    lo        Link encap:Local Loopback
              LOOPBACK  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

On the outside host, I see the mounts and the namespaces:

    # findmnt -o TARGET
    [...]
    └─/var/namespaces/ns1
     ├─/var/namespaces/ns1/ipc
     ├─/var/namespaces/ns1/uts
     ├─/var/namespaces/ns1/net
     ├─/var/namespaces/ns1/pid
     └─/var/namespaces/ns1/mnt

    # lsns
    NS        TYPE  NPROCS   PID USER     COMMAND
    [...]
    4026532329 mnt        2 19221 root     unshare --uts=..
    4026532330 uts        2 19221 root     unshare --uts=..
    4026532331 ipc        2 19221 root     unshare --uts=..
    4026532332 pid        1 19223 root     /bin/bash -il
    4026532334 net        2 19221 root     unshare --uts=..

Step 3: Re-entering
-------------------

Trying to enter based on PID works:

    # nsenter -t 19223 -m -u -i -n -p \
          sh -c 'hostname ; echo ; ps ax ; echo ; ifconfig -a'
    foobar

      PID TTY      STAT   TIME COMMAND
        1 pts/2    S+     0:00 /bin/bash -il
       15 pts/1    S+     0:00 sh -c hostname ; ps ax
       17 pts/1    R+     0:00 ps ax

    lo        Link encap:Local Loopback
          LOOPBACK  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
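
The PID given to 'nsenter -t' above was read by hand from the lsns
listing. A minimal sketch of extracting it with awk follows; it assumes
the column layout shown in the lsns output earlier (NS TYPE NPROCS PID
USER COMMAND), and ns_leader_pid is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: ns_leader_pid is an illustrative helper name.
# Reads lsns-style output on stdin and prints the PID column of the
# row whose NS column matches the given namespace number.
ns_leader_pid() {
    awk -v ns="$1" '$1 == ns { print $4 }'
}

# Usage (on the host):
#   pid=$(lsns | ns_leader_pid 4026532332)
#   nsenter -t "$pid" -m -u -i -n -p ps ax
```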

However trying to enter by the persistent mounts does not
re-enter the pid/net namespace:

    # nsenter --uts=$basedir/uts \
              --mount=$basedir/mnt \
              --ipc=$basedir/ipc \
              --pid=$basedir/pid \
              --net=$basedir/net \
              sh -c 'hostname ; echo ; ps ax ; echo ; ifconfig -a'
    foobar

    Error, do this: mount -t proc proc /proc

    Warning: cannot open /proc/net/dev (No such file or directory).
    Limited output.

Listing /proc inside the container shows it only lists PID 1
(the running '/bin/bash' from the original 'unshare' invocation).

Based on a naive reading of the unshare(1) man page (with the example
of a persistent UTS namespace at the bottom), I assumed that the two
examples above, entering by PID and entering by the persistent mount
points, would be equivalent.

Is this a kernel limitation?
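
One way to narrow this down is to check which namespace each
bind-mounted file actually refers to, by comparing it with the
/proc/PID/ns/* entry of a process known to be inside the container.
The sketch below uses stat(1) from GNU coreutils; ns_id and same_ns are
made-up helper names. If the IDs differ, the file was bound to a
different namespace than the one the process runs in.

```shell
#!/bin/sh
# Sketch only: ns_id and same_ns are illustrative helper names.

# Print "device:inode" of the object a path resolves to. For a namespace
# file, the inode is the number shown in the NS column of lsns.
ns_id() {
    stat -L -c '%d:%i' "$1"
}

# Succeed if both paths refer to the same object (same device and inode).
same_ns() {
    a=$(ns_id "$1") || return 2
    b=$(ns_id "$2") || return 2
    [ "$a" = "$b" ]
}

# Usage (as root, on the host; 19223 is the bash PID from the lsns output):
#   same_ns "$basedir/pid" /proc/19223/ns/pid && echo "container PID ns"
#   same_ns "$basedir/pid" /proc/self/ns/pid  && echo "host PID ns"
```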

Step 4: PID namespace is never persistent?
------------------------------------------

IIUC, this is a kernel limitation:
If the program which is PID1 inside the container
terminates, there is no way to re-enter the PID namespace
(http://man7.org/linux/man-pages/man7/pid_namespaces.7.html).

Is that correct?

If so, perhaps it would be helpful to add a caveat to the
unshare/nsenter man pages, saying that the PID namespace will
not persist if that process terminates?

And if this is the case, would the following
work to create a re-entrant persistent namespace:

    unshare --uts=$basedir/uts \
            --mount=$basedir/mnt \
            --ipc=$basedir/ipc \
            --pid=$basedir/pid \
            --net=$basedir/net \
            --mount-proc \
            --fork \
            sleep inf

Obviously sleep(1) is not a good PID1, but is it a conceptually correct
way to ensure the PID namespace is persistent?

There are already some examples of minimal 'init' for containers:
  https://github.com/Yelp/dumb-init
  https://github.com/krallin/tini
  and most minimal: https://gist.github.com/rofl0r/6168719 

I wonder if you would be willing to consider a patch adding
something like 'unshare --do-nothing-init', which would
simply create a process that does nothing except handle signals
and never terminates, to facilitate truly persistent namespaces with
unshare(1)? (If so, I'm happy to try and write it.)
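
For illustration, the behaviour described above can be approximated in
plain shell. This is a rough sketch, not a proposed implementation:
do_nothing_init is a made-up name, and unlike the init programs linked
above it makes no attempt to reap orphaned processes. The explicit trap
matters because signals sent from inside a PID namespace are only
delivered to its PID 1 if that process has installed a handler for them
(see pid_namespaces(7)).

```shell
#!/bin/sh
# Sketch only: do_nothing_init is an illustrative name, not an unshare option.
# Sleeps forever, but exits cleanly on SIGTERM or SIGINT.
do_nothing_init() {
    child=
    # On a trapped signal, stop the background sleep and exit with status 0.
    trap 'kill "$child" 2>/dev/null; exit 0' TERM INT
    while :; do
        # Sleep in the background so that 'wait' can be interrupted by a signal.
        sleep 3600 &
        child=$!
        wait "$child"
    done
}

# Usage: save as a script, add a final line calling do_nothing_init,
# and run it in place of 'sleep inf' as the command given to unshare.
```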

Thank you for reading so far.
regards,
 - assaf

P.S.
I have more questions about proper usage of user-namespace and 
switch_root/pivot_root, but I'll save them for later :)

P.P.S.

The download URL in the 2.29.2 announcement was http://ftp.kernel.org/
and it seems broken:
  $ host ftp.kernel.org
  Host ftp.kernel.org not found: 3(NXDOMAIN)
The working URL seems like 'www.kernel.org' (www. instead of ftp.):
  https://www.kernel.org/pub/linux/utils/util-linux/

Thread overview: 2+ messages
2017-03-10 17:51 Assaf Gordon [this message]
2017-03-27 11:41 ` correct usage of unshare+nsenter for persistent namespaces? Karel Zak