Re: Building a BSD-jail clone out of namespaces

public inbox for linux-kernel@vger.kernel.org

From: ebiederm@xmission.com (Eric W. Biederman)
To: Chris Webb <chris@arachsys.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Building a BSD-jail clone out of namespaces
Date: Thu, 06 Jun 2013 09:35:36 -0700
Message-ID: <87ehcf8aef.fsf@xmission.com>
In-Reply-To: <20130606161010.GI12062@arachsys.com> (Chris Webb's message of "Thu, 6 Jun 2013 17:10:10 +0100")

Chris Webb <chris@arachsys.com> writes:

> Prompted by the new userns support merged in the 3.8/3.9 kernels, I've been
> playing with namespaces and trying to understand how I could use them to
> build containers to replace some of my uses of qemu-kvm virtual machines.
>
> I've successfully created a fakeroot-type container running as an
> unprivileged user by unsharing everything including CLONE_NEWUSER, and can
> map a block of host UIDs for that environment by writing to
> /proc/PID/[ug]id_map from a helper process running as root.
>
> However, what I'm hoping for in practice is to be able to create containers
> whose access to their filesystem subtree is untranslated, i.e. uid/gid N in
> the container maps to uid/gid N in a subdirectory of the filesystem, but
> which is still isolated from the rest of the host filesystem and can't do
> externally privileged things. This is pretty much what a BSD jail provides,
> for example.
>
> Is this possible to achieve securely using the mechanisms now available?
> (I'm assuming that parent directory permissions prevent unprivileged host
> users from getting at these container filesystems, exactly as is necessary
> to make BSD jails safe.)
>
>
> As a first step, I naively tried running as root and unsharing everything
> with 
>
>   unshare(CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWNET | CLONE_NEWPID
>                        | CLONE_NEWUTS | CLONE_NEWUSER);
>
> before execing a shell[1]. From another root process in the host namespace,
> I then wrote a pass-through mapping 0 0 4294967295 to /proc/PID/[ug]id_map.

That will work, but you really don't want to run with uid == 0 mapped to
uid == 0.  Too many files in /proc, /sys and similar places grant access
to uid == 0.

> The result initially looks plausible, with the PID namespace preventing
> signals being sent from one container to another, despite those processes
> sharing the same user ID in the top-level user namespace.
>
> However, unfortunately I still have too many privileges with respect to the
> host. Whilst (for example) I can't mknod, I can mount a sysfs or procfs and
> apparently write to them with host root privileges to reconfigure the host
> kernel. I suspect there will be other things I haven't secured by this
> recipe too.

Yes.  I recommend having a dedicated range of uids for your container to
prevent this kind of silliness.  Or at the very least a separate mapping
of uid == 0.
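
As a rough illustration of the dedicated-range idea, here is a minimal
sketch of a root helper that maps container ids 0..65535 onto a block of
otherwise unused host ids. The base of 100000, the count of 65536, and the
function names format_id_map and write_id_map are assumptions made for the
example, not values or names from this thread.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one id_map line: "<first id inside> <first id outside> <count>\n".
 * Returns the line length, or -1 if it does not fit in buf. */
int format_id_map(char *buf, size_t len, unsigned inside,
                  unsigned outside, unsigned count)
{
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a single-range mapping to /proc/<pid>/<file>, where file is
 * "uid_map" or "gid_map". Intended to run in the parent user namespace
 * as root, like the helper process described in the quoted message.
 * Returns 0 on success, -1 on failure. */
int write_id_map(pid_t pid, const char *file, unsigned inside,
                 unsigned outside, unsigned count)
{
    char path[64], line[64];
    int n = format_id_map(line, sizeof line, inside, outside, count);
    if (n < 0)
        return -1;

    int p = snprintf(path, sizeof path, "/proc/%ld/%s", (long)pid, file);
    if (p < 0 || (size_t)p >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Hand the whole mapping to the kernel in a single write(). */
    ssize_t w = write(fd, line, (size_t)n);
    close(fd);
    return w == (ssize_t)n ? 0 : -1;
}
```

With these assumed values, write_id_map(pid, "uid_map", 0, 100000, 65536)
followed by the same call for "gid_map" makes uid 0 in the container
correspond to host uid 100000, so files in /proc and /sys that trust host
uid 0 no longer trust the container's root.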

> I also tried tightening things up by dropping capabilities from my root user
> and preventing capability grant on exec by setting and locking SECBIT_NOROOT
> on before starting the container. However, I'm not sure this really makes
> any difference---does CLONE_NEWUSER drop all capabilities with respect to
> the parent namespace?

Yes.  CLONE_NEWUSER drops all capabilities with respect to the parent
namespace.

> [1] In this description, I'm ignoring the part where I lock into a new root
> filesystem, but presumably the way to do this is by pivot_root into a bind
> mount?

Yes, pivot_root and a bind mount work.
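
One way this could look, as a sketch only: it assumes the caller has
already unshared CLONE_NEWNS and holds CAP_SYS_ADMIN in that namespace.
The function name enter_root and the scratch directory ".oldroot" are
hypothetical names chosen for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Make new_root the root of the calling process's mount namespace.
 * Returns 0 on success, -1 on failure. */
int enter_root(const char *new_root)
{
    struct stat st;
    char old[PATH_MAX];

    /* Refuse early if new_root is not an existing directory. */
    if (stat(new_root, &st) < 0 || !S_ISDIR(st.st_mode))
        return -1;
    int n = snprintf(old, sizeof old, "%s/.oldroot", new_root);
    if (n < 0 || (size_t)n >= sizeof old)
        return -1;

    /* Stop mount events from propagating back to the host namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;
    /* pivot_root needs new_root to be a mount point: bind it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -1;
    /* Hypothetical scratch directory that receives the old root. */
    if (mkdir(old, 0700) < 0 && errno != EEXIST)
        return -1;
    /* glibc has no pivot_root wrapper, so call it through syscall(2). */
    if (syscall(SYS_pivot_root, new_root, old) < 0)
        return -1;
    if (chdir("/") < 0)
        return -1;
    /* Detach the old root so the host tree is no longer reachable. */
    if (umount2("/.oldroot", MNT_DETACH) < 0)
        return -1;
    return rmdir("/.oldroot");
}
```

The early stat() check means a bad path is rejected before any mount state
is touched. The function does not unshare anything itself, so calling it
without a private mount namespace would alter the host's mounts.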

Eric
