From: Seth Forshee <seth.forshee@canonical.com>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Serge Hallyn <serge.hallyn@ubuntu.com>,
Jens Axboe <axboe@kernel.dk>,
Serge Hallyn <serge.hallyn@canonical.com>,
Arnd Bergmann <arnd@arndb.de>,
linux-kernel@vger.kernel.org,
LXC development mailing-list
<lxc-devel@lists.linuxcontainers.org>
Subject: Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
Date: Fri, 16 May 2014 15:18:41 -0500
Message-ID: <20140516201841.GC23902@ubuntu-hedt>
In-Reply-To: <1400268515.2221.91.camel@dabdike.int.hansenpartnership.com>

On Fri, May 16, 2014 at 12:28:35PM -0700, James Bottomley wrote:
> On Fri, 2014-05-16 at 11:57 -0700, Greg Kroah-Hartman wrote:
> > On Fri, May 16, 2014 at 09:06:07AM -0500, Seth Forshee wrote:
> > > On Thu, May 15, 2014 at 09:35:32PM -0700, Greg Kroah-Hartman wrote:
> > > > On Fri, May 16, 2014 at 01:49:59AM +0000, Serge Hallyn wrote:
> > > > > > I think having to pick and choose what device nodes you want in a
> > > > > > container is a good thing. Besides, you would have to do the same thing
> > > > > > in the kernel anyway, what's wrong with userspace making the decision
> > > > > > here, especially as it knows exactly what it wants to do much more so
> > > > > > than the kernel ever can.
> > > > >
> > > > > For 'real' devices that sounds sensible. The thing about loop devices
> > > > > is that we simply want to allow a container to say "give me a loop
> > > > > device to use" and have it receive a unique loop device (or 3), without
> > > > > having to pre-assign them. I think that would be cleaner to do using
> > > > > a pseudofs and loop-control device, rather than having to have a
> > > > > daemon in userspace on the host farming those out in response to
> > > > > some, I don't know, dbus request?
> > > >
> > > > I agree that loop devices would be nice to have in a container, and that
> > > > the existing loop interface doesn't really lend itself to that. So
> > > > create a new type of thing that acts like a loop device in a container.
> > > > But don't try to mess with the whole driver core just for a single type
> > > > of device.
> > >
> > > No matter what I don't think we get out of this without driver core
> > > changes, whether this was done in loop or by creating something new.
> > > Not unless the whole thing is punted to userspace, anyway.
> > >
> > > The first problem is that many block device ioctls check for
> > > CAP_SYS_ADMIN. Most of these might not ever be used on loop devices, I'm
> > > not really sure. But loop does at minimum support partitions, and to get
> > > that functionality in an unprivileged container at least the block layer
> > > needs to know the namespace which has privileges for that device.
> >
> > That's fine, you should have those permissions in a container if you
> > want to do something like that on a loop device, right?
>
> Really, no. CAP_SYS_ADMIN is effectively a pseudo root security hole.
> Any user possessing CAP_SYS_ADMIN can do about as much damage as real
> root can, whether or not you use user namespaces, so it would compromise
> a lot of the security we're just bringing to containers.
>
> > > The second is that all block devices automatically appear in devtmpfs.
> > > The scenario I'm concerned about is that the host could unknowingly use
> > > a loop device exposed to a container, then the container could see data
> > > from the host.
> >
> > I don't think that's a real issue, the host should know not to do that.
> >
> > > So we either need a flag to tell the driver core not to create a node
> > > in devtmpfs, or we need a privileged manager in userspace to remove
> > > them (which kind of defeats the purpose). And it gets more complicated
> > > when partition block devs are mixed in, because they can be created
> > > without involvement from the driver - they would need to inherit the
> > > "no devtmpfs node" property from their parent, and if the driver uses
> > > a pseudo fs to create device nodes for userspace then it needs to be
> > > informed about the partitions too so it can create those nodes.
> >
> > I don't think that will be needed. Root in a host can do whatever it
> > wants in the containers, so mixing up block devices is the least of the
> > issues involved :)
> >
> > > So maybe we could get by without the privileged ioctls, as long as it
> > > was understood that unprivileged containers can't do partitioning. But I
> > > do think the devtmpfs problem would need to be addressed.
> >
> > I don't think unprivileged containers should be able to do partitioning.
> > An unprivileged user can't do that, so why should a container be any
> > different?
>
> To make sure we're on the same page with terminology, there's an
> unprivileged container and a secure container. In the former, there's
> no root user (all the processes run as non-root), so the container isn't
> expected to perform any actions root would ... that's easy. In a secure
> container, root is mapped to a nobody user in the host, so is
> effectively unprivileged, but root in the container expects to look like
> a real root within the VPS (and thus may expect to partition things,
> depending on how they've been given access to the block device). The
> big problem is giving back capabilities to the container root such that
> a) it loses them if it escapes the container and b) it doesn't get
> sufficient capabilities to damage the system.

Based on your description what I was talking about is a secure
container. Thanks for clearing that up, and sorry for misusing the
terminology.
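
To make the "secure container" case concrete: root inside the container
is an ordinary, unprivileged uid as far as the host is concerned. Below is
a minimal Python sketch of that translation, assuming a map in the same
three-column form as /proc/<pid>/uid_map (first uid inside, first uid
outside, length). The 100000/65536 range and the name `to_host_uid` are
illustrative assumptions, not something taken from this patch set.

```python
from typing import List, Optional, Tuple

# Each entry mirrors one line of /proc/<pid>/uid_map:
# (first uid inside the namespace, first uid outside it, length of the range).
UidMap = List[Tuple[int, int, int]]


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Translate a uid inside the namespace to the uid the host sees.

    Returns None when the uid has no mapping in the namespace.
    """
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mapping: container uids 0..65535 -> host uids 100000..165535.
example_map: UidMap = [(0, 100000, 65536)]

print(to_host_uid(example_map, 0))      # container root -> host uid 100000
print(to_host_uid(example_map, 1000))   # 101000
print(to_host_uid(example_map, 70000))  # None: outside the mapped range
```

With a map like this, a process that escapes the container as "root" is
just uid 100000 on the host, which is the property the secure container
model relies on.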

What I set out for was feature parity between loop devices in a secure
container and loop devices on the host. Since some operations currently
check for system-wide CAP_SYS_ADMIN, the only way I see to accomplish
this is to push knowledge of the user namespace farther down into the
driver stack so the check can instead be for CAP_SYS_ADMIN in the user
namespace associated with the device.
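
The difference between the two checks can be sketched with a toy
userspace model. This is Python, not kernel code, and it is only loosely
modelled on the kernel's capable() and ns_capable() helpers; the classes,
the `owner_ns` field, and the uid values are hypothetical names chosen for
illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

CAP_SYS_ADMIN = "CAP_SYS_ADMIN"


@dataclass(eq=False)
class UserNS:
    name: str
    parent: Optional["UserNS"] = None
    owner_uid: int = 0  # uid, as seen by the parent, that created this ns


@dataclass
class Task:
    user_ns: UserNS
    euid: int  # uid as seen by the parent of user_ns (simplification)
    caps: Set[str] = field(default_factory=set)


@dataclass
class LoopDevice:
    name: str
    owner_ns: UserNS  # hypothetical: namespace the device is assigned to


INIT_NS = UserNS("init")


def ns_capable(task: Task, target_ns: UserNS, cap: str) -> bool:
    """Does the task hold `cap` with respect to `target_ns`?"""
    ns: Optional[UserNS] = target_ns
    while ns is not None:
        if ns is task.user_ns:
            return cap in task.caps
        # The creator of a namespace, acting from its parent, is privileged in it.
        if ns.parent is task.user_ns and ns.owner_uid == task.euid:
            return True
        ns = ns.parent
    return False


def capable(task: Task, cap: str) -> bool:
    """System-wide check: always evaluated against the initial namespace."""
    return ns_capable(task, INIT_NS, cap)


def ioctl_allowed_today(task: Task, dev: LoopDevice) -> bool:
    return capable(task, CAP_SYS_ADMIN)


def ioctl_allowed_proposed(task: Task, dev: LoopDevice) -> bool:
    return ns_capable(task, dev.owner_ns, CAP_SYS_ADMIN)


container_ns = UserNS("container", parent=INIT_NS, owner_uid=100000)
host_root = Task(INIT_NS, euid=0, caps={CAP_SYS_ADMIN})
container_root = Task(container_ns, euid=100000, caps={CAP_SYS_ADMIN})
host_loop = LoopDevice("loop0", INIT_NS)
container_loop = LoopDevice("loop1", container_ns)

print(ioctl_allowed_today(container_root, container_loop))     # False
print(ioctl_allowed_proposed(container_root, container_loop))  # True
print(ioctl_allowed_proposed(container_root, host_loop))       # False
print(ioctl_allowed_proposed(host_root, container_loop))       # True
```

In this model container root gains the privileged operations only on the
device assigned to its own namespace, and gains nothing on host devices,
while host root keeps access to both.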

That said, I suspect our current use cases can get by without these
capabilities. Really, though, I think this just defers the discussion
rather than settling it, and what we'll end up with is little more than
a fancy way for userspace to ask the kernel to run mknod on its behalf.

Thanks,
Seth