From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Marian Marinov <mm@1h.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>,
Andy Lutomirski <luto@amacapital.net>,
"Serge E. Hallyn" <serge@hallyn.com>,
"Michael H. Warfield" <mhw@wittsend.com>,
Arnd Bergmann <arnd@arndb.de>,
LXC development mailing-list
<lxc-devel@lists.linuxcontainers.org>,
Richard Weinberger <richard@nod.at>,
LKML <linux-kernel@vger.kernel.org>,
Serge Hallyn <serge.hallyn@canonical.com>,
Jens Axboe <axboe@kernel.dk>
Subject: Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
Date: Fri, 23 May 2014 06:16:00 -0700
Message-ID: <1400850960.2332.4.camel@dabdike>
In-Reply-To: <537F04BF.3000301@1h.com>
On Fri, 2014-05-23 at 11:20 +0300, Marian Marinov wrote:
> On 05/20/2014 05:19 PM, Serge Hallyn wrote:
> > Quoting Andy Lutomirski (luto@amacapital.net):
> >> On May 15, 2014 1:26 PM, "Serge E. Hallyn" <serge@hallyn.com> wrote:
> >>>
> >>> Quoting Richard Weinberger (richard@nod.at):
> >>>> Am 15.05.2014 21:50, schrieb Serge Hallyn:
> >>>>> Quoting Richard Weinberger (richard.weinberger@gmail.com):
> >>>>>> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >>>>>>> Then don't use a container to build such a thing, or fix the build scripts to not do that :)
> >>>>>>
> >>>>>> I second this. To me it looks like some folks try to (ab)use Linux containers for purposes where KVM
> >>>>>> would much better fit in. Please don't put more complexity into containers. They are already horrible
> >>>>>> complex and error prone.
> >>>>>
> >>>>> I, naturally, disagree :) The only use case which is inherently not valid for containers is running a
> >>>>> kernel. Practically speaking there are other things which likely will never be possible, but if someone
> >>>>> offers a way to do something in containers, "you can't do that in containers" is not an apropos response.
> >>>>>
> >>>>> "That abstraction is wrong" is certainly valid, as when vpids were originally proposed and rejected,
> >>>>> resulting in the development of pid namespaces. "We have to work out (x) first" can be valid (and I can
> >>>>> think of examples here), assuming it's not just trying to hide behind a catch-22/chicken-egg problem.
> >>>>>
> >>>>> Finally, saying "containers are complex and error prone" is conflating several large suites of userspace
> >>>>> code and many kernel features which support them. Being more precise would, if the argument is valid, lend
> >>>>> it a lot more weight.
> >>>>
> >>>> We (my company) use Linux containers since 2011 in production. First LXC, now libvirt-lxc. To understand the
> >>>> internals better I also wrote my own userspace to create/start containers. There are so many things which can
> >>>> hurt you badly. With user namespaces we expose a really big attack surface to regular users. I.e. Suddenly a
> >>>> user is allowed to mount filesystems.
> >>>
> >>> That is currently not the case. They can mount some virtual filesystems and do bind mounts, but cannot mount
> >>> most real filesystems. This keeps us protected (for now) from potentially unsafe superblock readers in the
> >>> kernel.
> >>>
> >>>> Ask Andy, he found already lots of nasty things...
> >>
> >> I don't think I have anything brilliant to add to this discussion right now, except possibly:
> >>
> >> ISTM that Linux distributions are, in general, vulnerable to all kinds of shenanigans that would happen if an
> >> untrusted user can cause a block device to appear. That user doesn't need permission to mount it
> >
> > Interesting point. This would further suggest that we absolutely must ensure that a loop device which shows up in
> > the container does not also show up in the host.
>
> Can I suggest the usage of the devices cgroup to achieve that?
Not really ... cgroups impose resource limits; it's namespaces that
impose visibility separations. In theory this can be done with the
device namespace that's been proposed; however, a simpler way is
to rm the device node in the host and mknod it in the guest. I don't
really see host visibility as a huge problem: in a shared OS
virtualisation it's not really possible to separate the guest from the
host securely (only vice versa).
But I really don't think we want to do it this way. Giving a container
the ability to do a mount is too dangerous. What we want to do is
intercept the mount in the host and perform it on behalf of the guest as
host root in the guest's mount namespace. If you do it that way, it
doesn't really matter what device actually shows up in the guest, as
long as the host knows what to do when the mount request comes along.
James
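
The host-side mount described above could be sketched roughly as follows.
This is an illustration only, not code from the patch series or from any
container runtime: the policy table ALLOWED_MOUNTS and the helper names
(mntns_path, is_allowed, mount_for_guest) are hypothetical, and the sketch
assumes a privileged host process that already knows the PID of a process
inside the guest. It uses the real setns(2) and mount(2) calls through
ctypes. Note that after setns() paths are resolved in the guest's mount
namespace, so the sketch further assumes the source is reachable there.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000  # mount-namespace flag from <sched.h>

# Hypothetical host policy: (source, fstype) pairs the host agrees to mount.
ALLOWED_MOUNTS = {("/dev/loop0", "ext4")}


def mntns_path(guest_pid):
    """Path to the mount-namespace handle of a process inside the guest."""
    return f"/proc/{guest_pid}/ns/mnt"


def is_allowed(source, fstype, allowed=ALLOWED_MOUNTS):
    """Host-side policy check; the guest never calls mount(2) itself."""
    return (source, fstype) in allowed


def _check(ret):
    """Raise OSError with errno if a libc call reported failure."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def mount_for_guest(guest_pid, source, target, fstype):
    """Perform a mount as host root inside the guest's mount namespace."""
    if not is_allowed(source, fstype):
        raise PermissionError(f"mount of {source} as {fstype} refused")
    pid = os.fork()  # child switches namespace; the parent stays in the host
    if pid == 0:
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                                   ctypes.c_char_p, ctypes.c_ulong,
                                   ctypes.c_void_p]
            fd = os.open(mntns_path(guest_pid), os.O_RDONLY)
            _check(libc.setns(fd, CLONE_NEWNS))
            os.close(fd)
            _check(libc.mount(source.encode(), target.encode(),
                              fstype.encode(), 0, None))
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)
    _, wstatus = os.waitpid(pid, 0)
    return os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0
```

How the mount request gets from the guest to the host is left out here;
the sketch only covers the policy check and the mount itself, and the
fork keeps the namespace switch from leaking into the host-side caller.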