From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753470AbaETOUJ (ORCPT ); Tue, 20 May 2014 10:20:09 -0400
Received: from youngberry.canonical.com ([91.189.89.112]:41582
	"EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1750745AbaETOUH (ORCPT );
	Tue, 20 May 2014 10:20:07 -0400
Date: Tue, 20 May 2014 14:19:31 +0000
From: Serge Hallyn
To: Andy Lutomirski
Cc: "Serge E. Hallyn" , "Michael H. Warfield" , Arnd Bergmann ,
	LXC development mailing-list , Richard Weinberger ,
	James Bottomley , LKML , Serge Hallyn , Jens Axboe
Subject: Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
Message-ID: <20140520141931.GH26600@ubuntumail>
References: <1400120251.7699.11.camel@canyon.ip6.wittsend.com>
	<20140515031527.GA146352@ubuntu-hedt>
	<20140515040032.GA6702@kroah.com>
	<1400161337.7699.33.camel@canyon.ip6.wittsend.com>
	<20140515140856.GA17453@kroah.com>
	<20140515195010.GA22317@ubuntumail>
	<53751FFA.5040103@nod.at>
	<20140515202628.GB25896@mail.hallyn.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Quoting Andy Lutomirski (luto@amacapital.net):
> On May 15, 2014 1:26 PM, "Serge E. Hallyn" wrote:
> >
> > Quoting Richard Weinberger (richard@nod.at):
> > > Am 15.05.2014 21:50, schrieb Serge Hallyn:
> > > > Quoting Richard Weinberger (richard.weinberger@gmail.com):
> > > >> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
> > > >> wrote:
> > > >>> Then don't use a container to build such a thing, or fix the build
> > > >>> scripts to not do that :)
> > > >>
> > > >> I second this.
> > > >> To me it looks like some folks try to (ab)use Linux containers
> > > >> for purposes where KVM would much better fit in.
> > > >> Please don't put more complexity into containers. They are already
> > > >> horrible complex
> > > >> and error prone.
> > > >
> > > > I, naturally, disagree :) The only use case which is inherently not
> > > > valid for containers is running a kernel. Practically speaking there
> > > > are other things which likely will never be possible, but if someone
> > > > offers a way to do something in containers, "you can't do that in
> > > > containers" is not an apropos response.
> > > >
> > > > "That abstraction is wrong" is certainly valid, as when vpids were
> > > > originally proposed and rejected, resulting in the development of
> > > > pid namespaces. "We have to work out (x) first" can be valid (and
> > > > I can think of examples here), assuming it's not just trying to hide
> > > > behind a catch-22/chicken-egg problem.
> > > >
> > > > Finally, saying "containers are complex and error prone" is conflating
> > > > several large suites of userspace code and many kernel features which
> > > > support them. Being more precise would, if the argument is valid,
> > > > lend it a lot more weight.
> > >
> > > We (my company) use Linux containers since 2011 in production. First LXC, now libvirt-lxc.
> > > To understand the internals better I also wrote my own userspace to create/start
> > > containers. There are so many things which can hurt you badly.
> > > With user namespaces we expose a really big attack surface to regular users.
> > > I.e. Suddenly a user is allowed to mount filesystems.
> >
> > That is currently not the case. They can mount some virtual filesystems
> > and do bind mounts, but cannot mount most real filesystems. This keeps
> > us protected (for now) from potentially unsafe superblock readers in the
> > kernel.
> >
> > > Ask Andy, he found already lots of nasty things...
>
> I don't think I have anything brilliant to add to this discussion
> right now, except possibly:
>
> ISTM that Linux distributions are, in general, vulnerable to all kinds
> of shenanigans that would happen if an untrusted user can cause a
> block device to appear. That user doesn't need permission to mount it

Interesting point. This would further suggest that we absolutely must
ensure that a loop device which shows up in the container does not also
show up in the host.

> or even necessarily to change its contents on the fly.
>
> E.g. what happens if you boot a machine that contains a malicious disk
> image that has the same partition UUID as /? Nothing good, I imagine.
>
> So if we're going to go down this road, we really need some way to
> tell the host that certain devices are not trusted.
>
> --Andy
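
The UUID collision Andy describes can be made concrete with a small
sketch. It is not from the thread: the function name and the sample data
are hypothetical, and it assumes a mapping from device path to UUID has
already been collected by some other means (for example, from a tool
such as blkid). It only flags UUIDs claimed by more than one device; it
says nothing about which device the host should trust.

```python
from collections import defaultdict


def find_uuid_collisions(devices):
    """Return UUIDs that are claimed by more than one device.

    `devices` is a hypothetical mapping of device path -> UUID string.
    The result maps each colliding UUID to a sorted list of device paths.
    """
    by_uuid = defaultdict(list)
    for path, uuid in devices.items():
        by_uuid[uuid].append(path)
    # Keep only UUIDs that appear on two or more devices.
    return {uuid: sorted(paths) for uuid, paths in by_uuid.items() if len(paths) > 1}


if __name__ == "__main__":
    # Illustrative data only: a loop device carrying the same UUID as the root disk.
    sample = {
        "/dev/sda1": "1111-aaaa",
        "/dev/sdb1": "2222-bbbb",
        "/dev/loop0": "1111-aaaa",
    }
    for uuid, paths in find_uuid_collisions(sample).items():
        print(f"UUID {uuid} is claimed by: {', '.join(paths)}")
```

A check like this detects the ambiguity but does not resolve it, which is
why the thread asks for a way to mark certain devices as untrusted.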