From: Anthony Liguori
To: Sasha Levin
Cc: Andrea Arcangeli, Avi Kivity, Marcelo Tosatti, Ingo Molnar,
    Pekka Enberg, Cyrill Gorcunov, Asias He, Rusty Russell,
    "Michael S. Tsirkin", kvm, Corentin Chary, qemu-devel
Subject: Re: Secure KVM
Date: Mon, 07 Nov 2011 11:37:03 -0600
Message-ID: <4EB8173F.9090008@codemonkey.ws>
In-Reply-To: <1320612020.3299.22.camel@lappy>

On 11/06/2011 02:40 PM, Sasha Levin wrote:
> Hi all,
>
> I'm planning on doing a small fork of the KVM tool to turn it into a
> 'Secure KVM' enabled hypervisor. Now you're probably asking yourself,
> huh?
>
> The idea was discussed briefly a couple of months ago, but never got
> off the ground, which is a shame IMO.
>
> It's easy to explain the problem: if an attacker finds a security hole
> in any of the devices which are exposed to the guest, the attacker
> would be able either to crash the guest or possibly to run code on the
> host itself.
>
> The solution is also simple to explain: split the devices into
> different processes and use seccomp to sandbox each device to exactly
> the set of resources it needs to operate, nothing more and nothing
> less.
>
> Since I'll be basing it on the KVM tool, which doesn't really emulate
> that many legacy devices, I'll focus first on the virtio family for
> the sake of simplicity (and covering 90% of the options).
>
> This is my basic overview of how I'm planning on implementing the
> initial POC:
>
> 1. First I'll focus on the simple virtio-rng device. It's simple
> enough to let us focus on the aspects which are important for the
> POC while still covering most bases (e.g. sandboxing to a single
> file, /dev/urandom, and such).
>
> 2. Use a one-process-per-device model: for each device requested
> (note: not each device *type*), a new process which handles it will
> be spawned.
>
> 3. That process will be limited exactly to the resources it needs to
> operate. For example, if we run a virtio-blk device, it would be able
> to access only the image file it should be using.
>
> 4. The connection between the hypervisor and the devices will be based
> on unix sockets; this should allow for better separation compared to
> other approaches such as shared memory.
>
> 5. While performance is an aspect, complete isolation is more
> important. Security is primary, performance is secondary.
>
> 6. Share as much code as possible with the current implementation of
> virtio devices, and make it possible to run virtio devices either as
> is done now or by spawning them as separate processes. The amount of
> code specific to the separate-process case should be minimal.
>
> That's all I have for now; comments are *very* welcome.

I thought about this a bit and have some ideas that may or may not
help.

1) If you add device save/load support, you can potentially use it to
give yourself quite a bit of flexibility in changing the sandbox. At
any point at run time, you can save the device model's state inside
the sandbox, destroy the sandbox, then build a new sandbox and restore
the device to its former state. This might turn out to be very useful
for supporting things like device hotplug and/or memory hotplug.

2) I think it's largely possible to implement all device emulation
without doing any dynamic memory allocation.
Since memory allocation DoS is something you have to deal with anyway,
I suspect most device emulation already uses a fixed amount of memory
per device. This could simplify things dramatically.

3) I think virtio can (and should) be used as a generic "backend to
frontend" transport between the device model and the tool.

4) The lack of select() is really challenging. I understand why it's
not there, since it can technically be emulated, but it seems like a
no-risk syscall to whitelist, and it would make programming in a
sandbox much easier. Maybe Andrea has some comments here? I might be
missing something.

Regards,

Anthony Liguori
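Points 1 and 2 can be combined: if a device model keeps all of its state in one fixed-size structure without pointers, saving and restoring it across a sandbox rebuild is little more than a validated copy. The following C sketch illustrates that idea for a virtio-rng-like device. It is a hypothetical example, not code from the KVM tool or QEMU; the structure layout, the 1024-byte budget, and the names rng_dev_state, rng_save(), rng_load(), RNG_STATE_MAX and RNG_STATE_VERSION are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed per-device memory budget (point 2): the device model
 * never calls malloc(); all of its state must fit in this many bytes. */
#define RNG_STATE_MAX     1024u
#define RNG_STATE_VERSION 1u

/* Hypothetical state of a virtio-rng-like device model. It contains no
 * pointers, so it can be copied byte for byte between two processes that
 * run the same binary. */
struct rng_dev_state {
    uint32_t version;        /* layout version, checked on load */
    uint32_t guest_features; /* feature bits acknowledged by the guest */
    uint16_t last_avail_idx; /* next virtqueue slot to consume */
    uint8_t  status;         /* virtio device status byte */
    uint8_t  scratch_len;    /* bytes currently valid in scratch[] */
    uint8_t  scratch[64];    /* fixed buffer used instead of malloc() */
};

/* Compile-time check that the state stays within the fixed budget. */
static_assert(sizeof(struct rng_dev_state) <= RNG_STATE_MAX,
              "device state must fit in the fixed per-device budget");

/* Save (point 1): copy the state into a caller-provided buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t rng_save(const struct rng_dev_state *s, uint8_t *buf, size_t len)
{
    if (len < sizeof(*s))
        return 0;
    memcpy(buf, s, sizeof(*s));
    return sizeof(*s);
}

/* Load (point 1): restore state produced by rng_save(). The blob may come
 * from a sandbox that was compromised, so it is checked in a temporary
 * copy and committed only if every check passes.
 * Returns 0 on success, -1 on a malformed blob (state left untouched). */
int rng_load(struct rng_dev_state *s, const uint8_t *buf, size_t len)
{
    struct rng_dev_state tmp;

    if (len != sizeof(tmp))
        return -1;
    memcpy(&tmp, buf, sizeof(tmp));
    if (tmp.version != RNG_STATE_VERSION)
        return -1;
    if ((size_t)tmp.scratch_len > sizeof(tmp.scratch))
        return -1;
    *s = tmp;
    return 0;
}
```

In the rebuild flow from point 1, the old sandboxed process would call something like rng_save() and hand the blob to the tool (for instance over the unix socket from Sasha's item 4), the tool would tear down that sandbox and spawn a new one, and the new process would call rng_load(). Two choices are deliberate: the loader validates everything before committing, because the blob comes from a process that may have been compromised; and a raw memcpy() format is only safe because both processes run the same binary on the same host, so the structure layout is identical.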