From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: Secure KVM
Date: Mon, 07 Nov 2011 11:37:03 -0600
Message-ID: <4EB8173F.9090008@codemonkey.ws>
References: <1320612020.3299.22.camel@lappy>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Ingo Molnar <mingo@elte.hu>, Pekka Enberg <penberg@kernel.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Asias He <asias.hejun@gmail.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	"Michael S. Tsirkin" <mst@redhat.com>, kvm <kvm@vger.kernel.org>,
	Corentin Chary <corentincj@iksaif.net>,
	qemu-devel <qemu-devel@nongnu.org>
To: Sasha Levin <levinsasha928@gmail.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-gy0-f174.google.com ([209.85.160.174]:52140 "EHLO
	mail-gy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754963Ab1KGRhI (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 7 Nov 2011 12:37:08 -0500
Received: by gyc15 with SMTP id 15so4407040gyc.19
        for <kvm@vger.kernel.org>; Mon, 07 Nov 2011 09:37:08 -0800 (PST)
In-Reply-To: <1320612020.3299.22.camel@lappy>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/06/2011 02:40 PM, Sasha Levin wrote:
> Hi all,
>
> I'm planning on doing a small fork of the KVM tool to turn it into a
> 'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?
>
> The idea was discussed briefly a couple of months ago, but never got off
> the ground - which is a shame IMO.
>
> It's easy to explain the problem: If an attacker finds a security hole
> in any of the devices which are exposed to the guest, the attacker would
> be able to either crash the guest, or possibly run code on the host
> itself.
>
> The solution is also simple to explain: Split the devices into different
> processes and use seccomp to sandbox each device into the exact set of
> resources it needs to operate, nothing more and nothing less.
>
> Since I'll be basing it on the KVM tool, which doesn't really emulate
> that many legacy devices, I'll focus first on the virtio family for the
> sake of simplicity (and covering 90% of the options).
>
> This is my basic overview of how I'm planning on implementing the
> initial POC:
>
> 1. First I'll focus on the simple virtio-rng device, it's simple enough
> to allow us to focus on the aspects which are important for the POC
> while still covering most bases (i.e. sandbox to single file
> - /dev/urandom and such).
>
> 2. Do it on a one process per device concept, where for each device
> (notice - not device *type*) requested, a new process which handles it
> will be spawned.
>
> 3. That process will be limited exactly to the resources it needs to
> operate, for example - if we run a virtio-blk device, it would be able
> to access only the image file which it should be using.
>
> 4. Connection between hypervisor and devices will be based on unix
> sockets, this should allow for better separation compared to other
> approaches such as shared memory.
>
> 5. While performance is an aspect, complete isolation is more important.
> Security is primary, performance is secondary.
>
> 6. Share as much code as possible with current implementation of virtio
> devices, make it possible to run virtio devices either like it's being
> done now, or by spawning them as separate processes - the amount of
> specific code for the separate process case should be minimal.
>
>
> That's all I have for now, comments are *very* welcome.

I thought about this a bit and have some ideas that may or may not help.

1) If you add device save/load support, you can potentially use it to give
yourself quite a bit of flexibility in changing the sandbox.  At any point at
runtime, you can save the device model's state inside the sandbox, destroy the
sandbox, and then build a new sandbox and restore the device to its former
state.

This might turn out to be very useful in supporting things like device hotplug
and/or memory hotplug.
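
A minimal sketch of the save/load half of this idea, assuming a virtio-rng
style device.  The struct, its fields, and the function names are all
hypothetical and are not taken from the KVM tool; tearing down and rebuilding
the sandbox would happen between the rng_save() and rng_load() calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state of a sandboxed virtio-rng device model. */
struct rng_state {
	uint32_t version;        /* format version of the saved blob */
	uint16_t last_avail_idx; /* next virtqueue entry to process */
	uint64_t bytes_served;   /* statistics carried across a rebuild */
};

#define RNG_STATE_VERSION 1u
#define RNG_STATE_SIZE \
	(sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint64_t))

/*
 * Serialize field by field so struct padding never leaves the sandbox.
 * Host byte order is fine here: save and load run on the same machine.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t rng_save(const struct rng_state *s, uint8_t *buf, size_t len)
{
	size_t off = 0;

	assert(s != NULL && buf != NULL);
	if (len < RNG_STATE_SIZE)
		return 0;
	memcpy(buf + off, &s->version, sizeof(s->version));
	off += sizeof(s->version);
	memcpy(buf + off, &s->last_avail_idx, sizeof(s->last_avail_idx));
	off += sizeof(s->last_avail_idx);
	memcpy(buf + off, &s->bytes_served, sizeof(s->bytes_served));
	off += sizeof(s->bytes_served);
	return off;
}

/*
 * Restore state inside a freshly built sandbox.  The blob is parsed into
 * a temporary first, so *s is untouched if the blob is rejected.
 * Returns 0 on success, -1 on a short or unknown-version blob.
 */
int rng_load(struct rng_state *s, const uint8_t *buf, size_t len)
{
	struct rng_state tmp;
	size_t off = 0;

	assert(s != NULL && buf != NULL);
	if (len < RNG_STATE_SIZE)
		return -1;
	memset(&tmp, 0, sizeof(tmp));
	memcpy(&tmp.version, buf + off, sizeof(tmp.version));
	off += sizeof(tmp.version);
	if (tmp.version != RNG_STATE_VERSION)
		return -1;
	memcpy(&tmp.last_avail_idx, buf + off, sizeof(tmp.last_avail_idx));
	off += sizeof(tmp.last_avail_idx);
	memcpy(&tmp.bytes_served, buf + off, sizeof(tmp.bytes_served));
	*s = tmp;
	return 0;
}
```

The blob has a fixed size, so the tool can hold it in a preallocated buffer
while the old sandbox is destroyed and the new one is set up.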

2) I think it's largely possible to implement all device emulation without doing
any dynamic memory allocation.  Since memory allocation DoS is something you
have to deal with anyway, I suspect most device emulation already uses a fixed
amount of memory per device.  That could dramatically simplify things.
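
As an illustration of the fixed-memory approach, here is a sketch of a
statically sized request pool.  The queue depth, buffer size, and all names
are assumptions made up for the example; the point is only that nothing is
allocated after startup, so a guest cannot make the device model grow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 8    /* hypothetical queue depth */
#define REQ_BUF_SIZE 4096 /* hypothetical per-request buffer size */

struct request {
	uint8_t data[REQ_BUF_SIZE];
	size_t  len;
	int     in_use;
};

/* All the request memory this device model will ever use. */
static struct request pool[MAX_INFLIGHT];

/* Hand out a free slot, or NULL when the guest has filled the queue. */
struct request *req_get(void)
{
	size_t i;

	for (i = 0; i < MAX_INFLIGHT; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = 1;
			pool[i].len = 0;
			return &pool[i];
		}
	}
	return NULL; /* caller must push back on the guest, not allocate */
}

/* Return a slot to the pool once the request has completed. */
void req_put(struct request *r)
{
	assert(r >= pool && r < pool + MAX_INFLIGHT);
	r->in_use = 0;
}
```

When req_get() returns NULL the device simply stops pulling entries from its
queue until a request completes, which bounds memory use by construction.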

3) I think virtio can/should be used as a generic "backend to frontend" 
transport between the device model and the tool.

4) The lack of select() under seccomp is really challenging.  I understand why
it's not there, since it can technically be emulated, but it seems like a
no-risk syscall to whitelist and it would make programming in a sandbox so much
easier.  Maybe Andrea has some comments?  I might be missing something here.
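
For reference, one possible way to live without select(), assuming the strict
seccomp mode that permits only read(), write(), _exit() and sigreturn(): the
unsandboxed tool does the multiplexing and forwards every event over a single
descriptor, tagged with its source.  The framing and names below are
hypothetical, and this is only a sketch of one workaround.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical frame header written by the unsandboxed tool. */
struct event_hdr {
	uint32_t source; /* which queue or fd the event came from */
	uint32_t len;    /* payload bytes that follow the header */
};

/* Read exactly n bytes using nothing but read(). */
static int read_full(int fd, void *buf, size_t n)
{
	uint8_t *p = buf;

	while (n > 0) {
		ssize_t r = read(fd, p, n);

		if (r < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (r == 0)
			return -1; /* peer closed the channel */
		p += r;
		n -= (size_t)r;
	}
	return 0;
}

/*
 * Block until the next event arrives on the single channel.
 * Returns the source id, or -1 on error, EOF, or an oversized payload.
 */
int next_event(int fd, void *payload, size_t cap, size_t *len)
{
	struct event_hdr h;

	assert(payload != NULL && len != NULL);
	if (read_full(fd, &h, sizeof(h)) < 0)
		return -1;
	if (h.len > cap)
		return -1;
	if (read_full(fd, payload, h.len) < 0)
		return -1;
	*len = h.len;
	return (int)h.source;
}
```

The cost is an extra copy and a context switch per event on the tool side,
which is exactly the overhead a whitelisted select() would avoid.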

Regards,

Anthony Liguori
