From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: Secure KVM
Date: Mon, 07 Nov 2011 11:39:30 -0600
Message-ID: <4EB817D2.5010200@codemonkey.ws>
References: <1320612020.3299.22.camel@lappy> <4EB7A45D.1030600@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Sasha Levin <levinsasha928@gmail.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Ingo Molnar <mingo@elte.hu>, Pekka Enberg <penberg@kernel.org>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Asias He <asias.hejun@gmail.com>,
	Rusty Russell <rusty@rustcorp.com.au>,
	"Michael S. Tsirkin" <mst@redhat.com>, kvm <kvm@vger.kernel.org>
To: Avi Kivity <avi@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-iy0-f174.google.com ([209.85.210.174]:49434 "EHLO
	mail-iy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933071Ab1KGRje (ORCPT <rfc822;kvm@vger.kernel.org>);
	Mon, 7 Nov 2011 12:39:34 -0500
Received: by iage36 with SMTP id e36so6080468iag.19
        for <kvm@vger.kernel.org>; Mon, 07 Nov 2011 09:39:33 -0800 (PST)
In-Reply-To: <4EB7A45D.1030600@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/07/2011 03:26 AM, Avi Kivity wrote:
> On 11/06/2011 10:40 PM, Sasha Levin wrote:
>> Hi all,
>>
>> I'm planning on doing a small fork of the KVM tool to turn it into a
>> 'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?
>
> Actually, no.
>
>> The idea was discussed briefly a couple of months ago, but never got off
>> the ground - which is a shame IMO.
>>
>> It's easy to explain the problem: If an attacker finds a security hole
>> in any of the devices which are exposed to the guest, the attacker would
>> be able to either crash the guest, or possibly run code on the host
>> itself.
>
> Crashing the guest is fine (not 100% - you can have unprivileged code
> managing a device, in which case we allow unprivileged code to crash the
> entire guest - but that's rare).  Running code on the host is also fine;
> we have a permissions system in place to prevent damage; see libvirt's
> sVirt code, which uses selinux to disallow an exploited guest from
> touching other guests or host data.  It should be able to protect
> host-only networks as well (not sure if it does that).
>
> The real risk is that the exploited hypervisor turns around and exploits
> yet another hole in the system, like a privileged daemon that the
> hypervisor is allowed to be in contact with, or the kernel itself, via a
> vulnerability in the kernel interfaces.
>
>> The solution is also simple to explain: Split the devices into different
>> processes and use seccomp to sandbox each device into the exact set of
>> resources it needs to operate, nothing more and nothing less.
>
> One thing to beware of is memory hotplug.  If the memory map is static,
> then a fork() once everything is set up (with MAP_SHARED) allows all
> processes to access guest memory.  However, if memory hotplug is
> supported (or planned to be supported), then you can't do that, as
> seccomp doesn't allow you to run mmap() in confined processes.
>
> This means they have to use RPC to the main process in order to access
> memory, which is going to slow them down significantly.

If you treat the sandbox as ephemeral by leveraging save/restore, you can throw 
away and rebuild the device model on every memory change.  While not a super 
cheap operation, it's at least amortized over time.
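
To make that concrete, here is a rough sketch in Python of the
throw-away-and-rebuild cycle.  It is not kvm-tool or qemu code: the names
run_worker(), hotplug() and demo() are made up for illustration, a pipe
stands in for the save/restore channel, and no seccomp filter is actually
installed.  The worker only touches memory that was mapped before the
fork(), which is the constraint a confined process would be under.

```python
import mmap
import os
import struct


def run_worker(guest_mem, state):
    """Fork a throwaway worker that inherits guest_mem; return its saved state.

    Hypothetical stand-in for a sandboxed device model: it never maps new
    memory, it only uses the shared mapping inherited across fork().
    """
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the "sandboxed" device model.
        try:
            os.close(rfd)
            new_state = state + 1
            # Write into guest memory; the parent sees it (shared mapping).
            guest_mem[len(guest_mem) - 1] = new_state & 0xFF
            # "Save" the device state back to the main process.
            os.write(wfd, struct.pack("<I", new_state))
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: collect the saved state, then reap the worker.
    os.close(wfd)
    saved = struct.unpack("<I", os.read(rfd, 4))[0]
    os.close(rfd)
    os.waitpid(pid, 0)
    return saved


def hotplug(old_mem, extra_bytes):
    """Build a larger shared mapping and copy the old guest memory into it."""
    # fileno -1 gives an anonymous mapping; on Unix the default is MAP_SHARED.
    new_mem = mmap.mmap(-1, len(old_mem) + extra_bytes)
    new_mem[:len(old_mem)] = old_mem[:]
    old_mem.close()
    return new_mem


def demo():
    mem = mmap.mmap(-1, mmap.PAGESIZE)
    state = run_worker(mem, 0)         # first sandbox instance
    first = mem[len(mem) - 1]
    mem = hotplug(mem, mmap.PAGESIZE)  # the memory map changes
    state = run_worker(mem, state)     # rebuilt sandbox, state restored
    second = mem[len(mem) - 1]
    return state, first, second, len(mem)
```

Each call to run_worker() is one sandbox lifetime; the main process carries
the saved state across the hotplug() and hands it to the next instance, so
the rebuild cost is paid only when the memory map changes.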

Regards,

Anthony Liguori


>> Since I'll be basing it on the KVM tool, which doesn't really emulate
>> that many legacy devices, I'll focus first on the virtio family for the
>> sake of simplicity (and covering 90% of the options).
>
> Since virtio is so performance sensitive, my feeling is that it is
> better to audit it, and rely on sandboxing for the non performance
> sensitive parts of the device model.  Of course for a POC it's fine to
> start with it.
>
>> This is my basic overview of how I'm planning on implementing the
>> initial POC:
>
> <snip plan>
>
>> That's all I have for now, comments are *very* welcome.
>
> This plan is quite similar to the equivalent plans for qemu.  However,
> as kvm-tool is much smaller than qemu, you're likely to have a much easier
> time and make much faster progress.  This is really a great use of
> kvm-tool, to explore new ideas rather than catching up; and I'm sure
> your experience will prove useful for qemu as well.
>