Message-ID: <4EDFB4F0.70406@codemonkey.ws>
Date: Wed, 07 Dec 2011 12:48:16 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <4EDFAF91.4070904@linux.vnet.ibm.com>
In-Reply-To: <4EDFAF91.4070904@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
To: Corey Bryant <coreyb@linux.vnet.ibm.com>
Cc: Ashley D Lai <adlai@us.ibm.com>, Stefan Hajnoczi <stefanha@gmail.com>, Michael Halcrow <mhalcrow@google.com>, qemu-devel@nongnu.org, Eric Paris <eparis@redhat.com>, Paul Moore <pmoore@redhat.com>, Radim Krčmář <radimkrcmar@hpx.cz>, Avi Kivity <avi@redhat.com>, Richa Marwaha <rmarwah@us.ibm.com>, Amit Shah <amit.shah@redhat.com>, Eduardo Terrell Ferrari Otubo <eotubo@br.ibm.com>, Lee Terrell <lterrell@us.ibm.com>, George Wilson <gcwilson@us.ibm.com>

On 12/07/2011 12:25 PM, Corey Bryant wrote:
> A group of us are starting to work on sandboxing QEMU device emulation
> code. We're just getting started investigating various approaches, and
> want to engage the community to gather input.
>
> Following are the design points that we are currently considering:

To be perfectly honest, I think prototyping and measuring performance is going 
to be the only way to figure out the right approach here.  Here are some 
thoughts on the various approaches.

>
> * Decompose QEMU into multiple processes:
>
> * This could be done such that QEMU devices execute in separate
> processes based on device type, e.g. all block devices in one
> process and all network devices in a second process. Another
> alternative is executing a separate process per device.

I don't think that a HIRD of QEMU-replacing daemons is the best approach to this 
problem.  While I appreciate the academic attraction to such a proposal, I think 
practical experience tells us that this isn't the easiest type of system to get 
right.

> * Decomposition would not only afford a level of security inherent
> in process separation, it would also allow development of stricter
> sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
> block device specific policy). This would enable a true sandbox
> with layers of defense.
>
> * Decompose the device emulation process further into an untrusted and
> trusted thread:

I think this general approach is the most rational place to start.

> * The untrusted thread would be restricted by seccomp mode 1 and
> would contain the device emulation code.

I think the best strategy would allow for a device to run either in the 
untrusted thread or the trusted thread.  This makes performance testing a bit 
easier and it also makes development a bit more natural.
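For reference, seccomp mode 1 (SECCOMP_MODE_STRICT on Linux) leaves a process able to call only read(), write(), exit() and sigreturn(); any other system call kills it with SIGKILL. The minimal sketch below, assuming a Linux kernel built with seccomp support, shows that behaviour in a forked child. The helper name run_strict_child is hypothetical and not part of QEMU.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, put it in seccomp mode 1, and report
 * how it ended. Returns the child's exit code, 128 + signal number if the
 * child was killed, or -1 if fork()/waitpid() failed. */
int run_strict_child(int make_forbidden_call)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);  /* seccomp unavailable; not sandboxed, so _exit is fine */

        /* From here on only read(), write(), exit() and sigreturn() work. */
        if (make_forbidden_call)
            syscall(SYS_getpid);  /* not on the list: kernel sends SIGKILL */

        /* glibc's _exit() uses exit_group(), which strict mode also
         * forbids, so issue the plain exit() syscall directly. */
        syscall(SYS_exit, 0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

run_strict_child(0) should return 0, and run_strict_child(1) should return 128 + SIGKILL.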

> * The trusted helper thread would run beside the untrusted thread,
> enabling the untrusted thread to make syscalls beyond read(),
> write(), exit(), and sigreturn().

I assume you mean process, not thread, BTW?
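If the helper is a separate process, the split might look roughly like this minimal sketch: the untrusted child enters seccomp mode 1 and can then only read() and write() the pipes it inherited, so the trusted parent validates each request and performs the pread() against the disk image on its behalf. The request format, SECTOR_SIZE and all function names are hypothetical, and plain pipes are used only to keep the example self-contained, not as a proposal for the transport.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* hypothetical fixed transfer size */

/* Hypothetical request sent from the untrusted to the trusted process. */
struct sector_req {
    uint64_t sector;
};

/* Untrusted process: sandboxed with seccomp mode 1, so it can only read()
 * and write() descriptors it already holds. It requests one sector and
 * checks the first byte of the reply. Never returns. */
static void untrusted_main(int req_fd, int rsp_fd, uint64_t sector,
                           unsigned char expect)
{
    struct sector_req req = { .sector = sector };
    unsigned char buf[SECTOR_SIZE];
    int ok = 0;

    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
        _exit(2);  /* not sandboxed yet, so exit_group is still allowed */
    if (write(req_fd, &req, sizeof req) == (ssize_t)sizeof req &&
        read(rsp_fd, buf, sizeof buf) == (ssize_t)sizeof buf)
        ok = (buf[0] == expect);
    /* Raw exit: glibc's _exit() calls exit_group, which strict mode kills. */
    syscall(SYS_exit, ok ? EXIT_SUCCESS : EXIT_FAILURE);
}

/* Trusted process: owns the disk image and performs the pread() for the
 * untrusted side after validating the request. */
static void trusted_serve_one(int req_fd, int rsp_fd, int disk_fd,
                              uint64_t nsectors)
{
    struct sector_req req;
    unsigned char buf[SECTOR_SIZE] = { 0 };

    if (read(req_fd, &req, sizeof req) != (ssize_t)sizeof req)
        return;
    /* Out-of-range requests get a zero-filled sector instead of a read. */
    if (req.sector < nsectors &&
        pread(disk_fd, buf, sizeof buf, (off_t)(req.sector * SECTOR_SIZE))
            != (ssize_t)sizeof buf)
        memset(buf, 0, sizeof buf);
    if (write(rsp_fd, buf, sizeof buf) != (ssize_t)sizeof buf)
        return;  /* the child sees a short read and fails */
}

/* Hypothetical driver: runs one request through a sandboxed child.
 * Returns the child's exit status (0 means it got the expected byte),
 * or -1 on setup failure or abnormal termination. */
int broker_demo(int disk_fd, uint64_t nsectors, uint64_t sector,
                unsigned char expect)
{
    int req[2], rsp[2], status;
    pid_t pid;

    if (pipe(req) != 0)
        return -1;
    if (pipe(rsp) != 0) {
        close(req[0]);
        close(req[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(req[0]); close(req[1]); close(rsp[0]); close(rsp[1]);
        return -1;
    }
    if (pid == 0) {
        close(req[0]);  /* child only writes requests ... */
        close(rsp[1]);  /* ... and only reads replies */
        untrusted_main(req[1], rsp[0], sector, expect);  /* never returns */
    }
    close(req[1]);
    close(rsp[0]);
    trusted_serve_one(req[0], rsp[1], disk_fd, nsectors);
    close(req[0]);
    close(rsp[1]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Given a descriptor for an image of nsectors sectors, broker_demo() returns 0 when the child received a sector starting with expect; an out-of-range sector yields a zero-filled reply, so the child exits with status 1.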

> * IPC communication mechanisms:
>
> * An IPC mechanism will be required to enable communication between
> untrusted and trusted threads.
>
> * An IPC mechanism will also be required to enable communication
> between the main QEMU process and device processes.

IPC is easy.  We have tons of infrastructure in QEMU for IPC (virtio, QMP, 
etc.).  Please don't reinvent the wheel here.

> * The communication mechanisms must provide secure communication,
> be low overhead (easy to generate, parse, and validate), and must
> play well with sVirt/LSMs.

I don't see how sVirt/LSM fits into this, but all of these requirements also 
apply to the other big untrusted thread that we interact with (the guest itself).

My view is that we should treat the untrusted thread as an extension of the 
guest, and that the interface between the trusted thread and the untrusted 
thread should view it simply as another machine type, one that presents a 
different (simpler) hardware abstraction.
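One way to read this: the trusted side decodes requests from the untrusted side the way a device model decodes guest register writes, with a fixed layout, a copy into private memory, and bounds checks before anything is acted on. A minimal sketch, where struct dev_req, the opcodes and DEV_MAX_SECTORS are hypothetical and do not correspond to any existing QEMU interface:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-layout request, analogous to a device register block. */
struct dev_req {
    uint32_t op;        /* DEV_OP_READ or DEV_OP_WRITE */
    uint32_t nsectors;  /* transfer length in sectors */
    uint64_t sector;    /* first sector of the transfer */
};

enum { DEV_OP_READ = 1, DEV_OP_WRITE = 2 };
#define DEV_MAX_SECTORS 128u  /* hypothetical per-request limit */

/* Decode one request from a buffer the untrusted side can still modify.
 * The request is copied into a local first, and only that private copy is
 * checked and returned, so a field cannot change after validation (a
 * "double fetch"). Returns false on any malformed or out-of-range input. */
bool dev_decode_req(const void *shared, size_t len, uint64_t disk_sectors,
                    struct dev_req *out)
{
    struct dev_req req;

    if (len != sizeof req)
        return false;                  /* wrong message size */
    memcpy(&req, shared, sizeof req);  /* copy out before checking */

    if (req.op != DEV_OP_READ && req.op != DEV_OP_WRITE)
        return false;                  /* unknown command */
    if (req.nsectors == 0 || req.nsectors > DEV_MAX_SECTORS)
        return false;                  /* empty or oversized transfer */
    /* Overflow-safe version of: sector + nsectors <= disk_sectors */
    if (req.sector > disk_sectors || req.nsectors > disk_sectors - req.sector)
        return false;

    *out = req;
    return true;
}
```

With memory that is truly shared between processes, the copy should also use volatile or atomic accesses so the compiler cannot re-read the source.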

> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
> Google Native Client's IMC, and shared memory.

The actual mechanism doesn't really matter, I think, but see the comments above.

> * If seccomp mode 2 support becomes available, decomposition of device
> emulation into untrusted/trusted threads may not be necessary. This
> could result in improved performance (no IPC overhead between trusted
> and untrusted thread) and reduced complexity (no need for trusted
> helper thread).

If mode 2 is the Right Answer, then we shouldn't wait for it to become 
available.  We should make it available by pushing it into the kernel.

If we all agree that mode 2, if it existed, is what we would use, then we have 
the answer to this discussion and we know what we need to go off and do.
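For context, what is called "mode 2" here was later merged into Linux 3.5 as SECCOMP_MODE_FILTER: the process installs a classic BPF program that decides, per system call, whether to allow it. A minimal sketch of a whitelist that would let a (hypothetical) block-device emulation process call pread64() directly, which is what would remove the trusted helper and its IPC hop, assuming x86-64 or AArch64 and a kernel with seccomp filter support; install_device_filter and run_filtered_child are hypothetical names.

```c
#define _GNU_SOURCE
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* If the loaded syscall number equals nr, allow it; else skip ahead. */
#define ALLOW_SYSCALL(nr)                              \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1),   \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Hypothetical whitelist for a block-device emulation process. */
static int install_device_filter(void)
{
    struct sock_filter filter[] = {
        /* Reject syscalls made through an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Compare the syscall number against the whitelist. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW_SYSCALL(__NR_read),
        ALLOW_SYSCALL(__NR_write),
        ALLOW_SYSCALL(__NR_pread64),
        ALLOW_SYSCALL(__NR_exit),
        ALLOW_SYSCALL(__NR_exit_group),
        ALLOW_SYSCALL(__NR_rt_sigreturn),
        /* Anything else terminates the process. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Unprivileged processes must set no_new_privs before installing. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns the child's exit code, 128 + signal number if it was killed,
 * or -1 on fork()/waitpid() failure. */
int run_filtered_child(int make_forbidden_call)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_device_filter() != 0)
            _exit(2);
        if (make_forbidden_call)
            syscall(SYS_getppid);  /* not whitelisted: killed with SIGSYS */
        _exit(0);                  /* exit_group is whitelisted */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real filter would also have to allow whatever system calls the device code and its libraries actually make; the sketch only shows the mechanism.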

> * Execution of QEMU with the sandboxed device support should be an
> optional run-time specification.

Ack with a small exception.  If we can demonstrate that sandboxing has an 
acceptable performance overhead, then we should do it unconditionally to reduce 
our overall test matrix.  It's unclear whether that's obtainable, though.

> * We will be focusing on legacy devices first, both for performance and
> risk reasons.
>
> Once we settle on a direction, we will develop a proof of concept to
> share with the community.

Proofs of concept are the only way to settle on a direction.  Code speaks 
louder than anything else.

>
> We appreciate your input.
>
> Regards,
>
> Ashley Lai
> Corey Bryant
> Eduardo Otubo
> Michael Halcrow
> Paul Moore
> Richa Marwaha

In the future, I would suggest starting these types of discussions on the list 
from the outset.  Otherwise, valuable information (including discussion and 
debate on direction) is not available to the community at large.

Not a big deal in this case, but I want to be on the record here about this.  I 
would have greatly preferred this whole effort start out on qemu-devel from day one.

Regards,

Anthony Liguori
