From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:59586)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <coreyb@linux.vnet.ibm.com>) id 1RYNGn-0005i1-CH
	for qemu-devel@nongnu.org; Wed, 07 Dec 2011 14:35:11 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <coreyb@linux.vnet.ibm.com>) id 1RYNGl-0003SN-Ox
	for qemu-devel@nongnu.org; Wed, 07 Dec 2011 14:35:09 -0500
Received: from e1.ny.us.ibm.com ([32.97.182.141]:36437)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <coreyb@linux.vnet.ibm.com>) id 1RYNGl-0003Qc-KE
	for qemu-devel@nongnu.org; Wed, 07 Dec 2011 14:35:07 -0500
Received: from /spool/local
	by e1.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only!
	Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <coreyb@linux.vnet.ibm.com>;
	Wed, 7 Dec 2011 14:35:05 -0500
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
	by d01relay01.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id
	pB7JWjVI296650
	for <qemu-devel@nongnu.org>; Wed, 7 Dec 2011 14:32:45 -0500
Received: from d01av02.pok.ibm.com (loopback [127.0.0.1])
	by d01av02.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id
	pB7JWiWp020751
	for <qemu-devel@nongnu.org>; Wed, 7 Dec 2011 17:32:45 -0200
Message-ID: <4EDFBF56.9030607@linux.vnet.ibm.com>
Date: Wed, 07 Dec 2011 14:32:38 -0500
From: Corey Bryant <coreyb@linux.vnet.ibm.com>
MIME-Version: 1.0
References: <4EDFAF91.4070904@linux.vnet.ibm.com>
	<4EDFB4F0.70406@codemonkey.ws>
In-Reply-To: <4EDFB4F0.70406@codemonkey.ws>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Stefan Hajnoczi <stefanha@gmail.com>, Michael Halcrow <mhalcrow@google.com>, qemu-devel@nongnu.org, Eric Paris <eparis@redhat.com>, Paul Moore <pmoore@redhat.com>, Ashley D Lai <adlai@us.ibm.com>, Avi Kivity <avi@redhat.com>, Richa Marwaha <rmarwah@us.ibm.com>, Amit Shah <amit.shah@redhat.com>, =?UTF-8?B?UmFkaW0gS3LEjW3DocWZ?= <radimkrcmar@hpx.cz>, Eduardo Terrell Ferrari Otubo <eotubo@br.ibm.com>, Lee Terrell <lterrell@us.ibm.com>, George Wilson <gcwilson@us.ibm.com>


On 12/07/2011 01:48 PM, Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>> A group of us are starting to work on sandboxing QEMU device emulation
>> code. We're just getting started investigating various approaches, and
>> want to engage the community to gather input.
>>
>> Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here. Here are
> some thoughts on the various approaches.
>
>>
>> * Decompose QEMU into multiple processes:
>>
>> * This could be done such that QEMU devices execute in separate
>> processes based on device type, e.g. all block devices in one
>> process and all network devices in a second process. Another
>> alternative is executing a separate process per device.
>
> I don't think that a HIRD of QEMU-replacing daemons is the best approach
> to this problem. While I appreciate the academic attraction to such a
> proposal, I think practical experience tells us that this isn't the
> easiest type of system to get right.
>

Thanks for the input.

The idea would be to fork() the processes internally, if that is the 
concern.  They wouldn't have to be started separately by the user.

>> * Decomposition would not only afford a level of security inherent
>> in process separation, it would also allow development of stricter
>> sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>> block device specific policy). This would enable a true sandbox
>> with layers of defense.
>>
>> * Decompose the device emulation process further into an untrusted and
>> trusted thread:
>
> I think this general approach is the most rational place to start.
>

Agreed.

>> * The untrusted thread would be restricted by seccomp mode 1 and
>> would contain the device emulation code.
>
> I think the best strategy would allow for a device to run either in the
> untrusted thread or the trusted thread. This makes performance testing a
> bit easier and it also makes development a bit more natural.
>

When you refer to the device running in the trusted thread, are you 
talking about the case where you run QEMU without sandboxing support?  I 
think we would ideally like to add this new support such that if it is 
not enabled, QEMU will still run as a single process and decomposition 
wouldn't occur.

>> * The trusted helper thread would run beside the untrusted thread,
>> enabling the untrusted thread to make syscalls beyond read(),
>> write(), exit(), and sigreturn().
>
> I assume you mean process, not thread BTW?
>

I do mean thread.  When making calls on behalf of the seccomp'd thread, 
I think there will be syscalls that must be called from the same address 
space.  That's where the trusted helper thread would come into play.
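
To make the helper idea concrete, here is a rough sketch in Python
rather than QEMU's C. It does not apply seccomp at all; it only shows
the request/reply shape, where the untrusted side does nothing but
read and write on a channel and the trusted helper decides whether to
make the real call. The opcode, wire format, allowlist, and function
names are all made up for illustration.

```python
import errno
import os
import socket
import struct
import tempfile
import threading

REQ_FILE_SIZE = 1  # hypothetical opcode, not a QEMU or kernel constant


def recv_exact(sock, n):
    """Read exactly n bytes, or raise EOFError if the peer closed."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the channel")
        buf += chunk
    return buf


def trusted_helper(chan, allowed_paths):
    """Serve requests from the untrusted side until it closes the channel."""
    try:
        while True:
            op, length = struct.unpack("!BI", recv_exact(chan, 5))
            path = recv_exact(chan, length).decode()
            # Policy check happens here, in the trusted thread.
            if op == REQ_FILE_SIZE and path in allowed_paths:
                result = os.stat(path).st_size
            else:
                result = -errno.EPERM
            chan.sendall(struct.pack("!q", result))
    except EOFError:
        pass
    finally:
        chan.close()


def request_file_size(chan, path):
    """Untrusted side: only writes a request and reads the reply."""
    data = path.encode()
    chan.sendall(struct.pack("!BI", REQ_FILE_SIZE, len(data)) + data)
    (result,) = struct.unpack("!q", recv_exact(chan, 8))
    return result


def demo():
    untrusted_end, trusted_end = socket.socketpair()
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x" * 512)
        f.flush()
        helper = threading.Thread(
            target=trusted_helper, args=(trusted_end, {f.name})
        )
        helper.start()
        allowed = request_file_size(untrusted_end, f.name)
        denied = request_file_size(untrusted_end, "/not/on/the/allowlist")
        untrusted_end.close()
        helper.join()
    return allowed, denied
```

Calling demo() returns the size of the allowed file and a negative
errno value for the path that is not on the allowlist.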

>> * IPC communication mechanisms:
>>
>> * An IPC mechanism will be required to enable communication between
>> untrusted and trusted threads.
>>
>> * An IPC mechanism will also be required to enable communication
>> between the main QEMU process and device processes.
>
> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
> QMP, etc.). Please don't reinvent the wheel here.
>

Ok

>> * The communication mechanisms must provide secure communication,
>> be low overhead (easy to generate, parse, and validate), and must
>> play well with sVirt/LSMs.
>
> I don't see how sVirt/LSM fits into this but all of these requirements
> are also true for the other big untrusted thread that we interact with
> (the guest itself).
>
> My view is that we should view the untrusted thread as an extension of
> the guest and that the interfaces between the trusted thread and the
> untrusted thread view it simply as another machine type that presents a
> different (simpler) hardware abstraction.
>

Yes, this makes sense.  I think our biggest concern with IPC is that we 
don't introduce a TOCTTOU opportunity for a device to change call 
parameters after they've been checked and before the call is made on 
behalf of the sandboxed thread.  Shared memory that is writable by both 
the untrusted and trusted threads could introduce this.
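
As a minimal illustration of that concern (a Python sketch, not QEMU
code), the first handler below validates a length in a shared buffer
and then re-reads it, while the second copies the request into private
memory once and only ever uses the copy. The request layout, the
MAX_LEN limit, and the race() hook that stands in for the untrusted
thread are all assumptions made for the example.

```python
import struct

REQ = struct.Struct("<QI")  # hypothetical layout: offset (u64), length (u32)
MAX_LEN = 4096              # hypothetical per-request limit


def checked_length_racy(shared, race=lambda: None):
    """Validate in place, then re-read: the value used can differ
    from the value that was checked."""
    _, length = REQ.unpack_from(shared)
    if length > MAX_LEN:
        raise ValueError("length too large")
    race()  # the untrusted thread runs between check and use
    _, length = REQ.unpack_from(shared)
    return length


def checked_length_snapshot(shared, race=lambda: None):
    """Copy once into private memory, then validate and use the copy."""
    private = bytes(shared[:REQ.size])
    _, length = REQ.unpack(private)
    if length > MAX_LEN:
        raise ValueError("length too large")
    race()  # later writes to the shared buffer no longer matter
    return length


def demo():
    results = []
    for handler in (checked_length_racy, checked_length_snapshot):
        shared = bytearray(REQ.pack(0, 512))

        def race(buf=shared):
            # Overwrite the request with a length far above MAX_LEN.
            REQ.pack_into(buf, 0, 0, 1 << 20)

        results.append(handler(shared, race))
    return results
```

In demo(), the racy handler ends up acting on the oversized length
that was written after the check, while the snapshot handler still
returns the 512 it validated.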

>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>> Google Native Client's IMC, and shared memory.
>
> The actual mechanism doesn't really matter I think, but see above comments.
>
>> * If seccomp mode 2 support becomes available, decomposition of device
>> emulation into untrusted/trusted threads may not be necessary. This
>> could result in improved performance (no IPC overhead between trusted
>> and untrusted thread) and reduced complexity (no need for trusted
>> helper thread).
>
> If mode 2 is the Right Answer, then we shouldn't wait for it to become
> available. We should make it available by pushing it into the kernel.
>
> If we all agree that if mode 2 existed, it's what we would use, then
> we have the answer to this discussion and we know what we need to
> go off and do.
>

That would seem like the logical approach.  I think there may be new 
mode 2 patches coming soon so we can see how they go over.

>> * Execution of QEMU with the sandboxed device support should be an
>> optional run-time specification.
>
> Ack with a small exception. If we can demonstrate that sandboxing has an
> acceptable performance overhead, then we should do it unconditionally to
> reduce our overall test matrix. It's unclear that that's obtainable though.
>

Good point.

>> * We will be focusing on legacy devices first, both for performance and
>> risk reasons.
>>
>> Once we settle on a direction, we will develop a proof of concept to
>> share with the community.
>
> Proof of concepts are the only way to settle on direction. Code speaks
> louder than anything else.
>

Definitely.

>>
>> We appreciate your input.
>>
>> Regards,
>>
>> Ashley Lai
>> Corey Bryant
>> Eduardo Otubo
>> Michael Halcrow
>> Paul Moore
>> Richa Marwaha
>
> In the future, I would suggest beginning these types of discussions on
> the list to start with. Otherwise valuable information (including
> discussion and debate on directions) is not available to the greater
> community at large.
>
> Not a big deal in this case, but I want to be on the record here about
> this. I would have greatly preferred this whole effort start out on
> qemu-devel from day one.
>

Understood.

> Regards,
>
> Anthony Liguori
>
>>
>
>

-- 
Regards,
Corey