Message-ID: <4EDFBF56.9030607@linux.vnet.ibm.com>
Date: Wed, 07 Dec 2011 14:32:38 -0500
From: Corey Bryant
MIME-Version: 1.0
References: <4EDFAF91.4070904@linux.vnet.ibm.com> <4EDFB4F0.70406@codemonkey.ws>
In-Reply-To: <4EDFB4F0.70406@codemonkey.ws>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
To: Anthony Liguori
Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel@nongnu.org, Eric Paris,
 Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
 Radim Krčmář, Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On 12/07/2011 01:48 PM, Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>> A group of us are starting to work on sandboxing
>> QEMU device emulation code. We're just getting started investigating
>> various approaches, and want to engage the community to gather input.
>>
>> Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here. Here are
> some thoughts on the various approaches.
>
>>
>> * Decompose QEMU into multiple processes:
>>
>>   * This could be done such that QEMU devices execute in separate
>>     processes based on device type, e.g. all block devices in one
>>     process and all network devices in a second process. Another
>>     alternative is executing a separate process per device.
>
> I don't think that a HIRD of QEMU-replacing daemons is the best approach
> to this problem. While I appreciate the academic attraction to such a
> proposal, I think practical experience tells us that this isn't the
> easiest type of system to get right.
>

Thanks for the input. The idea would be to fork() the processes
internally, if that is the concern. They wouldn't have to be started
separately by the user.

>>   * Decomposition would not only afford a level of security inherent
>>     in process separation, it would also allow development of stricter
>>     sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>>     block device specific policy). This would enable a true sandbox
>>     with layers of defense.
>>
>> * Decompose the device emulation process further into an untrusted and
>>   trusted thread:
>
> I think this general approach is the most rational place to start.
>

Agreed.

>>   * The untrusted thread would be restricted by seccomp mode 1 and
>>     would contain the device emulation code.
>
> I think the best strategy would allow for a device to run either in the
> untrusted thread or the trusted thread. This makes performance testing a
> bit easier and it also makes development a bit more natural.
>

When you refer to the device running in the trusted thread, are you
talking about the case where you run QEMU without sandboxing support?
I think we would ideally like to add this new support such that if it
is not enabled, QEMU will still run as a single process and
decomposition wouldn't occur.

>>   * The trusted helper thread would run beside the untrusted thread,
>>     enabling the untrusted thread to make syscalls beyond read(),
>>     write(), exit(), and sigreturn().
>
> I assume you mean process, not thread BTW?
>

I do mean thread. When making calls on behalf of the seccomp'd thread,
I think there will be syscalls that must be called from the same
address space. That's where the trusted helper thread would come into
play.

>> * IPC communication mechanisms:
>>
>>   * An IPC mechanism will be required to enable communication between
>>     untrusted and trusted threads.
>>
>>   * An IPC mechanism will also be required to enable communication
>>     between the main QEMU process and device processes.
>
> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
> QMP, etc.). Please don't reinvent the wheel here.
>

OK.

>>   * The communication mechanisms must provide secure communication,
>>     be low overhead (easy to generate, parse, and validate), and must
>>     play well with sVirt/LSMs.
>
> I don't see how sVirt/LSM fits into this but all of these requirements
> are also true for the other big untrusted thread that we interact with
> (the guest itself).
>
> My view is that we should view the untrusted thread as an extension of
> the guest and that the interfaces between the trusted thread and the
> untrusted thread view it simply as another machine type that presents a
> different (simpler) hardware abstraction.
>

Yes, this makes sense. I think our biggest concern with IPC is that we
don't introduce a TOCTTOU opportunity for a device to change call
parameters after they've been checked and before the call is made on
behalf of the sandboxed thread.
Shared memory that is writable by both the untrusted and trusted
threads could introduce this.

>>   * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>>     Google Native Client's IMC, and shared memory.
>
> The actual mechanism doesn't really matter I think, but see above comments.
>
>> * If seccomp mode 2 support becomes available, decomposition of device
>>   emulation into untrusted/trusted threads may not be necessary. This
>>   could result in improved performance (no IPC overhead between trusted
>>   and untrusted thread) and reduced complexity (no need for trusted
>>   helper thread).
>
> If mode 2 is the Right Answer, then we shouldn't wait for it to become
> available. We should make it available by pushing it into the kernel.
>
> If we all agree that if mode 2 existed, it's what we would use, then
> we have the answer to this discussion and we know what we need to
> go off and do.
>

That would seem like the logical approach. I think there may be new
mode 2 patches coming soon, so we can see how they go over.

>> * Execution of QEMU with the sandboxed device support should be an
>>   optional run-time specification.
>
> Ack with a small exception. If we can demonstrate that sandboxing has an
> acceptable performance overhead, then we should do it unconditionally to
> reduce our overall test matrix. It's unclear that that's obtainable though.
>

Good point.

>> * We will be focusing on legacy devices first, both for performance and
>>   risk reasons.
>>
>> Once we settle on a direction, we will develop a proof of concept to
>> share with the community.
>
> Proofs of concept are the only way to settle on direction. Code speaks
> louder than anything else.
>

Definitely.

>>
>> We appreciate your input.
>>
>> Regards,
>>
>> Ashley Lai
>> Corey Bryant
>> Eduardo Otubo
>> Michael Halcrow
>> Paul Moore
>> Richa Marwaha
>
> In the future, I would suggest beginning these types of discussions on
> the list to start with.
> Otherwise valuable information (including discussion and debate on
> directions) is not available to the greater community at large.
>
> Not a big deal in this case, but I want to be on the record here about
> this. I would have greatly preferred this whole effort start out on
> qemu-devel from day one.
>

Understood.

> Regards,
>
> Anthony Liguori
>

--
Regards,
Corey