Message-ID: <4EDFC665.3040800@linux.vnet.ibm.com>
Date: Wed, 07 Dec 2011 15:02:45 -0500
From: Corey Bryant
References: <4EDFAF91.4070904@linux.vnet.ibm.com> <4EDFB4F0.70406@codemonkey.ws> <4EDFBF56.9030607@linux.vnet.ibm.com> <4EDFC1F3.1080900@codemonkey.ws>
In-Reply-To: <4EDFC1F3.1080900@codemonkey.ws>
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
To: Anthony Liguori
Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel@nongnu.org, Eric Paris,
    Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
    Radim Krčmář, Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On 12/07/2011 02:43 PM, Anthony Liguori wrote:
> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>>
>> Agreed.
>>
>>>> * The untrusted thread would be restricted by seccomp mode 1 and
>>>>   would contain the device emulation code.
>>>
>>> I think the best strategy would allow for a device to run either in the
>>> untrusted thread or the trusted thread. This makes performance testing a
>>> bit easier and it also makes development a bit more natural.
>>>
>>
>> When you refer to the device running in the trusted thread, are you
>> talking about the case where you run QEMU without sandboxing support?
>> I think we would ideally like to add this new support such that if it
>> is not enabled, QEMU will still run as a single process and
>> decomposition wouldn't occur.
>>
>>>> * The trusted helper thread would run beside the untrusted thread,
>>>>   enabling the untrusted thread to make syscalls beyond read(),
>>>>   write(), exit(), and sigreturn().
>>>
>>> I assume you mean process, not thread BTW?
>>>
>>
>> I do mean thread. When making calls on behalf of the seccomp'd thread,
>> I think there will be syscalls that must be called from the same
>> address space. That's where the trusted helper thread would come into
>> play.
>>
>>>> * IPC communication mechanisms:
>>>>
>>>>   * An IPC mechanism will be required to enable communication between
>>>>     untrusted and trusted threads.
>>>>
>>>>   * An IPC mechanism will also be required to enable communication
>>>>     between the main QEMU process and device processes.
>>>
>>> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
>>> QMP, etc.). Please don't reinvent the wheel here.
>>>
>>
>> Ok
>>
>>>> * The communication mechanisms must provide secure communication,
>>>>   be low overhead (easy to generate, parse, and validate), and must
>>>>   play well with sVirt/LSMs.
>>>
>>> I don't see how sVirt/LSM fits into this but all of these requirements
>>> are also true for the other big untrusted thread that we interact with
>>> (the guest itself).
>>>
>>> My view is that we should view the untrusted thread as an extension of
>>> the guest and that the interfaces between the trusted thread and the
>>> untrusted thread view it simply as another machine type that presents a
>>> different (simpler) hardware abstraction.
>>>
>>
>> Yes, this makes sense. I think our biggest concern with IPC is that we
>> don't introduce a TOCTTOU opportunity for a device to change call
>> parameters after they've been checked and before the call is made on
>> behalf of the sandboxed thread. Shared memory that is writable by both
>> the untrusted and trusted threads could introduce this.
>
> This is no different than dealing with a guest. We have to handle this
> with virtio already.
>

Well that's good.

>>
>>>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>>>>   Google Native Client's IMC, and shared memory.
>>>
>>> The actual mechanism doesn't really matter I think, but see above
>>> comments.
>>>
>>>> * If seccomp mode 2 support becomes available, decomposition of device
>>>>   emulation into untrusted/trusted threads may not be necessary. This
>>>>   could result in improved performance (no IPC overhead between trusted
>>>>   and untrusted thread) and reduced complexity (no need for trusted
>>>>   helper thread).
>>>
>>> If mode 2 is the Right Answer, then we shouldn't wait for it to become
>>> available. We should make it available by pushing it into the kernel.
>>>
>>> If we all agree that if mode 2 existed, it's what we would use, then
>>> we have the answer to this discussion and we know what we need to
>>> go off and do.
>>>
>>
>> That would seem like the logical approach. I think there may be new
>> mode 2 patches coming soon so we can see how they go over.
>
> I'd like to see what the whitelist would need to be for something like
> QEMU in mode 2. My biggest concern is that the whitelist would need to
> be so large that the practical security wouldn't be all that much
> improved.

This may not tell the whole story. These are the syscalls found to be
called with the following execution:

  qemu -hda harddrive.raw -boot c -m 256

  access brk clock_gettime clone close connect dup eventfd2 execve
  fcntl64 fstat64 futex getegid32 geteuid32 getgid32 getpeername
  getrlimit getsockname gettimeofday getuid32 ioctl _llseek madvise
  mmap2 mprotect munmap nanosleep open poll prctl pread64 read readlink
  rt_sigaction rt_sigprocmask select set_robust_list set_thread_area
  set_tid_address shmat shmctl shmdt shmget signalfd socket stat64
  tgkill time timer_create timer_gettime timer_settime uname write
  writev

>
> Regards,
>
> Anthony Liguori
>

-- 
Regards,
Corey
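
P.S. To put a rough number on the whitelist concern, the list above can be compared against the four syscalls that seccomp mode 1 permits (read, write, exit, sigreturn, as named earlier in the thread). The following is only a minimal Python sketch: the names MODE1_ALLOWED, OBSERVED and beyond_mode1 are made up for illustration and are not part of QEMU or of any seccomp interface, and the result reflects only this one invocation, so a real mode 2 whitelist could well differ.

```python
# Syscalls permitted under seccomp mode 1, as named earlier in this thread.
MODE1_ALLOWED = {"read", "write", "exit", "sigreturn"}

# Syscalls observed for: qemu -hda harddrive.raw -boot c -m 256
# (copied from the list in this message; one run only, so likely incomplete)
OBSERVED = set("""
access brk clock_gettime clone close connect dup eventfd2 execve
fcntl64 fstat64 futex getegid32 geteuid32 getgid32 getpeername
getrlimit getsockname gettimeofday getuid32 ioctl _llseek madvise
mmap2 mprotect munmap nanosleep open poll prctl pread64 read readlink
rt_sigaction rt_sigprocmask select set_robust_list set_thread_area
set_tid_address shmat shmctl shmdt shmget signalfd socket stat64
tgkill time timer_create timer_gettime timer_settime uname write
writev
""".split())


def beyond_mode1(observed, allowed=MODE1_ALLOWED):
    """Return, sorted, the observed syscalls that mode 1 would not permit."""
    return sorted(set(observed) - set(allowed))


if __name__ == "__main__":
    extra = beyond_mode1(OBSERVED)
    print(f"{len(OBSERVED)} observed, {len(extra)} beyond mode 1")
    print(" ".join(extra))
```

Everything this prints in the second line would have to be either whitelisted under mode 2 or proxied through the trusted helper thread under mode 1.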