From: Corey Bryant <coreyb@linux.vnet.ibm.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: "Stefan Hajnoczi" <stefanha@gmail.com>,
"Michael Halcrow" <mhalcrow@google.com>,
qemu-devel@nongnu.org, "Eric Paris" <eparis@redhat.com>,
"Paul Moore" <pmoore@redhat.com>,
"Ashley D Lai" <adlai@us.ibm.com>, "Avi Kivity" <avi@redhat.com>,
"Richa Marwaha" <rmarwah@us.ibm.com>,
"Amit Shah" <amit.shah@redhat.com>,
"Radim Krčmář" <radimkrcmar@hpx.cz>,
"Eduardo Terrell Ferrari Otubo" <eotubo@br.ibm.com>,
"Lee Terrell" <lterrell@us.ibm.com>,
"George Wilson" <gcwilson@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
Date: Wed, 07 Dec 2011 14:32:38 -0500
Message-ID: <4EDFBF56.9030607@linux.vnet.ibm.com>
In-Reply-To: <4EDFB4F0.70406@codemonkey.ws>

On 12/07/2011 01:48 PM, Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>> A group of us are starting to work on sandboxing QEMU device emulation
>> code. We're just getting started investigating various approaches, and
>> want to engage the community to gather input.
>>
>> Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here. Here are
> some thoughts on the various approaches.
>
>>
>> * Decompose QEMU into multiple processes:
>>
>> * This could be done such that QEMU devices execute in separate
>> processes based on device type, e.g. all block devices in one
>> process and all network devices in a second process. Another
>> alternative is executing a separate process per device.
>
> I don't think that a HIRD of QEMU-replacing daemons is the best approach
> to this problem. While I appreciate the academic attraction to such a
> proposal, I think practical experience tells us that this isn't the
> easiest type of system to get right.
>
Thanks for the input.
The idea would be to fork() the processes internally, if that is the
concern. They wouldn't have to be started separately by the user.
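
To make the fork() idea concrete, here is a minimal sketch of how a device
process could be started internally and handed one end of a Unix socket
pair. None of this is existing QEMU code: spawn_device_process(),
device_process_main(), and device_round_trip() are hypothetical names, and
the "device" only echoes each byte back incremented by one.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical device loop: reply to each byte with that byte plus one. */
static void device_process_main(int chan)
{
    uint8_t b;

    while (read(chan, &b, 1) == 1) {
        b++;
        if (write(chan, &b, 1) != 1)
            break;
    }
    _exit(0); /* EOF or error: the parent closed its end */
}

/* Fork a device process. On success, returns the child's pid and stores
 * the parent's end of the channel in *chan. Returns -1 on failure. */
static pid_t spawn_device_process(int *chan)
{
    int sv[2];
    pid_t pid;

    assert(chan != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        device_process_main(sv[1]); /* never returns */
    }
    close(sv[1]);
    *chan = sv[0];
    return pid;
}

/* Send one byte through a freshly forked device process.
 * Returns the reply byte, or -1 on any failure. */
int device_round_trip(uint8_t value)
{
    int chan, status, ok;
    uint8_t reply = 0;
    pid_t pid = spawn_device_process(&chan);

    if (pid < 0)
        return -1;
    ok = write(chan, &value, 1) == 1 && read(chan, &reply, 1) == 1;
    close(chan); /* the child sees EOF and exits */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!ok || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return reply;
}
```

The user only ever starts one program; the main process decides at run time
whether to call something like spawn_device_process() at all.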
>> * Decomposition would not only afford a level of security inherent
>> in process separation, it would also allow development of stricter
>> sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>> block device specific policy). This would enable a true sandbox
>> with layers of defense.
>>
>> * Decompose the device emulation process further into an untrusted and
>> trusted thread:
>
> I think this general approach is the most rational place to start.
>
Agreed.
>> * The untrusted thread would be restricted by seccomp mode 1 and
>> would contain the device emulation code.
>
> I think the best strategy would allow for a device to run either in the
> untrusted thread or the trusted thread. This makes performance testing a
> bit easier and it also makes development a bit more natural.
>
When you refer to the device running in the trusted thread, are you
talking about the case where you run QEMU without sandboxing support? I
think we would ideally like to add this new support such that if it is
not enabled, QEMU will still run as a single process and decomposition
wouldn't occur.
>> * The trusted helper thread would run beside the untrusted thread,
>> enabling the untrusted thread to make syscalls beyond read(),
>> write(), exit(), and sigreturn().
>
> I assume you mean process, not thread BTW?
>
I do mean thread. When making calls on behalf of the seccomp'd thread,
I think there will be syscalls that must be called from the same address
space. That's where the trusted helper thread would come into play.
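
For anyone who hasn't used seccomp mode 1, here is a rough, Linux-only
sketch of what the restriction looks like from user space. It uses a forked
child rather than a thread to keep the example small, and strict_mode_demo()
is a hypothetical name. The child issues raw syscalls because the libc
_exit() wrapper typically calls exit_group(), which mode 1 does not allow.

```c
#include <assert.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child entered strict mode and could still write(),
 * 0 if the kernel refused strict mode, and -1 on any other failure. */
int strict_mode_demo(void)
{
    int fds[2], status;
    char got = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char ok = 'k';

        close(fds[0]);
        /* Not yet sandboxed, so the normal _exit() is still usable here. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);
        /* From here on only read(), write(), exit(), and sigreturn()
         * are permitted; anything else kills the process. */
        syscall(SYS_write, fds[1], &ok, 1);
        syscall(SYS_exit, 0); /* does not return */
    }

    close(fds[1]);
    n = read(fds[0], &got, 1);
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return 0; /* strict mode unavailable in this environment */
    if (n == 1 && got == 'k' && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 1;
    return -1;
}
```

Anything the sandboxed code needs beyond those four syscalls would have to
be requested from the trusted helper over an already-open descriptor.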
>> * IPC communication mechanisms:
>>
>> * An IPC mechanism will be required to enable communication between
>> untrusted and trusted threads.
>>
>> * An IPC mechanism will also be required to enable communication
>> between the main QEMU process and device processes.
>
> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
> QMP, etc.). Please don't reinvent the wheel here.
>
Ok
>> * The communication mechanisms must provide secure communication,
>> be low overhead (easy to generate, parse, and validate), and must
>> play well with sVirt/LSMs.
>
> I don't see how sVirt/LSM fits into this but all of these requirements
> are also true for the other big untrusted thread that we interact with
> (the guest itself).
>
> My view is that we should view the untrusted thread as an extension of
> the guest and that the interfaces between the trusted thread and the
> untrusted thread view it simply as another machine type that presents a
> different (simpler) hardware abstraction.
>
Yes, this makes sense. I think our biggest concern with IPC is that we
don't introduce a TOCTTOU opportunity for a device to change call
parameters after they've been checked and before the call is made on
behalf of the sandboxed thread. Shared memory that is writable by both
the untrusted and trusted threads could introduce this.
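
One common way to close that window is for the trusted side to copy the
request out of shared memory once, then validate and act only on its private
copy. A minimal sketch, assuming a made-up request layout: struct io_request,
MAX_IO_LEN, and fetch_request() are hypothetical and not part of any
existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IO_LEN 4096u /* hypothetical per-request size limit */

/* Hypothetical request the untrusted thread places in shared memory. */
struct io_request {
    uint32_t opcode; /* 0 = read, 1 = write */
    uint32_t length; /* bytes to transfer */
    uint64_t offset; /* byte offset into the backing image */
};

/* Validate a private copy of a request against the image size. */
static int request_is_valid(const struct io_request *req, uint64_t disk_size)
{
    if (req->opcode > 1)
        return 0;
    if (req->length == 0 || req->length > MAX_IO_LEN)
        return 0;
    if (req->offset > disk_size || req->length > disk_size - req->offset)
        return 0;
    return 1;
}

/* Snapshot the shared request, then check only the snapshot.
 * Returns 0 and fills *out on success, -1 if the request is rejected. */
int fetch_request(const volatile struct io_request *shared,
                  uint64_t disk_size, struct io_request *out)
{
    struct io_request snap;

    assert(shared != NULL && out != NULL);
    /* Read each field exactly once; later writes by the untrusted
     * thread cannot affect what gets validated or executed. */
    snap.opcode = shared->opcode;
    snap.length = shared->length;
    snap.offset = shared->offset;

    if (!request_is_valid(&snap, disk_size))
        return -1;
    *out = snap;
    return 0;
}
```

The trusted thread would then issue the syscall using *out and never look at
the shared copy again for that request.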
>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>> Google Native Client's IMC, and shared memory.
>
> The actual mechanism doesn't really matter I think, but see above comments.
>
>> * If seccomp mode 2 support becomes available, decomposition of device
>> emulation into untrusted/trusted threads may not be necessary. This
>> could result in improved performance (no IPC overhead between trusted
>> and untrusted thread) and reduced complexity (no need for trusted
>> helper thread).
>
> If mode 2 is the Right Answer, then we shouldn't wait for it to become
> available. We should make it available by pushing it into the kernel.
>
> If we all agree that if mode 2 existed, it's what we would use, then
> we have the answer to this discussion and we know what we need to
> go off and do.
>
That would seem like the logical approach. I think there may be new
mode 2 patches coming soon so we can see how they go over.
>> * Execution of QEMU with the sandboxed device support should be an
>> optional run-time specification.
>
> Ack with a small exception. If we can demonstrate that sandboxing has an
> acceptable performance overhead, then we should do it unconditionally to
> reduce our overall test matrix. It's unclear that that's obtainable though.
>
Good point.
>> * We will be focusing on legacy devices first, both for performance and
>> risk reasons.
>>
>> Once we settle on a direction, we will develop a proof of concept to
>> share with the community.
>
> Proofs of concept are the only way to settle on direction. Code speaks
> louder than anything else.
>
Definitely.
>>
>> We appreciate your input.
>>
>> Regards,
>>
>> Ashley Lai
>> Corey Bryant
>> Eduardo Otubo
>> Michael Halcrow
>> Paul Moore
>> Richa Marwaha
>
> In the future, I would suggest beginning these types of discussions on
> the list to start with. Otherwise valuable information (including
> discussion and debate on directions) is not available to the greater
> community at large.
>
> Not a big deal in this case, but I want to be on the record here about
> this. I would have greatly preferred this whole effort start out on
> qemu-devel from day one.
>
Understood.
> Regards,
>
> Anthony Liguori
>
>>
>
>
--
Regards,
Corey