qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] Device sandboxing
@ 2011-12-07 18:25 Corey Bryant
  2011-12-07 18:48 ` Anthony Liguori
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Corey Bryant @ 2011-12-07 18:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Stefan Hajnoczi, Michael Halcrow,
	Radim Krčmář, Corey C Bryant, Lee Terrell,
	Eric Paris, Paul Moore, Eduardo Terrell Ferrari Otubo, Avi Kivity,
	Richa Marwaha, Amit Shah, Ashley D Lai, George Wilson

A group of us are starting to work on sandboxing QEMU device emulation
code.  We're just getting started investigating various approaches, and
want to engage the community to gather input.

Following are the design points that we are currently considering:

* Decompose QEMU into multiple processes:

     * This could be done such that QEMU devices execute in separate
       processes based on device type, e.g. all block devices in one
       process and all network devices in a second process.  Another
       alternative is executing a separate process per device.

     * Decomposition would not only afford a level of security inherent
       in process separation, it would also allow development of stricter
       sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
       block device specific policy).  This would enable a true sandbox
       with layers of defense.

* Decompose the device emulation process further into an untrusted and
   trusted thread:

     * The untrusted thread would be restricted by seccomp mode 1 and
       would contain the device emulation code.

     * The trusted helper thread would run beside the untrusted thread,
       enabling the untrusted thread to make syscalls beyond read(),
       write(), exit(), and sigreturn().

* IPC communication mechanisms:

     * An IPC mechanism will be required to enable communication between
       untrusted and trusted threads.

     * An IPC mechanism will also be required to enable communication
       between the main QEMU process and device processes.

     * The communication mechanisms must provide secure communication,
       be low overhead (easy to generate, parse, and validate), and must
       play well with sVirt/LSMs.

     * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
       Google Native Client's IMC, and shared memory.

* If seccomp mode 2 support becomes available, decomposition of device
   emulation into untrusted/trusted threads may not be necessary.  This
   could result in improved performance (no IPC overhead between trusted
   and untrusted thread) and reduced complexity (no need for trusted
   helper thread).

* Execution of QEMU with the sandboxed device support should be an
   optional run-time specification.

* We will be focusing on legacy devices first, both for performance and
   risk reasons.
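
For readers unfamiliar with seccomp mode 1, the restriction described in
the list above can be sketched roughly as follows.  This is only an
illustration, not code from the proposal: it uses a forked child rather
than a thread to keep the example short, run_strict_child() is a
hypothetical helper name, and it assumes a Linux kernel built with
seccomp support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/seccomp.h>

/*
 * Fork a child, put it into seccomp mode 1, and have it either behave
 * (write + exit) or misbehave (getpid).  Returns the raw wait status,
 * or -1 if fork/waitpid failed.
 */
static int run_strict_child(int misbehave)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        static const char msg[] = "device model running in mode 1\n";

        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            _exit(2);                   /* seccomp not available */

        /* write() is one of the four permitted syscalls. */
        (void)syscall(SYS_write, STDERR_FILENO, msg, sizeof(msg) - 1);

        if (misbehave)
            (void)syscall(SYS_getpid);  /* not permitted: kernel sends SIGKILL */

        /*
         * glibc's _exit() uses exit_group(), which mode 1 does not
         * permit, so invoke the plain exit() syscall directly.
         */
        syscall(SYS_exit, 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status) || WIFSIGNALED(status));
    return status;
}
```

With run_strict_child(0) the child should exit normally with status 0;
with run_strict_child(1) it should be reported as killed by SIGKILL,
because getpid() is outside the permitted set.  Anything beyond those
four calls is what the trusted helper would have to perform on the
untrusted side's behalf.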

Once we settle on a direction, we will develop a proof of concept to
share with the community.

We appreciate your input.

Regards,

Ashley Lai
Corey Bryant
Eduardo Otubo
Michael Halcrow
Paul Moore
Richa Marwaha

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 18:25 [Qemu-devel] [RFC] Device sandboxing Corey Bryant
@ 2011-12-07 18:48 ` Anthony Liguori
  2011-12-07 19:32   ` Corey Bryant
  2011-12-07 21:20   ` Paul Moore
  2011-12-08 21:51 ` Blue Swirl
  2011-12-09 16:17 ` Paul Brook
  2 siblings, 2 replies; 31+ messages in thread
From: Anthony Liguori @ 2011-12-07 18:48 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Ashley D Lai, Stefan Hajnoczi, Michael Halcrow, qemu-devel,
	Eric Paris, Paul Moore, Radim Krčmář, Avi Kivity,
	Richa Marwaha, Amit Shah, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On 12/07/2011 12:25 PM, Corey Bryant wrote:
> A group of us are starting to work on sandboxing QEMU device emulation
> code. We're just getting started investigating various approaches, and
> want to engage the community to gather input.
>
> Following are the design points that we are currently considering:

To be perfectly honest, I think prototyping and measuring performance is going 
to be the only way to figure out the right approach here.  Here are some 
thoughts on the various approaches.

>
> * Decompose QEMU into multiple processes:
>
> * This could be done such that QEMU devices execute in separate
> processes based on device type, e.g. all block devices in one
> process and all network devices in a second process. Another
> alternative is executing a separate process per device.

I don't think that a HIRD of QEMU-replacing daemons is the best approach to this 
problem.  While I appreciate the academic attraction to such a proposal, I think 
practical experience tells us that this isn't the easiest type of system to get 
right.

> * Decomposition would not only afford a level of security inherent
> in process separation, it would also allow development of stricter
> sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
> block device specific policy). This would enable a true sandbox
> with layers of defense.
>
> * Decompose the device emulation process further into an untrusted and
> trusted thread:

I think this general approach is the most rational place to start.

> * The untrusted thread would be restricted by seccomp mode 1 and
> would contain the device emulation code.

I think the best strategy would allow for a device to run either in the 
untrusted thread or the trusted thread.  This makes performance testing a bit 
easier and it also makes development a bit more natural.

> * The trusted helper thread would run beside the untrusted thread,
> enabling the untrusted thread to make syscalls beyond read(),
> write(), exit(), and sigreturn().

I assume you mean process, not thread BTW?

> * IPC communication mechanisms:
>
> * An IPC mechanism will be required to enable communication between
> untrusted and trusted threads.
>
> * An IPC mechanism will also be required to enable communication
> between the main QEMU process and device processes.

IPC is easy.  We have tons of infrastructure in QEMU for IPC (virtio, QMP, 
etc.).  Please don't reinvent the wheel here.

> * The communication mechanisms must provide secure communication,
> be low overhead (easy to generate, parse, and validate), and must
> play well with sVirt/LSMs.

I don't see how sVirt/LSM fits into this but all of these requirements are also 
true for the other big untrusted thread that we interact with (the guest itself).

My view is that we should treat the untrusted thread as an extension of the guest,
and that the interface between the trusted thread and the untrusted thread
should treat it simply as another machine type that presents a different
(simpler) hardware abstraction.

> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
> Google Native Client's IMC, and shared memory.

The actual mechanism doesn't really matter I think, but see above comments.

> * If seccomp mode 2 support becomes available, decomposition of device
> emulation into untrusted/trusted threads may not be necessary. This
> could result in improved performance (no IPC overhead between trusted
> and untrusted thread) and reduced complexity (no need for trusted
> helper thread).

If mode 2 is the Right Answer, then we shouldn't wait for it to become 
available.  We should make it available by pushing it into the kernel.

If we all agree that mode 2 is what we would use if it existed, then we have
the answer to this discussion and we know what we need to go off and do.

> * Execution of QEMU with the sandboxed device support should be an
> optional run-time specification.

Ack with a small exception.  If we can demonstrate that sandboxing has an 
acceptable performance overhead, then we should do it unconditionally to reduce 
our overall test matrix.  It's unclear that that's obtainable though.

> * We will be focusing on legacy devices first, both for performance and
> risk reasons.
>
> Once we settle on a direction, we will develop a proof of concept to
> share with the community.

Proofs of concept are the only way to settle on a direction.  Code speaks louder
than anything else.

>
> We appreciate your input.
>
> Regards,
>
> Ashley Lai
> Corey Bryant
> Eduardo Otubo
> Michael Halcrow
> Paul Moore
> Richa Marwaha

In the future, I would suggest starting these types of discussions on the list.
Otherwise valuable information (including discussion and debate on directions)
is not available to the community at large.

Not a big deal in this case, but I want to be on the record here about this.  I 
would have greatly preferred this whole effort start out on qemu-devel from day one.

Regards,

Anthony Liguori

>


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 18:48 ` Anthony Liguori
@ 2011-12-07 19:32   ` Corey Bryant
  2011-12-07 19:43     ` Anthony Liguori
  2011-12-08  9:47     ` Stefan Hajnoczi
  2011-12-07 21:20   ` Paul Moore
  1 sibling, 2 replies; 31+ messages in thread
From: Corey Bryant @ 2011-12-07 19:32 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson



On 12/07/2011 01:48 PM, Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>> A group of us are starting to work on sandboxing QEMU device emulation
>> code. We're just getting started investigating various approaches, and
>> want to engage the community to gather input.
>>
>> Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here. Here are
> some thoughts on the various approaches.
>
>>
>> * Decompose QEMU into multiple processes:
>>
>> * This could be done such that QEMU devices execute in separate
>> processes based on device type, e.g. all block devices in one
>> process and all network devices in a second process. Another
>> alternative is executing a separate process per device.
>
> I don't think that a HIRD of QEMU-replacing daemons is the best approach
> to this problem. While I appreciate the academic attraction to such a
> proposal, I think practical experience tells us that this isn't the
> easiest type of system to get right.
>

Thanks for the input.

The idea would be to fork() the processes internally, if that is the 
concern.  They wouldn't have to be started separately by the user.

>> * Decomposition would not only afford a level of security inherent
>> in process separation, it would also allow development of stricter
>> sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>> block device specific policy). This would enable a true sandbox
>> with layers of defense.
>>
>> * Decompose the device emulation process further into an untrusted and
>> trusted thread:
>
> I think this general approach is the most rational place to start.
>

Agreed.

>> * The untrusted thread would be restricted by seccomp mode 1 and
>> would contain the device emulation code.
>
> I think the best strategy would allow for a device to run either in the
> untrusted thread or the trusted thread. This makes performance testing a
> bit easier and it also makes development a bit more natural.
>

When you refer to the device running in the trusted thread, are you 
talking about the case where you run QEMU without sandboxing support?  I 
think we would ideally like to add this new support such that if it is 
not enabled, QEMU will still run as a single process and decomposition 
wouldn't occur.

>> * The trusted helper thread would run beside the untrusted thread,
>> enabling the untrusted thread to make syscalls beyond read(),
>> write(), exit(), and sigreturn().
>
> I assume you mean process, not thread BTW?
>

I do mean thread.  When making calls on behalf of the seccomp'd thread,
I think there will be syscalls that must be called from the same address
space.  That's where the trusted helper thread would come into play.

>> * IPC communication mechanisms:
>>
>> * An IPC mechanism will be required to enable communication between
>> untrusted and trusted threads.
>>
>> * An IPC mechanism will also be required to enable communication
>> between the main QEMU process and device processes.
>
> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
> QMP, etc.). Please don't reinvent the wheel here.
>

Ok

>> * The communication mechanisms must provide secure communication,
>> be low overhead (easy to generate, parse, and validate), and must
>> play well with sVirt/LSMs.
>
> I don't see how sVirt/LSM fits into this but all of these requirements
> are also true for the other big untrusted thread that we interact with
> (the guest itself).
>
> My view is that we should treat the untrusted thread as an extension of
> the guest, and that the interface between the trusted thread and the
> untrusted thread should treat it simply as another machine type that
> presents a different (simpler) hardware abstraction.
>

Yes, this makes sense.  I think our biggest concern with IPC is that we
don't introduce a TOCTTOU opportunity for a device to change call
parameters after they've been checked and before the call is made on
behalf of the sandboxed thread.  Shared memory that is writable by both
the untrusted and trusted threads could introduce this.
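
One common way to close that window is to copy each request out of
shared memory exactly once, and to validate and act on the private copy
only.  A minimal sketch, assuming a made-up request layout (struct
sandbox_req, SANDBOX_MAX_LEN and sandbox_fetch_req() are all
hypothetical names, not existing QEMU code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SANDBOX_MAX_LEN 4096u   /* hypothetical per-request limit */

/* Hypothetical request layout written into shared memory by the untrusted side. */
struct sandbox_req {
    uint32_t op;        /* requested operation */
    uint32_t len;       /* payload length in bytes */
    uint64_t offset;    /* offset into the backing file */
};

/*
 * Snapshot the request from shared memory, then validate the snapshot.
 * Returns 0 and fills *out on success, -1 if the request is rejected.
 * The caller must use *out, never the shared copy, from here on.
 */
static int sandbox_fetch_req(const volatile struct sandbox_req *shared,
                             uint64_t backing_size,
                             struct sandbox_req *out)
{
    struct sandbox_req snap;

    assert(shared != NULL && out != NULL);

    /* Each field is read once; later writes by the peer cannot change snap. */
    snap.op = shared->op;
    snap.len = shared->len;
    snap.offset = shared->offset;

    if (snap.len == 0 || snap.len > SANDBOX_MAX_LEN)
        return -1;
    /* Written this way to avoid overflow in offset + len. */
    if (snap.offset > backing_size || snap.len > backing_size - snap.offset)
        return -1;

    *out = snap;
    return 0;
}
```

If the untrusted side rewrites the shared fields after this function
returns, the values in *out are unaffected, so the call that is
eventually made uses exactly the parameters that were checked.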

>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>> Google Native Client's IMC, and shared memory.
>
> The actual mechanism doesn't really matter I think, but see above comments.
>
>> * If seccomp mode 2 support becomes available, decomposition of device
>> emulation into untrusted/trusted threads may not be necessary. This
>> could result in improved performance (no IPC overhead between trusted
>> and untrusted thread) and reduced complexity (no need for trusted
>> helper thread).
>
> If mode 2 is the Right Answer, then we shouldn't wait for it to become
> available. We should make it available by pushing it into the kernel.
>
> If we all agree that mode 2 is what we would use if it existed, then
> we have the answer to this discussion and we know what we need to
> go off and do.
>

That would seem like the logical approach.  I think there may be new 
mode 2 patches coming soon so we can see how they go over.

>> * Execution of QEMU with the sandboxed device support should be an
>> optional run-time specification.
>
> Ack with a small exception. If we can demonstrate that sandboxing has an
> acceptable performance overhead, then we should do it unconditionally to
> reduce our overall test matrix. It's unclear that that's obtainable though.
>

Good point.

>> * We will be focusing on legacy devices first, both for performance and
>> risk reasons.
>>
>> Once we settle on a direction, we will develop a proof of concept to
>> share with the community.
>
> Proofs of concept are the only way to settle on a direction. Code speaks
> louder than anything else.
>

Definitely.

>>
>> We appreciate your input.
>>
>> Regards,
>>
>> Ashley Lai
>> Corey Bryant
>> Eduardo Otubo
>> Michael Halcrow
>> Paul Moore
>> Richa Marwaha
>
> In the future, I would suggest starting these types of discussions on
> the list. Otherwise valuable information (including discussion and
> debate on directions) is not available to the community at large.
>
> Not a big deal in this case, but I want to be on the record here about
> this. I would have greatly preferred this whole effort start out on
> qemu-devel from day one.
>

Understood.

> Regards,
>
> Anthony Liguori
>
>>
>
>

-- 
Regards,
Corey


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 19:32   ` Corey Bryant
@ 2011-12-07 19:43     ` Anthony Liguori
  2011-12-07 19:52       ` Michael Halcrow
                         ` (2 more replies)
  2011-12-08  9:47     ` Stefan Hajnoczi
  1 sibling, 3 replies; 31+ messages in thread
From: Anthony Liguori @ 2011-12-07 19:43 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On 12/07/2011 01:32 PM, Corey Bryant wrote:
>
> Agreed.
>
>>> * The untrusted thread would be restricted by seccomp mode 1 and
>>> would contain the device emulation code.
>>
>> I think the best strategy would allow for a device to run either in the
>> untrusted thread or the trusted thread. This makes performance testing a
>> bit easier and it also makes development a bit more natural.
>>
>
> When you refer to the device running in the trusted thread, are you talking
> about the case where you run QEMU without sandboxing support? I think we would
> ideally like to add this new support such that if it is not enabled, QEMU will
> still run as a single process and decomposition wouldn't occur.
>
>>> * The trusted helper thread would run beside the untrusted thread,
>>> enabling the untrusted thread to make syscalls beyond read(),
>>> write(), exit(), and sigreturn().
>>
>> I assume you mean process, not thread BTW?
>>
>
> I do mean thread. When making calls on behalf of the seccomp'd thread, I think
> there will be syscalls that must be called from the same address space. That's
> where the trusted helper thread would come into play.
>
>>> * IPC communication mechanisms:
>>>
>>> * An IPC mechanism will be required to enable communication between
>>> untrusted and trusted threads.
>>>
>>> * An IPC mechanism will also be required to enable communication
>>> between the main QEMU process and device processes.
>>
>> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
>> QMP, etc.). Please don't reinvent the wheel here.
>>
>
> Ok
>
>>> * The communication mechanisms must provide secure communication,
>>> be low overhead (easy to generate, parse, and validate), and must
>>> play well with sVirt/LSMs.
>>
>> I don't see how sVirt/LSM fits into this but all of these requirements
>> are also true for the other big untrusted thread that we interact with
>> (the guest itself).
>>
>> My view is that we should treat the untrusted thread as an extension of
>> the guest, and that the interface between the trusted thread and the
>> untrusted thread should treat it simply as another machine type that
>> presents a different (simpler) hardware abstraction.
>>
>
> Yes, this makes sense. I think our biggest concern with IPC is that we don't
> introduce a TOCTTOU opportunity for a device to change call parameters after
> they've been checked and before the call is made on behalf of the sandboxed
> thread. Shared memory that is writable by both the untrusted and trusted
> threads could introduce this.

This is no different than dealing with a guest.  We have to handle this with 
virtio already.

>
>>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>>> Google Native Client's IMC, and shared memory.
>>
>> The actual mechanism doesn't really matter I think, but see above comments.
>>
>>> * If seccomp mode 2 support becomes available, decomposition of device
>>> emulation into untrusted/trusted threads may not be necessary. This
>>> could result in improved performance (no IPC overhead between trusted
>>> and untrusted thread) and reduced complexity (no need for trusted
>>> helper thread).
>>
>> If mode 2 is the Right Answer, then we shouldn't wait for it to become
>> available. We should make it available by pushing it into the kernel.
>>
>> If we all agree that mode 2 is what we would use if it existed, then
>> we have the answer to this discussion and we know what we need to
>> go off and do.
>>
>
> That would seem like the logical approach. I think there may be new mode 2
> patches coming soon so we can see how they go over.

I'd like to see what the whitelist would need to be for something like QEMU in
mode 2.  My biggest concern is that the whitelist would need to be so large that
the practical security isn't all that much improved.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 19:43     ` Anthony Liguori
@ 2011-12-07 19:52       ` Michael Halcrow
  2011-12-07 20:02       ` Corey Bryant
  2011-12-07 20:54       ` Eric Paris
  2 siblings, 0 replies; 31+ messages in thread
From: Michael Halcrow @ 2011-12-07 19:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Corey Bryant, Lee Terrell, qemu-devel,
	Eric Paris, Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha,
	Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, George Wilson


On Wed, Dec 7, 2011 at 11:43 AM, Anthony Liguori <anthony@codemonkey.ws> wrote:

> I'd like to see what the whitelist would need to be for something like
> QEMU in mode 2.  My biggest concern is that the whitelist would need to be
> so large that the practical security isn't all that much improved.
>

Based on some prototyping work I've done with VMM ptrace sandboxing, I
estimate a ceiling of about 50 syscalls in the whitelist. This is a
reduction from over 300, and Linux syscalls that have had security
vulnerabilities in the past few years were not needed. Aside from that, if
we can further restrict based on syscall parameters, then we have a
straightforward mechanism for locking down access to things like file
system resources. For instance, a block device can be restricted to only
accessing the host file(s) that back the block device.
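
To make the parameter-based idea concrete, here is a rough sketch of the
kind of check a monitor could apply.  It is not taken from the ptrace
prototype mentioned above; struct blk_policy, blk_nr_allowed() and
blk_open_allowed() are hypothetical names, and the path comparison is a
plain string match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* Hypothetical policy for one block-device process. */
struct blk_policy {
    const long *allowed_nrs;        /* whitelisted syscall numbers */
    size_t n_nrs;
    const char *const *backing;     /* host files that back the device */
    size_t n_backing;
};

/* Return 1 if the syscall number is on the whitelist, 0 otherwise. */
static int blk_nr_allowed(const struct blk_policy *p, long nr)
{
    assert(p != NULL);
    for (size_t i = 0; i < p->n_nrs; i++)
        if (p->allowed_nrs[i] == nr)
            return 1;
    return 0;
}

/*
 * Return 1 if an open-style call on `path` should be permitted:
 * the syscall itself must be whitelisted and the path must be one
 * of the backing files.
 */
static int blk_open_allowed(const struct blk_policy *p, long nr,
                            const char *path)
{
    if (!blk_nr_allowed(p, nr) || path == NULL)
        return 0;
    for (size_t i = 0; i < p->n_backing; i++)
        if (strcmp(p->backing[i], path) == 0)
            return 1;
    return 0;
}
```

A real monitor would also have to deal with relative paths and symlinks,
and would need to read the path argument in a way that the sandboxed
code cannot change after the check.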



* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 19:43     ` Anthony Liguori
  2011-12-07 19:52       ` Michael Halcrow
@ 2011-12-07 20:02       ` Corey Bryant
  2011-12-07 20:54       ` Eric Paris
  2 siblings, 0 replies; 31+ messages in thread
From: Corey Bryant @ 2011-12-07 20:02 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson



On 12/07/2011 02:43 PM, Anthony Liguori wrote:
> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>>
>> Agreed.
>>
>>>> * The untrusted thread would be restricted by seccomp mode 1 and
>>>> would contain the device emulation code.
>>>
>>> I think the best strategy would allow for a device to run either in the
>>> untrusted thread or the trusted thread. This makes performance testing a
>>> bit easier and it also makes development a bit more natural.
>>>
>>
>> When you refer to the device running in the trusted thread, are you talking
>> about the case where you run QEMU without sandboxing support? I think we would
>> ideally like to add this new support such that if it is not enabled, QEMU will
>> still run as a single process and decomposition wouldn't occur.
>>
>>>> * The trusted helper thread would run beside the untrusted thread,
>>>> enabling the untrusted thread to make syscalls beyond read(),
>>>> write(), exit(), and sigreturn().
>>>
>>> I assume you mean process, not thread BTW?
>>>
>>
>> I do mean thread. When making calls on behalf of the seccomp'd thread, I think
>> there will be syscalls that must be called from the same address space. That's
>> where the trusted helper thread would come into play.
>>
>>>> * IPC communication mechanisms:
>>>>
>>>> * An IPC mechanism will be required to enable communication between
>>>> untrusted and trusted threads.
>>>>
>>>> * An IPC mechanism will also be required to enable communication
>>>> between the main QEMU process and device processes.
>>>
>>> IPC is easy. We have tons of infrastructure in QEMU for IPC (virtio,
>>> QMP, etc.). Please don't reinvent the wheel here.
>>>
>>
>> Ok
>>
>>>> * The communication mechanisms must provide secure communication,
>>>> be low overhead (easy to generate, parse, and validate), and must
>>>> play well with sVirt/LSMs.
>>>
>>> I don't see how sVirt/LSM fits into this but all of these requirements
>>> are also true for the other big untrusted thread that we interact with
>>> (the guest itself).
>>>
>>> My view is that we should treat the untrusted thread as an extension of
>>> the guest, and that the interface between the trusted thread and the
>>> untrusted thread should treat it simply as another machine type that
>>> presents a different (simpler) hardware abstraction.
>>>
>>
>> Yes, this makes sense. I think our biggest concern with IPC is that we don't
>> introduce a TOCTTOU opportunity for a device to change call parameters after
>> they've been checked and before the call is made on behalf of the sandboxed
>> thread. Shared memory that is writable by both the untrusted and trusted
>> threads could introduce this.
>
> This is no different than dealing with a guest. We have to handle this
> with virtio already.
>

Well that's good.

>>
>>>> * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>>>> Google Native Client's IMC, and shared memory.
>>>
>>> The actual mechanism doesn't really matter I think, but see above
>>> comments.
>>>
>>>> * If seccomp mode 2 support becomes available, decomposition of device
>>>> emulation into untrusted/trusted threads may not be necessary. This
>>>> could result in improved performance (no IPC overhead between trusted
>>>> and untrusted thread) and reduced complexity (no need for trusted
>>>> helper thread).
>>>
>>> If mode 2 is the Right Answer, then we shouldn't wait for it to become
>>> available. We should make it available by pushing it into the kernel.
>>>
>>> If we all agree that mode 2 is what we would use if it existed, then
>>> we have the answer to this discussion and we know what we need to
>>> go off and do.
>>>
>>
>> That would seem like the logical approach. I think there may be new mode 2
>> patches coming soon so we can see how they go over.
>
> I'd like to see what the whitelist would need to be for something like
> QEMU in mode 2. My biggest concern is that the whitelist would need to
> be so large that the practical security isn't all that much improved.

This may not tell the whole story.  These are the syscalls found to be
called with the following execution:

    qemu -hda harddrive.raw -boot c -m 256

access
brk
clock_gettime
clone
close
connect
dup
eventfd2
execve
fcntl64
fstat64
futex
getegid32
geteuid32
getgid32
getpeername
getrlimit
getsockname
gettimeofday
getuid32
ioctl
_llseek
madvise
mmap2
mprotect
munmap
nanosleep
open
poll
prctl
pread64
read
readlink
rt_sigaction
rt_sigprocmask
select
set_robust_list
set_thread_area
set_tid_address
shmat
shmctl
shmdt
shmget
signalfd
socket
stat64
tgkill
time
timer_create
timer_gettime
timer_settime
uname
write
writev

>
> Regards,
>
> Anthony Liguori
>

-- 
Regards,
Corey


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 19:43     ` Anthony Liguori
  2011-12-07 19:52       ` Michael Halcrow
  2011-12-07 20:02       ` Corey Bryant
@ 2011-12-07 20:54       ` Eric Paris
  2011-12-08  9:40         ` Stefan Hajnoczi
  2 siblings, 1 reply; 31+ messages in thread
From: Eric Paris @ 2011-12-07 20:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Stefan Hajnoczi, Corey Bryant, Michael Halcrow, qemu-devel,
	George Wilson, Paul Moore, Ashley D Lai, Avi Kivity,
	Richa Marwaha, Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell

On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
> On 12/07/2011 01:32 PM, Corey Bryant wrote:

> > That would seem like the logical approach. I think there may be new mode 2
> > patches coming soon so we can see how they go over.
> 
> I'd like to see what the whitelist would need to be for something like QEMU in
> mode 2.  My biggest concern is that the whitelist would need to be so large that
> the practical security isn't all that much improved.

When I prototyped my version of seccomp v2 for qemu a while back, I did
it by only looking at syscalls made after initial setup was completed
(i.e. the very last thing before main_loop() was to lock it down).  My
list was much shorter than even these:

+        __NR_brk,
+        __NR_close,
+        __NR_exit_group,
+        __NR_futex,
+        __NR_ioctl,
+        __NR_madvise,
+        __NR_mmap,
+        __NR_munmap,
+        __NR_read,
+        __NR_recvfrom,
+        __NR_recvmsg,
+        __NR_rt_sigaction,
+        __NR_select,
+        __NR_sendto,
+        __NR_tgkill,
+        __NR_timer_delete,
+        __NR_timer_gettime,
+        __NR_timer_settime,
+        __NR_write,
+        __NR_writev,

There is simple, obvious value here at negligible overhead, but every
proposal I know of, including mine, has been shot down by Ingo.  Ingo's
proposal is much more work and more overhead, but clearly more flexible.
His suggestions (and code from others based on those suggestions) have
been shot down by PeterZ.

So I feel like seccomp v2 is between a rock and a hard place.  We have
something that works really well, and could be a huge win for all sorts
of programs, but we don't seem to be able to get anything past the
damned-if-you-do, damned-if-you-don't NAKs.

(There's also a cgroup version of seccomp proposed, but I'm guessing it
will go just about as far as all the other versions)

-Eric


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 18:48 ` Anthony Liguori
  2011-12-07 19:32   ` Corey Bryant
@ 2011-12-07 21:20   ` Paul Moore
  2011-12-14 17:15     ` Serge E. Hallyn
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Moore @ 2011-12-07 21:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ashley D Lai, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	qemu-devel, Eric Paris, Radim Krčmář, Avi Kivity,
	Richa Marwaha, Amit Shah, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
> On 12/07/2011 12:25 PM, Corey Bryant wrote:
> > A group of us are starting to work on sandboxing QEMU device emulation
> > code. We're just getting started investigating various approaches, and
> > want to engage the community to gather input.
> 
> > Following are the design points that we are currently considering:
>
> To be perfectly honest, I think prototyping and measuring performance is
> going to be the only way to figure out the right approach here.

Agreed.  I'm currently working on a prototype to play around with some of the 
ideas discussed in this thread.  As soon as it is functional I'll send a 
pointer/patches/etc. to the list.

-- 
paul moore
virtualization @ redhat


* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 20:54       ` Eric Paris
@ 2011-12-08  9:40         ` Stefan Hajnoczi
  2011-12-11 10:50           ` Dor Laor
  0 siblings, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2011-12-08  9:40 UTC (permalink / raw)
  To: Eric Paris
  Cc: Richa Marwaha, Corey Bryant, Michael Halcrow, qemu-devel,
	George Wilson, Paul Moore, Ashley D Lai, Avi Kivity, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell

On Wed, Dec 7, 2011 at 8:54 PM, Eric Paris <eparis@redhat.com> wrote:
> On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
>> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>
>> > That would seem like the logical approach. I think there may be new mode 2
>> > patches coming soon so we can see how they go over.
>>
>> I'd like to see what the whitelist would need to be for something like QEMU in
>> mode 2.  My biggest concern is that the whitelist would need to be so large that
>> the practical security wasn't all that much improved.
>
> When I prototyped my version of seccomp v2 for qemu a while back I did
> it by only looking at syscalls after initial setup was completed (aka the
> very last thing before main_loop() was to lock it down).  My list was
> much shorter than even these:
>
> +        __NR_brk,
> +        __NR_close,
> +        __NR_exit_group,
> +        __NR_futex,
> +        __NR_ioctl,
> +        __NR_madvise,
> +        __NR_mmap,
> +        __NR_munmap,
> +        __NR_read,
> +        __NR_recvfrom,
> +        __NR_recvmsg,
> +        __NR_rt_sigaction,
> +        __NR_select,
> +        __NR_sendto,
> +        __NR_tgkill,
> +        __NR_timer_delete,
> +        __NR_timer_gettime,
> +        __NR_timer_settime,
> +        __NR_write,
> +        __NR_writev,
>
> There is simple obvious negligible overhead value here, but every
> proposal I know of, including mine, has been shot down by Ingo.  Ingo's
> proposal is much more work, more overhead, but clearly more flexible.
> His suggestions (and code based on those suggestions from others) has
> been shot down by PeterZ.
>
> So I feel like seccomp v2 is between a rock and a hard place.  We have
> something that works really well, and could be a huge win for all sorts
> of programs, but we don't seem to be able to get anything past the
> damned if you do, damned if you don't nak's.....
>
> (There's also a cgroup version of seccomp proposed, but I'm guessing it
> will go just about as far as all the other versions)

Still, these sorts of situations are overcome all the time.  Sometimes
it takes a while and several LWN.net articles about the drama but at
the end things can be worked out.

If we want to discuss the specifics of mode 2 and especially what Ingo
or Peter think then I think we should do it in a forum where they
participate.  Maybe their positions have changed.
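
For illustration only, here is a rough sketch of the kind of lockdown
Eric describes: install a syscall whitelist as the very last step before
the main loop.  It assumes the seccomp filter (BPF) interface that Linux
later gained via prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...), which
is not necessarily the interface of any prototype mentioned in this
thread.  build_filter() and lock_down() are made-up names, the whitelist
is only a subset of the list quoted above, and a real filter should also
check seccomp_data.arch.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Illustrative subset of the quoted whitelist (assumption, not QEMU's real list). */
static const int allowed[] = {
    SYS_brk, SYS_close, SYS_exit_group, SYS_futex, SYS_ioctl, SYS_madvise,
    SYS_mmap, SYS_munmap, SYS_read, SYS_write, SYS_writev,
};
#define N_ALLOWED (sizeof(allowed) / sizeof(allowed[0]))

/*
 * Layout: [0] load syscall nr, [1..N] one JEQ per allowed syscall,
 * [N+1] RET KILL (default), [N+2] RET ALLOW.
 * Returns the number of instructions written, or 0 if prog is too small.
 */
size_t build_filter(struct sock_filter *prog, size_t cap)
{
    size_t n = 0, i;

    assert(N_ALLOWED <= 255);            /* jump offsets are 8 bits wide */
    if (cap < N_ALLOWED + 3)
        return 0;

    prog[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                             offsetof(struct seccomp_data, nr));
    for (i = 0; i < N_ALLOWED; i++) {
        /* On a match, skip the remaining compares and the KILL return. */
        prog[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                 allowed[i],
                                                 (unsigned char)(N_ALLOWED - i), 0);
    }
    prog[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    prog[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return n;
}

/* Hypothetical hook: call once, right before entering the main loop. */
int lock_down(void)
{
    struct sock_filter prog[N_ALLOWED + 3];
    struct sock_fprog fprog;

    fprog.len = (unsigned short)build_filter(prog, sizeof(prog) / sizeof(prog[0]));
    fprog.filter = prog;

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}
```

With something like this in place, any syscall outside the list kills
the calling thread, so the size of the list is exactly the attack
surface Anthony is asking about.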

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 19:32   ` Corey Bryant
  2011-12-07 19:43     ` Anthony Liguori
@ 2011-12-08  9:47     ` Stefan Hajnoczi
  2011-12-08 14:39       ` Corey Bryant
  1 sibling, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2011-12-08  9:47 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Richa Marwaha, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On Wed, Dec 7, 2011 at 7:32 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
>
>
> On 12/07/2011 01:48 PM, Anthony Liguori wrote:
>>
>> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>>> * The trusted helper thread would run beside the untrusted thread,
>>> enabling the untrusted thread to make syscalls beyond read(),
>>> write(), exit(), and sigreturn().
>>
>>
>> I assume you mean process, not thread BTW?
>>
>
> I do mean thread.  When making calls on behalf of the seccomp'd thread, I
> think there will be syscalls that must be called from the same address
> space.  That's where the trusted helper thread would come into play.

It's worth pointing out that "isolation within the same process"
schemes work by running the trusted thread in a very special execution
environment.  It cannot trust memory and cannot use the stack for
control flow.  Everything must be done in registers.

This can be made to work but it's highly unportable across host
architectures and hard to make changes to the trusted helper because
you have to be so careful.

Stefan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-08  9:47     ` Stefan Hajnoczi
@ 2011-12-08 14:39       ` Corey Bryant
  0 siblings, 0 replies; 31+ messages in thread
From: Corey Bryant @ 2011-12-08 14:39 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Richa Marwaha, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson



On 12/08/2011 04:47 AM, Stefan Hajnoczi wrote:
> On Wed, Dec 7, 2011 at 7:32 PM, Corey Bryant<coreyb@linux.vnet.ibm.com>  wrote:
>>
>>
>> On 12/07/2011 01:48 PM, Anthony Liguori wrote:
>>>
>>> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>>>> * The trusted helper thread would run beside the untrusted thread,
>>>> enabling the untrusted thread to make syscalls beyond read(),
>>>> write(), exit(), and sigreturn().
>>>
>>>
>>> I assume you mean process, not thread BTW?
>>>
>>
>> I do mean thread.  When making calls on behalf of the seccomp'd thread, I
>> think there will be syscalls that must be called from the same address
>> space.  That's where the trusted helper thread would come into play.
>
> It's worth pointing out that "isolation within the same process"
> schemes work by running the trusted thread in a very special execution
> environment.  It cannot trust memory and cannot use the stack for
> control flow.  Everything must be done in registers.
>
> This can be made to work but it's highly unportable across host
> architectures and hard to make changes to the trusted helper because
> you have to be so careful.
>
> Stefan
>

That's a good point.  And maybe we would only need the trusted thread 
for a minimal number of syscalls that must be made from the same address 
space, like mmap.  I think another approach to safely making a call on 
behalf of an untrusted thread is to pass the call and parameters to a 
trusted process which sanitizes the parameters, writes them to memory 
shared with the trusted thread (read-only from the thread side), and the 
trusted thread can make the call.

-- 
Regards,
Corey

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 18:25 [Qemu-devel] [RFC] Device sandboxing Corey Bryant
  2011-12-07 18:48 ` Anthony Liguori
@ 2011-12-08 21:51 ` Blue Swirl
  2011-12-12 18:30   ` Corey Bryant
  2011-12-09 16:17 ` Paul Brook
  2 siblings, 1 reply; 31+ messages in thread
From: Blue Swirl @ 2011-12-08 21:51 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Michael Halcrow,
	qemu-devel, Eric Paris, Paul Moore, Radim Krčmář,
	Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Wed, Dec 7, 2011 at 18:25, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
> A group of us are starting to work on sandboxing QEMU device emulation
> code.  We're just getting started investigating various approaches, and
> want to engage the community to gather input.
>
> Following are the design points that we are currently considering:
>
> * Decompose QEMU into multiple processes:
>
>    * This could be done such that QEMU devices execute in separate
>      processes based on device type, e.g. all block devices in one
>      process and all network devices in a second process.  Another
>      alternative is executing a separate process per device.
>
>    * Decomposition would not only afford a level of security inherent
>      in process separation, it would also allow development of stricter
>      sVirt/SELinux policy for the decomposed QEMU processes (e.g. a
>      block device specific policy).  This would enable a true sandbox
>      with layers of defense.

I'd start by splitting QEMU into two processes: untrusted process
(which performs most of the work) running in a chroot() jail and
trusted process outside (handling access to drives, network etc.). The
untrusted process could then be split further later like you detail
below but this first step would already give minimal protection with a
reasonable amount of effort.
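
To make the split concrete, here is a toy sketch of the IPC skeleton
only: a child standing in for the untrusted process asks a parent
standing in for the trusted process for one "sector" over a socketpair,
and the parent validates the request before answering.  The message
layout, the tiny sector size, and all function names are invented for
this example, and the chroot() plus privilege drop is left as a comment
because it needs root.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SECTOR_SIZE 16   /* tiny sectors keep the demo readable */
#define NUM_SECTORS 4

struct blk_req  { uint32_t sector; };
struct blk_resp { int32_t status; uint8_t data[SECTOR_SIZE]; };

/* Trusted side: owns the backing store and validates every request. */
void trusted_serve_one(int fd, const uint8_t *disk)
{
    struct blk_req req;
    struct blk_resp resp;

    memset(&resp, 0, sizeof(resp));
    if (read(fd, &req, sizeof(req)) != (ssize_t)sizeof(req) ||
        req.sector >= NUM_SECTORS) {
        resp.status = -1;                 /* reject malformed or out-of-range */
    } else {
        memcpy(resp.data, disk + (size_t)req.sector * SECTOR_SIZE, SECTOR_SIZE);
    }
    if (write(fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp)) {
        /* Peer went away; nothing more to do in this sketch. */
    }
}

/* Untrusted side: holds nothing but the socket to the trusted process. */
int untrusted_read_sector(int fd, uint32_t sector, uint8_t *out)
{
    struct blk_req req = { sector };
    struct blk_resp resp;

    if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;
    if (read(fd, &resp, sizeof(resp)) != (ssize_t)sizeof(resp))
        return -1;
    if (resp.status != 0)
        return -1;
    memcpy(out, resp.data, SECTOR_SIZE);
    return 0;
}

/* Returns the first byte the child read, 255 if refused, -1 on setup error. */
int split_demo(uint32_t sector)
{
    static const uint8_t disk[NUM_SECTORS * SECTOR_SIZE] = {
        [0] = 'A', [SECTOR_SIZE] = 'B',
    };
    int sv[2], status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        uint8_t buf[SECTOR_SIZE];

        close(sv[0]);
        /* A real untrusted process would chroot() into an empty
         * directory and drop privileges here. */
        _exit(untrusted_read_sector(sv[1], sector, buf) == 0 ? buf[0] : 255);
    }
    close(sv[1]);
    trusted_serve_one(sv[0], disk);
    close(sv[0]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The point of the shape is that the jailed side never opens a drive
itself; everything it can reach goes through a request the trusted side
is free to refuse.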

> * Decompose the device emulation process further into an untrusted and
>  trusted thread:
>
>    * The untrusted thread would be restricted by seccomp mode 1 and
>      would contain the device emulation code.
>
>    * The trusted helper thread would run beside the untrusted thread,
>      enabling the untrusted thread to make syscalls beyond read(),
>      write(), exit(), and sigreturn().

Why limit this to device emulation only? Where in QEMU would this
approach not work?

But the problem here is that converting a single-threaded application
to multithreading, especially using only an API like seccomp, will not
be trivial or error-free work. Therefore I'd propose to start with
something simpler at first.

> * IPC communication mechanisms:
>
>    * An IPC mechanism will be required to enable communication between
>      untrusted and trusted threads.
>
>    * An IPC mechanism will also be required to enable communication
>      between the main QEMU process and device processes.
>
>    * The communication mechanisms must provide secure communication,
>      be low overhead (easy to generate, parse, and validate), and must
>      play well with sVirt/LSMs.
>
>    * Some thoughts for IPC mechanisms are Unix sockets, pipes, virtio,
>      Google Native Client's IMC, and shared memory.
>
> * If seccomp mode 2 support becomes available, decomposition of device
>  emulation into untrusted/trusted threads may not be necessary.  This
>  could result in improved performance (no IPC overhead between trusted
>  and untrusted thread) and reduced complexity (no need for trusted
>  helper thread).
>
> * Execution of QEMU with the sandboxed device support should be an
>  optional run-time specification.
>
> * We will be focusing on legacy devices first, both for performance and
>  risk reasons.
>
> Once we settle on a direction, we will develop a proof of concept to
> share with the community.
>
> We appreciate your input.
>
> Regards,
>
> Ashley Lai
> Corey Bryant
> Eduardo Otubo
> Michael Halcrow
> Paul Moore
> Richa Marwaha
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 18:25 [Qemu-devel] [RFC] Device sandboxing Corey Bryant
  2011-12-07 18:48 ` Anthony Liguori
  2011-12-08 21:51 ` Blue Swirl
@ 2011-12-09 16:17 ` Paul Brook
  2011-12-09 16:34   ` Paul Moore
                     ` (2 more replies)
  2 siblings, 3 replies; 31+ messages in thread
From: Paul Brook @ 2011-12-09 16:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, Eric Paris, Paul Moore,
	Radim Krčmář, Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

> A group of us are starting to work on sandboxing QEMU device emulation
> code.  We're just getting started investigating various approaches, and
> want to engage the community to gather input.
> 
> Following are the design points that we are currently considering:
> 
> * Decompose QEMU into multiple processes:
> 
>      * This could be done such that QEMU devices execute in separate
>        processes based on device type, e.g. all block devices in one
>        process and all network devices in a second process.  Another
>        alternative is executing a separate process per device.

I can't help wondering if nested virtualization would be a better solution.  
i.e. have an outer VM that only implements a trusted subset of devices. Inside 
that run a VM that provides the flakey legacy device emulation you expect to 
be compromised.

Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 16:17 ` Paul Brook
@ 2011-12-09 16:34   ` Paul Moore
  2011-12-09 17:32     ` Paul Brook
  2011-12-10 19:39   ` Blue Swirl
  2011-12-11  9:08   ` Avi Kivity
  2 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2011-12-09 16:34 UTC (permalink / raw)
  To: Paul Brook
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris,
	Radim Krčmář, Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Friday, December 09, 2011 04:17:50 PM Paul Brook wrote:
> > A group of us are starting to work on sandboxing QEMU device emulation
> > code.  We're just getting started investigating various approaches, and
> > want to engage the community to gather input.
> > 
> > Following are the design points that we are currently considering:
> > 
> > * Decompose QEMU into multiple processes:
> >      * This could be done such that QEMU devices execute in separate
> >        processes based on device type, e.g. all block devices in one
> >        process and all network devices in a second process.  Another
> >        alternative is executing a separate process per device.
> 
> I can't help wondering if nested virtualization would be a better solution.
> i.e. have an outer VM that only implements a trusted subset of devices.
> Inside that run a VM that provides the flakey legacy device emulation you
> expect to be compromised.

A few questions about this approach come to mind:

1. Does nested virtualization work across all the different hardware assisted 
virtualization platforms/CPUs?

2. What is the additional performance overhead for nested virtualization?  
Generalizations are okay, I'm just trying to get a basic understanding.

3. What, if any, management concerns are there with nested virtualization?

-- 
paul moore
virtualization @ redhat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 16:34   ` Paul Moore
@ 2011-12-09 17:32     ` Paul Brook
  2011-12-09 17:49       ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Brook @ 2011-12-09 17:32 UTC (permalink / raw)
  To: Paul Moore
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris,
	Radim Krčmář, Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

> On Friday, December 09, 2011 04:17:50 PM Paul Brook wrote:
> > > A group of us are starting to work on sandboxing QEMU device emulation
> > > code.  We're just getting started investigating various approaches, and
> > > want to engage the community to gather input.
> > > 
> > > Following are the design points that we are currently considering:
> > > 
> > > * Decompose QEMU into multiple processes:
> > >      * This could be done such that QEMU devices execute in separate
> > >        processes based on device type, e.g. all block devices in one
> > >        process and all network devices in a second process.  Another
> > >        alternative is executing a separate process per device.
> > 
> > I can't help wondering if nested virtualization would be a better
> > solution. i.e. have an outer VM that only implements a trusted subset of
> > devices. Inside that run a VM that provides the flakey legacy device
> > emulation you expect to be compromised.
> 
> A few questions about this approach come to mind:
> 
> 1. Does nested virtualization work across all the different hardware
> assisted virtualization platforms/CPUs?
> 
> 2. What is the additional performance overhead for nested virtualization?
> Generalizations are okay, I'm just trying to get a basic understanding.
> 
> 3. What, if any, management concerns are there with nested virtualization?

I don't have good answers to any of these questions. Then again I doubt anyone 
has good answers for your proposed process splitting either.

Last time I checked at least one of the Intel/AMD schemes had been 
implemented, though I don't know if it's been merged, or had any serious 
performance tuning.  My main intent was to raise this as a potentially viable 
alternative.  Someone who actually cares about the answer can figure out the 
details and cobble together some benchmarks :-)

Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 17:32     ` Paul Brook
@ 2011-12-09 17:49       ` Paul Moore
  2011-12-09 18:46         ` Paul Brook
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2011-12-09 17:49 UTC (permalink / raw)
  To: Paul Brook
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris,
	Radim Krčmář, Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Friday, December 09, 2011 05:32:19 PM Paul Brook wrote:
> > On Friday, December 09, 2011 04:17:50 PM Paul Brook wrote:
> > > > A group of us are starting to work on sandboxing QEMU device
> > > > emulation code.  We're just getting started investigating
> > > > various approaches, and want to engage the community to gather
> > > > input.
> > > > 
> > > > Following are the design points that we are currently
> > > > considering:
> > > > 
> > > > * Decompose QEMU into multiple processes:
> > > >      * This could be done such that QEMU devices execute in separate
> > > >        processes based on device type, e.g. all block devices in one
> > > >        process and all network devices in a second process.  Another
> > > >        alternative is executing a separate process per device.
> > > 
> > > I can't help wondering if nested virtualization would be a better
> > > solution. i.e. have an outer VM that only implements a trusted
> > > subset of devices. Inside that run a VM that provides the flakey
> > > legacy device emulation you expect to be compromised.
> > 
> > A few questions about this approach come to mind:
> > 
> > 1. Does nested virtualization work across all the different hardware
> > assisted virtualization platforms/CPUs?
> > 
> > 2. What is the additional performance overhead for nested
> > virtualization?
> > Generalizations are okay, I'm just trying to get a basic understanding.
> > 
> > 3. What, if any, management concerns are there with nested
> > virtualization?
> I don't have good answers to any of these questions. Then again I doubt
> anyone has good answers for your proposed process splitting either.

That's why we're working on a prototype.  The questions weren't intended to be 
adversarial, just questions that I didn't know the answers to and thought you 
might ...

> Last time I checked at least one of the Intel/AMD schemes had been
> implemented, though I don't know if it's been merged, or had any serious
> performance tuning.  My main intent was to raise this as a potentially
> viable alternative.  Someone who actually cares about the answer can figure
> out the details and cobble together some benchmarks :-)

Well, if we see no answers and see no interest it probably isn't a viable 
alternative as no interest typically means no code.

-- 
paul moore
virtualization @ redhat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 17:49       ` Paul Moore
@ 2011-12-09 18:46         ` Paul Brook
  2011-12-09 18:50           ` Paul Moore
  2011-12-09 18:59           ` Paul Brook
  0 siblings, 2 replies; 31+ messages in thread
From: Paul Brook @ 2011-12-09 18:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	Eric Paris, Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha,
	Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

> > Last time I checked at least one of the Intel/AMD schemes had been
> > implemented, though I don't know if it's been merged, or had any serious
> > performance tuning.  My main intent was to raise this as a potentially
> > viable alternative.  Someone who actually cares about the answer can
> > figure out the details and cobble together some benchmarks :-)
> 
> Well, if we see no answers and see no interest it probably isn't a viable
> alternative as no interest typically means no code.

You're using circular logic.  Based on that theory your proposal isn't viable 
either. If it was, someone would have done it already!

Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 18:46         ` Paul Brook
@ 2011-12-09 18:50           ` Paul Moore
  2011-12-09 18:59           ` Paul Brook
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Moore @ 2011-12-09 18:50 UTC (permalink / raw)
  To: Paul Brook
  Cc: Anthony Liguori, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	qemu-devel, Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha,
	Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Friday, December 09, 2011 06:46:59 PM Paul Brook wrote:
> > > Last time I checked at least one of the Intel/AMD schemes had been
> > > implemented, though I don't know if it's been merged, or had any
> > > serious performance tuning.  My main intent was to raise this as a
> > > potentially viable alternative.  Someone who actually cares about
> > > the answer can figure out the details and cobble together some
> > > benchmarks :-)
> >
> > Well, if we see no answers and see no interest it probably isn't a
> > viable alternative as no interest typically means no code.
> 
> You're using circular logic.  Based on that theory your proposal isn't
> viable either. If it was, someone would have done it already!

Did you miss the part where we are working on a prototype?  To me that signals 
interest and code.

-- 
paul moore
virtualization @ redhat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 18:46         ` Paul Brook
  2011-12-09 18:50           ` Paul Moore
@ 2011-12-09 18:59           ` Paul Brook
  2011-12-09 19:17             ` Paul Moore
  1 sibling, 1 reply; 31+ messages in thread
From: Paul Brook @ 2011-12-09 18:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	Eric Paris, Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha,
	Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

> > > Last time I checked at least one of the Intel/AMD schemes had been
> > > implemented, though I don't know if it's been merged, or had any
> > > serious performance tuning.  My main intent was to raise this as a
> > > potentially viable alternative.  Someone who actually cares about the
> > > answer can figure out the details and cobble together some benchmarks
> > > :-)
> > 
> > Well, if we see no answers and see no interest it probably isn't a viable
> > alternative as no interest typically means no code.
> 
> You're using circular logic.  Based on that theory your proposal isn't
> viable either. If it was someone would have done it already!

... and to be clear, the reason I don't care is because you're trying to solve 
a problem that doesn't interest me.  I can see the benefit you're trying to 
achieve, but for my workloads once the guest genie gets out of the bottle 
you've already lost.

Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 18:59           ` Paul Brook
@ 2011-12-09 19:17             ` Paul Moore
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Moore @ 2011-12-09 19:17 UTC (permalink / raw)
  To: Paul Brook
  Cc: Anthony Liguori, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	qemu-devel, Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha,
	Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Friday, December 09, 2011 06:59:29 PM Paul Brook wrote:
> ... and to be clear, the reason I don't care is because you're trying to
> solve a problem that doesn't interest me.

That's fine with me, the world would be a very boring place if we all shared 
the same opinions and interests.

-- 
paul moore
virtualization @ redhat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 16:17 ` Paul Brook
  2011-12-09 16:34   ` Paul Moore
@ 2011-12-10 19:39   ` Blue Swirl
  2011-12-11  9:08   ` Avi Kivity
  2 siblings, 0 replies; 31+ messages in thread
From: Blue Swirl @ 2011-12-10 19:39 UTC (permalink / raw)
  To: Paul Brook
  Cc: Anthony Liguori, Stefan Hajnoczi, Corey Bryant, Michael Halcrow,
	qemu-devel, Eric Paris, Paul Moore, Ashley D Lai, Avi Kivity,
	Richa Marwaha, Amit Shah, Radim Krčmář,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On Fri, Dec 9, 2011 at 16:17, Paul Brook <paul@codesourcery.com> wrote:
>> A group of us are starting to work on sandboxing QEMU device emulation
>> code.  We're just getting started investigating various approaches, and
>> want to engage the community to gather input.
>>
>> Following are the design points that we are currently considering:
>>
>> * Decompose QEMU into multiple processes:
>>
>>      * This could be done such that QEMU devices execute in separate
>>        processes based on device type, e.g. all block devices in one
>>        process and all network devices in a second process.  Another
>>        alternative is executing a separate process per device.
>
> I can't help wondering if nested virtualization would be a better solution.
> i.e. have an outer VM that only implements a trusted subset of devices. Inside
> that run a VM that provides the flakey legacy device emulation you expect to
> be compromised.

I think Anthony has proposed this also, taking virtio devices as the
trusted subset.

A similar effect from a security point of view could be achieved by
forcing the guest to only use the trusted subset of devices; then the
outer VM equals the inner VM. Nesting may provide an additional layer of
defense, and the subset may limit guest OS selection.

I proposed once a setup where the outer VM would use an encrypted
instruction set (maybe also I/O registers should be encrypted). Then
the guest in the inner VM would have trouble guessing how to break
into it (or how to exploit a vulnerability) or it could try to break
directly into the (presumably x86) host which could be more difficult.
It would not be very difficult for hardware designers to add hardware
support for this, though; without that, performance would be bad (less
than TCG efficiency squared).

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-09 16:17 ` Paul Brook
  2011-12-09 16:34   ` Paul Moore
  2011-12-10 19:39   ` Blue Swirl
@ 2011-12-11  9:08   ` Avi Kivity
  2 siblings, 0 replies; 31+ messages in thread
From: Avi Kivity @ 2011-12-11  9:08 UTC (permalink / raw)
  To: Paul Brook
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris, Paul Moore,
	Radim Krčmář, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson

On 12/09/2011 06:17 PM, Paul Brook wrote:
> > A group of us are starting to work on sandboxing QEMU device emulation
> > code.  We're just getting started investigating various approaches, and
> > want to engage the community to gather input.
> > 
> > Following are the design points that we are currently considering:
> > 
> > * Decompose QEMU into multiple processes:
> > 
> >      * This could be done such that QEMU devices execute in separate
> >        processes based on device type, e.g. all block devices in one
> >        process and all network devices in a second process.  Another
> >        alternative is executing a separate process per device.
>
> I can't help wondering if nested virtualization would be a better solution.  
> i.e. have an outer VM that only implements a trusted subset of devices. Inside 
> that run a VM that provides the flakey legacy device emulation you expect to 
> be compromised.

Nested virtualization is going to be painfully slow.  We did consider
side-by-side virtualization: both the guest and the device model run in
separate VM containers (this is what Xen does, except it uses
paravirtualization for the device model).  It's going to be more
expensive than the other forms of sandboxing, though, due to the heavier
context switch penalty.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-08  9:40         ` Stefan Hajnoczi
@ 2011-12-11 10:50           ` Dor Laor
  2011-12-12 18:54             ` Will Drewry
  0 siblings, 1 reply; 31+ messages in thread
From: Dor Laor @ 2011-12-11 10:50 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Radim Krčmář, wad, Richa Marwaha, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris, Paul Moore,
	George Wilson, Avi Kivity, Amit Shah, Ashley D Lai,
	Eduardo Terrell Ferrari Otubo, Lee Terrell

On 12/08/2011 11:40 AM, Stefan Hajnoczi wrote:
> On Wed, Dec 7, 2011 at 8:54 PM, Eric Paris<eparis@redhat.com>  wrote:
>> On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
>>> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>>
>>>> That would seem like the logical approach. I think there may be new mode 2
>>>> patches coming soon so we can see how they go over.
>>>
>>> I'd like to see what the whitelist would need to be for something like QEMU in
>>> mode 2.  My biggest concern is that the whitelist would need to be so large that
>>> the practical security wasn't all that much improved.
>>
>> When I prototyped my version of seccomp v2 for qemu a while back I did
>> it by only looking at syscalls after initial setup was completed (aka the
>> very last thing before main_loop() was to lock it down).  My list was
>> much shorter than even these:
>>
>> +        __NR_brk,
>> +        __NR_close,
>> +        __NR_exit_group,
>> +        __NR_futex,
>> +        __NR_ioctl,
>> +        __NR_madvise,
>> +        __NR_mmap,
>> +        __NR_munmap,
>> +        __NR_read,
>> +        __NR_recvfrom,
>> +        __NR_recvmsg,
>> +        __NR_rt_sigaction,
>> +        __NR_select,
>> +        __NR_sendto,
>> +        __NR_tgkill,
>> +        __NR_timer_delete,
>> +        __NR_timer_gettime,
>> +        __NR_timer_settime,
>> +        __NR_write,
>> +        __NR_writev,
>>
>> There is simple obvious negligible overhead value here, but every
>> proposal I know of, including mine, has been shot down by Ingo.  Ingo's
>> proposal is much more work, more overhead, but clearly more flexible.
>> His suggestions (and code based on those suggestions from others) has
>> been shot down by PeterZ.
>>
>> So I feel like seccomp v2 is between a rock and a hard place.  We have
>> something that works really well, and could be a huge win for all sorts
>> of programs, but we don't seem to be able to get anything past the
>> damned if you do, damned if you don't nak's.....
>>
>> (There's also a cgroup version of seccomp proposed, but I'm guessing it
>> will go just about as far as all the other versions)
>
> Still, these sorts of situations are overcome all the time.  Sometimes
> it takes a while and several LWN.net articles about the drama but at
> the end things can be worked out.
>
> If we want to discuss the specifics of mode 2 and especially what Ingo
> or Peter think then I think we should do it in a forum where they
> participate.  Maybe their positions have changed.

Will, a little bird whispered that you're going to send another iteration 
w/ higher acceptance chances. Where do we stand w/ it? Can you please 
elaborate on its chances of getting merged?

>
> Stefan
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-08 21:51 ` Blue Swirl
@ 2011-12-12 18:30   ` Corey Bryant
  0 siblings, 0 replies; 31+ messages in thread
From: Corey Bryant @ 2011-12-12 18:30 UTC (permalink / raw)
  To: Blue Swirl
  Cc: Ashley D Lai, Anthony Liguori, Stefan Hajnoczi, Michael Halcrow,
	qemu-devel, Eric Paris, Paul Moore, Radim Krčmář,
	Avi Kivity, Richa Marwaha, Amit Shah,
	Eduardo Terrell Ferrari Otubo, Lee Terrell, George Wilson



On 12/08/2011 04:51 PM, Blue Swirl wrote:
> Why limit this to device emulation only? Where in QEMU would this
> approach not work?

That's a good point, and we've thrown this idea around.  I don't know if 
there's any reason why this approach wouldn't work for all of QEMU.  The 
idea for now though is to target the most vulnerable code, devices.

-- 
Regards,
Corey

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-11 10:50           ` Dor Laor
@ 2011-12-12 18:54             ` Will Drewry
  0 siblings, 0 replies; 31+ messages in thread
From: Will Drewry @ 2011-12-12 18:54 UTC (permalink / raw)
  To: dlaor
  Cc: Radim Krčmář, Stefan Hajnoczi, Corey Bryant,
	Michael Halcrow, qemu-devel, Eric Paris, Paul Moore,
	George Wilson, Avi Kivity, Richa Marwaha, Amit Shah, Ashley D Lai,
	Eduardo Terrell Ferrari Otubo, Lee Terrell

On Sun, Dec 11, 2011 at 4:50 AM, Dor Laor <dlaor@redhat.com> wrote:
> On 12/08/2011 11:40 AM, Stefan Hajnoczi wrote:
>>
>> On Wed, Dec 7, 2011 at 8:54 PM, Eric Paris<eparis@redhat.com>  wrote:
>>>
>>> On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
>>>>
>>>> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>>>
>>>
>>>>> That would seem like the logical approach. I think there may be new
>>>>> mode 2 patches coming soon so we can see how they go over.
>>>>
>>>>
>>>> I'd like to see what the whitelist would need to be for something like
>>>> QEMU in mode 2.  My biggest concern is that the whitelist would need to
>>>> be so large that the practical security wouldn't be all that much
>>>> improved.
>>>
>>>
>>> When I prototyped my version of seccomp v2 for qemu a while back I did
>>> it by only looking at syscalls after initial setup was completed (aka the
>>> very last thing before main_loop() was to lock it down).  My list was
>>> much shorter than even these:
>>>
>>> +        __NR_brk,
>>> +        __NR_close,
>>> +        __NR_exit_group,
>>> +        __NR_futex,
>>> +        __NR_ioctl,
>>> +        __NR_madvise,
>>> +        __NR_mmap,
>>> +        __NR_munmap,
>>> +        __NR_read,
>>> +        __NR_recvfrom,
>>> +        __NR_recvmsg,
>>> +        __NR_rt_sigaction,
>>> +        __NR_select,
>>> +        __NR_sendto,
>>> +        __NR_tgkill,
>>> +        __NR_timer_delete,
>>> +        __NR_timer_gettime,
>>> +        __NR_timer_settime,
>>> +        __NR_write,
>>> +        __NR_writev,
>>>
>>> There is simple, obvious value here at negligible overhead, but every
>>> proposal I know of, including mine, has been shot down by Ingo.  Ingo's
>>> proposal is much more work, more overhead, but clearly more flexible.
>>> His suggestions (and code based on those suggestions from others) have
>>> been shot down by PeterZ.
>>>
>>> So I feel like seccomp v2 is between a rock and a hard place.  We have
>>> something that works really well, and could be a huge win for all sorts
>>> of programs, but we don't seem to be able to get anything past the
>>> damned if you do, damned if you don't nak's.....
>>>
>>> (There's also a cgroup version of seccomp proposed, but I'm guessing it
>>> will go just about as far as all the other versions)
>>
>>
>> Still, these sorts of situations are overcome all the time.  Sometimes
>> it takes a while and several LWN.net articles about the drama but at
>> the end things can be worked out.
>>
>> If we want to discuss the specifics of mode 2 and especially what Ingo
>> or Peter think then I think we should do it in a forum where they
>> participate.  Maybe their positions have changed.
>
>
> Will, a little bird whispered that you're going to send another iteration w/
> higher acceptance chances. Where do we stand w/ it? Can you please elaborate
> on its chances of getting merged?

Hi - yup. I keep getting delayed with other work, but I still plan to
send it soon.  The first plan was to port the updated patchset to
Linus's tree, then repost. I plan on adding a cover letter (0/x) to
the series to see if we can get discussion going again on it. I'm not
sure what the merge chance is, to be honest. I believe the patch is
desirable to many parties, but Ingo didn't feel it was appropriate
given that it would add an ABI that was not part of the trace/perf
ABI.  Given that grand-unified trace has been held up on stable
internal API points and other challenges, I would hope that we could
continue on using a prctl for seccomp_filter, even if it becomes a
compatibility layer which is eventually switched off for most users.

I did spend some time looking at other approaches (making the syscall
table a namespace, cgroups syscalls like Łukasz Sowa's patch, etc),
but using the seccomp-centered approach still seems to make sense from
a process view and a seccomp view - and who wants a new syscall anyway
:)  The last thing I am looking at before I post is seeing if it'd be
possible to avoid touching the ftrace filter engine at all for
ipc, socketcall, ioctl, etc, but I'm not sure that it makes sense to
not use the generic system that exists today.
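
For illustration only: the post-setup whitelist Eric quoted above can be
modelled in userspace as a plain table lookup.  This is a sketch of the
whitelist idea, not code from any posted patch and not the kernel
interface.  The names allowed_syscalls and syscall_allowed() are made up
for the example, and the #ifdef around __NR_select is an assumption to
keep it building on architectures whose headers lack that syscall.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>   /* __NR_* syscall numbers (Linux only) */

/* The post-setup whitelist quoted earlier in the thread. */
static const long allowed_syscalls[] = {
    __NR_brk, __NR_close, __NR_exit_group, __NR_futex, __NR_ioctl,
    __NR_madvise, __NR_mmap, __NR_munmap, __NR_read, __NR_recvfrom,
    __NR_recvmsg, __NR_rt_sigaction,
#ifdef __NR_select          /* assumption: not defined on every architecture */
    __NR_select,
#endif
    __NR_sendto, __NR_tgkill, __NR_timer_delete, __NR_timer_gettime,
    __NR_timer_settime, __NR_write, __NR_writev,
};

/* Hypothetical helper: true if syscall number nr is on the whitelist. */
static bool syscall_allowed(long nr)
{
    size_t n = sizeof(allowed_syscalls) / sizeof(allowed_syscalls[0]);

    for (size_t i = 0; i < n; i++) {
        if (allowed_syscalls[i] == nr) {
            return true;
        }
    }
    return false;
}
```

With this table, syscall_allowed(__NR_read) is true while
syscall_allowed(__NR_execve) is false.  A real filter would make the
same decision inside the kernel on every syscall entry.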

cheers!
will

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-07 21:20   ` Paul Moore
@ 2011-12-14 17:15     ` Serge E. Hallyn
  2011-12-14 23:56       ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Serge E. Hallyn @ 2011-12-14 17:15 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stefan Hajnoczi, Corey Bryant, Michael Halcrow, qemu-devel,
	Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

Quoting Paul Moore (pmoore@redhat.com):
> On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
> > On 12/07/2011 12:25 PM, Corey Bryant wrote:
> > > A group of us are starting to work on sandboxing QEMU device emulation
> > > code. We're just getting started investigating various approaches, and
> > > want to engage the community to gather input.
> > 
> > > Following are the design points that we are currently considering:
> >
> > To be perfectly honest, I think prototyping and measuring performance is
> > going to be the only way to figure out the right approach here.
> 
> Agreed.  I'm currently working on a prototype to play around with some of the 
> ideas discussed in this thread.  As soon as it is functional I'll send a 
> pointer/patches/etc. to the list.

Hey Paul,

just wondering, exactly which approach(es) are you prototyping?  Are you
touching seccomp2?

thanks,
-serge

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-14 17:15     ` Serge E. Hallyn
@ 2011-12-14 23:56       ` Paul Moore
  2011-12-15 14:28         ` Corey Bryant
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2011-12-14 23:56 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Stefan Hajnoczi, Corey Bryant, Michael Halcrow, qemu-devel,
	Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On Wednesday, December 14, 2011 11:15:58 AM Serge E. Hallyn wrote:
> Quoting Paul Moore (pmoore@redhat.com):
> > On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
> > > On 12/07/2011 12:25 PM, Corey Bryant wrote:
> > > > A group of us are starting to work on sandboxing QEMU device
> > > > emulation code. We're just getting started investigating
> > > > various approaches, and want to engage the community to gather
> > > > input.
> > > 
> > > > Following are the design points that we are currently considering:
> > > To be perfectly honest, I think prototyping and measuring
> > > performance is going to be the only way to figure out the right
> > > approach here.
> >
> > Agreed.  I'm currently working on a prototype to play around with some
> > of the ideas discussed in this thread.  As soon as it is functional
> > I'll send a pointer/patches/etc. to the list.
> 
> Hey Paul,
> 
> just wondering, exactly which approach(es) are you prototyping?  Are you
> touching seccomp2?

The decomposed approach as I felt (well, still do for that matter) that the 
enhanced seccomp stuff could be put to even better use in a decomposed mode of 
operation.

However, earlier this week those of us involved in this effort were strongly 
discouraged (this probably isn't the best term to use, but there is a reason 
I'm a programmer and not an English student) from pursuing the decomposed
prototype further so work on it has dropped off considerably.

I still think it is worth pursuing, if for no other reason than to answer 
questions that right now we can only answer with educated guesses, but it is 
no longer my main focus.  If anyone else is interested in this feel free to 
drop me some email and I can bring you up to speed on the current status.

As far as the enhanced seccomp patches for QEMU, I believe Corey said that IBM 
was starting work on a prototype based on the patches that Will posted earlier 
this year.  I don't expect this change to be very substantial, the hard part 
will be determining the syscall filter and maintaining it over time.

-- 
paul moore
virtualization @ redhat

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-14 23:56       ` Paul Moore
@ 2011-12-15 14:28         ` Corey Bryant
  2011-12-15 15:14           ` Serge Hallyn
  0 siblings, 1 reply; 31+ messages in thread
From: Corey Bryant @ 2011-12-15 14:28 UTC (permalink / raw)
  To: Paul Moore
  Cc: Serge E. Hallyn, Stefan Hajnoczi, Michael Halcrow, qemu-devel,
	Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson



On 12/14/2011 06:56 PM, Paul Moore wrote:
> On Wednesday, December 14, 2011 11:15:58 AM Serge E. Hallyn wrote:
>> Quoting Paul Moore (pmoore@redhat.com):
>>> On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
>>>> On 12/07/2011 12:25 PM, Corey Bryant wrote:
>>>>> A group of us are starting to work on sandboxing QEMU device
>>>>> emulation code. We're just getting started investigating
>>>>> various approaches, and want to engage the community to gather
>>>>> input.
>>>>
>>>>> Following are the design points that we are currently considering:
>>>> To be perfectly honest, I think prototyping and measuring
>>>> performance is going to be the only way to figure out the right
>>>> approach here.
>>>
>>> Agreed.  I'm currently working on a prototype to play around with some
>>> of the ideas discussed in this thread.  As soon as it is functional
>>> I'll send a pointer/patches/etc. to the list.
>>
>> Hey Paul,
>>
>> just wondering, exactly which approach(es) are you prototyping?  Are you
>> touching seccomp2?
>
> The decomposed approach as I felt (well, still do for that matter) that the
> enhanced seccomp stuff could be put to even better use in a decomposed mode of
> operation.
>
> However, earlier this week those of us involved in this effort were strongly
> discouraged (this probably isn't the best term to use, but there is a reason
> I'm a programmer and not an English student) from pursuing the decomposed
> prototype further so work on it has dropped off considerably.
>
> I still think it is worth pursuing, if for no other reason than to answer
> questions that right now we can only answer with educated guesses, but it is
> no longer my main focus.  If anyone else is interested in this feel free to
> drop me some email and I can bring you up to speed on the current status.
>
> As far as the enhanced seccomp patches for QEMU, I believe Corey said that IBM
> was starting work on a prototype based on the patches that Will posted earlier
> this year.  I don't expect this change to be very substantial, the hard part
> will be determining the syscall filter and maintaining it over time.
>

Paul covered the current state of affairs above so I won't expand on 
that much.  One of the major concerns from the QEMU community revolved 
around the maintenance complexity introduced by decomposing QEMU into 
separate processes, and that patches doing so were unlikely to be accepted.

With that in mind we're going to pursue a single process mode 2 
approach.  We'll put together a trivial prototype for evaluation 
purposes.  Like Paul mentioned, one of the complex parts is determining 
the correct call parameter filters, and there will be tweaking required 
as new syscalls/parameters are introduced in the future.  But the 
biggest hurdle is getting mode 2 patches into the mainline kernel, which 
has been an unsuccessful effort for a few years now.
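
To make the "call parameter filters" point concrete, here is a rough
userspace model of a per-argument rule table.  It is a sketch under
stated assumptions, not the mode 2 interface: struct arg_rule, rules
and call_allowed() are hypothetical names, and the policy itself is
invented for illustration (the 0xAE80 request value is KVM_RUN on
Linux, used here only as a recognisable example).

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>   /* __NR_* syscall numbers (Linux only) */

#define ANY_ARGS (-1)

/* Illustrative value only: 0xAE80 is KVM_RUN on Linux. */
#define EXAMPLE_IOCTL_REQ 0xAE80UL

/* Hypothetical rule: allow a syscall, optionally pinning one argument. */
struct arg_rule {
    long nr;              /* syscall number */
    int arg_index;        /* argument to compare (0-5), or ANY_ARGS */
    unsigned long value;  /* required value when arg_index != ANY_ARGS */
};

/* Invented policy: read/write with any arguments, ioctl with one request. */
static const struct arg_rule rules[] = {
    { __NR_read,  ANY_ARGS, 0 },
    { __NR_write, ANY_ARGS, 0 },
    { __NR_ioctl, 1, EXAMPLE_IOCTL_REQ },  /* ioctl(fd, request, ...) */
};

/* True if some rule matches both the syscall number and its argument. */
static bool call_allowed(long nr, const unsigned long args[6])
{
    size_t n = sizeof(rules) / sizeof(rules[0]);

    for (size_t i = 0; i < n; i++) {
        const struct arg_rule *r = &rules[i];

        if (r->nr != nr) {
            continue;
        }
        if (r->arg_index == ANY_ARGS || args[r->arg_index] == r->value) {
            return true;
        }
    }
    return false;
}
```

Every new ioctl request or flag value that QEMU starts using would need
another row in such a table, which is the maintenance burden described
above.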

-- 
Regards,
Corey

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-15 14:28         ` Corey Bryant
@ 2011-12-15 15:14           ` Serge Hallyn
  2011-12-15 15:35             ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Serge Hallyn @ 2011-12-15 15:14 UTC (permalink / raw)
  To: Corey Bryant
  Cc: Stefan Hajnoczi, Michael Halcrow, qemu-devel, Eric Paris,
	Paul Moore, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

Quoting Corey Bryant (coreyb@linux.vnet.ibm.com):
> 
> 
> On 12/14/2011 06:56 PM, Paul Moore wrote:
> >On Wednesday, December 14, 2011 11:15:58 AM Serge E. Hallyn wrote:
> >>Quoting Paul Moore (pmoore@redhat.com):
> >>>On Wednesday, December 07, 2011 12:48:16 PM Anthony Liguori wrote:
> >>>>On 12/07/2011 12:25 PM, Corey Bryant wrote:
> >>>>>A group of us are starting to work on sandboxing QEMU device
> >>>>>emulation code. We're just getting started investigating
> >>>>>various approaches, and want to engage the community to gather
> >>>>>input.
> >>>>
> >>>>>Following are the design points that we are currently considering:
> >>>>To be perfectly honest, I think prototyping and measuring
> >>>>performance is going to be the only way to figure out the right
> >>>>approach here.
> >>>
> >>>Agreed.  I'm currently working on a prototype to play around with some
> >>>of the ideas discussed in this thread.  As soon as it is functional
> >>>I'll send a pointer/patches/etc. to the list.
> >>
> >>Hey Paul,
> >>
> >>just wondering, exactly which approach(es) are you prototyping?  Are you
> >>touching seccomp2?
> >
> >The decomposed approach as I felt (well, still do for that matter) that the
> >enhanced seccomp stuff could be put to even better use in a decomposed mode of
> >operation.
> >
> >However, earlier this week those of us involved in this effort were strongly
> >discouraged (this probably isn't the best term to use, but there is a reason
> >I'm a programmer and not an English student) from pursuing the decomposed
> >prototype further so work on it has dropped off considerably.
> >
> >I still think it is worth pursuing, if for no other reason than to answer
> >questions that right now we can only answer with educated guesses, but it is
> >no longer my main focus.  If anyone else is interested in this feel free to
> >drop me some email and I can bring you up to speed on the current status.

Thanks, Paul.  I don't know for sure that I'll have time, but I'd
definitely be interested in anything you have about the current status
of that approach.  On my own I would've pursued the seccomp2 way
if only because I'll be doing the same for lxc, but if no one else
is following up on decomposition I might take a look over break.
And as you say, if the design ends up being maintainable and with
acceptable performance overhead, I have no doubt it would be well
merged with seccomp2.

> >As far as the enhanced seccomp patches for QEMU, I believe Corey said that IBM
> >was starting work on a prototype based on the patches that Will posted earlier
> >this year.  I don't expect this change to be very substantial, the hard part
> >will be determining the syscall filter and maintaining it over time.
> >
> 
> Paul covered the current state of affairs above so I won't expand on
> that much.  One of the major concerns from the QEMU community
> revolved around the maintenance complexity introduced by decomposing
> QEMU into separate processes, and that patches doing so were
> unlikely to be accepted.
> 
> With that in mind we're going to pursue a single process mode 2
> approach.  We'll put together a trivial prototype for evaluation
> purposes.  Like Paul mentioned, one of the complex parts is
> determining the correct call parameter filters, and there will be
> tweaking required as new syscalls/parameters are introduced in the
> future.  But the biggest hurdle is getting mode 2 patches into the
> mainline kernel, which has been an unsuccessful effort for a few
> years now.

I might be wrong but I think that's a bit overly pessimistic :)  Pretty
sure it's only been a few months.  Compared to some other things like
checkpoint/restart and user namespaces, it's positively on a fast track.
And if qemu demonstrates true value, that can only help.

thanks,
-serge

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-15 15:14           ` Serge Hallyn
@ 2011-12-15 15:35             ` Paul Moore
  2011-12-15 16:05               ` Serge Hallyn
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2011-12-15 15:35 UTC (permalink / raw)
  To: Serge Hallyn
  Cc: Stefan Hajnoczi, Corey Bryant, Michael Halcrow, qemu-devel,
	Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

On Thursday, December 15, 2011 09:14:11 AM Serge Hallyn wrote:
> Quoting Corey Bryant (coreyb@linux.vnet.ibm.com):
> > On 12/14/2011 06:56 PM, Paul Moore wrote:
> > >On Wednesday, December 14, 2011 11:15:58 AM Serge E. Hallyn wrote:
> > >>Hey Paul,
> > >>
> > >>just wondering, exactly which approach(es) are you prototyping?  Are
> > >>you touching seccomp2?
> > >
> > >The decomposed approach as I felt (well, still do for that matter)
> > >that the enhanced seccomp stuff could be put to even better use in a
> > >decomposed mode of operation.
> > >
> > >However, earlier this week those of us involved in this effort were
> > >strongly discouraged (this probably isn't the best term to use, but
> > >there is a reason I'm a programmer and not an English student) from
> > >pursuing the decomposed prototype further so work on it has dropped
> > >off considerably.
> > >
> > >I still think it is worth pursuing, if for no other reason than to
> > >answer questions that right now we can only answer with educated
> > >guesses, but it is no longer my main focus.  If anyone else is
> > >interested in this feel free to drop me some email and I can bring
> > >you up to speed on the current status.
>
> Thanks, Paul.  I don't know for sure that I'll have time, but I'd
> definitely be interested in anything you have about the current status
> of that approach.  On my own I would've pursued the seccomp2 way
> if only because I'll be doing the same for lxc, but if no one else
> is following up on decomposition I might take a look over break.
> And as you say, if the design ends up being maintainable and with
> acceptable performance overhead, I have no doubt it would be well
> merged with seccomp2.

The current status of the prototype is that it is still largely incomplete;
most of the "how do I do this?" work is done, so now it is just a matter of
coding.

I *think* I've identified all the function calls that the e1000 device
emulation makes into the core QEMU code as well as a good spot for forking,
but most of the implementation is blank (lots of empty function bodies).  About
the only parts of the implementation that currently have any substance to them
are the pipe-based message passing and the code trickery that allows us to go
from straight function calls to RPC/IPC.  Neither has been tested yet, and the
former isn't as elegant as I would like, but at least they all compile cleanly
... ;)
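
As a generic illustration of turning a direct function call into a
pipe-based RPC (this is not the prototype's code, which has not been
posted), the sketch below forks a peer process and replaces a call to
core_reg_read() with the stub rpc_reg_read().  Every name here, the
opcodes and the fixed-size message layout are assumptions made up for
the example, and which side plays "device" versus "core" is arbitrary.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { OP_REG_READ = 1, OP_QUIT = 2 };            /* hypothetical opcodes */
struct rpc_req { uint32_t op, arg; };             /* fixed-size request */
struct rpc_rep { uint32_t value; };               /* fixed-size reply */
struct rpc_chan { int to_peer, from_peer; pid_t pid; };

/* Stand-in for a function that used to be called directly. */
static uint32_t core_reg_read(uint32_t addr) { return addr * 2u + 1u; }

/* Read or write exactly len bytes; returns 0 on success, -1 otherwise. */
static int xfer(int fd, void *buf, size_t len, int writing)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = writing ? write(fd, p, len) : read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;                    /* error or EOF */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Peer side: answer requests until OP_QUIT or the pipe closes. */
static void serve(int in, int out)
{
    struct rpc_req req;
    while (xfer(in, &req, sizeof(req), 0) == 0 && req.op != OP_QUIT) {
        struct rpc_rep rep = { 0 };
        if (req.op == OP_REG_READ) rep.value = core_reg_read(req.arg);
        if (xfer(out, &rep, sizeof(rep), 1) != 0) break;
    }
}

/* Create both pipes and fork the peer; returns 0 on success. */
static int rpc_start(struct rpc_chan *ch)
{
    int req_pipe[2], rep_pipe[2];
    if (pipe(req_pipe) != 0) return -1;
    if (pipe(rep_pipe) != 0) {
        close(req_pipe[0]); close(req_pipe[1]);
        return -1;
    }
    ch->pid = fork();
    if (ch->pid < 0) {
        close(req_pipe[0]); close(req_pipe[1]);
        close(rep_pipe[0]); close(rep_pipe[1]);
        return -1;
    }
    if (ch->pid == 0) {                           /* child: the peer */
        close(req_pipe[1]); close(rep_pipe[0]);
        serve(req_pipe[0], rep_pipe[1]);
        _exit(0);
    }
    close(req_pipe[0]); close(rep_pipe[1]);
    ch->to_peer = req_pipe[1];
    ch->from_peer = rep_pipe[0];
    return 0;
}

/* Stub with the same meaning as core_reg_read(), but over the pipes. */
static int rpc_reg_read(struct rpc_chan *ch, uint32_t addr, uint32_t *out)
{
    struct rpc_req req = { OP_REG_READ, addr };
    struct rpc_rep rep;
    if (xfer(ch->to_peer, &req, sizeof(req), 1) != 0) return -1;
    if (xfer(ch->from_peer, &rep, sizeof(rep), 0) != 0) return -1;
    *out = rep.value;
    return 0;
}

/* Ask the peer to quit, close the pipes and reap the child. */
static int rpc_stop(struct rpc_chan *ch)
{
    struct rpc_req req = { OP_QUIT, 0 };
    int status = 0;
    int rc = xfer(ch->to_peer, &req, sizeof(req), 1);
    close(ch->to_peer);
    close(ch->from_peer);
    if (waitpid(ch->pid, &status, 0) < 0) return -1;
    return (rc == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would run rpc_start(), then use rpc_reg_read() wherever it
previously called core_reg_read(), and finish with rpc_stop().  Each
call now costs a pipe round trip between the two processes, which is
the kind of overhead that only a working prototype can measure.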

As I said earlier, I still plan to allocate some time to working on this, but 
much less than before.  I'll drop you another email, offlist, and if you've 
got some interest/time in helping out you're more than welcome to join in.

-- 
paul moore
virtualization @ redhat

* Re: [Qemu-devel] [RFC] Device sandboxing
  2011-12-15 15:35             ` Paul Moore
@ 2011-12-15 16:05               ` Serge Hallyn
  0 siblings, 0 replies; 31+ messages in thread
From: Serge Hallyn @ 2011-12-15 16:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: Stefan Hajnoczi, Corey Bryant, Michael Halcrow, qemu-devel,
	Eric Paris, Ashley D Lai, Avi Kivity, Richa Marwaha, Amit Shah,
	Radim Krčmář, Eduardo Terrell Ferrari Otubo,
	Lee Terrell, George Wilson

Quoting Paul Moore (pmoore@redhat.com):
> On Thursday, December 15, 2011 09:14:11 AM Serge Hallyn wrote:
> > Quoting Corey Bryant (coreyb@linux.vnet.ibm.com):
> > > On 12/14/2011 06:56 PM, Paul Moore wrote:
> > > >On Wednesday, December 14, 2011 11:15:58 AM Serge E. Hallyn wrote:
> > > >>Hey Paul,
> > > >>
> > > >>just wondering, exactly which approach(es) are you prototyping?  Are
> > > >>you touching seccomp2?
> > > >
> > > >The decomposed approach as I felt (well, still do for that matter)
> > > >that the enhanced seccomp stuff could be put to even better use in a
> > > >decomposed mode of operation.
> > > >
> > > >However, earlier this week those of us involved in this effort were
> > > >strongly discouraged (this probably isn't the best term to use, but
> > > >there is a reason I'm a programmer and not an English student) from
> > > >pursuing the decomposed prototype further so work on it has dropped
> > > >off considerably.
> > > >
> > > >I still think it is worth pursuing, if for no other reason than to
> > > >answer questions that right now we can only answer with educated
> > > >guesses, but it is no longer my main focus.  If anyone else is
> > > >interested in this feel free to drop me some email and I can bring
> > > >you up to speed on the current status.
> >
> > Thanks, Paul.  I don't know for sure that I'll have time, but I'd
> > definitely be interested in anything you have about the current status
> > of that approach.  On my own I would've pursued the seccomp2 way
> > if only because I'll be doing the same for lxc, but if no one else
> > is following up on decomposition I might take a look over break.
> > And as you say, if the design ends up being maintainable and with
> > acceptable performance overhead, I have no doubt it would be well
> > merged with seccomp2.
> 
> The current status of the prototype is that it is still largely incomplete;
> most of the "how do I do this?" work is done, so now it is just a matter of
> coding.
>
> I *think* I've identified all the function calls that the e1000 device
> emulation makes into the core QEMU code as well as a good spot for forking,
> but most of the implementation is blank (lots of empty function bodies).  About
> the only parts of the implementation that currently have any substance to them
> are the pipe-based message passing and the code trickery that allows us to go
> from straight function calls to RPC/IPC.  Neither has been tested yet, and the
> former isn't as elegant as I would like, but at least they all compile cleanly
> ... ;)
> 
> As I said earlier, I still plan to allocate some time to working on this, but 
> much less than before.  I'll drop you another email, offlist, and if you've 
> got some interest/time in helping out you're more than welcome to join in.

Thanks, Paul.

-serge

end of thread, other threads:[~2011-12-15 16:05 UTC | newest]

Thread overview: 31+ messages
2011-12-07 18:25 [Qemu-devel] [RFC] Device sandboxing Corey Bryant
2011-12-07 18:48 ` Anthony Liguori
2011-12-07 19:32   ` Corey Bryant
2011-12-07 19:43     ` Anthony Liguori
2011-12-07 19:52       ` Michael Halcrow
2011-12-07 20:02       ` Corey Bryant
2011-12-07 20:54       ` Eric Paris
2011-12-08  9:40         ` Stefan Hajnoczi
2011-12-11 10:50           ` Dor Laor
2011-12-12 18:54             ` Will Drewry
2011-12-08  9:47     ` Stefan Hajnoczi
2011-12-08 14:39       ` Corey Bryant
2011-12-07 21:20   ` Paul Moore
2011-12-14 17:15     ` Serge E. Hallyn
2011-12-14 23:56       ` Paul Moore
2011-12-15 14:28         ` Corey Bryant
2011-12-15 15:14           ` Serge Hallyn
2011-12-15 15:35             ` Paul Moore
2011-12-15 16:05               ` Serge Hallyn
2011-12-08 21:51 ` Blue Swirl
2011-12-12 18:30   ` Corey Bryant
2011-12-09 16:17 ` Paul Brook
2011-12-09 16:34   ` Paul Moore
2011-12-09 17:32     ` Paul Brook
2011-12-09 17:49       ` Paul Moore
2011-12-09 18:46         ` Paul Brook
2011-12-09 18:50           ` Paul Moore
2011-12-09 18:59           ` Paul Brook
2011-12-09 19:17             ` Paul Moore
2011-12-10 19:39   ` Blue Swirl
2011-12-11  9:08   ` Avi Kivity
