qemu-devel.nongnu.org archive mirror
* Re: [Qemu-devel] Secure KVM
       [not found] <1320612020.3299.22.camel@lappy>
@ 2011-11-07 17:37 ` Anthony Liguori
  2011-11-07 17:52   ` Sasha Levin
  0 siblings, 1 reply; 5+ messages in thread
From: Anthony Liguori @ 2011-11-07 17:37 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrea Arcangeli, Cyrill Gorcunov, Rusty Russell, kvm,
	Michael S. Tsirkin, Corentin Chary, Asias He, Marcelo Tosatti,
	qemu-devel, Pekka Enberg, Avi Kivity, Ingo Molnar

On 11/06/2011 02:40 PM, Sasha Levin wrote:
> Hi all,
>
> I'm planning on doing a small fork of the KVM tool to turn it into a
> 'Secure KVM' enabled hypervisor. Now you probably ask yourself, Huh?
>
> The idea was discussed briefly a couple of months ago, but never got off
> the ground - which is a shame IMO.
>
> It's easy to explain the problem: If an attacker finds a security hole
> in any of the devices which are exposed to the guest, the attacker would
> be able to either crash the guest, or possibly run code on the host
> itself.
>
> The solution is also simple to explain: Split the devices into different
> processes and use seccomp to sandbox each device into the exact set of
> resources it needs to operate, nothing more and nothing less.
>
> Since I'll be basing it on the KVM tool, which doesn't really emulate
> that many legacy devices, I'll focus first on the virtio family for the
> sake of simplicity (and covering 90% of the options).
>
> This is my basic overview of how I'm planning on implementing the
> initial POC:
>
> 1. First I'll focus on the simple virtio-rng device, it's simple enough
> to allow us to focus on the aspects which are important for the POC
> while still covering most bases (i.e. sandbox to single file
> - /dev/urandom and such).
>
> 2. Do it on a one process per device concept, where for each device
> (notice - not device *type*) requested, a new process which handles it
> will be spawned.
>
> 3. That process will be limited exactly to the resources it needs to
> operate, for example - if we run a virtio-blk device, it would be able
> to access only the image file which it should be using.
>
> 4. Connection between hypervisor and devices will be based on unix
> sockets, this should allow for better separation compared to other
> approaches such as shared memory.
>
> 5. While performance is an aspect, complete isolation is more important.
> Security is primary, performance is secondary.
>
> 6. Share as much code as possible with current implementation of virtio
> devices, make it possible to run virtio devices either like it's being
> done now, or by spawning them as separate processes - the amount of
> specific code for the separate process case should be minimal.
>
>
> That's all I have for now, comments are *very* welcome.
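Here is a minimal sketch of points 2 to 4 above: one process per device instance, connected to the hypervisor over a Unix socket, using a virtio-rng-like device as in point 1. Everything in it is illustrative. The names `spawn_rng_device`, `rng_request` and `rng_device_stop`, the length-prefixed wire format and the 256-byte cap are assumptions. The seccomp step itself is only marked with a comment.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RNG_MAX_REQ 256 /* hypothetical cap on one request, in bytes */

/* Device process: owns exactly one resource (/dev/urandom) and one socket. */
static _Noreturn void rng_device_main(int sock, int rng_fd)
{
	uint8_t buf[RNG_MAX_REQ];
	uint32_t len;

	/* The seccomp sandbox would be entered here, after every fd is open. */
	while (read(sock, &len, sizeof(len)) == (ssize_t)sizeof(len)) {
		if (len > RNG_MAX_REQ)
			break;
		if (read(rng_fd, buf, len) != (ssize_t)len ||
		    write(sock, buf, len) != (ssize_t)len)
			break;
	}
	_exit(0);
}

/* Hypervisor side: spawn one process for one device instance. */
static pid_t spawn_rng_device(int *sock_out)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	pid_t pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		close(sv[0]);
		int rng_fd = open("/dev/urandom", O_RDONLY);
		if (rng_fd < 0)
			_exit(1);
		rng_device_main(sv[1], rng_fd);
	}
	close(sv[1]);
	*sock_out = sv[0];
	return pid;
}

/* Hypervisor side: ask the device process for len random bytes. */
static int rng_request(int sock, void *out, uint32_t len)
{
	assert(len <= RNG_MAX_REQ);
	if (write(sock, &len, sizeof(len)) != (ssize_t)sizeof(len))
		return -1;
	size_t got = 0;
	while (got < len) {
		ssize_t n = read(sock, (uint8_t *)out + got, len - got);
		if (n <= 0)
			return -1;
		got += (size_t)n;
	}
	return 0;
}

/* Hypervisor side: closing the socket makes the device exit; reap it. */
static int rng_device_stop(int sock, pid_t pid)
{
	int status;

	close(sock);
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

After setup, the device process needs little beyond read(), write() and exit. That small footprint is what a seccomp sandbox would later enforce.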

I thought about this a bit and have some ideas that may or may not help.

1) If you add device save/load support, then it's something you can potentially 
use to give yourself quite a bit of flexibility in changing the sandbox.  At any 
point in run time, you can save the device model's state in the sandbox, destroy 
the sandbox, and then build a new sandbox and restore the device to its former 
state.

This might turn out to be very useful in supporting things like device hotplug 
and/or memory hot plug.

2) I think it's largely possible to implement all device emulation without doing 
any dynamic memory allocation.  Since memory allocation DoS is something you 
have to deal with anyway, I suspect most device emulation already uses a fixed 
amount of memory per device.   This can potentially dramatically simplify things.

3) I think virtio can/should be used as a generic "backend to frontend" 
transport between the device model and the tool.

4) Lack of select() is really challenging.  I understand why it's not there 
since it can technically be emulated but it seems like a no-risk syscall to 
whitelist and it would make programming in a sandbox so much easier.  Maybe 
Andrea has some comments here?  I might be missing something here.
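Ideas 1) and 2) fit together. If a device's whole state is a fixed-size struct with no pointers, then saving it in the old sandbox and restoring it in a freshly built one takes a single write() and read(). The struct fields and the `dev_save`/`dev_load` helpers below are hypothetical. The raw-bytes encoding assumes both processes run on the same host with the same ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical virtio-rng state: fixed size, no pointers, no heap. */
struct rng_dev_state {
	uint64_t bytes_served;
	uint32_t guest_features;
	uint16_t queue_size;
	uint16_t last_avail_idx;
};

/* Plain data with no padding, so the bytes can cross the socket verbatim. */
static_assert(sizeof(struct rng_dev_state) == 16, "unexpected padding");

/* Old sandbox: hand the device state to the hypervisor before teardown. */
static int dev_save(int fd, const struct rng_dev_state *s)
{
	return write(fd, s, sizeof(*s)) == (ssize_t)sizeof(*s) ? 0 : -1;
}

/* New sandbox: restore the state before the device resumes. */
static int dev_load(int fd, struct rng_dev_state *s)
{
	return read(fd, s, sizeof(*s)) == (ssize_t)sizeof(*s) ? 0 : -1;
}
```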

Regards,

Anthony Liguori

* Re: [Qemu-devel] Secure KVM
From: Sasha Levin @ 2011-11-07 17:52 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Andrea Arcangeli, Cyrill Gorcunov, Rusty Russell, kvm,
	Michael S. Tsirkin, Corentin Chary, Asias He, Marcelo Tosatti,
	qemu-devel, Pekka Enberg, Avi Kivity, Ingo Molnar

Hi Anthony,

Thank you for your comments!

On Mon, 2011-11-07 at 11:37 -0600, Anthony Liguori wrote:
> On 11/06/2011 02:40 PM, Sasha Levin wrote:
> > [snip]
> 
> I thought about this a bit and have some ideas that may or may not help.
> 
> 1) If you add device save/load support, then it's something you can potentially 
> use to give yourself quite a bit of flexibility in changing the sandbox.  At any 
> point in run time, you can save the device model's state in the sandbox, destroy 
> the sandbox, and then build a new sandbox and restore the device to its former 
> state.
> 
> This might turn out to be very useful in supporting things like device hotplug 
> and/or memory hot plug.
> 
> 2) I think it's largely possible to implement all device emulation without doing 
> any dynamic memory allocation.  Since memory allocation DoS is something you 
> have to deal with anyway, I suspect most device emulation already uses a fixed 
> amount of memory per device.   This can potentially dramatically simplify things.
> 
> 3) I think virtio can/should be used as a generic "backend to frontend" 
> transport between the device model and the tool.

virtio requires server and client to have shared memory, so if we
already go with shared memory we can just let the device manage the
actual virtio driver directly, no?

Also, things like interrupts would also require some sort of a different
IPC, which would complicate things a bit.


> 4) Lack of select() is really challenging.  I understand why it's not there 
> since it can technically be emulated but it seems like a no-risk syscall to 
> whitelist and it would make programming in a sandbox so much easier.  Maybe 
> Andrea has some comments here?  I might be missing something here.

There are several of these which would be nice to have, and if we can
get seccomp filters we have good flexibility with which APIs we allow
for each device.

-- 

Sasha.

* Re: [Qemu-devel] Secure KVM
From: Anthony Liguori @ 2011-11-07 18:03 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Andrea Arcangeli, Pekka Enberg, Marcelo Tosatti, kvm,
	Michael S. Tsirkin, Corentin Chary, Asias He, Rusty Russell,
	qemu-devel, Cyrill Gorcunov, Avi Kivity, Ingo Molnar

On 11/07/2011 11:52 AM, Sasha Levin wrote:
> Hi Anthony,
>
> Thank you for your comments!
>
> On Mon, 2011-11-07 at 11:37 -0600, Anthony Liguori wrote:
>> On 11/06/2011 02:40 PM, Sasha Levin wrote:
>>> [snip]
>>
>> I thought about this a bit and have some ideas that may or may not help.
>>
>> 1) If you add device save/load support, then it's something you can potentially
>> use to give yourself quite a bit of flexibility in changing the sandbox.  At any
>> point in run time, you can save the device model's state in the sandbox, destroy
>> the sandbox, and then build a new sandbox and restore the device to its former
>> state.
>>
>> This might turn out to be very useful in supporting things like device hotplug
>> and/or memory hot plug.
>>
>> 2) I think it's largely possible to implement all device emulation without doing
>> any dynamic memory allocation.  Since memory allocation DoS is something you
>> have to deal with anyway, I suspect most device emulation already uses a fixed
>> amount of memory per device.   This can potentially dramatically simplify things.
>>
>> 3) I think virtio can/should be used as a generic "backend to frontend"
>> transport between the device model and the tool.
>
> virtio requires server and client to have shared memory, so if we
> already go with shared memory we can just let the device manage the
> actual virtio driver directly, no?

Let's say you're implementing an IDE device model in the sandbox.  You can try 
to implement the block layer in the sandbox but I think that quickly will become 
too difficult.

You can do as Avi suggested and do all DMA accesses from the IDE device model as 
RPCs, or you can map guest memory as shared memory and utilize (1) in order to 
change that mapping as you need to.

At some point, you end up with a struct iovec and an offset that you want to 
read/write to the virtual disk.  You need a way to send that to the "frontend" 
that will then handle that as a raw/qcow2 request.

Well, virtio is great at doing exactly that :-)   So if you increase your shared 
memory to have a little bit extra to stick another vring, you can use that for 
device model -> front end communication without paying an extra memcpy.
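The sandboxed device model might hand the frontend a request consisting of an iovec plus an offset. For a raw image, the frontend can serve that with a single preadv()/pwritev(). The `blk_request` layout and the helper names below are assumptions. A qcow2 frontend would additionally have to map the guest offset to host clusters before doing the I/O.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical request the sandboxed device model hands to the frontend. */
struct blk_request {
	int is_write;
	off_t offset;            /* byte offset into the virtual disk */
	const struct iovec *iov; /* guest buffers, already translated */
	int iovcnt;
};

/* Frontend: the image is opened here, outside any sandbox.  O_CREAT is only
 * so the sketch can be tried on a scratch file. */
static int blk_open_image(const char *path)
{
	return open(path, O_RDWR | O_CREAT, 0600);
}

/* Frontend: a raw image needs one positioned, vectored I/O per request. */
static ssize_t blk_handle_raw(int image_fd, const struct blk_request *req)
{
	assert(req->iovcnt > 0);
	if (req->is_write)
		return pwritev(image_fd, req->iov, req->iovcnt, req->offset);
	return preadv(image_fd, req->iov, req->iovcnt, req->offset);
}
```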

For notifications, the easiest thing to do is set up an "event channel" bitmap 
and use a single eventfd to multiplex that event channel bitmap.  This is pretty 
much how Xen works btw.  A single interrupt is reserved and a bitmap is used to 
dispatch the actual events.
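The event-channel idea can be sketched as an atomic bitmap plus one eventfd. A sender sets a bit and kicks the eventfd, and the receiver wakes once and claims all pending bits together. This is a sketch under several assumptions: 64 ports, a C11 atomic word, and names such as `evtchn_notify` invented here. It is not Xen's actual event-channel ABI. In a real split, the struct would live in memory shared between the processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define EVTCHN_PORTS 64 /* hypothetical: one bit per event source */

/* In a real split, this struct would sit in memory shared by both sides. */
struct evtchn {
	_Atomic uint64_t pending; /* bit N set => port N has work */
	int efd;                  /* the single eventfd multiplexing all ports */
};

static int evtchn_init(struct evtchn *ch)
{
	atomic_init(&ch->pending, 0);
	ch->efd = eventfd(0, 0);
	return ch->efd < 0 ? -1 : 0;
}

/* Sender: publish the bit first, then kick the shared eventfd. */
static int evtchn_notify(struct evtchn *ch, unsigned int port)
{
	uint64_t one = 1;

	assert(port < EVTCHN_PORTS);
	atomic_fetch_or(&ch->pending, UINT64_C(1) << port);
	return write(ch->efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Receiver: block on the eventfd, then claim every pending bit at once.
 * A return value of 0 is a harmless spurious wakeup. */
static uint64_t evtchn_wait(struct evtchn *ch)
{
	uint64_t count;

	if (read(ch->efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return atomic_exchange(&ch->pending, 0);
}
```

Several kicks issued before the receiver runs collapse into one read() of the eventfd counter. Because the receiver clears the whole bitmap atomically, no pending port is lost.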

So the sandbox loop would look like:

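int main(void)
{
   setup_devices();

   for (;;) {
      /* One eventfd read covers every pending event-channel bit. */
      read_from_event_channel(main_channel);
      for (int i = 0; i < nr_vrings; i++)   /* nr_vrings: placeholder */
         check_vring_notification(i);
   }
}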

One vring would be used for dispatching PIO/MMIO.  The remaining vrings could 
be used for anything really.
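For the vring that carries PIO/MMIO, each element could be a small fixed-size access record that the sandbox routes by address range. The `io_req` layout, the region table, and the scratch-register demo device below are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size record for one trapped PIO/MMIO access. */
struct io_req {
	uint64_t addr;
	uint32_t len;      /* access size: 1, 2, 4 or 8 bytes */
	uint32_t is_write;
	uint64_t data;     /* value written, or filled in by a read */
};

typedef void (*io_handler)(void *opaque, uint64_t offset, struct io_req *req);

struct io_region {
	uint64_t base, size;
	io_handler handler;
	void *opaque;
};

/* Route the access to the region containing it; -1 if no device claims it. */
static int io_dispatch(const struct io_region *regions, size_t n, struct io_req *req)
{
	assert(req->len >= 1 && req->len <= 8);
	for (size_t i = 0; i < n; i++) {
		const struct io_region *r = &regions[i];
		if (req->addr >= r->base && req->addr - r->base + req->len <= r->size) {
			r->handler(r->opaque, req->addr - r->base, req);
			return 0;
		}
	}
	return -1;
}

/* Demo device: four 32-bit scratch registers in a 16-byte region. */
struct scratch_dev {
	uint32_t regs[4];
};

static void scratch_io(void *opaque, uint64_t offset, struct io_req *req)
{
	struct scratch_dev *d = opaque;

	assert(offset / 4 < 4);
	uint32_t *reg = &d->regs[offset / 4];
	if (req->is_write)
		*reg = (uint32_t)req->data;
	else
		req->data = *reg;
}
```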

Like I mentioned elsewhere, think of the sandbox as an extension of the 
guest's firmware.  The purpose of the sandbox is to reduce a very complicated, 
legacy device model into a very simple, easy-to-audit, purely virtio-based 
model.

>
> Also, things like interrupts would also require some sort of a different
> IPC, which would complicate things a bit.
>
>
>> 4) Lack of select() is really challenging.  I understand why it's not there
>> since it can technically be emulated but it seems like a no-risk syscall to
>> whitelist and it would make programming in a sandbox so much easier.  Maybe
>> Andrea has some comments here?  I might be missing something here.
>
> There are several of these which would be nice to have, and if we can
> get seccomp filters we have good flexibility with which APIs we allow
> for each device.

Yeah, filters are nice but I fear that you lose some of the PR benefits of 
sandboxing.  Once the first application that claims to use sandboxing 
whitelists a syscall it shouldn't, you'll start getting slashdot articles 
about "Linux sandbox broken, Linux security hopelessly broken".  Then what's 
the point of all of this?

Regards,

Anthony Liguori

* Re: [Qemu-devel] Secure KVM
From: Rusty Russell @ 2011-11-07 23:06 UTC (permalink / raw)
  To: Anthony Liguori, Sasha Levin
  Cc: Andrea Arcangeli, Pekka Enberg, kvm, Michael S. Tsirkin,
	Corentin Chary, Asias He, Marcelo Tosatti, qemu-devel,
	Cyrill Gorcunov, Avi Kivity, Ingo Molnar

On Mon, 07 Nov 2011 12:03:38 -0600, Anthony Liguori <anthony@codemonkey.ws> wrote:
> So the sandbox loop would look like:
> 
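> int main(void)
> {
>    setup_devices();
> 
>    for (;;) {
>       /* One eventfd read covers every pending event-channel bit. */
>       read_from_event_channel(main_channel);
>       for (int i = 0; i < nr_vrings; i++)   /* nr_vrings: placeholder */
>          check_vring_notification(i);
>    }
> }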

lguest uses a model where you attach an eventfd to a given virtqueue.
(If you don't have an eventfd registered for a vq, the main process
 returns from the read() of /dev/lguest with the info).

At the moment we use a process per virtqueue, but you could attach the
same eventfd to multiple vqs.

Since you can't select() inside seccomp, the main process could write to
the eventfd to wake up the thread to respond to IPC.
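The wake-up pattern described above can be sketched with a worker process that blocks in read() on its eventfd and a main process that writes to the same eventfd. read() appears in every seccomp whitelist, while select() does not. This is not lguest's code. The helper names are invented, sandboxing is omitted, and the worker reports the kick count through its exit status only so that the sketch can be checked.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker: sleeps in read() until the main process kicks the eventfd. */
static _Noreturn void vq_worker(int efd)
{
	uint64_t kicks;

	if (read(efd, &kicks, sizeof(kicks)) != (ssize_t)sizeof(kicks))
		_exit(255);
	/* ... service the virtqueue / respond to IPC here ... */
	_exit(kicks > 254 ? 254 : (int)kicks);
}

static pid_t start_vq_worker(int efd)
{
	assert(efd >= 0);
	pid_t pid = fork();
	if (pid == 0)
		vq_worker(efd);
	return pid;
}

/* Main process: wake the worker by adding 1 to the eventfd counter. */
static int kick_worker(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```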

Here's the net output code:

/*
 * The Network
 *
 * Handling output for network is also simple: we get all the output buffers
 * and write them to /dev/net/tun.
 */
struct net_info {
	int tunfd;
};

static void net_output(struct virtqueue *vq)
{
	struct net_info *net_info = vq->dev->priv;
	unsigned int head, out, in;
	struct iovec iov[vq->vring.num];

	/* We usually wait in here for the Guest to give us a packet. */
	head = wait_for_vq_desc(vq, iov, &out, &in);
	if (in)
		errx(1, "Input buffers in net output queue?");
	/*
	 * Send the whole thing through to /dev/net/tun.  It expects the exact
	 * same format: what a coincidence!
	 */
	if (writev(net_info->tunfd, iov, out) < 0)
		warnx("Write to tun failed (%d)?", errno);

	/*
	 * Done with that one; wait_for_vq_desc() will send the interrupt if
	 * all packets are processed.
	 */
	add_used(vq, head, 0);
}

Here's the input thread:

/*
 * Handling network input is a bit trickier, because I've tried to optimize it.
 *
 * First we have a helper routine which tells us if reading from this file
 * descriptor (i.e. the /dev/net/tun device) will block:
 */
static bool will_block(int fd)
{
	fd_set fdset;
	struct timeval zero = { 0, 0 };
	FD_ZERO(&fdset);
	FD_SET(fd, &fdset);
	return select(fd+1, &fdset, NULL, NULL, &zero) != 1;
}

/*
 * This handles packets coming in from the tun device to our Guest.  Like all
 * service routines, it gets called again as soon as it returns, so you don't
 * see a while(1) loop here.
 */
static void net_input(struct virtqueue *vq)
{
	int len;
	unsigned int head, out, in;
	struct iovec iov[vq->vring.num];
	struct net_info *net_info = vq->dev->priv;

	/*
	 * Get a descriptor to write an incoming packet into.  This will also
	 * send an interrupt if they're out of descriptors.
	 */
	head = wait_for_vq_desc(vq, iov, &out, &in);
	if (out)
		errx(1, "Output buffers in net input queue?");

	/*
	 * If it looks like we'll block reading from the tun device, send them
	 * an interrupt.
	 */
	if (vq->pending_used && will_block(net_info->tunfd))
		trigger_irq(vq);

	/*
	 * Read in the packet.  This is where we normally wait (when there's no
	 * incoming network traffic).
	 */
	len = readv(net_info->tunfd, iov, in);
	if (len <= 0)
		warn("Failed to read from tun (%d).", errno);

	/*
	 * Mark that packet buffer as used, but don't interrupt here.  We want
	 * to wait until we've done as much work as we can.
	 */
	add_used(vq, head, len);
}
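`will_block()` answers its question with select(). One way to get the same answer without select() is to make the descriptor non-blocking before entering the sandbox and to treat EAGAIN from read() as "would block". This approach is an assumption, not what lguest does, and it is not a drop-in replacement. The probe consumes data rather than peeking at it, and an actual blocking wait still needs another mechanism, such as the eventfd wake-up Rusty describes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Done before entering the sandbox: fcntl() is unlikely to be whitelisted. */
static int make_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Bytes read, 0 on EOF, -EAGAIN if the read would have blocked, else -errno. */
static ssize_t read_or_would_block(int fd, void *buf, size_t len)
{
	assert(len > 0);
	ssize_t n = read(fd, buf, len);
	if (n < 0)
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? -EAGAIN : -errno;
	return n;
}
```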

* Re: [Qemu-devel] Secure KVM
From: Will Drewry @ 2011-11-08 19:51 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Andrea Arcangeli, Cyrill Gorcunov, Rusty Russell, kvm,
	Michael S. Tsirkin, Corentin Chary, Asias He, Marcelo Tosatti,
	qemu-devel, Pekka Enberg, Sasha Levin, Ingo Molnar, Avi Kivity

On Mon, Nov 7, 2011 at 12:03 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 11/07/2011 11:52 AM, Sasha Levin wrote:
>> [snip]
>>
>> There are several of these which would be nice to have, and if we can
>> get seccomp filters we have good flexibility with which APIs we allow
>> for each device.
>
> Yeah, filters are nice but I fear that you lose some of the PR benefits of
> sandboxing.  Once the first application that claims to use sandboxing
> whitelists a syscall it shouldn't, you'll start getting slashdot articles
> about "Linux sandbox broken, Linux security hopelessly broken".  Then what's
> the point of all of this?

Approaching the limit: since no security code/infrastructure is
perfect, then what's the point of all of this? :)

When I've spoken about seccomp_filter, I've tried to avoid the word
'sandbox' as that comes with more baggage than just creating a means
of reducing the kernel's attack surface.  Ideally, seccomp_filter just
fills the void between read/write/sigreturn/exit and
all-the-system-calls: Don't want select? ok. Want epoll? ok. . . It
does mean that developers will have to determine the tradeoffs
themselves (or with some general guidance).  But, I expect there'd be
quite a few more consumers of seccomp if it was possible to not need
to emulate select() behavior or if, for example, brk() was allowed.
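seccomp filter mode was later merged in Linux 3.5, with a BPF program installed through prctl(). The sketch below is not from this thread. It whitelists roughly the strict-mode set for one device process, plus exit_group (which glibc's _exit() uses) and select()/pselect6(), and it makes every other syscall fail with EPERM. The helper names and the demo device are hypothetical. The architecture check covers only x86-64 and AArch64.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define DEVICE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define DEVICE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* If the syscall number matches, allow it; otherwise skip to the next test. */
#define ALLOW(nr) \
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

static int install_device_filter(void)
{
	struct sock_filter filter[] = {
		/* Refuse syscall numbers from a foreign ABI outright. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DEVICE_AUDIT_ARCH, 1, 0),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
		ALLOW(__NR_read),
		ALLOW(__NR_write),
		ALLOW(__NR_exit),
		ALLOW(__NR_exit_group),
		ALLOW(__NR_rt_sigreturn),
#ifdef __NR_select
		ALLOW(__NR_select),
#endif
		ALLOW(__NR_pselect6),
		/* Everything else fails with EPERM instead of killing the process. */
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	};
	struct sock_fprog prog = {
		.len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
		.filter = filter,
	};

	/* Lets an unprivileged process install a filter. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Demo device body: write() is permitted, getppid() is not on the list. */
static int demo_device(int out_fd)
{
	errno = 0;
	if (syscall(SYS_getppid) != -1 || errno != EPERM)
		return 3;
	return write(out_fd, "ok", 2) == 2 ? 0 : 4;
}

/* Fork a device process, filter it, run its body, and return its exit code. */
static int run_filtered_device(int (*body)(int), int fd)
{
	assert(body != NULL);
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		if (install_device_filter() != 0)
			_exit(2);
		_exit(body(fd));
	}
	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

The default action is SECCOMP_RET_ERRNO rather than a kill, which makes a missing whitelist entry visible as EPERM instead of a dead device process. Whether that is the right trade-off is exactly the kind of per-application decision described above.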

cheers!
will
