From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <4EE48ADA.9000908@redhat.com>
References: <4EDFAF91.4070904@linux.vnet.ibm.com>
	<4EDFB4F0.70406@codemonkey.ws>
	<4EDFBF56.9030607@linux.vnet.ibm.com>
	<4EDFC1F3.1080900@codemonkey.ws>
	<1323291290.2486.13.camel@localhost>
	<CAJSP0QU2DEy2MeXk0uQaFRYmb2eor+wqa1AXn3pTyqV9fiAWUw@mail.gmail.com>
	<4EE48ADA.9000908@redhat.com>
Date: Mon, 12 Dec 2011 12:54:42 -0600
Message-ID: <CABqD9hZFAXCec7NzG00tU1c-b63UhbYmdAc_jrBO2SPS_Qi4Yw@mail.gmail.com>
From: Will Drewry <wad@chromium.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC] Device sandboxing
List-Id: <qemu-devel.nongnu.org>
To: dlaor@redhat.com
Cc: Radim Krčmář <radimkrcmar@hpx.cz>, Stefan Hajnoczi <stefanha@gmail.com>, Corey Bryant <coreyb@linux.vnet.ibm.com>, Michael Halcrow <mhalcrow@google.com>, qemu-devel@nongnu.org, Eric Paris <eparis@redhat.com>, Paul Moore <pmoore@redhat.com>, George Wilson <gcwilson@us.ibm.com>, Avi Kivity <avi@redhat.com>, Richa Marwaha <rmarwah@us.ibm.com>, Amit Shah <amit.shah@redhat.com>, Ashley D Lai <adlai@us.ibm.com>, Eduardo Terrell Ferrari Otubo <eotubo@br.ibm.com>, Lee Terrell <lterrell@us.ibm.com>

On Sun, Dec 11, 2011 at 4:50 AM, Dor Laor <dlaor@redhat.com> wrote:
> On 12/08/2011 11:40 AM, Stefan Hajnoczi wrote:
>>
>> On Wed, Dec 7, 2011 at 8:54 PM, Eric Paris <eparis@redhat.com> wrote:
>>>
>>> On Wed, 2011-12-07 at 13:43 -0600, Anthony Liguori wrote:
>>>>
>>>> On 12/07/2011 01:32 PM, Corey Bryant wrote:
>>>
>>>
>>>>> That would seem like the logical approach. I think there may be new
>>>>> mode 2 patches coming soon so we can see how they go over.
>>>>
>>>>
>>>> I'd like to see what the whitelist would need to be for something like
>>>> QEMU in mode 2.  My biggest concern is that the whitelist would need to
>>>> be so large that the practical security isn't all that much improved.
>>>
>>>
>>> When I prototyped my version of seccomp v2 for qemu a while back I did
>>> it by only looking at syscalls after initial setup was completed (aka the
>>> very last thing before main_loop() was to lock it down).  My list was
>>> much shorter than even these:
>>>
>>> +        __NR_brk,
>>> +        __NR_close,
>>> +        __NR_exit_group,
>>> +        __NR_futex,
>>> +        __NR_ioctl,
>>> +        __NR_madvise,
>>> +        __NR_mmap,
>>> +        __NR_munmap,
>>> +        __NR_read,
>>> +        __NR_recvfrom,
>>> +        __NR_recvmsg,
>>> +        __NR_rt_sigaction,
>>> +        __NR_select,
>>> +        __NR_sendto,
>>> +        __NR_tgkill,
>>> +        __NR_timer_delete,
>>> +        __NR_timer_gettime,
>>> +        __NR_timer_settime,
>>> +        __NR_write,
>>> +        __NR_writev,
>>>
>>> There is simple, obvious value here with negligible overhead, but every
>>> proposal I know of, including mine, has been shot down by Ingo.  Ingo's
>>> proposal is much more work, more overhead, but clearly more flexible.
>>> His suggestions (and code based on those suggestions from others) have
>>> been shot down by PeterZ.
>>>
>>> So I feel like seccomp v2 is between a rock and a hard place.  We have
>>> something that works really well, and could be a huge win for all sorts
>>> of programs, but we don't seem to be able to get anything past the
>>> damned if you do, damned if you don't nak's.....
>>>
>>> (There's also a cgroup version of seccomp proposed, but I'm guessing it
>>> will go just about as far as all the other versions)
>>
>>
>> Still, these sorts of situations are overcome all the time.  Sometimes
>> it takes a while and several LWN.net articles about the drama but at
>> the end things can be worked out.
>>
>> If we want to discuss the specifics of mode 2 and especially what Ingo
>> or Peter think then I think we should do it in a forum where they
>> participate.  Maybe their positions have changed.
>
>
> Will, a little bird whispered that you're going to send another iteration
> w/ higher acceptance chances. Where do we stand w/ it? Can you please
> elaborate on its chances of getting merged?

Hi - yup. I keep getting delayed with other work, but I still plan to
send it soon.  The first plan was to port the updated patchset to
Linus's tree, then repost. I plan on adding a cover letter (0/x) to
the series to see if we can get discussion going again on it. I'm not
sure what the merge chance is, to be honest. I believe the patch is
desirable to many parties, but Ingo didn't feel it was appropriate
given that it would add an ABI that was not part of the trace/perf
ABI.  Given that grand-unified trace has been held up on stable
internal API points and other challenges, I would hope that we could
continue on using a prctl for seccomp_filter, even if it becomes a
compatibility layer which is eventually switched off for most users.
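
For illustration only, here is a minimal sketch of a prctl-installed
syscall allowlist. It is NOT the ftrace-filter-based patchset discussed
in this thread: it assumes the BPF-based seccomp filter interface that
later landed in mainline Linux (SECCOMP_MODE_FILTER). The function names
build_allowlist_filter and install_allowlist_filter and the
MAX_ALLOWED_SYSCALLS bound are hypothetical, and the caller is assumed
to pass the AUDIT_ARCH_* value for its own architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_* */
#include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_* */
#include <sys/prctl.h>      /* prctl, PR_SET_NO_NEW_PRIVS, PR_SET_SECCOMP */

/* Hypothetical bound chosen for this sketch; keeps the program small. */
#define MAX_ALLOWED_SYSCALLS 1024

/*
 * Fill `out` with a classic-BPF program that kills the caller unless the
 * architecture matches `arch` and the syscall number is one of nrs[0..n-1].
 * Returns the number of instructions written (2*n + 5), or 0 if `cap` is
 * too small or n exceeds MAX_ALLOWED_SYSCALLS.
 */
size_t build_allowlist_filter(uint32_t arch, const int *nrs, size_t n,
                              struct sock_filter *out, size_t cap)
{
    size_t i = 0;

    assert(out != NULL && (n == 0 || nrs != NULL));
    if (n > MAX_ALLOWED_SYSCALLS || cap < 2 * n + 5)
        return 0;

    /* Reject calls made through a different syscall ABI. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                   offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                   arch, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                   SECCOMP_RET_KILL);

    /* Load the syscall number, then compare against each allowed entry. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                   offsetof(struct seccomp_data, nr));
    for (size_t j = 0; j < n; j++) {
        /* Match: fall through to ALLOW. No match: skip the ALLOW. */
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                       (uint32_t)nrs[j], 0, 1);
        out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                       SECCOMP_RET_ALLOW);
    }

    /* Anything not on the list is denied. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                   SECCOMP_RET_KILL);
    return i;
}

/* Install the program for the calling thread; returns 0 on success. */
int install_allowlist_filter(struct sock_filter *insns, size_t len)
{
    struct sock_fprog prog = {
        .len = (unsigned short)len,
        .filter = insns,
    };

    /* Needed so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

A caller following the approach Eric describes would build the program
from a short list such as the one quoted above and call the install
function as the last step before entering the main loop. The list must
cover everything the process still needs from that point on, since any
other syscall kills it.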

I did spend some time looking at other approaches (making the syscall
table a namespace, cgroups syscalls like Łukasz Sowa's patch, etc),
but using the seccomp-centered approach still seems to make sense from
a process view and a seccomp view - and who wants a new syscall anyway
:)  The last thing I am looking at before I post is seeing if it'd be
possible to avoid touching the ftrace filter engine at all for
ipc, socketcall, ioctl, etc, but I'm not sure that it makes sense to
not use the generic system that exists today.

cheers!
will