From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:48303)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wad@chromium.org>) id 1VoxWR-0003pU-Ed
	for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:56 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <wad@chromium.org>) id 1VoxWP-00040X-Ur
	for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:55 -0500
Received: from mail-pb0-x229.google.com ([2607:f8b0:400e:c01::229]:51590)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <wad@chromium.org>) id 1VoxWP-000407-Ds
	for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:53 -0500
Received: by mail-pb0-f41.google.com with SMTP id jt11so1261736pbb.14
	for <qemu-devel@nongnu.org>; Fri, 06 Dec 2013 07:40:52 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <20131206091333.GB2616@stefanha-thinkpad.redhat.com>
References: <1382440906-3852-1-git-send-email-otubo@linux.vnet.ibm.com>
	<3468561.4aYf2ZG3eq@sifl>
	<20131122154841.GA3232@stefanha-thinkpad.redhat.com>
	<7966284.CJrPIyrYnI@sifl>
	<20131204093946.GA19096@stefanha-thinkpad.redhat.com>
	<529F2C48.3060304@linux.vnet.ibm.com>
	<20131205131501.GB6385@stefanha-thinkpad.redhat.com>
	<CABqD9hacjUPCuitDMRx+rU_pSK0i-tT3hGYcwc2nsduvYZLqhA@mail.gmail.com>
	<20131206091333.GB2616@stefanha-thinkpad.redhat.com>
Date: Fri, 6 Dec 2013 09:40:52 -0600
Message-ID: <CABqD9hYjn2Z-7vWidttbCiArR=p-Axs0BLLMUN8NfsGNFVMaVA@mail.gmail.com>
From: Will Drewry <wad@chromium.org>
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by
	default
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Corey Bryant <coreyb@linux.vnet.ibm.com>, qemu-devel <qemu-devel@nongnu.org>, Paul Moore <pmoore@redhat.com>, Anthony Liguori <anthony@codemonkey.ws>, Paolo Bonzini <pbonzini@redhat.com>, Eduardo Otubo <otubo@linux.vnet.ibm.com>

On Fri, Dec 6, 2013 at 3:13 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Dec 05, 2013 at 10:12:00AM -0600, Will Drewry wrote:
>> On Thu, Dec 5, 2013 at 7:15 AM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Wed, Dec 04, 2013 at 11:21:12AM -0200, Eduardo Otubo wrote:
>> >> On 12/04/2013 07:39 AM, Stefan Hajnoczi wrote:
>> >> >On Fri, Nov 22, 2013 at 11:00:24AM -0500, Paul Moore wrote:
>> >> >>>Developers will only be happy with seccomp if it's easy and rewarding to
>> >> >>>support/debug.
>> >> >>
>> >> >>Agreed.
>> >> >>
>> >> >>As a developer, how do you feel about the audit/syslog based approach I
>> >> >>mentioned earlier?
>> >> >
>> >> >I used the commands you posted (I think that's what you mean).  They
>> >> >produce useful output.
>> >> >
>> >> >The problem is that without an error message on stderr or from the
>> >> >shell, no one will think "QEMU process dead and hung == check seccomp"
>> >> >immediately.  It's frustrating to deal with a "silent" failure.
>> >>
>> >> The process dies with a SIGKILL, and sig handling in Qemu is hard to
>> >> implement due to dozen of external linked libraries that has their
>> >> own signal masks and conflicts with seccomp. I've already tried this
>> >> approach in the past (you can find in the list by searching for
>> >> debug mode)
>> >
>> > I now realize we may be talking past each other.  Dying with
>> > SIGKILL/SIGSYS is perfectly reasonable and I would be happy with that
>> > :-).
>> >
>> > But I think there's a bug in seccomp: a multi-threaded process can be
>> > left in a zombie state.  In my case the primary thread was killed by
>> > seccomp but another thread was deadlocked on a futex.
>> >
>> > The result is the process isn't quite dead yet.  The shell will not reap
>> > it and we're stuck with a zombie.
>> >
>> > I can reproduce it reliably when I run "qemu-system-x86_64 -sandbox on"
>> > on Fedora 20 (qemu-system-x86-1.6.1-2).
>> >
>> > Should seccomp use do_group_exit() for SIGKILL?
>>
>> Is the problem that the SECCOMP_RET_KILL didn't take down the thread
>> group (which would be a departure from how seccomp(mode=1) worked) and
>> causes the deadlock somehow, or is it that the other thread is
>> deadlocked?
>
> The former.
>
> When the first thread is killed by seccomp, the second thread in the
> process is left waiting on a futex forever.  Therefore the process never
> exits after the seccomp violation occurs.
>
> Directing the signal at a thread makes perfect sense for
> SECCOMP_RET_TRAP since the thread can handle the signal and recover.
> But for SECCOMP_RET_KILL it's probably more useful to kill the entire
> process rather than just a single thread.

Would it be possible to just have the offending thread die with
SECCOMP_RET_TRAP then have a SIGSYS handler that calls tgkill?

>
>> Regardless, adding a SECCOMP_RET_TGKILL probably isn't a bad idea :)
>
> Yes.  Do you have time for that or would you like me to send a patch?

A straight SECCOMP_RET_TGKILL code could be a bit awkward, but it
might make sense to add tgkill as "data" OR'd with RET_KILL since
those 16 bits of data are still unused.  I didn't have a clear plan
for those bits with RET_KILL, but I think they would be fair game for
this sort of extension if it is really impractical to use a trap.
I'll play with a patch next week, but I'd be happy to see alternative
approaches too!  (I know I've been kicking around a separate patch for
apply per-task behavioral flags for filters that this could fit in
too, but there will be some ABI challenges that have led me to
approach it _very_ slowly.)

thanks!