From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48303) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VoxWR-0003pU-Ed for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:56 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VoxWP-00040X-Ur for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:55 -0500 Received: from mail-pb0-x229.google.com ([2607:f8b0:400e:c01::229]:51590) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VoxWP-000407-Ds for qemu-devel@nongnu.org; Fri, 06 Dec 2013 10:40:53 -0500 Received: by mail-pb0-f41.google.com with SMTP id jt11so1261736pbb.14 for ; Fri, 06 Dec 2013 07:40:52 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20131206091333.GB2616@stefanha-thinkpad.redhat.com> References: <1382440906-3852-1-git-send-email-otubo@linux.vnet.ibm.com> <3468561.4aYf2ZG3eq@sifl> <20131122154841.GA3232@stefanha-thinkpad.redhat.com> <7966284.CJrPIyrYnI@sifl> <20131204093946.GA19096@stefanha-thinkpad.redhat.com> <529F2C48.3060304@linux.vnet.ibm.com> <20131205131501.GB6385@stefanha-thinkpad.redhat.com> <20131206091333.GB2616@stefanha-thinkpad.redhat.com> Date: Fri, 6 Dec 2013 09:40:52 -0600 Message-ID: From: Will Drewry Content-Type: text/plain; charset=ISO-8859-1 Subject: Re: [Qemu-devel] [PATCH for-1.7] seccomp: setting "-sandbox on" by default List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: Corey Bryant , qemu-devel , Paul Moore , Anthony Liguori , Paolo Bonzini , Eduardo Otubo On Fri, Dec 6, 2013 at 3:13 AM, Stefan Hajnoczi wrote: > On Thu, Dec 05, 2013 at 10:12:00AM -0600, Will Drewry wrote: >> On Thu, Dec 5, 2013 at 7:15 AM, Stefan Hajnoczi wrote: >> > On Wed, Dec 04, 2013 at 11:21:12AM -0200, Eduardo Otubo wrote: >> >> On 12/04/2013 07:39 AM, Stefan Hajnoczi wrote: >> >> >On Fri, Nov 22, 2013 at 11:00:24AM -0500, Paul Moore wrote: >> >> >>>Developers will only be happy with seccomp if it's easy and rewarding to >> >> >>>support/debug. >> >> >> >> >> >>Agreed. >> >> >> >> >> >>As a developer, how do you feel about the audit/syslog based approach I >> >> >>mentioned earlier? >> >> > >> >> >I used the commands you posted (I think that's what you mean). They >> >> >produce useful output. >> >> > >> >> >The problem is that without an error message on stderr or from the >> >> >shell, no one will think "QEMU process dead and hung == check seccomp" >> >> >immediately. It's frustrating to deal with a "silent" failure. >> >> >> >> The process dies with a SIGKILL, and sig handling in Qemu is hard to >> >> implement due to dozen of external linked libraries that has their >> >> own signal masks and conflicts with seccomp. I've already tried this >> >> approach in the past (you can find in the list by searching for >> >> debug mode) >> > >> > I now realize we may be talking past each other. Dying with >> > SIGKILL/SIGSYS is perfectly reasonable and I would be happy with that >> > :-). >> > >> > But I think there's a bug in seccomp: a multi-threaded process can be >> > left in a zombie state. In my case the primary thread was killed by >> > seccomp but another thread was deadlocked on a futex. >> > >> > The result is the process isn't quite dead yet. The shell will not reap >> > it and we're stuck with a zombie. >> > >> > I can reproduce it reliably when I run "qemu-system-x86_64 -sandbox on" >> > on Fedora 20 (qemu-system-x86-1.6.1-2). >> > >> > Should seccomp use do_group_exit() for SIGKILL? >> >> Is the problem that the SECCOMP_RET_KILL didn't take down the thread >> group (which would be a departure from how seccomp(mode=1) worked) and >> causes the deadlock somehow, or is it that the other thread is >> deadlocked? > > The former. > > When the first thread is killed by seccomp, the second thread in the > process is left waiting on a futex forever. Therefore the process never > exits after the seccomp violation occurs. > > Directing the signal at a thread makes perfect sense for > SECCOMP_RET_TRAP since the thread can handle the signal and recover. > But for SECCOMP_RET_KILL it's probably more useful to kill the entire > process rather than just a single thread. Would it be possible to just have the offending thread die with SECCOMP_RET_TRAP then have a SIGSYS handler that calls tgkill? > >> Regardless, adding a SECCOMP_RET_TGKILL probably isn't a bad idea :) > > Yes. Do you have time for that or would you like me to send a patch? A straight SECCOMP_RET_TGKILL code could be a bit awkward, but it might make sense to add tgkill as "data" OR'd with RET_KILL since those 16 bits of data are still unused. I didn't have a clear plan for those bits with RET_KILL, but I think they would be fair game for this sort of extension if it is really impractical to use a trap. I'll play with a patch next week, but I'd be happy to see alternative approaches too! (I know I've been kicking around a separate patch for apply per-task behavioral flags for filters that this could fit in too, but there will be some ABI challenges that have led me to approach it _very_ slowly.) thanks!