Message-ID: <4FF1E2D4.1050702@linux.vnet.ibm.com>
Date: Mon, 02 Jul 2012 14:05:08 -0400
From: Corey Bryant
Subject: Re: [Qemu-devel] [RFC] [PATCHv2 2/2] Adding basic calls to libseccomp in vl.c
To: Blue Swirl
Cc: Paul Moore, qemu-devel@nongnu.org, Avi Kivity, Anthony Liguori, Eduardo Otubo

On 06/28/2012 03:49 PM, Blue Swirl wrote:
> On Wed, Jun 27, 2012 at 9:25 PM, Anthony Liguori wrote:
>> On 06/21/2012 03:04 AM, Avi Kivity wrote:
>>>
>>> On 06/19/2012 09:58 PM, Blue Swirl wrote:
>>>>>>
>>>>>> At least qemu-ifup/down scripts, migration exec and smbd have been
>>>>>> mentioned. Only the system calls made by smbd (for some version of it)
>>>>>> can be known. The user could specify arbitrary commands for the
>>>>>> others, those could be assumed to use some common (large) subset of
>>>>>> system calls but I think the security value would be close to zero
>>>>>> then.
>>>>>
>>>>>
>>>>> We're not trying to protect against the user, but against the guest. If
>>>>> we assume the user wrote those scripts with care so they cannot be
>>>>> exploited by the guest, then we are okay.
>>>>
>>>>
>>>> My concern was that first we could accidentally filter a system call
>>>> that changes the script or executable behavior, much like sendmail +
>>>> capabilities bug, and then a guest could trigger running this
>>>> script/executable and exploit the changed behavior.
>>>
>>>
>>> Ah, I see. I agree this is dangerous. We should probably disable exec
>>> if we seccomp.
>>
>>
>> There's no great place to jump into this thread so I guess I'll do it here.
>>
>> There is absolutely no doubt that white-listing syscalls that we currently
>> use provides an improvement in security.
>>
>> We need to assume:
>>
>> 1) QEMU is run as an unprivileged user
>>
>> 2) QEMU is already heavily restricted by SELinux
>>
>> In this case, seccomp() is not being used to replace MAC or DAC. It's
>> supplementing both of them by additionally filtering out syscalls that may
>> have unknown kernel exploits in them. That's all this initial effort is
>> about. Since its scope is so limited, we can simply enable it
>> unconditionally too.
>
> I don't think the scope is limited in a safe way. What is the set of
> system calls that can't ever cause problems to any possible ifup/down
> scripts, migration exec helpers and various versions of smbd?
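For concreteness, this is roughly what a whitelist of the kind being debated compiles down to. It is a minimal sketch, assuming a Linux kernel with seccomp filter support (3.5+), and it uses the raw prctl()/BPF interface instead of libseccomp so that it is self-contained. The names build_filter(), format_blocked_syscall(), install_sandbox() and the SANDBOX_KILL/SANDBOX_DEBUG modes are hypothetical, not QEMU or libseccomp API. In SANDBOX_DEBUG mode a non-whitelisted syscall returns SECCOMP_RET_TRAP (what libseccomp's SCMP_ACT_TRAP maps to), so the kernel raises SIGSYS and the handler prints the blocked syscall number from si_syscall instead of the process dying silently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define SANDBOX_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SANDBOX_ARCH AUDIT_ARCH_AARCH64
#elif defined(__i386__)
#define SANDBOX_ARCH AUDIT_ARCH_I386
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

#define MAX_FILTER 256

/* Hypothetical modes: kill (what the current patches do) or debug (trap). */
enum sandbox_mode { SANDBOX_KILL, SANDBOX_DEBUG };

/* Emits: check arch; load nr; "if nr == allowed[i] return ALLOW" for each
 * entry; default action. Returns instruction count, or -1 if it won't fit. */
int build_filter(const int *allowed, size_t n, enum sandbox_mode mode,
                 struct sock_filter *out, size_t cap)
{
    size_t len = 0;
    if (cap < 5 || n > (cap - 5) / 2)
        return -1;
    out[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                              offsetof(struct seccomp_data, arch));
    out[len++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SANDBOX_ARCH, 1, 0);
    out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[len++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                              offsetof(struct seccomp_data, nr));
    for (size_t i = 0; i < n; i++) {
        /* On a match fall through to ALLOW; otherwise skip over it. */
        out[len++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                  (unsigned int)allowed[i], 0, 1);
        out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    }
    out[len++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                     mode == SANDBOX_DEBUG ? SECCOMP_RET_TRAP : SECCOMP_RET_KILL);
    return (int)len;
}

/* Async-signal-safe (no stdio): writes "seccomp: blocked syscall <nr>\n".
 * Returns the message length, or 0 if it does not fit in cap bytes. */
size_t format_blocked_syscall(int nr, char *buf, size_t cap)
{
    static const char prefix[] = "seccomp: blocked syscall ";
    char digits[12];
    size_t nd = 0, len = sizeof prefix - 1;
    unsigned int v = nr < 0 ? 0u - (unsigned int)nr : (unsigned int)nr;
    do {
        digits[nd++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    if (len + (nr < 0) + nd + 1 > cap)
        return 0;
    memcpy(buf, prefix, len);
    if (nr < 0)
        buf[len++] = '-';
    while (nd > 0)
        buf[len++] = digits[--nd];
    buf[len++] = '\n';
    return len;
}

/* Debug mode: report the trapped syscall number, then exit. */
static void sigsys_handler(int sig, siginfo_t *info, void *uctx)
{
    char msg[64];
    size_t len = format_blocked_syscall(info->si_syscall, msg, sizeof msg);
    ssize_t ignored = len > 0 ? write(STDERR_FILENO, msg, len) : 0;
    (void)sig;
    (void)uctx;
    (void)ignored;
    _exit(1);
}

/* In debug mode the whitelist must include write and exit_group,
 * because the SIGSYS handler itself uses them. */
int install_sandbox(const int *allowed, size_t n, enum sandbox_mode mode)
{
    struct sock_filter insns[MAX_FILTER];
    int len = build_filter(allowed, n, mode, insns, MAX_FILTER);
    if (len < 0)
        return -1;
    if (mode == SANDBOX_DEBUG) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = sigsys_handler;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGSYS, &sa, NULL) != 0)
            return -1;
    }
    struct sock_fprog prog = { .len = (unsigned short)len, .filter = insns };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}
```

One caveat on kill mode: in a multithreaded process such as QEMU, SECCOMP_RET_KILL terminates only the calling thread; kernels from 4.14 on also offer SECCOMP_RET_KILL_PROCESS.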
>
> For example, unlink() is missing. What if the ifup/down script needs
> it for lock file cleanup? ftruncate()? Every socket syscall in case
> LDAP is used to access user information by the libc?
>
> I think we can't define the safe set, except 'allow all'. I'd propose
> one of the following to avoid breakage:
>
> 1. Allow all system calls for the initial patch, refactor later to
> reduce the set. Useless until refactored.

One thing I like about starting with a known subset of the syscalls QEMU
uses is that it forces us to expand the whitelist whenever we come across
another syscall that QEMU needs.

An issue with this approach is that if seccomp kills QEMU for using a
disallowed syscall, I don't think we can tell which syscall it was (at
least, I don't think that information is accessible anywhere). That is
good for security, but it makes life hard for developers who are
debugging.

Would it make sense to be able to configure QEMU in one of two modes:

1) seccomp kill mode (this is what the existing patches do), or
2) seccomp debug mode?

In debug mode we could trap on the failing syscall (using SCMP_ACT_TRAP),
determine the syscall number, and issue an error message that displays
it. The emulator() function here gives an idea of how this could be done:

https://lkml.org/lkml/2012/4/12/449

>
> 2. Don't make seccomp mode enabled by default; when enabled, forbid
> execve(). Limits functionality when enabled, no security benefit if
> not enabled.
>
> 3. Before enabling seccomp, fork a helper process without restrictions
> that is used to launch other programs. Needs some work.
>
>>
>> After we have this initial support, then we can look at a -sandbox option.
>> This option could prevent things like open()/execve() but that will come at a
>> cost of features.
>>
>> I think the reasonable thing to do for -sandbox is to basically focus on the
>> set of syscalls that QEMU would use if it were launched under libvirt.
>> We should obviously make improvements (things like -blockdev) to make this even
>> more restrictive.
>>
>> Who knows, maybe we end up having multiple types of sandboxes. A '-sandbox
>> libvirt' and a '-sandbox user' where the latter is focused on the typical
>> usage of an unprivileged user.
>>
>> But this is all stuff that can come later. We solve a big problem by just
>> getting the initial whitelist support in.
>
> Fully agree, but we'd have to agree about what is a safe initial whitelist.
>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>>
>>>>>
>>>>> We have decomposed qemu to some extent, in that privileged operations
>>>>> happen in libvirt. So the modes make sense - qemu has no idea whether a
>>>>> privileged management system is controlling it or not.
>>>>
>>>>
>>>> So with -seccomp, libvirt could tell QEMU that for example open(),
>>>> execve(), bind() and connect() will never be needed?
>>>
>>>
>>> Yes.
>>>
>>
>

-- 
Regards,
Corey
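Option 3 in the quoted list (fork an unrestricted helper before enabling seccomp) might look roughly like the sketch below. It is an illustration under stated assumptions, not a proposed patch: helper_spawn(), helper_run() and helper_stop() are hypothetical names, commands are passed as strings to /bin/sh -c over an AF_UNIX SOCK_SEQPACKET socketpair, and only the exit status comes back. The point is that helper_spawn() runs before the filter is loaded, so afterwards QEMU itself only needs send/recv on one socket (glibc may issue these as sendto/recvfrom, so those would have to be whitelisted), while fork/execve remain available in the helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle: socket to the unrestricted helper and its pid. */
struct launch_helper {
    int sock;
    pid_t pid;
};

/* Helper side: one command string per message, run via /bin/sh -c,
 * reply with the exit status (or -1 if the command did not exit). */
static void helper_loop(int sock)
{
    char cmd[1024];
    ssize_t n;

    while ((n = recv(sock, cmd, sizeof cmd - 1, 0)) > 0) {
        int32_t result = -1;
        int status;
        pid_t child;

        cmd[n] = '\0';
        child = fork();
        if (child == 0) {
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                       /* exec failed */
        }
        if (child > 0 && waitpid(child, &status, 0) == child && WIFEXITED(status))
            result = WEXITSTATUS(status);
        if (send(sock, &result, sizeof result, MSG_NOSIGNAL) != (ssize_t)sizeof result)
            break;
    }
    _exit(0);                                 /* EOF: QEMU closed its end */
}

/* Must be called BEFORE the seccomp filter is installed. */
int helper_spawn(struct launch_helper *h)
{
    int sv[2];
    pid_t pid;

    /* CLOEXEC keeps the sockets out of the programs the helper launches. */
    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        close(sv[0]);
        helper_loop(sv[1]);                   /* never returns */
    }
    close(sv[1]);
    h->sock = sv[0];
    h->pid = pid;
    return 0;
}

/* QEMU side, usable under seccomp: returns the command's exit status,
 * or -1 on a communication error or abnormal termination. */
int helper_run(struct launch_helper *h, const char *cmd)
{
    int32_t result;
    size_t len = strlen(cmd) + 1;             /* include the NUL */

    if (send(h->sock, cmd, len, MSG_NOSIGNAL) != (ssize_t)len)
        return -1;
    if (recv(h->sock, &result, sizeof result, 0) != (ssize_t)sizeof result)
        return -1;
    return result;
}

void helper_stop(struct launch_helper *h)
{
    close(h->sock);                           /* helper sees EOF and exits */
    waitpid(h->pid, NULL, 0);
}
```

The trade-off is that the helper is an unconfined process that a compromised QEMU can ask to run commands, so in practice it would probably have to accept only a fixed set of requests (e.g. "run the configured ifup script") rather than free-form command lines.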