* [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp @ 2012-05-04 19:08 Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw) To: qemu-devel; +Cc: Eduardo Otubo Hello all, This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The patches that follows are pretty simple and straightforward. I added the correct options and checks to the configure script and the basic calls to libseccomp in the main loop at vl.c. Details of each one are in the emails of the patch set. This support limits the system call footprint of the entire QEMU process to a limited set of syscalls, those that we know QEMU uses. The idea is to limit the allowable syscalls, therefore limiting the impact that an attacked guest could have on the host system. It's important to note that the libseccomp itself needs the seccomp mode 2 feature in the kernel, which is pretty close to get to the mainline since it's already been accepted to the linux-next branch[1]. As always, comments are more than welcome. Regards, [0] - http://sourceforge.net/projects/libseccomp/ [1] - http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=e2cfabdfd075648216f99c2c03821cf3f47c1727 Eduardo Otubo (2): Adding support for libseccomp in configure Adding basic calls to libseccomp in vl.c configure | 23 ++++++++++++++++++ vl.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 104 insertions(+) -- 1.7.9.5 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure 2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo @ 2012-05-04 19:08 ` Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo 2012-05-08 9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange 2 siblings, 0 replies; 13+ messages in thread From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw) To: qemu-devel; +Cc: Eduardo Otubo Adding basic options to the configure script to use libseccomp or not. The default is set to 'no'. If the flag --enable-libseccomp is used, the script will check for its existence using pkg-config. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> --- configure | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/configure b/configure index 0111774..a496db3 100755 --- a/configure +++ b/configure @@ -194,6 +194,7 @@ zlib="yes" guest_agent="yes" libiscsi="" coroutine="" +libseccomp="no" # parse CC options first for opt do @@ -824,6 +825,10 @@ for opt do ;; --disable-guest-agent) guest_agent="no" ;; + --enable-libseccomp) libseccomp="yes" + ;; + --disable-libseccomp) libseccomp="no" + ;; *) echo "ERROR: unknown option $opt"; show_help="yes" ;; esac @@ -1108,6 +1113,8 @@ echo " --disable-usb-redir disable usb network redirection support" echo " --enable-usb-redir enable usb network redirection support" echo " --disable-guest-agent disable building of the QEMU Guest Agent" echo " --enable-guest-agent enable building of the QEMU Guest Agent" +echo " --disable-libseccomp disable libseccomp support" +echo " --enable-libseccomp enable libseccomp support" echo " --with-coroutine=BACKEND coroutine backend. Supported options:" echo " gthread, ucontext, sigaltstack, windows" echo "" @@ -1353,6 +1360,17 @@ EOF fi ########################################## +# libseccomp check + +if test "$libseccomp" = "yes" ; then + if $pkg_config libseccomp --modversion >/dev/null 2>&1; then + LIBS=`$pkg_config --libs libseccomp` + else + feature_not_found "libseccomp" + fi +fi + +########################################## # xen probe if test "$xen" != "no" ; then @@ -3008,6 +3026,7 @@ echo "usb net redir $usb_redir" echo "OpenGL support $opengl" echo "libiscsi support $libiscsi" echo "build guest agent $guest_agent" +echo "seccomp support $libseccomp" echo "coroutine backend $coroutine_backend" if test "$sdl_too_old" = "yes"; then @@ -3309,6 +3328,10 @@ if test "$libiscsi" = "yes" ; then echo "CONFIG_LIBISCSI=y" >> $config_host_mak fi +if test "$libseccomp" = "yes" ; then + echo "CONFIG_LIBSECCOMP=y" >> $config_host_mak +fi + # XXX: suppress that if [ "$bsd" = "yes" ] ; then echo "CONFIG_BSD=y" >> $config_host_mak -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo @ 2012-05-04 19:08 ` Eduardo Otubo 2012-05-04 21:59 ` Andreas Färber 2012-05-08 9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange 2 siblings, 1 reply; 13+ messages in thread From: Eduardo Otubo @ 2012-05-04 19:08 UTC (permalink / raw) To: qemu-devel; +Cc: Eduardo Otubo I added a syscall struct using priority levels as described in the libseccomp man page. The priority numbers are based to the frequency they appear in a sample strace from a regular qemu guest run under libvirt. Libseccomp generates linear BPF code to filter system calls, those rules are read one after another. The priority system places the most common rules first in order to reduce the overhead when processing them. Also, since this is just a first RFC, the whitelist is a little raw. We might need your help to improve, test and fine tune the set of system calls. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> --- vl.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/vl.c b/vl.c index ae91a8a..e23838b 100644 --- a/vl.c +++ b/vl.c @@ -63,6 +63,9 @@ #include <linux/ppdev.h> #include <linux/parport.h> +#ifdef CONFIG_LIBSECCOMP +#include <seccomp.h> +#endif #endif #ifdef __sun__ #include <sys/stat.h> @@ -175,6 +178,8 @@ int main(int argc, char **argv) #define MAX_VIRTIO_CONSOLES 1 +static int seccomp_start(void); + static const char *data_dir; const char *bios_name = NULL; enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; @@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque) return 0; } +#ifdef CONFIG_LIBSECCOMP +struct qemu_seccomp_syscall { + int32_t num; + uint8_t priority; +}; + +static struct qemu_seccomp_syscall seccomp_whitelist[] = { + {SCMP_SYS(timer_settime), 255}, + {SCMP_SYS(timer_gettime), 254}, + {SCMP_SYS(futex), 253}, + {SCMP_SYS(select), 252}, + {SCMP_SYS(recvfrom), 251}, + {SCMP_SYS(sendto), 250}, + {SCMP_SYS(read), 249}, + {SCMP_SYS(brk), 248}, + {SCMP_SYS(clone), 247}, + {SCMP_SYS(mmap), 247}, + {SCMP_SYS(mprotect), 246}, + {SCMP_SYS(rt_sigprocmask), 245}, + {SCMP_SYS(write), 244}, + {SCMP_SYS(fcntl), 243}, + {SCMP_SYS(tgkill), 242}, + {SCMP_SYS(rt_sigaction), 242}, + {SCMP_SYS(pipe2), 242}, + {SCMP_SYS(munmap), 242}, + {SCMP_SYS(mremap), 242}, + {SCMP_SYS(getsockname), 242}, + {SCMP_SYS(getpeername), 242}, + {SCMP_SYS(fdatasync), 242}, + {SCMP_SYS(close), 242} +}; + +#define seccomp_whitelist_count \ + (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0])) + +int seccomp_start(void) +{ + int rc = 0; + unsigned int i = 0; + + rc = seccomp_init(SCMP_ACT_KILL); + if (rc < 0) { + goto seccomp_return; + } + + for (i = 0; i < seccomp_whitelist_count; i++) { + rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0); + if (rc < 0) { + goto seccomp_return; + } + rc = seccomp_syscall_priority(seccomp_whitelist[i].num, + seccomp_whitelist[i].priority); + if (rc < 0) { + goto seccomp_return; + } + } + + rc = seccomp_load(); + + seccomp_return: + seccomp_release(); + if (rc < 0) { + fprintf(stderr, + "ERROR: failed to configure the seccomp syscall filter in the kernel"); + } + return rc; +} +#endif + /***********************************************************/ /* QEMU state */ @@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void) int main(int argc, char **argv, char **envp) { + +#ifdef CONFIG_LIBSECCOMP + if (seccomp_start() < 0) { + exit(1); + } +#endif + int i; int snapshot, linux_boot; const char *icount_option = NULL; -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo @ 2012-05-04 21:59 ` Andreas Färber 2012-05-07 11:01 ` Paolo Bonzini 2012-05-07 12:16 ` Eduardo Otubo 0 siblings, 2 replies; 13+ messages in thread From: Andreas Färber @ 2012-05-04 21:59 UTC (permalink / raw) To: Eduardo Otubo; +Cc: qemu-devel Am 04.05.2012 21:08, schrieb Eduardo Otubo: > I added a syscall struct using priority levels as described in the libseccomp > man page. The priority numbers are based to the frequency they appear in a > sample strace from a regular qemu guest run under libvirt. > > Libseccomp generates linear BPF code to filter system calls, those rules are > read one after another. The priority system places the most common rules first > in order to reduce the overhead when processing them. > > Also, since this is just a first RFC, the whitelist is a little raw. We might > need your help to improve, test and fine tune the set of system calls. > > Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> > --- > vl.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 81 insertions(+) > > diff --git a/vl.c b/vl.c > index ae91a8a..e23838b 100644 > --- a/vl.c > +++ b/vl.c > @@ -63,6 +63,9 @@ > > #include <linux/ppdev.h> > #include <linux/parport.h> > +#ifdef CONFIG_LIBSECCOMP > +#include <seccomp.h> > +#endif > #endif > #ifdef __sun__ > #include <sys/stat.h> > @@ -175,6 +178,8 @@ int main(int argc, char **argv) > > #define MAX_VIRTIO_CONSOLES 1 > > +static int seccomp_start(void); You haven't tested this without libseccomp apparently? Other than that mostly some Coding Style issues... > + > static const char *data_dir; > const char *bios_name = NULL; > enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; > @@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque) > return 0; > } > > +#ifdef CONFIG_LIBSECCOMP > +struct qemu_seccomp_syscall { Struct names are expected to be CamelCase. > + int32_t num; > + uint8_t priority; > +}; > + > +static struct qemu_seccomp_syscall seccomp_whitelist[] = { > + {SCMP_SYS(timer_settime), 255}, Spaces inside braces please. > + {SCMP_SYS(timer_gettime), 254}, > + {SCMP_SYS(futex), 253}, > + {SCMP_SYS(select), 252}, > + {SCMP_SYS(recvfrom), 251}, > + {SCMP_SYS(sendto), 250}, > + {SCMP_SYS(read), 249}, > + {SCMP_SYS(brk), 248}, > + {SCMP_SYS(clone), 247}, > + {SCMP_SYS(mmap), 247}, > + {SCMP_SYS(mprotect), 246}, > + {SCMP_SYS(rt_sigprocmask), 245}, > + {SCMP_SYS(write), 244}, > + {SCMP_SYS(fcntl), 243}, > + {SCMP_SYS(tgkill), 242}, > + {SCMP_SYS(rt_sigaction), 242}, > + {SCMP_SYS(pipe2), 242}, > + {SCMP_SYS(munmap), 242}, > + {SCMP_SYS(mremap), 242}, > + {SCMP_SYS(getsockname), 242}, > + {SCMP_SYS(getpeername), 242}, > + {SCMP_SYS(fdatasync), 242}, > + {SCMP_SYS(close), 242} > +}; > + > +#define seccomp_whitelist_count \ > + (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0])) Please just use the ARRAY_SIZE() macro inline. > + > +int seccomp_start(void) > +{ > + int rc = 0; > + unsigned int i = 0; > + > + rc = seccomp_init(SCMP_ACT_KILL); > + if (rc < 0) { > + goto seccomp_return; > + } > + > + for (i = 0; i < seccomp_whitelist_count; i++) { > + rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0); > + if (rc < 0) { > + goto seccomp_return; > + } > + rc = seccomp_syscall_priority(seccomp_whitelist[i].num, > + seccomp_whitelist[i].priority); > + if (rc < 0) { > + goto seccomp_return; > + } > + } > + > + rc = seccomp_load(); > + > + seccomp_return: > + seccomp_release(); > + if (rc < 0) { > + fprintf(stderr, > + "ERROR: failed to configure the seccomp syscall filter in the kernel"); \n missing. > + } > + return rc; > +} > +#endif > + > /***********************************************************/ > /* QEMU state */ > > @@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void) > > int main(int argc, char **argv, char **envp) > { > + > +#ifdef CONFIG_LIBSECCOMP > + if (seccomp_start() < 0) { > + exit(1); This is inconsistent: Either exit() within seccomp_start() where you print the error message, or move the fprintf() here. > + } > +#endif This is adding code before the variable declaration block, please move to after. > + > int i; > int snapshot, linux_boot; > const char *icount_option = NULL; Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-04 21:59 ` Andreas Färber @ 2012-05-07 11:01 ` Paolo Bonzini 2012-05-07 12:28 ` Eduardo Otubo 2012-05-07 12:16 ` Eduardo Otubo 1 sibling, 1 reply; 13+ messages in thread From: Paolo Bonzini @ 2012-05-07 11:01 UTC (permalink / raw) To: qemu-devel, otubo Il 04/05/2012 23:59, Andreas Färber ha scritto: >> > +static struct qemu_seccomp_syscall seccomp_whitelist[] = { >> > + {SCMP_SYS(timer_settime), 255}, > Spaces inside braces please. > >> > + {SCMP_SYS(timer_gettime), 254}, >> > + {SCMP_SYS(futex), 253}, >> > + {SCMP_SYS(select), 252}, >> > + {SCMP_SYS(recvfrom), 251}, >> > + {SCMP_SYS(sendto), 250}, >> > + {SCMP_SYS(read), 249}, >> > + {SCMP_SYS(brk), 248}, >> > + {SCMP_SYS(clone), 247}, >> > + {SCMP_SYS(mmap), 247}, >> > + {SCMP_SYS(mprotect), 246}, >> > + {SCMP_SYS(rt_sigprocmask), 245}, >> > + {SCMP_SYS(write), 244}, >> > + {SCMP_SYS(fcntl), 243}, >> > + {SCMP_SYS(tgkill), 242}, >> > + {SCMP_SYS(rt_sigaction), 242}, >> > + {SCMP_SYS(pipe2), 242}, >> > + {SCMP_SYS(munmap), 242}, >> > + {SCMP_SYS(mremap), 242}, >> > + {SCMP_SYS(getsockname), 242}, >> > + {SCMP_SYS(getpeername), 242}, >> > + {SCMP_SYS(fdatasync), 242}, >> > + {SCMP_SYS(close), 242} >> > +}; >> > + At least the following are also used: recvmsg, sendmsg, accept, connect, bind, listen, ioctl, fallocate, eventfd. I don't know if all of them have to be included in the list. Other syscalls are not used but probably should be allowed for simplicity, for example poll. For ioctl, we may want to refine the white-list depending on the argument, and perhaps even filter by file descriptor (the KVM ioctls are in relatively fast paths, so it would be nice if they were passed with fewer BPF ops). BTW, please keep this out of vl.c, so that all hairiness can be added as appropriate. Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-07 11:01 ` Paolo Bonzini @ 2012-05-07 12:28 ` Eduardo Otubo 2012-05-07 12:34 ` Paolo Bonzini 0 siblings, 1 reply; 13+ messages in thread From: Eduardo Otubo @ 2012-05-07 12:28 UTC (permalink / raw) To: Paolo Bonzini; +Cc: qemu-devel On Mon, May 07, 2012 at 01:01:01PM +0200, Paolo Bonzini wrote: > Il 04/05/2012 23:59, Andreas Färber ha scritto: > >> > +static struct qemu_seccomp_syscall seccomp_whitelist[] = { > >> > + {SCMP_SYS(timer_settime), 255}, > > Spaces inside braces please. > > > >> > + {SCMP_SYS(timer_gettime), 254}, > >> > + {SCMP_SYS(futex), 253}, > >> > + {SCMP_SYS(select), 252}, > >> > + {SCMP_SYS(recvfrom), 251}, > >> > + {SCMP_SYS(sendto), 250}, > >> > + {SCMP_SYS(read), 249}, > >> > + {SCMP_SYS(brk), 248}, > >> > + {SCMP_SYS(clone), 247}, > >> > + {SCMP_SYS(mmap), 247}, > >> > + {SCMP_SYS(mprotect), 246}, > >> > + {SCMP_SYS(rt_sigprocmask), 245}, > >> > + {SCMP_SYS(write), 244}, > >> > + {SCMP_SYS(fcntl), 243}, > >> > + {SCMP_SYS(tgkill), 242}, > >> > + {SCMP_SYS(rt_sigaction), 242}, > >> > + {SCMP_SYS(pipe2), 242}, > >> > + {SCMP_SYS(munmap), 242}, > >> > + {SCMP_SYS(mremap), 242}, > >> > + {SCMP_SYS(getsockname), 242}, > >> > + {SCMP_SYS(getpeername), 242}, > >> > + {SCMP_SYS(fdatasync), 242}, > >> > + {SCMP_SYS(close), 242} > >> > +}; > >> > + > > At least the following are also used: recvmsg, sendmsg, accept, connect, > bind, listen, ioctl, fallocate, eventfd. I don't know if all of them > have to be included in the list. Other syscalls are not used but > probably should be allowed for simplicity, for example poll. You straced those syscalls from what kind of guest? Can you provide the frequency they appear on a strace of you example so we can set the priority? Don't need any fancy report, just some grep's and wc's on a strace output should be just fine. > > For ioctl, we may want to refine the white-list depending on the > argument, and perhaps even filter by file descriptor (the KVM ioctls are > in relatively fast paths, so it would be nice if they were passed with > fewer BPF ops). > > BTW, please keep this out of vl.c, so that all hairiness can be added as > appropriate. I thought it would be overkill the create a new seccomp.[c|h] just for this purpose. But yes, we can start thinking about that since the features might grow in the future. Thanks for the comments, Regards -- Eduardo Otubo Software Engineer Linux Technology Center IBM Systems & Technology Group Mobile: +55 19 8135 0885 eotubo@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-07 12:28 ` Eduardo Otubo @ 2012-05-07 12:34 ` Paolo Bonzini 0 siblings, 0 replies; 13+ messages in thread From: Paolo Bonzini @ 2012-05-07 12:34 UTC (permalink / raw) To: Eduardo Otubo; +Cc: qemu-devel > > At least the following are also used: recvmsg, sendmsg, accept, connect, > > bind, listen, ioctl, fallocate, eventfd. I don't know if all of them > > have to be included in the list. Other syscalls are not used but > > probably should be allowed for simplicity, for example poll. > > You straced those syscalls from what kind of guest? Can you provide > the frequency they appear on a strace of you example so we can set the > priority? Don't need any fancy report, just some grep's and wc's on a > strace output should be just fine. No, just looking at the code. (Uhm, fallocate is not used in master yet). ioctl is the only one with pretty high priority. Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c 2012-05-04 21:59 ` Andreas Färber 2012-05-07 11:01 ` Paolo Bonzini @ 2012-05-07 12:16 ` Eduardo Otubo 1 sibling, 0 replies; 13+ messages in thread From: Eduardo Otubo @ 2012-05-07 12:16 UTC (permalink / raw) To: Andreas Färber; +Cc: qemu-devel On Fri, May 04, 2012 at 11:59:00PM +0200, Andreas Färber wrote: > Am 04.05.2012 21:08, schrieb Eduardo Otubo: > > I added a syscall struct using priority levels as described in the libseccomp > > man page. The priority numbers are based to the frequency they appear in a > > sample strace from a regular qemu guest run under libvirt. > > > > Libseccomp generates linear BPF code to filter system calls, those rules are > > read one after another. The priority system places the most common rules first > > in order to reduce the overhead when processing them. > > > > Also, since this is just a first RFC, the whitelist is a little raw. We might > > need your help to improve, test and fine tune the set of system calls. > > > > Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> > > --- > > vl.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 81 insertions(+) > > > > diff --git a/vl.c b/vl.c > > index ae91a8a..e23838b 100644 > > --- a/vl.c > > +++ b/vl.c > > @@ -63,6 +63,9 @@ > > > > #include <linux/ppdev.h> > > #include <linux/parport.h> > > +#ifdef CONFIG_LIBSECCOMP > > +#include <seccomp.h> > > +#endif > > #endif > > #ifdef __sun__ > > #include <sys/stat.h> > > @@ -175,6 +178,8 @@ int main(int argc, char **argv) > > > > #define MAX_VIRTIO_CONSOLES 1 > > > > +static int seccomp_start(void); > > You haven't tested this without libseccomp apparently? > > Other than that mostly some Coding Style issues... Actually I did run the scripts/checkpatch.pl, and there were no errors. But I'll fix those issues for my next version. Thanks for pointing them out :) > > > + > > static const char *data_dir; > > const char *bios_name = NULL; > > enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB; > > @@ -313,6 +318,75 @@ static int default_driver_check(QemuOpts *opts, void *opaque) > > return 0; > > } > > > > +#ifdef CONFIG_LIBSECCOMP > > +struct qemu_seccomp_syscall { > > Struct names are expected to be CamelCase. > > > + int32_t num; > > + uint8_t priority; > > +}; > > + > > +static struct qemu_seccomp_syscall seccomp_whitelist[] = { > > + {SCMP_SYS(timer_settime), 255}, > > Spaces inside braces please. > > > + {SCMP_SYS(timer_gettime), 254}, > > + {SCMP_SYS(futex), 253}, > > + {SCMP_SYS(select), 252}, > > + {SCMP_SYS(recvfrom), 251}, > > + {SCMP_SYS(sendto), 250}, > > + {SCMP_SYS(read), 249}, > > + {SCMP_SYS(brk), 248}, > > + {SCMP_SYS(clone), 247}, > > + {SCMP_SYS(mmap), 247}, > > + {SCMP_SYS(mprotect), 246}, > > + {SCMP_SYS(rt_sigprocmask), 245}, > > + {SCMP_SYS(write), 244}, > > + {SCMP_SYS(fcntl), 243}, > > + {SCMP_SYS(tgkill), 242}, > > + {SCMP_SYS(rt_sigaction), 242}, > > + {SCMP_SYS(pipe2), 242}, > > + {SCMP_SYS(munmap), 242}, > > + {SCMP_SYS(mremap), 242}, > > + {SCMP_SYS(getsockname), 242}, > > + {SCMP_SYS(getpeername), 242}, > > + {SCMP_SYS(fdatasync), 242}, > > + {SCMP_SYS(close), 242} > > +}; > > + > > +#define seccomp_whitelist_count \ > > + (sizeof(seccomp_whitelist)/sizeof(seccomp_whitelist[0])) > > Please just use the ARRAY_SIZE() macro inline. > > > + > > +int seccomp_start(void) > > +{ > > + int rc = 0; > > + unsigned int i = 0; > > + > > + rc = seccomp_init(SCMP_ACT_KILL); > > + if (rc < 0) { > > + goto seccomp_return; > > + } > > + > > + for (i = 0; i < seccomp_whitelist_count; i++) { > > + rc = seccomp_rule_add(SCMP_ACT_ALLOW, seccomp_whitelist[i].num, 0); > > + if (rc < 0) { > > + goto seccomp_return; > > + } > > + rc = seccomp_syscall_priority(seccomp_whitelist[i].num, > > + seccomp_whitelist[i].priority); > > + if (rc < 0) { > > + goto seccomp_return; > > + } > > + } > > + > > + rc = seccomp_load(); > > + > > + seccomp_return: > > + seccomp_release(); > > + if (rc < 0) { > > + fprintf(stderr, > > + "ERROR: failed to configure the seccomp syscall filter in the kernel"); > > \n missing. > > > + } > > + return rc; > > +} > > +#endif > > + > > /***********************************************************/ > > /* QEMU state */ > > > > @@ -2257,6 +2331,13 @@ int qemu_init_main_loop(void) > > > > int main(int argc, char **argv, char **envp) > > { > > + > > +#ifdef CONFIG_LIBSECCOMP > > + if (seccomp_start() < 0) { > > + exit(1); > > This is inconsistent: Either exit() within seccomp_start() where you > print the error message, or move the fprintf() here. > Agreed. > > + } > > +#endif > > This is adding code before the variable declaration block, please move > to after. Agreed. Thanks for the comments. -- Eduardo Otubo Software Engineer Linux Technology Center IBM Systems & Technology Group Mobile: +55 19 8135 0885 eotubo@linux.vnet.ibm.com ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp 2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo @ 2012-05-08 9:15 ` Daniel P. Berrange 2012-05-08 11:32 ` Stefano Stabellini 2 siblings, 1 reply; 13+ messages in thread From: Daniel P. Berrange @ 2012-05-08 9:15 UTC (permalink / raw) To: Eduardo Otubo; +Cc: qemu-devel On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote: > Hello all, > > This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The > patches that follows are pretty simple and straightforward. I added the correct > options and checks to the configure script and the basic calls to libseccomp in > the main loop at vl.c. Details of each one are in the emails of the patch set. > > This support limits the system call footprint of the entire QEMU process to a > limited set of syscalls, those that we know QEMU uses. The idea is to limit > the allowable syscalls, therefore limiting the impact that an attacked guest > could have on the host system. What functionality has been lost by applying this seccomp filter ? I've not looked closely at the code, but it appears as if this blocks pretty much any kind of runtime device changes. ie no hotplug of any kind will work ? Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp 2012-05-08 9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange @ 2012-05-08 11:32 ` Stefano Stabellini 2012-05-08 14:10 ` Corey Bryant 0 siblings, 1 reply; 13+ messages in thread From: Stefano Stabellini @ 2012-05-08 11:32 UTC (permalink / raw) To: Daniel P. Berrange; +Cc: qemu-devel@nongnu.org, Eduardo Otubo On Tue, 8 May 2012, Daniel P. Berrange wrote: > On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote: > > Hello all, > > > > This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The > > patches that follows are pretty simple and straightforward. I added the correct > > options and checks to the configure script and the basic calls to libseccomp in > > the main loop at vl.c. Details of each one are in the emails of the patch set. > > > > This support limits the system call footprint of the entire QEMU process to a > > limited set of syscalls, those that we know QEMU uses. The idea is to limit > > the allowable syscalls, therefore limiting the impact that an attacked guest > > could have on the host system. > > What functionality has been lost by applying this seccomp filter ? I've not > looked closely at the code, but it appears as if this blocks pretty much > any kind of runtime device changes. ie no hotplug of any kind will work ? Right, I was wondering the same thing: open is not on the list so adding a new disk shouldn't be possible. Regarding Xen, most of the hypercalls go through xc_* calls that are ioctls on the privcmd device. Is it possible to add ioctl to the list? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp 2012-05-08 11:32 ` Stefano Stabellini @ 2012-05-08 14:10 ` Corey Bryant 2012-05-08 14:27 ` Daniel P. Berrange 0 siblings, 1 reply; 13+ messages in thread From: Corey Bryant @ 2012-05-08 14:10 UTC (permalink / raw) To: Stefano Stabellini; +Cc: qemu-devel@nongnu.org, Eduardo Otubo On 05/08/2012 07:32 AM, Stefano Stabellini wrote: > On Tue, 8 May 2012, Daniel P. Berrange wrote: >> On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote: >>> Hello all, >>> >>> This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The >>> patches that follows are pretty simple and straightforward. I added the correct >>> options and checks to the configure script and the basic calls to libseccomp in >>> the main loop at vl.c. Details of each one are in the emails of the patch set. >>> >>> This support limits the system call footprint of the entire QEMU process to a >>> limited set of syscalls, those that we know QEMU uses. The idea is to limit >>> the allowable syscalls, therefore limiting the impact that an attacked guest >>> could have on the host system. >> >> What functionality has been lost by applying this seccomp filter ? I've not >> looked closely at the code, but it appears as if this blocks pretty much >> any kind of runtime device changes. ie no hotplug of any kind will work ? > > Right, I was wondering the same thing: open is not on the list so adding > a new disk shouldn't be possible. > > Regarding Xen, most of the hypercalls go through xc_* calls that are > ioctls on the privcmd device. Is it possible to add ioctl to the list? > If the whitelist is complete there should be no functionality lost when using seccomp with QEMU. The idea (at least at this point) is to disallow the system calls that QEMU doesn't use. open and ioctl should be added to the whitelist. -- Regards, Corey ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp 2012-05-08 14:10 ` Corey Bryant @ 2012-05-08 14:27 ` Daniel P. Berrange 2012-05-08 15:19 ` Corey Bryant 0 siblings, 1 reply; 13+ messages in thread From: Daniel P. Berrange @ 2012-05-08 14:27 UTC (permalink / raw) To: Corey Bryant; +Cc: Eduardo Otubo, qemu-devel@nongnu.org, Stefano Stabellini On Tue, May 08, 2012 at 10:10:25AM -0400, Corey Bryant wrote: > > > On 05/08/2012 07:32 AM, Stefano Stabellini wrote: > >On Tue, 8 May 2012, Daniel P. Berrange wrote: > >>On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote: > >>>Hello all, > >>> > >>>This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The > >>>patches that follows are pretty simple and straightforward. I added the correct > >>>options and checks to the configure script and the basic calls to libseccomp in > >>>the main loop at vl.c. Details of each one are in the emails of the patch set. > >>> > >>>This support limits the system call footprint of the entire QEMU process to a > >>>limited set of syscalls, those that we know QEMU uses. The idea is to limit > >>>the allowable syscalls, therefore limiting the impact that an attacked guest > >>>could have on the host system. > >> > >>What functionality has been lost by applying this seccomp filter ? I've not > >>looked closely at the code, but it appears as if this blocks pretty much > >>any kind of runtime device changes. ie no hotplug of any kind will work ? > > > >Right, I was wondering the same thing: open is not on the list so adding > >a new disk shouldn't be possible. > > > >Regarding Xen, most of the hypercalls go through xc_* calls that are > >ioctls on the privcmd device. Is it possible to add ioctl to the list? > > > > If the whitelist is complete there should be no functionality lost > when using seccomp with QEMU. The idea (at least at this point) is > to disallow the system calls that QEMU doesn't use. open and ioctl > should be added to the whitelist. Ok. So my next question is what is the benchmark for evaluating whether this seccomp code provides any kind of meaningful security improvement ? AFAICT, if you were allow open(), or indeed every syscall any QEMU feature could possibly use, then there would be little-to-no security benefit. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp 2012-05-08 14:27 ` Daniel P. Berrange @ 2012-05-08 15:19 ` Corey Bryant 0 siblings, 0 replies; 13+ messages in thread From: Corey Bryant @ 2012-05-08 15:19 UTC (permalink / raw) To: Daniel P. Berrange Cc: Stefano Stabellini, qemu-devel@nongnu.org, Eduardo Otubo On 05/08/2012 10:27 AM, Daniel P. Berrange wrote: > On Tue, May 08, 2012 at 10:10:25AM -0400, Corey Bryant wrote: >> >> >> On 05/08/2012 07:32 AM, Stefano Stabellini wrote: >>> On Tue, 8 May 2012, Daniel P. Berrange wrote: >>>> On Fri, May 04, 2012 at 04:08:36PM -0300, Eduardo Otubo wrote: >>>>> Hello all, >>>>> >>>>> This is the first effort to sandboxing Qemu guests using Libseccomp[0]. The >>>>> patches that follows are pretty simple and straightforward. I added the correct >>>>> options and checks to the configure script and the basic calls to libseccomp in >>>>> the main loop at vl.c. Details of each one are in the emails of the patch set. >>>>> >>>>> This support limits the system call footprint of the entire QEMU process to a >>>>> limited set of syscalls, those that we know QEMU uses. The idea is to limit >>>>> the allowable syscalls, therefore limiting the impact that an attacked guest >>>>> could have on the host system. >>>> >>>> What functionality has been lost by applying this seccomp filter ? I've not >>>> looked closely at the code, but it appears as if this blocks pretty much >>>> any kind of runtime device changes. ie no hotplug of any kind will work ? >>> >>> Right, I was wondering the same thing: open is not on the list so adding >>> a new disk shouldn't be possible. >>> >>> Regarding Xen, most of the hypercalls go through xc_* calls that are >>> ioctls on the privcmd device. Is it possible to add ioctl to the list? >>> >> >> If the whitelist is complete there should be no functionality lost >> when using seccomp with QEMU. The idea (at least at this point) is >> to disallow the system calls that QEMU doesn't use. open and ioctl >> should be added to the whitelist. > > Ok. So my next question is what is the benchmark for evaluating > whether this seccomp code provides any kind of meaningful security > improvement ? AFAICT, if you were allow open(), or indeed every > syscall any QEMU feature could possibly use, then there would be > little-to-no security benefit. Well let's say we have a seccomp whitelist of 50 syscalls. That reduces the syscall footprint from ~350 (on x86) syscalls to 50, limiting what the attacker could execute from an exploited guest. Eventually it would be nice to fine-tune the syscall parameters that are whitelisted. For example, we could only allow a designated subset of allowable ioctls. Or we could allow I/O operations only on a designated set of file descriptors that the guest needs to access. -- Regards, Corey ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-05-08 15:19 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-05-04 19:08 [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 1/2] Adding support for libseccomp in configure Eduardo Otubo 2012-05-04 19:08 ` [Qemu-devel] [RFC] [PATCH 2/2] Adding basic calls to libseccomp in vl.c Eduardo Otubo 2012-05-04 21:59 ` Andreas Färber 2012-05-07 11:01 ` Paolo Bonzini 2012-05-07 12:28 ` Eduardo Otubo 2012-05-07 12:34 ` Paolo Bonzini 2012-05-07 12:16 ` Eduardo Otubo 2012-05-08 9:15 ` [Qemu-devel] [RFC] [PATCH 0/2] Sandboxing Qemu guests with Libseccomp Daniel P. Berrange 2012-05-08 11:32 ` Stefano Stabellini 2012-05-08 14:10 ` Corey Bryant 2012-05-08 14:27 ` Daniel P. Berrange 2012-05-08 15:19 ` Corey Bryant
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).