* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: Paolo Bonzini @ 2014-07-03 9:12 UTC
To: David Drysdale, linux-security-module, linux-kernel, Greg Kroah-Hartman
Cc: Kees Cook, linux-api, Meredydd Luff, qemu-devel, Alexander Viro, James Morris

On 30/06/2014 12:28, David Drysdale wrote:
> Hi all,
>
> The last couple of versions of FreeBSD (9.x/10.x) have included the Capsicum security framework [1], which allows security-aware applications to sandbox themselves in a very fine-grained way. For example, OpenSSH now (>= 6.5) uses Capsicum in its FreeBSD version to restrict sshd's credentials checking process, to reduce the chances of credential leakage.

Hi David,

we've had similar goals in QEMU. QEMU can be used as a virtual machine monitor from the command line, but it also has an API that lets a management tool drive QEMU via AF_UNIX sockets. Long term, we would like to have a restricted mode for QEMU where all file descriptors are obtained via SCM_RIGHTS or /dev/fd, and syscalls can be locked down.

Currently we do use seccomp v2 BPF filters, but unfortunately this didn't help very much. QEMU supports hotplugging, hence the filter must whitelist anything that _might_ be used in the future, which is generally... too much.

Something like Capsicum would be really nice because it attaches capabilities to file descriptors. However, I wonder how extensible Capsicum could be, and I am worried about the proliferation of capabilities that its design naturally leads to.

Given Linux's previous experience with BPF filters, what do you think about attaching specific BPF programs to file descriptors? Then whenever a syscall is run that affects a file descriptor, the BPF program for the file descriptor (attached to a struct file* as in Capsicum) would run in addition to the process-wide filter.

An equivalent of PR_SET_NO_NEW_PRIVS can also be added to file descriptors, so that a program that doesn't lock down syscalls can still lock down the operations (including fcntls and ioctls) on specific file descriptors.

Converting FreeBSD capabilities to BPF programs could easily be implemented in userspace.

> [Capsicum also includes 'capability mode', which locks down the available syscalls so the rights restrictions can't just be bypassed by opening new file descriptors; I'll describe that separately later.]

This can also be implemented in userspace via seccomp and PR_SET_NO_NEW_PRIVS.

> [Policing the rights checks anywhere else, for example at the system call boundary, isn't a good idea because it opens up the possibility of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are changed (as openat/close/dup2 are allowed in capability mode) between the 'check' at syscall entry and the 'use' at fget() invocation.]

In the case of BPF filters, I wonder if you could stash the BPF "environment" somewhere and then use it at fget() invocation. Alternatively, it can be reconstructed at fget() time, similar to your introduction of fgetr().

Thanks,

Paolo
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: Loganaden Velvindron @ 2014-07-03 10:01 UTC
To: Paolo Bonzini
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel, qemu-devel, linux-security-module, Alexander Viro, James Morris, linux-api, David Drysdale

On Thu, Jul 3, 2014 at 1:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 30/06/2014 12:28, David Drysdale wrote:
>>
>> Hi all,
>>
>> The last couple of versions of FreeBSD (9.x/10.x) have included the Capsicum security framework [1], which allows security-aware applications to sandbox themselves in a very fine-grained way. For example, OpenSSH now (>= 6.5) uses Capsicum in its FreeBSD version to restrict sshd's credentials checking process, to reduce the chances of credential leakage.

Aside from OpenSSH, I've also been working on implementing Capsicum in other userspace software.

> [...]

--
This message is strictly personal and the opinions expressed do not represent those of my employers, either past or present.
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: David Drysdale @ 2014-07-03 18:39 UTC
To: Paolo Bonzini
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API

On Thu, Jul 03, 2014 at 11:12:33AM +0200, Paolo Bonzini wrote:
> On 30/06/2014 12:28, David Drysdale wrote:
> >Hi all,
> >
> >The last couple of versions of FreeBSD (9.x/10.x) have included the Capsicum security framework [1], which allows security-aware applications to sandbox themselves in a very fine-grained way. For example, OpenSSH now (>= 6.5) uses Capsicum in its FreeBSD version to restrict sshd's credentials checking process, to reduce the chances of credential leakage.
>
> Hi David,
>
> we've had similar goals in QEMU. QEMU can be used as a virtual machine monitor from the command line, but it also has an API that lets a management tool drive QEMU via AF_UNIX sockets. Long term, we would like to have a restricted mode for QEMU where all file descriptors are obtained via SCM_RIGHTS or /dev/fd, and syscalls can be locked down.
>
> Currently we do use seccomp v2 BPF filters, but unfortunately this didn't help very much. QEMU supports hotplugging, hence the filter must whitelist anything that _might_ be used in the future, which is generally... too much.
>
> Something like Capsicum would be really nice because it attaches capabilities to file descriptors. However, I wonder how extensible Capsicum could be, and I am worried about the proliferation of capabilities that its design naturally leads to.
True, capability rights are likely to expand over time (although FreeBSD only expanded from 55 to 60 between 9.x and 10.x).

> Given Linux's previous experience with BPF filters, what do you think about attaching specific BPF programs to file descriptors? Then whenever a syscall is run that affects a file descriptor, the BPF program for the file descriptor (attached to a struct file* as in Capsicum) would run in addition to the process-wide filter.

That sounds kind of clever, but also kind of complicated.

Off the top of my head, one particular problem is that not all fd->struct file conversions in the kernel are completely specified by an enclosing syscall and the explicit values of its parameters.

For example, the actual contents of the arguments to io_submit(2) aren't visible to a seccomp-bpf program (as it can't read the __user memory for the iocb structures), and so it can't distinguish a read from a write.

Also, there could potentially be some odd interactions with file descriptors passed between processes, if the BPF program relies on assumptions about the environment of the original process. For example, what happens if an x86_64 process passes a filter-attached FD to an ia32 process? Given that the syscall numbers are arch-specific, I guess that means the filter program would have to include arch-specific branches for any possible variant.

More generally, I suspect that keeping things simpler will end up being more secure. Capsicum was based on well-studied ideas from the world of object capability-based security, and I'd be nervous about adding complications that take us further away from that.

> An equivalent of PR_SET_NO_NEW_PRIVS can also be added to file descriptors, so that a program that doesn't lock down syscalls can still lock down the operations (including fcntls and ioctls) on specific file descriptors.
>
> Converting FreeBSD capabilities to BPF programs can be easily implemented in userspace.
I get the idea, but I'm not sure it would be that easy! The BPF-generation library would need to hold all of the mappings from system calls (and their arguments) to the equivalent required rights, and vice versa.

That mapping would also need to be kept closely in sync with the kernel and other system libraries: if a new syscall is added and libc (or some other library) starts using it, the equivalent BPF chunks would need to be updated to cope.

> > [Capsicum also includes 'capability mode', which locks down the available syscalls so the rights restrictions can't just be bypassed by opening new file descriptors; I'll describe that separately later.]
>
> This can also be implemented in userspace via seccomp and PR_SET_NO_NEW_PRIVS.

Well, mostly (and in fact I've got an attempt to do exactly that at https://github.com/google/capsicum-test/blob/dev/linux-bpf-capmode.c). But there are a few wrinkles with that approach.

First, we need Kees Cook's patches to allow seccomp filters to be synchronized across existing threads, but hopefully they will make it in soon.

Next, there's one awkward syscall case. In capability mode we'd like to prevent processes from sending signals with kill(2)/tgkill(2) to other processes, but they should still be able to send themselves signals. For example, abort(3) generates:

    tgkill(gettid(), gettid(), SIGABRT)

Only allowing kill(self) is hard to encode in a seccomp-bpf program, at least in a way that survives forking.

Finally, capability mode also turns on strict-relative lookups process-wide; in other words, every openat(dfd, ...) operation acts as though it has the O_BENEATH_ONLY flag set, regardless of whether the dfd is a Capsicum capability. I can't see a way to do that with a BPF program (although it would be possible to add a filter that polices the requirement to include O_BENEATH_ONLY rather than implicitly adding it).
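To make the fork problem concrete, here is a minimal sketch under the same assumptions as a plain seccomp-bpf filter (little-endian x86_64 or aarch64, hypothetical function names). It uses kill(2) rather than tgkill(2) for simplicity. The obvious way to allow only "signal myself" is to compare the pid argument with the caller's pid, but that pid is baked into the filter as a constant when the filter is installed. The filter is inherited across fork() while the constant is not updated, so a forked child can no longer signal itself.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

#define ARG0_LO ((__u32)offsetof(struct seccomp_data, args[0]))

/* Allow kill(pid, sig) only if pid == self; other targets get EPERM. */
static int install_kill_self_only(pid_t self)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_kill, 0, 3),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
        /* The pid is frozen into the program at install time. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (__u32)self, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 0 if: signalling self is allowed, signalling the parent is refused,
   and a forked child (which inherits the filter) cannot signal itself. */
static int demo_kill_self_after_fork(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_kill_self_only(getpid()) != 0)
            _exit(2);
        /* Signal 0 only performs the permission check; nothing is delivered. */
        int ok = kill(getpid(), 0) == 0;
        ok = ok && kill(getppid(), 0) == -1 && errno == EPERM;
        pid_t child = fork();
        if (child == 0)
            _exit(kill(getpid(), 0) == -1 && errno == EPERM ? 0 : 1);
        int st;
        ok = ok && child > 0 && waitpid(child, &st, 0) == child &&
             WIFEXITED(st) && WEXITSTATUS(st) == 0;
        _exit(ok ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Classic seccomp-bpf only sees struct seccomp_data (syscall number, arch, instruction pointer and six raw arguments), so there is no "current pid" input the filter could compare against at call time.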
So although a capability-mode implementation in terms of seccomp-bpf is tantalizingly close, at the moment I've got it implemented as a new seccomp mode.

> > [Policing the rights checks anywhere else, for example at the system call boundary, isn't a good idea because it opens up the possibility of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are changed (as openat/close/dup2 are allowed in capability mode) between the 'check' at syscall entry and the 'use' at fget() invocation.]
>
> In the case of BPF filters, I wonder if you could stash the BPF "environment" somewhere and then use it at fget() invocation. Alternatively, it can be reconstructed at fget() time, similar to your introduction of fgetr().

Stashing something at syscall entry to be referred to later always makes me worry about TOCTOU vulnerabilities, but the details might be OK in this case (given that no check occurs at syscall entry)...

> Thanks,
>
> Paolo

Many thanks for taking the time to comment and think of innovative ideas!

David
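A toy, single-threaded model may help show why the rights check belongs at fget() time rather than at syscall entry. Everything here is hypothetical: the names (fget_rights, model_dup2, RIGHT_READ, RIGHT_WRITE) and the rights encoding are invented for illustration and are not the Capsicum or kernel API. The "race" is simulated by a callback that runs between check and use, standing in for another thread calling dup2().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rights bits; not FreeBSD's real encoding. */
enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1 };

struct object { const char *name; };
struct fd_entry { struct object *obj; uint32_t rights; };

#define NFDS 8
struct fd_table { struct fd_entry slot[NFDS]; };

/* Models dup2(from, to): slot `to` now refers to what slot `from` does. */
static void model_dup2(struct fd_table *t, int from, int to)
{
    t->slot[to] = t->slot[from];
}

/* Check and translate in one step, in the spirit of fgetr(): the caller gets
   back exactly the object whose rights were checked, or NULL. */
static struct object *fget_rights(const struct fd_table *t, int fd, uint32_t need)
{
    if (fd < 0 || fd >= NFDS || t->slot[fd].obj == NULL)
        return NULL;
    if ((t->slot[fd].rights & need) != need)
        return NULL;
    return t->slot[fd].obj;
}

/* The problematic pattern: check at "syscall entry", then look the fd up
   again at "use". `race` stands in for another thread. */
static struct object *check_then_use(struct fd_table *t, int fd, uint32_t need,
                                     void (*race)(struct fd_table *))
{
    if (fd < 0 || fd >= NFDS || (t->slot[fd].rights & need) != need)
        return NULL;                      /* time of check */
    if (race)
        race(t);
    return t->slot[fd].obj;               /* time of use: maybe another object */
}

/* Another "thread" replaces fd 3 with the read-only object held in fd 4. */
static void swap_in_read_only(struct fd_table *t) { model_dup2(t, 4, 3); }
```

With fd 3 initially read/write and fd 4 read-only, check_then_use() hands a "write" the read-only object, whereas fget_rights() refuses it because the check and the lookup see the same slot contents. A real kernel would additionally need the fd-table locking or RCU that this model leaves out.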
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: Paolo Bonzini @ 2014-07-04 7:03 UTC
To: David Drysdale
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API

On 03/07/2014 20:39, David Drysdale wrote:
> On Thu, Jul 03, 2014 at 11:12:33AM +0200, Paolo Bonzini wrote:
>> Given Linux's previous experience with BPF filters, what do you think about attaching specific BPF programs to file descriptors? Then whenever a syscall is run that affects a file descriptor, the BPF program for the file descriptor (attached to a struct file* as in Capsicum) would run in addition to the process-wide filter.
>
> That sounds kind of clever, but also kind of complicated.
>
> Off the top of my head, one particular problem is that not all fd->struct file conversions in the kernel are completely specified by an enclosing syscall and the explicit values of its parameters.
>
> For example, the actual contents of the arguments to io_submit(2) aren't visible to a seccomp-bpf program (as it can't read the __user memory for the iocb structures), and so it can't distinguish a read from a write.

I think that's more easily done by opening the file as O_RDONLY/O_WRONLY/O_RDWR. You could do it by running the file descriptor's seccomp-bpf program once per iocb with synthesized syscall numbers and argument vectors.

BTW, there's one thing I'm not sure I understand (because my knowledge of VFS is really only cursory). Are the capabilities associated with the file _descriptor_ (a la F_GETFD/SETFD) or with the file _description_ (F_GETFL/SETFL)?!?
If it is the former, there is some value in read/write capabilities because you could, for example, block a child process from reading an eventfd and simulate the two file descriptors returned by pipe(2). But if it is the latter, it looks like an important usability problem in the Capsicum model. (Granted, it's just about usability; in the end it does exactly what it's meant and documented to do.)

> Also, there could potentially be some odd interactions with file descriptors passed between processes, if the BPF program relies on assumptions about the environment of the original process. For example, what happens if an x86_64 process passes a filter-attached FD to an ia32 process? Given that the syscall numbers are arch-specific, I guess that means the filter program would have to include arch-specific branches for any possible variant.

This is the same for using seccomp v2 to limit child processes, no? So there may be a problem, but it has to be solved anyway by libseccomp.

> More generally, I suspect that keeping things simpler will end up being more secure. Capsicum was based on well-studied ideas from the world of object capability-based security, and I'd be nervous about adding complications that take us further away from that.

True.

> That mapping would also need to be kept closely in sync with the kernel and other system libraries: if a new syscall is added and libc (or some other library) starts using it, the equivalent BPF chunks would need to be updated to cope.

Again, this is the same problem that has to be solved for process-wide seccomp v2.

>>> [Capsicum also includes 'capability mode', which locks down the available syscalls so the rights restrictions can't just be bypassed by opening new file descriptors; I'll describe that separately later.]
>>
>> This can also be implemented in userspace via seccomp and PR_SET_NO_NEW_PRIVS.
>
> Well, mostly (and in fact I've got an attempt to do exactly that at https://github.com/google/capsicum-test/blob/dev/linux-bpf-capmode.c).
>
> [..] there's one awkward syscall case. In capability mode we'd like to prevent processes from sending signals with kill(2)/tgkill(2) to other processes, but they should still be able to send themselves signals. For example, abort(3) generates:
>     tgkill(gettid(), gettid(), SIGABRT)
>
> Only allowing kill(self) is hard to encode in a seccomp-bpf program, at least in a way that survives forking.

I guess the thread id could be added as a special seccomp-bpf argument (ancillary datum?).

> Finally, capability mode also turns on strict-relative lookups process-wide; in other words, every openat(dfd, ...) operation acts as though it has the O_BENEATH_ONLY flag set, regardless of whether the dfd is a Capsicum capability. I can't see a way to do that with a BPF program (although it would be possible to add a filter that polices the requirement to include O_BENEATH_ONLY rather than implicitly adding it).

That can be a new prctl (one that PR_SET_NO_NEW_PRIVS would lock up). It seems useful independent of Capsicum, and the Linux APIs tend to be fine-grained more often than coarse-grained.

>>> [Policing the rights checks anywhere else, for example at the system call boundary, isn't a good idea because it opens up the possibility of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are changed (as openat/close/dup2 are allowed in capability mode) between the 'check' at syscall entry and the 'use' at fget() invocation.]
>>
>> In the case of BPF filters, I wonder if you could stash the BPF "environment" somewhere and then use it at fget() invocation. Alternatively, it can be reconstructed at fget() time, similar to your introduction of fgetr().
>
> Stashing something at syscall entry to be referred to later always makes me worry about TOCTOU vulnerabilities, but the details might be OK in this case (given that no check occurs at syscall entry)...

Yeah, that was pretty much the idea. But I was cautious enough to label it with "I wonder"...

Paolo
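The syscall-to-rights mapping discussed above can be sketched as a small table plus a generator that emits a classic-BPF allowlist for a given rights mask. This is only an approximation under loudly stated assumptions: the rights bits and the table are hypothetical and far from complete, the filter is process-wide (Linux has no per-file-descriptor seccomp filter), and it does not inspect fd arguments at all, so it cannot tell which descriptor a read() targets. It assumes x86_64 or aarch64.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86_64 and aarch64"
#endif

/* Hypothetical rights bits, loosely named after Capsicum's. */
enum { RIGHT_READ = 1u << 0, RIGHT_WRITE = 1u << 1, RIGHT_SEEK = 1u << 2 };

struct syscall_right { int nr; uint32_t right; };

/* The table that would have to track the kernel and libc over time. */
static const struct syscall_right rights_table[] = {
    { __NR_read, RIGHT_READ },   { __NR_pread64, RIGHT_READ },
    { __NR_write, RIGHT_WRITE }, { __NR_pwrite64, RIGHT_WRITE },
    { __NR_lseek, RIGHT_SEEK },
};
/* Needed by any process just to exit or return from a signal handler. */
static const int always_allowed[] = { __NR_exit, __NR_exit_group, __NR_rt_sigreturn };

#define NELEM(a) ((int)(sizeof(a) / sizeof((a)[0])))
#define STMT(c, k) ((struct sock_filter)BPF_STMT(c, k))
#define JUMP(c, k, jt, jf) ((struct sock_filter)BPF_JUMP(c, k, jt, jf))

/* Emits: arch check, one JEQ per allowed syscall, then DENY(EPERM), ALLOW.
   Returns the instruction count, or -1 if `cap` is too small. */
static int build_allowlist(uint32_t rights, struct sock_filter *out, int cap)
{
    int nrs[NELEM(rights_table) + NELEM(always_allowed)];
    int nallow = 0, n = 0;
    for (int i = 0; i < NELEM(always_allowed); i++)
        nrs[nallow++] = always_allowed[i];
    for (int i = 0; i < NELEM(rights_table); i++)
        if (rights & rights_table[i].right)
            nrs[nallow++] = rights_table[i].nr;
    if (cap < nallow + 6)
        return -1;
    out[n++] = STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch));
    out[n++] = JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0);
    out[n++] = STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[n++] = STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));
    for (int i = 0; i < nallow; i++)
        /* On a match, skip the remaining JEQs and the DENY to land on ALLOW. */
        out[n++] = JUMP(BPF_JMP | BPF_JEQ | BPF_K, (__u32)nrs[i],
                        (__u8)(nallow - i), 0);
    out[n++] = STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM);
    out[n++] = STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return n;
}

/* Child restricted to RIGHT_READ: read() works, write() gets EPERM.
   Returns 0 on success. */
static int demo_read_only_allowlist(void)
{
    int p[2];
    if (pipe(p) != 0 || write(p[1], "x", 1) != 1)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter insns[32];
        int n = build_allowlist(RIGHT_READ, insns, NELEM(insns));
        if (n < 0)
            _exit(2);
        struct sock_fprog prog = { .len = (unsigned short)n, .filter = insns };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
            _exit(2);
        char c;
        int read_ok = read(p[0], &c, 1) == 1;
        int write_denied = write(p[1], "y", 1) == -1 && errno == EPERM;
        _exit(read_ok && write_denied ? 0 : 1);
    }
    int status;
    close(p[0]);
    close(p[1]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every new syscall that libc starts using for reads or writes would need a new row in rights_table, which is the maintenance burden both sides of the thread point to.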
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: David Drysdale @ 2014-07-07 10:29 UTC
To: Paolo Bonzini
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API

On Fri, Jul 4, 2014 at 8:03 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 03/07/2014 20:39, David Drysdale wrote:
>> On Thu, Jul 03, 2014 at 11:12:33AM +0200, Paolo Bonzini wrote:
>>> Given Linux's previous experience with BPF filters, what do you think about attaching specific BPF programs to file descriptors? Then whenever a syscall is run that affects a file descriptor, the BPF program for the file descriptor (attached to a struct file* as in Capsicum) would run in addition to the process-wide filter.
>>
>> That sounds kind of clever, but also kind of complicated.
>>
>> Off the top of my head, one particular problem is that not all fd->struct file conversions in the kernel are completely specified by an enclosing syscall and the explicit values of its parameters.
>>
>> For example, the actual contents of the arguments to io_submit(2) aren't visible to a seccomp-bpf program (as it can't read the __user memory for the iocb structures), and so it can't distinguish a read from a write.
>
> I think that's more easily done by opening the file as O_RDONLY/O_WRONLY/O_RDWR. You could do it by running the file descriptor's seccomp-bpf program once per iocb with synthesized syscall numbers and argument vectors.

Right, but generating the equivalent seccomp input environment for an equivalent single-fd syscall is going to be subtle and complex (which are worrying words to mention in a security context). And how many other syscalls are going to need similar special-case processing? (poll? select? send[m]msg? ...)

> BTW, there's one thing I'm not sure I understand (because my knowledge of VFS is really only cursory). Are the capabilities associated with the file _descriptor_ (a la F_GETFD/SETFD) or with the file _description_ (F_GETFL/SETFL)?!?

Capsicum capabilities are associated with the file descriptor (a la F_GETFD), not the open file itself -- different FDs with different associated rights can map to the same underlying open file.

> If it is the former, there is some value in read/write capabilities because you could, for example, block a child process from reading an eventfd and simulate the two file descriptors returned by pipe(2). But if it is the latter, it looks like an important usability problem in the Capsicum model. (Granted, it's just about usability; in the end it does exactly what it's meant and documented to do.)

Attaching the rights to the FD also comes back to the association with object-capability security. The FD is an unforgeable reference to the object (file) in question, but these references (with their rights) can be transferred to other programs, either by inheritance after fork or by explicitly passing the FD across a Unix domain socket.

>> Also, there could potentially be some odd interactions with file descriptors passed between processes, if the BPF program relies on assumptions about the environment of the original process. For example, what happens if an x86_64 process passes a filter-attached FD to an ia32 process? Given that the syscall numbers are arch-specific, I guess that means the filter program would have to include arch-specific branches for any possible variant.
>
> This is the same for using seccomp v2 to limit child processes, no? So there may be a problem, but it has to be solved anyway by libseccomp.
I don't know whether libseccomp would worry about this, but being able to send FDs between processes via Unix domain sockets makes this more visible in the Capsicum case.

>> More generally, I suspect that keeping things simpler will end up being more secure. Capsicum was based on well-studied ideas from the world of object capability-based security, and I'd be nervous about adding complications that take us further away from that.
>
> True.
>
>> That mapping would also need to be kept closely in sync with the kernel and other system libraries: if a new syscall is added and libc (or some other library) starts using it, the equivalent BPF chunks would need to be updated to cope.
>
> Again, this is the same problem that has to be solved for process-wide seccomp v2.

True. I guess new syscalls are sufficiently rare in practice that this isn't a serious concern.

>>>> [Capsicum also includes 'capability mode', which locks down the available syscalls so the rights restrictions can't just be bypassed by opening new file descriptors; I'll describe that separately later.]
>>>
>>> This can also be implemented in userspace via seccomp and PR_SET_NO_NEW_PRIVS.
>>
>> Well, mostly (and in fact I've got an attempt to do exactly that at https://github.com/google/capsicum-test/blob/dev/linux-bpf-capmode.c).
>>
>> [..] there's one awkward syscall case. In capability mode we'd like to prevent processes from sending signals with kill(2)/tgkill(2) to other processes, but they should still be able to send themselves signals. For example, abort(3) generates:
>>     tgkill(gettid(), gettid(), SIGABRT)
>>
>> Only allowing kill(self) is hard to encode in a seccomp-bpf program, at least in a way that survives forking.
>
> I guess the thread id could be added as a special seccomp-bpf argument (ancillary datum?).
Yeah, I tried exactly that a while ago (https://github.com/google/capsicum-linux/commit/e163c6348328) but didn't run with it because of the process-wide beneath-only issue below. But a combination of that and your new prctl() suggestion below might do the trick.

>> Finally, capability mode also turns on strict-relative lookups process-wide; in other words, every openat(dfd, ...) operation acts as though it has the O_BENEATH_ONLY flag set, regardless of whether the dfd is a Capsicum capability. I can't see a way to do that with a BPF program (although it would be possible to add a filter that polices the requirement to include O_BENEATH_ONLY rather than implicitly adding it).
>
> That can be a new prctl (one that PR_SET_NO_NEW_PRIVS would lock up). It seems useful independent of Capsicum, and the Linux APIs tend to be fine-grained more often than coarse-grained.

That sounds like a good idea, particularly in combination with the idea above. Thanks! I'll have a think/investigate...

>>>> [Policing the rights checks anywhere else, for example at the system call boundary, isn't a good idea because it opens up the possibility of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are changed (as openat/close/dup2 are allowed in capability mode) between the 'check' at syscall entry and the 'use' at fget() invocation.]
>>>
>>> In the case of BPF filters, I wonder if you could stash the BPF "environment" somewhere and then use it at fget() invocation. Alternatively, it can be reconstructed at fget() time, similar to your introduction of fgetr().
>>
>> Stashing something at syscall entry to be referred to later always makes me worry about TOCTOU vulnerabilities, but the details might be OK in this case (given that no check occurs at syscall entry)...
>
> Yeah, that was pretty much the idea. But I was cautious enough to label it with "I wonder"...
>
> Paolo
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: Paolo Bonzini @ 2014-07-07 12:20 UTC
To: David Drysdale
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API

On 07/07/2014 12:29, David Drysdale wrote:
>> I think that's more easily done by opening the file as O_RDONLY/O_WRONLY/O_RDWR. You could do it by running the file descriptor's seccomp-bpf program once per iocb with synthesized syscall numbers and argument vectors.
>
> Right, but generating the equivalent seccomp input environment for an equivalent single-fd syscall is going to be subtle and complex (which are worrying words to mention in a security context). And how many other syscalls are going to need similar special-case processing? (poll? select? send[m]msg? ...)

Yeah, the difficult part is getting the right balance between:

1) limitations due to seccomp's inability to chase pointers (which is not something that can be lifted, as it's required for correctness)

2) subtlety and complexity of the resulting code.

The problem arises when you have a single syscall operating on multiple file descriptors. So, for example, among the cases you mention, poll and select are problematic; sendmsg/sendmmsg are not. They would be if Capsicum had a capability for SCM_RIGHTS file descriptor passing, but I cannot find it.

But then you also have to strike the right balance between a complete solution and an overengineered one.

For example, even though poll and select are problematic, I wonder what the point of blocking them would really be; poll/select are level-triggered, and calling them should be idempotent as far as the file descriptor is concerned. If you want to prevent a process/thread from issuing blocking system calls, you'd do that with a per-process filter, not with per-file-descriptor filters or capabilities.

> Capsicum capabilities are associated with the file descriptor (a la F_GETFD), not the open file itself -- different FDs with different associated rights can map to the same underlying open file.

Good to know, thanks. I suppose you have testcases that cover this.

Paolo
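The descriptor-versus-description distinction can be seen with plain POSIX fcntl(), independent of Capsicum: after dup(), a descriptor flag such as FD_CLOEXEC belongs to each descriptor separately, while a file status flag such as O_NONBLOCK lives in the shared open file description. According to David's answer, Capsicum rights behave like the former. The sketch below only demonstrates the POSIX behaviour; the function name is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if, after dup(), FD_CLOEXEC is per-descriptor while O_NONBLOCK
   is shared through the common open file description. */
static int demo_fd_vs_description(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;
    int a = p[0];
    int b = dup(a);               /* new descriptor, same open file description */
    if (b < 0)
        return -1;

    /* Descriptor flag (F_GETFD/F_SETFD): set on `a` only. */
    fcntl(a, F_SETFD, FD_CLOEXEC);
    int a_cloexec = (fcntl(a, F_GETFD) & FD_CLOEXEC) != 0;
    int b_cloexec = (fcntl(b, F_GETFD) & FD_CLOEXEC) != 0;

    /* File status flag (F_GETFL/F_SETFL): set through `a`, seen through `b`. */
    fcntl(a, F_SETFL, fcntl(a, F_GETFL) | O_NONBLOCK);
    int b_nonblock = (fcntl(b, F_GETFL) & O_NONBLOCK) != 0;

    close(b);
    close(p[0]);
    close(p[1]);
    return (a_cloexec && !b_cloexec && b_nonblock) ? 0 : 1;
}
```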
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
From: David Drysdale @ 2014-07-07 14:11 UTC
To: Paolo Bonzini
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API

On Mon, Jul 7, 2014 at 1:20 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 07/07/2014 12:29, David Drysdale wrote:
>> Capsicum capabilities are associated with the file descriptor (a la F_GETFD), not the open file itself -- different FDs with different associated rights can map to the same underlying open file.
>
> Good to know, thanks. I suppose you have testcases that cover this.
>
> Paolo

Yeah, there are lots of tests at https://github.com/google/capsicum-test (which is in a separate repo so it's easy to run against FreeBSD as well as the Linux code); in particular, https://github.com/google/capsicum-test/blob/dev/capability-fd.cc exercises various interactions of capability FDs.
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
  2014-07-07 12:20 ` Paolo Bonzini
  2014-07-07 14:11 ` David Drysdale
@ 2014-07-07 22:33 ` Alexei Starovoitov
  2014-07-08 14:58 ` Kees Cook
  2014-08-16 15:41 ` Pavel Machek
  1 sibling, 2 replies; 10+ messages in thread
From: Alexei Starovoitov @ 2014-07-07 22:33 UTC
To: Paolo Bonzini
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API, David Drysdale

On Mon, Jul 7, 2014 at 5:20 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 07/07/2014 12:29, David Drysdale wrote:
>
>>> I think that's more easily done by opening the file as O_RDONLY/O_WRONLY
>>> /O_RDWR. You could do it by running the file descriptor's seccomp-bpf
>>> program once per iocb with synthesized syscall numbers and argument
>>> vectors.
>>
>>
>> Right, but generating the equivalent seccomp input environment for an
>> equivalent single-fd syscall is going to be subtle and complex (which
>> are worrying words to mention in a security context). And how many
>> other syscalls are going to need similar special-case processing?
>> (poll? select? send[m]msg? ...)
>
>
> Yeah, the difficult part is getting the right balance between:
>
> 1) limitations due to seccomp's impossibility to chase pointers (which is
> not something that can be lifted, as it's required for correctness)

btw once seccomp moves to eBPF it will be able to 'chase pointers',
since pointer walking will be possible via bpf_load_pointer() function call,
which is a wrapper of:
  probe_kernel_read(&ptr, unsafe_ptr, sizeof(void *));
  return ptr;
Not sure whether it helps this case or not. Just fyi.

> 2) subtlety and complexity of the resulting code.
>
> The problem arises when you have a single syscall operating on
> multiple file descriptors. So for example, among the cases you mention, poll
> and select are problematic; sendm?msg are not. They would be if Capsicum
> had a capability for SCM_RIGHTS file descriptor passing, but I cannot find
> it.
>
> But then you also have to strike the right balance between a complete
> solution and an overengineered one.
>
> For example, even though poll and select are problematic, I wonder what
> the point of blocking them would really be; poll/select are level-triggered,
> and calling them should be idempotent as far as the file descriptor is
> concerned. If you want to prevent a process/thread from issuing blocking
> system calls, you'd do that with a per-process filter, not with
> per-file-descriptor filters or capabilities.
>
>
>> Capsicum capabilities are associated with the file descriptor (a la
>> F_GETFD), not the open file itself -- different FDs with different
>> associated rights can map to the same underlying open file.
>
>
> Good to know, thanks. I suppose you have testcases that cover this.
>
> Paolo
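Paolo's per-process alternative can be sketched with a classic seccomp-bpf filter. This is a minimal Linux-only sketch, assuming the kernel UAPI headers <linux/filter.h> and <linux/seccomp.h>; `install_no_poll_filter()`, `probe_poll()` and `DENY_NR` are hypothetical names. It makes the poll/select family fail with EPERM for the whole process instead of attaching anything to individual descriptors, and it deliberately skips the architecture check that a production filter needs.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Compare the loaded syscall number with nr: on a match, fail the syscall
 * with EPERM; otherwise skip that return and fall through to the next test. */
#define DENY_NR(nr)                                      \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1),     \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA))

/* Hypothetical helper: refuse the poll/select family for the whole process. */
static int install_no_poll_filter(void) {
    struct sock_filter insns[] = {
        /* NOTE: a production filter must check seccomp_data.arch first. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
#ifdef __NR_poll
        DENY_NR(__NR_poll),
#endif
#ifdef __NR_select
        DENY_NR(__NR_select),
#endif
        DENY_NR(__NR_ppoll),
        DENY_NR(__NR_pselect6),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof insns / sizeof insns[0]),
        .filter = insns,
    };
    /* Without CAP_SYS_ADMIN, no_new_privs must be set before installing. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog, 0, 0);
}

/* Hypothetical probe: 0 if poll() succeeded, otherwise the errno it set. */
static int probe_poll(void) {
    return poll(NULL, 0, 0) == 0 ? 0 : errno;
}
```

Once `install_no_poll_filter()` returns 0, `probe_poll()` should report EPERM. Because a classic seccomp filter only sees the syscall number and raw argument values, it cannot look inside the pollfd array, which is the pointer-chasing limitation discussed in this thread.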
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
  2014-07-07 22:33 ` Alexei Starovoitov
@ 2014-07-08 14:58 ` Kees Cook
  2014-08-16 15:41 ` Pavel Machek
  1 sibling, 0 replies; 10+ messages in thread
From: Kees Cook @ 2014-07-08 14:58 UTC
To: Alexei Starovoitov
Cc: Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API, Paolo Bonzini, David Drysdale

On Mon, Jul 7, 2014 at 3:33 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Mon, Jul 7, 2014 at 5:20 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 07/07/2014 12:29, David Drysdale wrote:
>>
>>>> I think that's more easily done by opening the file as O_RDONLY/O_WRONLY
>>>> /O_RDWR. You could do it by running the file descriptor's seccomp-bpf
>>>> program once per iocb with synthesized syscall numbers and argument
>>>> vectors.
>>>
>>>
>>> Right, but generating the equivalent seccomp input environment for an
>>> equivalent single-fd syscall is going to be subtle and complex (which
>>> are worrying words to mention in a security context). And how many
>>> other syscalls are going to need similar special-case processing?
>>> (poll? select? send[m]msg? ...)
>>
>>
>> Yeah, the difficult part is getting the right balance between:
>>
>> 1) limitations due to seccomp's impossibility to chase pointers (which is
>> not something that can be lifted, as it's required for correctness)
>
> btw once seccomp moves to eBPF it will be able to 'chase pointers',
> since pointer walking will be possible via bpf_load_pointer() function call,
> which is a wrapper of:
> probe_kernel_read(&ptr, unsafe_ptr, sizeof(void *));
> return ptr;
> Not sure whether it helps this case or not. Just fyi.

It won't immediately help, since threads can race pointer target
contents (i.e. seccomp sees one thing, and then the syscall sees
another thing). Having an immutable memory area could help with this
(i.e. some kind of "locked" memory range that holds all the "approved"
argument strings, at which point seccomp could then trust the chased
pointers that land in this range.)

Obviously eBPF is a prerequisite to this, but it isn't the full
solution, as far as I understand it.

-Kees

--
Kees Cook
Chrome OS Security
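Kees's "locked" memory range can be approximated in userspace as an illustration: copy the approved argument strings into one page, make the page read-only, then accept a pointer only if the whole argument lands inside it. This is a hypothetical sketch (`struct locked_arena` and its helpers are invented names); it shows the range check a pointer-chasing filter could rely on, not an existing seccomp or eBPF feature. mprotect() alone is not a real lock, since another thread can call mprotect() again unless that is forbidden too.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical "approved argument" arena: filled once, then made read-only. */
struct locked_arena {
    char *base;
    size_t size;
    size_t used;
};

static int arena_init(struct locked_arena *a) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = (size_t)page;
    a->used = 0;
    return 0;
}

/* Copy a NUL-terminated string into the arena while it is still writable. */
static const char *arena_add(struct locked_arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    if (n > a->size - a->used)
        return NULL;
    char *dst = a->base + a->used;
    memcpy(dst, s, n);
    a->used += n;
    return dst;
}

/* Drop write access. Caveat: another thread could mprotect() it back,
 * so a real design would also have to forbid that. */
static int arena_lock(struct locked_arena *a) {
    return mprotect(a->base, a->size, PROT_READ);
}

/* The check a pointer-chasing filter could make: does [p, p + len) lie
 * entirely inside the used, locked part of the arena? */
static bool in_locked_arena(const struct locked_arena *a, const void *p, size_t len) {
    uintptr_t start = (uintptr_t)a->base;
    uintptr_t x = (uintptr_t)p;
    return x >= start && len <= a->used && x - start <= a->used - len;
}
```

The point of the range check is that a pointer into the locked page cannot have its target rewritten between "seccomp sees one thing" and "the syscall sees another thing", whereas a pointer into ordinary writable memory, such as a stack buffer holding an identical string, is rejected.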
* Re: [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1)
  2014-07-07 22:33 ` Alexei Starovoitov
  2014-07-08 14:58 ` Kees Cook
@ 2014-08-16 15:41 ` Pavel Machek
  1 sibling, 0 replies; 10+ messages in thread
From: Pavel Machek @ 2014-08-16 15:41 UTC
To: Alexei Starovoitov
Cc: Kees Cook, Greg Kroah-Hartman, Meredydd Luff, linux-kernel@vger.kernel.org, qemu-devel, LSM List, Alexander Viro, James Morris, Linux API, Paolo Bonzini, David Drysdale

Hi!

> >>> I think that's more easily done by opening the file as O_RDONLY/O_WRONLY
> >>> /O_RDWR. You could do it by running the file descriptor's seccomp-bpf
> >>> program once per iocb with synthesized syscall numbers and argument
> >>> vectors.
> >>
> >>
> >> Right, but generating the equivalent seccomp input environment for an
> >> equivalent single-fd syscall is going to be subtle and complex (which
> >> are worrying words to mention in a security context). And how many
> >> other syscalls are going to need similar special-case processing?
> >> (poll? select? send[m]msg? ...)
> >
> >
> > Yeah, the difficult part is getting the right balance between:
> >
> > 1) limitations due to seccomp's impossibility to chase pointers (which is
> > not something that can be lifted, as it's required for correctness)
>
> btw once seccomp moves to eBPF it will be able to 'chase pointers',
> since pointer walking will be possible via bpf_load_pointer() function call,
> which is a wrapper of:

Even if you could make Capsicum work with eBPF... please don't.

Capsicum is kind of an obvious, elegant solution. BPF is quite
complex. And security semantics should not be pushed to userspace...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
end of thread

Thread overview: 10+ messages
[not found] <1404124096-21445-1-git-send-email-drysdale@google.com>
2014-07-03 9:12 ` [Qemu-devel] [RFC PATCH 00/11] Adding FreeBSD's Capsicum security framework (part 1) Paolo Bonzini
2014-07-03 10:01 ` Loganaden Velvindron
2014-07-03 18:39 ` David Drysdale
2014-07-04 7:03 ` Paolo Bonzini
2014-07-07 10:29 ` David Drysdale
2014-07-07 12:20 ` Paolo Bonzini
2014-07-07 14:11 ` David Drysdale
2014-07-07 22:33 ` Alexei Starovoitov
2014-07-08 14:58 ` Kees Cook
2014-08-16 15:41 ` Pavel Machek