* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
@ 2009-05-06 21:46 ` Markus Gutschke (顧孟勤)
0 siblings, 0 replies; 84+ messages in thread
From: Markus Gutschke (顧孟勤) @ 2009-05-06 21:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Roland McGrath, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
On Wed, May 6, 2009 at 14:29, Ingo Molnar <mingo@elte.hu> wrote:
> That's a pretty interesting usage. What would be fallback mode you
> are using if the kernel doesnt have seccomp built in? Completely
> non-sandboxed? Or a ptrace/PTRACE_SYSCALL based sandbox?
Ptrace has performance and/or reliability problems when used to
sandbox threaded applications due to potential race conditions when
inspecting system call arguments. We hope that we can avoid this
problem with seccomp. It is very attractive that the kernel automatically
terminates any application that violates the very well-defined
constraints of the sandbox.
In general, we are currently exploring different options based on
general availability, functionality, and complexity of implementation.
Seccomp is a good middle ground that we expect to be able to use in
the medium term to provide an acceptable solution for a large segment
of Linux users, although the restriction to just four unfiltered
system calls is painful.
We are still discussing what fallback options we have, and they are
likely on different schedules.
For instance, on platforms that have AppArmor or SELinux, we might be
able to use them as part of our sandboxing solution, although we are
still investigating whether they meet all of our needs.
Markus
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 21:46 ` Markus Gutschke (顧孟勤)
(?)
@ 2009-05-06 21:54 ` Ingo Molnar
-1 siblings, 0 replies; 84+ messages in thread
From: Ingo Molnar @ 2009-05-06 21:54 UTC (permalink / raw)
To: Markus Gutschke (顧孟勤)
Cc: Linus Torvalds, Roland McGrath, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
* Markus Gutschke (顧孟勤) <markus@google.com> wrote:
> On Wed, May 6, 2009 at 14:29, Ingo Molnar <mingo@elte.hu> wrote:
> > That's a pretty interesting usage. What would be fallback mode you
> > are using if the kernel doesnt have seccomp built in? Completely
> > non-sandboxed? Or a ptrace/PTRACE_SYSCALL based sandbox?
>
> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions
> when inspecting system call arguments. We hope that we can avoid
> this problem with seccomp. It is very attractive that kernel
> automatically terminates any application that violates the very
> well-defined constraints of the sandbox.
>
> In general, we are currently exploring different options based on
> general availability, functionality, and complexity of
> implementation. Seccomp is a good middle ground that we expect to
> be able to use in the medium term to provide an acceptable
> solution for a large segment of Linux users. Although the
> restriction to just four unfiltered system calls is painful.
Which other system calls would you like to use? Futexes might be
one, for fast synchronization primitives?
Ingo
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 21:54 ` Ingo Molnar
(?)
@ 2009-05-06 22:08 ` Markus Gutschke (顧孟勤)
-1 siblings, 0 replies; 84+ messages in thread
From: Markus Gutschke (顧孟勤) @ 2009-05-06 22:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Roland McGrath, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
On Wed, May 6, 2009 at 14:54, Ingo Molnar <mingo@elte.hu> wrote:
> Which other system calls would you like to use? Futexes might be
> one, for fast synchronization primitives?
There are a large number of system calls that "normal" C/C++ code uses
quite frequently, and that are not security sensitive. A typical
example would be gettimeofday(). But there are other system calls,
where the sandbox would not really need to inspect arguments as the
call does not expose any exploitable interface.
It is currently awkward that in order to use seccomp we have to
intercept all system calls and provide alternative implementations for
them; whereas we really only care about a comparatively small number
of security critical operations that we need to restrict.
Also, any redirected system call ends up incurring at least two
context switches, which is needlessly expensive for the large number
of trivial system calls. We are quite happy that read() and write(),
which are quite important to us, do not incur this penalty.
Markus
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 22:08 ` Markus Gutschke (顧孟勤)
(?)
@ 2009-05-06 22:13 ` Ingo Molnar
-1 siblings, 0 replies; 84+ messages in thread
From: Ingo Molnar @ 2009-05-06 22:13 UTC (permalink / raw)
To: Markus Gutschke (顧孟勤)
Cc: Linus Torvalds, Roland McGrath, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
* Markus Gutschke (顧孟勤) <markus@google.com> wrote:
> On Wed, May 6, 2009 at 14:54, Ingo Molnar <mingo@elte.hu> wrote:
> > Which other system calls would you like to use? Futexes might be
> > one, for fast synchronization primitives?
>
> There are a large number of system calls that "normal" C/C++ code
> uses quite frequently, and that are not security sensitive. A
> typical example would be gettimeofday(). But there are other
> system calls, where the sandbox would not really need to inspect
> arguments as the call does not expose any exploitable interface.
>
> It is currently awkward that in order to use seccomp we have to
> intercept all system calls and provide alternative implementations
> for them; whereas we really only care about a comparatively small
> number of security critical operations that we need to restrict.
>
> Also, any redirected system call ends up incurring at least two
> context switches, which is needlessly expensive for the large
> number of trivial system calls. We are quite happy that read() and
> write(), which are quite important to us, do not incur this
> penalty.
Doing a (per arch) bitmap of harmless syscalls and replacing the
mode1_syscalls[] check with that in kernel/seccomp.c would be a
pretty reasonable extension. (.config controllable perhaps, for
old-style-seccomp)
It would probably be faster than the current loop over
mode1_syscalls[] as well.
Ingo
^ permalink raw reply [flat|nested] 84+ messages in thread
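As an illustration of the bitmap idea in the message above, here is a minimal userspace sketch contrasting a linear scan over a short allow-list with a single bit test. It is not code from kernel/seccomp.c: every `demo_` name, the 300-entry table size, and the syscall numbers are assumptions chosen for the example, and real syscall numbers differ per architecture.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed table size for the example; real kernels define this per arch. */
#define DEMO_NR_SYSCALLS 300u
#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITMAP_WORDS \
    ((DEMO_NR_SYSCALLS + DEMO_BITS_PER_WORD - 1) / DEMO_BITS_PER_WORD)

/* Illustrative allow-list in the style of a short mode-1 table. */
static const unsigned int demo_allowed_list[] = { 1, 3, 4, 119 };

/* Linear scan: cost grows with the number of allowed syscalls. */
static bool demo_list_allows(unsigned int nr)
{
    size_t n = sizeof(demo_allowed_list) / sizeof(demo_allowed_list[0]);

    for (size_t i = 0; i < n; i++)
        if (demo_allowed_list[i] == nr)
            return true;
    return false;
}

/* Bitmap: one bit per syscall number, set means "allowed". */
struct demo_syscall_bitmap {
    unsigned long words[DEMO_BITMAP_WORDS];
};

static void demo_bitmap_allow(struct demo_syscall_bitmap *bm, unsigned int nr)
{
    if (nr < DEMO_NR_SYSCALLS)
        bm->words[nr / DEMO_BITS_PER_WORD] |= 1UL << (nr % DEMO_BITS_PER_WORD);
}

/* Constant-time check, independent of how many syscalls are allowed. */
static bool demo_bitmap_allows(const struct demo_syscall_bitmap *bm,
                               unsigned int nr)
{
    if (nr >= DEMO_NR_SYSCALLS)
        return false; /* out-of-range numbers are always denied */
    return (bm->words[nr / DEMO_BITS_PER_WORD] >> (nr % DEMO_BITS_PER_WORD)) & 1UL;
}
```

The bit test stays the same cost as the allowed set grows, which is the property that matters once a larger set of "harmless" calls is permitted.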
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 22:13 ` Ingo Molnar
(?)
@ 2009-05-06 22:21 ` Markus Gutschke (顧孟勤)
-1 siblings, 0 replies; 84+ messages in thread
From: Markus Gutschke (顧孟勤) @ 2009-05-06 22:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Roland McGrath, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
On Wed, May 6, 2009 at 15:13, Ingo Molnar <mingo@elte.hu> wrote:
> doing a (per arch) bitmap of harmless syscalls and replacing the
> mode1_syscalls[] check with that in kernel/seccomp.c would be a
> pretty reasonable extension. (.config controllable perhaps, for
> old-style-seccomp)
>
> It would probably be faster than the current loop over
> mode1_syscalls[] as well.
This would be a great option to improve performance of our sandbox. I
can detect the availability of the new kernel API dynamically, and
then not intercept the bulk of the system calls. This would allow the
sandbox to work both with existing and with newer kernels.
We'll post a kernel patch for discussion in the next few days,
Markus
^ permalink raw reply [flat|nested] 84+ messages in thread
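The thread does not say how the proposed new API would be probed, so the following is only a sketch of the existing check a sandbox could start from: `prctl(PR_GET_SECCOMP)` fails with `EINVAL` on a kernel built without seccomp support and returns the current mode otherwise. The `probe_seccomp` function and its enum are hypothetical names introduced for this example.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Hypothetical result type for the example. */
enum seccomp_probe {
    SECCOMP_PROBE_AVAILABLE, /* kernel has seccomp support */
    SECCOMP_PROBE_MISSING,   /* kernel built without seccomp */
    SECCOMP_PROBE_ERROR      /* unexpected failure */
};

/*
 * Query the calling thread's seccomp mode. Must be called before entering
 * strict mode: once in strict mode, prctl() itself is a forbidden syscall
 * and the kernel kills the caller.
 */
static enum seccomp_probe probe_seccomp(void)
{
    errno = 0;
    int mode = prctl(PR_GET_SECCOMP, 0, 0, 0, 0);

    if (mode >= 0)
        return SECCOMP_PROBE_AVAILABLE;
    return errno == EINVAL ? SECCOMP_PROBE_MISSING : SECCOMP_PROBE_ERROR;
}
```

A sandbox could choose its fallback path from this result; detecting an additional mode would presumably need a similar probe against whatever interface is eventually merged.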
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 22:21 ` Markus Gutschke (顧孟勤)
(?)
@ 2009-05-07 4:23 ` Nicholas Miell
-1 siblings, 0 replies; 84+ messages in thread
From: Nicholas Miell @ 2009-05-07 4:23 UTC (permalink / raw)
To: Markus Gutschke (顧孟勤)
Cc: Ingo Molnar, Linus Torvalds, Roland McGrath, Andrew Morton, x86,
linux-kernel, stable, linux-mips, sparclinux, linuxppc-dev
On Wed, 2009-05-06 at 15:21 -0700, Markus Gutschke (顧孟勤) wrote:
> On Wed, May 6, 2009 at 15:13, Ingo Molnar <mingo@elte.hu> wrote:
> > doing a (per arch) bitmap of harmless syscalls and replacing the
> > mode1_syscalls[] check with that in kernel/seccomp.c would be a
> > pretty reasonable extension. (.config controllable perhaps, for
> > old-style-seccomp)
> >
> > It would probably be faster than the current loop over
> > mode1_syscalls[] as well.
>
> This would be a great option to improve performance of our sandbox. I
> can detect the availability of the new kernel API dynamically, and
> then not intercept the bulk of the system calls. This would allow the
> sandbox to work both with existing and with newer kernels.
>
> We'll post a kernel patch for discussion in the next few days,
>
I suspect the correct thing to do would be to leave seccomp mode 1 alone
and introduce a mode 2 with a less restricted set of system calls -- the
interface was designed to be extended in this way, after all.
--
Nicholas Miell <nmiell@comcast.net>
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-07 4:23 ` Nicholas Miell
(?)
@ 2009-05-07 10:11 ` Ingo Molnar
-1 siblings, 0 replies; 84+ messages in thread
From: Ingo Molnar @ 2009-05-07 10:11 UTC (permalink / raw)
To: Nicholas Miell
Cc: Markus Gutschke (顧孟勤), Linus Torvalds,
Roland McGrath, Andrew Morton, x86, linux-kernel, stable,
linux-mips, sparclinux, linuxppc-dev
* Nicholas Miell <nmiell@comcast.net> wrote:
> On Wed, 2009-05-06 at 15:21 -0700, Markus Gutschke (顧孟勤) wrote:
> > On Wed, May 6, 2009 at 15:13, Ingo Molnar <mingo@elte.hu> wrote:
> > > doing a (per arch) bitmap of harmless syscalls and replacing the
> > > mode1_syscalls[] check with that in kernel/seccomp.c would be a
> > > pretty reasonable extension. (.config controllable perhaps, for
> > > old-style-seccomp)
> > >
> > > It would probably be faster than the current loop over
> > > mode1_syscalls[] as well.
> >
> > This would be a great option to improve performance of our sandbox. I
> > can detect the availability of the new kernel API dynamically, and
> > then not intercept the bulk of the system calls. This would allow the
> > sandbox to work both with existing and with newer kernels.
> >
> > We'll post a kernel patch for discussion in the next few days,
> >
>
> I suspect the correct thing to do would be to leave seccomp mode 1
> alone and introduce a mode 2 with a less restricted set of system
> calls -- the interface was designed to be extended in this way,
> after all.
Yes, that is what I alluded to above via the '.config controllable'
aspect.
Mode 2 could be implemented like this: extend prctl_set_seccomp()
with a bitmap pointer, and copy it to a per task seccomp context
structure.
A bitmap for 300 syscalls takes only about 40 bytes.
Please take care to implement nesting properly: if a seccomp context
does a seccomp call (which mode 2 could allow), then the resulting
bitmap should be the logical-AND of the parent and child bitmaps.
There's no reason why seccomp couldn't be used in a hierarchy of
sandboxes, in a gradually less permissive fashion.
Ingo
^ permalink raw reply [flat|nested] 84+ messages in thread
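To make the size estimate and the nesting rule in the message above concrete, here is a small userspace sketch. It assumes a 300-entry syscall table and uses hypothetical `sketch_` names throughout; it is not the proposed kernel patch, only an illustration of storing one bit per syscall and combining a nested request with the existing bitmap by logical AND.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed table size; 300 bits round up to 38 bytes when CHAR_BIT is 8. */
#define SKETCH_NR_SYSCALLS 300u
#define SKETCH_BITMAP_BYTES ((SKETCH_NR_SYSCALLS + CHAR_BIT - 1) / CHAR_BIT)

/* Hypothetical per-task context: one bit per syscall, set means allowed. */
struct sketch_seccomp_ctx {
    unsigned char allowed[SKETCH_BITMAP_BYTES];
};

static void sketch_allow(struct sketch_seccomp_ctx *ctx, unsigned int nr)
{
    if (nr < SKETCH_NR_SYSCALLS)
        ctx->allowed[nr / CHAR_BIT] |= (unsigned char)(1u << (nr % CHAR_BIT));
}

static bool sketch_allowed(const struct sketch_seccomp_ctx *ctx, unsigned int nr)
{
    if (nr >= SKETCH_NR_SYSCALLS)
        return false;
    return (ctx->allowed[nr / CHAR_BIT] >> (nr % CHAR_BIT)) & 1u;
}

/*
 * Nested request: the new bitmap is the AND of the current and requested
 * bitmaps, so a nested sandbox can drop permissions but never regain them.
 */
static void sketch_nest(struct sketch_seccomp_ctx *current,
                        const struct sketch_seccomp_ctx *requested)
{
    for (size_t i = 0; i < SKETCH_BITMAP_BYTES; i++)
        current->allowed[i] &= requested->allowed[i];
}
```

With a parent that allows calls 3, 4 and 78 and a nested request for 4, 78 and 240, the result allows only 4 and 78: call 3 is dropped by the child and call 240 was never granted by the parent.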
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-07 10:11 ` Ingo Molnar
(?)
@ 2009-05-10 5:37 ` Pavel Machek
-1 siblings, 0 replies; 84+ messages in thread
From: Pavel Machek @ 2009-05-10 5:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nicholas Miell, Markus Gutschke (顧孟勤), Linus Torvalds,
Roland McGrath, Andrew Morton, x86, linux-kernel, stable,
linux-mips, sparclinux, linuxppc-dev
On Thu 2009-05-07 12:11:29, Ingo Molnar wrote:
>
> * Nicholas Miell <nmiell@comcast.net> wrote:
>
> > On Wed, 2009-05-06 at 15:21 -0700, Markus Gutschke (顧孟勤) wrote:
> > > On Wed, May 6, 2009 at 15:13, Ingo Molnar <mingo@elte.hu> wrote:
> > > > doing a (per arch) bitmap of harmless syscalls and replacing the
> > > > mode1_syscalls[] check with that in kernel/seccomp.c would be a
> > > > pretty reasonable extension. (.config controllable perhaps, for
> > > > old-style-seccomp)
> > > >
> > > > It would probably be faster than the current loop over
> > > > mode1_syscalls[] as well.
> > >
> > > This would be a great option to improve performance of our sandbox. I
> > > can detect the availability of the new kernel API dynamically, and
> > > then not intercept the bulk of the system calls. This would allow the
> > > sandbox to work both with existing and with newer kernels.
> > >
> > > We'll post a kernel patch for discussion in the next few days,
> > >
> >
> > I suspect the correct thing to do would be to leave seccomp mode 1
> > alone and introduce a mode 2 with a less restricted set of system
> > calls -- the interface was designed to be extended in this way,
> > after all.
>
> Yes, that is what i alluded to above via the '.config controllable'
> aspect.
>
> Mode 2 could be implemented like this: extend prctl_set_seccomp()
> with a bitmap pointer, and copy it to a per task seccomp context
> structure.
>
> A bitmap for 300 syscalls takes only about 40 bytes.
>
> Please take care to implement nesting properly: if a seccomp context
> does a seccomp call (which mode 2 could allow), then the resulting
> bitmap should be the logical-AND of the parent and child bitmaps.
> There's no reason why seccomp couldn't be used in a hierarchy of
> sandboxes, in a gradually less permissive fashion.
I don't think seccomp nesting (at kernel level) has any value.
First, syscalls are the wrong level of abstraction for sandboxing. There
are multiple ways to read from a file, for example.
If you wanted to do hierarchical sandboxes, asking your monitor to
restrict your seccomp mask would seem like the way to go...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 22:08 ` Markus Gutschke (顧孟勤)
(?)
@ 2009-05-08 19:18 ` Andi Kleen
-1 siblings, 0 replies; 84+ messages in thread
From: Andi Kleen @ 2009-05-08 19:18 UTC (permalink / raw)
To: Markus Gutschke (顧孟勤)
Cc: Ingo Molnar, Linus Torvalds, Roland McGrath, Andrew Morton, x86,
linux-kernel, stable, linux-mips, sparclinux, linuxppc-dev
"Markus Gutschke (顧孟勤)" <markus@google.com> writes:
>
> There are a large number of system calls that "normal" C/C++ code uses
> quite frequently, and that are not security sensitive. A typical
> example would be gettimeofday().
At least on x86-64 gettimeofday() (and time(2)) work inside seccomp because
they're vsyscalls that run in ring 3 only.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-06 21:46 ` Markus Gutschke (顧孟勤)
@ 2009-05-07 7:03 ` Roland McGrath
-1 siblings, 0 replies; 84+ messages in thread
From: Roland McGrath @ 2009-05-07 7:03 UTC (permalink / raw)
To: Markus Gutschke (顧孟勤)
Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions when
> inspecting system call arguments. We hope that we can avoid this
> problem with seccomp.
ptrace certainly has performance issues. I take it the only "reliability
problems" you are talking about are MT races with modifications to user
memory that is relevant to a system call. (Is there something else?)
That is not a "ptrace problem" per se at all. It's an intrinsic problem
with any method based on "generic" syscall interception, if the filtering
and enforcement decisions depend on examining user memory. By the same
token, no such method has a "reliability problem" if the filtering checks
only examine the registers (or other thread-synchronous state).
In the sense that I mean, seccomp is "generic syscall interception" too.
(That is, the checks/enforcement are "around" the call, rather than inside
it with direct atomicity controls binding the checks and uses together.)
The only reason seccomp does not have this "reliability problem" is that
its filtering is trivial and depends only on registers (in fact, only on
one register, the syscall number).
If you want to do checks that depend on shared or volatile state, then
syscall interception is really not the proper mechanism for you. (Likely
examples include user memory, e.g. for file names in open calls, or ioctl
struct contents, etc., fd tables or filesystem details, etc.) For that
you need mechanisms that look at stable kernel copies of user data that
are what the syscall will actually use, such as is done by audit, LSM, etc.
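The race and the stable-copy remedy can be sketched as follows. This is a deterministic single-threaded simulation, not real interception code: the second thread is replaced by an explicit call placed inside the window, and the "/tmp/" policy and all sketch_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PATH_MAX 64

/* Hypothetical policy: only paths under /tmp/ are acceptable. */
static bool sketch_path_ok(const char *path)
{
	return strncmp(path, "/tmp/", 5) == 0;
}

/* Stand-in for a second thread rewriting the shared buffer. */
static void sketch_racing_writer(char *user_buf)
{
	assert(user_buf != NULL);
	strcpy(user_buf, "/etc/passwd");
}

/*
 * Racy pattern: check user memory, then let the "syscall" read it again.
 * Returns true only if the path finally used still satisfies the policy.
 */
static bool sketch_check_then_use(char *user_buf)
{
	if (!sketch_path_ok(user_buf))
		return false;
	sketch_racing_writer(user_buf);		/* window between check and use */
	return sketch_path_ok(user_buf);	/* what the syscall would see */
}

/* Stable-copy pattern: snapshot once, then check and use the snapshot. */
static bool sketch_copy_then_check(char *user_buf)
{
	char stable[SKETCH_PATH_MAX];

	strncpy(stable, user_buf, sizeof(stable) - 1);
	stable[sizeof(stable) - 1] = '\0';
	if (!sketch_path_ok(stable))
		return false;
	sketch_racing_writer(user_buf);		/* too late to matter */
	return sketch_path_ok(stable);		/* the copy is what gets used */
}
```

In the first function the check passes on "/tmp/..." but the value used afterwards is "/etc/passwd". In the second, the later write to the shared buffer cannot affect the private copy that was both checked and used.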
If you only have checks confined to thread-synchronous state such as the
user registers, then you don't have any "reliability problem" regardless
of the particular syscall interception mechanism you use. (ptrace has
many problems for this or any other purpose, but this is not one of them.)
That's unless you are referring to some other "reliability problem" that
I'm not aware of. (And I'll leave aside the "is it registers or is it
user memory?" issue on ia64 as irrelevant, since, you know, it's ia64.)
If syscall interception is indeed an appropriate mechanism for your needs
and you want something tailored more specifically to your exact use in
future kernels, a module doing this would be easy to implement using the
utrace API. (That might be a "compelling use" of utrace by virtue of the
Midas brand name effect, if nothing else. ;-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-07 7:03 ` Roland McGrath
(?)
@ 2009-05-07 8:01 ` Markus Gutschke (顧孟勤)
-1 siblings, 0 replies; 84+ messages in thread
From: Markus Gutschke (顧孟勤) @ 2009-05-07 8:01 UTC (permalink / raw)
To: Roland McGrath
Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
On Thu, May 7, 2009 at 00:03, Roland McGrath <roland@redhat.com> wrote:
>
> That is not a "ptrace problem" per se at all. It's an intrinsic problem
> with any method based on "generic" syscall interception, if the filtering
> and enforcement decisions depend on examining user memory.
Yes, this is indeed the main problem that we are aware of. It can be
avoided by suspending all threads during user memory inspection, but
that's a horrible price to pay (also: see below for an alternative
approach that could in principle be adapted for use with ptrace).
> The only reason seccomp does not have this "reliability problem" is that
> its filtering is trivial and depends only on registers (in fact, only on
> one register, the syscall number).
Simplicity is really the beauty of seccomp. It is very easy to verify
that it does the right thing from a security point of view, because
any attempt to call unsafe system calls results in the kernel
terminating the program. This is much preferable to most ptrace
solutions, which are more difficult to audit for correctness.
The downside is that the sandbox'd code needs to delegate execution of
most of its system calls to a monitor process. This is slow and rather
awkward. Although due to the magic of clone(), (almost) all system
calls can in fact be serialized, sent to the monitor process, have
their arguments safely inspected, and then executed on behalf of the
sandbox'd process. Details are tedious but we believe they are
solvable with current kernel APIs.
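A rough sketch of the delegation round trip, under several assumptions: the wire format, the "/tmp/" policy and all sketch_* names are hypothetical, the child does not actually enter seccomp mode, and the monitor only returns a decision instead of executing the call on the child's behalf. It shows one request serialized over a socketpair() and inspected from a copy that the sandboxed side can no longer modify.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { SKETCH_OP_OPEN = 1 };

/* Hypothetical wire format for one delegated request. */
struct sketch_req {
	int  nr;		/* which operation the sandbox wants */
	char path[64];		/* argument, copied out of sandbox memory */
};

struct sketch_reply {
	int allowed;		/* 1 if the monitor agreed, 0 if it refused */
};

/* Monitor-side policy, applied to the monitor's private copy. */
static int sketch_monitor_decide(const struct sketch_req *req)
{
	return req->nr == SKETCH_OP_OPEN &&
	       strncmp(req->path, "/tmp/", 5) == 0;
}

/*
 * Fork a stand-in for the sandboxed process, have it send one request,
 * and return the monitor's decision (1, 0, or -1 on error).
 */
static int sketch_delegate(const char *path)
{
	int sv[2];
	struct sketch_req req;
	struct sketch_reply rep;
	pid_t pid;

	assert(path != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		/* "Sandboxed" side: only write(), read() and _exit() below. */
		close(sv[0]);
		memset(&req, 0, sizeof(req));
		req.nr = SKETCH_OP_OPEN;
		strncpy(req.path, path, sizeof(req.path) - 1);
		if (write(sv[1], &req, sizeof(req)) != (ssize_t)sizeof(req))
			_exit(1);
		if (read(sv[1], &rep, sizeof(rep)) != (ssize_t)sizeof(rep))
			_exit(1);
		_exit(rep.allowed ? 0 : 2);
	}

	/* Monitor side: the request is now a stable copy in our memory. */
	close(sv[1]);
	rep.allowed = -1;
	if (read(sv[0], &req, sizeof(req)) == (ssize_t)sizeof(req)) {
		req.path[sizeof(req.path) - 1] = '\0';
		rep.allowed = sketch_monitor_decide(&req);
		if (write(sv[0], &rep, sizeof(rep)) != (ssize_t)sizeof(rep))
			rep.allowed = -1;
	}
	close(sv[0]);
	waitpid(pid, NULL, 0);
	return rep.allowed;
}
```

After the fork the child restricts itself to write(), read() and _exit(), which are among the calls seccomp mode 1 permits. A real sandbox would also have to return results such as file descriptors to the sandboxed process, which this sketch leaves out.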
The other issue is performance. For system calls that are known to be
safe, we would rather not pay the penalty of redirecting them. A
kernel patch that made seccomp more efficient for these system calls
would be very welcome, and we will post such a patch for discussion
shortly.
> If you want to do checks that depend on shared or volatile state, then
> syscall interception is really not the proper mechanism for you.
We agree that syscall interception is a poor abstraction level for a
sandbox. But in the short term, we need to work with the APIs that are
available in today's kernels. And we believe that seccomp is one of
the more promising APIs that are currently available to us.
Markus
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
@ 2009-05-07 7:30 ` Roland McGrath
0 siblings, 0 replies; 84+ messages in thread
From: Roland McGrath @ 2009-05-07 7:30 UTC (permalink / raw)
To: markus
Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions when
> inspecting system call arguments. We hope that we can avoid this
> problem with seccomp.
ptrace certainly has performance issues. I take it the only "reliability
problems" you are talking about are MT races with modifications to user
memory that is relevant to a system call. (Is there something else?)
That is not a "ptrace problem" per se at all. It's an intrinsic problem
with any method based on "generic" syscall interception, if the filtering
and enforcement decisions depend on examining user memory. By the same
token, no such method has a "reliability problem" if the filtering checks
only examine the registers (or other thread-synchronous state).
In the sense that I mean, seccomp is "generic syscall interception" too.
(That is, the checks/enforcement are "around" the call, rather than inside
it with direct atomicity controls binding the checks and uses together.)
The only reason seccomp does not have this "reliability problem" is that
its filtering is trivial and depends only on registers (in fact, only on
one register, the syscall number).
If you want to do checks that depend on shared or volatile state, then
syscall interception is really not the proper mechanism for you. (Likely
examples include user memory, e.g. for file names in open calls, or ioctl
struct contents, etc., fd tables or filesystem details, etc.) For that
you need mechanisms that look at stable kernel copies of user data that
are what the syscall will actually use, such as is done by audit, LSM, etc.
If you only have checks confined to thread-synchronous state such as the
user registers, then you don't have any "reliability problem" regardless
of the particular syscall interception mechanism you use. (ptrace has
many problems for this or any other purpose, but this is not one of them.)
That's unless you are referring to some other "reliability problem" that
I'm not aware of. (And I'll leave aside the "is it registers or is it
user memory?" issue on ia64 as irrelevant, since, you know, it's ia64.)
If syscall interception is indeed an appropriate mechanism for your needs
and you want something tailored more specifically to your exact use in
future kernels, a module doing this would be easy to implement using the
utrace API. (That might be a "compelling use" of utrace by virtue of the
Midas brand name effect, if nothing else. ;-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
@ 2009-05-07 7:30 ` Roland McGrath
0 siblings, 0 replies; 84+ messages in thread
From: Roland McGrath @ 2009-05-07 7:30 UTC (permalink / raw)
To: markus
Cc: linux-mips, Andrew Morton, x86, linux-kernel, linuxppc-dev,
sparclinux, Ingo Molnar, Linus Torvalds, stable
> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions when
> inspecting system call arguments. We hope that we can avoid this
> problem with seccomp.
ptrace certainly has performance issues. I take it the only "reliability
problems" you are talking about are MT races with modifications to user
memory that is relevant to a system call. (Is there something else?)
That is not a "ptrace problem" per se at all. It's an intrinsic problem
with any method based on "generic" syscall interception, if the filtering
and enforcement decisions depend on examining user memory. By the same
token, no such method has a "reliability problem" if the filtering checks
only examine the registers (or other thread-synchronous state).
In the sense that I mean, seccomp is "generic syscall interception" too.
(That is, the checks/enforcement are "around" the call, rather than inside
it with direct atomicity controls binding the checks and uses together.)
The only reason seccomp does not have this "reliability problem" is that
its filtering is trivial and depends only on registers (in fact, only on
one register, the syscall number).
If you want to do checks that depend on shared or volatile state, then
syscall interception is really not the proper mechanism for you. (Likely
examples include user memory, e.g. for file names in open calls, or ioctl
struct contents, etc., fd tables or filesystem details, etc.) For that
you need mechanisms that look at stable kernel copies of user data that
are what the syscall will actually use, such as is done by audit, LSM, etc.
If you only have checks confined to thread-synchronous state such as the
user registers, then you don't have any "reliability problem" regardless
of the the particular syscall interception mechanism you use. (ptrace has
many problems for this or any other purpose, but this is not one of them.)
That's unless you are referring to some other "reliability problem" that
I'm not aware of. (And I'll leave aside the "is it registers or is it
user memory?" issue on ia64 as irrelevant, since, you know, it's ia64.)
If syscall interception is indeed an appropriate mechanism for your needs
and you want something tailored more specifically to your exact use in
future kernels, a module doing this would be easy to implement using the
utrace API. (That might be a "compelling use" of utrace by virtue of the
Midas brand name effect, if nothing else. ;-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
@ 2009-05-07 7:30 ` Roland McGrath
0 siblings, 0 replies; 84+ messages in thread
From: Roland McGrath @ 2009-05-07 7:30 UTC (permalink / raw)
To: markus
Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, x86, linux-kernel,
stable, linux-mips, sparclinux, linuxppc-dev
> Ptrace has performance and/or reliability problems when used to
> sandbox threaded applications due to potential race conditions when
> inspecting system call arguments. We hope that we can avoid this
> problem with seccomp.
ptrace certainly has performance issues. I take it the only "reliability
problems" you are talking about are MT races with modifications to user
memory that is relevant to a system call. (Is there something else?)
That is not a "ptrace problem" per se at all. It's an intrinsic problem
with any method based on "generic" syscall interception, if the filtering
and enforcement decisions depend on examining user memory. By the same
token, no such method has a "reliability problem" if the filtering checks
only examine the registers (or other thread-synchronous state).
In the sense that I mean, seccomp is "generic syscall interception" too.
(That is, the checks/enforcement are "around" the call, rather than inside
it with direct atomicity controls binding the checks and uses together.)
The only reason seccomp does not have this "reliability problem" is that
its filtering is trivial and depends only on registers (in fact, only on
one register, the syscall number).
If you want to do checks that depend on shared or volatile state, then
syscall interception is really not the proper mechanism for you. (Likely
examples include user memory, e.g. for file names in open calls, or ioctl
struct contents, etc., fd tables or filesystem details, etc.) For that
you need mechanisms that look at stable kernel copies of user data that
are what the syscall will actually use, such as is done by audit, LSM, etc.
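
The difference can be sketched as follows: fetch the argument from user memory once, then check and use that same copy. This is a simulation under stated assumptions, not how audit or LSM are implemented: check_inside_call and policy_allows are hypothetical names, the "/tmp/" policy is invented, and the racing thread is again a callback.

```python
from typing import Callable, Optional


def policy_allows(path: bytes) -> bool:
    # Hypothetical policy: only paths under /tmp/ may be opened.
    return path.startswith(b"/tmp/")


def check_inside_call(
    user_buf: bytearray,
    racing_thread: Optional[Callable[[bytearray], None]] = None,
) -> Optional[bytes]:
    """Copy the argument once, then check and use the same copy.

    Returns the path the call used, or None if the policy denied it.
    """
    kernel_copy = bytes(user_buf)   # single fetch from user memory
    if racing_thread is not None:
        racing_thread(user_buf)     # later writes cannot reach the copy
    if not policy_allows(kernel_copy):
        return None
    return kernel_copy              # the value checked is the value used


def swap_path(buf: bytearray) -> None:
    buf[:] = b"/etc/shadow"


if __name__ == "__main__":
    buf = bytearray(b"/tmp/ok")
    print(check_inside_call(buf, swap_path), bytes(buf))
```

The other thread still overwrites the user buffer, but the decision and the use both refer to the copy taken before that write.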
If you only have checks confined to thread-synchronous state such as the
user registers, then you don't have any "reliability problem" regardless
of the particular syscall interception mechanism you use. (ptrace has
many problems for this or any other purpose, but this is not one of them.)
That's unless you are referring to some other "reliability problem" that
I'm not aware of. (And I'll leave aside the "is it registers or is it
user memory?" issue on ia64 as irrelevant, since, you know, it's ia64.)
If syscall interception is indeed an appropriate mechanism for your needs
and you want something tailored more specifically to your exact use in
future kernels, a module doing this would be easy to implement using the
utrace API. (That might be a "compelling use" of utrace by virtue of the
Midas brand name effect, if nothing else. ;-)
Thanks,
Roland
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-07 7:31 ` Roland McGrath
` (2 preceding siblings ...)
(?)
@ 2009-05-08 1:59 ` David Wagner
2009-05-10 5:36 ` Pavel Machek
-1 siblings, 1 reply; 84+ messages in thread
From: David Wagner @ 2009-05-08 1:59 UTC (permalink / raw)
To: linux-kernel
Roland McGrath wrote:
>> Ptrace has performance and/or reliability problems when used to
>> sandbox threaded applications due to potential race conditions when
>> inspecting system call arguments. We hope that we can avoid this
>> problem with seccomp.
>
>ptrace certainly has performance issues. I take it the only "reliability
>problems" you are talking about are MT races with modifications to user
>memory that is relevant to a system call. (Is there something else?)
As of 1999, I believe there were some other issues for using ptrace
securely:
1. I do not know of a good way to reliably ensure that all children of
a traced program will be traced as well. If you wait for the fork()
call to return, check the pid, and start tracing the child process,
you are subject to race conditions. (strace's solution is to modify
the code of the traced program to put a trapping instruction immediately
after the call site to fork(). This is a grody hack and I had a hard
time convincing myself that this will be secure in all cases.)
2. ptrace disrupts the process hierarchy and Unix signals. Because of
the way ptrace overloads signals to deliver tracing events, tracing is
not transparent. For instance, if the parent and child are both traced,
and the parent waits for a signal from the child, things may no longer
work the same way while being traced. Working around this requires
non-trivial code. Complexity is the enemy of security and makes it hard
to gain confidence this doesn't introduce subtle issues.
3. If the tracing application should happen to die unexpectedly
(OOM, anyone?), I believe the traced application continues running,
now without any security checks.
4. I seem to recall that when I looked at this in 1999, if the traced
app makes a syscall that should not be allowed, I couldn't find a good
way to prevent that syscall from executing. I don't know if current
ptrace has solved this problem.
Disclaimer: I haven't checked whether these all still apply today.
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH 2/2] x86-64: seccomp: fix 32/64 syscall hole
2009-05-08 1:59 ` David Wagner
@ 2009-05-10 5:36 ` Pavel Machek
0 siblings, 0 replies; 84+ messages in thread
From: Pavel Machek @ 2009-05-10 5:36 UTC (permalink / raw)
To: David Wagner; +Cc: linux-kernel
Hi!
> >> Ptrace has performance and/or reliability problems when used to
> >> sandbox threaded applications due to potential race conditions when
> >> inspecting system call arguments. We hope that we can avoid this
> >> problem with seccomp.
> >
> >ptrace certainly has performance issues. I take it the only "reliability
> >problems" you are talking about are MT races with modifications to user
> >memory that is relevant to a system call. (Is there something else?)
>
> As of 1999, I believe there were some other issues for using ptrace
> securely:
>
> 1. I do not know of a good way to reliably ensure that all children of
> a traced program will be traced as well. If you wait for the fork()
> call to return, check the pid, and start tracing the child process,
> you are subject to race conditions. (strace's solution is to modify
> the code of the traced program to put a trapping instruction immediately
> after the call site to fork(). This is a grody hack and I had a hard
> time convincing myself that this will be secure in all cases.)
>
> 2. ptrace disrupts the process hierarchy and Unix signals. Because of
> the way ptrace overloads signals to deliver tracing events, tracing is
> not transparent. For instance, if the parent and child are both traced,
> and the parent waits for a signal from the child, things may no longer
> work the same way while being traced. Working around this requires
> non-trivial code. Complexity is the enemy of security and makes it hard
> to gain confidence this doesn't introduce subtle issues.
>
> 3. If the tracing application should happen to die unexpectedly
> (OOM, anyone?), I believe the traced application continues running,
> now without any security checks.
>
> 4. I seem to recall that when I looked at this in 1999, if the traced
> app makes a syscall that should not be allowed, I couldn't find a good
> way to prevent that syscall from executing. I don't know if current
> ptrace has solved this problem.
>
> Disclaimer: I haven't checked whether these all still apply today.
subterfugue.net has a ptrace-based monitor that is secure AFAICT. We
improved ptrace for it a bit...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply [flat|nested] 84+ messages in thread