* [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Nix @ 2011-11-14 0:40 UTC (permalink / raw)
To: linux-kernel; +Cc: Andy Lutomirski, mseaborn
With this commit installed:
commit 5cec93c216db77c45f7ce970d46283bcb1933884
Author: Andy Lutomirski <luto@MIT.EDU>
Date: Sun Jun 5 13:50:24 2011 -0400
x86-64: Emulate legacy vsyscalls
With CONFIG_SECCOMP set, and the Chromium seccomp sandbox compiled in
and enabled (which is not the default), on a system running glibc 2.12.x
(thus, relying on emulated vsyscalls), Chromium renderers sometimes hang
or abruptly abort before rendering anything (both of which show as pages
that never complete rendering and eventually get a Chromium kill request
dialog). The hang is consistent for a given page, but not all pages
hang. (One that *does* hang is the chrome://extensions page, so network
access is not the problem here.)
vsyscall=native does not help.
Turning off CONFIG_SECCOMP, or running Chromium with the seccomp sandbox
disabled, fixes it.
I speculate that do_emulate_vsyscall() is broken, but it's hard to debug
the Chromium renderer sandboxing to see what's failing because the
multiple layers of sandboxing get in the way, as they are designed to :)
(also, I am not in any way shape or form a Chromium hacker). Chromium
does downright terrible things to convert syscalls into IPC calls
outside the seccomp sandbox (mostly to a separate, nonseccomped,
assembler thread in the same process, but in some cases via IPC to an
entirely separate process for validation followed by IPC back and
execution in that separate thread). I suspect this delicate dance (for
which see seccompsandbox/ in a Chromium source tree) has been disrupted.
I have raised Chromium bug
<http://code.google.com/p/chromium/issues/detail?id=104084> to attract
the Chromium hackers' attention to this, and am Cc:ing a Chromium hacker
whose fingers are all over the seccomp sandbox as well. Hopefully the
cause, or a diagnostic trick, will be obvious to someone other than me.
--
NULL && (void)
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Andrew Lutomirski @ 2011-11-14 2:36 UTC (permalink / raw)
To: Nix; +Cc: linux-kernel, mseaborn
On Sun, Nov 13, 2011 at 4:40 PM, Nix <nix@esperi.org.uk> wrote:
> With this commit installed:
>
> commit 5cec93c216db77c45f7ce970d46283bcb1933884
> Author: Andy Lutomirski <luto@MIT.EDU>
> Date: Sun Jun 5 13:50:24 2011 -0400
>
> x86-64: Emulate legacy vsyscalls
>
> With CONFIG_SECCOMP set, and the Chromium seccomp sandbox compiled in
> and enabled (which is not the default), on a system running glibc 2.12.x
> (thus, relying on emulated vsyscalls), Chromium renderers sometimes hang
> or abruptly abort before rendering anything (both of which show as pages
> that never complete rendering and eventually get a Chromium kill request
> dialog). The hang is consistent for a given page, but not all pages
> hang. (One that *does* hang is the chrome://extensions page, so network
> access is not the problem here.)
>
> vsyscall=native does not help.
>
> Turning off CONFIG_SECCOMP, or running Chromium with the seccomp sandbox
> disabled, fixes it.
>
> I speculate that do_emulate_vsyscall() is broken, but it's hard to debug
> the Chromium renderer sandboxing to see what's failing because the
> multiple layers of sandboxing get in the way, as they are designed to :)
I don't buy that explanation -- with vsyscall=native,
do_emulate_vsyscall shouldn't be called at all. I have a much simpler
explanation: the Chromium sandbox is calling vsyscalls in seccomp
mode, which has no business working.
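To make the point concrete: strict seccomp (the only mode in kernels of this vintage) permits read, write, exit and sigreturn, and kills the task on anything else. Below is a minimal sketch of that check in C, assuming x86-64 where sigreturn is rt_sigreturn. The names mode1_allowed and mode1_allows are made up for illustration and are not the kernel's own code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* Syscalls permitted in strict (mode 1) seccomp, x86-64 numbering.
 * Illustrative table only; not copied from kernel/seccomp.c. */
static const long mode1_allowed[] = {
    SYS_read, SYS_write, SYS_exit, SYS_rt_sigreturn,
};

/* Return true if syscall number nr would be let through in strict mode. */
static bool mode1_allows(long nr)
{
    size_t n = sizeof(mode1_allowed) / sizeof(mode1_allowed[0]);

    for (size_t i = 0; i < n; i++)
        if (mode1_allowed[i] == nr)
            return true;
    return false;
}
```

Under this model mode1_allows(SYS_gettimeofday) is false, so any vsyscall that ends up issuing a real gettimeofday syscall gets the task killed.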
The attached patch (in vsyscall=native mode) should help diagnose
exactly what's wrong. But I wouldn't be surprised if you can trigger
the exact same failure on older kernels by doing
# echo acpi_pm >/sys/devices/system/clocksource/clocksource0/current_clocksource
and then going to the broken page. Just because vgettimeofday
sometimes doesn't issue a syscall (and is therefore not caught by the
seccomp code) doesn't mean it never issues a syscall.
If the crash is in vtime instead, then there's an argument to be made
that time and vtime should be allowed in seccomp mode, in which case
it's an easy change to make. But if I'm right, I think that Chromium
should stop using vsyscalls from inside the sandbox.
--Andy
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Andrew Lutomirski @ 2011-11-14 4:00 UTC (permalink / raw)
To: Nix; +Cc: linux-kernel, mseaborn
[-- Attachment #1: Type: text/plain, Size: 312 bytes --]
On Sun, Nov 13, 2011 at 6:36 PM, Andrew Lutomirski <luto@mit.edu> wrote:
>
> The attached patch (in vsyscall=native mode) should help diagnose
> exactly what's wrong. But I wouldn't be surprised if you can trigger
> the exact same failure on older kernels by doing
Now with actual attachment.
--Andy
[-- Attachment #2: log_seccomp_kill.patch --]
[-- Type: text/x-patch, Size: 359 bytes --]
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 57d4b13..29deaf8 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -54,6 +54,7 @@ void __secure_computing(int this_syscall)
 #ifdef SECCOMP_DEBUG
 	dump_stack();
 #endif
+	printk(KERN_ERR "Killing %d due to bad seccomp syscall %d\n", (int)current->pid, (int)this_syscall);
 	do_exit(SIGKILL);
 }
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Mark Seaborn @ 2011-11-14 6:50 UTC (permalink / raw)
To: Andrew Lutomirski; +Cc: Nix, linux-kernel, Markus Gutschke
On 13 November 2011 18:36, Andrew Lutomirski <luto@mit.edu> wrote:
>
> On Sun, Nov 13, 2011 at 4:40 PM, Nix <nix@esperi.org.uk> wrote:
> > With this commit installed:
> >
> > commit 5cec93c216db77c45f7ce970d46283bcb1933884
> > Author: Andy Lutomirski <luto@MIT.EDU>
> > Date: Sun Jun 5 13:50:24 2011 -0400
> >
> > x86-64: Emulate legacy vsyscalls
> >
> > With CONFIG_SECCOMP set, and the Chromium seccomp sandbox compiled in
> > and enabled (which is not the default), on a system running glibc 2.12.x
> > (thus, relying on emulated vsyscalls), Chromium renderers sometimes hang
> > or abruptly abort before rendering anything (both of which show as pages
> > that never complete rendering and eventually get a Chromium kill request
> > dialog). The hang is consistent for a given page, but not all pages
> > hang. (One that *does* hang is the chrome://extensions page, so network
> > access is not the problem here.)
> >
> > vsyscall=native does not help.
> >
> > Turning off CONFIG_SECCOMP, or running Chromium with the seccomp sandbox
> > disabled, fixes it.
> >
> > I speculate that do_emulate_vsyscall() is broken, but it's hard to debug
> > the Chromium renderer sandboxing to see what's failing because the
> > multiple layers of sandboxing get in the way, as they are designed to :)
>
> I don't buy that explanation -- with vsyscall=native,
> do_emulate_vsyscall shouldn't be called at all. I have a much simpler
> explanation: the Chromium sandbox is calling vsyscalls in seccomp
> mode, which has no business working.
I think the problem is that seccomp-sandbox attempts to patch the
vsyscall page. It replaces the SYSCALL instructions in this page with
jumps to seccomp-sandbox's handler. (More accurately, seccomp-sandbox
creates a patched copy of the vsyscall page. It redirects glibc's
indirect jumps so that they go to the patched copy of the vsyscall
page instead of to the original.) The code for this is in
patchVSystemCalls() in library.cc
(http://code.google.com/p/seccompsandbox/source/browse/trunk/library.cc).
If the vsyscall page's code no longer invokes the kernel via SYSCALL
instructions but via some other trap, seccomp-sandbox's trick will no
longer work, because it doesn't know to patch the instructions that do
this new trap.
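A rough sketch of the first step of that trick, locating SYSCALL instructions in a copied page, might look like the C below. This is not the code from patchVSystemCalls(); find_syscall_sites is a hypothetical name, and a plain byte scan for the SYSCALL encoding (0F 05) can also match bytes inside an immediate, so a real patcher has to decode instructions properly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86-64 encoding of the SYSCALL instruction: 0F 05. */
static const uint8_t SYSCALL_OPCODE[2] = { 0x0f, 0x05 };

/*
 * Naive scan of code[0..len) for 0F 05 byte pairs. Stores up to max
 * offsets in out and returns the total number of pairs found.
 * Hypothetical helper for illustration only.
 */
static size_t find_syscall_sites(const uint8_t *code, size_t len,
                                 size_t *out, size_t max)
{
    size_t found = 0;

    assert(code != NULL);
    for (size_t i = 0; i + 1 < len; i++) {
        if (code[i] == SYSCALL_OPCODE[0] && code[i + 1] == SYSCALL_OPCODE[1]) {
            if (found < max)
                out[found] = i;
            found++;
            i++; /* skip the second opcode byte */
        }
    }
    return found;
}
```

If the kernel entered through some trap other than SYSCALL, a scan like this would find nothing to redirect, which is the failure described above.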
Mark
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Andrew Lutomirski @ 2011-11-14 8:38 UTC (permalink / raw)
To: Mark Seaborn; +Cc: Nix, linux-kernel, Markus Gutschke
On Sun, Nov 13, 2011 at 10:50 PM, Mark Seaborn <mseaborn@chromium.org> wrote:
> On 13 November 2011 18:36, Andrew Lutomirski <luto@mit.edu> wrote:
>>
>> On Sun, Nov 13, 2011 at 4:40 PM, Nix <nix@esperi.org.uk> wrote:
>> > With this commit installed:
>> >
>> > commit 5cec93c216db77c45f7ce970d46283bcb1933884
>> > Author: Andy Lutomirski <luto@MIT.EDU>
>> > Date: Sun Jun 5 13:50:24 2011 -0400
>> >
>> > x86-64: Emulate legacy vsyscalls
>> >
>> > With CONFIG_SECCOMP set, and the Chromium seccomp sandbox compiled in
>> > and enabled (which is not the default), on a system running glibc 2.12.x
>> > (thus, relying on emulated vsyscalls), Chromium renderers sometimes hang
>> > or abruptly abort before rendering anything (both of which show as pages
>> > that never complete rendering and eventually get a Chromium kill request
>> > dialog). The hang is consistent for a given page, but not all pages
>> > hang. (One that *does* hang is the chrome://extensions page, so network
>> > access is not the problem here.)
>> >
>> > vsyscall=native does not help.
>> >
>> > Turning off CONFIG_SECCOMP, or running Chromium with the seccomp sandbox
>> > disabled, fixes it.
>> >
>> > I speculate that do_emulate_vsyscall() is broken, but it's hard to debug
>> > the Chromium renderer sandboxing to see what's failing because the
>> > multiple layers of sandboxing get in the way, as they are designed to :)
>>
>> I don't buy that explanation -- with vsyscall=native,
>> do_emulate_vsyscall shouldn't be called at all. I have a much simpler
>> explanation: the Chromium sandbox is calling vsyscalls in seccomp
>> mode, which has no business working.
>
> I think the problem is that seccomp-sandbox attempts to patch the
> vsyscall page. It replaces the SYSCALL instructions in this page with
> jumps to seccomp-sandbox's handler. (More accurately, seccomp-sandbox
> creates a patched copy of the vsyscall page. It redirects glibc's
> indirect jumps so that they go to the patched copy of the vsyscall
> page instead of to the original.) The code for this is in
> patchVSystemCalls() in library.cc
> (http://code.google.com/p/seccompsandbox/source/browse/trunk/library.cc).
>
> If the vsyscall page's code no longer invokes the kernel via SYSCALL
> instructions but via some other trap, seccomp-sandbox's trick will no
> longer work, because it doesn't know to patch the instructions that do
> this new trap.
The vsyscall code is now:
    mov $__NR_whatever, %rax
    syscall
    ret
It used to be weirder, but we changed it to avoid breaking things like
this. The secret is that, if vsyscall=emulate, the vsyscall page is
not executable and we use the page fault to invoke
do_emulate_vsyscall. But userspace can't tell it's not executable
without actually jumping there, and with vsyscall=native, it's just a
normal syscall.
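For anyone who wants to recognize this stub from userspace, here is a hedged sketch in C. It assumes the assembler emitted the 7-byte REX.W form of the mov (48 c7 c0 imm32), followed by syscall (0f 05) and ret (c3); the bytes actually present in a given kernel's vsyscall page may differ. decode_vsyscall_stub is a made-up name, not part of the kernel or of seccompsandbox.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if code starts with
 *     48 c7 c0 imm32    mov $imm32, %rax
 *     0f 05             syscall
 *     c3                ret
 * and store imm32 (the syscall number) in *nr.
 * The exact encoding is an assumption; see the note above.
 */
static bool decode_vsyscall_stub(const uint8_t *code, size_t len, uint32_t *nr)
{
    assert(code != NULL && nr != NULL);

    if (len < 10)
        return false;
    if (code[0] != 0x48 || code[1] != 0xc7 || code[2] != 0xc0)
        return false;
    if (code[7] != 0x0f || code[8] != 0x05 || code[9] != 0xc3)
        return false;

    /* imm32 is stored little-endian. */
    *nr = (uint32_t)code[3] | ((uint32_t)code[4] << 8) |
          ((uint32_t)code[5] << 16) | ((uint32_t)code[6] << 24);
    return true;
}
```

Fed the bytes 48 c7 c0 60 00 00 00 0f 05 c3, this reports syscall number 96, which is __NR_gettimeofday on x86-64.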
I'll try to build a sandboxing copy of chromium tomorrow to see if I
can reproduce it.
--Andy
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Nix @ 2011-11-14 11:41 UTC (permalink / raw)
To: Andrew Lutomirski; +Cc: Mark Seaborn, linux-kernel, Markus Gutschke
On 14 Nov 2011, Andrew Lutomirski stated:
> On Sun, Nov 13, 2011 at 10:50 PM, Mark Seaborn <mseaborn@chromium.org> wrote:
>> I think the problem is that seccomp-sandbox attempts to patch the
>> vsyscall page.
I grepped for vsyscall to try to avoid making an idiot of myself like
this. A shame I misspelt it. :/
> The vsyscall code is now:
>
>     mov $__NR_whatever, %rax
>     syscall
>     ret
>
> It used to be weirder, but we changed it to avoid breaking things like
> this. The secret is that, if vsyscall=emulate, the vsyscall page is
> not executable and we use the page fault to invoke
> do_emulate_vsyscall. But userspace can't tell it's not executable
> without actually jumping there, and with vsyscall=native, it's just a
> normal syscall.
>
> I'll try to build a sandboxing copy of chromium tomorrow to see if I
> can reproduce it.
If you look at line 909 of seccompsandbox/library.cc (in
http://git.chromium.org/external/seccompsandbox.git) the problem does
indeed jump out at you. That nice manual disassembly using == isn't
going to work anymore. It even helpfully dies with a message saying that
it can't patch the vsyscall page, but the message gets thrown away by
some higher layer.
I suspect we want two functions and something to recognize what the
vsyscall page looks like and choose between them, rather than making
this already tricky function even uglier.
(To build Chromium with sandboxing, the recipe is little more than
making sure you haven't passed in -Dselinux, and starting Chromium with
--enable-seccomp-sandbox. It's turned off by default because it slows
Chromium down, not because it doesn't work. :) )
--
NULL && (void)
* Re: [3.1 REGRESSION] Commit 5cec93c216db77c45f7ce970d46283bcb1933884 breaks the Chromium seccomp sandbox
From: Mark Seaborn @ 2011-11-14 16:26 UTC (permalink / raw)
To: Andrew Lutomirski; +Cc: Nix, linux-kernel, Markus Gutschke
On 14 November 2011 00:38, Andrew Lutomirski <luto@mit.edu> wrote:
> On Sun, Nov 13, 2011 at 10:50 PM, Mark Seaborn <mseaborn@chromium.org> wrote:
>> I think the problem is that seccomp-sandbox attempts to patch the
>> vsyscall page. It replaces the SYSCALL instructions in this page with
>> jumps to seccomp-sandbox's handler. (More accurately, seccomp-sandbox
>> creates a patched copy of the vsyscall page. It redirects glibc's
>> indirect jumps so that they go to the patched copy of the vsyscall
>> page instead of to the original.) The code for this is in
>> patchVSystemCalls() in library.cc
>> (http://code.google.com/p/seccompsandbox/source/browse/trunk/library.cc).
>>
>> If the vsyscall page's code no longer invokes the kernel via SYSCALL
>> instructions but via some other trap, seccomp-sandbox's trick will no
>> longer work, because it doesn't know to patch the instructions that do
>> this new trap.
>
> The vsyscall code is now:
>
>     mov $__NR_whatever, %rax
>     syscall
>     ret
>
> It used to be weirder, but we changed it to avoid breaking things like
> this. The secret is that, if vsyscall=emulate, the vsyscall page is
> not executable and we use the page fault to invoke
> do_emulate_vsyscall. But userspace can't tell it's not executable
> without actually jumping there, and with vsyscall=native, it's just a
> normal syscall.
Ah, that's much nicer. In that case, the fix should just be a case of
adding the new syscall numbers to the whitelist in
system_call_table.cc.
The current hang might just be occurring because libpthread's timeout
calculations come out wrong. libpthread doesn't check for errors from
its calls to vsyscall routines, at least in
nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S in the
version I checked.
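To illustrate the kind of check I mean, here is a small C sketch that computes a relative timeout from an absolute deadline and refuses to do the arithmetic if the clock read failed. It is not the libpthread code (which is assembler); remaining_time and relative_timeout are hypothetical names, and the nanosecond arithmetic assumes the deadline is within a few centuries of now.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Store deadline - now in *left. Returns false (and zeroes *left) if the
 * deadline has already passed. */
static bool remaining_time(const struct timeval *now,
                           const struct timespec *deadline,
                           struct timespec *left)
{
    long long ns = ((long long)deadline->tv_sec - now->tv_sec) * 1000000000LL
                 + ((long long)deadline->tv_nsec - (long long)now->tv_usec * 1000LL);

    if (ns <= 0) {
        left->tv_sec = 0;
        left->tv_nsec = 0;
        return false;
    }
    left->tv_sec = ns / 1000000000LL;
    left->tv_nsec = ns % 1000000000LL;
    return true;
}

/* Returns 0 on success, -1 with errno set on failure. */
static int relative_timeout(const struct timespec *deadline,
                            struct timespec *left)
{
    struct timeval now;

    if (gettimeofday(&now, NULL) != 0)
        return -1; /* never compute with an unset 'now' */
    if (!remaining_time(&now, deadline, left)) {
        errno = ETIMEDOUT;
        return -1;
    }
    return 0;
}
```

Without the return-value check, a failed time call leaves 'now' holding whatever was there before, and the resulting timeout can be arbitrarily wrong.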
> I'll try to build a sandboxing copy of chromium tomorrow to see if I
> can reproduce it.
You don't necessarily need to build Chromium. You can try running
"make test" in a checkout of seccompsandbox. That might not catch the
problem, though. I did not realise that the vsyscall page was still
used by glibc when I wrote those tests.
Cheers,
Mark