* [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath @ 2014-05-28 12:19 Philipp Kern 2014-05-28 20:47 ` Andy Lutomirski 0 siblings, 1 reply; 10+ messages in thread From: Philipp Kern @ 2014-05-28 12:19 UTC (permalink / raw) To: hpa; +Cc: Philipp Kern, linux-kernel, H. J. Lu, Eric Paris audit_filter_syscall uses the syscall number to reference into a bitmask (e->rule.mask[word]). Not removing the x32 bit before passing the number to this architecture independent codepath will fail to lookup the proper audit bit. Furthermore it will cause an invalid memory access in the kernel if the out of bound location is not mapped: BUG: unable to handle kernel paging request at ffff8800e5446630 IP: [<ffffffff810fcdd0>] audit_filter_syscall+0x90/0xf0 Together with the entrypoint in entry_64.S this change causes x32 programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending on the syscall path. Cc: linux-kernel@vger.kernel.org Cc: H. J. Lu <hjl.tools@gmail.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Philipp Kern <pkern@google.com> --- arch/x86/kernel/ptrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c index 678c0ad..166a3c7 100644 --- a/arch/x86/kernel/ptrace.c +++ b/arch/x86/kernel/ptrace.c @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs) if (IS_IA32) audit_syscall_entry(AUDIT_ARCH_I386, - regs->orig_ax, + regs->orig_ax & __SYSCALL_MASK, regs->bx, regs->cx, regs->dx, regs->si); #ifdef CONFIG_X86_64 -- 1.9.1.423.g4596e3a ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 12:19 [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath Philipp Kern @ 2014-05-28 20:47 ` Andy Lutomirski 2014-05-28 20:58 ` Philipp Kern 2014-05-28 21:01 ` H. Peter Anvin 0 siblings, 2 replies; 10+ messages in thread From: Andy Lutomirski @ 2014-05-28 20:47 UTC (permalink / raw) To: Philipp Kern, hpa; +Cc: linux-kernel, H. J. Lu, Eric Paris On 05/28/2014 05:19 AM, Philipp Kern wrote: > audit_filter_syscall uses the syscall number to reference into a > bitmask (e->rule.mask[word]). Not removing the x32 bit before passing > the number to this architecture independent codepath will fail to > lookup the proper audit bit. Furthermore it will cause an invalid memory > access in the kernel if the out of bound location is not mapped: > > BUG: unable to handle kernel paging request at ffff8800e5446630 > IP: [<ffffffff810fcdd0>] audit_filter_syscall+0x90/0xf0 > > Together with the entrypoint in entry_64.S this change causes x32 > programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending > on the syscall path. > > Cc: linux-kernel@vger.kernel.org > Cc: H. J. Lu <hjl.tools@gmail.com> > Cc: Eric Paris <eparis@redhat.com> > Signed-off-by: Philipp Kern <pkern@google.com> > --- > arch/x86/kernel/ptrace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c > index 678c0ad..166a3c7 100644 > --- a/arch/x86/kernel/ptrace.c > +++ b/arch/x86/kernel/ptrace.c > @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs) > > if (IS_IA32) > audit_syscall_entry(AUDIT_ARCH_I386, > - regs->orig_ax, > + regs->orig_ax & __SYSCALL_MASK, This is weird. Three questions: 1. How can this happen? I thought that x32 syscalls always came in through the syscall path, which doesn't set is_compat_task. (Can someone rename is_compat_task to in_compat_syscall? Pretty please?) 2. Now audit can't tell whether a syscall is x32 or i386. And audit is inconsistent with seccomp. This seems wrong. 3. The OOPS you're fixing doesn't seem like it's fixed. What if some other random high bits are set? --Andy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 20:47 ` Andy Lutomirski @ 2014-05-28 20:58 ` Philipp Kern 2014-05-28 21:01 ` H. Peter Anvin 1 sibling, 0 replies; 10+ messages in thread From: Philipp Kern @ 2014-05-28 20:58 UTC (permalink / raw) To: Andy Lutomirski; +Cc: hpa, linux-kernel, H. J. Lu, Eric Paris On Wed, May 28, 2014 at 10:47 PM, Andy Lutomirski <luto@amacapital.net> wrote: > On 05/28/2014 05:19 AM, Philipp Kern wrote: > > audit_filter_syscall uses the syscall number to reference into a > > bitmask (e->rule.mask[word]). Not removing the x32 bit before passing > > the number to this architecture independent codepath will fail to > > lookup the proper audit bit. Furthermore it will cause an invalid memory > > access in the kernel if the out of bound location is not mapped: > > > > BUG: unable to handle kernel paging request at ffff8800e5446630 > > IP: [<ffffffff810fcdd0>] audit_filter_syscall+0x90/0xf0 > > > > Together with the entrypoint in entry_64.S this change causes x32 > > programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending > > on the syscall path. > > > > Cc: linux-kernel@vger.kernel.org > > Cc: H. J. Lu <hjl.tools@gmail.com> > > Cc: Eric Paris <eparis@redhat.com> > > Signed-off-by: Philipp Kern <pkern@google.com> > > --- > > arch/x86/kernel/ptrace.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c > > index 678c0ad..166a3c7 100644 > > --- a/arch/x86/kernel/ptrace.c > > +++ b/arch/x86/kernel/ptrace.c > > @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs) > > > > if (IS_IA32) > > audit_syscall_entry(AUDIT_ARCH_I386, > > - regs->orig_ax, > > + regs->orig_ax & __SYSCALL_MASK, > > This is weird. Three questions: > > 1. How can this happen? I thought that x32 syscalls always came in > through the syscall path, which doesn't set is_compat_task. (Can > someone rename is_compat_task to in_compat_syscall? Pretty please?) The other patch is this one: https://lkml.org/lkml/2014/5/26/209 But I agree: IS_IA32 is confusing but it does trigger on x32 tasks. I put that into a bug report but apparently not into the patch description, I'm sorry. is_compat_task is defined as is_ia32_task() || is_x32_task(). > 2. Now audit can't tell whether a syscall is x32 or i386. And audit is > inconsistent with seccomp. This seems wrong. I'm not sure where seccomp is hooked in. Can you point me to the entry points here? ptrace still requires that userspace sees the x32 bit so it cannot be masked beforehand. The path the other patch fixes masks the bit away directly after the audit call. > 3. The OOPS you're fixing doesn't seem like it's fixed. What if some > other random high bits are set? Fair point. I guess we should check against AUDIT_BITMASK_SIZE as that is the size of the array we are referencing into. But what would we do in case it's larger? Those are not ENOSYS'ed yet in the entry_64.S path. That's after tracing and auditing. Kind regards and thanks Philipp Kern ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 20:47 ` Andy Lutomirski 2014-05-28 20:58 ` Philipp Kern @ 2014-05-28 21:01 ` H. Peter Anvin 2014-05-28 21:15 ` Andy Lutomirski 1 sibling, 1 reply; 10+ messages in thread From: H. Peter Anvin @ 2014-05-28 21:01 UTC (permalink / raw) To: Andy Lutomirski, Philipp Kern; +Cc: linux-kernel, H. J. Lu, Eric Paris On 05/28/2014 01:47 PM, Andy Lutomirski wrote: > On 05/28/2014 05:19 AM, Philipp Kern wrote: >> audit_filter_syscall uses the syscall number to reference into a >> bitmask (e->rule.mask[word]). Not removing the x32 bit before passing >> the number to this architecture independent codepath will fail to >> lookup the proper audit bit. Furthermore it will cause an invalid memory >> access in the kernel if the out of bound location is not mapped: >> >> BUG: unable to handle kernel paging request at ffff8800e5446630 >> IP: [<ffffffff810fcdd0>] audit_filter_syscall+0x90/0xf0 >> >> Together with the entrypoint in entry_64.S this change causes x32 >> programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending >> on the syscall path. >> >> Cc: linux-kernel@vger.kernel.org >> Cc: H. J. Lu <hjl.tools@gmail.com> >> Cc: Eric Paris <eparis@redhat.com> >> Signed-off-by: Philipp Kern <pkern@google.com> >> --- >> arch/x86/kernel/ptrace.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c >> index 678c0ad..166a3c7 100644 >> --- a/arch/x86/kernel/ptrace.c >> +++ b/arch/x86/kernel/ptrace.c >> @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs) >> >> if (IS_IA32) >> audit_syscall_entry(AUDIT_ARCH_I386, >> - regs->orig_ax, >> + regs->orig_ax & __SYSCALL_MASK, > > This is weird. Three questions: > > 1. How can this happen? I thought that x32 syscalls always came in > through the syscall path, which doesn't set is_compat_task. (Can > someone rename is_compat_task to in_compat_syscall? Pretty please?) The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both as TS_COMPAT and bit 30 of orig_ax. I think what is really needed here is IS_IA32 should use is_ia32_task() instead, and *that* is the context we can mask off the x32 bit in at all. However, does audit not need that information? (And why the frakk does audit receive the first four syscall arguments? Audit seems like the worst turd ever...) > 2. Now audit can't tell whether a syscall is x32 or i386. And audit is > inconsistent with seccomp. This seems wrong. This is completely and totally bogus, indeed. > 3. The OOPS you're fixing doesn't seem like it's fixed. What if some > other random high bits are set? There is a range check in entry_*.S for the system call. -hpa ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:01 ` H. Peter Anvin @ 2014-05-28 21:15 ` Andy Lutomirski 2014-05-28 21:19 ` H. Peter Anvin 0 siblings, 1 reply; 10+ messages in thread From: Andy Lutomirski @ 2014-05-28 21:15 UTC (permalink / raw) To: H. Peter Anvin Cc: Philipp Kern, linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On Wed, May 28, 2014 at 2:01 PM, H. Peter Anvin <hpa@linux.intel.com> wrote: > On 05/28/2014 01:47 PM, Andy Lutomirski wrote: >> On 05/28/2014 05:19 AM, Philipp Kern wrote: >>> audit_filter_syscall uses the syscall number to reference into a >>> bitmask (e->rule.mask[word]). Not removing the x32 bit before passing >>> the number to this architecture independent codepath will fail to >>> lookup the proper audit bit. Furthermore it will cause an invalid memory >>> access in the kernel if the out of bound location is not mapped: >>> >>> BUG: unable to handle kernel paging request at ffff8800e5446630 >>> IP: [<ffffffff810fcdd0>] audit_filter_syscall+0x90/0xf0 >>> >>> Together with the entrypoint in entry_64.S this change causes x32 >>> programs to pass in both AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 depending >>> on the syscall path. >>> >>> Cc: linux-kernel@vger.kernel.org >>> Cc: H. J. Lu <hjl.tools@gmail.com> >>> Cc: Eric Paris <eparis@redhat.com> >>> Signed-off-by: Philipp Kern <pkern@google.com> >>> --- >>> arch/x86/kernel/ptrace.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c >>> index 678c0ad..166a3c7 100644 >>> --- a/arch/x86/kernel/ptrace.c >>> +++ b/arch/x86/kernel/ptrace.c >>> @@ -1489,7 +1489,7 @@ long syscall_trace_enter(struct pt_regs *regs) >>> >>> if (IS_IA32) >>> audit_syscall_entry(AUDIT_ARCH_I386, >>> - regs->orig_ax, >>> + regs->orig_ax & __SYSCALL_MASK, >> >> This is weird. Three questions: >> >> 1. How can this happen? I thought that x32 syscalls always came in >> through the syscall path, which doesn't set is_compat_task. (Can >> someone rename is_compat_task to in_compat_syscall? Pretty please?) > > The SYSCALL path doesn't set TS_COMPAT, but is_compat_task() looks both > as TS_COMPAT and bit 30 of orig_ax. > > I think what is really needed here is IS_IA32 should use is_ia32_task() > instead, and *that* is the context we can mask off the x32 bit in at > all. However, does audit not need that information? This is thoroughly fscked up. What *should* have happened is that there should have been an AUDIT_ARCH_X32. Unfortunately no one caught that in time, so the current enshrined ABI is that, as far as seccomp is concerned, x32 is AUDIT_ARCH_X86_64 (see syscall_get_arch) but nr has the x32 bit set. But the audit code is different: x32 is AUDIT_ARCH_I386 and the x32 bit is set. This may be changeable, since fixes to the audit ABI may not break anything. Can we just use syscall_get_arch to determine the token to pass to the audit code? If it makes everyone feel better, I think that every single architecture has screwed this up. We actually had to prevent seccomp and audit from being configured in on any OABI-supporting ARM kernel, and MIPS almost did the same thing that x32 did. We caught that one on time, and it's now fixed in Linus' tree. > > (And why the frakk does audit receive the first four syscall arguments? > Audit seems like the worst turd ever...) If you ever benchmark the performance impact of merely running auditd, you might have to upgrade that insult :-/ > >> 2. Now audit can't tell whether a syscall is x32 or i386. And audit is >> inconsistent with seccomp. This seems wrong. > > This is completely and totally bogus, indeed. I'd suggest just fixing the bug in auditsc.c. > >> 3. The OOPS you're fixing doesn't seem like it's fixed. What if some >> other random high bits are set? > > There is a range check in entry_*.S for the system call. I can imagine that causing a certain amount of confusion to fancy seccomp users. Oh, well. No one that I know of has complained yet. --Andy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:15 ` Andy Lutomirski @ 2014-05-28 21:19 ` H. Peter Anvin 2014-05-28 21:43 ` Andy Lutomirski 0 siblings, 1 reply; 10+ messages in thread From: H. Peter Anvin @ 2014-05-28 21:19 UTC (permalink / raw) To: Andy Lutomirski Cc: Philipp Kern, linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On 05/28/2014 02:15 PM, Andy Lutomirski wrote: >> >>> 3. The OOPS you're fixing doesn't seem like it's fixed. What if some >>> other random high bits are set? >> >> There is a range check in entry_*.S for the system call. > > I can imagine that causing a certain amount of confusion to fancy > seccomp users. Oh, well. No one that I know of has complained yet. > I don't know how seccomp or audit deal with out-of-range syscall numbers, so that might be worth taking a look at. -hpa ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:19 ` H. Peter Anvin @ 2014-05-28 21:43 ` Andy Lutomirski 2014-05-28 21:53 ` Philipp Kern 0 siblings, 1 reply; 10+ messages in thread From: Andy Lutomirski @ 2014-05-28 21:43 UTC (permalink / raw) To: H. Peter Anvin Cc: Philipp Kern, linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On Wed, May 28, 2014 at 2:19 PM, H. Peter Anvin <hpa@linux.intel.com> wrote: > On 05/28/2014 02:15 PM, Andy Lutomirski wrote: >>> >>>> 3. The OOPS you're fixing doesn't seem like it's fixed. What if some >>>> other random high bits are set? >>> >>> There is a range check in entry_*.S for the system call. >> >> I can imagine that causing a certain amount of confusion to fancy >> seccomp users. Oh, well. No one that I know of has complained yet. >> > > I don't know how seccomp or audit deal with out-of-range syscall > numbers, so that might be worth taking a look at. audit oopses, apparently. seccomp will tell BPF about it and follow directions, barring bugs. However: are you sure that entry_64.S handles this? It looks like tracesys has higher priority than badsys. And strace can certainly see out-of-range syscalls. And I can oops audit by doing: auditctl -a exit,always -S open ./a.out where a.out is this: #include <stdio.h> #include <sys/syscall.h> int main() { long i; for (i = 1000; ; i += 64 * 4096) syscall(i, 0, 0, 0, 0, 0, 0); return 0; } I would *love* to deprecate the entire syscall auditing mechanism. Or at least I'd love to have distros stop using it. I'll go ask for a CVE number. Sigh. --Andy ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:43 ` Andy Lutomirski @ 2014-05-28 21:53 ` Philipp Kern 2014-05-28 21:55 ` Andy Lutomirski 2014-05-28 22:37 ` H. Peter Anvin 0 siblings, 2 replies; 10+ messages in thread From: Philipp Kern @ 2014-05-28 21:53 UTC (permalink / raw) To: Andy Lutomirski Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski <luto@amacapital.net> wrote: > However: are you sure that entry_64.S handles this? It looks like > tracesys has higher priority than badsys. And strace can certainly > see out-of-range syscalls. […] Not only can it see them: It must see that this bit is set as that's the only identifier it has to deduce that the binary is running in x32 mode. Out of range syscall numbers certainly do not work for auditing right now, hence my attempt to patch around it. Kind regards and thanks Philipp Kern ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:53 ` Philipp Kern @ 2014-05-28 21:55 ` Andy Lutomirski 2014-05-28 22:37 ` H. Peter Anvin 1 sibling, 0 replies; 10+ messages in thread From: Andy Lutomirski @ 2014-05-28 21:55 UTC (permalink / raw) To: Philipp Kern Cc: H. Peter Anvin, linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On Wed, May 28, 2014 at 2:53 PM, Philipp Kern <pkern@google.com> wrote: > On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> However: are you sure that entry_64.S handles this? It looks like >> tracesys has higher priority than badsys. And strace can certainly >> see out-of-range syscalls. […] > > Not only can it see them: It must see that this bit is set as that's > the only identifier it has to deduce that the binary is running in x32 > mode. > > Out of range syscall numbers certainly do not work for auditing right > now, hence my attempt to patch around it. There appears to be a completely arbitrary limit of 32*64 syscalls. There's also an arbitrary limit of 4 arguments. Both are wrong. I have no intention of fixing either. I'll fix the OOPS, though. > > Kind regards and thanks > Philipp Kern -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath 2014-05-28 21:53 ` Philipp Kern 2014-05-28 21:55 ` Andy Lutomirski @ 2014-05-28 22:37 ` H. Peter Anvin 1 sibling, 0 replies; 10+ messages in thread From: H. Peter Anvin @ 2014-05-28 22:37 UTC (permalink / raw) To: Philipp Kern, Andy Lutomirski Cc: linux-kernel@vger.kernel.org, H. J. Lu, Eric Paris On 05/28/2014 02:53 PM, Philipp Kern wrote: > On Wed, May 28, 2014 at 11:43 PM, Andy Lutomirski <luto@amacapital.net> wrote: >> However: are you sure that entry_64.S handles this? It looks like >> tracesys has higher priority than badsys. And strace can certainly >> see out-of-range syscalls. […] > > Not only can it see them: It must see that this bit is set as that's > the only identifier it has to deduce that the binary is running in x32 > mode. Yes... keep in mind the ABI is a local property: just because the binary is running in x32 mode doesn't mean it can't access x86-64 or i386 system calls (or similar for x86-64 processes.) A process started in i386 mode can transition to long mode and execute x86-64 or x32 system calls, too. -hpa ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-05-28 22:37 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-28 12:19 [PATCH] x32: Mask away the x32 syscall bit in the ptrace codepath Philipp Kern 2014-05-28 20:47 ` Andy Lutomirski 2014-05-28 20:58 ` Philipp Kern 2014-05-28 21:01 ` H. Peter Anvin 2014-05-28 21:15 ` Andy Lutomirski 2014-05-28 21:19 ` H. Peter Anvin 2014-05-28 21:43 ` Andy Lutomirski 2014-05-28 21:53 ` Philipp Kern 2014-05-28 21:55 ` Andy Lutomirski 2014-05-28 22:37 ` H. Peter Anvin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox