* System call number masking
@ 2016-04-14 17:22 Ben Hutchings
  2016-04-14 17:48 ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Hutchings @ 2016-04-14 17:22 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, LKML

I'm updating my x32-as-boot-time-option patch for 4.6, and I noticed a
subtle change in system call number masking on x86_64 as a result of
moving the slow path into C.

Previously we would mask out the upper 32 bits before doing anything
with the system call number, both on the slow and fast paths, if and
only if x32 was enabled.

Now we always mask out the upper 32 bits on the slow path, so it's not
quite consistent with the fast path if x32 is disabled.  A system call
that would be rejected by the fast path can succeed on the slow path.
I don't know whether this causes any problems, but it seems
undesirable.

But it's also undesirable that the behaviour of system call numbers not
assigned to x32 also varies depending on whether x32 is enabled.
Should we always mask out the upper 32 bits on the fast path?

Ben.

-- 
Ben Hutchings
In a hierarchy, every employee tends to rise to his level of incompetence.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: System call number masking
  2016-04-14 17:22 System call number masking Ben Hutchings
@ 2016-04-14 17:48 ` Andy Lutomirski
  2016-04-18  0:45   ` Ben Hutchings
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-04-14 17:48 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Andy Lutomirski, X86 ML, LKML

On Thu, Apr 14, 2016 at 10:22 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> I'm updating my x32-as-boot-time-option patch for 4.6, and I noticed a
> subtle change in system call number masking on x86_64 as a result of
> moving the slow path into C.
>
> Previously we would mask out the upper 32 bits before doing anything
> with the system call number, both on the slow and fast paths, if and
> only if x32 was enabled.

I always thought that the old behavior was nonsensical.  The behavior
should be the same regardless of config options.

>
> Now we always mask out the upper 32 bits on the slow path, so it's not
> quite consistent with the fast path if x32 is disabled.  A system call
> that would be rejected by the fast path can succeed on the slow path.
> I don't know whether this causes any problems, but it seems
> undesirable.
>
> But it's also undesirable that the behaviour of system call numbers not
> assigned to x32 also varies depending on whether x32 is enabled.
> Should we always mask out the upper 32 bits on the fast path?
>

I think we should.  Alternatively, the "ja 1f" that takes us back out
if the syscall nr is invalid could be changed to jump to the slow path,
thus avoiding an instruction in the fast path and keeping the code a
bit cleaner.

--Andy
* Re: System call number masking
  2016-04-14 17:48 ` Andy Lutomirski
@ 2016-04-18  0:45   ` Ben Hutchings
  2016-04-18  0:47     ` [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path Ben Hutchings
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Hutchings @ 2016-04-18 0:45 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Andy Lutomirski, X86 ML, LKML

On Thu, 2016-04-14 at 10:48 -0700, Andy Lutomirski wrote:
> On Thu, Apr 14, 2016 at 10:22 AM, Ben Hutchings <ben@decadent.org.uk> wrote:
> >
> > I'm updating my x32-as-boot-time-option patch for 4.6, and I noticed a
> > subtle change in system call number masking on x86_64 as a result of
> > moving the slow path into C.
> >
> > Previously we would mask out the upper 32 bits before doing anything
> > with the system call number, both on the slow and fast paths, if and
> > only if x32 was enabled.
>
> I always thought that the old behavior was nonsensical.  The behavior
> should be the same regardless of config options.
[...]

Oops, my C is failing me - ints are sign-extended, not zero-extended,
when promoted to unsigned long.  So the slow path actually does test
the upper 32 bits, and the odd one out is the x32 fast path.

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order; it's less confusing that way.
* [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  0:45 ` Ben Hutchings
@ 2016-04-18  0:47   ` Ben Hutchings
  2016-04-18  4:50     ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Hutchings @ 2016-04-18 0:47 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, LKML

We've always masked off the top 32 bits when x32 is enabled, but
hopefully no-one relies on that.  Now that the slow path is in C, we
check all the bits there, regardless of whether x32 is enabled.  Let's
make the fast path consistent with it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/entry/entry_64.S | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 858b555e274b..17ba2ca9b24d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -190,12 +190,10 @@ entry_SYSCALL_64_fastpath:
 	 */
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-	cmpq	$__NR_syscall_max, %rax
-#else
-	andl	$__SYSCALL_MASK, %eax
-	cmpl	$__NR_syscall_max, %eax
+#if __SYSCALL_MASK != ~0
+	andq	$__SYSCALL_MASK, %rax
 #endif
+	cmpq	$__NR_syscall_max, %rax
 	ja	1f			/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  0:47 ` [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path Ben Hutchings
@ 2016-04-18  4:50   ` H. Peter Anvin
  2016-04-18  5:18     ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 4:50 UTC (permalink / raw)
  To: Ben Hutchings, Andy Lutomirski; +Cc: X86 ML, LKML

On 04/17/16 17:47, Ben Hutchings wrote:
> We've always masked off the top 32 bits when x32 is enabled, but
> hopefully no-one relies on that.  Now that the slow path is in C, we
> check all the bits there, regardless of whether x32 is enabled.  Let's
> make the fast path consistent with it.

We have always masked off the top 32 bits *period*.

We have had some bugs where we haven't, because someone has tried to
"optimize" the code, and they have been quite serious.  The system call
number is an int, which means the upper 32 bits are undefined on call
entry: we HAVE to mask them.

	-hpa
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  4:50 ` H. Peter Anvin
@ 2016-04-18  5:18   ` Andy Lutomirski
  2016-04-18  5:21     ` H. Peter Anvin
  2016-04-18  5:24     ` H. Peter Anvin
  0 siblings, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2016-04-18 5:18 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On Sun, Apr 17, 2016 at 9:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/17/16 17:47, Ben Hutchings wrote:
>> We've always masked off the top 32 bits when x32 is enabled, but
>> hopefully no-one relies on that.  Now that the slow path is in C, we
>> check all the bits there, regardless of whether x32 is enabled.  Let's
>> make the fast path consistent with it.
>
> We have always masked off the top 32 bits *period*.
>
> We have had some bugs where we haven't, because someone has tried to
> "optimize" the code and they have been quite serious.  The system call
> number is an int, which means the upper 32 bits are undefined on call
> entry: we HAVE to mask them.

I'm reasonably confident that normal kernels (non-x32) have not masked
those bits since before I started hacking on the entry code.

So the type of the syscall nr is a bit confused.  If there were an
installed base of programs that left garbage in the high bits, we
would have noticed *years* ago.  On the other hand, the 32-bit ptrace
ABI and the seccomp ABI both think it's 32 bits.

If we were designing the x86_64 ABI and everything around it from
scratch, I'd suggest that either the high bits must be zero or that
the number actually be 64 bits (which are more or less the same
thing).  That would let us use the high bits for something interesting
in the future.

In practice, we can probably still declare that the thing is a 64-bit
number, given that most kernels in the wild currently fail syscalls
that have the high bits set.

--Andy
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:18 ` Andy Lutomirski
@ 2016-04-18  5:21   ` H. Peter Anvin
  2016-04-18  5:39     ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 5:21 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On 04/17/16 22:18, Andy Lutomirski wrote:
> On Sun, Apr 17, 2016 at 9:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 04/17/16 17:47, Ben Hutchings wrote:
>>> We've always masked off the top 32 bits when x32 is enabled, but
>>> hopefully no-one relies on that.  Now that the slow path is in C, we
>>> check all the bits there, regardless of whether x32 is enabled.  Let's
>>> make the fast path consistent with it.
>>
>> We have always masked off the top 32 bits *period*.
>>
>> We have had some bugs where we haven't, because someone has tried to
>> "optimize" the code and they have been quite serious.  The system call
>> number is an int, which means the upper 32 bits are undefined on call
>> entry: we HAVE to mask them.
>
> I'm reasonably confident that normal kernels (non-x32) have not masked
> those bits since before I started hacking on the entry code.
>

I'm reasonably confident they have, because we have had security bugs
TWICE when someone has tried to "optimize" the code.  The masking was
generally done with a movl instruction, which confused people.

> So the type of the syscall nr is a bit confused.  If there were an
> installed base of programs that left garbage in the high bits, we
> would have noticed *years* ago.  On the other hand, the 32-bit ptrace
> ABI and the seccomp ABI both think it's 32 bits.

Incorrect.  We have seen these failures in real life.

> If we were designing the x86_64 ABI and everything around it from
> scratch, I'd suggest that either the high bits must be zero or that
> the number actually be 64 bits (which are more or less the same
> thing).  That would let us use the high bits for something interesting
> in the future.

Not really all that useful.  What we have is a C ABI.

> In practice, we can probably still declare that the thing is a 64-bit
> number, given that most kernels in the wild currently fail syscalls
> that have the high bits set.

They don't, and we can prove it...

	-hpa
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:21 ` H. Peter Anvin
@ 2016-04-18  5:39   ` Andy Lutomirski
  2016-04-18  5:45     ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-04-18 5:39 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On Sun, Apr 17, 2016 at 10:21 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/17/16 22:18, Andy Lutomirski wrote:
>> On Sun, Apr 17, 2016 at 9:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 04/17/16 17:47, Ben Hutchings wrote:
>>>> We've always masked off the top 32 bits when x32 is enabled, but
>>>> hopefully no-one relies on that.  Now that the slow path is in C, we
>>>> check all the bits there, regardless of whether x32 is enabled.  Let's
>>>> make the fast path consistent with it.
>>>
>>> We have always masked off the top 32 bits *period*.
>>>
>>> We have had some bugs where we haven't, because someone has tried to
>>> "optimize" the code and they have been quite serious.  The system call
>>> number is an int, which means the upper 32 bits are undefined on call
>>> entry: we HAVE to mask them.
>>
>> I'm reasonably confident that normal kernels (non-x32) have not masked
>> those bits since before I started hacking on the entry code.
>>
>
> I'm reasonably confident they have, because we have had security bugs
> TWICE when someone has tried to "optimize" the code.  The masking was
> generally done with a movl instruction, which confused people.
>
>> So the type of the syscall nr is a bit confused.  If there were an
>> installed base of programs that left garbage in the high bits, we
>> would have noticed *years* ago.  On the other hand, the 32-bit ptrace
>> ABI and the seccomp ABI both think it's 32 bits.
>
> Incorrect.  We have seen these failures in real life.

What kind of failure?  Programs that accidentally set rax to
0xbaadf00d00000003 get -ENOSYS in most cases, not close().  If we'd
broken programs like this, I assume we would have had to fix it a long
time ago.

>
>> If we were designing the x86_64 ABI and everything around it from
>> scratch, I'd suggest that either the high bits must be zero or that
>> the number actually be 64 bits (which are more or less the same
>> thing).  That would let us use the high bits for something interesting
>> in the future.
>
> Not really all that useful.  What we have is a C ABI.

And we've already stolen a bit once for x32.  Maybe we'll want more.
For example, if we added a cancellable bit, if x86_32 didn't want it,
we could steal a high bit for it.

>
>> In practice, we can probably still declare that the thing is a 64-bit
>> number, given that most kernels in the wild currently fail syscalls
>> that have the high bits set.
>
> They don't, and we can prove it...

I'm confused.

asm volatile ("syscall" :
              "=a" (ret) :
              "a" (SYS_getpid | 0xbaadf00d00000000ULL) :
              "memory", "cc", "rcx", "r11");

gets -ENOSYS on the kernel I'm running on my laptop and on Fedora 23's
stock kernel.

I'm not terribly worried about nasty security issues in here because
all the nasty stuff is in C now.

What kernel had the other behavior?  In 2.6.11, I see:

ENTRY(system_call)
	CFI_STARTPROC
	swapgs
	movq	%rsp,%gs:pda_oldrsp
	movq	%gs:pda_kernelstack,%rsp
	sti
	SAVE_ARGS 8,1
	movq	%rax,ORIG_RAX-ARGOFFSET(%rsp)
	movq	%rcx,RIP-ARGOFFSET(%rsp)
	GET_THREAD_INFO(%rcx)
	testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),threadinfo_flags(%rcx)
	jnz tracesys
	cmpq $__NR_syscall_max,%rax

--Andy
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:39 ` Andy Lutomirski
@ 2016-04-18  5:45   ` H. Peter Anvin
  2016-04-18  5:48     ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 5:45 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On 04/17/16 22:39, Andy Lutomirski wrote:
>>
>> I'm reasonably confident they have, because we have had security bugs
>> TWICE when someone has tried to "optimize" the code.  The masking was
>> generally done with a movl instruction, which confused people.
>>
>>> So the type of the syscall nr is a bit confused.  If there were an
>>> installed base of programs that left garbage in the high bits, we
>>> would have noticed *years* ago.  On the other hand, the 32-bit ptrace
>>> ABI and the seccomp ABI both think it's 32 bits.
>>
>> Incorrect.  We have seen these failures in real life.
>
> What kind of failure?  Programs that accidentally set rax to
> 0xbaadf00d00000003 get -ENOSYS in most cases, not close().  If we'd
> broken programs like this, I assume we would have had to fix it a long
> time ago.
>
>>> If we were designing the x86_64 ABI and everything around it from
>>> scratch, I'd suggest that either the high bits must be zero or that
>>> the number actually be 64 bits (which are more or less the same
>>> thing).  That would let us use the high bits for something interesting
>>> in the future.
>>
>> Not really all that useful.  What we have is a C ABI.
>
> And we've already stolen a bit once for x32.  Maybe we'll want more.
> For example, if we added a cancellable bit, if x86_32 didn't want it,
> we could steal a high bit for it.
>

I think we're worrying about the wrong thing here... we skipped bit 31
to avoid signedness issues, and with bit 30 for x32 we now "only" have
20 bits that haven't been used for anything at all.

>>
>>> In practice, we can probably still declare that the thing is a 64-bit
>>> number, given that most kernels in the wild currently fail syscalls
>>> that have the high bits set.
>>
>> They don't, and we can prove it...
>
> I'm confused.
>
> asm volatile ("syscall" :
>               "=a" (ret) :
>               "a" (SYS_getpid | 0xbaadf00d00000000ULL) :
>               "memory", "cc", "rcx", "r11");
>
> gets -ENOSYS on the kernel I'm running on my laptop and on Fedora 23's
> stock kernel.
>
> I'm not terribly worried about nasty security issues in here because
> all the nasty stuff is in C now.
>
> What kernel had the other behavior?  In 2.6.11, I see:
>
> ENTRY(system_call)
> 	CFI_STARTPROC
> 	swapgs
> 	movq	%rsp,%gs:pda_oldrsp
> 	movq	%gs:pda_kernelstack,%rsp
> 	sti
> 	SAVE_ARGS 8,1
> 	movq	%rax,ORIG_RAX-ARGOFFSET(%rsp)
> 	movq	%rcx,RIP-ARGOFFSET(%rsp)
> 	GET_THREAD_INFO(%rcx)
> 	testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),threadinfo_flags(%rcx)
> 	jnz tracesys
> 	cmpq $__NR_syscall_max,%rax
>

I can't remember what versions.  What I do know is that this was a bug
which was introduced, fixed, re-introduced, and fixed again, and both
resulted in CVEs.  The fact that you're seeing the cmpq indicates that
it at least was not one of the security-buggy kernels.

I do agree we should make the behavior consistent, and follow the
documented behavior of treating the syscall argument as an int.

	-hpa
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:45 ` H. Peter Anvin
@ 2016-04-18  5:48   ` Andy Lutomirski
  2016-04-18  6:01     ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-04-18 5:48 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On Sun, Apr 17, 2016 at 10:45 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/17/16 22:39, Andy Lutomirski wrote:
>>>
>>> I'm reasonably confident they have, because we have had security bugs
>>> TWICE when someone has tried to "optimize" the code.  The masking was
>>> generally done with a movl instruction, which confused people.
>>>
>>>> So the type of the syscall nr is a bit confused.  If there were an
>>>> installed base of programs that left garbage in the high bits, we
>>>> would have noticed *years* ago.  On the other hand, the 32-bit ptrace
>>>> ABI and the seccomp ABI both think it's 32 bits.
>>>
>>> Incorrect.  We have seen these failures in real life.
>>
>> What kind of failure?  Programs that accidentally set rax to
>> 0xbaadf00d00000003 get -ENOSYS in most cases, not close().  If we'd
>> broken programs like this, I assume we would have had to fix it a long
>> time ago.
>>
>>>> If we were designing the x86_64 ABI and everything around it from
>>>> scratch, I'd suggest that either the high bits must be zero or that
>>>> the number actually be 64 bits (which are more or less the same
>>>> thing).  That would let us use the high bits for something interesting
>>>> in the future.
>>>
>>> Not really all that useful.  What we have is a C ABI.
>>
>> And we've already stolen a bit once for x32.  Maybe we'll want more.
>> For example, if we added a cancellable bit, if x86_32 didn't want it,
>> we could steal a high bit for it.
>>
>
> I think we're worrying about the wrong thing here... we skipped bit 31
> to avoid signedness issues, and with bit 30 for x32 we now "only" have
> 20 bits that haven't been used for anything at all.
>
>>>
>>>> In practice, we can probably still declare that the thing is a 64-bit
>>>> number, given that most kernels in the wild currently fail syscalls
>>>> that have the high bits set.
>>>
>>> They don't, and we can prove it...
>>
>> I'm confused.
>>
>> asm volatile ("syscall" :
>>               "=a" (ret) :
>>               "a" (SYS_getpid | 0xbaadf00d00000000ULL) :
>>               "memory", "cc", "rcx", "r11");
>>
>> gets -ENOSYS on the kernel I'm running on my laptop and on Fedora 23's
>> stock kernel.
>>
>> I'm not terribly worried about nasty security issues in here because
>> all the nasty stuff is in C now.
>>
>> What kernel had the other behavior?  In 2.6.11, I see:
>>
>> ENTRY(system_call)
>> 	CFI_STARTPROC
>> 	swapgs
>> 	movq	%rsp,%gs:pda_oldrsp
>> 	movq	%gs:pda_kernelstack,%rsp
>> 	sti
>> 	SAVE_ARGS 8,1
>> 	movq	%rax,ORIG_RAX-ARGOFFSET(%rsp)
>> 	movq	%rcx,RIP-ARGOFFSET(%rsp)
>> 	GET_THREAD_INFO(%rcx)
>> 	testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),threadinfo_flags(%rcx)
>> 	jnz tracesys
>> 	cmpq $__NR_syscall_max,%rax
>>
>
> I can't remember what versions.  What I do know is that this was a bug
> which was introduced, fixed, re-introduced, and fixed again, and both
> resulted in CVEs.  The fact that you're seeing the cmpq indicates that
> it at least was not one of the security-buggy kernels.
>
> I do agree we should make the behavior consistent, and follow the
> documented behavior of treating the syscall argument as an int.
>

I think I prefer the "reject weird input" behavior over the "accept
and normalize weird input" if we can get away with it, and I'm fairly
confident that we can get away with "reject weird input" given that
distro kernels do exactly that already.

So I like Ben's patch.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:48 ` Andy Lutomirski
@ 2016-04-18  6:01   ` H. Peter Anvin
  2016-04-18  6:14     ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 6:01 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On 04/17/16 22:48, Andy Lutomirski wrote:
>
> I think I prefer the "reject weird input" behavior over the "accept
> and normalize weird input" if we can get away with it, and I'm fairly
> confident that we can get away with "reject weird input" given that
> distro kernels do exactly that already.
>

It's not "weird", it is the ABI as defined.  We have to do this for all
the system call arguments, too; you just don't notice it because the
compiler does it for us.  Some other architectures, e.g. s390, have the
opposite convention where the caller is responsible for normalizing the
result; in that case we have to do it *again* in the kernel, which is
one of the major reasons for the SYSCALL_*() macros.

So I'm not sure this is a valid consideration.  The reason it generally
works is because the natural way for the user space code to work is to
load a value into %eax which will naturally zero-extend to %rax, but it
isn't inherently required to work that way.

	-hpa
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  6:01 ` H. Peter Anvin
@ 2016-04-18  6:14   ` Andy Lutomirski
  2016-04-18  6:19     ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2016-04-18 6:14 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On Sun, Apr 17, 2016 at 11:01 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 04/17/16 22:48, Andy Lutomirski wrote:
>>
>> I think I prefer the "reject weird input" behavior over the "accept
>> and normalize weird input" if we can get away with it, and I'm fairly
>> confident that we can get away with "reject weird input" given that
>> distro kernels do exactly that already.
>>
>
> It's not "weird", it is the ABI as defined.  We have to do this for all
> the system call arguments, too; you just don't notice it because the
> compiler does it for us.  Some other architectures, e.g. s390, have the
> opposite convention where the caller is responsible for normalizing the
> result; in that case we have to do it *again* in the kernel, which is
> one of the major reasons for the SYSCALL_*() macros.

What ABI?  Even the man page says:

       #define _GNU_SOURCE         /* See feature_test_macros(7) */
       #include <unistd.h>
       #include <sys/syscall.h>    /* For SYS_xxx definitions */

       long syscall(long number, ...);

musl's 64-bit syscall wrappers use long.

I can't confidently decipher glibc's wrappers, because they're
approximately as obfuscated as the rest of glibc, but the code that I
think matters looks like:

# define DO_CALL(syscall_name, args)		\
    DOARGS_##args				\
    movl $SYS_ify (syscall_name), %eax;		\
    syscall;

which doesn't correspond to any particular C type but leaves the high
bits clear.

For all I know, some day we'll want to use the syscall instruction for
something that isn't a normal syscall, and having high bits available
for that could be handy.

Also, the behavior in which we fail the syscall if any high bits are
set is faster -- it's one fewer instruction.  Admittedly, the CPU can
probably do that instruction for free, but still...

--Andy
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  6:14 ` Andy Lutomirski
@ 2016-04-18  6:19   ` H. Peter Anvin
  0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 6:19 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On 04/17/16 23:14, Andy Lutomirski wrote:
>>
>> It's not "weird", it is the ABI as defined.  We have to do this for all
>> the system call arguments, too; you just don't notice it because the
>> compiler does it for us.  Some other architectures, e.g. s390, have the
>> opposite convention where the caller is responsible for normalizing the
>> result; in that case we have to do it *again* in the kernel, which is
>> one of the major reasons for the SYSCALL_*() macros.
>
> What ABI?
>

The C ABI for int.

I hadn't seen the below, because I think syscall(3) is just
braindamaged, but the odds are that if we ever used the upper 32 bits
for anything we'd be in a world of hurt, so that would be highly
theoretical IMO.  Bit 31 might be possible, but I wouldn't really want
to brave it unless we really have no choice.

> Also, the behavior in which we fail the syscall if any high bits are
> set is faster -- it's one fewer instruction.  Admittedly, the CPU can
> probably do that instruction for free, but still...

Yes, it can; at least on any remotely modern hardware.

	-hpa
* Re: [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path
  2016-04-18  5:18 ` Andy Lutomirski
  2016-04-18  5:21   ` H. Peter Anvin
@ 2016-04-18  5:24   ` H. Peter Anvin
  1 sibling, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2016-04-18 5:24 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Ben Hutchings, Andy Lutomirski, X86 ML, LKML

On 04/17/16 22:18, Andy Lutomirski wrote:
> On Sun, Apr 17, 2016 at 9:50 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 04/17/16 17:47, Ben Hutchings wrote:
>>> We've always masked off the top 32 bits when x32 is enabled, but
>>> hopefully no-one relies on that.  Now that the slow path is in C, we
>>> check all the bits there, regardless of whether x32 is enabled.  Let's
>>> make the fast path consistent with it.
>>
>> We have always masked off the top 32 bits *period*.
>>
>> We have had some bugs where we haven't, because someone has tried to
>> "optimize" the code and they have been quite serious.  The system call
>> number is an int, which means the upper 32 bits are undefined on call
>> entry: we HAVE to mask them.
>
> I'm reasonably confident that normal kernels (non-x32) have not masked
> those bits since before I started hacking on the entry code.
>
> So the type of the syscall nr is a bit confused.  If there were an
> installed base of programs that left garbage in the high bits, we
> would have noticed *years* ago.  On the other hand, the 32-bit ptrace
> ABI and the seccomp ABI both think it's 32 bits.
>
> If we were designing the x86_64 ABI and everything around it from
> scratch, I'd suggest that either the high bits must be zero or that
> the number actually be 64 bits (which are more or less the same
> thing).  That would let us use the high bits for something interesting
> in the future.
>
> In practice, we can probably still declare that the thing is a 64-bit
> number, given that most kernels in the wild currently fail syscalls
> that have the high bits set.
>

For the record, I changed the range comparison from cmpl to cmpq so if
someone re-introduced this bug *again* it would be a functionality
problem as opposed to a security hole a mile wide.

	-hpa
end of thread, other threads:[~2016-04-18  6:19 UTC | newest]

Thread overview: 14+ messages
2016-04-14 17:22 System call number masking Ben Hutchings
2016-04-14 17:48 ` Andy Lutomirski
2016-04-18  0:45   ` Ben Hutchings
2016-04-18  0:47     ` [PATCH] x86/entry/x32: Check top 32 bits of syscall number on the fast path Ben Hutchings
2016-04-18  4:50       ` H. Peter Anvin
2016-04-18  5:18         ` Andy Lutomirski
2016-04-18  5:21           ` H. Peter Anvin
2016-04-18  5:39             ` Andy Lutomirski
2016-04-18  5:45               ` H. Peter Anvin
2016-04-18  5:48                 ` Andy Lutomirski
2016-04-18  6:01                   ` H. Peter Anvin
2016-04-18  6:14                     ` Andy Lutomirski
2016-04-18  6:19                       ` H. Peter Anvin
2016-04-18  5:24           ` H. Peter Anvin