* [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Dmitry V. Levin @ 2017-09-12 22:57 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86
Cc: Andy Lutomirski, Oleg Nesterov, Eugene Syromyatnikov,
linux-kernel
Before this change, CONFIG_X86_X32=y fastpath behaviour was different
from slowpath:
$ gcc -xc -Wall -O2 - <<'EOF'
#include <unistd.h>
#include <asm/unistd.h>
int main(void) {
        unsigned long nr = ~0xffffffffUL | __NR_exit;
        return !!syscall(nr, 42, 1, 2, 3, 4, 5);
}
EOF
$ ./a.out; echo \$?=$?
$?=42
$ strace -enone ./a.out
syscall_18446744069414584380(0x2a, 0x1, 0x2, 0x3, 0x4, 0x5) = -1 (errno 38)
+++ exited with 1 +++
This change syncs CONFIG_X86_X32=y fastpath behaviour with the case
when CONFIG_X86_X32 is not enabled.
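
In C terms, the old and new fastpath checks can be modelled roughly like
this (an illustrative sketch only, not kernel code; the __NR_syscall_max
value is made up for the example):

#include <stdbool.h>

#define __X32_SYSCALL_BIT 0x40000000
#define __SYSCALL_MASK    (~(__X32_SYSCALL_BIT))
#define __NR_syscall_max  547  /* illustrative value */

/* old fastpath: andl/cmpl only ever look at the low 32 bits of %rax */
bool old_fastpath_accepts(unsigned long rax)
{
        unsigned int eax = (unsigned int)rax & __SYSCALL_MASK;
        return eax <= __NR_syscall_max;  /* 0xffffffff0000003c passes as 60 */
}

/* new fastpath: andq clears only the x32 bit, cmpq sees all 64 bits */
bool new_fastpath_accepts(unsigned long rax)
{
        rax &= (unsigned long)(long)__SYSCALL_MASK;  /* sign-extended mask */
        return rax <= __NR_syscall_max;  /* 0xffffffff0000003c is rejected */
}

With the number from the example above (0xffffffff0000003c), the old check
reduces it to __NR_exit while the new check rejects it, which is what the
slowpath already does.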
Fixes: fca460f95e92 ("x32: Handle the x32 system call flag")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
---
arch/x86/entry/entry_64.S | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4916725..3bab6af 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
         */
        TRACE_IRQS_ON
        ENABLE_INTERRUPTS(CLBR_NONE)
-#if __SYSCALL_MASK == ~0
-       cmpq    $__NR_syscall_max, %rax
-#else
-       andl    $__SYSCALL_MASK, %eax
-       cmpl    $__NR_syscall_max, %eax
+#if __SYSCALL_MASK != ~0
+       andq    $__SYSCALL_MASK, %rax
 #endif
+       cmpq    $__NR_syscall_max, %rax
        ja      1f      /* return -ENOSYS (already in pt_regs->ax) */
        movq    %r10, %rcx
--
ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Oleg Nesterov @ 2017-09-13 16:49 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Andy Lutomirski, Eugene Syromyatnikov, linux-kernel
On 09/13, Dmitry V. Levin wrote:
>
> Before this change, CONFIG_X86_X32=y fastpath behaviour was different
> from slowpath:
and even with this change, don't they still differ if CONFIG_X86_X32=n? do_syscall_64()
does "nr & __SYSCALL_MASK" unconditionally, and this clears the upper bits, no?
And why is __SYSCALL_MASK not "unsigned long"? IOW, why do we want to silently
ignore the upper bits in $rax?
Or am I totally confused?
Oleg.
> $ gcc -xc -Wall -O2 - <<'EOF'
> #include <unistd.h>
> #include <asm/unistd.h>
> int main(void) {
> unsigned long nr = ~0xffffffffUL | __NR_exit;
> return !!syscall(nr, 42, 1, 2, 3, 4, 5);
> }
> EOF
> $ ./a.out; echo \$?=$?
> $?=42
> $ strace -enone ./a.out
> syscall_18446744069414584380(0x2a, 0x1, 0x2, 0x3, 0x4, 0x5) = -1 (errno 38)
> +++ exited with 1 +++
>
> This change syncs CONFIG_X86_X32=y fastpath behaviour with the case
> when CONFIG_X86_X32 is not enabled.
>
> Fixes: fca460f95e92 ("x32: Handle the x32 system call flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
> ---
> arch/x86/entry/entry_64.S | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4916725..3bab6af 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> */
> TRACE_IRQS_ON
> ENABLE_INTERRUPTS(CLBR_NONE)
> -#if __SYSCALL_MASK == ~0
> - cmpq $__NR_syscall_max, %rax
> -#else
> - andl $__SYSCALL_MASK, %eax
> - cmpl $__NR_syscall_max, %eax
> +#if __SYSCALL_MASK != ~0
> + andq $__SYSCALL_MASK, %rax
> #endif
> + cmpq $__NR_syscall_max, %rax
> ja 1f /* return -ENOSYS (already in pt_regs->ax) */
> movq %r10, %rcx
>
> --
> ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Eugene Syromyatnikov @ 2017-09-14 19:40 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Andy Lutomirski, lkml, Dmitry V. Levin
On Wed, Sep 13, 2017 at 4:49 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> IOW, why do we want to silently ignore the upper bits in $rax ?
By the way, they are ignored elsewhere, in audit[1] or seccomp[2], for example.
[1] include/linux/audit.h
[2] include/uapi/linux/seccomp.h, definition of struct seccomp_data
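
For reference, struct seccomp_data carries the syscall number as a plain
int, so the upper 32 bits never reach seccomp filters anyway (quoted from
memory, field comments mine):

struct seccomp_data {
        int nr;                         /* syscall number (truncated to 32 bits) */
        __u32 arch;                     /* AUDIT_ARCH_* value */
        __u64 instruction_pointer;      /* CPU instruction pointer */
        __u64 args[6];                  /* syscall arguments, always 64-bit */
};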
--
Eugene Syromyatnikov
mailto:evgsyr@gmail.com
xmpp:esyr@jabber.{ru|org}
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Dmitry V. Levin @ 2017-09-14 20:21 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Andy Lutomirski, Eugene Syromyatnikov, linux-kernel
On Wed, Sep 13, 2017 at 06:49:21PM +0200, Oleg Nesterov wrote:
> On 09/13, Dmitry V. Levin wrote:
> >
> > Before this change, CONFIG_X86_X32=y fastpath behaviour was different
> > from slowpath:
>
> and even with this change they differ if CONFIG_X86_X32=n?
No, I don't think so.
> do_syscall_64() does "nr & __SYSCALL_MASK" unconditionally,
yes
> this clears the upper bits, no?
Why? As "nr" is of type "unsigned long" and __SYSCALL_MASK is either
(~(__X32_SYSCALL_BIT)) or (~0), that is, an integer with the sign bit set,
in "nr & __SYSCALL_MASK" expression __SYSCALL_MASK is sign-extended
to unsigned long. When __SYSCALL_MASK is defined to (~0),
"nr & __SYSCALL_MASK" is optimized to "nr" at compilation time:
$ echo 'unsigned long foo(unsigned long nr) { return nr & (~0); }' |
gcc -Wall -O2 -xc -S -o - - |
sed -n '/cfi_/,/cfi_/p'
.cfi_startproc
movq %rdi, %rax
ret
.cfi_endproc
> And why __SYSCALL_MASK is not "unsigned long" ? IOW, why do we want to silently
> ignore the upper bits in $rax ?
__SYSCALL_MASK is "int" but it is being sign-extended to unsigned long in all
(two) places of arch/x86/entry/common.c where it is used.
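
The same can be seen for the CONFIG_X86_X32_ABI case with a small userspace
program (just a demonstration of the C conversion rules, using the kernel's
constant; not kernel code):

#include <stdio.h>

#define __X32_SYSCALL_BIT 0x40000000
#define __SYSCALL_MASK    (~(__X32_SYSCALL_BIT))

int main(void)
{
        unsigned long nr = ~0xffffffffUL | 60;  /* __NR_exit on x86_64 */

        /* the int mask is sign-extended, so only bit 30 is cleared */
        printf("mask      = %#lx\n", (unsigned long)(long)__SYSCALL_MASK);
        printf("nr & mask = %#lx\n", nr & __SYSCALL_MASK);
        return 0;
}

This is expected to print 0xffffffffbfffffff and 0xffffffff0000003c, i.e.
the upper 32 bits of "nr" are not cleared by the AND.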
> Or I am totally confused?
The thing looks like it was designed to confuse people.
--
ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Dmitry V. Levin @ 2017-09-14 20:24 UTC (permalink / raw)
To: Eugene Syromyatnikov
Cc: Oleg Nesterov, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Andy Lutomirski, lkml
On Thu, Sep 14, 2017 at 07:40:43PM +0000, Eugene Syromyatnikov wrote:
> On Wed, Sep 13, 2017 at 4:49 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> > IOW, why do we want to silently ignore the upper bits in $rax ?
>
> By the way, they are ignored elsewhere, in audit[1] or seccomp[2], for example.
>
> [1] include/linux/audit.h
> [2] include/uapi/linux/seccomp.h, definition of struct seccomp_data
Yes, unfortunately, they are ignored later, but that is another story.
--
ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Andy Lutomirski @ 2017-09-14 21:05 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
Andy Lutomirski, Oleg Nesterov, Eugene Syromyatnikov,
linux-kernel@vger.kernel.org
On Tue, Sep 12, 2017 at 3:57 PM, Dmitry V. Levin <ldv@altlinux.org> wrote:
> Before this change, CONFIG_X86_X32=y fastpath behaviour was different
> from slowpath:
>
> $ gcc -xc -Wall -O2 - <<'EOF'
> #include <unistd.h>
> #include <asm/unistd.h>
> int main(void) {
> unsigned long nr = ~0xffffffffUL | __NR_exit;
> return !!syscall(nr, 42, 1, 2, 3, 4, 5);
> }
> EOF
> $ ./a.out; echo \$?=$?
> $?=42
> $ strace -enone ./a.out
> syscall_18446744069414584380(0x2a, 0x1, 0x2, 0x3, 0x4, 0x5) = -1 (errno 38)
> +++ exited with 1 +++
>
> This change syncs CONFIG_X86_X32=y fastpath behaviour with the case
> when CONFIG_X86_X32 is not enabled.
Do you see real brokenness here, or is it just weird?
>
> Fixes: fca460f95e92 ("x32: Handle the x32 system call flag")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
> ---
> arch/x86/entry/entry_64.S | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 4916725..3bab6af 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> */
> TRACE_IRQS_ON
> ENABLE_INTERRUPTS(CLBR_NONE)
> -#if __SYSCALL_MASK == ~0
> - cmpq $__NR_syscall_max, %rax
> -#else
> - andl $__SYSCALL_MASK, %eax
> - cmpl $__NR_syscall_max, %eax
> +#if __SYSCALL_MASK != ~0
> + andq $__SYSCALL_MASK, %rax
> #endif
> + cmpq $__NR_syscall_max, %rax
I don't know much about x32 userspace, but there's an argument that
the high bits *should* be masked off if the x32 bit is set. Of
course, that's slower, but it could be done without performance loss,
I think.
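In C terms, the semantic I have in mind is roughly this (just a sketch of
the idea, helper name made up, not a proposal for the actual asm):

#define __X32_SYSCALL_BIT 0x40000000

unsigned long normalize_syscall_nr(unsigned long nr)
{
        /* mask off the high 32 bits only when the x32 bit is set */
        if (nr & __X32_SYSCALL_BIT)
                nr &= 0xffffffffUL;  /* bit 30, the x32 bit, survives */
        return nr;
}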
--Andy
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Dmitry V. Levin @ 2017-09-14 21:33 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
Oleg Nesterov, Eugene Syromyatnikov, linux-kernel@vger.kernel.org
On Thu, Sep 14, 2017 at 02:05:04PM -0700, Andy Lutomirski wrote:
> On Tue, Sep 12, 2017 at 3:57 PM, Dmitry V. Levin wrote:
> > Before this change, CONFIG_X86_X32=y fastpath behaviour was different
> > from slowpath:
> >
> > $ gcc -xc -Wall -O2 - <<'EOF'
> > #include <unistd.h>
> > #include <asm/unistd.h>
> > int main(void) {
> > unsigned long nr = ~0xffffffffUL | __NR_exit;
> > return !!syscall(nr, 42, 1, 2, 3, 4, 5);
> > }
> > EOF
> > $ ./a.out; echo \$?=$?
> > $?=42
> > $ strace -enone ./a.out
> > syscall_18446744069414584380(0x2a, 0x1, 0x2, 0x3, 0x4, 0x5) = -1 (errno 38)
> > +++ exited with 1 +++
> >
> > This change syncs CONFIG_X86_X32=y fastpath behaviour with the case
> > when CONFIG_X86_X32 is not enabled.
>
> Do you see real brokenness here, or is it just weird?
It's definitely broken. A syscall should be either valid or invalid
regardless of implementation peculiarities like fastpath vs slowpath.
> > Fixes: fca460f95e92 ("x32: Handle the x32 system call flag")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
> > ---
> > arch/x86/entry/entry_64.S | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > index 4916725..3bab6af 100644
> > --- a/arch/x86/entry/entry_64.S
> > +++ b/arch/x86/entry/entry_64.S
> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> > */
> > TRACE_IRQS_ON
> > ENABLE_INTERRUPTS(CLBR_NONE)
> > -#if __SYSCALL_MASK == ~0
> > - cmpq $__NR_syscall_max, %rax
> > -#else
> > - andl $__SYSCALL_MASK, %eax
> > - cmpl $__NR_syscall_max, %eax
> > +#if __SYSCALL_MASK != ~0
> > + andq $__SYSCALL_MASK, %rax
> > #endif
> > + cmpq $__NR_syscall_max, %rax
>
> I don't know much about x32 userspace, but there's an argument that
> the high bits *should* be masked off if the x32 bit is set.
Why?
--
ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Andy Lutomirski @ 2017-09-14 21:38 UTC (permalink / raw)
To: Andy Lutomirski, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
X86 ML, Oleg Nesterov, Eugene Syromyatnikov,
linux-kernel@vger.kernel.org
On Thu, Sep 14, 2017 at 2:33 PM, Dmitry V. Levin <ldv@altlinux.org> wrote:
> On Thu, Sep 14, 2017 at 02:05:04PM -0700, Andy Lutomirski wrote:
>> On Tue, Sep 12, 2017 at 3:57 PM, Dmitry V. Levin wrote:
>> > Before this change, CONFIG_X86_X32=y fastpath behaviour was different
>> > from slowpath:
>> >
>> > $ gcc -xc -Wall -O2 - <<'EOF'
>> > #include <unistd.h>
>> > #include <asm/unistd.h>
>> > int main(void) {
>> > unsigned long nr = ~0xffffffffUL | __NR_exit;
>> > return !!syscall(nr, 42, 1, 2, 3, 4, 5);
>> > }
>> > EOF
>> > $ ./a.out; echo \$?=$?
>> > $?=42
>> > $ strace -enone ./a.out
>> > syscall_18446744069414584380(0x2a, 0x1, 0x2, 0x3, 0x4, 0x5) = -1 (errno 38)
>> > +++ exited with 1 +++
>> >
>> > This change syncs CONFIG_X86_X32=y fastpath behaviour with the case
>> > when CONFIG_X86_X32 is not enabled.
>>
>> Do you see real brokenness here, or is it just weird?
>
> It's definitely broken. A syscall should be either valid or invalid
> regardless of implementation peculiarities like fastpath vs slowpath.
>
>> > Fixes: fca460f95e92 ("x32: Handle the x32 system call flag")
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
>> > ---
>> > arch/x86/entry/entry_64.S | 8 +++-----
>> > 1 file changed, 3 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> > index 4916725..3bab6af 100644
>> > --- a/arch/x86/entry/entry_64.S
>> > +++ b/arch/x86/entry/entry_64.S
>> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
>> > */
>> > TRACE_IRQS_ON
>> > ENABLE_INTERRUPTS(CLBR_NONE)
>> > -#if __SYSCALL_MASK == ~0
>> > - cmpq $__NR_syscall_max, %rax
>> > -#else
>> > - andl $__SYSCALL_MASK, %eax
>> > - cmpl $__NR_syscall_max, %eax
>> > +#if __SYSCALL_MASK != ~0
>> > + andq $__SYSCALL_MASK, %rax
>> > #endif
>> > + cmpq $__NR_syscall_max, %rax
>>
>> I don't know much about x32 userspace, but there's an argument that
>> the high bits *should* be masked off if the x32 bit is set.
>
> Why?
Because it always worked that way.
That being said, I'd be okay with applying your patch and seeing
whether anything breaks. Ingo?
>
>
> --
> ldv
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Ingo Molnar @ 2017-09-15 5:31 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, X86 ML,
Oleg Nesterov, Eugene Syromyatnikov, linux-kernel@vger.kernel.org,
Linus Torvalds, Peter Zijlstra, Andrew Morton
* Andy Lutomirski <luto@kernel.org> wrote:
> >> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> >> > index 4916725..3bab6af 100644
> >> > --- a/arch/x86/entry/entry_64.S
> >> > +++ b/arch/x86/entry/entry_64.S
> >> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> >> > */
> >> > TRACE_IRQS_ON
> >> > ENABLE_INTERRUPTS(CLBR_NONE)
> >> > -#if __SYSCALL_MASK == ~0
> >> > - cmpq $__NR_syscall_max, %rax
> >> > -#else
> >> > - andl $__SYSCALL_MASK, %eax
> >> > - cmpl $__NR_syscall_max, %eax
> >> > +#if __SYSCALL_MASK != ~0
> >> > + andq $__SYSCALL_MASK, %rax
> >> > #endif
> >> > + cmpq $__NR_syscall_max, %rax
> >>
> >> I don't know much about x32 userspace, but there's an argument that
> >> the high bits *should* be masked off if the x32 bit is set.
> >
> > Why?
>
> Because it always worked that way.
>
> That being said, I'd be okay with applying your patch and seeing
> whether anything breaks. Ingo?
So I believe this was introduced with x32 as a 'fresh, modern syscall ABI'
behavioral aspect, because we wanted to protect the overly complex syscall entry
code from 'weird' input values. IIRC there was an old bug where we'd overflow the
syscall table in certain circumstances ...
But our new, redesigned entry code is a lot less complex, a lot more readable and
a lot more maintainable (not to mention a lot more robust), so if invalid RAX
values with high bits set get reliably turned into -ENOSYS or such then I'd not
mind the patch per se either, as a general consistency improvement.
Of course if something in x32 user-land breaks then this turns into an ABI and we
have to reintroduce this aspect, as a quirk :-/
It should also improve x32 syscall performance a tiny bit, right? So might be
worth a try on various grounds.
( Another future advantage would be that _maybe_ we could use the high RAX
component as an extra (64-bit only) special argument of sorts. Not that I can
think of any such use right now. )
Thanks,
Ingo
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: hpa @ 2017-09-15 5:46 UTC (permalink / raw)
To: Ingo Molnar, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, X86 ML, Oleg Nesterov,
Eugene Syromyatnikov, linux-kernel@vger.kernel.org,
Linus Torvalds, Peter Zijlstra, Andrew Morton
On September 14, 2017 10:31:55 PM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Andy Lutomirski <luto@kernel.org> wrote:
>
>> >> > diff --git a/arch/x86/entry/entry_64.S
>b/arch/x86/entry/entry_64.S
>> >> > index 4916725..3bab6af 100644
>> >> > --- a/arch/x86/entry/entry_64.S
>> >> > +++ b/arch/x86/entry/entry_64.S
>> >> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
>> >> > */
>> >> > TRACE_IRQS_ON
>> >> > ENABLE_INTERRUPTS(CLBR_NONE)
>> >> > -#if __SYSCALL_MASK == ~0
>> >> > - cmpq $__NR_syscall_max, %rax
>> >> > -#else
>> >> > - andl $__SYSCALL_MASK, %eax
>> >> > - cmpl $__NR_syscall_max, %eax
>> >> > +#if __SYSCALL_MASK != ~0
>> >> > + andq $__SYSCALL_MASK, %rax
>> >> > #endif
>> >> > + cmpq $__NR_syscall_max, %rax
>> >>
>> >> I don't know much about x32 userspace, but there's an argument
>that
>> >> the high bits *should* be masked off if the x32 bit is set.
>> >
>> > Why?
>>
>> Because it always worked that way.
>>
>> That being said, I'd be okay with applying your patch and seeing
>> whether anything breaks. Ingo?
>
>So I believe this was introduced with x32 as a 'fresh, modern syscall
>ABI'
>behavioral aspect, because we wanted to protect the overly complex
>syscall entry
>code from 'weird' input values. IIRC there was an old bug where we'd
>overflow the
>syscall table in certain circumstances ...
>
>But our new, redesigned entry code is a lot less complex, a lot more
>readable and
>a lot more maintainable (not to mention a lot more robust), so if
>invalid RAX
>values with high bits set get reliably turned into -ENOSYS or such then
>I'd not
>mind the patch per se either, as a general consistency improvement.
>
>Of course if something in x32 user-land breaks then this turns into an
>ABI and we
>have to reintroduce this aspect, as a quirk :-/
>
>It should also improve x32 syscall performance a tiny bit, right? So
>might be
>worth a try on various grounds.
>
>( Another future advantage would be that _maybe_ we could use the high
>RAX
>component as an extra (64-bit only) special argument of sorts. Not that
>I can
> think of any such use right now. )
>
>Thanks,
>
> Ingo
x32 should be sharing the native 64-bit entry code, by design. We have had the latter mask the upper 32 bits, so we should do that for x32 as well.
There seems to be little if any motivation to ever change either of these ABIs.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: hpa @ 2017-09-15 5:47 UTC (permalink / raw)
To: Ingo Molnar, Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, X86 ML, Oleg Nesterov,
Eugene Syromyatnikov, linux-kernel@vger.kernel.org,
Linus Torvalds, Peter Zijlstra, Andrew Morton
On September 14, 2017 10:31:55 PM PDT, Ingo Molnar <mingo@kernel.org> wrote:
>
>* Andy Lutomirski <luto@kernel.org> wrote:
>
>> >> > diff --git a/arch/x86/entry/entry_64.S
>b/arch/x86/entry/entry_64.S
>> >> > index 4916725..3bab6af 100644
>> >> > --- a/arch/x86/entry/entry_64.S
>> >> > +++ b/arch/x86/entry/entry_64.S
>> >> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
>> >> > */
>> >> > TRACE_IRQS_ON
>> >> > ENABLE_INTERRUPTS(CLBR_NONE)
>> >> > -#if __SYSCALL_MASK == ~0
>> >> > - cmpq $__NR_syscall_max, %rax
>> >> > -#else
>> >> > - andl $__SYSCALL_MASK, %eax
>> >> > - cmpl $__NR_syscall_max, %eax
>> >> > +#if __SYSCALL_MASK != ~0
>> >> > + andq $__SYSCALL_MASK, %rax
>> >> > #endif
>> >> > + cmpq $__NR_syscall_max, %rax
>> >>
>> >> I don't know much about x32 userspace, but there's an argument
>that
>> >> the high bits *should* be masked off if the x32 bit is set.
>> >
>> > Why?
>>
>> Because it always worked that way.
>>
>> That being said, I'd be okay with applying your patch and seeing
>> whether anything breaks. Ingo?
>
>So I believe this was introduced with x32 as a 'fresh, modern syscall
>ABI'
>behavioral aspect, because we wanted to protect the overly complex
>syscall entry
>code from 'weird' input values. IIRC there was an old bug where we'd
>overflow the
>syscall table in certain circumstances ...
>
>But our new, redesigned entry code is a lot less complex, a lot more
>readable and
>a lot more maintainable (not to mention a lot more robust), so if
>invalid RAX
>values with high bits set get reliably turned into -ENOSYS or such then
>I'd not
>mind the patch per se either, as a general consistency improvement.
>
>Of course if something in x32 user-land breaks then this turns into an
>ABI and we
>have to reintroduce this aspect, as a quirk :-/
>
>It should also improve x32 syscall performance a tiny bit, right? So
>might be
>worth a try on various grounds.
>
>( Another future advantage would be that _maybe_ we could use the high
>RAX
>component as an extra (64-bit only) special argument of sorts. Not that
>I can
> think of any such use right now. )
>
>Thanks,
>
> Ingo
If the consensus is that we should change the x86-64 ABI then we should change the x32 ABI to match, though.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Oleg Nesterov @ 2017-09-15 16:12 UTC (permalink / raw)
To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
Andy Lutomirski, Eugene Syromyatnikov, linux-kernel
On 09/14, Dmitry V. Levin wrote:
>
> On Wed, Sep 13, 2017 at 06:49:21PM +0200, Oleg Nesterov wrote:
>
> > do_syscall_64() does "nr & __SYSCALL_MASK" unconditionally,
>
> yes
>
> > this clears the upper bits, no?
>
> Why? As "nr" is of type "unsigned long" and __SYSCALL_MASK is either
> (~(__X32_SYSCALL_BIT)) or (~0), that is, an integer with the sign bit set,
Yes, thanks for correcting me; it is "int", but somehow I thought it was
"unsigned int".
Oleg.
* Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
From: Dmitry V. Levin @ 2017-09-17 16:45 UTC (permalink / raw)
To: hpa
Cc: Ingo Molnar, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
X86 ML, Oleg Nesterov, Eugene Syromyatnikov, linux-kernel,
Linus Torvalds, Peter Zijlstra, Andrew Morton, Dmitry V. Levin
On Thu, Sep 14, 2017 at 10:46:37PM -0700, hpa@zytor.com wrote:
> On September 14, 2017 10:31:55 PM PDT, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >* Andy Lutomirski <luto@kernel.org> wrote:
> >
> >> >> > diff --git a/arch/x86/entry/entry_64.S
> >b/arch/x86/entry/entry_64.S
> >> >> > index 4916725..3bab6af 100644
> >> >> > --- a/arch/x86/entry/entry_64.S
> >> >> > +++ b/arch/x86/entry/entry_64.S
> >> >> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> >> >> > */
> >> >> > TRACE_IRQS_ON
> >> >> > ENABLE_INTERRUPTS(CLBR_NONE)
> >> >> > -#if __SYSCALL_MASK == ~0
> >> >> > - cmpq $__NR_syscall_max, %rax
> >> >> > -#else
> >> >> > - andl $__SYSCALL_MASK, %eax
> >> >> > - cmpl $__NR_syscall_max, %eax
> >> >> > +#if __SYSCALL_MASK != ~0
> >> >> > + andq $__SYSCALL_MASK, %rax
> >> >> > #endif
> >> >> > + cmpq $__NR_syscall_max, %rax
> >> >>
> >> >> I don't know much about x32 userspace, but there's an argument
> >that
> >> >> the high bits *should* be masked off if the x32 bit is set.
> >> >
> >> > Why?
> >>
> >> Because it always worked that way.
> >>
> >> That being said, I'd be okay with applying your patch and seeing
> >> whether anything breaks. Ingo?
> >
> >So I believe this was introduced with x32 as a 'fresh, modern syscall
> >ABI'
> >behavioral aspect, because we wanted to protect the overly complex
> >syscall entry
> >code from 'weird' input values. IIRC there was an old bug where we'd
> >overflow the
> >syscall table in certain circumstances ...
> >
> >But our new, redesigned entry code is a lot less complex, a lot more
> >readable and
> >a lot more maintainable (not to mention a lot more robust), so if
> >invalid RAX
> >values with high bits set get reliably turned into -ENOSYS or such then
> >I'd not
> >mind the patch per se either, as a general consistency improvement.
> >
> >Of course if something in x32 user-land breaks then this turns into an
> >ABI and we
> >have to reintroduce this aspect, as a quirk :-/
> >
> >It should also improve x32 syscall performance a tiny bit, right? So
> >might be
> >worth a try on various grounds.
> >
> >( Another future advantage would be that _maybe_ we could use the high
> >RAX
> >component as an extra (64-bit only) special argument of sorts. Not that
> >I can
> > think of any such use right now. )
> >
> >Thanks,
> >
> > Ingo
>
> x32 should be sharing the native 64-bit entry code, by design.
> We have had the latter mask the upper 32 bits,
There must be some misunderstanding on your side: the clearing of the upper
32 bits of 64-bit syscall numbers currently happens if and only if the
fastpath is taken and CONFIG_X86_X32_ABI is enabled.
--
ldv