stable.vger.kernel.org archive mirror
* [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
       [not found] <cover.1412138935.git.luto@amacapital.net>
@ 2014-10-01  4:51 ` Andy Lutomirski
  2014-10-01  5:09   ` Sebastian Lackner
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01  4:51 UTC (permalink / raw)
  To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	Chuck Ebbert, Andy Lutomirski, stable

The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.
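For reference, here is a minimal C sketch of the flag bits involved. The FL_*
names and the helpers are hypothetical and exist only for this illustration.
The bit positions are the architectural ones, which the kernel's X86_EFLAGS_*
constants also use. NT is bit 14 (0x4000), so a typical user RFLAGS of 0x202
becomes 0x4202 after a POPF that sets NT.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the RFLAGS bits discussed in this thread (hypothetical
 * names; same bit positions as the kernel's X86_EFLAGS_* constants). */
#define FL_FIXED (UINT64_C(1) << 1)   /* bit 1: reserved, always reads as 1 */
#define FL_TF    (UINT64_C(1) << 8)   /* trap flag */
#define FL_IF    (UINT64_C(1) << 9)   /* interrupt enable */
#define FL_DF    (UINT64_C(1) << 10)  /* direction flag */
#define FL_IOPL  (UINT64_C(3) << 12)  /* I/O privilege level, bits 12-13 */
#define FL_NT    (UINT64_C(1) << 14)  /* nested task */
#define FL_AC    (UINT64_C(1) << 18)  /* alignment check / SMAP override */

static_assert(FL_NT == 0x4000, "NT is bit 14");

/* Models the effect of user code doing something like
 *   pushfq; orq $0x4000,(%rsp); popfq
 * at CPL3: POPF may change NT at any privilege level. */
static uint64_t user_sets_nt(uint64_t rflags)
{
    return rflags | FL_NT;
}

static int nt_is_set(uint64_t rflags)
{
    return (rflags & FL_NT) != 0;
}
```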

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.
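A sketch of what the MSR_SYSCALL_MASK change does, modelling only the
flag-masking step of SYSCALL as the Intel SDM and AMD APM describe it (RFLAGS
is saved to R11, then every bit that is set in the mask is cleared in RFLAGS).
The FL_* names and rflags_after_syscall() are made up for this illustration.
With the old mask NT survives into the kernel; with NT added it does not.

```c
#include <assert.h>
#include <stdint.h>

#define FL_FIXED (UINT64_C(1) << 1)
#define FL_TF    (UINT64_C(1) << 8)
#define FL_IF    (UINT64_C(1) << 9)
#define FL_DF    (UINT64_C(1) << 10)
#define FL_IOPL  (UINT64_C(3) << 12)
#define FL_NT    (UINT64_C(1) << 14)
#define FL_AC    (UINT64_C(1) << 18)

/* Mask values written to MSR_SYSCALL_MASK before and after the patch. */
static const uint64_t old_mask = FL_TF | FL_DF | FL_IF | FL_IOPL | FL_AC;
static const uint64_t new_mask = FL_TF | FL_DF | FL_IF | FL_IOPL | FL_AC | FL_NT;

/* Only the masking step of SYSCALL: RFLAGS := RFLAGS AND NOT(mask).
 * Every other effect of the instruction is ignored here. */
static uint64_t rflags_after_syscall(uint64_t user_rflags, uint64_t mask)
{
    return user_rflags & ~mask;
}
```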

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.
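The check-before-clear trick, modelled in C as a rough sketch.
sysenter_fix_flags() and popf_count are hypothetical names. The POPFQ only
runs when the saved EFLAGS copy has NT set. In that case the live flags are
replaced wholesale by IF|FIXED, as the pushq/popfq pair in the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define FL_FIXED (UINT64_C(1) << 1)
#define FL_IF    (UINT64_C(1) << 9)
#define FL_NT    (UINT64_C(1) << 14)

/* Counts how often the comparatively slow POPFQ path would run. */
static unsigned long popf_count;

/* live:  the CPU's current flags at this point in the entry path.
 * saved: the EFLAGS copy on the stack (its NT bit matches the CPU's).
 * Returns the flags the kernel continues with. */
static uint64_t sysenter_fix_flags(uint64_t live, uint64_t saved)
{
    if (saved & FL_NT) {          /* testl $X86_EFLAGS_NT,EFLAGS(%rsp) */
        popf_count++;
        return FL_IF | FL_FIXED;  /* pushq $(IF|FIXED); popfq: clears NT
                                     and every other non-IF flag */
    }
    return live;                  /* jz 1f: common case, nothing to do */
}

static_assert((FL_IF | FL_FIXED) == 0x202, "value pushed before popfq");
```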

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Cc: stable@vger.kernel.org
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/ia32/ia32entry.S    | 12 ++++++++++++
 arch/x86/kernel/cpu/common.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..44d1dd371454 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
 1:	movl	(%rbp),%ebp
 	_ASM_EXTABLE(1b,ia32_badarg)
 	ASM_CLAC
+
+	/*
+	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * ourselves.  To save a few cycles, we can check whether
+	 * NT was set instead of doing an unconditional popfq.
+	 */
+	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
+	jz 1f
+	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
+	popfq_cfi
+1:
+
 	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	CFI_REMEMBER_STATE
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b42bd6f..31265580c38a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1184,7 +1184,7 @@ void syscall_init(void)
 	/* Flags to clear on syscall */
 	wrmsrl(MSR_SYSCALL_MASK,
 	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
-	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
+	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
 /*
-- 
1.9.3



* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  4:51 ` [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
@ 2014-10-01  5:09   ` Sebastian Lackner
  2014-10-01  5:24     ` Andy Lutomirski
  2014-10-01 14:09   ` Chuck Ebbert
  2014-10-01 15:22   ` H. Peter Anvin
  2 siblings, 1 reply; 14+ messages in thread
From: Sebastian Lackner @ 2014-10-01  5:09 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar,
	H. Peter Anvin
  Cc: Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

Do you think it makes sense to change the order here, so that no jump happens if
NT is not set (which happens a bit more often than the other way round)? Just a
guess though, haven't measured if pipeline effects have such a big influence in this
case. ;)



* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  5:09   ` Sebastian Lackner
@ 2014-10-01  5:24     ` Andy Lutomirski
  2014-10-01 15:19       ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01  5:24 UTC (permalink / raw)
  To: Sebastian Lackner
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Anish Bhatt,
	linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Tue, Sep 30, 2014 at 10:09 PM, Sebastian Lackner
<sebastian@fds-team.de> wrote:
>> +     testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
>> +     jz 1f
>> +     pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> +     popfq_cfi
>> +1:
>> +
>
> Do you think it makes sense to change the order here, so that no jump happens if
> NT is not set (which happens a bit more often than the other way round)? Just a
> guess though, haven't measured if pipeline effects have such a big influence in this
> case. ;)
>

It should be immeasurable in a tight loop, since it will predict
correctly almost every time.  And, unless cfi state works across
.pushsection (does it?), getting the cfi annotations right will be
more complicated.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  4:51 ` [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
  2014-10-01  5:09   ` Sebastian Lackner
@ 2014-10-01 14:09   ` Chuck Ebbert
  2014-10-01 14:32     ` Chuck Ebbert
  2014-10-01 15:22   ` H. Peter Anvin
  2 siblings, 1 reply; 14+ messages in thread
From: Chuck Ebbert @ 2014-10-01 14:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	stable

On Tue, 30 Sep 2014 21:51:27 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> The NT flag doesn't do anything in long mode other than causing IRET
> to #GP.  Oddly, CPL3 code can still set NT using popf.
> 
> Entry via hardware or software interrupt clears NT automatically, so
> the only relevant entries are fast syscalls.
> 
> If user code causes kernel code to run with NT set, then there's at
> least some (small) chance that it could cause trouble.  For example,
> user code could cause a call to EFI code with NT set, and who knows
> what would happen?  Apparently some games on Wine sometimes do
> this (!), and, if an IRET return happens, they will segfault.  That
> segfault cannot be handled, because signal delivery fails, too.
> 
> This patch programs the CPU to clear NT on entry via SYSCALL (both
> 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
> in software on entry via SYSENTER.
> 
> To save a few cycles, this borrows a trick from Jan Beulich in Xen:
> it checks whether NT is set before trying to clear it.  As a result,
> it seems to have very little effect on SYSENTER performance on my
> machine.
> 
> Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
> 
> I haven't touched anything on 32-bit kernels.
> 
> The syscall mask change comes from a variant of this patch by Anish
> Bhatt.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Anish Bhatt <anish@chelsio.com>
> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
> ---
>  arch/x86/ia32/ia32entry.S    | 12 ++++++++++++
>  arch/x86/kernel/cpu/common.c |  2 +-
>  2 files changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..44d1dd371454 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>  1:	movl	(%rbp),%ebp
>  	_ASM_EXTABLE(1b,ia32_badarg)
>  	ASM_CLAC
> +
> +	/*
> +	 * Sysenter doesn't filter flags, so we need to clear NT
> +	 * ourselves.  To save a few cycles, we can check whether
> +	 * NT was set instead of doing an unconditional popfq.
> +	 */
> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

I think you've gone backwards with this version. The earlier one got
some of the performance loss back by not needing to do the "cld" insn.

You should just replace that "cld" (line 146) with

	pushq_cfi $2
	popfq_cfi

Unfortunately I'm not set up to test that yet. But I did look at
the SDM and can't see a need to preserve any of the flags.

>  	orl     $TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>  	testl   $_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
>  	CFI_REMEMBER_STATE
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index e4ab2b42bd6f..31265580c38a 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1184,7 +1184,7 @@ void syscall_init(void)
>  	/* Flags to clear on syscall */
>  	wrmsrl(MSR_SYSCALL_MASK,
>  	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
> -	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
> +	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
>  }
>  
>  /*



* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 14:09   ` Chuck Ebbert
@ 2014-10-01 14:32     ` Chuck Ebbert
  2014-10-01 14:46       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Chuck Ebbert @ 2014-10-01 14:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	stable

On Wed, 1 Oct 2014 09:09:13 -0500
Chuck Ebbert <cebbert.lkml@gmail.com> wrote:

> On Tue, 30 Sep 2014 21:51:27 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
> 
> > [...]
> 
> I think you've gone backwards with this version. The earlier one got
> some of the performance loss back by not needing to do the "cld" insn.
> 
> You should just replace that "cld" (line 146) with
> 
> 	pushq_cfi $2
> 	popfq_cfi
> 
> Unfortunately I'm not set up to test that yet. But I did look at
> the SDM and can't see a need to preserve any of the flags.
> 


<sigh> that's:

	pushq_cfi $0x202

IF needs to stay on because we've already enabled interrupts after
sysenter.
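The arithmetic behind the correction, as a small sketch (the names are
hypothetical, and only the modifiable flag bits discussed here are modelled).
POPFQ loads the popped value into the flags. An immediate of 2 would therefore
leave IF clear. 0x202 is IF (0x200) plus the always-one bit 1 (0x2), the same
value as $(X86_EFLAGS_IF|X86_EFLAGS_FIXED) in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FL_FIXED (UINT64_C(1) << 1)   /* reserved bit 1, always reads as 1 */
#define FL_IF    (UINT64_C(1) << 9)

/* Flags after "pushq $imm; popfq" at CPL0, for the bits modelled here:
 * anything absent from imm (DF, AC, NT, ...) ends up clear. */
static uint64_t flags_after_push_popf(uint64_t imm)
{
    return imm | FL_FIXED;
}

static int interrupts_enabled(uint64_t rflags)
{
    return (rflags & FL_IF) != 0;
}
```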


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 14:32     ` Chuck Ebbert
@ 2014-10-01 14:46       ` Andy Lutomirski
  2014-10-01 14:56         ` Chuck Ebbert
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01 14:46 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	stable

On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> On Wed, 1 Oct 2014 09:09:13 -0500
> Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
>
>> On Tue, 30 Sep 2014 21:51:27 -0700
>> Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> > [...]
>>
>> I think you've gone backwards with this version. The earlier one got
>> some of the performance loss back by not needing to do the "cld" insn.
>>
>> You should just replace that "cld" (line 146) with
>>
>>       pushq_cfi $2
>>       popfq_cfi
>>
>> Unfortunately I'm not set up to test that yet. But I did look at
>> the SDM and can't see a need to preserve any of the flags.
>>
>
>
> <sigh> that's:
>
>         pushq_cfi $0x202
>
> IF needs to stay on because we've already enabled interrupts after
> sysenter.

I tried exactly this.  It was much slower than the version I sent.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 14:46       ` Andy Lutomirski
@ 2014-10-01 14:56         ` Chuck Ebbert
  2014-10-01 15:03           ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Chuck Ebbert @ 2014-10-01 14:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	stable

On Wed, 1 Oct 2014 07:46:54 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> > On Wed, 1 Oct 2014 09:09:13 -0500
> > Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> >
> >> On Tue, 30 Sep 2014 21:51:27 -0700
> >> Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >> > [...]
> >>
> >> I think you've gone backwards with this version. The earlier one got
> >> some of the performance loss back by not needing to do the "cld" insn.
> >>
> >> You should just replace that "cld" (line 146) with
> >>
> >>       pushq_cfi $2
> >>       popfq_cfi
> >>
> >> Unfortunately I'm not set up to test that yet. But I did look at
> >> the SDM and can't see a need to preserve any of the flags.
> >>
> >
> >
> > <sigh> that's:
> >
> >         pushq_cfi $0x202
> >
> > IF needs to stay on because we've already enabled interrupts after
> > sysenter.
> 
> I tried exactly this.  It was much slower than the version I sent.
> 

Yeah, it looks like a new paravirt op that enables interrupts and
clears all the other flags would be the only way to do this without at
least some impact on performance.


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 14:56         ` Chuck Ebbert
@ 2014-10-01 15:03           ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01 15:03 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin,
	Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	stable

On Wed, Oct 1, 2014 at 7:56 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> On Wed, 1 Oct 2014 07:46:54 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
>> > On Wed, 1 Oct 2014 09:09:13 -0500
>> > Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
>> >
>> >> On Tue, 30 Sep 2014 21:51:27 -0700
>> >> Andy Lutomirski <luto@amacapital.net> wrote:
>> >>
>> >> > [...]
>> >>
>> >> I think you've gone backwards with this version. The earlier one got
>> >> some of the performance loss back by not needing to do the "cld" insn.
>> >>
>> >> You should just replace that "cld" (line 146) with
>> >>
>> >>       pushq_cfi $2
>> >>       popfq_cfi
>> >>
>> >> Unfortunately I'm not set up to test that yet. But I did look at
>> >> the SDM and can't see a need to preserve any of the flags.
>> >>
>> >
>> >
>> > <sigh> that's:
>> >
>> >         pushq_cfi $0x202
>> >
>> > IF needs to stay on because we've already enabled interrupts after
>> > sysenter.
>>
>> I tried exactly this.  It was much slower than the version I sent.
>>
>
> Yeah, it looks like a new paravirt op that enables interrupts and
> clears all the other flags would be the only way to do this without at
> least some impact on performance.

We have that -- it's called something like setfl.

But it still wouldn't help.  It seems that cld, test, jnz is simply
much faster than popfq.

If we could fold it with the sti earlier, *maybe* that would be a win,
but then we'd also have to patch the saved flags to avoid returning to
userspace with interrupts off.  (And I tried that.  It still didn't
seem to be fast enough.)

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  5:24     ` Andy Lutomirski
@ 2014-10-01 15:19       ` H. Peter Anvin
  0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2014-10-01 15:19 UTC (permalink / raw)
  To: Andy Lutomirski, Sebastian Lackner
  Cc: Thomas Gleixner, X86 ML, Ingo Molnar, Anish Bhatt,
	linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 09/30/2014 10:24 PM, Andy Lutomirski wrote:
> On Tue, Sep 30, 2014 at 10:09 PM, Sebastian Lackner
> <sebastian@fds-team.de> wrote:
>>> +     testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
>>> +     jz 1f
>>> +     pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>>> +     popfq_cfi
>>> +1:
>>> +
>>
>> Do you think it makes sense to change the order here, so that no jump happens if
>> NT is not set (which happens a bit more often than the other way round)? Just a
>> guess though, haven't measured if pipeline effects have such a big influence in this
>> case. ;)
>>
> 
> It should be immeasurable in a tight loop, since it will predict
> correctly almost every time.  And, unless cfi state works across
> .pushsection (does it?), getting the cfi annotations right will be
> more complicated.
> 

It does, actually... otherwise it would be almost impossible to use in a
lot of cases.

	-hpa




* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01  4:51 ` [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
  2014-10-01  5:09   ` Sebastian Lackner
  2014-10-01 14:09   ` Chuck Ebbert
@ 2014-10-01 15:22   ` H. Peter Anvin
  2014-10-01 15:26     ` H. Peter Anvin
  2 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2014-10-01 15:22 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	Chuck Ebbert, stable

On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
> 
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..44d1dd371454 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>  1:	movl	(%rbp),%ebp
>  	_ASM_EXTABLE(1b,ia32_badarg)
>  	ASM_CLAC
> +
> +	/*
> +	 * Sysenter doesn't filter flags, so we need to clear NT
> +	 * ourselves.  To save a few cycles, we can check whether
> +	 * NT was set instead of doing an unconditional popfq.
> +	 */
> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

I'm wondering if it would be easier to just remove ASM_CLAC and do this
unconditionally.  On SMAP-enabled hardware that gives us back some
of the cycles, and may make the branch unnecessary.

	-hpa




* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 15:22   ` H. Peter Anvin
@ 2014-10-01 15:26     ` H. Peter Anvin
  2014-10-01 15:50       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2014-10-01 15:26 UTC (permalink / raw)
  To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
  Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org,
	Chuck Ebbert, stable

On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
> On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
>>
>> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> index 4299eb05023c..44d1dd371454 100644
>> --- a/arch/x86/ia32/ia32entry.S
>> +++ b/arch/x86/ia32/ia32entry.S
>> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>>  1:	movl	(%rbp),%ebp
>>  	_ASM_EXTABLE(1b,ia32_badarg)
>>  	ASM_CLAC
>> +
>> +	/*
>> +	 * Sysenter doesn't filter flags, so we need to clear NT
>> +	 * ourselves.  To save a few cycles, we can check whether
>> +	 * NT was set instead of doing an unconditional popfq.
>> +	 */
>> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
>> +	jz 1f
>> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> +	popfq_cfi
>> +1:
>> +
> 
> I'm wondering if it would be easier to just remove ASM_CLAC and do this
> unconditionally.  On SMAP-enabled hardware that gives us back some
> of the cycles, and may make the branch unnecessary.
> 

Heck, we can drop the CLD and the STI as well (with some tweaking in
ia32_badarg.)
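A sketch of why a single unconditional POPFQ could stand in for CLD, STI, CLAC
and the NT fixup: for the IF, DF, AC and NT bits, both sequences end in the
same state. This is a C model with hypothetical names. It ignores the
arithmetic flags, which a single POPFQ also clears and which presumably do not
matter at that point in the entry path, and it models CLAC as simply clearing
AC.

```c
#include <assert.h>
#include <stdint.h>

#define FL_FIXED (UINT64_C(1) << 1)
#define FL_IF    (UINT64_C(1) << 9)
#define FL_DF    (UINT64_C(1) << 10)
#define FL_NT    (UINT64_C(1) << 14)
#define FL_AC    (UINT64_C(1) << 18)

/* Only the bits this discussion cares about. */
#define FL_TRACKED (FL_IF | FL_DF | FL_NT | FL_AC)

/* Current approach: separate instructions, each touching one flag. */
static uint64_t separate_insns(uint64_t f)
{
    f &= ~FL_DF;   /* cld */
    f |= FL_IF;    /* sti */
    f &= ~FL_AC;   /* clac */
    f &= ~FL_NT;   /* NT fixup */
    return f;
}

/* Proposed: one unconditional "pushq $(IF|FIXED); popfq". */
static uint64_t single_popf(uint64_t f)
{
    (void)f;       /* the incoming flags are discarded entirely */
    return FL_IF | FL_FIXED;
}
```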

	-hpa




* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 15:26     ` H. Peter Anvin
@ 2014-10-01 15:50       ` Andy Lutomirski
  2014-10-01 16:04         ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01 15:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt,
	Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Oct 1, 2014 8:26 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
> > On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
> >>
> >> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> >> index 4299eb05023c..44d1dd371454 100644
> >> --- a/arch/x86/ia32/ia32entry.S
> >> +++ b/arch/x86/ia32/ia32entry.S
> >> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
> >>  1:  movl    (%rbp),%ebp
> >>      _ASM_EXTABLE(1b,ia32_badarg)
> >>      ASM_CLAC
> >> +
> >> +    /*
> >> +     * Sysenter doesn't filter flags, so we need to clear NT
> >> +     * ourselves.  To save a few cycles, we can check whether
> >> +     * NT was set instead of doing an unconditional popfq.
> >> +     */
> >> +    testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
> >> +    jz 1f
> >> +    pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> >> +    popfq_cfi
> >> +1:
> >> +
> >
> > I'm wondering if it would be easier to just remove ASM_CLAC and do this
> > unconditionally.  On SMAP-enabled hardware that gives us back some
> > of the cycles, and may make the branch unnecessary.
> >
>
> Heck, we can drop the CLD and the STI as well (with some tweaking in
> ia32_badarg.)

I prototyped this, and performance sucked.  I suspect that cld and sti
are fairly well optimized, that I ended up introducing stalls due to
stack manipulation, and that Sandy Bridge's popfq microcode is just
not that fast.  Maybe I did it wrong.  Dunno.  Also, I can't benchmark
a SMAP machine, since I don't have one.  (Does anyone?  I'm currently
tempted to wait for Skylake before upgrading all my systems.)

In fact, I think we should change all the irqrestore code to do

if (flags & X86_EFLAGS_IF)
        sti;
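
A user-space model of the proposed irqrestore change, since sti and popf cannot be exercised outside the kernel. It assumes, as the proposal implies, that every irqrestore is paired with an irqsave (so IF is still clear at restore time) and that IF is the only flag worth restoring. struct cpu_model and the model_* helpers are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED UINT64_C(0x00000002)
#define X86_EFLAGS_IF    UINT64_C(0x00000200)

/* Hypothetical stand-in for the live RFLAGS register. */
struct cpu_model {
	uint64_t rflags;
};

/* Model of irqsave: remember the flags, then cli. */
uint64_t model_irq_save(struct cpu_model *cpu)
{
	uint64_t flags = cpu->rflags;

	cpu->rflags &= ~X86_EFLAGS_IF;            /* cli */
	return flags;
}

/* Model of the existing popf-based restore: reload every saved bit. */
void model_restore_popf(struct cpu_model *cpu, uint64_t flags)
{
	cpu->rflags = flags;                      /* push; popf */
}

/* Model of the proposal: only IF can differ, so sti if it was set. */
void model_restore_sti(struct cpu_model *cpu, uint64_t flags)
{
	assert(!(cpu->rflags & X86_EFLAGS_IF));   /* still inside irqsave */
	if (flags & X86_EFLAGS_IF)
		cpu->rflags |= X86_EFLAGS_IF;     /* sti */
}
```

Under those assumptions both restores leave the model in the same state, whether interrupts were on or off at save time.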

I can send a v3 with the unlikely code moved out of line.

--Andy


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 15:50       ` Andy Lutomirski
@ 2014-10-01 16:04         ` Andy Lutomirski
  2014-10-01 16:17           ` H. Peter Anvin
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2014-10-01 16:04 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt,
	Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Wed, Oct 1, 2014 at 8:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Oct 1, 2014 8:26 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>
>> On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
>> > On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
>> >>
>> >> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> >> index 4299eb05023c..44d1dd371454 100644
>> >> --- a/arch/x86/ia32/ia32entry.S
>> >> +++ b/arch/x86/ia32/ia32entry.S
>> >> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>> >>  1:  movl    (%rbp),%ebp
>> >>      _ASM_EXTABLE(1b,ia32_badarg)
>> >>      ASM_CLAC
>> >> +
>> >> +    /*
>> >> +     * Sysenter doesn't filter flags, so we need to clear NT
>> >> +     * ourselves.  To save a few cycles, we can check whether
>> >> +     * NT was set instead of doing an unconditional popfq.
>> >> +     */
>> >> +    testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
>> >> +    jz 1f
>> >> +    pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> >> +    popfq_cfi
>> >> +1:
>> >> +
>> >
>> > I'm wondering if it would be easier to just remove ASM_CLAC and do this
>> > unconditionally.  On SMAP-enabled hardware that gives us back some
>> > of the cycles, and may make the branch unnecessary.
>> >
>>
>> Heck, we can drop the CLD and the STI as well (with some tweaking in
>> ia32_badarg.)
>
> I prototyped this, and performance sucked.  I suspect that cld and sti
> are fairly well optimized, that I ended up introducing stalls due to
> stack manipulation, and that Sandy Bridge's popfq microcode is just
> not that fast.  Maybe I did it wrong.  Dunno.  Also, I can't benchmark
> a SMAP machine, since I don't have one.  (Does anyone?  I'm currently
> tempted to wait for Skylake before upgrading all my systems.)

Agner Fog's tables for Sandy Bridge have 9 uops for popf and
reciprocal throughput 18.  sti isn't listed for Sandy Bridge or
anything similar, but cld is 3 uops with reciprocal throughput 4.
Also, popf accesses rsp, and the sysenter code is very heavy on stack
manipulation.
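
The quoted Sandy Bridge figures, worked as a quick ratio check. sti has no entry in those tables, so it is left out, and reciprocal throughputs do not simply add on an out-of-order core; treat this as a rough sanity check under those assumptions, not a cost model. The struct and helper names are hypothetical.

```c
#include <assert.h>

/* Figures as quoted above from Agner Fog's Sandy Bridge tables. */
struct insn_cost {
	const char *name;
	int uops;
	double recip_throughput; /* cycles per instruction, back to back */
};

static const struct insn_cost POPF = { "popf", 9, 18.0 };
static const struct insn_cost CLD  = { "cld",  3, 4.0 };

/* How many times as many uops a decodes to compared with b. */
double uop_ratio(struct insn_cost a, struct insn_cost b)
{
	assert(b.uops > 0);
	return (double)a.uops / b.uops;
}

/* How many times slower a is than b in back-to-back throughput. */
double throughput_ratio(struct insn_cost a, struct insn_cost b)
{
	assert(b.recip_throughput > 0.0);
	return a.recip_throughput / b.recip_throughput;
}

/* Extra reciprocal-throughput cycles of a over b (ignores overlap). */
double throughput_gap(struct insn_cost a, struct insn_cost b)
{
	return a.recip_throughput - b.recip_throughput;
}
```

By these numbers popf is 3x the uops and 4.5x the reciprocal throughput of cld, a gap of 14 cycles that a dropped sti would have to make up.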

--Andy

>
> In fact, I think we should change all the irqrestore code to do
>
> if (flags & X86_EFLAGS_IF)
>         sti;
>
> I can send a v3 with the unlikely code moved out of line.
>
> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC


* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
  2014-10-01 16:04         ` Andy Lutomirski
@ 2014-10-01 16:17           ` H. Peter Anvin
  0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2014-10-01 16:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt,
	Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 10/01/2014 09:04 AM, Andy Lutomirski wrote:
> 
> Agner Fog's tables for Sandy Bridge have 9 uops for popf and
> reciprocal throughput 18.  sti isn't listed for Sandy Bridge or
> anything similar, but cld is 3 uops with reciprocal throughput 4.
> Also, popf accesses rsp, and the sysenter code is very heavy on stack
> manipulation.
> 

It does a stack operation.  Newer CPUs optimize stack accesses pretty
heavily.  That doesn't mean back-to-back push/pop are all that
optimized; I wonder if it would help to separate them.  popf is unlikely
to ever be all that fast.

	-hpa



Thread overview: 14+ messages
     [not found] <cover.1412138935.git.luto@amacapital.net>
2014-10-01  4:51 ` [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
2014-10-01  5:09   ` Sebastian Lackner
2014-10-01  5:24     ` Andy Lutomirski
2014-10-01 15:19       ` H. Peter Anvin
2014-10-01 14:09   ` Chuck Ebbert
2014-10-01 14:32     ` Chuck Ebbert
2014-10-01 14:46       ` Andy Lutomirski
2014-10-01 14:56         ` Chuck Ebbert
2014-10-01 15:03           ` Andy Lutomirski
2014-10-01 15:22   ` H. Peter Anvin
2014-10-01 15:26     ` H. Peter Anvin
2014-10-01 15:50       ` Andy Lutomirski
2014-10-01 16:04         ` Andy Lutomirski
2014-10-01 16:17           ` H. Peter Anvin
