* [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 4:51 UTC
To: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, Andy Lutomirski, stable

The NT flag doesn't do anything in long mode other than causing IRET
to #GP. Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble. For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen? Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault. That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it. As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Cc: stable@vger.kernel.org
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/ia32/ia32entry.S    | 12 ++++++++++++
 arch/x86/kernel/cpu/common.c |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 4299eb05023c..44d1dd371454 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
 1:	movl	(%rbp),%ebp
 	_ASM_EXTABLE(1b,ia32_badarg)
 	ASM_CLAC
+
+	/*
+	 * Sysenter doesn't filter flags, so we need to clear NT
+	 * ourselves. To save a few cycles, we can check whether
+	 * NT was set instead of doing an unconditional popfq.
+	 */
+	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
+	jz 1f
+	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
+	popfq_cfi
+1:
+
 	orl	$TS_COMPAT,TI_status+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	testl	$_TIF_WORK_SYSCALL_ENTRY,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
 	CFI_REMEMBER_STATE
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index e4ab2b42bd6f..31265580c38a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1184,7 +1184,7 @@ void syscall_init(void)
 	/* Flags to clear on syscall */
 	wrmsrl(MSR_SYSCALL_MASK,
 	       X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|
-	       X86_EFLAGS_IOPL|X86_EFLAGS_AC);
+	       X86_EFLAGS_IOPL|X86_EFLAGS_AC|X86_EFLAGS_NT);
 }
 
 /*
-- 
1.9.3
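The two mechanisms in the patch can be modeled in a few lines of C. This is an illustrative sketch, not kernel code. It assumes the architectural rule that SYSCALL computes RFLAGS = RFLAGS & ~MSR_SYSCALL_MASK, and it mirrors the SYSENTER fix-up in the hunk: test the saved NT bit, and reload IF|FIXED only if that bit was set. The macro `SYSCALL_MASK_PATCHED` and the helpers `syscall_entry_flags`, `sysenter_entry_flags` and `iret_would_fault` are hypothetical names. The flag constants use the architectural bit positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits, same values as the kernel's X86_EFLAGS_*. */
#define X86_EFLAGS_FIXED 0x00000002u /* bit 1, reserved, always reads 1 */
#define X86_EFLAGS_TF    0x00000100u /* trap flag */
#define X86_EFLAGS_IF    0x00000200u /* interrupt enable */
#define X86_EFLAGS_DF    0x00000400u /* direction flag */
#define X86_EFLAGS_IOPL  0x00003000u /* I/O privilege level, bits 12-13 */
#define X86_EFLAGS_NT    0x00004000u /* nested task, bit 14 */
#define X86_EFLAGS_AC    0x00040000u /* alignment check / SMAP override */

static_assert(X86_EFLAGS_NT == (1u << 14), "NT is RFLAGS bit 14");

/* Hypothetical name for the value syscall_init() writes after the patch. */
#define SYSCALL_MASK_PATCHED                                   \
	(X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |       \
	 X86_EFLAGS_IOPL | X86_EFLAGS_AC | X86_EFLAGS_NT)

/* SYSCALL: the CPU clears every RFLAGS bit that is set in the mask MSR. */
static uint64_t syscall_entry_flags(uint64_t user_flags, uint64_t mask)
{
	return user_flags & ~mask;
}

/*
 * SYSENTER: hardware does not filter NT, so the entry code tests the
 * saved flags and reloads RFLAGS with IF|FIXED only when NT was set.
 */
static uint64_t sysenter_entry_flags(uint64_t saved_flags,
				     uint64_t current_flags)
{
	if (saved_flags & X86_EFLAGS_NT)                 /* testl; jz 1f */
		return X86_EFLAGS_IF | X86_EFLAGS_FIXED; /* pushq; popfq */
	return current_flags;                            /* branch taken */
}

/* In long mode, an IRET executed while NT is set raises #GP. */
static bool iret_would_fault(uint64_t current_flags)
{
	return (current_flags & X86_EFLAGS_NT) != 0;
}
```

Consider a user value of 0x4246 (NT, IF, ZF, PF and the fixed bit). With the old mask, which lacks X86_EFLAGS_NT, SYSCALL leaves 0x4046, so NT reaches the kernel and a later IRET would fault. With the new mask, SYSCALL leaves 0x46.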
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Sebastian Lackner @ 2014-10-01 5:09 UTC
To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin
Cc: Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

Do you think it makes sense to change the order here, so that no jump happens if
NT is not set (which happens a bit more often, than the other way round)? Just a
guess though, haven't measured if pipeline effects have such a big influence in this
case. ;)
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 5:24 UTC
To: Sebastian Lackner
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Tue, Sep 30, 2014 at 10:09 PM, Sebastian Lackner
<sebastian@fds-team.de> wrote:
>> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
>> +	jz 1f
>> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> +	popfq_cfi
>> +1:
>> +
>
> Do you think it makes sense to change the order here, so that no jump happens if
> NT is not set (which happens a bit more often, than the other way round)? Just a
> guess though, haven't measured if pipeline effects have such a big influence in this
> case. ;)
>

It should be immeasurable in a tight loop, since it will predict
correctly almost every time. And, unless cfi state works across
.pushsection (does it?), getting the cfi annotations right will be
more complicated.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: H. Peter Anvin @ 2014-10-01 15:19 UTC
To: Andy Lutomirski, Sebastian Lackner
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 09/30/2014 10:24 PM, Andy Lutomirski wrote:
> On Tue, Sep 30, 2014 at 10:09 PM, Sebastian Lackner
> <sebastian@fds-team.de> wrote:
>>> [...]
>>
>> Do you think it makes sense to change the order here, so that no jump happens if
>> NT is not set (which happens a bit more often, than the other way round)? Just a
>> guess though, haven't measured if pipeline effects have such a big influence in this
>> case. ;)
>>
>
> It should be immeasurable in a tight loop, since it will predict
> correctly almost every time. And, unless cfi state works across
> .pushsection (does it?), getting the cfi annotations right will be
> more complicated.
>

It does, actually... otherwise it would be almost impossible to use in a
lot of cases.

	-hpa
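Sebastian's question is about code layout. Andy's later plan to move "the unlikely code out of line" has a C analogue: a branch hint. This is a minimal sketch that assumes GCC or Clang, which provide `__builtin_expect` and the `cold` attribute. The names `filter_nt` and `clear_nt_slowpath` are hypothetical. The hint usually makes the compiler emit the NT-clear case as straight-line fall-through and place the rare case elsewhere, but the final layout is the compiler's choice.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002u
#define X86_EFLAGS_IF    0x00000200u
#define X86_EFLAGS_NT    0x00004000u

static_assert((X86_EFLAGS_IF | X86_EFLAGS_FIXED) == 0x202,
	      "reload value used by the patch");

/* Same idea as the kernel's unlikely(): tell the compiler which way to bet. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Rare path; "cold" asks the compiler to keep it away from the hot code. */
__attribute__((cold, noinline))
static uint64_t clear_nt_slowpath(void)
{
	return X86_EFLAGS_IF | X86_EFLAGS_FIXED; /* pushq; popfq */
}

/* Hot path: with NT clear, execution is expected to fall straight through. */
static uint64_t filter_nt(uint64_t saved_flags, uint64_t current_flags)
{
	if (unlikely(saved_flags & X86_EFLAGS_NT))
		return clear_nt_slowpath();
	return current_flags;
}
```

As Andy notes, a branch that is predicted correctly almost every time costs little in either layout. The hint changes where the code is placed, not what it computes.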
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Chuck Ebbert @ 2014-10-01 14:09 UTC
To: Andy Lutomirski
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, stable

On Tue, 30 Sep 2014 21:51:27 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> [...]
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..44d1dd371454 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>  1:	movl	(%rbp),%ebp
>  	_ASM_EXTABLE(1b,ia32_badarg)
>  	ASM_CLAC
> +
> +	/*
> +	 * Sysenter doesn't filter flags, so we need to clear NT
> +	 * ourselves. To save a few cycles, we can check whether
> +	 * NT was set instead of doing an unconditional popfq.
> +	 */
> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

I think you've gone backwards with this version. The earlier one got
some of the performance loss back by not needing to do the "cld" insn.

You should just replace that "cld" (line 146) with

	pushfq_cfi $2
	popfq_cfi

Unfortunately I'm not set up to test that yet. But I did look at
the SDM and can't see a need to preserve any of the flags.
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Chuck Ebbert @ 2014-10-01 14:32 UTC
To: Andy Lutomirski
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, stable

On Wed, 1 Oct 2014 09:09:13 -0500
Chuck Ebbert <cebbert.lkml@gmail.com> wrote:

> On Tue, 30 Sep 2014 21:51:27 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
> > [...]
> > +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> > +	jz 1f
> > +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> > +	popfq_cfi
> > +1:
> > +
>
> I think you've gone backwards with this version. The earlier one got
> some of the performance loss back by not needing to do the "cld" insn.
>
> You should just replace that "cld" (line 146) with
>
> 	pushfq_cfi $2
> 	popfq_cfi
>
> Unfortunately I'm not set up to test that yet. But I did look at
> the SDM and can't see a need to preserve any of the flags.
>

<sigh> that's:

	pushfw_cfi $0x202

IF needs to stay on because we've already enabled interrupts after
sysenter.
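Chuck's correction is easier to see if "push an immediate, then popfq" is treated as overwriting the flags. This is a simplified sketch that assumes that model and ignores the bits POPF cannot change, such as VM. The helpers `push_imm_popfq`, `interrupts_on` and `df_and_nt_clear` are hypothetical. With $2, interrupts end up disabled again. With $0x202, which equals the patch's X86_EFLAGS_IF|X86_EFLAGS_FIXED, IF stays on while DF and NT are cleared. That is why a separate cld would become redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002u
#define X86_EFLAGS_IF    0x00000200u
#define X86_EFLAGS_DF    0x00000400u
#define X86_EFLAGS_NT    0x00004000u

/* $0x202 is the same value the patch pushes: IF | FIXED. */
static_assert((X86_EFLAGS_IF | X86_EFLAGS_FIXED) == 0x202,
	      "0x202 == X86_EFLAGS_IF | X86_EFLAGS_FIXED");

/*
 * Simplified model of "push $imm; popfq" at CPL0: the immediate replaces
 * the old flags outright, and bit 1 always reads back as 1. The old value
 * is a parameter only to make the overwrite explicit.
 */
static uint64_t push_imm_popfq(uint64_t old_flags, uint64_t imm)
{
	(void)old_flags;
	return imm | X86_EFLAGS_FIXED;
}

static bool interrupts_on(uint64_t flags)
{
	return (flags & X86_EFLAGS_IF) != 0;
}

/* True when DF and NT are both clear, i.e. no separate cld is needed. */
static bool df_and_nt_clear(uint64_t flags)
{
	return (flags & (X86_EFLAGS_DF | X86_EFLAGS_NT)) == 0;
}
```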
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 14:46 UTC
To: Chuck Ebbert
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, stable

On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> On Wed, 1 Oct 2014 09:09:13 -0500
> Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
>
>> [...]
>> I think you've gone backwards with this version. The earlier one got
>> some of the performance loss back by not needing to do the "cld" insn.
>>
>> You should just replace that "cld" (line 146) with
>>
>> 	pushfq_cfi $2
>> 	popfq_cfi
>>
>> Unfortunately I'm not set up to test that yet. But I did look at
>> the SDM and can't see a need to preserve any of the flags.
>>
>
>
> <sigh> that's:
>
> 	pushfw_cfi $0x202
>
> IF needs to stay on because we've already enabled interrupts after
> sysenter.

I tried exactly this. It was much slower than the version I sent.

--Andy
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Chuck Ebbert @ 2014-10-01 14:56 UTC
To: Andy Lutomirski
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, stable

On Wed, 1 Oct 2014 07:46:54 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> On Wed, Oct 1, 2014 at 7:32 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> > [...]
> > <sigh> that's:
> >
> > 	pushfw_cfi $0x202
> >
> > IF needs to stay on because we've already enabled interrupts after
> > sysenter.
>
> I tried exactly this. It was much slower than the version I sent.
>

Yeah, it looks like a new paravirt op that enables interrupts and
clears all the other flags would be the only way to do this without at
least some impact on performance.
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 15:03 UTC
To: Chuck Ebbert
Cc: Thomas Gleixner, X86 ML, Ingo Molnar, H. Peter Anvin, Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, stable

On Wed, Oct 1, 2014 at 7:56 AM, Chuck Ebbert <cebbert.lkml@gmail.com> wrote:
> On Wed, 1 Oct 2014 07:46:54 -0700
> Andy Lutomirski <luto@amacapital.net> wrote:
>
>> [...]
>> I tried exactly this. It was much slower than the version I sent.
>>
>
> Yeah, it looks like a new paravirt op that enables interrupts and
> clears all the other flags would be the only way to do this without at
> least some impact on performance.

We have that -- it's called something like setfl. But it still
wouldn't help. It seems that cld, test, jnz is simply much faster
than popfq. If we could fold it with the sti earlier, *maybe* that
would be a win, but then we'd also have to patch the saved flags to
avoid returning to userspace with interrupts off. (And I tried that.
It still didn't seem to be fast enough.)

--Andy
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: H. Peter Anvin @ 2014-10-01 15:22 UTC
To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
>
> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
> index 4299eb05023c..44d1dd371454 100644
> --- a/arch/x86/ia32/ia32entry.S
> +++ b/arch/x86/ia32/ia32entry.S
> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>  1:	movl	(%rbp),%ebp
>  	_ASM_EXTABLE(1b,ia32_badarg)
>  	ASM_CLAC
> +
> +	/*
> +	 * Sysenter doesn't filter flags, so we need to clear NT
> +	 * ourselves. To save a few cycles, we can check whether
> +	 * NT was set instead of doing an unconditional popfq.
> +	 */
> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
> +	jz 1f
> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
> +	popfq_cfi
> +1:
> +

I'm wondering if it would be easier to just remove ASM_CLAC and do this
unconditionally. On SMAP-enabled hardware then that gives us back some
of the cycles, may make the branch unnecessary.

	-hpa
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: H. Peter Anvin @ 2014-10-01 15:26 UTC
To: Andy Lutomirski, Thomas Gleixner, X86 ML, Ingo Molnar
Cc: Sebastian Lackner, Anish Bhatt, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
> On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
>> [...]
>>  	ASM_CLAC
>> +
>> +	/*
>> +	 * Sysenter doesn't filter flags, so we need to clear NT
>> +	 * ourselves. To save a few cycles, we can check whether
>> +	 * NT was set instead of doing an unconditional popfq.
>> +	 */
>> +	testl $X86_EFLAGS_NT,EFLAGS(%rsp)	/* saved EFLAGS match cpu */
>> +	jz 1f
>> +	pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> +	popfq_cfi
>> +1:
>> +
>
> I'm wondering if it would be easier to just remove ASM_CLAC and do this
> unconditionally. On SMAP-enabled hardware then that gives us back some
> of the cycles, may make the branch unnecessary.
>

Heck, we can drop the CLD and the STI as well (with some tweaking in
ia32_badarg.)

	-hpa
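hpa's suggestion can be checked on the flag bits that matter. This sketch assumes the instruction order implied by the thread: interrupts are enabled right after SYSENTER, the cld comes before the hunk, and ASM_CLAC sits just before the NT test. The names `v2_sequence`, `single_popfq` and `ENTRY_BITS` are hypothetical. The model shows that one unconditional popfq of IF|FIXED leaves IF, DF, NT and AC in the same state as the separate instructions plus the conditional fix-up. It says nothing about speed, which is what the measurements in the thread are about.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_FIXED 0x00000002u
#define X86_EFLAGS_IF    0x00000200u
#define X86_EFLAGS_DF    0x00000400u
#define X86_EFLAGS_NT    0x00004000u
#define X86_EFLAGS_AC    0x00040000u

/* The bits this comparison cares about; arithmetic flags are ignored. */
#define ENTRY_BITS \
	(X86_EFLAGS_IF | X86_EFLAGS_DF | X86_EFLAGS_NT | X86_EFLAGS_AC)

static_assert(ENTRY_BITS == 0x44600, "IF|DF|NT|AC");

/* v2 entry path as modeled here: sti, cld, clac, then the NT check. */
static uint64_t v2_sequence(uint64_t f)
{
	f |= X86_EFLAGS_IF;                           /* sti */
	f &= ~(uint64_t)X86_EFLAGS_DF;                /* cld */
	f &= ~(uint64_t)X86_EFLAGS_AC;                /* ASM_CLAC on SMAP hw */
	if (f & X86_EFLAGS_NT)                        /* testl; jz 1f */
		f = X86_EFLAGS_IF | X86_EFLAGS_FIXED; /* pushq; popfq */
	return f;
}

/* hpa's proposal: one unconditional popfq of IF|FIXED replaces all of it. */
static uint64_t single_popfq(uint64_t f)
{
	(void)f; /* every flag is overwritten, so the old value is unused */
	return X86_EFLAGS_IF | X86_EFLAGS_FIXED;
}
```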
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 15:50 UTC
To: H. Peter Anvin
Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt, Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Oct 1, 2014 8:26 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
> > On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
> >> [...]
> >
> > I'm wondering if it would be easier to just remove ASM_CLAC and do this
> > unconditionally. On SMAP-enabled hardware then that gives us back some
> > of the cycles, may make the branch unnecessary.
> >
>
> Heck, we can drop the CLD and the STI as well (with some tweaking in
> ia32_badarg.)

I prototyped this, and performance sucked. I suspect that cld and sti
are fairly well optimized, that I ended up introducing stalls due to
stack manipulation, and that Sandy Bridge's popfq microcode is just
not that fast. Maybe I did it wrong. Dunno. Also, I can't benchmark
a SMAP machine, since I don't have one. (Does anyone? I'm currently
tempted to wait for Skylake before upgrading all my systems.)

In fact, I think we should change all the irqrestore code to do

	if (flags & X86_EFLAGS_IF)
		sti;

I can send a v3 with the unlikely code moved out of line.

--Andy
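Andy's irqrestore idea rests on one observation. Between local_irq_save() and local_irq_restore(), interrupts are off, so the restore only has to turn IF back on when the saved flags had it set. The sketch below works under that assumption. `struct cpu_model`, `irq_save`, `irq_restore_popf` and `irq_restore_sti` are hypothetical names. The model tracks only IF; a real popf-based restore also rewrites every other flag.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_IF 0x00000200u

/* Hypothetical stand-in for the CPU state that irqsave/irqrestore touch. */
struct cpu_model {
	uint64_t rflags;
};

/* local_irq_save()-style: remember the flags, then cli. */
static uint64_t irq_save(struct cpu_model *cpu)
{
	uint64_t flags = cpu->rflags;

	cpu->rflags &= ~(uint64_t)X86_EFLAGS_IF; /* cli */
	return flags;
}

/* popf-based restore: every flag is rewritten from the saved copy. */
static void irq_restore_popf(struct cpu_model *cpu, uint64_t flags)
{
	cpu->rflags = flags;
}

/*
 * Suggested restore: interrupts are known to be off here, so only IF can
 * need restoring, and only when it was set in the saved flags.
 */
static void irq_restore_sti(struct cpu_model *cpu, uint64_t flags)
{
	assert(!(cpu->rflags & X86_EFLAGS_IF)); /* precondition: IRQs off */
	if (flags & X86_EFLAGS_IF)
		cpu->rflags |= X86_EFLAGS_IF;   /* sti */
}
```

The two restore functions agree on IF. The conditional version avoids popf, which the numbers later in the thread suggest is the expensive part.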
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: Andy Lutomirski @ 2014-10-01 16:04 UTC
To: H. Peter Anvin
Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt, Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On Wed, Oct 1, 2014 at 8:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Oct 1, 2014 8:26 AM, "H. Peter Anvin" <hpa@zytor.com> wrote:
>>
>> On 10/01/2014 08:22 AM, H. Peter Anvin wrote:
>> > On 09/30/2014 09:51 PM, Andy Lutomirski wrote:
>> >>
>> >> diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
>> >> index 4299eb05023c..44d1dd371454 100644
>> >> --- a/arch/x86/ia32/ia32entry.S
>> >> +++ b/arch/x86/ia32/ia32entry.S
>> >> @@ -151,6 +151,18 @@ ENTRY(ia32_sysenter_target)
>> >>  1:     movl    (%rbp),%ebp
>> >>         _ASM_EXTABLE(1b,ia32_badarg)
>> >>         ASM_CLAC
>> >> +
>> >> +       /*
>> >> +        * Sysenter doesn't filter flags, so we need to clear NT
>> >> +        * ourselves. To save a few cycles, we can check whether
>> >> +        * NT was set instead of doing an unconditional popfq.
>> >> +        */
>> >> +       testl $X86_EFLAGS_NT,EFLAGS(%rsp)       /* saved EFLAGS match cpu */
>> >> +       jz 1f
>> >> +       pushq_cfi $(X86_EFLAGS_IF|X86_EFLAGS_FIXED)
>> >> +       popfq_cfi
>> >> +1:
>> >> +
>> >
>> > I'm wondering if it would be easier to just remove ASM_CLAC and do this
>> > unconditionally. On SMAP-enabled hardware then that gives us back some
>> > of the cycles, may make the branch unnecessary.
>> >
>>
>> Heck, we can drop the CLD and the STI as well (with some tweaking in
>> ia32_badarg.)
>
> I prototyped this, and performance sucked. I suspect that cld and sti
> are fairly well optimized, that I ended up introducing stalls due to
> stack manipulation, and that Sandy Bridge's popfq microcode is just
> not that fast. Maybe I did it wrong. Dunno. Also, I can't benchmark
> a SMAP machine, since I don't have one. (Does anyone? I'm currently
> tempted to wait for Skylake before upgrading all my systems.)

Agner Fog's tables for Sandy Bridge have 9 uops for popf and
reciprocal throughput 18. sti isn't listed for Sandy Bridge or
anything similar, but cld is 3 uops with reciprocal throughput 4.
Also, popf accesses rsp, and the sysenter code is very heavy on stack
manipulation.

--Andy

>
> In fact, I think we should change all the irqrestore code to do
>
> if (flags & X86_EFLAGS_IF)
>         sti;
>
> I can send a v3 with the unlikely code moved out of line.
>
> --Andy

--
Andy Lutomirski
AMA Capital Management, LLC
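The quoted hunk and the unconditional alternative hpa raises can be compared with a small user-space C model of the flags word. The FLAG_* values are the architectural RFLAGS bit positions, which the kernel spells X86_EFLAGS_*. The function names are hypothetical. The model makes two assumptions. First, following hpa's remark, by this point the path has already run CLD, STI and CLAC. Second, following the hunk's comment, the live NT bit still matches the saved EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural RFLAGS bits (X86_EFLAGS_* in the kernel). */
#define FLAG_FIXED 0x00000002ULL /* bit 1: always reads as 1 */
#define FLAG_IF    0x00000200ULL /* bit 9: interrupt enable */
#define FLAG_DF    0x00000400ULL /* bit 10: direction flag */
#define FLAG_NT    0x00004000ULL /* bit 14: nested task */
#define FLAG_AC    0x00040000ULL /* bit 18: alignment check / SMAP override */

/* The v2 hunk: test NT and only pay for popfq when it is set.
 * 'live' is the flags word after the entry prologue. */
static uint64_t filter_nt_conditional(uint64_t live)
{
    assert(live & FLAG_FIXED);        /* bit 1 is always set in RFLAGS */
    if (!(live & FLAG_NT))            /* testl ...; jz 1f (fast path) */
        return live;
    return FLAG_IF | FLAG_FIXED;      /* pushq $(IF|FIXED); popfq */
}

/* hpa's alternative: always load IF|FIXED. Because this clears DF and AC
 * and sets IF, the separate CLD, CLAC and STI would become redundant. */
static uint64_t filter_unconditional(uint64_t live)
{
    assert(live & FLAG_FIXED);
    return FLAG_IF | FLAG_FIXED;      /* previous flags are discarded */
}
```

For a well-behaved task NT is clear, so the jz to 1f is taken and the conditional version returns the flags unchanged. The unconditional version always pays for the popfq. In exchange, it can absorb the separate CLD, STI and CLAC.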
* Re: [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace
From: H. Peter Anvin @ 2014-10-01 16:17 UTC
To: Andy Lutomirski
Cc: Sebastian Lackner, X86 ML, Thomas Gleixner, Anish Bhatt, Ingo Molnar, linux-kernel@vger.kernel.org, Chuck Ebbert, stable

On 10/01/2014 09:04 AM, Andy Lutomirski wrote:
>
> Agner Fog's tables for Sandy Bridge have 9 uops for popf and
> reciprocal throughput 18. sti isn't listed for Sandy Bridge or
> anything similar, but cld is 3 uops with reciprocal throughput 4.
> Also, popf accesses rsp, and the sysenter code is very heavy on stack
> manipulation.
>

It does a stack operation. Newer CPUs optimize stack accesses pretty
heavily. That doesn't mean back-to-back push/pop are all that
optimized; I wonder whether it would help to separate them.

popf is unlikely to ever be all that fast.

	-hpa
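The trade-off debated above can be framed crudely by treating Agner Fog's reciprocal throughputs R (in cycles) as if they simply added up. This back-of-the-envelope sketch ignores latency, port contention and the stack effects that both Andy and hpa mention. The symbol p_NT stands for the fraction of SYSENTER calls made with NT set. The R values for sti, clac and test+jz are unknown symbols, not measured numbers. Only R_popf = 18 and R_cld = 4 come from the thread. On hardware without SMAP, ASM_CLAC is patched to a NOP, so R_clac is effectively zero there.

```latex
\begin{aligned}
C_{\text{cond}}   &\approx R_{\text{cld}} + R_{\text{sti}} + R_{\text{clac}} + R_{\text{test+jz}} + p_{\mathrm{NT}}\,R_{\text{popf}}, \qquad p_{\mathrm{NT}} \approx 0,\\
C_{\text{uncond}} &\approx R_{\text{popf}},\\
C_{\text{uncond}} < C_{\text{cond}} \iff R_{\text{sti}} + R_{\text{clac}} + R_{\text{test+jz}} &> R_{\text{popf}} - R_{\text{cld}} = 18 - 4 = 14 \text{ cycles}.
\end{aligned}
```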
Thread overview: 14+ messages
[not found] <cover.1412138935.git.luto@amacapital.net>
2014-10-01 4:51 ` [PATCH v2 1/2] x86_64,entry: Filter RFLAGS.NT on entry from userspace Andy Lutomirski
2014-10-01 5:09 ` Sebastian Lackner
2014-10-01 5:24 ` Andy Lutomirski
2014-10-01 15:19 ` H. Peter Anvin
2014-10-01 14:09 ` Chuck Ebbert
2014-10-01 14:32 ` Chuck Ebbert
2014-10-01 14:46 ` Andy Lutomirski
2014-10-01 14:56 ` Chuck Ebbert
2014-10-01 15:03 ` Andy Lutomirski
2014-10-01 15:22 ` H. Peter Anvin
2014-10-01 15:26 ` H. Peter Anvin
2014-10-01 15:50 ` Andy Lutomirski
2014-10-01 16:04 ` Andy Lutomirski
2014-10-01 16:17 ` H. Peter Anvin