* [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm
@ 2017-03-09 22:42 Steven Rostedt
2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
Andy Lutomirski, Andrew Morton
I started working on the x86-32 fentry code and I noticed that I had some
old patches in my local tip/x86/core branch. Looking at them, they still
seem relevant.
They are from 2014, when I was working at Red Hat. I kept the (Red Hat)
tag in the author field, and since I forward-ported them now that I work
for VMware, I used (VMware) in my Signed-off-by.
Both patches do basically the same thing, but to different parts of the
code. One makes a better test of the RIP, the other a better test of
the RSP. Each patch describes what it does.
Steven Rostedt (Red Hat) (2):
x86/nmi: Optimize the check for being in the repeat_nmi code
x86/nmi: Fix and optimize the NMI stack check code
---
arch/x86/entry/entry_64.S | 32 +++++++++++++++-----------------
1 file changed, 15 insertions(+), 17 deletions(-)
^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
@ 2017-03-09 22:42 ` Steven Rostedt
  2017-03-10  2:42   ` Andy Lutomirski
  2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Andrew Morton

[-- Attachment #1: 0001-x86-nmi-Optimize-the-check-for-being-in-the-repeat_n.patch --]
[-- Type: text/plain, Size: 1202 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Linus mentioned that doing two compares can be replaced by a single
compare. That is, instead of:

	movq	$repeat_nmi, %rdx
	cmpq	8(%rsp), %rdx
	ja	not_in_region
	movq	$end_repeat_nmi, %rdx
	cmpq	8(%rsp), %rdx
	ja	in_region

we can replace that with:

	movq	8(%rsp), %rdx
	subq	$repeat_nmi, %rdx
	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
	jb	in_region

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/entry/entry_64.S | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18ebc43c..3aad759aace2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1330,13 +1330,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
-	cmpq	8(%rsp), %rdx
-	ja	1f
-	movq	$end_repeat_nmi, %rdx
-	cmpq	8(%rsp), %rdx
-	ja	nested_nmi_out
-1:
+	movq	8(%rsp), %rdx
+	subq	$repeat_nmi, %rdx
+	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
+	jb	nested_nmi_out
 
 	/*
 	 * Now check "NMI executing". If it's set, then we're nested.
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread
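The transformation above is the classic unsigned range-check trick: because the subtraction wraps around for addresses below the region start, a single unsigned compare of `addr - start` against the region's length tests both bounds at once. A minimal C sketch (not part of the patch; the function names and sample addresses are illustrative only):

```c
#include <stdint.h>

/* Two-compare version: is addr inside [start, end)? */
static int in_region_two_cmps(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	if (addr < start)
		return 0;	/* below the region */
	if (addr >= end)
		return 0;	/* at or above the region's end */
	return 1;
}

/*
 * Single-compare version: if addr < start, the subtraction wraps to a
 * huge unsigned value, which fails the length check -- so one unsigned
 * compare covers both bounds. This is what the subq/cmpq/jb sequence
 * in the patch does with $repeat_nmi and $end_repeat_nmi-repeat_nmi.
 */
static int in_region_one_cmp(uintptr_t addr, uintptr_t start, uintptr_t end)
{
	return (addr - start) < (end - start);
}
```

Both versions return the same answer for every input; the second just trades a load, a compare, and a branch for one subtract.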
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
@ 2017-03-10  2:42   ` Andy Lutomirski
  2017-03-10  3:49     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:42 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>
> Linus mentioned that doing two compares can be replaced by a single
> compare. That is, instead of:
>
>	movq	$repeat_nmi, %rdx
>	cmpq	8(%rsp), %rdx
>	ja	not_in_region
>	movq	$end_repeat_nmi, %rdx
>	cmpq	8(%rsp), %rdx
>	ja	in_region
>
> we can replace that with:
>
>	movq	8(%rsp), %rdx
>	subq	$repeat_nmi, %rdx
>	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
>	jb	in_region

Seems reasonable to me. Good luck ever noticing the speedup :)

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  2:42   ` Andy Lutomirski
@ 2017-03-10  3:49     ` Steven Rostedt
  2017-03-10  3:50       ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-10  3:49 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On March 9, 2017 9:42:57 PM EST, Andy Lutomirski <luto@amacapital.net> wrote:
>On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org>
>wrote:
>> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>>
>> Linus mentioned that doing two compares can be replaced by a single
>> compare. That is, instead of:
>>
>>	movq	$repeat_nmi, %rdx
>>	cmpq	8(%rsp), %rdx
>>	ja	not_in_region
>>	movq	$end_repeat_nmi, %rdx
>>	cmpq	8(%rsp), %rdx
>>	ja	in_region
>>
>> we can replace that with:
>>
>>	movq	8(%rsp), %rdx
>>	subq	$repeat_nmi, %rdx
>>	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
>>	jb	in_region
>
>Seems reasonable to me. Good luck ever noticing the speedup :)
>

It had nothing to do with speedup. Linus said that the current code
makes the assembly programmer in him die a little. I want to cure that.

-- Steve

>--Andy

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  3:49     ` Steven Rostedt
@ 2017-03-10  3:50       ` Andy Lutomirski
  2017-03-10  7:20         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  3:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 7:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> On March 9, 2017 9:42:57 PM EST, Andy Lutomirski <luto@amacapital.net> wrote:
>>On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org>
>>wrote:
>>> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>>>
>>> Linus mentioned that doing two compares can be replaced by a single
>>> compare. That is, instead of:
>>>
>>>	movq	$repeat_nmi, %rdx
>>>	cmpq	8(%rsp), %rdx
>>>	ja	not_in_region
>>>	movq	$end_repeat_nmi, %rdx
>>>	cmpq	8(%rsp), %rdx
>>>	ja	in_region
>>>
>>> we can replace that with:
>>>
>>>	movq	8(%rsp), %rdx
>>>	subq	$repeat_nmi, %rdx
>>>	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
>>>	jb	in_region
>>
>>Seems reasonable to me. Good luck ever noticing the speedup :)
>>
>
> It had nothing to do with speedup. Linus said that the current code makes the assembly programmer in him die a little. I want to cure that.
>

One might argue that the world would be a better place if the assembly
programmer in some people died a little.

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  3:50       ` Andy Lutomirski
@ 2017-03-10  7:20         ` Ingo Molnar
  2017-03-10 19:00           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2017-03-10  7:20 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, Linus Torvalds,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

* Andy Lutomirski <luto@amacapital.net> wrote:

> > It had nothing to do with speedup. Linus said that the current code makes the
> > assembly programmer in him die a little. I want to cure that.
>
> One might argue that the world would be a better place if the assembly
> programmer in some people died a little.

Joking aside, I'll bite: while in the kernel we try to avoid ever actually
_writing_ new assembly code, assembly programming is still an invaluable
skill, because it indirectly improves all the other 99% of non-assembly
.c code:

 - Looking at the C compiler's assembly output tells us how close the code
   is to optimal.

 - Being able to tell whether our C abstractions are too far removed from
   how the compiler will map it to machine instructions is invaluable.

 - Being able to shape data structures and code in a machine-friendly way.

Much would be lost if the assembly programmer went extinct and it's no
accident that annotated assembly output is just two <Enter> keys away
after launching 'perf top' or 'perf report'.

The more developers know assembly the better, IMHO.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  7:20         ` Ingo Molnar
@ 2017-03-10 19:00           ` Linus Torvalds
  2017-03-10 19:03             ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2017-03-10 19:00 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andy Lutomirski, Steven Rostedt, linux-kernel@vger.kernel.org,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 11:20 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Joking aside, I'll bite: while in the kernel we try to avoid ever actually
> _writing_ new assembly code

.. also, when we do, I think we should care about it.

If you write asm, and the end result is noticeably worse than what
your average compiler would generate, exactly why are you writing it
in asm in the first place?

So I think people should aim to avoid asm. Andy certainly knows that,
and I loved his "rewrite a lot of the low-level system call code"
patches.

But the corollary to that is that if you _do_ write assembler, please
have some pride in the code, and don't half-arse it.

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10 19:00           ` Linus Torvalds
@ 2017-03-10 19:03             ` Andy Lutomirski
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10 19:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Steven Rostedt, linux-kernel@vger.kernel.org,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Fri, Mar 10, 2017 at 11:00 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Mar 9, 2017 at 11:20 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Joking aside, I'll bite: while in the kernel we try to avoid ever actually
>> _writing_ new assembly code
>
> .. also, when we do, I think we should care about it.
>
> If you write asm, and the end result is noticeably worse than what
> your average compiler would generate, exactly why are you writing it
> in asm in the first place?
>
> So I think people should aim to avoid asm. Andy certainly knows that,
> and I loved his "rewrite a lot of the low-level system call code"
> patches.
>
> But the corollary to that is that if you _do_ write assembler, please
> have some pride in the code, and don't half-arse it.
>

Geez, I didn't expect anyone to take my silly comment remotely seriously :)

And I do like Steven's patches.

--Andy, who just looked at binutils source to figure out WTF "nobits"
meant. Take that, asm!

^ permalink raw reply	[flat|nested] 10+ messages in thread
* [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code
  2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
  2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
@ 2017-03-09 22:42 ` Steven Rostedt
  2017-03-10  2:43   ` Andy Lutomirski
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
To: linux-kernel
Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Andrew Morton, Andy Lutomirski

[-- Attachment #1: 0002-x86-nmi-Fix-and-optimize-the-NMI-stack-check-code.patch --]
[-- Type: text/plain, Size: 2282 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Andy Lutomirski reported an off-by-one in the NMI stack check for the
nested NMI code: if the stack pointer was one above the actual stack
(stack start + STACK_SIZE) it would trigger a false positive. This is
not that big of a deal, because the stack pointer should never be there.
Even if a stack was using the pages just above the NMI stack, it would
require that stack to be about to overflow for this to trigger, which is
a much bigger bug than the one this is fixing.

Also, Linus Torvalds pointed out that doing two compares can be
accomplished with a single compare. That is:

  ("reg" is the top of the stack we are comparing "stack" to)

	cmpq	reg, stack
	jae	label	// note, the code had an off-by-one "ja" instead of "jae"
	subq	size, reg
	cmpq	reg, stack
	jb	label

is the same as:

	subq	$1, reg
	subq	stack, reg
	cmpq	size, reg
	jae	label

The subq $1 was folded into the leaq by doing:

	leaq	5*8+7(%rsp), %rdx

Added more comments as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/entry/entry_64.S | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3aad759aace2..1e6ca3740762 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1355,16 +1355,17 @@ ENTRY(nmi)
 	 * if it controls the kernel's RSP. We set DF before we clear
 	 * "NMI executing".
 	 */
-	lea	6*8(%rsp), %rdx
-	/* Compare the NMI stack (rdx) with the stack we came from (4*8(%rsp)) */
-	cmpq	%rdx, 4*8(%rsp)
-	/* If the stack pointer is above the NMI stack, this is a normal NMI */
-	ja	first_nmi
-
-	subq	$EXCEPTION_STKSZ, %rdx
-	cmpq	%rdx, 4*8(%rsp)
-	/* If it is below the NMI stack, it is a normal NMI */
-	jb	first_nmi
+
+	/* Load address of the top of this stack into rdx */
+	lea	5*8+7(%rsp), %rdx
+	/* Subtract the return stack pointer from it */
+	subq	4*8(%rsp), %rdx
+	/*
+	 * If the result is greater or equal to the stack size,
+	 * then the return stack was not on this stack.
+	 */
+	cmpq	$EXCEPTION_STKSZ, %rdx
+	jae	first_nmi
 
 	/* Ah, it is within the NMI stack. */
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread
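The changelog above compresses two ideas: the same single-compare range check as patch 1, plus subtracting 1 from the top-of-stack address up front (the "5*8+7" in the leaq, i.e. 6*8 - 1) so that the inclusive `ja`/`jb` pair becomes one exclusive `jae`, curing the off-by-one at sp == top. A C sketch of both versions (not kernel code; the function names and the stack size are illustrative only):

```c
#include <stdint.h>

#define STKSZ 4096UL	/* illustrative stack size, not the kernel's EXCEPTION_STKSZ */

/* Original check: two compares. Note the false positive at sp == top. */
static int on_stack_two_cmps(uintptr_t sp, uintptr_t top)
{
	if (sp > top)
		return 0;	/* above this stack: a normal NMI */
	if (sp < top - STKSZ)
		return 0;	/* below this stack: a normal NMI */
	return 1;		/* bug: sp == top is one past the stack, yet returns 1 */
}

/*
 * Fixed check: one unsigned compare. Subtracting 1 from top first (the
 * "5*8+7" in the leaq) makes the upper bound exclusive, so this returns
 * 1 exactly when top - STKSZ <= sp < top. For sp >= top the subtraction
 * wraps to a huge unsigned value, which fails the size check.
 */
static int on_stack_one_cmp(uintptr_t sp, uintptr_t top)
{
	return (top - 1 - sp) < STKSZ;
}
```

The two versions agree everywhere except at sp == top, which is precisely the off-by-one the patch fixes.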
* Re: [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code
  2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
@ 2017-03-10  2:43   ` Andy Lutomirski
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:43 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>
> Andy Lutomirski reported an off-by-one in the NMI stack check for the
> nested NMI code: if the stack pointer was one above the actual stack
> (stack start + STACK_SIZE) it would trigger a false positive. This is
> not that big of a deal, because the stack pointer should never be there.
> Even if a stack was using the pages just above the NMI stack, it would
> require that stack to be about to overflow for this to trigger, which is
> a much bigger bug than the one this is fixing.
>
> Also, Linus Torvalds pointed out that doing two compares can be
> accomplished with a single compare. That is:
>
>   ("reg" is the top of the stack we are comparing "stack" to)
>
>	cmpq	reg, stack
>	jae	label	// note, the code had an off-by-one "ja" instead of "jae"
>	subq	size, reg
>	cmpq	reg, stack
>	jb	label
>
> is the same as:
>
>	subq	$1, reg
>	subq	stack, reg
>	cmpq	size, reg
>	jae	label
>
> The subq $1 was folded into the leaq by doing:
>
>	leaq	5*8+7(%rsp), %rdx
>
> Added more comments as well.

Nice.

^ permalink raw reply	[flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-03-10 19:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
2017-03-10  2:42   ` Andy Lutomirski
2017-03-10  3:49     ` Steven Rostedt
2017-03-10  3:50       ` Andy Lutomirski
2017-03-10  7:20         ` Ingo Molnar
2017-03-10 19:00           ` Linus Torvalds
2017-03-10 19:03             ` Andy Lutomirski
2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
2017-03-10  2:43   ` Andy Lutomirski