* [RFC PATCH] x86: optimize IRET returns to kernel
@ 2015-03-31 12:46 Denys Vlasenko
2015-03-31 13:49 ` Steven Rostedt
2015-03-31 13:54 ` Andy Lutomirski
0 siblings, 2 replies; 5+ messages in thread
From: Denys Vlasenko @ 2015-03-31 12:46 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Denys Vlasenko, Linus Torvalds, Steven Rostedt, Ingo Molnar,
Borislav Petkov, H. Peter Anvin, Oleg Nesterov,
Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook,
x86, linux-kernel
This is not proposed to be merged yet.
Andy, this patch is in the spirit of your crazy ideas of repurposing
instructions for the roles they weren't intended for :)
Recently I measured IRET timings and was newly "impressed"
by how slow it is. 200+ cycles. So I started thinking...
When we return from an interrupt/exception *to the kernel*,
most of what IRET does is unnecessary. CS and SS
do not need changing. And in many (most?) cases
saved RSP points right at the top of pt_regs,
or (top of pt_regs+8).
In which case we can (ab)use POPF and RET!
Please see the patch.
It has some ifdefed-out code which shows that, if we could be sure
we aren't on an IST stack, the check for stack alignment could be much simpler.
Since this patch is an RFC, I did not remove this bit;
it illustrates some alternatives / future ideas.
I did not measure this, but it must be a win. A big one.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Alexei Starovoitov <ast@plumgrid.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
arch/x86/kernel/entry_64.S | 47 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 020872b..b7ee959 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -750,6 +750,53 @@ retint_kernel:
* The iretq could re-enable interrupts:
*/
TRACE_IRQS_IRETQ
+
+ /*
+ * Since we return to kernel, CS and SS do not need changing.
+ * Only RSP, RIP and RFLAGS do.
+ * We can use POPF + near RET, which is much faster.
+ * The below code may seem excessive, but IRET is _very_ slow.
+ * Hundreds of cycles.
+ *
+ * However, there is a complication. Interrupts in 64-bit mode
+ * align stack to 16 bytes. This changes location
+ * where we need to store EFLAGS and RIP:
+ */
+#if 0
+ testb $8, RSP(%rsp)
+ jnz 1f
+#else
+ /* There is a complication #2: 64-bit mode has IST stacks */
+ leaq SIZEOF_PTREGS+8(%rsp), %rax
+ cmpq %rax, RSP(%rsp)
+ je 1f
+ subq $8, %rax
+ cmpq %rax, RSP(%rsp)
+ jne restore_args /* probably IST stack, can't optimize */
+#endif
+ /* there is no padding above iret frame */
+ movq EFLAGS(%rsp), %rax
+ movq RIP(%rsp), %rcx
+ movq %rax, (SIZEOF_PTREGS-2*8)(%rsp)
+ movq %rcx, (SIZEOF_PTREGS-1*8)(%rsp)
+ CFI_REMEMBER_STATE
+ RESTORE_C_REGS
+ REMOVE_PT_GPREGS_FROM_STACK 4*8 /* remove all except last two words */
+ popfq_cfi
+ retq
+ CFI_RESTORE_STATE
+1: /* there are 8 bytes of padding above iret frame */
+ movq EFLAGS(%rsp), %rax
+ movq RIP(%rsp), %rcx
+ movq %rax, (SIZEOF_PTREGS-2*8 + 8)(%rsp)
+ movq %rcx, (SIZEOF_PTREGS-1*8 + 8)(%rsp)
+ CFI_REMEMBER_STATE
+ RESTORE_C_REGS
+ REMOVE_PT_GPREGS_FROM_STACK 4*8 + 8
+ popfq_cfi
+ retq
+ CFI_RESTORE_STATE
+
restore_args:
RESTORE_C_REGS
REMOVE_PT_GPREGS_FROM_STACK 8
--
1.8.1.4
* Re: [RFC PATCH] x86: optimize IRET returns to kernel
2015-03-31 12:46 [RFC PATCH] x86: optimize IRET returns to kernel Denys Vlasenko
@ 2015-03-31 13:49 ` Steven Rostedt
2015-03-31 13:54 ` Andy Lutomirski
1 sibling, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2015-03-31 13:49 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Andy Lutomirski, Linus Torvalds, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, x86, linux-kernel
On Tue, 31 Mar 2015 14:46:21 +0200
Denys Vlasenko <dvlasenk@redhat.com> wrote:
> @@ -750,6 +750,53 @@ retint_kernel:
> * The iretq could re-enable interrupts:
> */
> TRACE_IRQS_IRETQ
> +
> + /*
> + * Since we return to kernel, CS and SS do not need changing.
> + * Only RSP, RIP and RFLAGS do.
> + * We can use POPF + near RET, which is much faster.
> + * The below code may seem excessive, but IRET is _very_ slow.
> + * Hundreds of cycles.
> + *
> + * However, there is a complication. Interrupts in 64-bit mode
> + * align stack to 16 bytes. This changes location
> + * where we need to store EFLAGS and RIP:
> + */
> +#if 0
> + testb $8, RSP(%rsp)
Shouldn't this be: testb $0xf, RSP(%rsp) ?
The stack should be word (8 bytes) aligned, but I never like to assume
anything. As an interrupt can come in anywhere, and %rsp can be
modified to anything in assembly, I wouldn't want some "hack" that
performs a non-word-aligned rsp manipulation to suddenly break. That
would be rather hard to debug.
> + jnz 1f
> +#else
> + /* There is a complication #2: 64-bit mode has IST stacks */
> + leaq SIZEOF_PTREGS+8(%rsp), %rax
> + cmpq %rax, RSP(%rsp)
> + je 1f
> + subq $8, %rax
> + cmpq %rax, RSP(%rsp)
> + jne restore_args /* probably IST stack, can't optimize */
> +#endif
> + /* there is no padding above iret frame */
> + movq EFLAGS(%rsp), %rax
> + movq RIP(%rsp), %rcx
> + movq %rax, (SIZEOF_PTREGS-2*8)(%rsp)
> + movq %rcx, (SIZEOF_PTREGS-1*8)(%rsp)
> + CFI_REMEMBER_STATE
> + RESTORE_C_REGS
> + REMOVE_PT_GPREGS_FROM_STACK 4*8 /* remove all except last two words */
> + popfq_cfi
> + retq
BTW, have you made sure that this path has been hit?
-- Steve
> + CFI_RESTORE_STATE
> +1: /* there are 8 bytes of padding above iret frame */
> + movq EFLAGS(%rsp), %rax
> + movq RIP(%rsp), %rcx
> + movq %rax, (SIZEOF_PTREGS-2*8 + 8)(%rsp)
> + movq %rcx, (SIZEOF_PTREGS-1*8 + 8)(%rsp)
> + CFI_REMEMBER_STATE
> + RESTORE_C_REGS
> + REMOVE_PT_GPREGS_FROM_STACK 4*8 + 8
> + popfq_cfi
> + retq
> + CFI_RESTORE_STATE
> +
> restore_args:
> RESTORE_C_REGS
> REMOVE_PT_GPREGS_FROM_STACK 8
* Re: [RFC PATCH] x86: optimize IRET returns to kernel
2015-03-31 12:46 [RFC PATCH] x86: optimize IRET returns to kernel Denys Vlasenko
2015-03-31 13:49 ` Steven Rostedt
@ 2015-03-31 13:54 ` Andy Lutomirski
2015-03-31 15:59 ` Denys Vlasenko
1 sibling, 1 reply; 5+ messages in thread
From: Andy Lutomirski @ 2015-03-31 13:54 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
linux-kernel@vger.kernel.org
On Tue, Mar 31, 2015 at 5:46 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This is not proposed to be merged yet.
>
> Andy, this patch is in the spirit of your crazy ideas of repurposing
> instructions for the roles they weren't intended for :)
>
> Recently I measured IRET timings and was newly "impressed"
> by how slow it is. 200+ cycles. So I started thinking...
>
> When we return from an interrupt/exception *to the kernel*,
> most of what IRET does is unnecessary. CS and SS
> do not need changing. And in many (most?) cases
> saved RSP points right at the top of pt_regs,
> or (top of pt_regs+8).
>
> In which case we can (ab)use POPF and RET!
>
> Please see the patch.
I have an old attempt at this here:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=fast-return-to-kernel&id=6cfe29821979c42cd812878e05577f69f99fafaf
If I were doing it again, I'd add a bit more care: if saved eflags
have RF set (can kgdb do that?), then we have to use iret. Your patch
may need that. Sadly, this may kill any attempt to completely prevent
NMI nesting.
I think that, if returning to IF=1, you need to do sti;ret to avoid an
infinite stack usage failure in which, during an IRQ storm, each IRQ
adds around one word of stack utilization because you haven't done the
ret yet before the next IRQ comes in. To make that robust, I'd adjust
the NMI code to clear IF and back up one instruction if it interrupts
after sti.
This should dramatically speed up in-kernel page faults as well as IRQ
handling from kernel mode.
--Andy
* Re: [RFC PATCH] x86: optimize IRET returns to kernel
2015-03-31 13:54 ` Andy Lutomirski
@ 2015-03-31 15:59 ` Denys Vlasenko
2015-04-04 16:54 ` Andy Lutomirski
0 siblings, 1 reply; 5+ messages in thread
From: Denys Vlasenko @ 2015-03-31 15:59 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
linux-kernel@vger.kernel.org
On 03/31/2015 03:54 PM, Andy Lutomirski wrote:
> On Tue, Mar 31, 2015 at 5:46 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> This is not proposed to be merged yet.
>>
>> Andy, this patch is in the spirit of your crazy ideas of repurposing
>> instructions for the roles they weren't intended for :)
>>
>> Recently I measured IRET timings and was newly "impressed"
>> by how slow it is. 200+ cycles. So I started thinking...
>>
>> When we return from an interrupt/exception *to the kernel*,
>> most of what IRET does is unnecessary. CS and SS
>> do not need changing. And in many (most?) cases
>> saved RSP points right at the top of pt_regs,
>> or (top of pt_regs+8).
>>
>> In which case we can (ab)use POPF and RET!
>>
>> Please see the patch.
>
> I have an old attempt at this here:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=fast-return-to-kernel&id=6cfe29821979c42cd812878e05577f69f99fafaf
Your version is better :/
I'd only suggest s/pop %rsp/mov (%rsp),%rsp/
I suspect "pop %rsp" is not an easy insn for CPU to digest.
> If I were doing it again, I'd add a bit more care: if saved eflags
> have RF set (can kgdb do that?), then we have to use iret.
Good idea, we can even be paranoid and jump to real IRET if any
of "unusual" flags are set.
> I think that, if returning to IF=1, you need to do sti;ret to avoid an
> infinite stack usage failure in which, during an IRQ storm, each IRQ
> adds around one word of stack utilization because you haven't done the
> ret yet before the next IRQ comes in. To make that robust, I'd adjust
> the NMI code to clear IF and back up one instruction if it interrupts
> after sti.
I kinda hoped POPF is secretly a shadowing insn too.
Experiments show it is not.
* Re: [RFC PATCH] x86: optimize IRET returns to kernel
2015-03-31 15:59 ` Denys Vlasenko
@ 2015-04-04 16:54 ` Andy Lutomirski
0 siblings, 0 replies; 5+ messages in thread
From: Andy Lutomirski @ 2015-04-04 16:54 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Linus Torvalds, Steven Rostedt, Ingo Molnar, Borislav Petkov,
H. Peter Anvin, Oleg Nesterov, Frederic Weisbecker,
Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
linux-kernel@vger.kernel.org
On Tue, Mar 31, 2015 at 8:59 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 03/31/2015 03:54 PM, Andy Lutomirski wrote:
>> On Tue, Mar 31, 2015 at 5:46 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>> This is not proposed to be merged yet.
>>>
>>> Andy, this patch is in the spirit of your crazy ideas of repurposing
>>> instructions for the roles they weren't intended for :)
>>>
>>> Recently I measured IRET timings and was newly "impressed"
>>> by how slow it is. 200+ cycles. So I started thinking...
>>>
>>> When we return from an interrupt/exception *to the kernel*,
>>> most of what IRET does is unnecessary. CS and SS
>>> do not need changing. And in many (most?) cases
>>> saved RSP points right at the top of pt_regs,
>>> or (top of pt_regs+8).
>>>
>>> In which case we can (ab)use POPF and RET!
>>>
>>> Please see the patch.
>>
>> I have an old attempt at this here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=fast-return-to-kernel&id=6cfe29821979c42cd812878e05577f69f99fafaf
>
> Your version is better :/
>
> I'd only suggest s/pop %rsp/mov (%rsp),%rsp/
>
> I suspect "pop %rsp" is not an easy insn for CPU to digest.
>
>> If I were doing it again, I'd add a bit more care: if saved eflags
>> have RF set (can kgdb do that?), then we have to use iret.
>
> Good idea, we can even be paranoid and jump to real IRET if any
> of "unusual" flags are set.
>
>> I think that, if returning to IF=1, you need to do sti;ret to avoid an
>> infinite stack usage failure in which, during an IRQ storm, each IRQ
>> adds around one word of stack utilization because you haven't done the
>> ret yet before the next IRQ comes in. To make that robust, I'd adjust
>> the NMI code to clear IF and back up one instruction if it interrupts
>> after sti.
>
> I kinda hoped POPF is secretly a shadowing insn too.
> Experiments show it is not.
>
I'll fiddle with this some more at some point. First I want to get
rid of IST for #DB and #BP, which will reduce the number of funny
cases to think about. I hope to have patches for that ready shortly
after the next merge window closes.
--
Andy Lutomirski
AMA Capital Management, LLC