public inbox for linux-kernel@vger.kernel.org
* [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
@ 2018-01-08 20:15 Andi Kleen
  2018-01-08 21:38 ` David Woodhouse
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andi Kleen @ 2018-01-08 20:15 UTC (permalink / raw)
  To: dwmw
  Cc: pjt, linux-kernel, gregkh, tim.c.chen, dave.hansen, tglx, luto,
	Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

[This is on top of David's retpoline branch, as of 08-01 this morning]

This patch further hardens retpoline.

CPUs have return buffers which store the return address for
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.

With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.

This can happen when we're context switching from a shallower to a
deeper kernel stack.  The deeper kernel stack would eventually underflow
the return buffer, which again would fall back to the indirect branch predictor.
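As a purely illustrative aside (not kernel code; the class, names, and fallback string below are a simplified model of the behaviour described above), the underflow can be pictured as a small fixed-size stack:

```python
# Toy model of a CPU return stack buffer (RSB). Real parts discussed
# here have a depth of 32 entries; everything else is illustrative.
class ReturnStackBuffer:
    def __init__(self, depth=32):
        self.depth = depth
        self.entries = []

    def call(self, return_addr):
        # A CALL pushes its return address; the oldest entry is lost
        # once the buffer is full.
        self.entries.append(return_addr)
        if len(self.entries) > self.depth:
            self.entries.pop(0)

    def ret(self):
        # A RET pops a prediction. On underflow, affected CPUs fall
        # back to the (poisonable) indirect branch predictor.
        if self.entries:
            return self.entries.pop()
        return "indirect-branch-predictor"  # the case retpoline must avoid

rsb = ReturnStackBuffer(depth=32)
rsb.call(0x1000)
assert rsb.ret() == 0x1000
assert rsb.ret() == "indirect-branch-predictor"  # underflow
```

A task that made more CALLs than RETs before switching away leaves the next task with too few entries, which is exactly the situation the fill below papers over.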

To guard against this fill the return buffer with controlled
content during context switch. This prevents any underflows.

We always fill the buffer with 30 entries: 32 minus 2 for at
least one call from entry_{64,32}.S to C code and another into
the function doing the filling.

That's pessimistic because we likely did more controlled kernel calls.
So in principle we could do less.  However it's hard to maintain such an
invariant, and it may be broken with more aggressive compilers.
So err on the side of safety and always fill 30.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/entry/entry_32.S            | 15 +++++++++++++++
 arch/x86/entry/entry_64.S            | 15 +++++++++++++++
 arch/x86/include/asm/nospec-branch.h | 29 +++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index cf9ef33d299b..5404a9b2197c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,21 @@ ENTRY(__switch_to_asm)
 	popl	%ebx
 	popl	%ebp
 
+	/*
+	 * When we switch from a shallower to a deeper call stack
+	 * the call stack will underflow in the kernel in the next task.
+	 * This could cause the CPU to fall back to indirect branch
+	 * prediction, which may be poisoned.
+	 *
+	 * To guard against that always fill the return stack with
+	 * known values.
+	 *
+	 * We do this in assembler because it needs to be before
+	 * any calls on the new stack, and this can be difficult to
+	 * ensure in a complex C function like __switch_to.
+	 */
+	ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
 	jmp	__switch_to
 END(__switch_to_asm)
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bce6ed03353..0f28d0ea57e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,21 @@ ENTRY(__switch_to_asm)
 	popq	%rbx
 	popq	%rbp
 
+	/*
+	 * When we switch from a shallower to a deeper call stack
+	 * the call stack will underflow in the kernel in the next task.
+	 * This could cause the CPU to fall back to indirect branch
+	 * prediction, which may be poisoned.
+	 *
+	 * To guard against that always fill the return stack with
+	 * known values.
+	 *
+	 * We do this in assembler because it needs to be before
+	 * any calls on the new stack, and this can be difficult to
+	 * ensure in a complex C function like __switch_to.
+	 */
+	ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
 	jmp	__switch_to
 END(__switch_to_asm)
 
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b8c8eeacb4be..e84e231248c2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -53,6 +53,35 @@
 #endif
 .endm
 
+/*
+ * We use 32-N: 32 is the max return buffer size,
+ * but there should have been at a minimum two
+ * controlled calls already: one into the kernel
+ * from entry*.S and another into the function
+ * containing this macro. So N=2, thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL	30
+
+/*
+ * Fill the CPU return branch buffer to prevent
+ * indirect branch prediction on underflow.
+ * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
+ */
+.macro FILL_RETURN_BUFFER
+#ifdef CONFIG_RETPOLINE
+	.rept	NUM_BRANCHES_TO_FILL
+	call	1221f
+	pause	/* stop speculation */
+1221:
+	.endr
+#ifdef CONFIG_64BIT
+	addq	$8*NUM_BRANCHES_TO_FILL, %rsp
+#else
+	addl    $4*NUM_BRANCHES_TO_FILL, %esp
+#endif
+#endif
+.endm
+
 #else /* __ASSEMBLY__ */
 
 #if defined(CONFIG_X86_64) && defined(RETPOLINE)
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 20:15 [PATCH] x86/retpoline: Avoid return buffer underflows on context switch Andi Kleen
@ 2018-01-08 21:38 ` David Woodhouse
  2018-01-08 22:54   ` Andi Kleen
  2018-01-08 22:11 ` Peter Zijlstra
  2018-01-12 11:15 ` David Laight
  2 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2018-01-08 21:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: pjt, linux-kernel, gregkh, tim.c.chen, dave.hansen, tglx, luto,
	Andi Kleen


On Mon, 2018-01-08 at 12:15 -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> [This is on top of David's retpoline branch, as of 08-01 this morning]
> 
> This patch further hardens retpoline.
> 
> CPUs have return buffers which store the return address for
> RET to predict function returns. Some CPUs (Skylake, some Broadwells)
> can fall back to indirect branch prediction on return buffer underflow.
> 
> With retpoline we want to avoid uncontrolled indirect branches,
> which could be poisoned by ring 3, so we need to avoid uncontrolled
> return buffer underflows in the kernel.
> 
> This can happen when we're context switching from a shallower to a
> deeper kernel stack.  The deeper kernel stack would eventually underflow
> the return buffer, which again would fall back to the indirect branch predictor.
> 
> To guard against this fill the return buffer with controlled
> content during context switch. This prevents any underflows.
> 
> We always fill the buffer with 30 entries: 32 minus 2 for at
> least one call from entry_{64,32}.S to C code and another into
> the function doing the filling.
> 
> That's pessimistic because we likely did more controlled kernel calls.
> So in principle we could do less.  However it's hard to maintain such an
> invariant, and it may be broken with more aggressive compilers.
> So err on the side of safety and always fill 30.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Thanks.

Acked-by: David Woodhouse <dwmw@amazon.co.uk>

We want this on vmexit too, right? And the IBRS/IBPB patch set is going
to want to do similar things. But picking the RSB stuffing out of that
patch set and putting it in with the retpoline support is absolutely
the right thing to do.



* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 20:15 [PATCH] x86/retpoline: Avoid return buffer underflows on context switch Andi Kleen
  2018-01-08 21:38 ` David Woodhouse
@ 2018-01-08 22:11 ` Peter Zijlstra
  2018-01-08 22:17   ` Woodhouse, David
                     ` (2 more replies)
  2018-01-12 11:15 ` David Laight
  2 siblings, 3 replies; 16+ messages in thread
From: Peter Zijlstra @ 2018-01-08 22:11 UTC (permalink / raw)
  To: Andi Kleen
  Cc: dwmw, pjt, linux-kernel, gregkh, tim.c.chen, dave.hansen, tglx,
	luto, Andi Kleen

On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index b8c8eeacb4be..e84e231248c2 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -53,6 +53,35 @@
>  #endif
>  .endm
>  
> +/*
> + * We use 32-N: 32 is the max return buffer size,
> + * but there should have been at a minimum two
> + * controlled calls already: one into the kernel
> + * from entry*.S and another into the function
> + * containing this macro. So N=2, thus 30.
> + */
> +#define NUM_BRANCHES_TO_FILL	30
> +
> +/*
> + * Fill the CPU return branch buffer to prevent
> + * indirect branch prediction on underflow.
> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
> + */
> +.macro FILL_RETURN_BUFFER
> +#ifdef CONFIG_RETPOLINE
> +	.rept	NUM_BRANCHES_TO_FILL
> +	call	1221f
> +	pause	/* stop speculation */
> +1221:
> +	.endr
> +#ifdef CONFIG_64BIT
> +	addq	$8*NUM_BRANCHES_TO_FILL, %rsp
> +#else
> +	addl    $4*NUM_BRANCHES_TO_FILL, %esp
> +#endif
> +#endif
> +.endm

So pjt did alignment, a single unroll and per discussion earlier today
(CET) or late last night (PST), he only does 16.

Why is none of that done here? Also, can we pretty please stop using
those retarded number labels, they make this stuff unreadable.

Also, pause is unlikely to stop speculation, that comment doesn't make
sense. Looking at PJT's version there used to be a speculation trap in
there, but I can't see that here.


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:11 ` Peter Zijlstra
@ 2018-01-08 22:17   ` Woodhouse, David
  2018-01-08 22:26     ` Peter Zijlstra
  2018-01-08 22:25   ` Andi Kleen
  2018-01-09  0:15   ` Paul Turner
  2 siblings, 1 reply; 16+ messages in thread
From: Woodhouse, David @ 2018-01-08 22:17 UTC (permalink / raw)
  To: peterz@infradead.org, andi@firstfloor.org
  Cc: linux-kernel@vger.kernel.org, tim.c.chen@linux.intel.com,
	tglx@linutronix.de, ak@linux.intel.com, pjt@google.com,
	dave.hansen@intel.com, luto@amacapital.net,
	gregkh@linux-foundation.org



On Mon, 2018-01-08 at 23:11 +0100, Peter Zijlstra wrote:
> 
> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.

Hey Intel, please tell us precisely how many RSB entries there are, on
each family of CPU... :)

> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.
> 
> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.

In this particular code we don't need a speculation trap; that's
elsewhere. This one is *just* about the call stack. And the reason we
don't just have...

 call . + 5
 call . + 5
 call . + 5
 ...

is because that might get interpreted as a "push %rip" and not go on
the RSB at all. Hence the 'pause' between each one.



* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:11 ` Peter Zijlstra
  2018-01-08 22:17   ` Woodhouse, David
@ 2018-01-08 22:25   ` Andi Kleen
  2018-01-08 22:58     ` Andi Kleen
  2018-01-09  0:15     ` Paul Turner
  2018-01-09  0:15   ` Paul Turner
  2 siblings, 2 replies; 16+ messages in thread
From: Andi Kleen @ 2018-01-08 22:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, dwmw, pjt, linux-kernel, gregkh, tim.c.chen,
	dave.hansen, tglx, luto

> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.

I used the Intel recommended sequence, which recommends 32. 

Not sure if alignment makes a difference. I can check. 

> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.

Personally I find the magic labels with strange ASCII characters 
far less readable than a simple number.

But can change it if you insist.

> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.

My understanding is that it stops speculation. But could also
use LFENCE.

-Andi


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:17   ` Woodhouse, David
@ 2018-01-08 22:26     ` Peter Zijlstra
  2018-01-08 23:25       ` Woodhouse, David
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2018-01-08 22:26 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	tim.c.chen@linux.intel.com, tglx@linutronix.de,
	ak@linux.intel.com, pjt@google.com, dave.hansen@intel.com,
	luto@amacapital.net, gregkh@linux-foundation.org

On Mon, Jan 08, 2018 at 10:17:19PM +0000, Woodhouse, David wrote:
> On Mon, 2018-01-08 at 23:11 +0100, Peter Zijlstra wrote:
> > 
> > So pjt did alignment, a single unroll and per discussion earlier today
> > (CET) or late last night (PST), he only does 16.
> 
> Hey Intel, please tell us precisely how many RSB entries there are, on
> each family of CPU... :)

Right, and we can always fall back to 32 for unknown models.

> > Also, pause is unlikely to stop speculation, that comment doesn't make
> > sense. Looking at PJT's version there used to be a speculation trap in
> > there, but I can't see that here.
> 
> In this particular code we don't need a speculation trap; that's
> elsewhere. This one is *just* about the call stack. And the reason we
> don't just have...
> 
>  call . + 5
>  call . + 5
>  call . + 5
>  ...
> 
> is because that might get interpreted as a "push %rip" and not go on
> the RSB at all. Hence the 'pause' between each one.

OK, then make the comment say that.


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 21:38 ` David Woodhouse
@ 2018-01-08 22:54   ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2018-01-08 22:54 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andi Kleen, pjt, linux-kernel, gregkh, tim.c.chen, dave.hansen,
	tglx, luto, Andi Kleen

> We want this on vmexit too, right? 

Yes. KVM patches are done separately.

-Andi


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:25   ` Andi Kleen
@ 2018-01-08 22:58     ` Andi Kleen
  2018-01-09  0:15     ` Paul Turner
  1 sibling, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2018-01-08 22:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Andi Kleen, dwmw, pjt, linux-kernel, gregkh,
	tim.c.chen, dave.hansen, tglx, luto

> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
> 
> Personally I find the magic labels with strange ASCII characters 
> far less readable than a simple number.

Tried it and \@ is incompatible with .rept.

-Andi


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:26     ` Peter Zijlstra
@ 2018-01-08 23:25       ` Woodhouse, David
  0 siblings, 0 replies; 16+ messages in thread
From: Woodhouse, David @ 2018-01-08 23:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: andi@firstfloor.org, linux-kernel@vger.kernel.org,
	tim.c.chen@linux.intel.com, tglx@linutronix.de,
	ak@linux.intel.com, pjt@google.com, dave.hansen@intel.com,
	luto@amacapital.net, gregkh@linux-foundation.org


On Mon, 2018-01-08 at 23:26 +0100, Peter Zijlstra wrote:
> 
> > is because that might get interpreted as a "push %rip" and not go on
> > the RSB at all. Hence the 'pause' between each one.
> 
> OK, then make the comment say that.

Fixed. I've also shifted the #ifdef CONFIG_RETPOLINE to the call sites
instead of inside the FILL_RETURN_BUFFER macro itself. This is going to
get used with IBRS code too, Real Soon Now™.

http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/6e961b86558



* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:11 ` Peter Zijlstra
  2018-01-08 22:17   ` Woodhouse, David
  2018-01-08 22:25   ` Andi Kleen
@ 2018-01-09  0:15   ` Paul Turner
  2018-01-09  0:44     ` Woodhouse, David
  2 siblings, 1 reply; 16+ messages in thread
From: Paul Turner @ 2018-01-09  0:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Woodhouse, David, LKML, Greg Kroah-Hartman, Tim Chen,
	Dave Hansen, Thomas Gleixner, Andy Lutomirski, Andi Kleen

On Mon, Jan 8, 2018 at 2:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
>> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
>> index b8c8eeacb4be..e84e231248c2 100644
>> --- a/arch/x86/include/asm/nospec-branch.h
>> +++ b/arch/x86/include/asm/nospec-branch.h
>> @@ -53,6 +53,35 @@
>>  #endif
>>  .endm
>>
>> +/*
>> + * We use 32-N: 32 is the max return buffer size,
>> + * but there should have been at a minimum two
>> + * controlled calls already: one into the kernel
>> + * from entry*.S and another into the function
>> + * containing this macro. So N=2, thus 30.
>> + */
>> +#define NUM_BRANCHES_TO_FILL 30
>> +
>> +/*
>> + * Fill the CPU return branch buffer to prevent
>> + * indirect branch prediction on underflow.
>> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
>> + */
>> +.macro FILL_RETURN_BUFFER
>> +#ifdef CONFIG_RETPOLINE
>> +     .rept   NUM_BRANCHES_TO_FILL
>> +     call    1221f
>> +     pause   /* stop speculation */
>> +1221:
>> +     .endr
>> +#ifdef CONFIG_64BIT
>> +     addq    $8*NUM_BRANCHES_TO_FILL, %rsp
>> +#else
>> +     addl    $4*NUM_BRANCHES_TO_FILL, %esp
>> +#endif
>> +#endif
>> +.endm
>
> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.
>
> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.
>
> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.
>

You definitely want the speculation traps: these entries are
potentially consumed.
Worse: The first entry that will be consumed is the last call in your
linear chain, meaning that it immediately gets to escape into
alternative execution.
(When I was experimenting with icache-minimizing constructions here I
actually used intentional backwards jumps in linear chains to avoid
this.)

The sequence I reported is what ended up seeming optimal.

>


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 22:25   ` Andi Kleen
  2018-01-08 22:58     ` Andi Kleen
@ 2018-01-09  0:15     ` Paul Turner
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Turner @ 2018-01-09  0:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Andi Kleen, Woodhouse, David, LKML,
	Greg Kroah-Hartman, Tim Chen, Dave Hansen, Thomas Gleixner,
	Andy Lutomirski

On Mon, Jan 8, 2018 at 2:25 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> So pjt did alignment, a single unroll and per discussion earlier today
>> (CET) or late last night (PST), he only does 16.
>
> I used the Intel recommended sequence, which recommends 32.
>
> Not sure if alignment makes a difference. I can check.
>
>> Why is none of that done here? Also, can we pretty please stop using
>> those retarded number labels, they make this stuff unreadable.
>
> Personally I find the magic labels with strange ASCII characters
> far less readable than a simple number.
>
> But can change it if you insist.
>
>> Also, pause is unlikely to stop speculation, that comment doesn't make
>> sense. Looking at PJT's version there used to be a speculation trap in
>> there, but I can't see that here.
>
> My understanding is that it stops speculation. But could also
> use LFENCE.
>

Neither pause nor lfence stop speculation.

> -Andi


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-09  0:15   ` Paul Turner
@ 2018-01-09  0:44     ` Woodhouse, David
  2018-01-09  0:48       ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Woodhouse, David @ 2018-01-09  0:44 UTC (permalink / raw)
  To: Paul Turner, Peter Zijlstra, Van De Ven, Arjan
  Cc: Andi Kleen, LKML, Greg Kroah-Hartman, Tim Chen, Dave Hansen,
	Thomas Gleixner, Andy Lutomirski, Andi Kleen


On Mon, 2018-01-08 at 16:15 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 2:11 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Mon, Jan 08, 2018 at 12:15:31PM -0800, Andi Kleen wrote:
> >> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> >> index b8c8eeacb4be..e84e231248c2 100644
> >> --- a/arch/x86/include/asm/nospec-branch.h
> >> +++ b/arch/x86/include/asm/nospec-branch.h
> >> @@ -53,6 +53,35 @@
> >>  #endif
> >>  .endm
> >>
> >> +/*
> >> + * We use 32-N: 32 is the max return buffer size,
> >> + * but there should have been at a minimum two
> >> + * controlled calls already: one into the kernel
> >> + * from entry*.S and another into the function
> >> + * containing this macro. So N=2, thus 30.
> >> + */
> >> +#define NUM_BRANCHES_TO_FILL 30
> >> +
> >> +/*
> >> + * Fill the CPU return branch buffer to prevent
> >> + * indirect branch prediction on underflow.
> >> + * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
> >> + */
> >> +.macro FILL_RETURN_BUFFER
> >> +#ifdef CONFIG_RETPOLINE
> >> +     .rept   NUM_BRANCHES_TO_FILL
> >> +     call    1221f
> >> +     pause   /* stop speculation */
> >> +1221:
> >> +     .endr
> >> +#ifdef CONFIG_64BIT
> >> +     addq    $8*NUM_BRANCHES_TO_FILL, %rsp
> >> +#else
> >> +     addl    $4*NUM_BRANCHES_TO_FILL, %esp
> >> +#endif
> >> +#endif
> >> +.endm
> >
> > So pjt did alignment, a single unroll and per discussion earlier today
> > (CET) or late last night (PST), he only does 16.
> >
> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
> >
> > Also, pause is unlikely to stop speculation, that comment doesn't make
> > sense. Looking at PJT's version there used to be a speculation trap in
> > there, but I can't see that here.
> >
> 
> You definitely want the speculation traps.. these entries are
> potentially consumed.
> Worse: The first entry that will be consumed is the last call in your
> linear chain, meaning that it immediately gets to escape into
> alternative execution.
> (When I was experimenting with icache-minimizing constructions here I
> actually used intentional backwards jumps in linear chains to avoid
> this.)
> 
> The sequence I reported is what ended up seeming optimal.

On IRC, Arjan assures me that 'pause' here really is sufficient as a
speculation trap. If we do end up returning back here as a
misprediction, that 'pause' will stop the speculative execution on
affected CPUs even though it isn't *architecturally* documented to do
so.

Arjan, can you confirm that in email please?



* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-09  0:44     ` Woodhouse, David
@ 2018-01-09  0:48       ` David Woodhouse
  2018-01-09  2:48         ` Paul Turner
  0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2018-01-09  0:48 UTC (permalink / raw)
  To: Paul Turner, Peter Zijlstra, Van De Ven, Arjan
  Cc: Andi Kleen, LKML, Greg Kroah-Hartman, Tim Chen, Dave Hansen,
	Thomas Gleixner, Andy Lutomirski, Andi Kleen


On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> On IRC, Arjan assures me that 'pause' here really is sufficient as a
> speculation trap. If we do end up returning back here as a
> misprediction, that 'pause' will stop the speculative execution on
> affected CPUs even though it isn't *architecturally* documented to do
> so.
> 
> Arjan, can you confirm that in email please?


That actually doesn't make sense to me. If 'pause' alone is sufficient,
then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
retpoline itself?

Arjan?
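For reference, the loop in question is the speculation trap inside the retpoline thunk itself, which looks roughly like this (sketch of the %rax variant from the retpoline series; label names invented, not copied verbatim):

```asm
	call	.Ldo_rop	/* pushes .Lspec_trap onto the RSB     */
.Lspec_trap:
	pause			/* speculative execution lands here... */
	jmp	.Lspec_trap	/* ...and spins until resteered        */
.Ldo_rop:
	mov	%rax, (%rsp)	/* replace return address with target  */
	ret			/* "return" to the real branch target  */
```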



* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-09  0:48       ` David Woodhouse
@ 2018-01-09  2:48         ` Paul Turner
  2018-01-09  3:05           ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Turner @ 2018-01-09  2:48 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Peter Zijlstra, Van De Ven, Arjan, Andi Kleen, LKML,
	Greg Kroah-Hartman, Tim Chen, Dave Hansen, Thomas Gleixner,
	Andy Lutomirski, Andi Kleen

On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
>> On IRC, Arjan assures me that 'pause' here really is sufficient as a
>> speculation trap. If we do end up returning back here as a
>> misprediction, that 'pause' will stop the speculative execution on
>> affected CPUs even though it isn't *architecturally* documented to do
>> so.
>>
>> Arjan, can you confirm that in email please?
>
>
> That actually doesn't make sense to me. If 'pause' alone is sufficient,
> then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> retpoline itself?
>
> Arjan?

On further investigation, I don't understand any of the motivation for
the changes here:
- It micro-benchmarks several cycles slower than the suggested
implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
alignment]
- It's much larger in terms of .text size (120 bytes @ 16 calls, 218
bytes @ 30 calls) vs (61 bytes)
- I'm not sure it's universally correct in preventing speculation:

(1) I am able to observe a small timing difference between executing
"1: pause; jmp 1b;" and "pause" in the speculative path.
     Given that alignment is otherwise identical, this should only
occur if execution is non-identical, which would require speculative
execution to proceed beyond the pause.
(2) When we proposed and reviewed the sequence, this was not cited by
architects as a way of preventing speculation.  Indeed, as David
points out, we'd consider using this within the sequence without the
loop.


If the claim above is true -- which (1) actually appears to contradict
-- it seems to bear stronger validation.  Particularly since, in
the suggested sequences, we can fit the jmps within the space we get
for free by aligning the call targets.


* Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-09  2:48         ` Paul Turner
@ 2018-01-09  3:05           ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2018-01-09  3:05 UTC (permalink / raw)
  To: Paul Turner
  Cc: Peter Zijlstra, Van De Ven, Arjan, Andi Kleen, LKML,
	Greg Kroah-Hartman, Tim Chen, Dave Hansen, Thomas Gleixner,
	Andy Lutomirski, Andi Kleen


On Mon, 2018-01-08 at 18:48 -0800, Paul Turner wrote:
> On Mon, Jan 8, 2018 at 4:48 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > On Tue, 2018-01-09 at 00:44 +0000, Woodhouse, David wrote:
> > > 
> > > On IRC, Arjan assures me that 'pause' here really is sufficient as a
> > > speculation trap. If we do end up returning back here as a
> > > misprediction, that 'pause' will stop the speculative execution on
> > > affected CPUs even though it isn't *architecturally* documented to do
> > > so.
> > > 
> > > Arjan, can you confirm that in email please?
> > 
> > That actually doesn't make sense to me. If 'pause' alone is sufficient,
> > then why in $DEITY's name would we need a '1:pause;jmp 1b' loop in the
> > retpoline itself?
> > 
> > Arjan?
> On further investigation, I don't understand any of the motivation for
> the changes here:
> - It micro-benchmarks several cycles slower than the suggested
> implementation on average (38 vs 44 cycles) [likely due to lost 16-byte call
> alignment]
> - It's much larger in terms of .text size (120 bytes @ 16 calls, 218
> bytes @ 30 calls) vs (61 bytes)
> - I'm not sure it's universally correct in preventing speculation:
> 
> (1) I am able to observe a small timing difference between executing
> "1: pause; jmp 1b;" and "pause" in the speculative path.
>      Given that alignment is otherwise identical, this should only
> occur if execution is non-identical, which would require speculative
> execution to proceed beyond the pause.
> (2) When we proposed and reviewed the sequence.  This was not cited by
> architects as a way of presenting speculation.  Indeed, as David
> points out, we'd consider using this within the sequence without the
> loop.
> 
> 
> If the claim above is true -- which (1) actually appears to contradict
> -- it seems to bear stronger validation.  Particularly since that in
> the suggested sequences we can fit the jmps within the space we get
> for free by aligning the call targets.

Some of the discrepancies are because it's been filtered through Intel
and I may not have had your latest version.

I'm going to revert to your version from
https://support.google.com/faqs/answer/7625886 — for the retpoline
(i.e. adding back the alignment) and the RSB stuffing.

If Intel have a sequence which is simpler and guaranteed to work, I'll
let them post that with an authoritative statement from the CPU
architects. At this point, we really need to get on with rolling in the
other parts on top.



* RE: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
  2018-01-08 20:15 [PATCH] x86/retpoline: Avoid return buffer underflows on context switch Andi Kleen
  2018-01-08 21:38 ` David Woodhouse
  2018-01-08 22:11 ` Peter Zijlstra
@ 2018-01-12 11:15 ` David Laight
  2 siblings, 0 replies; 16+ messages in thread
From: David Laight @ 2018-01-12 11:15 UTC (permalink / raw)
  To: 'Andi Kleen', dwmw@amazon.co.uk
  Cc: pjt@google.com, linux-kernel@vger.kernel.org,
	gregkh@linux-foundation.org, tim.c.chen@linux.intel.com,
	dave.hansen@intel.com, tglx@linutronix.de, luto@amacapital.net,
	Andi Kleen

From: Andi Kleen
> Sent: 08 January 2018 20:16
>
> [This is on top of David's retpoline branch, as of 08-01 this morning]
> 
> This patch further hardens retpoline
> 
> CPUs have return buffers which store the return address for
> RET to predict function returns. Some CPUs (Skylake, some Broadwells)
> can fall back to indirect branch prediction on return buffer underflow.
> 
> With retpoline we want to avoid uncontrolled indirect branches,
> which could be poisoned by ring 3, so we need to avoid uncontrolled
> return buffer underflows in the kernel.
> 
> This can happen when we're context switching from a shallower to a
> deeper kernel stack.  The deeper kernel stack would eventually underflow
> the return buffer, which again would fall back to the indirect branch predictor.
...

Is that really a usable attack vector?

Isn't it actually more likely to leak kernel addresses to userspace
in the return stack buffer - which might be usable to get around KASLR?

	David

