* [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline
2023-06-22 0:36 [PATCH v3 0/3] x86/speculation: Disable IBRS when idle Waiman Long
@ 2023-06-22 0:36 ` Waiman Long
2023-06-22 2:05 ` Randy Dunlap
2023-06-22 5:40 ` Josh Poimboeuf
2023-06-22 0:36 ` [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
2023-06-22 0:36 ` [PATCH v3 3/3] intel_idle: Add ibrs_off module parameter to force disable IBRS Waiman Long
2 siblings, 2 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 0:36 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
Jacob Pan, Len Brown
Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU
becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
is enabled. That will impact the performance of a sibling CPU. Mitigate
this performance impact by clearing all the mitigation bits in SPEC_CTRL
MSR when offline. When the CPU is online again, it will be re-initialized
and so restoring the SPEC_CTRL value isn't needed.
Add a comment to say that native_play_dead() is a __noreturn function,
but it can't be marked as such to avoid confusion about the missing
MSR restoration code.
Signed-off-by: Waiman Long <longman@redhat.com>
---
arch/x86/kernel/smpboot.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..7bc33885518c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
#include <asm/hw_irq.h>
#include <asm/stackprotector.h>
#include <asm/sev.h>
+#include <asm/nospec-branch.h>
/* representing HT siblings of each logical CPU */
DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
}
}
+/*
+ * naitve_play_dead() is essentially a __noreturn function, but it can't
+ * be marked as such as the compiler may complain about it.
+ */
void native_play_dead(void)
{
+ if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+ this_cpu_write(x86_spec_ctrl_current, 0);
+ native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+ }
+
play_dead_common();
tboot_shutdown(TB_SHUTDOWN_WFS);
--
2.31.1
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline
2023-06-22 0:36 ` [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline Waiman Long
@ 2023-06-22 2:05 ` Randy Dunlap
2023-06-22 2:13 ` Waiman Long
2023-06-22 5:40 ` Josh Poimboeuf
1 sibling, 1 reply; 15+ messages in thread
From: Randy Dunlap @ 2023-06-22 2:05 UTC (permalink / raw)
To: Waiman Long, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
Pawan Gupta, Jacob Pan, Len Brown
Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On 6/21/23 17:36, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU
> becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
> is enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline. When the CPU is online again, it will be re-initialized
> and so restoring the SPEC_CTRL value isn't needed.
>
> Add a comment to say that native_play_dead() is a __noreturn function,
> but it can't be marked as such to avoid confusion about the missing
> MSR restoration code.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> arch/x86/kernel/smpboot.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..7bc33885518c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
> #include <asm/hw_irq.h>
> #include <asm/stackprotector.h>
> #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>
> /* representing HT siblings of each logical CPU */
> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
> }
> }
>
> +/*
> + * naitve_play_dead() is essentially a __noreturn function, but it can't
typo: native_play_dead()
> + * be marked as such as the compiler may complain about it.
> + */
> void native_play_dead(void)
> {
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + this_cpu_write(x86_spec_ctrl_current, 0);
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
> +
> play_dead_common();
> tboot_shutdown(TB_SHUTDOWN_WFS);
>
--
~Randy
^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline
2023-06-22 2:05 ` Randy Dunlap
@ 2023-06-22 2:13 ` Waiman Long
0 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 2:13 UTC (permalink / raw)
To: Randy Dunlap, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, Josh Poimboeuf,
Pawan Gupta, Jacob Pan, Len Brown
Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On 6/21/23 22:05, Randy Dunlap wrote:
>
> On 6/21/23 17:36, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU
>> becomes offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS
>> is enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline. When the CPU is online again, it will be re-initialized
>> and so restoring the SPEC_CTRL value isn't needed.
>>
>> Add a comment to say that native_play_dead() is a __noreturn function,
>> but it can't be marked as such to avoid confusion about the missing
>> MSR restoration code.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> arch/x86/kernel/smpboot.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..7bc33885518c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>> #include <asm/hw_irq.h>
>> #include <asm/stackprotector.h>
>> #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>
>> /* representing HT siblings of each logical CPU */
>> DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1836,8 +1837,17 @@ void __noreturn hlt_play_dead(void)
>> }
>> }
>>
>> +/*
>> + * naitve_play_dead() is essentially a __noreturn function, but it can't
> typo: native_play_dead()
Thanks for spotting the typo. Will fix it in the next version.
Cheers,
Longman
>
>> + * be marked as such as the compiler may complain about it.
>> + */
>> void native_play_dead(void)
>> {
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + this_cpu_write(x86_spec_ctrl_current, 0);
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
>> +
>> play_dead_common();
>> tboot_shutdown(TB_SHUTDOWN_WFS);
>>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline
2023-06-22 0:36 ` [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline Waiman Long
2023-06-22 2:05 ` Randy Dunlap
@ 2023-06-22 5:40 ` Josh Poimboeuf
2023-06-22 12:27 ` Waiman Long
1 sibling, 1 reply; 15+ messages in thread
From: Josh Poimboeuf @ 2023-06-22 5:40 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On Wed, Jun 21, 2023 at 08:36:01PM -0400, Waiman Long wrote:
> +/*
> + * naitve_play_dead() is essentially a __noreturn function, but it can't
> + * be marked as such as the compiler may complain about it.
> + */
FWIW, we could in theory do so by marking the smp_ops.play_dead function
pointer as __noreturn, but it would be tricky to teach objtool how to
understand that.
> void native_play_dead(void)
> {
> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> + this_cpu_write(x86_spec_ctrl_current, 0);
> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
Can update_spec_ctrl() be used instead?
--
Josh
^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline
2023-06-22 5:40 ` Josh Poimboeuf
@ 2023-06-22 12:27 ` Waiman Long
0 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 12:27 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On 6/22/23 01:40, Josh Poimboeuf wrote:
> On Wed, Jun 21, 2023 at 08:36:01PM -0400, Waiman Long wrote:
>> +/*
>> + * naitve_play_dead() is essentially a __noreturn function, but it can't
>> + * be marked as such as the compiler may complain about it.
>> + */
> FWIW, we could in theory do so by marking the smp_ops.play_dead function
> pointer as __noreturn, but it would be tricky to teach objtool how to
> understand that.
I added the comment here because I had taken out the MSR restoration
code. We can always replace that later on if there is a better way to do
that.
>
>> void native_play_dead(void)
>> {
>> + if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> + this_cpu_write(x86_spec_ctrl_current, 0);
>> + native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
> Can update_spec_ctrl() be used instead?
Yes, the code is similar to what has been done in update_spec_ctrl().
Using it, however, will require exporting the function either by putting
it into a public header or making it a global function.
Cheers,
Longman
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 0:36 [PATCH v3 0/3] x86/speculation: Disable IBRS when idle Waiman Long
2023-06-22 0:36 ` [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline Waiman Long
@ 2023-06-22 0:36 ` Waiman Long
2023-06-22 5:46 ` Josh Poimboeuf
2023-06-22 0:36 ` [PATCH v3 3/3] intel_idle: Add ibrs_off module parameter to force disable IBRS Waiman Long
2 siblings, 1 reply; 15+ messages in thread
From: Waiman Long @ 2023-06-22 0:36 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
Jacob Pan, Len Brown
Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long
When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
0 in order to disable IBRS. However, the new MSR value isn't reflected
in x86_spec_ctrl_current, which is at odds with the other code that
keeps track of its state in that percpu variable. Fix that by updating
the x86_spec_ctrl_current percpu value to always match the content of
the SPEC_CTRL MSR.
Signed-off-by: Waiman Long <longman@redhat.com>
---
drivers/idle/intel_idle.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index aa2d19db2b1d..07fa23707b3c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
u64 spec_ctrl = spec_ctrl_current();
int ret;
- if (smt_active)
+ if (smt_active) {
+ __this_cpu_write(x86_spec_ctrl_current, 0);
native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+ }
ret = __intel_idle(dev, drv, index);
- if (smt_active)
+ if (smt_active) {
native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+ __this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
+ }
return ret;
}
--
2.31.1
^ permalink raw reply related	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 0:36 ` [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
@ 2023-06-22 5:46 ` Josh Poimboeuf
2023-06-22 9:38 ` Peter Zijlstra
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Josh Poimboeuf @ 2023-06-22 5:46 UTC (permalink / raw)
To: Waiman Long
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
> 0 in order to disable IBRS. However, the new MSR value isn't reflected
> in x86_spec_ctrl_current, which is at odds with the other code that
> keeps track of its state in that percpu variable. Fix that by updating
> the x86_spec_ctrl_current percpu value to always match the content of
> the SPEC_CTRL MSR.
Is this fixing an actual bug or is there some other reason for doing
this?
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> drivers/idle/intel_idle.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index aa2d19db2b1d..07fa23707b3c 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> u64 spec_ctrl = spec_ctrl_current();
> int ret;
>
> - if (smt_active)
> + if (smt_active) {
> + __this_cpu_write(x86_spec_ctrl_current, 0);
> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
>
> ret = __intel_idle(dev, drv, index);
>
> - if (smt_active)
> + if (smt_active) {
> native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> + __this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> + }
More candidates for update_spec_ctrl()?
--
Josh
^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 5:46 ` Josh Poimboeuf
@ 2023-06-22 9:38 ` Peter Zijlstra
2023-06-22 12:43 ` Waiman Long
2023-06-22 13:08 ` Peter Zijlstra
2023-06-22 9:40 ` Peter Zijlstra
2023-06-22 12:31 ` Waiman Long
2 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2023-06-22 9:38 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Waiman Long, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On Wed, Jun 21, 2023 at 10:46:33PM -0700, Josh Poimboeuf wrote:
> On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
> > When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
> > 0 in order to disable IBRS. However, the new MSR value isn't reflected
> > in x86_spec_ctrl_current, which is at odds with the other code that
> > keeps track of its state in that percpu variable. Fix that by updating
> > the x86_spec_ctrl_current percpu value to always match the content of
> > the SPEC_CTRL MSR.
>
> Is this fixing an actual bug or is there some other reason for doing
> this?
>
> >
> > Signed-off-by: Waiman Long <longman@redhat.com>
> > ---
> > drivers/idle/intel_idle.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> > index aa2d19db2b1d..07fa23707b3c 100644
> > --- a/drivers/idle/intel_idle.c
> > +++ b/drivers/idle/intel_idle.c
> > @@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
> > u64 spec_ctrl = spec_ctrl_current();
> > int ret;
> >
> > - if (smt_active)
> > + if (smt_active) {
> > + __this_cpu_write(x86_spec_ctrl_current, 0);
> > native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > + }
> >
> > ret = __intel_idle(dev, drv, index);
> >
> > - if (smt_active)
> > + if (smt_active) {
> > native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> > + __this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> > + }
>
> More candidates for update_spec_ctrl()?
Both this and the play_dead case can't use update_spec_ctrl() because
RCU isn't there anymore and all that is noinstr. Additionally, both
sites rely on preemption being off already, where update_spec_ctrl()
can't do that.
That said, I suppose one could write it like so:
static __always_inline void __update_spec_ctrl(u64 val)
{
__this_cpu_write(x86_spec_ctrl_current, val);
native_wrmsrl(MSR_IA32_SPEC_CTRL, val);
}
static void update_spec_ctrl(u64 val)
{
preempt_disable();
__update_spec_ctrl(val);
preempt_enable();
}
And then you can use __update_spec_ctrl(). But that would need a wee
audit of using native_wrmsrl() in all places, probably ok, IIRC Xen
wasn't using our IBRS stuff anyway.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 9:38 ` Peter Zijlstra
@ 2023-06-22 12:43 ` Waiman Long
2023-06-22 13:08 ` Peter Zijlstra
1 sibling, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 12:43 UTC (permalink / raw)
To: Peter Zijlstra, Josh Poimboeuf
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown, linux-kernel,
x86, linux-pm, Robin Jarry, Joe Mario
On 6/22/23 05:38, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:46:33PM -0700, Josh Poimboeuf wrote:
>> On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
>>> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
>>> 0 in order to disable IBRS. However, the new MSR value isn't reflected
>>> in x86_spec_ctrl_current, which is at odds with the other code that
>>> keeps track of its state in that percpu variable. Fix that by updating
>>> the x86_spec_ctrl_current percpu value to always match the content of
>>> the SPEC_CTRL MSR.
>> Is this fixing an actual bug or is there some other reason for doing
>> this?
>>
>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>> ---
>>> drivers/idle/intel_idle.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>>> index aa2d19db2b1d..07fa23707b3c 100644
>>> --- a/drivers/idle/intel_idle.c
>>> +++ b/drivers/idle/intel_idle.c
>>> @@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
>>> u64 spec_ctrl = spec_ctrl_current();
>>> int ret;
>>>
>>> - if (smt_active)
>>> + if (smt_active) {
>>> + __this_cpu_write(x86_spec_ctrl_current, 0);
>>> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>> + }
>>>
>>> ret = __intel_idle(dev, drv, index);
>>>
>>> - if (smt_active)
>>> + if (smt_active) {
>>> native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>>> + __this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>>> + }
>> More candidates for update_spec_ctrl()?
> Both this and the play_dead case can't use update_spec_ctrl() because
> RCU isn't there anymore and all that is noinstr. Additionally, both
> sites rely on preemption being off already, where update_spec_ctrl()
> can't do that.
>
> That said, I suppose one could write it like so:
>
> static __always_inline void __update_spec_ctrl(u64 val)
> {
> __this_cpu_write(x86_spec_ctrl_current, val);
> native_wrmsrl(MSR_IA32_SPEC_CTRL, val);
> }
We can put this into asm/nospec-branch.h since x86_spec_ctrl_current is
defined there as well.
Cheers,
Longman
^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 9:38 ` Peter Zijlstra
2023-06-22 12:43 ` Waiman Long
@ 2023-06-22 13:08 ` Peter Zijlstra
1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2023-06-22 13:08 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Waiman Long, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On Thu, Jun 22, 2023 at 11:38:28AM +0200, Peter Zijlstra wrote:
> Both this and the play_dead case can't use update_spec_ctrl() because
> RCU isn't there anymore and all that is noinstr. Additionally, both
> sites rely on preemption being off already, where update_spec_ctrl()
> can't do that.
Oh, I might be wrong about the preemption thing; it doesn't make sense
to do wrmsr with preemption on, so it could be simpler.
> That said, I suppose one could write it like so:
>
> static __always_inline void __update_spec_ctrl(u64 val)
> {
> __this_cpu_write(x86_spec_ctrl_current, val);
> native_wrmsrl(MSR_IA32_SPEC_CTRL, val);
> }
>
> static void update_spec_ctrl(u64 val)
> {
> preempt_disable();
> __update_spec_ctrl(val);
> preempt_enable();
> }
>
> And then you can use __update_spec_ctrl(). But that would need a wee
> audit of using native_wrmsrl() in all places, probably ok, IIRC Xen
> wasn't using our IBRS stuff anyway.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 5:46 ` Josh Poimboeuf
2023-06-22 9:38 ` Peter Zijlstra
@ 2023-06-22 9:40 ` Peter Zijlstra
2023-06-22 12:34 ` Waiman Long
2023-06-22 12:31 ` Waiman Long
2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2023-06-22 9:40 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Waiman Long, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On Wed, Jun 21, 2023 at 10:46:33PM -0700, Josh Poimboeuf wrote:
> On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
> > When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
> > 0 in order to disable IBRS. However, the new MSR value isn't reflected
> > in x86_spec_ctrl_current, which is at odds with the other code that
> > keeps track of its state in that percpu variable. Fix that by updating
> > the x86_spec_ctrl_current percpu value to always match the content of
> > the SPEC_CTRL MSR.
>
> Is this fixing an actual bug or is there some other reason for doing
> this?
No actual bug, he did this for his debugfs file -- which is no longer
part of the series. With that on, you can observe the
x86_spec_ctrl_current value while idle.
Arguably that's not even inconsistent because we disable/enable the
thing with IRQs disabled, so nothing can observe the difference.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 9:40 ` Peter Zijlstra
@ 2023-06-22 12:34 ` Waiman Long
0 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 12:34 UTC (permalink / raw)
To: Peter Zijlstra, Josh Poimboeuf
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown, linux-kernel,
x86, linux-pm, Robin Jarry, Joe Mario
On 6/22/23 05:40, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:46:33PM -0700, Josh Poimboeuf wrote:
>> On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
>>> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
>>> 0 in order to disable IBRS. However, the new MSR value isn't reflected
>>> in x86_spec_ctrl_current, which is at odds with the other code that
>>> keeps track of its state in that percpu variable. Fix that by updating
>>> the x86_spec_ctrl_current percpu value to always match the content of
>>> the SPEC_CTRL MSR.
>> Is this fixing an actual bug or is there some other reason for doing
>> this?
> No actual bug, he did this for his debugfs file -- which is no longer
> part of the series. With that on, you can observe the
> x86_spec_ctrl_current value while idle.
Right. That is the main reason, as I want the SPEC_CTRL MSR's value to be
observable by some external means :-)
Regards,
Longman
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
2023-06-22 5:46 ` Josh Poimboeuf
2023-06-22 9:38 ` Peter Zijlstra
2023-06-22 9:40 ` Peter Zijlstra
@ 2023-06-22 12:31 ` Waiman Long
2 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 12:31 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Pawan Gupta, Jacob Pan, Len Brown,
linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario
On 6/22/23 01:46, Josh Poimboeuf wrote:
> On Wed, Jun 21, 2023 at 08:36:02PM -0400, Waiman Long wrote:
>> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to
>> 0 in order to disable IBRS. However, the new MSR value isn't reflected
>> in x86_spec_ctrl_current, which is at odds with the other code that
>> keeps track of its state in that percpu variable. Fix that by updating
>> the x86_spec_ctrl_current percpu value to always match the content of
>> the SPEC_CTRL MSR.
> Is this fixing an actual bug or is there some other reason for doing
> this?
It is not a bug per se. It is mainly to make the per-cpu variable more
up to date.
>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> drivers/idle/intel_idle.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
>> index aa2d19db2b1d..07fa23707b3c 100644
>> --- a/drivers/idle/intel_idle.c
>> +++ b/drivers/idle/intel_idle.c
>> @@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
>> u64 spec_ctrl = spec_ctrl_current();
>> int ret;
>>
>> - if (smt_active)
>> + if (smt_active) {
>> + __this_cpu_write(x86_spec_ctrl_current, 0);
>> native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> + }
>>
>> ret = __intel_idle(dev, drv, index);
>>
>> - if (smt_active)
>> + if (smt_active) {
>> native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>> + __this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>> + }
> More candidates for update_spec_ctrl()?
I don't think we can use update_spec_ctrl() here simply because of the
noinstr requirement. See commit 9b461a6faae7 ("cpuidle, intel_idle: Fix
CPUIDLE_FLAG_IBRS").
Cheers,
Longman
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v3 3/3] intel_idle: Add ibrs_off module parameter to force disable IBRS
2023-06-22 0:36 [PATCH v3 0/3] x86/speculation: Disable IBRS when idle Waiman Long
2023-06-22 0:36 ` [PATCH v3 1/3] x86/idle: Disable IBRS when cpu is offline Waiman Long
2023-06-22 0:36 ` [PATCH v3 2/3] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
@ 2023-06-22 0:36 ` Waiman Long
2 siblings, 0 replies; 15+ messages in thread
From: Waiman Long @ 2023-06-22 0:36 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
Jacob Pan, Len Brown
Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add an "ibrs_off" module parameter that, when set, force-disables IBRS
and clears the CPUIDLE_FLAG_IRQ_ENABLE flag.
In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQs will now
be disabled.
SPECjbb2015 was run with cstates set to C1 on a Skylake system.
First, the kernel was booted with "intel_idle.ibrs_off":
max-jOPS = 117828, critical-jOPS = 66047
Then the test was rerun with the kernel booted without
"intel_idle.ibrs_off":
max-jOPS = 116408, critical-jOPS = 58958
That means booting with "intel_idle.ibrs_off" improves performance by:
max-jOPS: 1.2%, which could be considered within the noise range.
critical-jOPS: 12%, which is definitely a solid improvement.
The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/pm/intel_idle.rst | 17 ++++++++++++++++-
drivers/idle/intel_idle.c | 14 ++++++++++++--
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index b799a43da62e..8604e6d1fe2c 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -170,7 +170,7 @@ and ``idle=nomwait``. If any of them is present in the kernel command line, the
``MWAIT`` instruction is not allowed to be used, so the initialization of
``intel_idle`` will fail.
-Apart from that there are four module parameters recognized by ``intel_idle``
+Apart from that there are five module parameters recognized by ``intel_idle``
itself that can be set via the kernel command line (they cannot be updated via
sysfs, so that is the only way to change their values).
@@ -216,6 +216,21 @@ are ignored).
The idle states disabled this way can be enabled (on a per-CPU basis) from user
space via ``sysfs``.
+The ``ibrs_off`` module parameter is a boolean flag (defaults to false). It
+is used to control whether IBRS (Indirect Branch Restricted Speculation)
+should be turned off when the CPU enters an idle state. This flag does not
+affect CPUs that are using Enhanced IBRS, which can remain on with little
+performance impact.
+
+For some CPUs, IBRS will be selected as the mitigation for the Spectre v2
+and Retbleed security vulnerabilities by default. Leaving the IBRS mode on
+while idling may have a performance impact on its sibling CPU. The IBRS mode
+will be turned off by default when the CPU enters a deep idle state, but not
+in some shallower ones. Setting the ``ibrs_off`` module parameter will force
+the IBRS mode to off when the CPU is in any one of the available idle states.
+This may help the performance of a sibling CPU at the expense of a slightly
+higher wakeup latency for the idle CPU.
+
.. _intel-idle-core-and-package-idle-states:
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 07fa23707b3c..8bbf1e6d845c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -69,6 +69,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
static unsigned int disabled_states_mask __read_mostly;
static unsigned int preferred_states_mask __read_mostly;
static bool force_irq_on __read_mostly;
+static bool ibrs_off __read_mostly;
static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
@@ -1907,12 +1908,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_xstate;
} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
- state->flags & CPUIDLE_FLAG_IBRS) {
+ ((state->flags & CPUIDLE_FLAG_IBRS) || ibrs_off)) {
/*
* IBRS mitigation requires that C-states are entered
* with interrupts disabled.
*/
- WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+ if (ibrs_off && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+ state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
+ else
+ WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
state->enter = intel_idle_ibrs;
} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
state->enter = intel_idle_irq;
@@ -2165,3 +2169,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
* 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
*/
module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(ibrs_off, bool, 0444);
+MODULE_PARM_DESC(ibrs_off, "Disable IBRS when idle");
--
2.31.1
^ permalink raw reply related [flat|nested] 15+ messages in thread