* [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
@ 2025-04-02 17:24 Andrew Cooper
2025-04-02 17:35 ` Dave Hansen
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Andrew Cooper @ 2025-04-02 17:24 UTC (permalink / raw)
To: LKML
Cc: Andrew Cooper, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin
Commit 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
workaround, add barriers") adds barriers, justified with:
... and add memory barriers around it since the documentation is explicit
that CLFLUSH is only ordered with respect to MFENCE.
This also triggered the same adjustment in commit
f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched
IPIs") during development, although it failed to get the static_cpu_has_bug()
treatment.
X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
and the SDM currently states:
Executions of the CLFLUSH instruction are ordered with respect to each
other and with respect to writes, locked read-modify-write instructions,
and fence instructions[1].
With footnote 1 reading:
Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above.
i.e. The SDM was incorrect at the time, and barriers should not have been
inserted. Double checking the original AAI65 errata (not available from
intel.com any more) shows no mention of barriers either.
Note: If this were a general codepath, the MFENCEs would be needed, because
AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
Furthermore, use a plain alternative, rather than static_cpu_has_bug() and/or
no optimisation. The workaround is a single instruction.
Use an explicit %rax pointer rather than a general memory operand, because
MONITOR takes the pointer implicitly in the same way.
Link: https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
Fixes: f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: linux-kernel@vger.kernel.org
v2:
* Fix the same pattern in mwait_idle() too
* Expand on why we're not using a general memory operand.
---
arch/x86/include/asm/mwait.h | 11 +++++------
arch/x86/kernel/process.c | 10 ++++------
2 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index ce857ef54cf1..108d25bd4597 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,13 +116,12 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
- if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
- mb();
- clflush((void *)¤t_thread_info()->flags);
- mb();
- }
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ const void *addr = ¤t_thread_info()->flags;
+
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR,
+ [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
if (ecx & 1) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 91f6ff618852..a1a41b3d94d2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -907,13 +907,11 @@ static __init bool prefer_mwait_c1_over_halt(void)
static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
- if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
- mb(); /* quirk */
- clflush((void *)¤t_thread_info()->flags);
- mb(); /* quirk */
- }
+ const void *addr = ¤t_thread_info()->flags;
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR,
+ [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
__sti_mwait(0, 0);
raw_local_irq_disable();
base-commit: acc4d5ff0b61eb1715c498b6536c38c1feb7f3c1
--
2.39.5
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-02 17:24 [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Andrew Cooper
@ 2025-04-02 17:35 ` Dave Hansen
2025-04-02 20:04 ` Ingo Molnar
2025-04-02 19:35 ` [tip: x86/mm] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt() tip-bot2 for Andrew Cooper
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2025-04-02 17:35 UTC (permalink / raw)
To: Andrew Cooper, LKML
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin
On 4/2/25 10:24, Andrew Cooper wrote:
...
> i.e. The SDM was incorrect at the time, and barriers should not have been
> inserted. Double checking the original AAI65 errata (not available from
> intel.com any more) shows no mention of barriers either.
>
> Note: If this were a general codepath, the MFENCEs would be needed, because
> AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
>
> Furthermore, use a plain alternative, rather than static_cpu_has_bug() and/or
> no optimisation. The workaround is a single instruction.
Nit: this never comes out and says that it removes the unnecessary
barriers. But we can fix that up when we apply this by adding:
Remove the unnecessary barriers. Furthermore, use a plain
alternative, rather than static_cpu_has_bug() and/or
no optimisation. The workaround is a single instruction.
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* [tip: x86/mm] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
2025-04-02 17:24 [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Andrew Cooper
2025-04-02 17:35 ` Dave Hansen
@ 2025-04-02 19:35 ` tip-bot2 for Andrew Cooper
2025-04-02 20:32 ` tip-bot2 for Andrew Cooper
2025-04-04 14:14 ` [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Uros Bizjak
3 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Andrew Cooper @ 2025-04-02 19:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: Andrew Cooper, Ingo Molnar, Dave Hansen, Borislav Petkov (AMD),
H. Peter Anvin, Peter Zijlstra, Rik van Riel, Linus Torvalds,
Andy Lutomirski, Brian Gerst, Juergen Gross, Rafael J. Wysocki,
x86, linux-kernel
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: fda2d1d86abc2d83ce6163536cc75c1f9fec1d72
Gitweb: https://git.kernel.org/tip/fda2d1d86abc2d83ce6163536cc75c1f9fec1d72
Author: Andrew Cooper <andrew.cooper3@citrix.com>
AuthorDate: Wed, 02 Apr 2025 18:24:58 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 02 Apr 2025 21:27:49 +02:00
x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
The following commit, 12 years ago:
7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with:
... and add memory barriers around it since the documentation is explicit
that CLFLUSH is only ordered with respect to MFENCE.
This also triggered, 11 years ago, the same adjustment in:
f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
during development, although it failed to get the static_cpu_has_bug() treatment.
X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
and the SDM currently states:
Executions of the CLFLUSH instruction are ordered with respect to each
other and with respect to writes, locked read-modify-write instructions,
and fence instructions[1].
With footnote 1 reading:
Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above.
i.e. The SDM was incorrect at the time, and barriers should not have been
inserted. Double checking the original AAI65 errata (not available from
intel.com any more) shows no mention of barriers either.
Note: If this were a general codepath, the MFENCEs would be needed, because
AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
Furthermore, use a plain alternative(), rather than static_cpu_has_bug() and/or
no optimisation. The workaround is a single instruction.
Use an explicit %rax pointer rather than a general memory operand, because
MONITOR takes the pointer implicitly in the same way.
[ mingo: Cleaned up the commit a bit. ]
Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com
---
arch/x86/include/asm/mwait.h | 9 +++------
arch/x86/kernel/process.c | 9 +++------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index ce857ef..54dc313 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,13 +116,10 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
- if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
- mb();
- clflush((void *)¤t_thread_info()->flags);
- mb();
- }
+ const void *addr = ¤t_thread_info()->flags;
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
if (ecx & 1) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 91f6ff6..bda47d9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -907,13 +907,10 @@ static __init bool prefer_mwait_c1_over_halt(void)
static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
- if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
- mb(); /* quirk */
- clflush((void *)¤t_thread_info()->flags);
- mb(); /* quirk */
- }
+ const void *addr = ¤t_thread_info()->flags;
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
__sti_mwait(0, 0);
raw_local_irq_disable();
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-02 17:35 ` Dave Hansen
@ 2025-04-02 20:04 ` Ingo Molnar
0 siblings, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2025-04-02 20:04 UTC (permalink / raw)
To: Dave Hansen
Cc: Andrew Cooper, LKML, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
* Dave Hansen <dave.hansen@intel.com> wrote:
> On 4/2/25 10:24, Andrew Cooper wrote:
> ...
> > i.e. The SDM was incorrect at the time, and barriers should not have been
> > inserted. Double checking the original AAI65 errata (not available from
> > intel.com any more) shows no mention of barriers either.
> >
> > Note: If this were a general codepath, the MFENCEs would be needed, because
> > AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
> >
> > Furthermore, use a plain alternative, rather than static_cpu_has_bug() and/or
> > no optimisation. The workaround is a single instruction.
>
> Nit: this never comes out and says that it removes the unnecessary
> barriers. But we can fix that up when we apply this by adding:
>
> Remove the unnecessary barriers. Furthermore, use a plain
> alternative, rather than static_cpu_has_bug() and/or
> no optimisation. The workaround is a single instruction.
So the title says it already:
x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
But I've added in your sentence as well, because there's no such thing
as too much clarity in a changelog. :-)
> Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Thanks!
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* [tip: x86/mm] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
2025-04-02 17:24 [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Andrew Cooper
2025-04-02 17:35 ` Dave Hansen
2025-04-02 19:35 ` [tip: x86/mm] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt() tip-bot2 for Andrew Cooper
@ 2025-04-02 20:32 ` tip-bot2 for Andrew Cooper
2025-04-04 14:14 ` [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Uros Bizjak
3 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Andrew Cooper @ 2025-04-02 20:32 UTC (permalink / raw)
To: linux-tip-commits
Cc: Andrew Cooper, Ingo Molnar, Dave Hansen, Borislav Petkov (AMD),
H. Peter Anvin, Peter Zijlstra, Rik van Riel, Linus Torvalds,
Andy Lutomirski, Brian Gerst, Juergen Gross, Rafael J. Wysocki,
x86, linux-kernel
The following commit has been merged into the x86/mm branch of tip:
Commit-ID: 1f13c60d84e880df6698441026e64f84c7110c49
Gitweb: https://git.kernel.org/tip/1f13c60d84e880df6698441026e64f84c7110c49
Author: Andrew Cooper <andrew.cooper3@citrix.com>
AuthorDate: Wed, 02 Apr 2025 18:24:58 +01:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 02 Apr 2025 22:02:26 +02:00
x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt()
The following commit, 12 years ago:
7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
added barriers around the CLFLUSH in mwait_idle_with_hints(), justified with:
... and add memory barriers around it since the documentation is explicit
that CLFLUSH is only ordered with respect to MFENCE.
This also triggered, 11 years ago, the same adjustment in:
f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
during development, although it failed to get the static_cpu_has_bug() treatment.
X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
and the SDM currently states:
Executions of the CLFLUSH instruction are ordered with respect to each
other and with respect to writes, locked read-modify-write instructions,
and fence instructions[1].
With footnote 1 reading:
Earlier versions of this manual specified that executions of the CLFLUSH
instruction were ordered only by the MFENCE instruction. All processors
implementing the CLFLUSH instruction also order it relative to the other
operations enumerated above.
i.e. The SDM was incorrect at the time, and barriers should not have been
inserted. Double checking the original AAI65 errata (not available from
intel.com any more) shows no mention of barriers either.
Note: If this were a general codepath, the MFENCEs would be needed, because
AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
Remove the unnecessary barriers. Furthermore, use a plain alternative(),
rather than static_cpu_has_bug() and/or no optimisation. The workaround
is a single instruction.
Use an explicit %rax pointer rather than a general memory operand, because
MONITOR takes the pointer implicitly in the same way.
[ mingo: Cleaned up the commit a bit. ]
Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20250402172458.1378112-1-andrew.cooper3@citrix.com
---
arch/x86/include/asm/mwait.h | 9 +++------
arch/x86/kernel/process.c | 9 +++------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index ce857ef..54dc313 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -116,13 +116,10 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
- if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
- mb();
- clflush((void *)¤t_thread_info()->flags);
- mb();
- }
+ const void *addr = ¤t_thread_info()->flags;
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
if (ecx & 1) {
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 91f6ff6..bda47d9 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -907,13 +907,10 @@ static __init bool prefer_mwait_c1_over_halt(void)
static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
- if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
- mb(); /* quirk */
- clflush((void *)¤t_thread_info()->flags);
- mb(); /* quirk */
- }
+ const void *addr = ¤t_thread_info()->flags;
- __monitor((void *)¤t_thread_info()->flags, 0, 0);
+ alternative_input("", "clflush (%[addr])", X86_BUG_CLFLUSH_MONITOR, [addr] "a" (addr));
+ __monitor(addr, 0, 0);
if (!need_resched()) {
__sti_mwait(0, 0);
raw_local_irq_disable();
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-02 17:24 [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Andrew Cooper
` (2 preceding siblings ...)
2025-04-02 20:32 ` tip-bot2 for Andrew Cooper
@ 2025-04-04 14:14 ` Uros Bizjak
2025-04-04 14:16 ` Andrew Cooper
2025-04-06 17:21 ` Ingo Molnar
3 siblings, 2 replies; 9+ messages in thread
From: Uros Bizjak @ 2025-04-04 14:14 UTC (permalink / raw)
To: Andrew Cooper, LKML
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin
On 2. 04. 25 19:24, Andrew Cooper wrote:
> Commit 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
> workaround, add barriers") adds barriers, justified with:
>
> ... and add memory barriers around it since the documentation is explicit
> that CLFLUSH is only ordered with respect to MFENCE.
>
> This also triggered the same adjustment in commit
> f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched
> IPIs") during development, although it failed to get the static_cpu_has_bug()
> treatment.
>
> X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel CPUs,
> and the SDM currently states:
>
> Executions of the CLFLUSH instruction are ordered with respect to each
> other and with respect to writes, locked read-modify-write instructions,
> and fence instructions[1].
>
> With footnote 1 reading:
>
> Earlier versions of this manual specified that executions of the CLFLUSH
> instruction were ordered only by the MFENCE instruction. All processors
> implementing the CLFLUSH instruction also order it relative to the other
> operations enumerated above.
>
> i.e. The SDM was incorrect at the time, and barriers should not have been
> inserted. Double checking the original AAI65 errata (not available from
> intel.com any more) shows no mention of barriers either.
>
> Note: If this were a general codepath, the MFENCEs would be needed, because
> AMD CPUs of the same vintage do sport otherwise-unordered CLFLUSHs.
>
> Furthermore, use a plain alternative, rather than static_cpu_has_bug() and/or
> no optimisation. The workaround is a single instruction.
>
> Use an explicit %rax pointer rather than a general memory operand, because
> MONITOR takes the pointer implicitly in the same way.
>
> Link: https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
> Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers")
> Fixes: f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: Dave Hansen <dave.hansen@linux.intel.com>
> CC: x86@kernel.org
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: linux-kernel@vger.kernel.org
>
> v2:
> * Fix the same pattern in mwait_idle() too
> * Expand on why we're not using a general memory operand.
> ---
> arch/x86/include/asm/mwait.h | 11 +++++------
> arch/x86/kernel/process.c | 10 ++++------
> 2 files changed, 9 insertions(+), 12 deletions(-)
There is another instance of the same sequence in arch/x86/kernel/smpboot.c:
/*
* The CLFLUSH is a workaround for erratum AAI65 for
* the Xeon 7400 series. It's not clear it is actually
* needed, but it should be harmless in either case.
* The WBINVD is insufficient due to the spurious-wakeup
* case where we return around the loop.
*/
mb();
clflush(md);
mb();
__monitor(md, 0, 0);
mb();
__mwait(eax_hint, 0);
Should this also be converted to the new sequence?
Uros.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-04 14:14 ` [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Uros Bizjak
@ 2025-04-04 14:16 ` Andrew Cooper
2025-04-04 14:49 ` Uros Bizjak
2025-04-06 17:21 ` Ingo Molnar
1 sibling, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2025-04-04 14:16 UTC (permalink / raw)
To: Uros Bizjak, LKML
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin
On 04/04/2025 3:14 pm, Uros Bizjak wrote:
>
>
> On 2. 04. 25 19:24, Andrew Cooper wrote:
>> Commit 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
>> workaround, add barriers") adds barriers, justified with:
>>
>> ... and add memory barriers around it since the documentation is
>> explicit
>> that CLFLUSH is only ordered with respect to MFENCE.
>>
>> This also triggered the same adjustment in commit
>> f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched
>> IPIs") during development, although it failed to get the
>> static_cpu_has_bug()
>> treatment.
>>
>> X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel
>> CPUs,
>> and the SDM currently states:
>>
>> Executions of the CLFLUSH instruction are ordered with respect to
>> each
>> other and with respect to writes, locked read-modify-write
>> instructions,
>> and fence instructions[1].
>>
>> With footnote 1 reading:
>>
>> Earlier versions of this manual specified that executions of the
>> CLFLUSH
>> instruction were ordered only by the MFENCE instruction. All
>> processors
>> implementing the CLFLUSH instruction also order it relative to the
>> other
>> operations enumerated above.
>>
>> i.e. The SDM was incorrect at the time, and barriers should not have
>> been
>> inserted. Double checking the original AAI65 errata (not available from
>> intel.com any more) shows no mention of barriers either.
>>
>> Note: If this were a general codepath, the MFENCEs would be needed,
>> because
>> AMD CPUs of the same vintage do sport otherwise-unordered
>> CLFLUSHs.
>>
>> Furthermore, use a plain alternative, rather than
>> static_cpu_has_bug() and/or
>> no optimisation. The workaround is a single instruction.
>>
>> Use an explicit %rax pointer rather than a general memory operand,
>> because
>> MONITOR takes the pointer implicitly in the same way.
>>
>> Link:
>> https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
>> Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
>> workaround, add barriers")
>> Fixes: f8e617f45829 ("sched/idle/x86: Optimize unnecessary
>> mwait_idle() resched IPIs")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Thomas Gleixner <tglx@linutronix.de>
>> CC: Ingo Molnar <mingo@redhat.com>
>> CC: Borislav Petkov <bp@alien8.de>
>> CC: Dave Hansen <dave.hansen@linux.intel.com>
>> CC: x86@kernel.org
>> CC: "H. Peter Anvin" <hpa@zytor.com>
>> CC: linux-kernel@vger.kernel.org
>>
>> v2:
>> * Fix the same pattern in mwait_idle() too
>> * Expand on why we're not using a general memory operand.
>> ---
>> arch/x86/include/asm/mwait.h | 11 +++++------
>> arch/x86/kernel/process.c | 10 ++++------
>> 2 files changed, 9 insertions(+), 12 deletions(-)
>
> There is another instance of the same sequence in
> arch/x86/kernel/smpboot.c:
>
> /*
> * The CLFLUSH is a workaround for erratum AAI65 for
> * the Xeon 7400 series. It's not clear it is actually
> * needed, but it should be harmless in either case.
> * The WBINVD is insufficient due to the spurious-wakeup
> * case where we return around the loop.
> */
> mb();
> clflush(md);
> mb();
> __monitor(md, 0, 0);
> mb();
> __mwait(eax_hint, 0);
>
> Should this also be converted to the new sequence?
Yes, it ought to.
I'm OoO for a while though. If you fancy doing a patch, please go
ahead. Or a maintainer could fold a 3rd hunk into this patch?
~Andrew
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-04 14:16 ` Andrew Cooper
@ 2025-04-04 14:49 ` Uros Bizjak
0 siblings, 0 replies; 9+ messages in thread
From: Uros Bizjak @ 2025-04-04 14:49 UTC (permalink / raw)
To: Andrew Cooper
Cc: LKML, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
x86, H. Peter Anvin
On Fri, Apr 4, 2025 at 4:16 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 04/04/2025 3:14 pm, Uros Bizjak wrote:
> >
> >
> > On 2. 04. 25 19:24, Andrew Cooper wrote:
> >> Commit 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
> >> workaround, add barriers") adds barriers, justified with:
> >>
> >> ... and add memory barriers around it since the documentation is
> >> explicit
> >> that CLFLUSH is only ordered with respect to MFENCE.
> >>
> >> This also triggered the same adjustment in commit
> >> f8e617f45829 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched
> >> IPIs") during development, although it failed to get the
> >> static_cpu_has_bug()
> >> treatment.
> >>
> >> X86_BUG_CLFLUSH_MONITOR (a.k.a the AAI65 errata) is specific to Intel
> >> CPUs,
> >> and the SDM currently states:
> >>
> >> Executions of the CLFLUSH instruction are ordered with respect to
> >> each
> >> other and with respect to writes, locked read-modify-write
> >> instructions,
> >> and fence instructions[1].
> >>
> >> With footnote 1 reading:
> >>
> >> Earlier versions of this manual specified that executions of the
> >> CLFLUSH
> >> instruction were ordered only by the MFENCE instruction. All
> >> processors
> >> implementing the CLFLUSH instruction also order it relative to the
> >> other
> >> operations enumerated above.
> >>
> >> i.e. The SDM was incorrect at the time, and barriers should not have
> >> been
> >> inserted. Double checking the original AAI65 errata (not available from
> >> intel.com any more) shows no mention of barriers either.
> >>
> >> Note: If this were a general codepath, the MFENCEs would be needed,
> >> because
> >> AMD CPUs of the same vintage do sport otherwise-unordered
> >> CLFLUSHs.
> >>
> >> Furthermore, use a plain alternative, rather than
> >> static_cpu_has_bug() and/or
> >> no optimisation. The workaround is a single instruction.
> >>
> >> Use an explicit %rax pointer rather than a general memory operand,
> >> because
> >> MONITOR takes the pointer implicitly in the same way.
> >>
> >> Link:
> >> https://web.archive.org/web/20090219054841/http://download.intel.com/design/xeon/specupdt/32033601.pdf
> >> Fixes: 7e98b7192046 ("x86, idle: Use static_cpu_has() for CLFLUSH
> >> workaround, add barriers")
> >> Fixes: f8e617f45829 ("sched/idle/x86: Optimize unnecessary
> >> mwait_idle() resched IPIs")
> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> >> ---
> >> CC: Thomas Gleixner <tglx@linutronix.de>
> >> CC: Ingo Molnar <mingo@redhat.com>
> >> CC: Borislav Petkov <bp@alien8.de>
> >> CC: Dave Hansen <dave.hansen@linux.intel.com>
> >> CC: x86@kernel.org
> >> CC: "H. Peter Anvin" <hpa@zytor.com>
> >> CC: linux-kernel@vger.kernel.org
> >>
> >> v2:
> >> * Fix the same pattern in mwait_idle() too
> >> * Expand on why we're not using a general memory operand.
> >> ---
> >> arch/x86/include/asm/mwait.h | 11 +++++------
> >> arch/x86/kernel/process.c | 10 ++++------
> >> 2 files changed, 9 insertions(+), 12 deletions(-)
> >
> > There is another instance of the same sequence in
> > arch/x86/kernel/smpboot.c:
> >
> > /*
> > * The CLFLUSH is a workaround for erratum AAI65 for
> > * the Xeon 7400 series. It's not clear it is actually
> > * needed, but it should be harmless in either case.
> > * The WBINVD is insufficient due to the spurious-wakeup
> > * case where we return around the loop.
> > */
> > mb();
> > clflush(md);
> > mb();
> > __monitor(md, 0, 0);
> > mb();
> > __mwait(eax_hint, 0);
> >
> > Should this also be converted to the new sequence?
>
> Yes, it ought to.
>
> I'm OoO for a while though. If you fancy doing a patch, please go
> ahead. Or a maintainer could fold a 3rd hunk into this patch?
Heh, I learned the hard way not to touch the code I don't understand. ;)
Uros.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR
2025-04-04 14:14 ` [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Uros Bizjak
2025-04-04 14:16 ` Andrew Cooper
@ 2025-04-06 17:21 ` Ingo Molnar
1 sibling, 0 replies; 9+ messages in thread
From: Ingo Molnar @ 2025-04-06 17:21 UTC (permalink / raw)
To: Uros Bizjak
Cc: Andrew Cooper, LKML, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin
* Uros Bizjak <ubizjak@gmail.com> wrote:
> There is another instance of the same sequence in arch/x86/kernel/smpboot.c:
>
> /*
> * The CLFLUSH is a workaround for erratum AAI65 for
> * the Xeon 7400 series. It's not clear it is actually
> * needed, but it should be harmless in either case.
> * The WBINVD is insufficient due to the spurious-wakeup
> * case where we return around the loop.
> */
> mb();
> clflush(md);
> mb();
> __monitor(md, 0, 0);
> mb();
> __mwait(eax_hint, 0);
True, but I wouldn't touch that - the shutdown path is not a
performance critical code path in any fashion, and it's often
difficult to debug on real hardware, so the cost/benefit factor
is abnormally high.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-04-06 17:21 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-02 17:24 [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Andrew Cooper
2025-04-02 17:35 ` Dave Hansen
2025-04-02 20:04 ` Ingo Molnar
2025-04-02 19:35 ` [tip: x86/mm] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR in mwait_idle_with_hints() and prefer_mwait_c1_over_halt() tip-bot2 for Andrew Cooper
2025-04-02 20:32 ` tip-bot2 for Andrew Cooper
2025-04-04 14:14 ` [PATCH v2] x86/idle: Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR Uros Bizjak
2025-04-04 14:16 ` Andrew Cooper
2025-04-04 14:49 ` Uros Bizjak
2025-04-06 17:21 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox