linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Patch V1] x86, mce: Don't clear global error reporting banks during cpu_offline
@ 2015-09-04 18:29 Ashok Raj
  2015-09-11 19:31 ` Borislav Petkov
  2015-09-29  8:37 ` [tip:ras/core] x86/mce: Don' t clear shared banks on Intel when offlining CPUs tip-bot for Ashok Raj
  0 siblings, 2 replies; 3+ messages in thread
From: Ashok Raj @ 2015-09-04 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ashok Raj, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

During CPU offline, or during suspend/resume operations, its not safe to
clear MCi_CTL. These MSR's are either thread scoped (meaning private to
thread), or core scoped (private to threads in that core only), or socket
scope i.e visible and controllable from all threads in the socket.

When we turn off during CPU_OFFLINE, just offlining a single CPU will
stop signaling for all the socket wide resources, such as LLC, iMC for e.g.

It is true for Intel CPU's. But there seems some history that other processors
may require to turn these off during every CPU offline.

Intel Secure Guard eXtentions (SGX) is worried that it might be possible to
compromise integrity in a SGX system if the attacker has control of host system
to inject errors which would be otherwise ignored when MCi_CTL bits are
cleared. Hence on SGX enabled systems, if MCi_CTL is cleared SGX becomes not
available anymore.

- Consolidated some code to use sharing
- Minor changes to some prototypes to fit usage.
- Left handling same for non-Intel CPU models to avoid any unknown regressions.
- Fixed review comments from Boris

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Serge Ayoun <serge.ayoun@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index d350858..69c7e3c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2100,7 +2100,7 @@ int __init mcheck_init(void)
  * Disable machine checks on suspend and shutdown. We can't really handle
  * them later.
  */
-static int mce_disable_error_reporting(void)
+static void mce_disable_error_reporting(void)
 {
 	int i;
 
@@ -2110,17 +2110,32 @@ static int mce_disable_error_reporting(void)
 		if (b->init)
 			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
 	}
-	return 0;
+	return;
+}
+
+static void vendor_disable_error_reporting(void)
+{
+	/*
+	 * Don't clear on Intel CPUs. Some of these MSRs are
+	 * socket wide. Disabling them for just a single CPU offline
+	 * is bad, since it will inhibit reporting for all shared
+	 * resources.. such as LLC, iMC for e.g.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return;
+
+	mce_disable_error_reporting();
 }
 
 static int mce_syscore_suspend(void)
 {
-	return mce_disable_error_reporting();
+	vendor_disable_error_reporting();
+	return 0;
 }
 
 static void mce_syscore_shutdown(void)
 {
-	mce_disable_error_reporting();
+	vendor_disable_error_reporting();
 }
 
 /*
@@ -2400,19 +2415,14 @@ static void mce_device_remove(unsigned int cpu)
 static void mce_disable_cpu(void *h)
 {
 	unsigned long action = *(unsigned long *)h;
-	int i;
 
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_clear();
-	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
 
-		if (b->init)
-			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
-	}
+	vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void *h)
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Patch V1] x86, mce: Don't clear global error reporting banks during cpu_offline
  2015-09-04 18:29 [Patch V1] x86, mce: Don't clear global error reporting banks during cpu_offline Ashok Raj
@ 2015-09-11 19:31 ` Borislav Petkov
  2015-09-29  8:37 ` [tip:ras/core] x86/mce: Don' t clear shared banks on Intel when offlining CPUs tip-bot for Ashok Raj
  1 sibling, 0 replies; 3+ messages in thread
From: Borislav Petkov @ 2015-09-11 19:31 UTC (permalink / raw)
  To: Ashok Raj; +Cc: linux-kernel, Boris Petkov, linux-edac, Tony Luck, Serge Ayoun

On Fri, Sep 04, 2015 at 02:29:50PM -0400, Ashok Raj wrote:
> During CPU offline, or during suspend/resume operations, its not safe to
> clear MCi_CTL. These MSR's are either thread scoped (meaning private to
> thread), or core scoped (private to threads in that core only), or socket
> scope i.e visible and controllable from all threads in the socket.
> 
> When we turn off during CPU_OFFLINE, just offlining a single CPU will
> stop signaling for all the socket wide resources, such as LLC, iMC for e.g.
> 
> It is true for Intel CPU's. But there seems some history that other processors
> may require to turn these off during every CPU offline.
> 
> Intel Secure Guard eXtentions (SGX) is worried that it might be possible to
> compromise integrity in a SGX system if the attacker has control of host system
> to inject errors which would be otherwise ignored when MCi_CTL bits are
> cleared. Hence on SGX enabled systems, if MCi_CTL is cleared SGX becomes not
> available anymore.
> 
> - Consolidated some code to use sharing
> - Minor changes to some prototypes to fit usage.
> - Left handling same for non-Intel CPU models to avoid any unknown regressions.
> - Fixed review comments from Boris

I massaged the text a bit more and ended up committing this:

---
From: Ashok Raj <ashok.raj@intel.com>
Date: Fri, 4 Sep 2015 14:29:50 -0400
Subject: [PATCH] x86/mce: Don't clear shared banks on Intel when offlining
 CPUs

It is not safe to clear global MCi_CTL banks during CPU offline or
suspend/resume operations. These MSRs are either thread-scoped (meaning
private to a thread), or core-scoped (private to threads in that core
only), or with a socket scope: visible and controllable from all threads
in the socket.

When we offline a single CPU, clearing those MCi_CTL bits will stop
signaling for all the shared, i.e., socket-wide resources, such as LLC,
iMC, etc.

In addition, it might be possible to compromise the integrity of an
Intel Secure Guard eXtentions (SGX) system if the attacker has control
of the host system and is able to inject errors which would be otherwise
ignored when MCi_CTL bits are cleared.

Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets disabled.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Serge Ayoun <serge.ayoun@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.com
[ Cleanup text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9d014b82a124..17b5ec6edcb6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2042,7 +2042,7 @@ int __init mcheck_init(void)
  * Disable machine checks on suspend and shutdown. We can't really handle
  * them later.
  */
-static int mce_disable_error_reporting(void)
+static void mce_disable_error_reporting(void)
 {
 	int i;
 
@@ -2052,17 +2052,32 @@ static int mce_disable_error_reporting(void)
 		if (b->init)
 			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
 	}
-	return 0;
+	return;
+}
+
+static void vendor_disable_error_reporting(void)
+{
+	/*
+	 * Don't clear on Intel CPUs. Some of these MSRs are socket-wide.
+	 * Disabling them for just a single offlined CPU is bad, since it will
+	 * inhibit reporting for all shared resources on the socket like the
+	 * last level cache (LLC), the integrated memory controller (iMC), etc.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return;
+
+	mce_disable_error_reporting();
 }
 
 static int mce_syscore_suspend(void)
 {
-	return mce_disable_error_reporting();
+	vendor_disable_error_reporting();
+	return 0;
 }
 
 static void mce_syscore_shutdown(void)
 {
-	mce_disable_error_reporting();
+	vendor_disable_error_reporting();
 }
 
 /*
@@ -2342,19 +2357,14 @@ static void mce_device_remove(unsigned int cpu)
 static void mce_disable_cpu(void *h)
 {
 	unsigned long action = *(unsigned long *)h;
-	int i;
 
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_clear();
-	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
 
-		if (b->init)
-			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
-	}
+	vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void *h)
-- 
2.3.5


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [tip:ras/core] x86/mce: Don' t clear shared banks on Intel when offlining CPUs
  2015-09-04 18:29 [Patch V1] x86, mce: Don't clear global error reporting banks during cpu_offline Ashok Raj
  2015-09-11 19:31 ` Borislav Petkov
@ 2015-09-29  8:37 ` tip-bot for Ashok Raj
  1 sibling, 0 replies; 3+ messages in thread
From: tip-bot for Ashok Raj @ 2015-09-29  8:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-edac, bp, brgerst, hpa, bp, tglx, peterz, linux-kernel,
	ashok.raj, mingo, dvlasenk, tony.luck, torvalds, luto,
	serge.ayoun

Commit-ID:  6e06780a98f149f131d46c1108d4ae27f05a9357
Gitweb:     http://git.kernel.org/tip/6e06780a98f149f131d46c1108d4ae27f05a9357
Author:     Ashok Raj <ashok.raj@intel.com>
AuthorDate: Mon, 28 Sep 2015 09:21:43 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 28 Sep 2015 10:15:26 +0200

x86/mce: Don't clear shared banks on Intel when offlining CPUs

It is not safe to clear global MCi_CTL banks during CPU offline
or suspend/resume operations. These MSRs are either
thread-scoped (meaning private to a thread), or core-scoped
(private to threads in that core only), or with a socket scope:
visible and controllable from all threads in the socket.

When we offline a single CPU, clearing those MCi_CTL bits will
stop signaling for all the shared, i.e., socket-wide resources,
such as LLC, iMC, etc.

In addition, it might be possible to compromise the integrity of
an Intel Secure Guard eXtentions (SGX) system if the attacker
has control of the host system and is able to inject errors
which would be otherwise ignored when MCi_CTL bits are cleared.

Hence on SGX enabled systems, if MCi_CTL is cleared, SGX gets
disabled.

Tested-by: Serge Ayoun <serge.ayoun@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
[ Cleanup text. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1441391390-16985-1-git-send-email-ashok.raj@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9d014b82..17b5ec6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -2042,7 +2042,7 @@ int __init mcheck_init(void)
  * Disable machine checks on suspend and shutdown. We can't really handle
  * them later.
  */
-static int mce_disable_error_reporting(void)
+static void mce_disable_error_reporting(void)
 {
 	int i;
 
@@ -2052,17 +2052,32 @@ static int mce_disable_error_reporting(void)
 		if (b->init)
 			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
 	}
-	return 0;
+	return;
+}
+
+static void vendor_disable_error_reporting(void)
+{
+	/*
+	 * Don't clear on Intel CPUs. Some of these MSRs are socket-wide.
+	 * Disabling them for just a single offlined CPU is bad, since it will
+	 * inhibit reporting for all shared resources on the socket like the
+	 * last level cache (LLC), the integrated memory controller (iMC), etc.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return;
+
+	mce_disable_error_reporting();
 }
 
 static int mce_syscore_suspend(void)
 {
-	return mce_disable_error_reporting();
+	vendor_disable_error_reporting();
+	return 0;
 }
 
 static void mce_syscore_shutdown(void)
 {
-	mce_disable_error_reporting();
+	vendor_disable_error_reporting();
 }
 
 /*
@@ -2342,19 +2357,14 @@ static void mce_device_remove(unsigned int cpu)
 static void mce_disable_cpu(void *h)
 {
 	unsigned long action = *(unsigned long *)h;
-	int i;
 
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
 
 	if (!(action & CPU_TASKS_FROZEN))
 		cmci_clear();
-	for (i = 0; i < mca_cfg.banks; i++) {
-		struct mce_bank *b = &mce_banks[i];
 
-		if (b->init)
-			wrmsrl(MSR_IA32_MCx_CTL(i), 0);
-	}
+	vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void *h)

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-09-29  8:42 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-04 18:29 [Patch V1] x86, mce: Don't clear global error reporting banks during cpu_offline Ashok Raj
2015-09-11 19:31 ` Borislav Petkov
2015-09-29  8:37 ` [tip:ras/core] x86/mce: Don' t clear shared banks on Intel when offlining CPUs tip-bot for Ashok Raj

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).