* [PATCH 1/9] x86, mce: Enable MCA support by default
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
@ 2011-10-19 14:50 ` Borislav Petkov
2011-10-19 14:50 ` [PATCH 2/9] x86, RAS: Start reorganizing RAS features support Borislav Petkov
` (8 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:50 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
MCA is the basic support for hardware error logging and reporting, and
it is majorly unwise to run without it, so enable machine check software
support by default on x86.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/Kconfig | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6a47bb2..afb7a19 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -841,6 +841,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
config X86_MCE
bool "Machine Check / overheating reporting"
+ default y
---help---
Machine Check support allows the processor to notify the
kernel if it detects a problem (e.g. overheating, data corruption).
--
1.7.4.rc2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
2011-10-19 14:50 ` [PATCH 1/9] x86, mce: Enable MCA support by default Borislav Petkov
@ 2011-10-19 14:50 ` Borislav Petkov
2011-10-19 17:13 ` Luck, Tony
2011-10-19 14:51 ` [PATCH 3/9] x86, RAS: Move MCE decoding code into ras/ Borislav Petkov
` (7 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:50 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Start relocating RAS features into a centralized location under
arch/x86/kernel/cpu/ras/. Readjust Kconfig items and makefiles
accordingly.
This patch moves the MCE error thresholding code to its new place. No
functionality change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
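A note on the mce.h hunk in this patch: the declaration is switched to the renamed Kconfig symbol, while the empty inline stub in the #else branch keeps call sites free of #ifdefs. Below is a minimal userspace sketch of that pattern. The names struct cpuinfo_sketch, threshold_init_calls and cpu_mce_init() are assumptions made up for illustration and are not part of the patch.

```c
#include <stdio.h>

/* Stand-in for struct cpuinfo_x86; only here so the sketch compiles. */
struct cpuinfo_sketch {
	unsigned int x86;	/* CPU family */
};

/* Hypothetical counter so the effect of the stub is observable. */
static int threshold_init_calls;

#ifdef CONFIG_X86_AMD_ERROR_THRESHOLDING
/* Option enabled: a real implementation is built. */
static void mce_amd_feature_init(struct cpuinfo_sketch *c)
{
	printf("thresholding init on family 0x%x\n", c->x86);
	threshold_init_calls++;
}
#else
/* Option disabled: an empty stub with the same signature. */
static inline void mce_amd_feature_init(struct cpuinfo_sketch *c)
{
	(void)c;
}
#endif

/* Hypothetical caller: no #ifdef is needed at the call site. */
static void cpu_mce_init(struct cpuinfo_sketch *c)
{
	mce_amd_feature_init(c);
}
```

Built without -DCONFIG_X86_AMD_ERROR_THRESHOLDING, the call in cpu_mce_init() compiles to nothing, which is why only the guard symbol has to change when the option is renamed.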
arch/x86/Kconfig | 10 ++--------
arch/x86/include/asm/mce.h | 2 +-
arch/x86/kernel/cpu/Makefile | 2 +-
arch/x86/kernel/cpu/mcheck/Makefile | 1 -
arch/x86/kernel/cpu/ras/Kconfig | 16 ++++++++++++++++
arch/x86/kernel/cpu/ras/Makefile | 1 +
arch/x86/kernel/cpu/ras/amd/Makefile | 1 +
.../{mcheck/mce_amd.c => ras/amd/thresholding.c} | 0
8 files changed, 22 insertions(+), 11 deletions(-)
create mode 100644 arch/x86/kernel/cpu/ras/Kconfig
create mode 100644 arch/x86/kernel/cpu/ras/Makefile
create mode 100644 arch/x86/kernel/cpu/ras/amd/Makefile
rename arch/x86/kernel/cpu/{mcheck/mce_amd.c => ras/amd/thresholding.c} (100%)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index afb7a19..1a70aad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -856,13 +856,7 @@ config X86_MCE_INTEL
Additional support for intel specific MCE features such as
the thermal monitor.
-config X86_MCE_AMD
- def_bool y
- prompt "AMD MCE features"
- depends on X86_MCE && X86_LOCAL_APIC
- ---help---
- Additional support for AMD specific MCE features such as
- the DRAM Error Threshold.
+source "arch/x86/kernel/cpu/ras/Kconfig"
config X86_ANCIENT_MCE
bool "Support for old Pentium 5 / WinChip machine checks"
@@ -873,7 +867,7 @@ config X86_ANCIENT_MCE
line.
config X86_MCE_THRESHOLD
- depends on X86_MCE_AMD || X86_MCE_INTEL
+ depends on X86_AMD_ERROR_THRESHOLDING || X86_MCE_INTEL
def_bool y
config X86_MCE_INJECT
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index c9321f3..545e8e4 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -174,7 +174,7 @@ static inline void cmci_rediscover(int dying) {}
static inline void cmci_recheck(void) {}
#endif
-#ifdef CONFIG_X86_MCE_AMD
+#ifdef CONFIG_X86_AMD_ERROR_THRESHOLDING
void mce_amd_feature_init(struct cpuinfo_x86 *c);
#else
static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6042981..b2832b8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o
-obj-$(CONFIG_X86_MCE) += mcheck/
+obj-$(CONFIG_X86_MCE) += mcheck/ ras/
obj-$(CONFIG_MTRR) += mtrr/
obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
diff --git a/arch/x86/kernel/cpu/mcheck/Makefile b/arch/x86/kernel/cpu/mcheck/Makefile
index bb34b03..ccd1997 100644
--- a/arch/x86/kernel/cpu/mcheck/Makefile
+++ b/arch/x86/kernel/cpu/mcheck/Makefile
@@ -2,7 +2,6 @@ obj-y = mce.o mce-severity.o
obj-$(CONFIG_X86_ANCIENT_MCE) += winchip.o p5.o
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel.o
-obj-$(CONFIG_X86_MCE_AMD) += mce_amd.o
obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
obj-$(CONFIG_X86_MCE_INJECT) += mce-inject.o
diff --git a/arch/x86/kernel/cpu/ras/Kconfig b/arch/x86/kernel/cpu/ras/Kconfig
new file mode 100644
index 0000000..e58c4ea
--- /dev/null
+++ b/arch/x86/kernel/cpu/ras/Kconfig
@@ -0,0 +1,16 @@
+menu "AMD RAS features"
+ depends on X86_MCE && CPU_SUP_AMD
+
+config X86_AMD_ERROR_THRESHOLDING
+ def_bool y
+ prompt "Error Thresholding"
+ depends on X86_LOCAL_APIC
+ ---help---
+ Support hardware-maintained counters of some types of hw errors.
+ Currently, there are three groups: DRAM, Link and L3 errors. For more
+ detailed information see the section on Error Thresholding in
+ the respective AMD BKDG.
+
+endmenu
+
+
diff --git a/arch/x86/kernel/cpu/ras/Makefile b/arch/x86/kernel/cpu/ras/Makefile
new file mode 100644
index 0000000..dd7a321
--- /dev/null
+++ b/arch/x86/kernel/cpu/ras/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_X86_MCE) += amd/
diff --git a/arch/x86/kernel/cpu/ras/amd/Makefile b/arch/x86/kernel/cpu/ras/amd/Makefile
new file mode 100644
index 0000000..3c1678f
--- /dev/null
+++ b/arch/x86/kernel/cpu/ras/amd/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_X86_AMD_ERROR_THRESHOLDING) += thresholding.o
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/ras/amd/thresholding.c
similarity index 100%
rename from arch/x86/kernel/cpu/mcheck/mce_amd.c
rename to arch/x86/kernel/cpu/ras/amd/thresholding.c
--
1.7.4.rc2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* RE: [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 14:50 ` [PATCH 2/9] x86, RAS: Start reorganizing RAS features support Borislav Petkov
@ 2011-10-19 17:13 ` Luck, Tony
2011-10-19 17:22 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2011-10-19 17:13 UTC (permalink / raw)
To: Borislav Petkov, EDAC devel; +Cc: Ingo Molnar, X86-ML, LKML, Borislav Petkov
> Start relocating RAS features into a centralized location under
> arch/x86/kernel/cpu/ras/. Readjust Kconfig items and makefiles
> accordingly.
Not all ras features are "cpu" orientated ... should we really be
moving to "arch/x86/kernel/ras"?
-Tony
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 17:13 ` Luck, Tony
@ 2011-10-19 17:22 ` Mauro Carvalho Chehab
2011-10-19 18:11 ` Borislav Petkov
0 siblings, 1 reply; 21+ messages in thread
From: Mauro Carvalho Chehab @ 2011-10-19 17:22 UTC (permalink / raw)
To: Luck, Tony
Cc: Borislav Petkov, EDAC devel, Ingo Molnar, X86-ML, LKML,
Borislav Petkov
On 19-10-2011 15:13, Luck, Tony wrote:
>> Start relocating RAS features into a centralized location under
>> arch/x86/kernel/cpu/ras/. Readjust Kconfig items and makefiles
>> accordingly.
>
> Not all ras features are "cpu" orientated ... should we really be
> moving to "arch/x86/kernel/ras"?
I think that it makes sense to move MCA bits into arch/x86, but I agree
with Tony: generally speaking, RAS is not even x86 specific.
It seems to make more sense to rename drivers/edac to drivers/ras and put
the RAS menu there, even if the actual support for the AMD and Intel MCA
RAS stuff is kept inside arch/x86/...
Mauro
>
> -Tony
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 17:22 ` Mauro Carvalho Chehab
@ 2011-10-19 18:11 ` Borislav Petkov
2011-10-19 19:14 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 18:11 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: Luck, Tony, Borislav Petkov, EDAC devel, Ingo Molnar, X86-ML,
LKML
On Wed, Oct 19, 2011 at 01:22:27PM -0400, Mauro Carvalho Chehab wrote:
> On 19-10-2011 15:13, Luck, Tony wrote:
> >> Start relocating RAS features into a centralized location under
> >> arch/x86/kernel/cpu/ras/. Readjust Kconfig items and makefiles
> >> accordingly.
> >
> > Not all ras features are "cpu" orientated ... should we really be
> > moving to "arch/x86/kernel/ras"?
>
> I think that it makes sense to move MCA bits into arch/x86, but I agree
> with Tony: generally speaking, RAS is not even x86 specific.
But we are :-)
> It seems to make more sense to rename drivers/edac to drivers/ras and put
> the RAS menu there, even if the actual support for the AMD and Intel MCA
> RAS stuff is kept inside arch/x86/...
drivers/edac/ contains other architectures too, patches for which we
cannot (and probably don't want to) test so the whole deal has to be
x86-centric.
I don't consider MCE decoding and injection drivers but rather MCA
functionality extensions or something, so those should go to arch/x86/
IMHO.
And the DRAM error decoding things, aka EDAC, should stay where they
are, although I cannot call them real drivers, either. We can't move
them yet anyway because they use the whole EDAC infrastructure.
Hmm...
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 18:11 ` Borislav Petkov
@ 2011-10-19 19:14 ` Mauro Carvalho Chehab
2011-10-20 15:12 ` Borislav Petkov
0 siblings, 1 reply; 21+ messages in thread
From: Mauro Carvalho Chehab @ 2011-10-19 19:14 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Luck, Tony, EDAC devel, Ingo Molnar, X86-ML, LKML
On 19-10-2011 16:11, Borislav Petkov wrote:
> On Wed, Oct 19, 2011 at 01:22:27PM -0400, Mauro Carvalho Chehab wrote:
>> On 19-10-2011 15:13, Luck, Tony wrote:
>>>> Start relocating RAS features into a centralized location under
>>>> arch/x86/kernel/cpu/ras/. Readjust Kconfig items and makefiles
>>>> accordingly.
>>>
>>> Not all ras features are "cpu" orientated ... should we really be
>>> moving to "arch/x86/kernel/ras"?
>>
>> I think that it makes sense to move MCA bits into arch/x86, but I agree
>> with Tony: generally speaking, RAS is not even x86 specific.
>
> But we are :-)
>
>> It seems to make more sense to rename drivers/edac to drivers/ras and put
>> the RAS menu there, even if the actual support for the AMD and Intel MCA
>> RAS stuff is kept inside arch/x86/...
>
> drivers/edac/ contains other architectures too, patches for which we
> cannot (and probably don't want to) test so the whole deal has to be
> x86-centric.
Most drivers there are for memory controller chipsets, so even the x86-specific
drivers there won't fit well inside arch/. It might even be possible to have an
MC driver used on more than one architecture (it just doesn't happen in practice,
because the MC is generally inside a north bridge chip that is sold to match
features found on a particular CPU family).
> I don't consider MCE decoding and injection drivers but rather MCA
> functionality extensions or something, so those should go to arch/x86/
> IMHO.
Agreed. MCE decoders and MCE error injection fit better together with
MCA bits. I think they could just be moved into the same directory where
the MCE driver is located.
> And the DRAM error decoding things, aka EDAC, should stay where they
> are, although I cannot call them real drivers, either. We can't move
> them yet anyway because they use the whole EDAC infrastructure.
True.
From the point of view of someone who wants to select the RAS features,
however, it makes sense to put everything together in the same menu.
There are some tricks that could be used, for example, having something
like:
menuconfig RAS_FEATURES
	bool "Enable RAS features"
if RAS_FEATURES
config RAS_MCE_AMD
	bool "turn on Machine Check Architecture error logic for AMD CPUs"
	depends on X86
	select X86_MCE_AMD
config RAS_MCE_INTEL
	bool "turn on Machine Check Architecture error logic for Intel CPUs"
	depends on X86
	select X86_MCE_INTEL
source "drivers/edac/Kconfig"
endif
in a drivers/ras Kconfig (or in /ras).
This would allow putting everything together in the same Kconfig menu.
Regards,
Mauro
>
> Hmm...
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/9] x86, RAS: Start reorganizing RAS features support
2011-10-19 19:14 ` Mauro Carvalho Chehab
@ 2011-10-20 15:12 ` Borislav Petkov
0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-20 15:12 UTC (permalink / raw)
To: Mauro Carvalho Chehab; +Cc: Luck, Tony, EDAC devel, Ingo Molnar, X86-ML, LKML
On Wed, Oct 19, 2011 at 03:14:08PM -0400, Mauro Carvalho Chehab wrote:
> > I don't consider MCE decoding and injection drivers but rather MCA
> > functionality extensions or something, so those should go to arch/x86/
> > IMHO.
>
> Agreed. MCE decoders and MCE error injection fit better together with
> MCA bits. I think they could just be moved into the same directory where
> the MCE driver is located.
I think you mean arch/x86/kernel/cpu/mcheck/. Well, that _is_ possible,
since they're MCA extensions.
> > And the DRAM error decoding things, aka EDAC, should stay where they
> > are, although I cannot call them real drivers, either. We can't move
> > them yet anyway because they use the whole EDAC infrastructure.
>
> True.
>
> From the point of view of someone who wants to select the RAS features,
> however, it makes sense to put everything together in the same menu.
>
> There are some tricks that could be used, for example, having something
> like:
>
> menuconfig RAS_FEATURES
> 	bool "Enable RAS features"
>
> if RAS_FEATURES
>
> config RAS_MCE_AMD
> 	bool "turn on Machine Check Architecture error logic for AMD CPUs"
> 	depends on X86
> 	select X86_MCE_AMD
>
> config RAS_MCE_INTEL
> 	bool "turn on Machine Check Architecture error logic for Intel CPUs"
> 	depends on X86
> 	select X86_MCE_INTEL
>
> source "drivers/edac/Kconfig"
>
> endif
>
> in a drivers/ras Kconfig (or in /ras).
>
> This would allow putting everything together in the same Kconfig menu.
Ok, we could do that too, it has merits.
Let's discuss all this at KS.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/9] x86, RAS: Move MCE decoding code into ras/
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
2011-10-19 14:50 ` [PATCH 1/9] x86, mce: Enable MCA support by default Borislav Petkov
2011-10-19 14:50 ` [PATCH 2/9] x86, RAS: Start reorganizing RAS features support Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 14:51 ` [PATCH 4/9] x86, RAS: Move MCE injection " Borislav Petkov
` (6 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Move the MCE decoding code into arch/x86/.../ras/.
No functionality change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
.../x86/include/ras/amd/mce-decode.h | 0
arch/x86/kernel/cpu/ras/Kconfig | 12 ++++++++++++
arch/x86/kernel/cpu/ras/amd/Makefile | 3 +++
.../x86/kernel/cpu/ras/amd/mce-decode.c | 2 +-
drivers/edac/Kconfig | 12 ------------
drivers/edac/Makefile | 3 ---
drivers/edac/amd64_edac.h | 2 +-
drivers/edac/mce_amd_inj.c | 3 +--
8 files changed, 18 insertions(+), 19 deletions(-)
rename drivers/edac/mce_amd.h => arch/x86/include/ras/amd/mce-decode.h (100%)
rename drivers/edac/mce_amd.c => arch/x86/kernel/cpu/ras/amd/mce-decode.c (99%)
diff --git a/drivers/edac/mce_amd.h b/arch/x86/include/ras/amd/mce-decode.h
similarity index 100%
rename from drivers/edac/mce_amd.h
rename to arch/x86/include/ras/amd/mce-decode.h
diff --git a/arch/x86/kernel/cpu/ras/Kconfig b/arch/x86/kernel/cpu/ras/Kconfig
index e58c4ea..440d6a1 100644
--- a/arch/x86/kernel/cpu/ras/Kconfig
+++ b/arch/x86/kernel/cpu/ras/Kconfig
@@ -11,6 +11,18 @@ config X86_AMD_ERROR_THRESHOLDING
detailed information see the section on Error Thresholding in
the respective AMD BKDG.
+config X86_AMD_DECODE_MCE
+ tristate "Decode MCEs in human-readable form"
+ default y
+ ---help---
+ Enable this option if you want to decode Machine Check Exceptions
+ occurring on your machine in a human-readable form.
+
+ You should definitely say Y here in case you want to decode MCEs
+ which occur really early upon boot, before the module infrastructure
+ has been initialized.
+
+
endmenu
diff --git a/arch/x86/kernel/cpu/ras/amd/Makefile b/arch/x86/kernel/cpu/ras/amd/Makefile
index 3c1678f..a18207b 100644
--- a/arch/x86/kernel/cpu/ras/amd/Makefile
+++ b/arch/x86/kernel/cpu/ras/amd/Makefile
@@ -1 +1,4 @@
obj-$(CONFIG_X86_AMD_ERROR_THRESHOLDING) += thresholding.o
+
+amd_mce_decode-y := mce-decode.o
+obj-$(CONFIG_X86_AMD_DECODE_MCE) += amd_mce_decode.o
diff --git a/drivers/edac/mce_amd.c b/arch/x86/kernel/cpu/ras/amd/mce-decode.c
similarity index 99%
rename from drivers/edac/mce_amd.c
rename to arch/x86/kernel/cpu/ras/amd/mce-decode.c
index 795cfbc..e435779 100644
--- a/drivers/edac/mce_amd.c
+++ b/arch/x86/kernel/cpu/ras/amd/mce-decode.c
@@ -1,7 +1,7 @@
#include <linux/module.h>
#include <linux/slab.h>
-#include "mce_amd.h"
+#include <ras/amd/mce-decode.h>
static struct amd_decoder_ops *fam_ops;
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index af1a17d..8de46e7 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -39,18 +39,6 @@ config EDAC_DEBUG
there're four debug levels (x=0,1,2,3 from low to high).
Usually you should select 'N'.
-config EDAC_DECODE_MCE
- tristate "Decode MCEs in human-readable form (only on AMD for now)"
- depends on CPU_SUP_AMD && X86_MCE
- default y
- ---help---
- Enable this option if you want to decode Machine Check Exceptions
- occurring on your machine in human-readable form.
-
- You should definitely say Y here in case you want to decode MCEs
- which occur really early upon boot, before the module infrastructure
- has been initialized.
-
config EDAC_MCE_INJ
tristate "Simple MCE injection interface over /sysfs"
depends on EDAC_DECODE_MCE
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index 3e23913..a6f10c2 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -19,9 +19,6 @@ endif
obj-$(CONFIG_EDAC_MCE_INJ) += mce_amd_inj.o
-edac_mce_amd-y := mce_amd.o
-obj-$(CONFIG_EDAC_DECODE_MCE) += edac_mce_amd.o
-
obj-$(CONFIG_EDAC_AMD76X) += amd76x_edac.o
obj-$(CONFIG_EDAC_CPC925) += cpc925_edac.o
obj-$(CONFIG_EDAC_I5000) += i5000_edac.o
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 9a666cb..cd4232a 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -71,8 +71,8 @@
#include <linux/mmzone.h>
#include <linux/edac.h>
#include <asm/msr.h>
+#include <ras/amd/mce-decode.h>
#include "edac_core.h"
-#include "mce_amd.h"
#define amd64_debug(fmt, arg...) \
edac_printk(KERN_DEBUG, "amd64", fmt, ##arg)
diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index a4987e0..a38806a 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -14,8 +14,7 @@
#include <linux/sysdev.h>
#include <linux/edac.h>
#include <asm/mce.h>
-
-#include "mce_amd.h"
+#include <ras/amd/mce-decode.h>
struct edac_mce_attr {
struct attribute attr;
--
1.7.4.rc2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 4/9] x86, RAS: Move MCE injection code into ras/
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (2 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 3/9] x86, RAS: Move MCE decoding code into ras/ Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 14:51 ` [PATCH 5/9] x86, MCE: Add a HW injection flag Borislav Petkov
` (5 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
This is the code collecting all AMD MCE injection methods.
No functionality change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/ras/Kconfig | 9 +++++++--
arch/x86/kernel/cpu/ras/amd/Makefile | 3 +++
.../x86/kernel/cpu/ras/amd/mce-inject.c | 0
drivers/edac/Kconfig | 10 ----------
drivers/edac/Makefile | 2 --
5 files changed, 10 insertions(+), 14 deletions(-)
rename drivers/edac/mce_amd_inj.c => arch/x86/kernel/cpu/ras/amd/mce-inject.c (100%)
diff --git a/arch/x86/kernel/cpu/ras/Kconfig b/arch/x86/kernel/cpu/ras/Kconfig
index 440d6a1..39dd0af 100644
--- a/arch/x86/kernel/cpu/ras/Kconfig
+++ b/arch/x86/kernel/cpu/ras/Kconfig
@@ -22,7 +22,12 @@ config X86_AMD_DECODE_MCE
which occur really early upon boot, before the module infrastructure
has been initialized.
+config X86_AMD_MCE_INJECT
+ tristate "Simple MCE injection interface over /sysfs"
+ depends on X86_AMD_DECODE_MCE
+ default n
+ help
+ This is a simple interface to inject MCEs over /sysfs and test
+ the MCE decoding code.
endmenu
-
-
diff --git a/arch/x86/kernel/cpu/ras/amd/Makefile b/arch/x86/kernel/cpu/ras/amd/Makefile
index a18207b..3a01e3a 100644
--- a/arch/x86/kernel/cpu/ras/amd/Makefile
+++ b/arch/x86/kernel/cpu/ras/amd/Makefile
@@ -2,3 +2,6 @@ obj-$(CONFIG_X86_AMD_ERROR_THRESHOLDING) += thresholding.o
amd_mce_decode-y := mce-decode.o
obj-$(CONFIG_X86_AMD_DECODE_MCE) += amd_mce_decode.o
+
+amd_mce_inject-y := mce-inject.o
+obj-$(CONFIG_X86_AMD_MCE_INJECT) += amd_mce_inject.o
diff --git a/drivers/edac/mce_amd_inj.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
similarity index 100%
rename from drivers/edac/mce_amd_inj.c
rename to arch/x86/kernel/cpu/ras/amd/mce-inject.c
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 8de46e7..9c9b319 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -39,16 +39,6 @@ config EDAC_DEBUG
there're four debug levels (x=0,1,2,3 from low to high).
Usually you should select 'N'.
-config EDAC_MCE_INJ
- tristate "Simple MCE injection interface over /sysfs"
- depends on EDAC_DECODE_MCE
- default n
- help
- This is a simple interface to inject MCEs over /sysfs and test
- the MCE decoding code in EDAC.
-
- This is currently AMD-only.
-
config EDAC_MM_EDAC
tristate "Main Memory EDAC (Error Detection And Correction) reporting"
help
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index a6f10c2..5444512 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -17,8 +17,6 @@ ifdef CONFIG_PCI
edac_core-y += edac_pci.o edac_pci_sysfs.o
endif
-obj-$(CONFIG_EDAC_MCE_INJ) += mce_amd_inj.o
-
obj-$(CONFIG_EDAC_AMD76X) += amd76x_edac.o
obj-$(CONFIG_EDAC_CPC925) += cpc925_edac.o
obj-$(CONFIG_EDAC_I5000) += i5000_edac.o
--
1.7.4.rc2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 5/9] x86, MCE: Add a HW injection flag
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (3 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 4/9] x86, RAS: Move MCE injection " Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 14:51 ` [PATCH 6/9] x86, RAS: Convert mce-inject module to debugfs Borislav Petkov
` (4 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Add an mce->inject_flags bit, MCJ_HW_MSR_INJECT, to denote that we're
doing HW injection.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
---
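For readers following along: the new define takes the next free bit after the existing MCJ_* values, so it can be OR-ed with them and tested independently. Below is a minimal userspace sketch, with the values copied from the mce.h hunk in this patch. The helper mcj_is_hw_inject() is hypothetical and only illustrates the bit test; the patch itself adds no such function.

```c
#include <stdbool.h>

/* Values copied from the mce.h hunk in this patch. */
#define MCJ_CTX_IRQ		2	/* inject context: IRQ */
#define MCJ_NMI_BROADCAST	4	/* do NMI broadcasting */
#define MCJ_EXCEPTION		8	/* raise as exception */
#define MCJ_HW_MSR_INJECT	16	/* HW MCE injection method */

/* Hypothetical helper: true if the HW injection bit is set in 'flags'. */
static bool mcj_is_hw_inject(unsigned int flags)
{
	return (flags & MCJ_HW_MSR_INJECT) != 0;
}
```

Because 16 does not overlap any of the lower bits shown, setting it leaves the existing software injection flags untouched.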
arch/x86/include/asm/mce.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 545e8e4..233cf03 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -54,6 +54,7 @@
#define MCJ_CTX_IRQ 2 /* inject context: IRQ */
#define MCJ_NMI_BROADCAST 4 /* do NMI broadcasting */
#define MCJ_EXCEPTION 8 /* raise as exception */
+#define MCJ_HW_MSR_INJECT 16 /* HW MCE injection method */
/* Fields are zero when not available */
struct mce {
--
1.7.4.rc2
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 6/9] x86, RAS: Convert mce-inject module to debugfs
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (4 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 5/9] x86, MCE: Add a HW injection flag Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 14:51 ` [PATCH 7/9] x86, RAS: Add function enabling direct writes to MCE MSRs Borislav Petkov
` (3 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
This is a module which is used for debugging MCE decoding paths so its
userspace interface should go to debugfs, where it belongs conceptually.
While at it, add a warning to the Kconfig text that this interface is
unstable and no userspace scripts should rely all too much on it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
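To illustrate the shape of the conversion: the sysfs store/show pairs parsed and formatted strings themselves, whereas the debugfs accessors in this patch receive an already-parsed u64 plus the data pointer that was registered with the file. Below is a minimal userspace sketch of that accessor pattern and of the bank range check. It assumes a stand-in struct mce_sketch, and the macro names SKETCH_INJECT_SET/GET and the helper inj_bank_is_valid() are made up for illustration; the kernel code uses struct mce and reads the family from boot_cpu_data.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's struct mce, reduced to the injected fields. */
struct mce_sketch {
	uint64_t status;
	uint64_t misc;
	uint64_t addr;
};

/* Same shape as MCE_INJECT_SET(): 'data' is the pointer given at file creation. */
#define SKETCH_INJECT_SET(reg)					\
static int inj_##reg##_set(void *data, uint64_t val)		\
{								\
	struct mce_sketch *m = data;				\
								\
	m->reg = val;						\
	return 0;						\
}

/* Same shape as MCE_INJECT_GET(): copy the field out through 'val'. */
#define SKETCH_INJECT_GET(reg)					\
static int inj_##reg##_get(void *data, uint64_t *val)		\
{								\
	struct mce_sketch *m = data;				\
								\
	*val = m->reg;						\
	return 0;						\
}

SKETCH_INJECT_SET(status)
SKETCH_INJECT_SET(misc)
SKETCH_INJECT_SET(addr)

SKETCH_INJECT_GET(status)
SKETCH_INJECT_GET(misc)
SKETCH_INJECT_GET(addr)

/*
 * Mirror of the range check in inj_bank_set(): banks 0-5 are accepted
 * on every family, bank 6 only on family 0x15.
 */
static bool inj_bank_is_valid(unsigned int family, uint64_t bank)
{
	if (bank > 5)
		if (family != 0x15 || bank > 6)
			return false;

	return true;
}
```

Passing the struct through the data pointer, rather than touching the global i_mce directly, is what lets one accessor body serve any struct instance registered with the file.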
arch/x86/kernel/cpu/ras/Kconfig | 12 +-
arch/x86/kernel/cpu/ras/amd/mce-inject.c | 181 +++++++++++++-----------------
2 files changed, 86 insertions(+), 107 deletions(-)
diff --git a/arch/x86/kernel/cpu/ras/Kconfig b/arch/x86/kernel/cpu/ras/Kconfig
index 39dd0af..46d375f 100644
--- a/arch/x86/kernel/cpu/ras/Kconfig
+++ b/arch/x86/kernel/cpu/ras/Kconfig
@@ -23,11 +23,13 @@ config X86_AMD_DECODE_MCE
has been initialized.
config X86_AMD_MCE_INJECT
- tristate "Simple MCE injection interface over /sysfs"
- depends on X86_AMD_DECODE_MCE
+ tristate "Inject MCEs"
+ depends on X86_AMD_DECODE_MCE && DEBUG_FS
default n
- help
- This is a simple interface to inject MCEs over /sysfs and test
- the MCE decoding code.
+ ---help---
+ This is a simple debugfs interface to inject MCEs and test different
+ aspects of the MCE handling code.
+
+ WARNING: Do not even assume that this interface is staying stable!
endmenu
diff --git a/arch/x86/kernel/cpu/ras/amd/mce-inject.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
index a38806a..fdec361 100644
--- a/arch/x86/kernel/cpu/ras/amd/mce-inject.c
+++ b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
@@ -1,171 +1,148 @@
/*
- * A simple MCE injection facility for testing the MCE decoding code. This
- * driver should be built as module so that it can be loaded on production
- * kernels for testing purposes.
+ * A simple MCE injection facility for testing different aspects of the RAS
+ * code. This driver should be built as module so that it can be loaded
+ * on production kernels for testing purposes.
*
* This file may be distributed under the terms of the GNU General Public
* License version 2.
*
- * Copyright (c) 2010: Borislav Petkov <borislav.petkov@amd.com>
- * Advanced Micro Devices Inc.
+ * Copyright (c) 2010-11: Borislav Petkov <borislav.petkov@amd.com>
+ * Advanced Micro Devices Inc.
*/
#include <linux/kobject.h>
#include <linux/sysdev.h>
-#include <linux/edac.h>
+#include <linux/debugfs.h>
#include <asm/mce.h>
#include <ras/amd/mce-decode.h>
-struct edac_mce_attr {
- struct attribute attr;
- ssize_t (*show) (struct kobject *kobj, struct edac_mce_attr *attr, char *buf);
- ssize_t (*store)(struct kobject *kobj, struct edac_mce_attr *attr,
- const char *buf, size_t count);
-};
-
-#define EDAC_MCE_ATTR(_name, _mode, _show, _store) \
-static struct edac_mce_attr mce_attr_##_name = __ATTR(_name, _mode, _show, _store)
-
-static struct kobject *mce_kobj;
-
/*
* Collect all the MCi_XXX settings
*/
static struct mce i_mce;
+static struct dentry *dfs_inj;
-#define MCE_INJECT_STORE(reg) \
-static ssize_t edac_inject_##reg##_store(struct kobject *kobj, \
- struct edac_mce_attr *attr, \
- const char *data, size_t count)\
+#define MCE_INJECT_SET(reg) \
+static int inj_##reg##_set(void *data, u64 val) \
{ \
- int ret = 0; \
- unsigned long value; \
- \
- ret = strict_strtoul(data, 16, &value); \
- if (ret < 0) \
- printk(KERN_ERR "Error writing MCE " #reg " field.\n"); \
+ struct mce *m = (struct mce *)data; \
\
- i_mce.reg = value; \
- \
- return count; \
+ m->reg = val; \
+ return 0; \
}
-MCE_INJECT_STORE(status);
-MCE_INJECT_STORE(misc);
-MCE_INJECT_STORE(addr);
+MCE_INJECT_SET(status);
+MCE_INJECT_SET(misc);
+MCE_INJECT_SET(addr);
-#define MCE_INJECT_SHOW(reg) \
-static ssize_t edac_inject_##reg##_show(struct kobject *kobj, \
- struct edac_mce_attr *attr, \
- char *buf) \
+#define MCE_INJECT_GET(reg) \
+static int inj_##reg##_get(void *data, u64 *val) \
{ \
- return sprintf(buf, "0x%016llx\n", i_mce.reg); \
+ struct mce *m = (struct mce *)data; \
+ \
+ *val = m->reg; \
+ return 0; \
}
-MCE_INJECT_SHOW(status);
-MCE_INJECT_SHOW(misc);
-MCE_INJECT_SHOW(addr);
+MCE_INJECT_GET(status);
+MCE_INJECT_GET(misc);
+MCE_INJECT_GET(addr);
-EDAC_MCE_ATTR(status, 0644, edac_inject_status_show, edac_inject_status_store);
-EDAC_MCE_ATTR(misc, 0644, edac_inject_misc_show, edac_inject_misc_store);
-EDAC_MCE_ATTR(addr, 0644, edac_inject_addr_show, edac_inject_addr_store);
+DEFINE_SIMPLE_ATTRIBUTE(status_fops, inj_status_get, inj_status_set, "%llx\n");
+DEFINE_SIMPLE_ATTRIBUTE(misc_fops, inj_misc_get, inj_misc_set, "%llx\n");
+DEFINE_SIMPLE_ATTRIBUTE(addr_fops, inj_addr_get, inj_addr_set, "%llx\n");
/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
*/
-static ssize_t edac_inject_bank_store(struct kobject *kobj,
- struct edac_mce_attr *attr,
- const char *data, size_t count)
+static int inj_bank_set(void *data, u64 val)
{
- int ret = 0;
- unsigned long value;
-
- ret = strict_strtoul(data, 10, &value);
- if (ret < 0) {
- printk(KERN_ERR "Invalid bank value!\n");
- return -EINVAL;
- }
+ struct mce *m = (struct mce *)data;
- if (value > 5)
- if (boot_cpu_data.x86 != 0x15 || value > 6) {
- printk(KERN_ERR "Non-existent MCE bank: %lu\n", value);
+ if (val > 5)
+ if (boot_cpu_data.x86 != 0x15 || val > 6) {
+ printk(KERN_ERR "Non-existent MCE bank: %llu\n", val);
return -EINVAL;
}
- i_mce.bank = value;
+ m->bank = val;
- amd_decode_mce(NULL, 0, &i_mce);
+ amd_decode_mce(NULL, 0, m);
- return count;
+ return 0;
}
-static ssize_t edac_inject_bank_show(struct kobject *kobj,
- struct edac_mce_attr *attr, char *buf)
+static int inj_bank_get(void *data, u64 *val)
{
- return sprintf(buf, "%d\n", i_mce.bank);
-}
+ struct mce *m = (struct mce *)data;
-EDAC_MCE_ATTR(bank, 0644, edac_inject_bank_show, edac_inject_bank_store);
+ *val = m->bank;
+ return 0;
+}
-static struct edac_mce_attr *sysfs_attrs[] = { &mce_attr_status, &mce_attr_misc,
- &mce_attr_addr, &mce_attr_bank
+DEFINE_SIMPLE_ATTRIBUTE(bank_fops, inj_bank_get, inj_bank_set, "%llu\n");
+
+struct dfs_node {
+ char *name;
+ struct dentry *d;
+ const struct file_operations *fops;
+} dfs_fls[] = {
+ { .name = "status", .fops = &status_fops },
+ { .name = "misc", .fops = &misc_fops },
+ { .name = "addr", .fops = &addr_fops },
+ { .name = "bank", .fops = &bank_fops },
};
-static int __init edac_init_mce_inject(void)
+static int __init init_mce_inject(void)
{
- struct sysdev_class *edac_class = NULL;
- int i, err = 0;
+ int i;
- edac_class = edac_get_sysfs_class();
- if (!edac_class)
+ dfs_inj = debugfs_create_dir("mce-inject", NULL);
+ if (!dfs_inj)
return -EINVAL;
- mce_kobj = kobject_create_and_add("mce", &edac_class->kset.kobj);
- if (!mce_kobj) {
- printk(KERN_ERR "Error creating a mce kset.\n");
- err = -ENOMEM;
- goto err_mce_kobj;
- }
+ for (i = 0; i < ARRAY_SIZE(dfs_fls); i++) {
+ dfs_fls[i].d = debugfs_create_file(dfs_fls[i].name,
+ S_IRUSR | S_IWUSR,
+ dfs_inj,
+ &i_mce,
+ dfs_fls[i].fops);
+
+ if (!dfs_fls[i].d)
+ goto err_dfs_add;
- for (i = 0; i < ARRAY_SIZE(sysfs_attrs); i++) {
- err = sysfs_create_file(mce_kobj, &sysfs_attrs[i]->attr);
- if (err) {
- printk(KERN_ERR "Error creating %s in sysfs.\n",
- sysfs_attrs[i]->attr.name);
- goto err_sysfs_create;
- }
}
+
return 0;
-err_sysfs_create:
+err_dfs_add:
while (--i >= 0)
- sysfs_remove_file(mce_kobj, &sysfs_attrs[i]->attr);
-
- kobject_del(mce_kobj);
+ debugfs_remove(dfs_fls[i].d);
-err_mce_kobj:
- edac_put_sysfs_class();
+ debugfs_remove(dfs_inj);
+ dfs_inj = NULL;
- return err;
+ return -ENOMEM;
}
-static void __exit edac_exit_mce_inject(void)
+static void __exit exit_mce_inject(void)
{
int i;
- for (i = 0; i < ARRAY_SIZE(sysfs_attrs); i++)
- sysfs_remove_file(mce_kobj, &sysfs_attrs[i]->attr);
+ for (i = 0; i < ARRAY_SIZE(dfs_fls); i++)
+ debugfs_remove(dfs_fls[i].d);
- kobject_del(mce_kobj);
+ memset(&dfs_fls, 0, sizeof(dfs_fls));
- edac_put_sysfs_class();
+ debugfs_remove(dfs_inj);
+ dfs_inj = NULL;
}
-module_init(edac_init_mce_inject);
-module_exit(edac_exit_mce_inject);
+module_init(init_mce_inject);
+module_exit(exit_mce_inject);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Borislav Petkov <borislav.petkov@amd.com>");
MODULE_AUTHOR("AMD Inc.");
-MODULE_DESCRIPTION("MCE injection facility for testing MCE decoding");
+MODULE_DESCRIPTION("MCE injection facility for RAS testing");
--
1.7.4.rc2
* [PATCH 7/9] x86, RAS: Add function enabling direct writes to MCE MSRs
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (5 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 6/9] x86, RAS: Convert mce-inject module to debugfs Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 14:51 ` [PATCH 8/9] x86, RAS: Add attributes needed for HW injection Borislav Petkov
` (2 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Normally, writing to MCE MSRs causes a #GP. Add a function to enable
direct accesses to those MSRs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/ras/amd/mce-inject.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/ras/amd/mce-inject.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
index fdec361..1b797bd 100644
--- a/arch/x86/kernel/cpu/ras/amd/mce-inject.c
+++ b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
@@ -53,6 +53,30 @@ DEFINE_SIMPLE_ATTRIBUTE(misc_fops, inj_misc_get, inj_misc_set, "%llx\n");
DEFINE_SIMPLE_ATTRIBUTE(addr_fops, inj_addr_get, inj_addr_set, "%llx\n");
/*
+ * Caller needs to make sure this cpu doesn't disappear
+ * from under us, i.e.: get_cpu/put_cpu.
+ */
+static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
+{
+ u32 l, h;
+ int err;
+
+ err = rdmsr_on_cpu(cpu, MSR_K7_HWCR, &l, &h);
+ if (err) {
+ printk(KERN_ERR "%s: error reading HWCR\n", __func__);
+ return err;
+ }
+
+ enable ? (l |= BIT(18)) : (l &= ~BIT(18));
+
+ err = wrmsr_on_cpu(cpu, MSR_K7_HWCR, l, h);
+ if (err)
+ printk(KERN_ERR "%s: error writing HWCR\n", __func__);
+
+ return err;
+}
+
+/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
*/
--
1.7.4.rc2
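The read-modify-write that toggle_hw_mce_inject() performs on HWCR can be modelled in user space roughly as follows. This is only a sketch: `struct demo_msr` and the `demo_`/`DEMO_` names are hypothetical stand-ins for the low/high halves that rdmsr_on_cpu() and wrmsr_on_cpu() exchange, and no real MSR is touched. It uses a plain if/else instead of the conditional expression in the patch; the effect on the low word is meant to be the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 18 of HWCR: the bit the patch flips to allow direct MCE MSR writes. */
#define DEMO_HWCR_MCE_WR_EN (UINT32_C(1) << 18)

/* Hypothetical model of an MSR as two 32-bit halves (low, high). */
struct demo_msr {
	uint32_t lo;
	uint32_t hi;
};

/* Set or clear bit 18 in the low half; all other bits stay untouched. */
static void demo_toggle_hw_mce_inject(struct demo_msr *hwcr, bool enable)
{
	assert(hwcr);

	if (enable)
		hwcr->lo |= DEMO_HWCR_MCE_WR_EN;
	else
		hwcr->lo &= ~DEMO_HWCR_MCE_WR_EN;
}
```

Only the low word changes because bit 18 lives there; the high word is written back as it was read.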
* [PATCH 8/9] x86, RAS: Add attributes needed for HW injection
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (6 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 7/9] x86, RAS: Add function enabling direct writes to MCE MSRs Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 21:03 ` David Rientjes
2011-10-19 14:51 ` [PATCH 9/9] x86, RAS: Add an injector function Borislav Petkov
2011-10-19 17:08 ` [RFC -v2] x86 RAS: Reorganize functionality Luck, Tony
9 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
hw_inject denotes whether we want to do a hardware or a software
injection and, in the case of hardware injection, we want to do that on
a particular cpu, thus the 'cpu' attribute.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/ras/amd/mce-inject.c | 55 ++++++++++++++++++++++++++++++
1 files changed, 55 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/cpu/ras/amd/mce-inject.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
index 1b797bd..8646080 100644
--- a/arch/x86/kernel/cpu/ras/amd/mce-inject.c
+++ b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
@@ -77,6 +77,59 @@ static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
}
/*
+ * HW or SW injection
+ */
+static int hw_inj_get(void *data, u64 *val)
+{
+ struct mce *m = (struct mce *)data;
+
+ *val = !!(m->inject_flags & MCJ_HW_MSR_INJECT);
+
+ return 0;
+}
+
+static int hw_inj_set(void *data, u64 val)
+{
+ struct mce *m = (struct mce *)data;
+
+ switch (val) {
+ case 0:
+ m->inject_flags &= (u8)~MCJ_HW_MSR_INJECT;
+ break;
+
+ case 1:
+ m->inject_flags |= MCJ_HW_MSR_INJECT;
+ break;
+
+ default:
+ printk(KERN_ERR "%s: Only 0 or 1 allowed!\n", __func__);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(hw_inj_fops, hw_inj_get, hw_inj_set, "%llu\n");
+
+/*
+ * On which CPU to inject?
+ */
+MCE_INJECT_GET(extcpu);
+
+static int inj_extcpu_set(void *data, u64 val)
+{
+ struct mce *m = (struct mce *)data;
+
+ if (val >= num_online_cpus()) {
+ printk(KERN_ERR "%s: Non-existent CPU: %llu\n", __func__, val);
+ return -EINVAL;
+ }
+ m->extcpu = val;
+ return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(extcpu_fops, inj_extcpu_get, inj_extcpu_set, "%llu\n");
+
+/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
*/
@@ -116,6 +169,8 @@ struct dfs_node {
{ .name = "misc", .fops = &misc_fops },
{ .name = "addr", .fops = &addr_fops },
{ .name = "bank", .fops = &bank_fops },
+ { .name = "hw_inject", .fops = &hw_inj_fops },
+ { .name = "cpu", .fops = &extcpu_fops },
};
static int __init init_mce_inject(void)
--
1.7.4.rc2
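The hw_inject attribute maps the values 0 and 1 onto a single bit of the 8-bit inject_flags field and rejects everything else. A minimal user-space sketch of that get/set pair, assuming a placeholder flag value: `DEMO_HW_MSR_INJECT` is hypothetical, since the real MCJ_HW_MSR_INJECT definition is not part of this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder value; the real MCJ_HW_MSR_INJECT is defined elsewhere. */
#define DEMO_HW_MSR_INJECT 0x10u

/* Accept only 0 (software injection) or 1 (hardware injection). */
static int demo_hw_inj_set(uint8_t *flags, uint64_t val)
{
	assert(flags);

	switch (val) {
	case 0:
		/* Cast keeps the mask 8 bits wide, matching the flags field. */
		*flags &= (uint8_t)~DEMO_HW_MSR_INJECT;
		break;
	case 1:
		*flags |= DEMO_HW_MSR_INJECT;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}

/* Report the flag as a clean 0 or 1, regardless of its bit position. */
static uint64_t demo_hw_inj_get(const uint8_t *flags)
{
	assert(flags);
	return !!(*flags & DEMO_HW_MSR_INJECT);
}
```

Other bits in the flags byte are preserved by both branches, and an invalid value leaves the byte unchanged.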
* Re: [PATCH 8/9] x86, RAS: Add attributes needed for HW injection
2011-10-19 14:51 ` [PATCH 8/9] x86, RAS: Add attributes needed for HW injection Borislav Petkov
@ 2011-10-19 21:03 ` David Rientjes
2011-10-19 21:09 ` Borislav Petkov
0 siblings, 1 reply; 21+ messages in thread
From: David Rientjes @ 2011-10-19 21:03 UTC (permalink / raw)
To: Borislav Petkov
Cc: EDAC devel, Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
On Wed, 19 Oct 2011, Borislav Petkov wrote:
> diff --git a/arch/x86/kernel/cpu/ras/amd/mce-inject.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
> index 1b797bd..8646080 100644
> --- a/arch/x86/kernel/cpu/ras/amd/mce-inject.c
> +++ b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
> @@ -77,6 +77,59 @@ static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
> }
>
> /*
> + * HW or SW injection
> + */
> +static int hw_inj_get(void *data, u64 *val)
> +{
> + struct mce *m = (struct mce *)data;
> +
> + *val = !!(m->inject_flags & MCJ_HW_MSR_INJECT);
> +
> + return 0;
> +}
> +
> +static int hw_inj_set(void *data, u64 val)
> +{
> + struct mce *m = (struct mce *)data;
> +
> + switch (val) {
> + case 0:
> + m->inject_flags &= (u8)~MCJ_HW_MSR_INJECT;
> + break;
> +
> + case 1:
> + m->inject_flags |= MCJ_HW_MSR_INJECT;
> + break;
> +
> + default:
> + printk(KERN_ERR "%s: Only 0 or 1 allowed!\n", __func__);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(hw_inj_fops, hw_inj_get, hw_inj_set, "%llu\n");
> +
> +/*
> + * On which CPU to inject?
> + */
> +MCE_INJECT_GET(extcpu);
> +
> +static int inj_extcpu_set(void *data, u64 val)
> +{
> + struct mce *m = (struct mce *)data;
> +
> + if (val >= num_online_cpus()) {
That wouldn't catch an offline cpuid, you probably want cpu_online(val)?
> + printk(KERN_ERR "%s: Non-existent CPU: %llu\n", __func__, val);
> + return -EINVAL;
> + }
> + m->extcpu = val;
> + return 0;
> +}
> +
> +DEFINE_SIMPLE_ATTRIBUTE(extcpu_fops, inj_extcpu_get, inj_extcpu_set, "%llu\n");
> +
> +/*
> * This denotes into which bank we're injecting and triggers
> * the injection, at the same time.
> */
> @@ -116,6 +169,8 @@ struct dfs_node {
> { .name = "misc", .fops = &misc_fops },
> { .name = "addr", .fops = &addr_fops },
> { .name = "bank", .fops = &bank_fops },
> + { .name = "hw_inject", .fops = &hw_inj_fops },
> + { .name = "cpu", .fops = &extcpu_fops },
> };
>
> static int __init init_mce_inject(void)
* Re: [PATCH 8/9] x86, RAS: Add attributes needed for HW injection
2011-10-19 21:03 ` David Rientjes
@ 2011-10-19 21:09 ` Borislav Petkov
2011-10-19 21:19 ` David Rientjes
0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 21:09 UTC (permalink / raw)
To: David Rientjes; +Cc: EDAC devel, Tony Luck, Ingo Molnar, X86-ML, LKML
On Wed, Oct 19, 2011 at 05:03:43PM -0400, David Rientjes wrote:
> > +static int inj_extcpu_set(void *data, u64 val)
> > +{
> > + struct mce *m = (struct mce *)data;
> > +
> > + if (val >= num_online_cpus()) {
>
> That wouldn't catch an offline cpuid, you probably want cpu_online(val)?
Good catch, thanks.
So I'm looking at <arch/x86/kernel/cpuid.c> which is how the validity of
a cpu value should be tested properly, IMHO:
static int cpuid_open(struct inode *inode, struct file *file)
{
...
if (cpu >= nr_cpu_ids || !cpu_online(cpu))
return -ENXIO; /* No such CPU */
...
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
* Re: [PATCH 8/9] x86, RAS: Add attributes needed for HW injection
2011-10-19 21:09 ` Borislav Petkov
@ 2011-10-19 21:19 ` David Rientjes
2011-10-20 15:06 ` Borislav Petkov
0 siblings, 1 reply; 21+ messages in thread
From: David Rientjes @ 2011-10-19 21:19 UTC (permalink / raw)
To: Borislav Petkov; +Cc: EDAC devel, Tony Luck, Ingo Molnar, X86-ML, LKML
On Wed, 19 Oct 2011, Borislav Petkov wrote:
> > That wouldn't catch an offline cpuid, you probably want cpu_online(val)?
>
> Good catch, thanks.
>
> So I'm looking at <arch/x86/kernel/cpuid.c> which is how the validity of
> a cpu value should be tested properly, IMHO:
>
> static int cpuid_open(struct inode *inode, struct file *file)
> {
> ...
>
> if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> return -ENXIO; /* No such CPU */
> ...
Right, the comment for cpumask_test_cpu(), which cpu_online() really is,
wants a cpu < nr_cpu_ids.
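To see why comparing against num_online_cpus() is not enough, here is a small user-space model of the two checks. Everything in it is an assumption made for illustration: the online mask is a plain bitmask, `DEMO_NR_CPU_IDS` is an arbitrary limit, and the `demo_` helpers only imitate the kernel functions they are named after. The mask-based variant returns -ENXIO as cpuid_open() does; which errno the fixed patch uses is not shown in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPU_IDS 8u	/* hypothetical number of possible CPU ids */

/* Bit n set means CPU n is online; a toy model of the online mask. */
static bool demo_cpu_online(uint32_t online_mask, unsigned int cpu)
{
	/* Same precondition cpumask_test_cpu() documents: cpu < nr_cpu_ids. */
	assert(cpu < DEMO_NR_CPU_IDS);
	return (online_mask >> cpu) & 1u;
}

/* Count set bits, i.e. how many CPUs are online. */
static unsigned int demo_num_online_cpus(uint32_t online_mask)
{
	unsigned int n = 0;

	for (; online_mask; online_mask &= online_mask - 1)
		n++;
	return n;
}

/* Check as posted in the patch: compares the id against a count. */
static int demo_check_by_count(uint32_t online_mask, uint64_t val)
{
	return val >= demo_num_online_cpus(online_mask) ? -EINVAL : 0;
}

/* Check modelled on cpuid_open(): range test first, then the mask. */
static int demo_check_by_mask(uint32_t online_mask, uint64_t val)
{
	if (val >= DEMO_NR_CPU_IDS ||
	    !demo_cpu_online(online_mask, (unsigned int)val))
		return -ENXIO;
	return 0;
}
```

With CPUs 0, 1 and 3 online (mask 0x0b), the count-based check accepts the offline CPU 2 and rejects the online CPU 3, while the mask-based check gets both right. The range test has to come first so that the mask lookup is never called with an id at or above the limit.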
* Re: [PATCH 8/9] x86, RAS: Add attributes needed for HW injection
2011-10-19 21:19 ` David Rientjes
@ 2011-10-20 15:06 ` Borislav Petkov
0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-20 15:06 UTC (permalink / raw)
To: David Rientjes; +Cc: EDAC devel, Tony Luck, Ingo Molnar, X86-ML, LKML
On Wed, Oct 19, 2011 at 05:19:06PM -0400, David Rientjes wrote:
> On Wed, 19 Oct 2011, Borislav Petkov wrote:
>
> > > That wouldn't catch an offline cpuid, you probably want cpu_online(val)?
> >
> > Good catch, thanks.
> >
> > So I'm looking at <arch/x86/kernel/cpuid.c> which is how the validity of
> > a cpu value should be tested properly, IMHO:
> >
> > static int cpuid_open(struct inode *inode, struct file *file)
> > {
> > ...
> >
> > if (cpu >= nr_cpu_ids || !cpu_online(cpu))
> > return -ENXIO; /* No such CPU */
> > ...
>
> Right, the comment for cpumask_test_cpu(), which cpu_online() really is,
> wants a cpu < nr_cpu_ids.
Fixed, thanks.
--
Regards/Gruss,
Boris.
* [PATCH 9/9] x86, RAS: Add an injector function
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (7 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 8/9] x86, RAS: Add attributes needed for HW injection Borislav Petkov
@ 2011-10-19 14:51 ` Borislav Petkov
2011-10-19 17:08 ` [RFC -v2] x86 RAS: Reorganize functionality Luck, Tony
9 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 14:51 UTC (permalink / raw)
To: EDAC devel; +Cc: Tony Luck, Ingo Molnar, X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Selectively inject either a real MCE or a sw-only version which
exercises the decoding code only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/kernel/cpu/ras/amd/mce-inject.c | 50 ++++++++++++++++++++++++++++-
1 files changed, 48 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/ras/amd/mce-inject.c b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
index 8646080..d798096 100644
--- a/arch/x86/kernel/cpu/ras/amd/mce-inject.c
+++ b/arch/x86/kernel/cpu/ras/amd/mce-inject.c
@@ -129,6 +129,53 @@ static int inj_extcpu_set(void *data, u64 val)
DEFINE_SIMPLE_ATTRIBUTE(extcpu_fops, inj_extcpu_get, inj_extcpu_set, "%llu\n");
+static void trigger_mce(void *info)
+{
+ asm volatile("int $18");
+}
+
+static void do_inject(void)
+{
+ u64 mcg_status = 0;
+ unsigned int cpu = i_mce.extcpu;
+ int this_cpu;
+ u8 b = i_mce.bank;
+
+ if (!(i_mce.inject_flags & MCJ_HW_MSR_INJECT)) {
+ amd_decode_mce(NULL, 0, &i_mce);
+ return;
+ }
+
+ /* prep MCE global settings for the injection */
+ mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
+
+ if (!(i_mce.status & MCI_STATUS_PCC))
+ mcg_status |= MCG_STATUS_RIPV;
+
+ this_cpu = get_cpu();
+
+ toggle_hw_mce_inject(cpu, true);
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCG_STATUS,
+ (u32)mcg_status, (u32)(mcg_status >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_STATUS(b),
+ (u32)i_mce.status, (u32)(i_mce.status >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_ADDR(b),
+ (u32)i_mce.addr, (u32)(i_mce.addr >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_MISC(b),
+ (u32)i_mce.misc, (u32)(i_mce.misc >> 32));
+
+ toggle_hw_mce_inject(cpu, false);
+
+ smp_call_function_single(cpu, trigger_mce, NULL, 0);
+
+ put_cpu();
+
+}
+
/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
@@ -144,8 +191,7 @@ static int inj_bank_set(void *data, u64 val)
}
m->bank = val;
-
- amd_decode_mce(NULL, 0, m);
+ do_inject();
return 0;
}
--
1.7.4.rc2
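The MCG_STATUS value that do_inject() prepares, and the low/high split it passes to wrmsr_on_cpu(), can be sketched in user space like this. The `DEMO_` constants repeat the bit positions the kernel's MCE header uses for the corresponding MCG_STATUS_* and MCI_STATUS_PCC flags; the helper names are hypothetical and nothing here writes an MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the flags used in do_inject(). */
#define DEMO_MCG_STATUS_RIPV (UINT64_C(1) << 0)	/* restart IP valid */
#define DEMO_MCG_STATUS_EIPV (UINT64_C(1) << 1)	/* error IP valid */
#define DEMO_MCG_STATUS_MCIP (UINT64_C(1) << 2)	/* machine check in progress */
#define DEMO_MCI_STATUS_PCC  (UINT64_C(1) << 57)	/* processor context corrupt */

/* MCIP and EIPV are always set; RIPV only if context is not corrupt. */
static uint64_t demo_mcg_status_for(uint64_t mci_status)
{
	uint64_t mcg = DEMO_MCG_STATUS_MCIP | DEMO_MCG_STATUS_EIPV;

	if (!(mci_status & DEMO_MCI_STATUS_PCC))
		mcg |= DEMO_MCG_STATUS_RIPV;

	return mcg;
}

/* Split a 64-bit MSR value into the two 32-bit halves a write takes. */
static void demo_split_msr(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	assert(lo && hi);
	*lo = (uint32_t)val;
	*hi = (uint32_t)(val >> 32);
}
```

Leaving RIPV clear when PCC is set tells the handler that execution cannot be restarted reliably, which matches how the patch treats an injected error with corrupt processor context.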
* RE: [RFC -v2] x86 RAS: Reorganize functionality
2011-10-19 14:50 [RFC -v2] x86 RAS: Reorganize functionality Borislav Petkov
` (8 preceding siblings ...)
2011-10-19 14:51 ` [PATCH 9/9] x86, RAS: Add an injector function Borislav Petkov
@ 2011-10-19 17:08 ` Luck, Tony
2011-10-19 17:13 ` Borislav Petkov
9 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2011-10-19 17:08 UTC (permalink / raw)
To: Borislav Petkov, EDAC devel; +Cc: Ingo Molnar, X86-ML, LKML, Borislav Petkov
> and what I actually would like to see is something like
[ ] Reliability, Availability, Serviceability
[ ] Machine Check Architecture
[ ] Intel-specific features
[ ] CMCI / overheating reporting
[ ] MCE decoding
[ ] MCE injection
[ ] AMD-specific features
[ ] Error thresholding
[ ] MCE decoding
[ ] MCE injection
[ ] ...
Where do EINJ and the other APEI bits fit into this? It isn't Intel
specific (is it? Do AMD BIOS writers include it?). It would be nice
for it to eventually move in with the other "RAS" bits, rather than
being hidden under a couple of levels of ACPI menus.
-Tony
* Re: [RFC -v2] x86 RAS: Reorganize functionality
2011-10-19 17:08 ` [RFC -v2] x86 RAS: Reorganize functionality Luck, Tony
@ 2011-10-19 17:13 ` Borislav Petkov
0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2011-10-19 17:13 UTC (permalink / raw)
To: Luck, Tony; +Cc: EDAC devel, Ingo Molnar, X86-ML, LKML
On Wed, Oct 19, 2011 at 10:08:25AM -0700, Luck, Tony wrote:
> > and what I actually would like to see is something like
>
> [ ] Reliability, Availability, Serviceability
> [ ] Machine Check Architecture
> [ ] Intel-specific features
> [ ] CMCI / overheating reporting
> [ ] MCE decoding
> [ ] MCE injection
> [ ] AMD-specific features
> [ ] Error thresholding
> [ ] MCE decoding
> [ ] MCE injection
> [ ] ...
>
> Where does EINJ and other APEI bits fit into this ... it isn't Intel
> specific (Is it? Do AMD BIOS writers include it??).
I'll have to check on that but...
> It would be nice for it to eventually move in with the other "RAS"
> bit, rather than being hidden under a couple of levels of ACPI menus.
... absolutely, that's what the "[ ] ..." was supposed to mean - other
RAS features which should go there eventually.
Thanks.
--
Regards/Gruss,
Boris.