* [PATCH 0/4] x86, MCE, AMD: Hardware MCE injection
@ 2011-05-05 16:49 Borislav Petkov
2011-05-05 16:49 ` [PATCH 1/4] x86, MCE: Add a HW injection flag Borislav Petkov
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Borislav Petkov @ 2011-05-05 16:49 UTC (permalink / raw)
To: EDAC devel; +Cc: X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Hi,
only small changes from the last submission
(http://lkml.org/lkml/2011/4/15/171) so that we can have valid
MCG_STATUS values and don't choke on severity sanity checking.
@Ingo: I don't mind which path 1/4 takes upstream: either you ACK it
and I send it to Linus in the next merge window, or you pick it up
yourself.
Thanks.
* [PATCH 1/4] x86, MCE: Add a HW injection flag
2011-05-05 16:49 [PATCH 0/4] x86, MCE, AMD: Hardware MCE injection Borislav Petkov
@ 2011-05-05 16:49 ` Borislav Petkov
2011-05-05 16:49 ` [PATCH 2/4] EDAC, MCE, AMD: Add function to enable direct write to MCE MSRs Borislav Petkov
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2011-05-05 16:49 UTC (permalink / raw)
To: EDAC devel; +Cc: X86-ML, LKML, Borislav Petkov, Ingo Molnar
From: Borislav Petkov <borislav.petkov@amd.com>
Add an MCJ_HW_INJECT flag for mce->inject_flags to denote that we're doing HW injection.
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
arch/x86/include/asm/mce.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index eb16e94..a2e3dcf 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -49,6 +49,7 @@
#define MCJ_CTX_IRQ 2 /* inject context: IRQ */
#define MCJ_NMI_BROADCAST 4 /* do NMI broadcasting */
#define MCJ_EXCEPTION 8 /* raise as exception */
+#define MCJ_HW_INJECT 16 /* do a HW MCE inject, i.e. INT18 */
/* Fields are zero when not available */
struct mce {
--
1.7.4.rc2
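
As a rough illustration of how such a flag is meant to be used, the sketch below sets, clears, and tests the new bit in an 8-bit flags field. The flag values are taken from the hunk above; the helper names set_hw_inject() and is_hw_inject() and the uint8_t field width are assumptions made for this userspace example, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in the arch/x86/include/asm/mce.h hunk. */
#define MCJ_CTX_IRQ        2	/* inject context: IRQ */
#define MCJ_NMI_BROADCAST  4	/* do NMI broadcasting */
#define MCJ_EXCEPTION      8	/* raise as exception */
#define MCJ_HW_INJECT     16	/* do a HW MCE inject */

/* Hypothetical helper: return flags with the HW-inject bit set or cleared. */
static uint8_t set_hw_inject(uint8_t flags, bool enable)
{
	if (enable)
		flags |= MCJ_HW_INJECT;
	else
		flags &= (uint8_t)~MCJ_HW_INJECT;
	return flags;
}

/* Hypothetical helper: report whether HW injection is selected. */
static bool is_hw_inject(uint8_t flags)
{
	return (flags & MCJ_HW_INJECT) != 0;
}
```

The other flag bits pass through both helpers untouched, which is the property the sysfs handlers later in the series rely on.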
* [PATCH 2/4] EDAC, MCE, AMD: Add function to enable direct write to MCE MSRs
2011-05-05 16:49 [PATCH 0/4] x86, MCE, AMD: Hardware MCE injection Borislav Petkov
2011-05-05 16:49 ` [PATCH 1/4] x86, MCE: Add a HW injection flag Borislav Petkov
@ 2011-05-05 16:49 ` Borislav Petkov
2011-05-05 16:49 ` [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs Borislav Petkov
2011-05-05 16:49 ` [PATCH 4/4] EDAC, MCE, AMD: Add an injector function Borislav Petkov
3 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2011-05-05 16:49 UTC (permalink / raw)
To: EDAC devel; +Cc: X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Normally, writing to MCE MSRs causes a #GP. Add a function which toggles
bit 18 of the HWCR MSR on a given CPU to enable direct writes to those
MSRs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
drivers/edac/mce_amd_inj.c | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index a4987e0..eb36e15 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -30,6 +30,30 @@ static struct edac_mce_attr mce_attr_##_name = __ATTR(_name, _mode, _show, _stor
static struct kobject *mce_kobj;
/*
+ * Caller needs to make sure this cpu doesn't disappear
+ * from under us, i.e.: get_cpu/put_cpu.
+ */
+static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
+{
+ u32 l, h;
+ int err;
+
+ err = rdmsr_on_cpu(cpu, MSR_K7_HWCR, &l, &h);
+ if (err) {
+ printk(KERN_ERR "%s: error reading HWCR\n", __func__);
+ return err;
+ }
+
+ enable ? (l |= BIT(18)) : (l &= ~BIT(18));
+
+ err = wrmsr_on_cpu(cpu, MSR_K7_HWCR, l, h);
+ if (err)
+ printk(KERN_ERR "%s: error writing HWCR\n", __func__);
+
+ return err;
+}
+
+/*
* Collect all the MCi_XXX settings
*/
static struct mce i_mce;
--
1.7.4.rc2
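
The read-modify-write at the heart of toggle_hw_mce_inject() can be tried in isolation. The following is a minimal userspace sketch, assuming only what the patch shows: the enable bit is bit 18 of the low 32 bits of HWCR. The macro name HWCR_MCE_WR_EN and the function hwcr_low_update() are hypothetical, and no MSR is actually touched here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 18 of the low HWCR half, as used in the patch; the name is hypothetical. */
#define HWCR_MCE_WR_EN (UINT32_C(1) << 18)

/*
 * Return the new low half of HWCR with the write-enable bit set or
 * cleared. All other bits are preserved, mirroring the
 * rdmsr/modify/wrmsr sequence in the patch.
 */
static uint32_t hwcr_low_update(uint32_t low, bool enable)
{
	return enable ? (low | HWCR_MCE_WR_EN) : (low & ~HWCR_MCE_WR_EN);
}
```

Returning the new value instead of modifying it inside a conditional expression keeps the set and clear paths symmetric and easy to check.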
* [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs
2011-05-05 16:49 [PATCH 0/4] x86, MCE, AMD: Hardware MCE injection Borislav Petkov
2011-05-05 16:49 ` [PATCH 1/4] x86, MCE: Add a HW injection flag Borislav Petkov
2011-05-05 16:49 ` [PATCH 2/4] EDAC, MCE, AMD: Add function to enable direct write to MCE MSRs Borislav Petkov
@ 2011-05-05 16:49 ` Borislav Petkov
2011-05-08 16:40 ` Greg KH
2011-05-05 16:49 ` [PATCH 4/4] EDAC, MCE, AMD: Add an injector function Borislav Petkov
3 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2011-05-05 16:49 UTC (permalink / raw)
To: EDAC devel; +Cc: X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Add sysfs handlers for the 'hw_inject' and 'cpu' attrs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
drivers/edac/mce_amd_inj.c | 78 +++++++++++++++++++++++++++++++++++++++++--
1 files changed, 74 insertions(+), 4 deletions(-)
diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index eb36e15..3a50e0c 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -67,8 +67,11 @@ static ssize_t edac_inject_##reg##_store(struct kobject *kobj, \
unsigned long value; \
\
ret = strict_strtoul(data, 16, &value); \
- if (ret < 0) \
- printk(KERN_ERR "Error writing MCE " #reg " field.\n"); \
+ if (ret < 0) { \
+ printk(KERN_ERR "%s: Error parsing MCE " #reg " field.\n",\
+ __func__); \
+ return ret; \
+ } \
\
i_mce.reg = value; \
\
@@ -96,6 +99,68 @@ EDAC_MCE_ATTR(misc, 0644, edac_inject_misc_show, edac_inject_misc_store);
EDAC_MCE_ATTR(addr, 0644, edac_inject_addr_show, edac_inject_addr_store);
/*
+ * HW or SW injection
+ */
+static ssize_t edac_hw_inject_show(struct kobject *kobj,
+ struct edac_mce_attr *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", !!(i_mce.inject_flags & MCJ_HW_INJECT));
+}
+
+static ssize_t edac_hw_inject_store(struct kobject *kobj,
+ struct edac_mce_attr *attr,
+ const char *data, size_t count)
+{
+ switch (data[0]) {
+ case '0':
+ i_mce.inject_flags &= (u8)~MCJ_HW_INJECT;
+ break;
+ case '1':
+ i_mce.inject_flags |= MCJ_HW_INJECT;
+ break;
+ default:
+ printk(KERN_ERR "%s: Only 0 or 1 allowed!\n", __func__);
+ return -EINVAL;
+ }
+ return count;
+}
+
+EDAC_MCE_ATTR(hw_inject, 0644, edac_hw_inject_show, edac_hw_inject_store);
+
+/*
+ * On which CPU to inject?
+ */
+static ssize_t edac_cpu_show(struct kobject *kobj, struct edac_mce_attr *attr,
+ char *buf)
+{
+ return sprintf(buf, "%d\n", i_mce.extcpu);
+}
+
+static ssize_t edac_cpu_store(struct kobject *kobj, struct edac_mce_attr *attr,
+ const char *data, size_t count)
+{
+ int ret = 0;
+ unsigned long value;
+
+ ret = strict_strtoul(data, 10, &value);
+ if (ret < 0) {
+ printk(KERN_ERR "%s: Unable to parse CPU value!\n", __func__);
+ return ret;
+ }
+
+ if (value >= num_online_cpus()) {
+ printk(KERN_ERR "%s: Non-existent CPU: %lu\n", __func__, value);
+ return -EINVAL;
+ }
+
+ i_mce.extcpu = value;
+
+ return count;
+}
+
+EDAC_MCE_ATTR(cpu, 0644, edac_cpu_show, edac_cpu_store);
+
+/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
*/
@@ -133,8 +198,13 @@ static ssize_t edac_inject_bank_show(struct kobject *kobj,
EDAC_MCE_ATTR(bank, 0644, edac_inject_bank_show, edac_inject_bank_store);
-static struct edac_mce_attr *sysfs_attrs[] = { &mce_attr_status, &mce_attr_misc,
- &mce_attr_addr, &mce_attr_bank
+static struct edac_mce_attr *sysfs_attrs[] = {
+ &mce_attr_status,
+ &mce_attr_misc,
+ &mce_attr_addr,
+ &mce_attr_bank,
+ &mce_attr_hw_inject,
+ &mce_attr_cpu,
};
static int __init edac_init_mce_inject(void)
--
1.7.4.rc2
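
The parsing and range check done by the 'cpu' store handler can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel code: parse_cpu() is a hypothetical helper, strtoul() stands in for strict_strtoul(), and the CPU count is passed in as a plain number. A valid zero-based CPU index is strictly smaller than the CPU count, so the sketch rejects any value greater than or equal to it. On a real system with sparse CPU numbering, a per-CPU check such as cpu_online() would likely be the more accurate test.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal CPU number from s, allowing one
 * trailing newline as sysfs writes from "echo" usually carry one.
 * Returns 0 and stores the value in *out on success, -EINVAL otherwise.
 */
static int parse_cpu(const char *s, unsigned long ncpus, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || end == s)
		return -EINVAL;		/* overflow or no digits at all */

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	if (v >= ncpus)
		return -EINVAL;		/* index out of range */

	*out = v;
	return 0;
}
```

With four CPUs, the strings "0" through "3" are accepted and "4" is rejected.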
* [PATCH 4/4] EDAC, MCE, AMD: Add an injector function
2011-05-05 16:49 [PATCH 0/4] x86, MCE, AMD: Hardware MCE injection Borislav Petkov
` (2 preceding siblings ...)
2011-05-05 16:49 ` [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs Borislav Petkov
@ 2011-05-05 16:49 ` Borislav Petkov
3 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2011-05-05 16:49 UTC (permalink / raw)
To: EDAC devel; +Cc: X86-ML, LKML, Borislav Petkov
From: Borislav Petkov <borislav.petkov@amd.com>
Selectively inject either a real MCE or a sw-only version which
exercises the decoding code only. The hardware-injected MCE triggers a
machine check exception (#MC) so that the MCE handler can be bothered to
do something too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
drivers/edac/mce_amd_inj.c | 49 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 48 insertions(+), 1 deletions(-)
diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index 3a50e0c..82cbf8c 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -160,6 +160,53 @@ static ssize_t edac_cpu_store(struct kobject *kobj, struct edac_mce_attr *attr,
EDAC_MCE_ATTR(cpu, 0644, edac_cpu_show, edac_cpu_store);
+static void trigger_mce(void *info)
+{
+ asm volatile("int $18");
+}
+
+static void do_inject(void)
+{
+ u64 mcg_status = 0;
+ unsigned int cpu = i_mce.extcpu;
+ int this_cpu;
+ u8 b = i_mce.bank;
+
+ if (!(i_mce.inject_flags & MCJ_HW_INJECT)) {
+ amd_decode_mce(NULL, 0, &i_mce);
+ return;
+ }
+
+ /* prep MCE global settings for the injection */
+ mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
+
+ if (!(i_mce.status & MCI_STATUS_PCC))
+ mcg_status |= MCG_STATUS_RIPV;
+
+ this_cpu = get_cpu();
+
+ toggle_hw_mce_inject(cpu, true);
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCG_STATUS,
+ (u32)mcg_status, (u32)(mcg_status >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_STATUS(b),
+ (u32)i_mce.status, (u32)(i_mce.status >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_ADDR(b),
+ (u32)i_mce.addr, (u32)(i_mce.addr >> 32));
+
+ wrmsr_on_cpu(cpu, MSR_IA32_MCx_MISC(b),
+ (u32)i_mce.misc, (u32)(i_mce.misc >> 32));
+
+ toggle_hw_mce_inject(cpu, false);
+
+ smp_call_function_single(cpu, trigger_mce, NULL, 0);
+
+ put_cpu();
+
+}
+
/*
* This denotes into which bank we're injecting and triggers
* the injection, at the same time.
@@ -185,7 +232,7 @@ static ssize_t edac_inject_bank_store(struct kobject *kobj,
i_mce.bank = value;
- amd_decode_mce(NULL, 0, &i_mce);
+ do_inject();
return count;
}
--
1.7.4.rc2
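
Two small pieces of do_inject() lend themselves to a standalone sketch: composing the MCG_STATUS value from the injected bank status, and splitting a 64-bit MSR value into the low and high halves that wrmsr_on_cpu() takes. The bit positions below follow the architectural MCG_STATUS and MCi_STATUS layouts and are redefined locally so that the example builds in userspace. The helper names build_mcg_status() and split_msr() are hypothetical.

```c
#include <stdint.h>

/* Local copies of the architectural bit definitions. */
#define MCG_STATUS_RIPV (UINT64_C(1) << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV (UINT64_C(1) << 1)	/* error IP valid */
#define MCG_STATUS_MCIP (UINT64_C(1) << 2)	/* machine check in progress */
#define MCI_STATUS_PCC  (UINT64_C(1) << 57)	/* processor context corrupt */

/*
 * Hypothetical helper mirroring the patch: MCIP and EIPV are always
 * set, RIPV only when the injected error does not claim a corrupted
 * processor context.
 */
static uint64_t build_mcg_status(uint64_t mci_status)
{
	uint64_t mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;

	if (!(mci_status & MCI_STATUS_PCC))
		mcg_status |= MCG_STATUS_RIPV;

	return mcg_status;
}

/* Hypothetical helper: split a 64-bit MSR value into two 32-bit halves. */
static void split_msr(uint64_t val, uint32_t *lo, uint32_t *hi)
{
	*lo = (uint32_t)val;
	*hi = (uint32_t)(val >> 32);
}
```

Setting RIPV only for errors without PCC is what lets the injected values pass the severity sanity checks mentioned in the cover letter.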
* Re: [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs
2011-05-05 16:49 ` [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs Borislav Petkov
@ 2011-05-08 16:40 ` Greg KH
2011-05-08 18:49 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2011-05-08 16:40 UTC (permalink / raw)
To: Borislav Petkov; +Cc: EDAC devel, X86-ML, LKML, Borislav Petkov
On Thu, May 05, 2011 at 06:49:35PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <borislav.petkov@amd.com>
>
> Add sysfs handlers for the 'hw_inject' and 'cpu' attrs.
You forgot to add the proper Documentation/ABI/ files for these new
sysfs files. Please do that.
thanks,
greg k-h
* Re: [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs
2011-05-08 16:40 ` Greg KH
@ 2011-05-08 18:49 ` Borislav Petkov
2011-05-08 19:18 ` Greg KH
0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2011-05-08 18:49 UTC (permalink / raw)
To: Greg KH; +Cc: EDAC devel, X86-ML, LKML, Petkov, Borislav
On Sun, May 08, 2011 at 12:40:54PM -0400, Greg KH wrote:
> On Thu, May 05, 2011 at 06:49:35PM +0200, Borislav Petkov wrote:
> > From: Borislav Petkov <borislav.petkov@amd.com>
> >
> > Add sysfs handlers for the 'hw_inject' and 'cpu' attrs.
>
> You forgot to add the proper Documentation/ABI/ files for these new
> sysfs files. Please do that.
Yeah, so those sprang up as a testing module and the final goal is to
have all injection done through perf events but it could also be prudent
to keep those as a fallback alternative.
I'm thinking something like
Documentation/ABI/testing/sysfs-devices-system-edac but looking
at Documentation/ABI/README and reading the "testing/" category
description, I don't think it applies to those interfaces just yet for
the simple reason that they're still changing and I wouldn't want to
cast them in stone now.
Can we add them to an "unstable" category instead, or maybe delay writing
the ABI description for now?
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
* Re: [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs
2011-05-08 18:49 ` Borislav Petkov
@ 2011-05-08 19:18 ` Greg KH
2011-05-08 19:43 ` Borislav Petkov
0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2011-05-08 19:18 UTC (permalink / raw)
To: Borislav Petkov; +Cc: EDAC devel, X86-ML, LKML, Petkov, Borislav
On Sun, May 08, 2011 at 08:49:11PM +0200, Borislav Petkov wrote:
> On Sun, May 08, 2011 at 12:40:54PM -0400, Greg KH wrote:
> > On Thu, May 05, 2011 at 06:49:35PM +0200, Borislav Petkov wrote:
> > > From: Borislav Petkov <borislav.petkov@amd.com>
> > >
> > > Add sysfs handlers for the 'hw_inject' and 'cpu' attrs.
> >
> > You forgot to add the proper Documentation/ABI/ files for these new
> > sysfs files. Please do that.
>
> Yeah, so those sprang up as a testing module and the final goal is to
> have all injection done through perf events but it could also be prudent
> to keep those as a fallback alternative.
>
> I'm thinking something like
> Documentation/ABI/testing/sysfs-devices-system-edac but looking
> at Documentation/ABI/README and reading the "testing/" category
> description, I don't think it applies to those interfaces just yet for
> the simple reason that they're still changing and I wouldn't want to
> cast them in stone now.
>
> Can we add them to a "unstable" category instead or maybe delay making
> the ABI description for now?
No. If you add sysfs files to the kernel, you had better be sure that
they are correct and are what you want to be doing. Otherwise, don't
add them.
As you really want to do this through perf, just do it through perf.
Please don't create a user/kernel api that you are not going to want to
use in the end.
thanks,
greg k-h
* Re: [PATCH 3/4] EDAC, MCE, AMD: Add hw_inject and cpu sysfs attrs
2011-05-08 19:18 ` Greg KH
@ 2011-05-08 19:43 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2011-05-08 19:43 UTC (permalink / raw)
To: Greg KH; +Cc: Borislav Petkov, EDAC devel, X86-ML, LKML
On Sun, May 08, 2011 at 03:18:53PM -0400, Greg KH wrote:
> > > You forgot to add the proper Documentation/ABI/ files for these new
> > > sysfs files. Please do that.
> >
> > Yeah, so those sprang up as a testing module and the final goal is to
> > have all injection done through perf events but it could also be prudent
> > to keep those as a fallback alternative.
> >
> > I'm thinking something like
> > Documentation/ABI/testing/sysfs-devices-system-edac but looking
> > at Documentation/ABI/README and reading the "testing/" category
> > description, I don't think it applies to those interfaces just yet for
> > the simple reason that they're still changing and I wouldn't want to
> > cast them in stone now.
> >
> > Can we add them to a "unstable" category instead or maybe delay making
> > the ABI description for now?
>
> No. If you add sysfs files to the kernel, you had better be sure that
> they are correct and are what you want to be doing. Otherwise, don't
> add them.
I cannot be sure right now because this whole RAS thing is still being
developed and we're more or less learning by doing. I don't think
anyone can tell you which interfaces we're going to need for error
injection a year from now.
> As you really want to do this through perf, just do it through perf.
> Please don't create a user/kernel api that you are not going to want
> to use in the end.
As I said above, it is going to be used as a fallback alternative to the
perf option for systems which cannot do perf.
Also, this whole module is default-N in Kconfig and you have to really
know what you're doing when enabling it.
Which brings me to a much better idea: this whole interface should
actually go to debugfs! This is more or less a debugging module, so
debugfs, doh!
/me slaps himself on the forehead and goes to rework the patches.
Thanks.
--
Regards/Gruss,
Boris.
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632