* Re: MCE Bug?
2015-06-17 23:53 ` Luck, Tony
@ 2015-06-18 10:25 ` Borislav Petkov
2015-06-18 13:10 ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 10:25 UTC (permalink / raw)
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:
Hmm, and calling mce_log from __mcheck_cpu_init_generic() as you
suggested yesterday seems to work on this box here:
[ 1.588713] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-4620 0 @ 2.20GHz (fam: 06, model: 2d, stepping
: 07)
[ 1.592727] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, full-w Broken BIOS d
etected, complain to your hardware vendor.
[ 1.997344] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[ 2.000146] Intel PMU driver.
[ 2.001376] ... version: 3
[ 2.002919] ... bit width: 48
[ 2.004626] ... generic registers: 4
[ 2.006137] ... value mask: 0000ffffffffffff
[ 2.008064] ... max period: 0000ffffffffffff
[ 2.010010] ... fixed-purpose events: 3
[ 2.011528] ... event mask: 000000070000000f
[ 2.017257] x86: Booting SMP configuration:
[ 2.019232] .... node #0, CPUs: #1
[ 2.033848] microcode: CPU1 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.038730] mce: [Hardware Error]: Machine check events logged
[ 2.050735] #2
[ 2.050735] microcode: CPU2 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.056163] mce: [Hardware Error]: Machine check events logged
[ 2.068133] #3
[ 2.068140] microcode: CPU3 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.07412.324641] microcode: CPU4 microcode updated early to revision 0x710, date = 2013-06-17
[ 2.479404] #5
Stuff gets logged just fine, no splats later.
Hmmm, more staring...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH] x86/mce: Kill drain_mcelog_buffer()
2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
@ 2015-06-18 13:10 ` Borislav Petkov
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-18 13:10 UTC (permalink / raw)
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Kill drain_mcelog_buffer()
This used to flush out MCEs logged during early boot and which were in
the MCA registers from a previous system run. No need for that now,
since we're moving to a genpool.
Signed-off-by: Borislav Petkov <bp@suse.de>
Suggested-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/include/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 44 ++--------------------------------------
drivers/acpi/acpi_extlog.c | 2 +-
drivers/edac/i7core_edac.c | 2 +-
drivers/edac/mce_amd.c | 2 +-
drivers/edac/sb_edac.c | 2 +-
6 files changed, 7 insertions(+), 47 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index dfaa4de1dbb4..982dfc3679ad 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -140,7 +140,7 @@ struct mce_vendor_flags {
extern struct mce_vendor_flags mce_flags;
extern struct mca_config mca_cfg;
-extern void mce_register_decode_chain(struct notifier_block *nb, bool drain);
+extern void mce_register_decode_chain(struct notifier_block *nb);
extern void mce_unregister_decode_chain(struct notifier_block *nb);
#include <linux/percpu.h>
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index a3660cd1fda3..72b4acef36f4 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -196,55 +196,15 @@ void mce_log(struct mce *mce)
set_bit(0, &mce_need_notify);
}
-static void drain_mcelog_buffer(void)
-{
- unsigned int next, i, prev = 0;
-
- next = ACCESS_ONCE(mcelog.next);
-
- do {
- struct mce *m;
-
- /* drain what was logged during boot */
- for (i = prev; i < next; i++) {
- unsigned long start = jiffies;
- unsigned retries = 1;
-
- m = &mcelog.entry[i];
-
- while (!m->finished) {
- if (time_after_eq(jiffies, start + 2*retries))
- retries++;
-
- cpu_relax();
-
- if (!m->finished && retries >= 4) {
- pr_err("skipping error being logged currently!\n");
- break;
- }
- }
- smp_rmb();
- atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
- }
-
- memset(mcelog.entry + prev, 0, (next - prev) * sizeof(*m));
- prev = next;
- next = cmpxchg(&mcelog.next, prev, 0);
- } while (next != prev);
-}
-
static struct notifier_block mce_srao_nb;
-void mce_register_decode_chain(struct notifier_block *nb, bool drain)
+void mce_register_decode_chain(struct notifier_block *nb)
{
/* Ensure SRAO notifier has the highest priority in the decode chain. */
if (nb != &mce_srao_nb && nb->priority == INT_MAX)
nb->priority -= 1;
atomic_notifier_chain_register(&x86_mce_decoder_chain, nb);
-
- if (drain)
- drain_mcelog_buffer();
}
EXPORT_SYMBOL_GPL(mce_register_decode_chain);
@@ -2031,7 +1991,7 @@ __setup("mce", mcheck_enable);
int __init mcheck_init(void)
{
mcheck_intel_therm_init();
- mce_register_decode_chain(&mce_srao_nb, false);
+ mce_register_decode_chain(&mce_srao_nb);
mcheck_vendor_init_severity();
return 0;
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index 07e012e74c1b..b3842ffc19ba 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -286,7 +286,7 @@ static int __init extlog_init(void)
*/
old_edac_report_status = get_edac_report_status();
set_edac_report_status(EDAC_REPORTING_DISABLED);
- mce_register_decode_chain(&extlog_mce_dec, true);
+ mce_register_decode_chain(&extlog_mce_dec);
/* enable OS to be involved to take over management from BIOS */
((struct extlog_l1_head *)extlog_l1_addr)->flags |= FLAG_OS_OPTIN;
diff --git a/drivers/edac/i7core_edac.c b/drivers/edac/i7core_edac.c
index 13d77f4a892c..01087a38da22 100644
--- a/drivers/edac/i7core_edac.c
+++ b/drivers/edac/i7core_edac.c
@@ -2424,7 +2424,7 @@ static int __init i7core_init(void)
pci_rc = pci_register_driver(&i7core_driver);
if (pci_rc >= 0) {
- mce_register_decode_chain(&i7_mce_dec, true);
+ mce_register_decode_chain(&i7_mce_dec);
return 0;
}
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index aca31a237073..58586d59bf8e 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -895,7 +895,7 @@ static int __init mce_amd_init(void)
pr_info("MCE: In-kernel MCE decoding enabled.\n");
- mce_register_decode_chain(&amd_mce_dec_nb, true);
+ mce_register_decode_chain(&amd_mce_dec_nb);
return 0;
}
diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
index 494a23e4f7e6..1acf57ba4c86 100644
--- a/drivers/edac/sb_edac.c
+++ b/drivers/edac/sb_edac.c
@@ -2556,7 +2556,7 @@ static int __init sbridge_init(void)
pci_rc = pci_register_driver(&sbridge_driver);
if (pci_rc >= 0) {
- mce_register_decode_chain(&sbridge_mce_dec, true);
+ mce_register_decode_chain(&sbridge_mce_dec);
if (get_edac_report_status() == EDAC_REPORTING_DISABLED)
sbridge_printk(KERN_WARNING, "Loading driver, error reporting disabled.\n");
return 0;
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH] x86/mce: Initialize workqueues only once
2015-06-17 23:53 ` Luck, Tony
2015-06-18 10:25 ` Borislav Petkov
2015-06-18 13:10 ` [PATCH] x86/mce: Kill drain_mcelog_buffer() Borislav Petkov
@ 2015-06-19 9:27 ` Borislav Petkov
2015-06-19 12:24 ` Borislav Petkov
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
2 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 9:27 UTC (permalink / raw)
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:
Here's the fix:
--
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Initialize workqueues only once
96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.
And since mcheck_cpu_init() happens very early, using an initcall to do
this one-time initialization doesn't fly. Using work->func as a check
whether the workqueue has been initialized already might break in the
future if someone changes ->func, so doesn't work either.
So let's have a simple static boolean flag to do that one-time work.
mce: [Hardware Error]: Machine check events logged
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
^^^^^^^^^^^^^^^^^
27: 4c 0f 45 f2 cmovne %rdx,%r14
2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
...
Call Trace:
worker_thread
? rescuer_thread
kthread
? kthread_create_on_node
ret_from_fork
? kthread_create_on_node
Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
RIP [<ffffffff810980a1>] process_one_work
RSP <ffff88181bd97e08>
CR2: 0000000000000008
---[ end trace 8229a011b97532a0 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..216d44d074df 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1645,6 +1645,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
*/
void mcheck_cpu_init(struct cpuinfo_x86 *c)
{
+ static bool __mce_init_once;
+
if (mca_cfg.disabled)
return;
@@ -1665,8 +1667,11 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
return;
}
- INIT_WORK(&mce_work, mce_process_work);
- init_irq_work(&mce_irq_work, mce_irq_work_cb);
+ if (!__mce_init_once) {
+ INIT_WORK(&mce_work, mce_process_work);
+ init_irq_work(&mce_irq_work, mce_irq_work_cb);
+ __mce_init_once = 1;
+ }
machine_check_vector = do_machine_check;
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Initialize workqueues only once
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
@ 2015-06-19 12:24 ` Borislav Petkov
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
1 sibling, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 12:24 UTC (permalink / raw)
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Fri, Jun 19, 2015 at 11:27:18AM +0200, Borislav Petkov wrote:
> Here's the fix:
Injected error:
[ 201.237241] EINJ: Error INJection is initialized.
[ 246.146290] [Firmware Bug]: APEI: Invalid bit width + offset in GAR [0x1234/64/0/3/0]
[ 257.868432] Disabling lock debugging due to kernel taint
[ 259.808068] mce: [Hardware Error]: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[ 257.868432] mce: [Hardware Error]: Machine check events logged
[ 259.808068] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81344bbf> {intel_idle+0xbf/0x130}
[ 259.808068] mce: [Hardware Error]: TSC 98aa47539f28 ADDR 1200 MISC 142303c86
[ 259.808068] mce: [Hardware Error]: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3 microcode 700
[ 259.808068] EDAC sbridge MC3: HANDLING MCE MEMORY ERROR
[ 259.808068] EDAC sbridge MC3: CPU 61: Machine Check Exception: 5 Bank 7: be00000000010090
[ 259.808068] EDAC sbridge MC3: TSC 98aa47539f28
[ 259.808068] EDAC sbridge MC3: ADDR 1200 EDAC sbridge MC3: MISC 142303c86
[ 259.808068] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1434716502 SOCKET 0 APIC 3
[ 259.808068] EDAC MC3: 0 UE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1 offset:0x200 grain:32 - recoverable area:DRAM err_code:0001:0090 socket:0 channel_mask:4 rank:0)
Let's see whether something's in the MCA MSRs when this monster reboots
(it takes forever...).
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
2015-06-19 9:27 ` [PATCH] x86/mce: Initialize workqueues only once Borislav Petkov
2015-06-19 12:24 ` Borislav Petkov
@ 2015-06-19 17:36 ` Luck, Tony
2015-06-19 19:02 ` Borislav Petkov
1 sibling, 1 reply; 9+ messages in thread
From: Luck, Tony @ 2015-06-19 17:36 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This lead to early boot time splats (below) in the workqueues code
because we were overwriting the workqueue during INIT_WORK() on each new
CPU which would appear.
Move initialization to mcheck_init() so it happens only once.
mce: [Hardware Error]: Machine check events logged
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
+05/01/2015
task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
^^^^^^^^^^^^^^^^^
27: 4c 0f 45 f2 cmovne %rdx,%r14
2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
...
Call Trace:
worker_thread
? rescuer_thread
kthread
? kthread_create_on_node
ret_from_fork
? kthread_create_on_node
Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
+8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
+89 f8 83 e0 01
RIP [<ffffffff810980a1>] process_one_work
RSP <ffff88181bd97e08>
CR2: 0000000000000008
---[ end trace 8229a011b97532a0 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception
Reported-by: Rui Wang <rui.y.wang@intel.com>
Debugged-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..158d9e7db974 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
return;
}
- INIT_WORK(&mce_work, mce_process_work);
- init_irq_work(&mce_irq_work, mce_irq_work_cb);
-
machine_check_vector = do_machine_check;
__mcheck_cpu_init_generic();
@@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
mce_register_decode_chain(&mce_srao_nb);
mcheck_vendor_init_severity();
+ INIT_WORK(&mce_work, mce_process_work);
+ init_irq_work(&mce_irq_work, mce_irq_work_cb);
+
return 0;
}
--
2.1.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
2015-06-19 17:36 ` [PATCH] x86/mce: Initialize workqueues only once (alternate proposal) Luck, Tony
@ 2015-06-19 19:02 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2015-06-19 19:02 UTC (permalink / raw)
To: Luck, Tony; +Cc: Wang, Rui Y, Chen, Gong, linux-kernel@vger.kernel.org
On Fri, Jun 19, 2015 at 10:36:20AM -0700, Luck, Tony wrote:
> 96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
> per-CPU workqueues in the MCE code but left the initialization per-CPU.
> This lead to early boot time splats (below) in the workqueues code
> because we were overwriting the workqueue during INIT_WORK() on each new
> CPU which would appear.
>
> Move initialization to mcheck_init() so it happens only once.
>
> mce: [Hardware Error]: Machine check events logged
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
> Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
> +05/01/2015
> task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
> RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
> RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
> RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
> RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
> RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
> R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
> R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
> ^^^^^^^^^^^^^^^^^
>
> 27: 4c 0f 45 f2 cmovne %rdx,%r14
> 2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
> 2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
>
> ...
>
> Call Trace:
> worker_thread
> ? rescuer_thread
> kthread
> ? kthread_create_on_node
> ret_from_fork
> ? kthread_create_on_node
> Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
> +8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
> +89 f8 83 e0 01
> RIP [<ffffffff810980a1>] process_one_work
> RSP <ffff88181bd97e08>
> CR2: 0000000000000008
> ---[ end trace 8229a011b97532a0 ]---
> Kernel panic - not syncing: Fatal exception
> ---[ end Kernel panic - not syncing: Fatal exception
>
> Reported-by: Rui Wang <rui.y.wang@intel.com>
> Debugged-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 478f81a6d824..158d9e7db974 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
> return;
> }
>
> - INIT_WORK(&mce_work, mce_process_work);
> - init_irq_work(&mce_irq_work, mce_irq_work_cb);
> -
> machine_check_vector = do_machine_check;
>
> __mcheck_cpu_init_generic();
> @@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
> mce_register_decode_chain(&mce_srao_nb);
> mcheck_vendor_init_severity();
>
> + INIT_WORK(&mce_work, mce_process_work);
> + init_irq_work(&mce_irq_work, mce_irq_work_cb);
> +
> return 0;
Hmm, and I was under the impression that mcheck_init() runs much
later... Not really.
Anyway, your version is better, I've replaced mine with it.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 9+ messages in thread