From: Borislav Petkov <bp@suse.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Wang, Rui Y" <rui.y.wang@intel.com>,
"Chen, Gong" <gong.chen@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: [PATCH] x86/mce: Initialize workqueues only once
Date: Fri, 19 Jun 2015 11:27:18 +0200
Message-ID: <20150619092718.GB12979@pd.tnic>
In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F32A9E177@ORSMSX114.amr.corp.intel.com>
On Wed, Jun 17, 2015 at 11:53:53PM +0000, Luck, Tony wrote:
> > if you want to give those changes a run, I've uploaded them here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git#tip-ras
>
> Latest experiments show that sometimes checking kventd_up() before calling schedule_work()
> helps ... but mostly only when I fake some early logs from low numbered cpus. I added some
> traces to the real case of a left-over fatal error and got this splat:
Here's the fix:
--
From: Borislav Petkov <bp@suse.de>
Subject: [PATCH] x86/mce: Initialize workqueues only once
96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
per-CPU workqueues in the MCE code but left the initialization per-CPU.
This led to early boot-time splats (below) in the workqueue code
because we were overwriting the work item during INIT_WORK() on each new
CPU that came up.

And since mcheck_cpu_init() happens very early, using an initcall to do
this one-time initialization doesn't fly. Using work->func as a check
whether the work item has been initialized already might break in the
future if someone changes ->func, so that doesn't work either.

So let's have a simple static boolean flag to do that one-time work.

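To illustrate the difference outside the kernel, here is a minimal user-space sketch. Everything in it is a hypothetical stand-in: struct fake_work, fake_init_work() and the two cpu_init_*() variants are made up for the example and are not the kernel's work_struct or INIT_WORK(). It only models the idea that re-running the initializer on a shared, already queued item wipes its state, while a static flag keeps the setup to the first caller.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a work item; NOT the kernel's struct work_struct. */
struct fake_work {
	unsigned long data;                    /* nonzero while "queued" */
	void (*func)(struct fake_work *);
};

/* Hypothetical initializer: resets all state, as an init helper typically does. */
static void fake_init_work(struct fake_work *w, void (*fn)(struct fake_work *))
{
	w->data = 0;
	w->func = fn;
}

static void process(struct fake_work *w)
{
	(void)w;
}

/* One global item shared by all simulated CPUs. */
static struct fake_work shared_work;

/* Per-CPU init that re-initializes the shared item on every call. */
static void cpu_init_every_time(int cpu)
{
	(void)cpu;
	fake_init_work(&shared_work, process);
}

/* Per-CPU init guarded by a static flag: only the first caller initializes. */
static void cpu_init_once(int cpu)
{
	static bool init_once;

	(void)cpu;
	if (!init_once) {
		fake_init_work(&shared_work, process);
		init_once = true;
	}
}

/* Bring up three simulated CPUs; report whether queued state survives. */
static bool queued_state_survives(void (*cpu_init)(int))
{
	cpu_init(0);               /* boot CPU sets up the item */
	shared_work.data = 0x5;    /* pretend the item got queued */
	cpu_init(1);               /* later CPUs run the same init path */
	cpu_init(2);
	return shared_work.data == 0x5;
}
```

Calling queued_state_survives(cpu_init_every_time) returns false because the second CPU wipes the queued state, while queued_state_survives(cpu_init_once) returns true. The sketch ignores concurrency, which is acceptable here only under the assumption that the init calls do not race with each other.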
mce: [Hardware Error]: Machine check events logged
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
PGD 0
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640 05/01/2015
task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
                          ^^^^^^^^^^^^^^^^^
27: 4c 0f 45 f2 cmovne %rdx,%r14
2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
...
Call Trace:
worker_thread
? rescuer_thread
kthread
? kthread_create_on_node
ret_from_fork
? kthread_create_on_node
Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c 8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44 89 f8 83 e0 01
RIP [<ffffffff810980a1>] process_one_work
RSP <ffff88181bd97e08>
CR2: 0000000000000008
---[ end trace 8229a011b97532a0 ]---
Kernel panic - not syncing: Fatal exception
---[ end Kernel panic - not syncing: Fatal exception
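As a side note on reading the splat: R14 is 0, the trapping instruction loads from 0x8(%r14), and CR2 reports 0000000000000008. That pattern is what a load of a structure member at offset 8 through a NULL base pointer looks like. The sketch below shows the arithmetic with a hypothetical two-pointer layout; struct fake_pwq and wq_load_address() are invented for illustration, and the mail itself does not say which structure is involved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with two pointer-sized members (8 bytes each on x86-64). */
struct fake_pwq {
	void *pool;   /* offset 0 */
	void *wq;     /* offset sizeof(void *), i.e. 8 on x86-64 */
};

/* Address that a load of ->wq would touch for a given base pointer value. */
static uintptr_t wq_load_address(uintptr_t base)
{
	return base + offsetof(struct fake_pwq, wq);
}
```

With a base of 0, wq_load_address() yields the member offset itself, which on x86-64 is 8 and lines up with the faulting address in CR2.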
Signed-off-by: Borislav Petkov <bp@suse.de>
---
arch/x86/kernel/cpu/mcheck/mce.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 478f81a6d824..216d44d074df 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1645,6 +1645,8 @@ void (*machine_check_vector)(struct pt_regs *, long error_code) =
  */
 void mcheck_cpu_init(struct cpuinfo_x86 *c)
 {
+	static bool __mce_init_once;
+
 	if (mca_cfg.disabled)
 		return;
 
@@ -1665,8 +1667,11 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 		return;
 	}
 
-	INIT_WORK(&mce_work, mce_process_work);
-	init_irq_work(&mce_irq_work, mce_irq_work_cb);
+	if (!__mce_init_once) {
+		INIT_WORK(&mce_work, mce_process_work);
+		init_irq_work(&mce_irq_work, mce_irq_work_cb);
+		__mce_init_once = 1;
+	}
 
 	machine_check_vector = do_machine_check;
 
--
2.3.5
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.