From: Borislav Petkov <bp@suse.de>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "Wang, Rui Y" <rui.y.wang@intel.com>,
"Chen, Gong" <gong.chen@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/mce: Initialize workqueues only once (alternate proposal)
Date: Fri, 19 Jun 2015 21:02:00 +0200
Message-ID: <20150619190200.GB20546@pd.tnic>
In-Reply-To: <20150619173620.GA9622@agluck-desk.sc.intel.com>
On Fri, Jun 19, 2015 at 10:36:20AM -0700, Luck, Tony wrote:
> 96d98bfd0366 ("x86/mce: Don't use percpu workqueues") dropped the
> per-CPU workqueues in the MCE code but left the initialization per-CPU.
> This led to early boot time splats (below) in the workqueues code
> because we were overwriting the workqueue during INIT_WORK() on each new
> CPU which would appear.
>
> Move initialization to mcheck_init() so it happens only once.
>
> mce: [Hardware Error]: Machine check events logged
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: [<ffffffff810980a1>] process_one_work+0x31/0x420
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 36 PID: 263 Comm: kworker/36:0 Not tainted 4.1.0-rc8 #1
> Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0065.R01.1505011640
> +05/01/2015
> task: ffff88181c284470 ti: ffff88181bd94000 task.ti: ffff88181bd94000
> RIP: 0010:[<ffffffff810980a1>] process_one_work+0x31/0x420
> RSP: 0000:ffff88181bd97e08 EFLAGS: 00010046
> RAX: 0000000fffffffe0 RBX: ffffffff81d0fa20 RCX: 0000000000000000
> RDX: 0000000fffffff00 RSI: ffffffff81d0fa20 RDI: ffff88181c2660c0
> RBP: ffff88181bd97e48 R08: ffff88181f416ec0 R09: ffff88181c284470
> R10: 0000000000000002 R11: ffffffff8109e5ac R12: ffff88181c2660c0
> R13: ffff88181f416ec0 R14: 0000000000000000 R15: ffff88181c2660f0
> ^^^^^^^^^^^^^^^^^
>
> 27: 4c 0f 45 f2 cmovne %rdx,%r14
> 2b:* 49 8b 46 08 mov 0x8(%r14),%rax <-- trapping instruction
> 2f: 44 8b b8 00 01 00 00 mov 0x100(%rax),%r15d
>
> ...
>
> Call Trace:
> worker_thread
> ? rescuer_thread
> kthread
> ? kthread_create_on_node
> ret_from_fork
> ? kthread_create_on_node
> Code: 48 89 e5 41 57 41 56 45 31 f6 41 55 41 54 49 89 fc 53 48 89 f3 48 83 ec 18 48 8b 06 4c
> +8b 6f 48 48 89 c2 30 d2 a8 04 4c 0f 45 f2 <49> 8b 46 08 44 8b b8 00 01 00 00 41 c1 ef 05 44
> +89 f8 83 e0 01
> RIP [<ffffffff810980a1>] process_one_work
> RSP <ffff88181bd97e08>
> CR2: 0000000000000008
> ---[ end trace 8229a011b97532a0 ]---
> Kernel panic - not syncing: Fatal exception
> ---[ end Kernel panic - not syncing: Fatal exception
>
> Reported-by: Rui Wang <rui.y.wang@intel.com>
> Debugged-by: Borislav Petkov <bp@suse.de>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/kernel/cpu/mcheck/mce.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 478f81a6d824..158d9e7db974 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1665,9 +1665,6 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
> return;
> }
>
> - INIT_WORK(&mce_work, mce_process_work);
> - init_irq_work(&mce_irq_work, mce_irq_work_cb);
> -
> machine_check_vector = do_machine_check;
>
> __mcheck_cpu_init_generic();
> @@ -1994,6 +1991,9 @@ int __init mcheck_init(void)
> mce_register_decode_chain(&mce_srao_nb);
> mcheck_vendor_init_severity();
>
> + INIT_WORK(&mce_work, mce_process_work);
> + init_irq_work(&mce_irq_work, mce_irq_work_cb);
> +
> return 0;
Hmm, and I was under the impression that mcheck_init() runs much
later... Not really.
Anyway, your version is better, I've replaced mine with it.
Thanks.
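
For readers following the thread, here is a minimal userspace sketch of the
failure mode described in the quoted commit message: one shared work item
that gets re-initialized while it is still queued. This is not the kernel's
workqueue code. struct pool, struct work, init_work(), queue_work() and
work_owner() are hypothetical stand-ins, and the only assumption is that
initialization wipes the field which links a queued item to its owner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for whatever owns a queued work item. */
struct pool { int id; };

/* One shared work item: 'data' holds the owner pointer plus a queued flag. */
struct work {
	uintptr_t data;
	void (*func)(struct work *);
};

#define WORK_QUEUED ((uintptr_t)1)

static void noop(struct work *w) { (void)w; }

/* Initialization unconditionally wipes any queued state. */
static void init_work(struct work *w, void (*fn)(struct work *))
{
	w->data = 0;
	w->func = fn;
}

/* Queue the item on a pool; returns 0 if it was already queued. */
static int queue_work(struct pool *p, struct work *w)
{
	assert(w->func != NULL);
	if (w->data & WORK_QUEUED)
		return 0;
	w->data = (uintptr_t)p | WORK_QUEUED;
	return 1;
}

/* What a worker would look up before running the item. */
static struct pool *work_owner(const struct work *w)
{
	if (!(w->data & WORK_QUEUED))
		return NULL;
	return (struct pool *)(w->data & ~WORK_QUEUED);
}

/* Init on every "CPU": the second init hits an item that is still queued. */
int reinit_while_queued_loses_owner(void)
{
	static struct pool p;
	struct work w;

	init_work(&w, noop);	/* first CPU comes up */
	queue_work(&p, &w);	/* an event queues the shared item */
	init_work(&w, noop);	/* next CPU comes up and re-inits it */
	return work_owner(&w) == NULL;
}

/* Init exactly once: the queued item keeps its owner. */
int init_once_keeps_owner(void)
{
	static struct pool p;
	struct work w;

	init_work(&w, noop);
	queue_work(&p, &w);
	return work_owner(&w) == &p;
}
```

In this model both functions return 1: the re-initialized item has lost its
owner, so a worker dereferencing it would be following a NULL pointer, while
the init-once variant still finds the pool it was queued on.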
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.