From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Gortmaker
Subject: [PATCH -rt 3.10.x] mce: don't try to wake thread before it exists.
Date: Tue, 26 Aug 2014 18:10:53 -0400
Message-ID: <1409091053-45412-1-git-send-email-paul.gortmaker@windriver.com>
Mime-Version: 1.0
Content-Type: text/plain
Cc: , Paul Gortmaker
To: "Steven Rostedt (Red Hat)"
Return-path:
Received: from mail1.windriver.com ([147.11.146.13]:62808 "EHLO
	mail1.windriver.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755588AbaHZWMo (ORCPT );
	Tue, 26 Aug 2014 18:12:44 -0400
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

If a broken machine with issues raises an MCE irq event real early
in the boot, it can try and wake the -rt specific handler thread
(mce_notify_helper) before it exists.  (It is created through a
device_initcall that happens later in the boot.)

When this happens, we see the irq, which calls the wake with a
null pointer, which then panics the machine at boot.

The race between the irq event and thread init is as follows:

  mce_notify_irq();
    --> mce_notify_work();
      --> wake_up_process(mce_notify_helper);

  device_initcall_sync(mcheck_init_device);
    --> mce_notify_work_init();
      --> mce_notify_helper = kthread_run(mce_notify_helper_thread, ...);

So, clearly if the IRQ event happens before the device_initcall,
the mce_notify_helper pointer (at global file scope and hence BSS)
will still be NULL, resulting in the following panic at boot:

  CPU: Physical Processor ID: 0
  CPU: Processor Core ID: 0
  ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
  ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
  mce: CPU supports 22 MCE banks
  CPU0: Thermal monitoring enabled (TM1)
  Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
  tlb_flushall_shift: 6
  Freeing SMP alternatives: 36k freed
  ACPI: Core revision 20130328
  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [] wake_up_process+0xd/0x40
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-rt40_preempt-rt #1
  Hardware name: Insyde Grantley/Type2 - Board Product Name1, BIOS 05.04.07 04/21/2014
  task: ffffffff81e14440 ti: ffffffff81e00000 task.ti: ffffffff81e00000
  RIP: 0010:[] [] wake_up_process+0xd/0x40
  RSP: 0000:ffff88107fc03f68 EFLAGS: 00010086
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffefbff
  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88107fc03f70 R08: 0000000000000002 R09: 0000000000000003
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff88103f03d100
  R13: ffff880ff4e0c000 R14: ffff88107fc16f00 R15: ffff880ff4e0c000
  FS: 0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffff88107fc0ccf0 ffff88107fc03f80 ffffffff8101f900 ffff88107fc03f98
   ffffffff8102169d ffff88107fc0fab0 ffff88107fc03fa8 ffffffff81022051
   ffffffff81e01d48 ffffffff819a8a9a ffffffff81e01bf8 ffffffff81e01d48
  Call Trace:
   [] mce_notify_irq+0x30/0x40
   [] intel_threshold_interrupt+0xbd/0xe0
   [] smp_threshold_interrupt+0x21/0x40
   [] threshold_interrupt+0x6a/0x70
   [] ? __slab_alloc.isra.48+0x39e/0x60c
   [] ? acpi_ps_alloc_op+0x9a/0xa1
   [] ? kmem_cache_free+0xb8/0x2b0
   [] kmem_cache_alloc+0x234/0x2e0
   [] ? acpi_ps_alloc_op+0x9a/0xa1
   [] acpi_ps_alloc_op+0x9a/0xa1
   [] acpi_ps_get_next_arg+0xfe/0x3d3
   [] acpi_ps_parse_loop+0x290/0x560
   [] acpi_ps_parse_aml+0x98/0x28c
   [] acpi_ns_one_complete_parse+0x104/0x124
   [] acpi_ns_parse_table+0x33/0x38
   [] acpi_ns_load_table+0x4a/0x8c
   [] acpi_load_tables+0xa2/0x176
   [] acpi_early_init+0x70/0x100
   [] ? check_bugs+0xe/0x2d
   [] start_kernel+0x387/0x3b5
   [] ? repair_env_string+0x5c/0x5c
   [] x86_64_start_reservations+0x2a/0x2c
   [] x86_64_start_kernel+0xcc/0xcf
  Code: 8b 52 18 e9 9e fc ff ff 48 89 45 c0 e8 cd df 92 00 48 8b 45 c0 eb e5 0f 1f 80 00 00 00 00 e8 fb 04 93 00 55 48 89 e5 53 48 89 fb <48> 8b 07 a8 0c 75 12 48 89 df 31 d2 be 03 00 00 00 e8 ad fb ff
  RIP [] wake_up_process+0xd/0x40
   RSP
  CR2: 0000000000000000
  ---[ end trace 0000000000000001 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

Evidently the hardware has issues, but we can handle this more
gracefully by ignoring the events that happen before the
device_initcall has registered the mce handler thread.

Signed-off-by: Paul Gortmaker

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..94860c521fb8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,6 +1391,11 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
+	if (unlikely(!mce_notify_helper)) {
+		pr_info(HW_ERR "Machine check event before MCE init; ignored\n");
+		return;
+	}
+
 	wake_up_process(mce_notify_helper);
 }
 #else
-- 
2.0.1
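
The ordering problem and the guard can also be modelled outside the kernel.
What follows is only a minimal userspace sketch, not kernel code:
struct fake_task, fake_wake_up_process(), fake_mce_notify_work_init() and
fake_mce_notify_work() are hypothetical stand-ins for task_struct,
wake_up_process(), mce_notify_work_init() and mce_notify_work(). The bool
return value is an assumption added so the outcome can be observed; the
function in the patch returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct task_struct; not the kernel type. */
struct fake_task {
	int wakeups;
};

/*
 * Models the file-scope pointer in mce.c: it has static storage and no
 * initializer, so it is NULL until the init function has run.
 */
static struct fake_task *mce_notify_helper;
static struct fake_task helper_storage;

/*
 * Stand-in for wake_up_process(). The real function dereferences its
 * argument, so a NULL task is fatal; the assert models that.
 */
static void fake_wake_up_process(struct fake_task *t)
{
	assert(t != NULL);
	t->wakeups++;
}

/* Stand-in for mce_notify_work_init(), which the kernel runs from an initcall. */
static void fake_mce_notify_work_init(void)
{
	mce_notify_helper = &helper_storage;
}

/*
 * Guarded notify path, mirroring the check in the patch.
 * Returns true if the helper was woken, false if the event was ignored.
 */
static bool fake_mce_notify_work(void)
{
	if (!mce_notify_helper) {
		fprintf(stderr, "Machine check event before MCE init; ignored\n");
		return false;
	}

	fake_wake_up_process(mce_notify_helper);
	return true;
}
```

Calling fake_mce_notify_work() before fake_mce_notify_work_init() takes the
early-return branch instead of reaching the wake call with a NULL pointer;
calling it afterwards wakes the helper as usual.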