From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759478Ab2AMX2O (ORCPT ); Fri, 13 Jan 2012 18:28:14 -0500
Received: from e23smtp04.au.ibm.com ([202.81.31.146]:44139 "EHLO
	e23smtp04.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756145Ab2AMX2L (ORCPT );
	Fri, 13 Jan 2012 18:28:11 -0500
Message-ID: <4F10BDF7.8030306@linux.vnet.ibm.com>
Date: Sat, 14 Jan 2012 04:57:51 +0530
From: "Srivatsa S. Bhat"
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:7.0) Gecko/20110927 Thunderbird/7.0
MIME-Version: 1.0
To: Linus Torvalds
CC: Ming Lei, Djalal Harouni, Borislav Petkov, Tony Luck, Hidetoshi Seto,
	Ingo Molnar, Andi Kleen, linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman, Kay Sievers,
	gouders@et.bocholt.fh-gelsenkirchen.de, Marcos Souza,
	Linux PM mailing list, "Rafael J. Wysocki", "tglx@linutronix.de",
	prasad@linux.vnet.ibm.com, justinmattock@gmail.com, Jeff Chua,
	Suresh B Siddha, Peter Zijlstra, Mel Gorman, Gilad Ben-Yossef
Subject: Re: x86/mce: machine check warning during poweroff
References: <20120111000051.GA28874@dztty> <4F10929E.8070007@linux.vnet.ibm.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
x-cbid: 12011313-9264-0000-0000-0000009EB0F0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/14/2012 04:32 AM, Linus Torvalds wrote:

> On Fri, Jan 13, 2012 at 12:22 PM, Srivatsa S. Bhat wrote:
>>
>> Fundamentally, this warning is triggered during CPU Offline, which is done
>> during poweroff, suspend, hibernate etc. IOW, even a simple
>> # echo 0 > /sys/devices/system/cpu/cpuX/online
>> will trigger it.
>
> There is definitely something wrong with CPU hotplug and MCE.
>
> I seem to be able to trigger not only warnings, but some oopses, by doing:
>
>  - enable list debugging, slab debugging, and kobject debugging in the
>    kernel (I've got some other things enabled too, but I think those are
>    the main ones)
>
>  - do
>
>      echo 0 > /sys/devices/system/cpu/cpuX/online
>
>    this gets a few warnings
>
>  - then do
>
>      echo 1 > /sys/devices/system/cpu/cpuX/online
>
>    where bringing it up again will crash the machine entirely.
>

I observed this too, and it is very easy to reproduce. Here is the log:

# echo 0 > /sys/devices/system/cpu/cpu1/online
[ 65.091045] CPU 1 is now offline
[ 65.097267] ------------[ cut here ]------------
[ 65.102045] WARNING: at drivers/base/core.c:194 device_release+0x82/0x90()
[ 65.109137] Hardware name: IBM System x -[7870C4Q]-
[ 65.109139] Device 'machinecheck1' does not have a release() function, it is broken and must be fixed.
[ 65.109141] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod cdc_ether usbnet i7core_edac edac_core mii serio_raw i2c_i801 shpchp ioatdma iTCO_wdt iTCO_vendor_support dca pci_hotplug pcspkr bnx2 i2c_core tpm_tis tpm tpm_bios sg rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 65.109195] Pid: 6631, comm: bash Not tainted 3.2.0-debugkernel-0.0.0.28.36b5ec9-default #4
[ 65.109197] Call Trace:
[ 65.109202] [] ? device_release+0x82/0x90
[ 65.109208] [] warn_slowpath_common+0x7a/0xb0
[ 65.109212] [] warn_slowpath_fmt+0x41/0x50
[ 65.109216] [] device_release+0x82/0x90
[ 65.109223] [] ? kobj_kset_leave+0x1e/0x60
[ 65.109228] [] kobject_cleanup+0x6d/0x1b0
[ 65.109233] [] kobject_release+0xd/0x10
[ 65.109237] [] kobject_put+0x2b/0x60
[ 65.109241] [] put_device+0x12/0x20
[ 65.109245] [] device_unregister+0x25/0x60
[ 65.109252] [] mce_cpu_callback+0x149/0x1a5
[ 65.109257] [] notifier_call_chain+0x72/0x110
[ 65.109263] [] __raw_notifier_call_chain+0x9/0x10
[ 65.109270] [] _cpu_down+0x1c6/0x320
[ 65.109274] [] cpu_down+0x3b/0x60
[ 65.109279] [] store_online+0x6d/0xc8
[ 65.109283] [] dev_attr_store+0x1b/0x20
[ 65.109288] [] sysfs_write_file+0xd4/0x150
[ 65.109295] [] vfs_write+0xcb/0x130
[ 65.109299] [] sys_write+0x50/0x90
[ 65.109304] [] system_call_fastpath+0x16/0x1b
[ 65.109307] ---[ end trace dafb3fda8041063e ]---
[ 65.112016] ------------[ cut here ]------------
[ 65.112024] WARNING: at arch/x86/kernel/smp.c:120 native_smp_send_reschedule+0x59/0x60()
[ 65.112027] Hardware name: IBM System x -[7870C4Q]-
[ 65.112028] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod cdc_ether usbnet i7core_edac edac_core mii serio_raw i2c_i801 shpchp ioatdma iTCO_wdt iTCO_vendor_support dca pci_hotplug pcspkr bnx2 i2c_core tpm_tis tpm tpm_bios sg rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 65.112067] Pid: 2277, comm: udevd Tainted: G W 3.2.0-debugkernel-0.0.0.28.36b5ec9-default #4
[ 65.112070] Call Trace:
[ 65.112071] [] ? native_smp_send_reschedule+0x59/0x60
[ 65.112079] [] warn_slowpath_common+0x7a/0xb0
[ 65.112083] [] warn_slowpath_null+0x15/0x20
[ 65.112086] [] native_smp_send_reschedule+0x59/0x60
[ 65.112092] [] trigger_load_balance+0x185/0x4f0
[ 65.112096] [] ? trigger_load_balance+0x1bb/0x4f0
[ 65.112101] [] scheduler_tick+0x107/0x170
[ 65.112107] [] update_process_times+0x67/0x80
[ 65.112113] [] tick_sched_timer+0x5f/0xc0
[ 65.112117] [] ? tick_nohz_handler+0x100/0x100
[ 65.112122] [] __run_hrtimer+0x12e/0x330
[ 65.112126] [] hrtimer_interrupt+0xc7/0x1f0
[ 65.112131] [] smp_apic_timer_interrupt+0x64/0xa0
[ 65.112135] [] apic_timer_interrupt+0x73/0x80
[ 65.112137] [] ? __slab_alloc+0x228/0x4e0
[ 65.112145] [] ? __wake_up_bit+0x10/0x30
[ 65.112150] [] unlock_page+0x25/0x30
[ 65.112157] [] do_wp_page+0x4f5/0x7b0
[ 65.112161] [] handle_pte_fault+0x19d/0x1e0
[ 65.112165] [] handle_mm_fault+0x178/0x2e0
[ 65.112169] [] do_page_fault+0x201/0x4c0
[ 65.112173] [] ? do_fork+0x179/0x350
[ 65.112177] [] ? mntput+0x1e/0x30
[ 65.112182] [] ? __fput+0x16f/0x210
[ 65.112187] [] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 65.112192] [] page_fault+0x25/0x30
[ 65.112195] ---[ end trace dafb3fda8041063f ]---
[ 65.541793] CPU 9 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 75.472229] lockdep: fixing up alternatives.

The second warning above is related to the reschedule IPI being sent to an
offline CPU. I guess this is due to the recent changes to
nohz_balancer_kick() and find_new_ilb() in kernel/sched/fair.c. I had never
seen this warning before the 3.3 merge window, even during CPU hotplug
stress tests. Now it shows up pretty often during CPU offline.
[Adding Suresh Siddha and Peter Zijlstra to Cc.]

# echo 1 > /sys/devices/system/cpu/cpu1/online
[ 75.476772] Booting Node 0 Processor 1 APIC 0x2
[ 75.481495] smpboot cpu 1: start_ip = 97000
[ 75.492927] Calibrating delay loop (skipped) already calibrated this CPU
[ 75.508449] NMI watchdog enabled, takes one hw-pmu counter.
[ 75.515402] general protection fault: 0000 [#1] SMP
[ 75.518940] CPU 7
[ 75.518940] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod cdc_ether usbnet i7core_edac edac_core mii serio_raw i2c_i801 shpchp ioatdma iTCO_wdt iTCO_vendor_support dca pci_hotplug pcspkr bnx2 i2c_core tpm_tis tpm tpm_bios sg rtc_cmos button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 75.518940]
[ 75.518940] Pid: 6631, comm: bash Tainted: G W 3.2.0-debugkernel-0.0.0.28.36b5ec9-default #4 IBM IBM System x -[7870C4Q]-/68Y8033
[ 75.518940] RIP: 0010:[] [] kobject_get+0x19/0x60
[ 75.518940] RSP: 0018:ffff8808c6cc7c18 EFLAGS: 00010206
[ 75.518940] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b7b RCX: 0000000000000006
[ 75.518940] RDX: ffffffff81e98ae0 RSI: ffff8808ccc93080 RDI: 6b6b6b6b6b6b6b7b
[ 75.518940] RBP: ffff8808c6cc7c28 R08: 5ff145670d8e439e R09: 0000000000000000
[ 75.518940] R10: 0000000000000005 R11: 0000000000000001 R12: ffff88114ded3608
[ 75.518940] R13: ffffffff81a13440 R14: ffff8808ddc4cb60 R15: 0000000000000001
[ 75.518940] FS: 00007f9a3218e700(0000) GS:ffff88117fcc0000(0000) knlGS:0000000000000000
[ 75.518940] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 75.518940] CR2: 000000000068a2a0 CR3: 000000114bd59000 CR4: 00000000000006e0
[ 75.518940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 75.518940] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 75.518940] Process bash (pid: 6631, threadinfo ffff8808c6cc6000, task ffff8808c6d9c600)
[ 75.518940] Stack:
[ 75.518940] ffff8808ccc93080 ffff88114ded3608 ffff8808c6cc7c38 ffffffff8133ab14
[ 75.518940] ffff8808c6cc7c48 ffffffff8133ddad ffff8808c6cc7c68 ffffffff81478b82
[ 75.518940] ffff88114ded3608 ffff8808ccc93080 ffff8808c6cc7c88 ffffffff81479062
[ 75.518940] Call Trace:
[ 75.518940] [] get_device+0x14/0x20
[ 75.518940] [] klist_devices_get+0xd/0x10
[ 75.518940] [] klist_node_init+0x42/0x70
[ 75.518940] [] klist_add_tail+0x22/0x60
[ 75.518940] [] bus_add_device+0x1bb/0x200
[ 75.518940] [] device_add+0x2e7/0x570
[ 75.518940] [] ? device_pm_init+0x70/0xa0
[ 75.518940] [] device_register+0x19/0x20
[ 75.518940] [] mce_device_create+0x8b/0x18b
[ 75.518940] [] mce_cpu_callback+0x187/0x1a5
[ 75.518940] [] notifier_call_chain+0x72/0x110
[ 75.518940] [] __raw_notifier_call_chain+0x9/0x10
[ 75.518940] [] _cpu_up+0x124/0x12a
[ 75.518940] [] cpu_up+0xbc/0x114
[ 75.518940] [] store_online+0x95/0xc8
[ 75.518940] [] dev_attr_store+0x1b/0x20
[ 75.518940] [] sysfs_write_file+0xd4/0x150
[ 75.518940] [] vfs_write+0xcb/0x130
[ 75.518940] [] sys_write+0x50/0x90
[ 75.518940] [] system_call_fastpath+0x16/0x1b
[ 75.518940] Code: ff ff 55 48 83 ef 38 48 89 e5 e8 43 fe ff ff c9 c3 90 55 48 89 e5 48 83 ec 10 48 85 ff 48 89 1c 24 4c 89 64 24 08 48 89 fb 74 0f <8b> 47 38 4c 8d 67 38 85 c0 74 1c f0 ff 43 38 48 89 d8 4c 8b 64
[ 75.518940] RIP [] kobject_get+0x19/0x60
[ 75.518940] RSP
[ 75.856395] ---[ end trace dafb3fda80410640 ]---

And in a separate try, I got this during cpu online operation:
(Pretty much the same as above, but with the BUG description present.)

[ 83.491328] Booting Node 1 Processor 6 APIC 0x14
[ 83.496135] smpboot cpu 6: start_ip = 97000
[ 72.494772] Calibrating delay loop (skipped) already calibrated this CPU
[ 83.522491] NMI watchdog enabled, takes one hw-pmu counter.
[ 83.529016] BUG: unable to handle kernel paging request at 000000350000004a
[ 83.532868] IP: [] kobject_get+0x19/0x60
[ 83.532868] PGD 8c7909067 PUD 0
[ 83.532868] Oops: 0000 [#1] SMP
[ 83.532868] CPU 0
[ 83.532868] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod ioatdma cdc_ether usbnet bnx2 shpchp mii tpm_tis tpm i7core_edac rtc_cmos serio_raw i2c_i801 dca pcspkr pci_hotplug edac_core i2c_core iTCO_wdt iTCO_vendor_support sg tpm_bios button uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 83.532868]
[ 83.532868] Pid: 6347, comm: allon_cpu_statu Tainted: G W 3.2.0-33-default #3 IBM IBM System x -[7870C4Q]-/68Y8033
[ 83.532868] RIP: 0010:[] [] kobject_get+0x19/0x60
[ 83.532868] RSP: 0018:ffff8808c78c1c18 EFLAGS: 00010206
[ 83.532868] RAX: 0000000000000000 RBX: 0000003500000012 RCX: 0000000000000006
[ 83.532868] RDX: ffffffff81f0f180 RSI: ffff8808c7f01118 RDI: 0000003500000012
[ 83.532868] RBP: ffff8808c78c1c28 R08: 543148780dbe0391 R09: 0000000000000000
[ 83.532868] R10: 0000000000000005 R11: 0000000000000001 R12: ffff8808c9f37d38
[ 83.532868] R13: ffffffff81a13440 R14: ffff88117fc8cb60 R15: 0000000000000006
[ 83.532868] FS: 00007f7043861700(0000) GS:ffff8808ffc00000(0000) knlGS:0000000000000000
[ 83.532868] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 83.532868] CR2: 000000350000004a CR3: 00000008c7ee9000 CR4: 00000000000006f0
[ 83.532868] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 83.532868] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 83.532868] Process allon_cpu_statu (pid: 6347, threadinfo ffff8808c78c0000, task ffff8808ca7c8bc0)
[ 83.532868] Stack:
[ 83.532868] ffff8808c7f01118 ffff8808c9f37d38 ffff8808c78c1c38 ffffffff813362e4
[ 83.532868] ffff8808c78c1c48 ffffffff8133951d ffff8808c78c1c68 ffffffff81473db2
[ 83.532868] ffff8808c9f37d38 ffff8808c7f01118 ffff8808c78c1c88 ffffffff81474292
[ 83.532868] Call Trace:
[ 83.532868] [] get_device+0x14/0x20
[ 83.532868] [] klist_devices_get+0xd/0x10
[ 83.532868] [] klist_node_init+0x42/0x70
[ 83.532868] [] klist_add_tail+0x22/0x60
[ 83.532868] [] bus_add_device+0x1bb/0x200
[ 83.532868] [] device_add+0x2e7/0x570
[ 83.532868] [] ? device_pm_init+0x70/0xa0
[ 83.532868] [] device_register+0x19/0x20
[ 83.532868] [] mce_device_create+0x8b/0x18b
[ 83.532868] [] mce_cpu_callback+0x187/0x1a5
[ 83.532868] [] notifier_call_chain+0x72/0x110
[ 83.532868] [] __raw_notifier_call_chain+0x9/0x10
[ 83.532868] [] _cpu_up+0x124/0x12a
[ 83.532868] [] cpu_up+0xbc/0x114
[ 83.532868] [] store_online+0x95/0xc8
[ 83.532868] [] dev_attr_store+0x1b/0x20
[ 83.532868] [] sysfs_write_file+0xd4/0x150
[ 83.532868] [] vfs_write+0xcb/0x130
[ 83.532868] [] sys_write+0x50/0x90
[ 83.532868] [] system_call_fastpath+0x16/0x1b
[ 83.532868] Code: ff ff 55 48 83 ef 38 48 89 e5 e8 43 fe ff ff c9 c3 90 55 48 89 e5 48 83 ec 10 48 85 ff 48 89 1c 24 4c 89 64 24 08 48 89 fb 74 0f <8b> 47 38 4c 8d 67 38 85 c0 74 1c f0 ff 43 38 48 89 d8 4c 8b 64
[ 83.532868] RIP [] kobject_get+0x19/0x60
[ 83.532868] RSP
[ 83.532868] CR2: 000000350000004a
[ 83.890209] ---[ end trace fab5021066ee998d ]---

> so it's definitely something bad in MCE device handling, and probably
> something to do with reusing a 'struct device' after freeing it, or
> after not having completely cleaned it up.
>
> I didn't see if I could spot the problem, but I think this is entirely
> reproducible, so hopefully somebody who knows the MCE code can
> trivially see this and fix it.
>
>          Linus
>

Regards,
Srivatsa S. Bhat
IBM Linux Technology Center
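
P.S. As a rough sanity check of the "reusing a 'struct device' after
freeing it" theory, the register values in the two oopses above can be
decoded by hand. The sketch below does that arithmetic in Python. It rests
on two assumptions that the logs do not state: that slab debugging fills
freed objects with the byte 0x6b, and that 'kobj' sits at offset 0x10 inside
'struct device' on this build. The 0x38 offset is read off the faulting
instruction bytes "<8b> 47 38" in the Code: line (mov 0x38(%rdi),%eax). The
helper names are made up for illustration.

```python
# Sketch only. Register values are copied from the two oops dumps in this
# mail; the two constants marked "assumed" are not stated in the logs.

POISON_FREE = 0x6B   # assumed: byte that slab debugging writes into freed objects
KOBJ_OFFSET = 0x10   # assumed: offset of 'kobj' inside 'struct device' on this build
KREF_OFFSET = 0x38   # from "Code: ... <8b> 47 38", i.e. mov 0x38(%rdi),%eax


def device_pointer(kobj_ptr: int) -> int:
    """Undo the &dev->kobj adjustment made before kobject_get() is called."""
    return kobj_ptr - KOBJ_OFFSET


def faulting_address(kobj_ptr: int) -> int:
    """Address that the faulting load in kobject_get() would touch."""
    return kobj_ptr + KREF_OFFSET


def is_free_poison(value: int) -> bool:
    """True if all eight bytes of a 64-bit value equal the free-poison byte."""
    return value.to_bytes(8, "little") == bytes([POISON_FREE]) * 8


# RDI (the argument to kobject_get()) at the time of each fault.
RDI_GP_FAULT = 0x6B6B6B6B6B6B6B7B    # first log: general protection fault
RDI_PAGE_FAULT = 0x0000003500000012  # second log: kernel paging request

if __name__ == "__main__":
    dev = device_pointer(RDI_GP_FAULT)
    print(hex(dev), is_free_poison(dev))
    print(hex(faulting_address(RDI_PAGE_FAULT)))
```

Under those assumptions, the device pointer in the first oops comes out as
0x6b6b6b6b6b6b6b6b, which is what a pointer read from an already-freed,
poisoned object would look like. In the second oops, RDI + 0x38 equals the
reported CR2 (000000350000004a), so the same load faulted there, only on a
pointer that is not the poison pattern. Neither result proves where the
stale reference comes from; they only agree with the use-after-free reading.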