From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755374AbbCFV7x (ORCPT ); Fri, 6 Mar 2015 16:59:53 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:27029 "EHLO userp1040.oracle.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753557AbbCFV7w (ORCPT ); Fri, 6 Mar 2015 16:59:52 -0500
Message-ID: <54FA2326.3000300@oracle.com>
Date: Fri, 06 Mar 2015 16:59:02 -0500
From: Sasha Levin
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Davidlohr Bueso, Ingo Molnar
CC: Peter Zijlstra, LKML, Dave Jones, jason.low2@hp.com, Linus Torvalds
Subject: Re: sched: softlockups in multi_cpu_stop
References: <54F41516.6060608@oracle.com> <54F98F1F.3080107@oracle.com>
	<20150306123233.GA9972@gmail.com> <1425662342.19505.41.camel@stgolabs.net>
	<54F9EBCA.1060300@oracle.com>
In-Reply-To: <54F9EBCA.1060300@oracle.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/06/2015 01:02 PM, Sasha Levin wrote:
> I can go redo that again if you suspect that that commit is not the cause.

I took a closer look at the logs, and I'm seeing hangs that begin this way as well:

[ 2298.020237] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [trinity-c19:839]
[ 2298.020237] Modules linked in:
[ 2298.020237] CPU: 19 PID: 839 Comm: trinity-c19 Not tainted 4.0.0-rc2-next-20150306-sasha-00056-g61886e8 #2005
[ 2298.020237] task: ffff880278d62000 ti: ffff880254fe8000 task.ti: ffff880254fe8000
[ 2298.020237] RIP: 0010:[] [] __rcu_read_unlock+0x9f/0x130
[ 2298.020237] RSP: 0000:ffff880254fefbd8 EFLAGS: 00000207
[ 2298.020237] RAX: dffffc0000000000 RBX: ffff880254fe8000 RCX: 1ffff1004a9fd002
[ 2298.020237] RDX: 1ffff1004f1ac4e2 RSI: ffff8802c3ff6000 RDI: ffff880278d62714
[ 2298.020237] RBP: ffff880254fefbe8 R08: ffff880362b2e080 R09: ffffffff00000001
[ 2298.020237] R10: ffff880362b2e140 R11: ffffea000e253800 R12: 0000000000000a3e
[ 2298.020237] R13: ffff880278d62cb0 R14: ffffed014e1e4899 R15: 0034c1c55efd9eff
[ 2298.020237] FS:  00007f183b9c3700(0000) GS:ffff880375200000(0000) knlGS:0000000000000000
[ 2298.020237] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2298.020237] CR2: 0000000000bc8fe8 CR3: 0000000259a0a000 CR4: 00000000000007a0
[ 2298.020237] Stack:
[ 2298.020237]  ffff8802c3ff6000 ffff880126cfdc28 ffff880254fefc38 ffffffffa43fcb75
[ 2298.020237]  ffff8802c3ff6000 ffff8802c3ff6000 ffff880278d62714 ffff880126cfdc48
[ 2298.020237]  ffff8802c3ff6000 ffff880126cfdc44 ffff880126cfdc28 ffff880254fefd78
[ 2298.020237] Call Trace:
[ 2298.020237]  [] rwsem_spin_on_owner+0x165/0x250
[ 2298.020237]  [] rwsem_down_write_failed+0x22f/0x750
[ 2298.020237]  [] ? rwsem_down_read_failed+0x260/0x260
[ 2298.020237]  [] ? get_parent_ip+0x11/0x50
[ 2298.020237]  [] ? preempt_count_add+0x106/0x160
[ 2298.020237]  [] ? debug_smp_processor_id+0x17/0x20
[ 2298.020237]  [] ? cmpxchg_double_slab.isra.25+0x210/0x240
[ 2298.020237]  [] ? free_debug_processing+0x19f/0x320
[ 2298.020237]  [] call_rwsem_down_write_failed+0x13/0x20
[ 2298.020237]  [] ? down_write+0x29/0x70
[ 2298.020237]  [] validate_mm+0xa2/0x910
[ 2298.020237]  [] do_munmap+0x421/0xf50
[ 2298.020237]  [] ? send_sigtrap+0x1e0/0x1e0
[ 2298.020237]  [] vm_munmap+0x5f/0x80
[ 2298.020237]  [] SyS_munmap+0x22/0x30
[ 2298.020237]  [] system_call_fastpath+0x16/0x1b
[ 2298.020237] Code: 02 84 c0 74 04 3c 03 7e 7c c7 83 10 07 00 00 00 00 00 80 48 8d bb 14 07 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 04 84 d2 75 52

So it seems that we end up spinning for quite a while?


Thanks,
Sasha