From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755374AbbCFV7x (ORCPT ); Fri, 6 Mar 2015 16:59:53 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:27029 "EHLO userp1040.oracle.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753557AbbCFV7w (ORCPT ); Fri, 6 Mar 2015 16:59:52 -0500
Message-ID: <54FA2326.3000300@oracle.com>
Date: Fri, 06 Mar 2015 16:59:02 -0500
From: Sasha Levin
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Davidlohr Bueso, Ingo Molnar
CC: Peter Zijlstra, LKML, Dave Jones, jason.low2@hp.com, Linus Torvalds
Subject: Re: sched: softlockups in multi_cpu_stop
References: <54F41516.6060608@oracle.com> <54F98F1F.3080107@oracle.com>
	<20150306123233.GA9972@gmail.com> <1425662342.19505.41.camel@stgolabs.net>
	<54F9EBCA.1060300@oracle.com>
In-Reply-To: <54F9EBCA.1060300@oracle.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Source-IP: aserv0021.oracle.com [141.146.126.233]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/06/2015 01:02 PM, Sasha Levin wrote:
> I can go redo that again if you suspect that that commit is not the cause.

I took a closer look at the logs, and I'm seeing hangs that begin this way as well:

[ 2298.020237] NMI watchdog: BUG: soft lockup - CPU#19 stuck for 23s! [trinity-c19:839]
[ 2298.020237] Modules linked in:
[ 2298.020237] CPU: 19 PID: 839 Comm: trinity-c19 Not tainted 4.0.0-rc2-next-20150306-sasha-00056-g61886e8 #2005
[ 2298.020237] task: ffff880278d62000 ti: ffff880254fe8000 task.ti: ffff880254fe8000
[ 2298.020237] RIP: 0010:[] [] __rcu_read_unlock+0x9f/0x130
[ 2298.020237] RSP: 0000:ffff880254fefbd8 EFLAGS: 00000207
[ 2298.020237] RAX: dffffc0000000000 RBX: ffff880254fe8000 RCX: 1ffff1004a9fd002
[ 2298.020237] RDX: 1ffff1004f1ac4e2 RSI: ffff8802c3ff6000 RDI: ffff880278d62714
[ 2298.020237] RBP: ffff880254fefbe8 R08: ffff880362b2e080 R09: ffffffff00000001
[ 2298.020237] R10: ffff880362b2e140 R11: ffffea000e253800 R12: 0000000000000a3e
[ 2298.020237] R13: ffff880278d62cb0 R14: ffffed014e1e4899 R15: 0034c1c55efd9eff
[ 2298.020237] FS:  00007f183b9c3700(0000) GS:ffff880375200000(0000) knlGS:0000000000000000
[ 2298.020237] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2298.020237] CR2: 0000000000bc8fe8 CR3: 0000000259a0a000 CR4: 00000000000007a0
[ 2298.020237] Stack:
[ 2298.020237]  ffff8802c3ff6000 ffff880126cfdc28 ffff880254fefc38 ffffffffa43fcb75
[ 2298.020237]  ffff8802c3ff6000 ffff8802c3ff6000 ffff880278d62714 ffff880126cfdc48
[ 2298.020237]  ffff8802c3ff6000 ffff880126cfdc44 ffff880126cfdc28 ffff880254fefd78
[ 2298.020237] Call Trace:
[ 2298.020237]  [] rwsem_spin_on_owner+0x165/0x250
[ 2298.020237]  [] rwsem_down_write_failed+0x22f/0x750
[ 2298.020237]  [] ? rwsem_down_read_failed+0x260/0x260
[ 2298.020237]  [] ? get_parent_ip+0x11/0x50
[ 2298.020237]  [] ? preempt_count_add+0x106/0x160
[ 2298.020237]  [] ? debug_smp_processor_id+0x17/0x20
[ 2298.020237]  [] ? cmpxchg_double_slab.isra.25+0x210/0x240
[ 2298.020237]  [] ? free_debug_processing+0x19f/0x320
[ 2298.020237]  [] call_rwsem_down_write_failed+0x13/0x20
[ 2298.020237]  [] ? down_write+0x29/0x70
[ 2298.020237]  [] validate_mm+0xa2/0x910
[ 2298.020237]  [] do_munmap+0x421/0xf50
[ 2298.020237]  [] ? send_sigtrap+0x1e0/0x1e0
[ 2298.020237]  [] vm_munmap+0x5f/0x80
[ 2298.020237]  [] SyS_munmap+0x22/0x30
[ 2298.020237]  [] system_call_fastpath+0x16/0x1b
[ 2298.020237] Code: 02 84 c0 74 04 3c 03 7e 7c c7 83 10 07 00 00 00 00 00 80 48 8d bb 14 07 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 04 84 d2 75 52

So it seems that we end up spinning for quite a while?


Thanks,
Sasha