From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933761AbXCKPZd (ORCPT ); Sun, 11 Mar 2007 11:25:33 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S933763AbXCKPZd (ORCPT ); Sun, 11 Mar 2007 11:25:33 -0400
Received: from amsfep17-int.chello.nl ([62.179.120.12]:24962 "EHLO amsfep17-int.chello.nl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933761AbXCKPZc (ORCPT ); Sun, 11 Mar 2007 11:25:32 -0400
Subject: Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)
From: Peter Zijlstra
To: "Michael S. Tsirkin"
Cc: Roland Dreier, Tziporet Koren, Roland Dreier, general@lists.openfabrics.org, Ingo Molnar, Linux Kernel Mailing List
In-Reply-To: <20070311135051.GA31985@mellanox.co.il>
References: <45E552FC.4040305@mellanox.co.il> <20070311135051.GA31985@mellanox.co.il>
Content-Type: text/plain
Date: Sun, 11 Mar 2007 16:25:19 +0100
Message-Id: <1173626719.5182.10.camel@lappy>
Mime-Version: 1.0
X-Mailer: Evolution 2.8.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote:
> > Quoting Roland Dreier :
> > Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
> >
> > >Feb 27 17:47:52 sw169 kernel: [] _spin_lock_irqsave+0x15/0x24
> > >Feb 27 17:47:52 sw169 kernel: [] :ib_ipoib:ipoib_neigh_destructor+0xc2/0x139
> >
> > It looks like this is deadlocking trying to take priv->lock in ipoib_neigh_destructor().
> > One idea I just had would be to build a kernel with CONFIG_PROVE_LOCKING
> > turned on, and then rerun this test. There's a good chance that this would
> > diagnose the deadlock. (I don't have good access to my test machines right now, or
> > else I would do it myself)
>
> OK, I did that. But I get
> [13440.761857] INFO: trying to register non-static key.
> [13440.766903] the code is fine but needs lockdep annotation.
> [13440.772455] turning off the locking correctness validator.
> and I am not sure what triggers this, or how to fix it to have the
> validator actually do its job.

It usually indicates a spinlock that is not properly initialized, for
example __SPIN_LOCK_UNLOCKED() used in a non-static context; use
spin_lock_init() in these cases.

However, looking at the code, ipoib_neigh_destructor() only uses
&priv->lock, and that seems to get properly initialized in ipoib_setup()
using spin_lock_init().

So either there are other sites that instantiate those objects and forget
about the lock init, or the object is corrupted (use after free?)

> Ingo, what key does the message refer to?
>
> The stack dump seems to point to drivers/infiniband/ulp/ipoib/ipoib_main.c line
> 829.
>
> Full message below:
>
> [13440.761857] INFO: trying to register non-static key.
> [13440.766903] the code is fine but needs lockdep annotation.
> [13440.772455] turning off the locking correctness validator.
> [13440.778008] [] __lock_acquire+0xae4/0xbb9
> [13440.783078] [] lock_acquire+0x56/0x71
> [13440.787784] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.794412] [] _spin_lock_irqsave+0x32/0x41
> [13440.799649] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.806275] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13440.812897] [] dst_run_gc+0xc/0x118
> [13440.817439] [] run_timer_softirq+0x37/0x16b
> [13440.822673] [] dst_run_gc+0x0/0x118
> [13440.827221] [] neigh_destroy+0xbe/0x104
> [13440.832114] [] dst_destroy+0x4d/0xab
> [13440.836751] [] dst_run_gc+0x55/0x118
> [13440.841384] [] run_timer_softirq+0x108/0x16b
> [13440.846711] [] __do_softirq+0x5a/0xd5
> [13440.851427] [] trace_hardirqs_on+0x106/0x141
> [13440.856754] [] __do_softirq+0x69/0xd5
> [13440.861470] [] do_softirq+0x37/0x4d
> [13440.866016] [] smp_apic_timer_interrupt+0x6b/0x77
> [13440.871774] [] default_idle+0x3b/0x54
> [13440.876491] [] default_idle+0x3b/0x54
> [13440.881211] [] apic_timer_interrupt+0x33/0x38
> [13440.886624] [] default_idle+0x3b/0x54
> [13440.891342] [] default_idle+0x3d/0x54
> [13440.896061] [] cpu_idle+0xa2/0xbb
> [13440.900436] =======================
> [13768.711447] BUG: spinlock lockup on CPU#1, swapper/0, c0687880
> [13768.717353] [] _raw_spin_lock+0xda/0xfd
> [13768.722247] [] _spin_lock_irqsave+0x39/0x41
> [13768.727486] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.734110] [] ipoib_neigh_destructor+0xd0/0x132 [ib_ipoib]
> [13768.740735] [] dst_run_gc+0xc/0x118
> [13768.745276] [] run_timer_softirq+0x37/0x16b
> [13768.750517] [] dst_run_gc+0x0/0x118
> [13768.755061] [] neigh_destroy+0xbe/0x104
> [13768.759955] [] dst_destroy+0x4d/0xab
> [13768.764586] [] dst_run_gc+0x55/0x118
> [13768.769218] [] run_timer_softirq+0x108/0x16b
> [13768.774542] [] __do_softirq+0x5a/0xd5
> [13768.779261] [] trace_hardirqs_on+0x106/0x141
> [13768.784588] [] __do_softirq+0x69/0xd5
> [13768.789308] [] do_softirq+0x37/0x4d
> [13768.793851] [] smp_apic_timer_interrupt+0x6b/0x77
> [13768.799609] [] default_idle+0x3b/0x54
> [13768.804326] [] default_idle+0x3b/0x54
> [13768.809054] [] apic_timer_interrupt+0x33/0x38
> [13768.814471] [] default_idle+0x3b/0x54
> [13768.819187] [] default_idle+0x3d/0x54
> [13768.823903] [] cpu_idle+0xa2/0xbb
> [13768.828279] =======================
>
>
> --
> MST
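
To illustrate the "non-static key" message and the spin_lock_init() advice in the reply above, here is a userspace toy model. It is not kernel code, and every toy_* name is hypothetical. It assumes a simplified picture of lockdep: a lock that was never given a class key falls back to using its own address as the key, and the validator only accepts keys that live in static storage. spin_lock_init() is modelled as attaching a static per-call-site key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: all toy_* names are hypothetical, none are kernel API. */
struct toy_lock {
    const void *key;            /* lock-class key; NULL means "not set yet" */
};

/* Stand-in for the kernel image's static data section. */
static struct toy_lock toy_static_locks[4];

/* Models the static key that a spin_lock_init() call site would supply. */
static char toy_init_site_key;

/* True if p points into the object [base, base + len). */
static bool toy_in_range(const void *p, const void *base, size_t len)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t b = (uintptr_t)base;

    return a >= b && a < b + len;
}

/* Models the validator's "is this address in static storage?" check. */
static bool toy_static_obj(const void *p)
{
    return toy_in_range(p, toy_static_locks, sizeof(toy_static_locks)) ||
           toy_in_range(p, &toy_init_site_key, sizeof(toy_init_site_key));
}

/* Models spin_lock_init(): give the lock a key that lives in static storage. */
static void toy_lock_init(struct toy_lock *l)
{
    assert(l != NULL);
    l->key = &toy_init_site_key;
}

/*
 * Models class registration on first acquire. Returns false in the case
 * where the real validator would report a non-static key and switch off.
 */
static bool toy_register_class(struct toy_lock *l)
{
    if (!l->key)
        l->key = l;             /* static-initializer path: own address is the key */

    return toy_static_obj(l->key);
}
```

In this model a lock in toy_static_locks registers fine without any init call, because its own address is static. A lock embedded in a stack or heap object fails registration until toy_lock_init() has been called on it, which mirrors the suggestion that some site may create the object without running the lock init, or that the object's memory was reused.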