From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Miller
Subject: Re: 2.6.24 BUG: soft lockup - CPU#X
Date: Thu, 27 Mar 2008 17:02:35 -0700 (PDT)
Message-ID: <20080327.170235.53674739.davem@davemloft.net>
References: <20080327103340.GB2845@ami.dom.local>
	<36D9DB17C6DE9E40B059440DB8D95F5204C275C2@orsmsx418.amr.corp.intel.com>
	<47EC3182.7080005@sun.com>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: jesse.brandeburg@intel.com, jarkao2@gmail.com, netdev@vger.kernel.org
To: Matheos.Worku@Sun.COM
Return-path:
Received: from 74-93-104-97-Washington.hfc.comcastbusiness.net ([74.93.104.97]:46763
	"EHLO sunset.davemloft.net" rhost-flags-OK-FAIL-OK-OK)
	by vger.kernel.org with ESMTP id S1752907AbYC1ACf (ORCPT );
	Thu, 27 Mar 2008 20:02:35 -0400
In-Reply-To: <47EC3182.7080005@sun.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

From: Matheos Worku
Date: Thu, 27 Mar 2008 16:45:06 -0700

> Brandeburg, Jesse wrote:
> > Jarek Poplawski wrote:
> >
> >> On Wed, Mar 26, 2008 at 01:26:00PM -0700, Matheos Worku wrote:
> >> ...
> >>
> >>> nsn57-110 login: BUG: soft lockup - CPU#2 stuck for 11s! ... Call
> >>> Trace: [] __skb_clone+0x24/0xdc
> >>> [] skb_realloc_headroom+0x30/0x63
> >>> [] :niu:niu_start_xmit+0x114/0x5af
> >>> [] gart_map_single+0x0/0x70
> >>> [] dev_hard_start_xmit+0x1d2/0x246 ...
> >>>
> >> Maybe I'm wrong with this again, but I wonder about this
> >> gart_map_single on almost all traces, and probably not supposed to be
> >> seen here. Did you try with some memory re-config/debugging?
> >>
> >
> > I have some more examples of this but with the ixgbe driver. We are
> > running heavy bidirectional stress with multiple rx (non-napi, yeah I
> > know) interrupts by default (and userspace irqbalance is probably on,
> > I'll have the lab try it without)
> >
>
> I have seen the lockup on kernels 2.6.18 and newer mostly on TX traffic.
> I have seen it on another 10G driver (off the tree niu driver sibling,
> nxge). The nxge driver doesn't use any TX interrupts and I have seen it
> with UDP TX, irqbalance disabled, with no irq activity at all. some
> example traces included.

Interesting.

Are you running uperf in a way such that there are multiple
processors doing TX's in parallel? That might be a clue.