From: Jarek Poplawski
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)
Date: Thu, 9 Jul 2009 16:24:14 +0200
Message-ID: <20090709142414.GC3651@ami.dom.local>
References: <200907031326.21822.andres@anarazel.de>
 <200907071811.27570.andres@anarazel.de>
 <20090708080852.GC3148@ami.dom.local>
 <200907090023.18040.andres@anarazel.de>
 <20090708224828.GD3666@ami.dom.local>
 <20090709104412.GA3651@ami.dom.local>
 <20090709132256.GB3651@ami.dom.local>
To: Thomas Gleixner
Cc: Andres Freund, Joao Correia, Arun R Bharadwaj, Stephen Hemminger,
 netdev@vger.kernel.org, LKML, Patrick McHardy, Peter Zijlstra
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

On Thu, Jul 09, 2009 at 04:15:28PM +0200, Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > > >
> > > > > I have the feeling that the code relies on some implicit cpu
> > > > > boundness, which is no longer guaranteed with the timer migration
> > > > > changes, but that's a question for the network experts.
> > > >
> > > > As a matter of fact, I've just looked at this __netif_schedule(),
> > > > which really is cpu bound, so you might be 100% right.
> > >
> > > So the watchdog is the one which causes the trouble. The patch below
> > > should fix this.
> >
> > I hope so. On the other hand, it seems it should still work with this
> > migration, so it probably needs additional debugging.
>
> Right. I just provided the patch to narrow down the problem, but
> please test the fix of the hrtimer migration code which I sent out a
> bit earlier: http://lkml.org/lkml/2009/7/9/150
>
> It fixes a possible endless loop in the timer code which is related to
> the migration changes. Looking at the backtraces of the spinlock
> lockup I think that is what you hit.

Actually, Andres and Joao hit this, and I hope they'll try these two
patches.

Thanks,
Jarek P.