From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752298AbYL3GGP (ORCPT ); Tue, 30 Dec 2008 01:06:15 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1751187AbYL3GF4 (ORCPT ); Tue, 30 Dec 2008 01:05:56 -0500
Received: from mx3.mail.elte.hu ([157.181.1.138]:39985 "EHLO mx3.mail.elte.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751182AbYL3GFz (ORCPT ); Tue, 30 Dec 2008 01:05:55 -0500
Date: Tue, 30 Dec 2008 07:05:36 +0100
From: Ingo Molnar
To: Herbert Xu
Cc: Peter Zijlstra, "Tantilov, Emil S", "Kirsher, Jeffrey T", netdev,
	David Miller, "Waskiewicz Jr, Peter P", "Duyck, Alexander H",
	Eric Dumazet, linux-kernel@vger.kernel.org
Subject: Re: [patch] locking, percpu counters: introduce separate lock classes
Message-ID: <20081230060536.GG11037@elte.hu>
References: <20081229103735.GA9763@gondor.apana.org.au>
	<20081229112858.GA16385@elte.hu>
	<20081229114907.GA10170@gondor.apana.org.au>
	<20081229115827.GA441@elte.hu>
	<20081229120132.GA10363@gondor.apana.org.au>
	<20081229121626.GF9628@elte.hu>
	<20081229123819.GA18321@elte.hu>
	<20081229124444.GA20306@elte.hu>
	<20081229141417.GA1493@elte.hu>
	<20081230035815.GA16454@gondor.apana.org.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20081230035815.GA16454@gondor.apana.org.au>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-ELTE-VirusStatus: clean
X-ELTE-SpamScore: -1.5
X-ELTE-SpamLevel:
X-ELTE-SpamCheck: no
X-ELTE-SpamVersion: ELTE 2.0
X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no
	SpamAssassin version=3.2.3
	-1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
	[score: 0.0000]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Herbert Xu wrote:

> On Mon, Dec 29, 2008 at 03:14:17PM +0100, Ingo Molnar wrote:
> >
> > my testing efforts today are not particularly dominated by luck :-)
> >
> > Below is the latest splat that i got with Peter's patch plus the revert of
> > dd24c00191 applied.
> >
> > [ 78.679386]
> > [ 78.679389] =================================
> > [ 78.680039] [ INFO: inconsistent lock state ]
> > [ 78.680039] 2.6.28-tip-03885-g44c31d5-dirty #13188
> > [ 78.680039] ---------------------------------
> > [ 78.680039] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
> > [ 78.680039] ssh/4054 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > [ 78.680039] (key#8){-+..}, at: [] __percpu_counter_add+0x52/0x7a
> > [ 78.680039] {softirq-on-W} state was registered at:
> > [ 78.680039] [] __lock_acquire+0x288/0xa93
> > [ 78.680039] [] lock_acquire+0x5d/0x7a
> > [ 78.680039] [] _spin_lock+0x20/0x2f
> > [ 78.680039] [] __percpu_counter_add+0x52/0x7a
> > [ 78.680039] [] percpu_counter_add+0xf/0x12
> > [ 78.680039] [] tcp_v4_init_sock+0xe5/0xea
>
> Right, this is the correct version of the earlier splat :)
>
> Anyway, I've extended Peter's patch to cover the other cases.
> Please let me know if it still bitches with this + Peter's fbc
> patch.
>
> net: Fix percpu counters deadlock
>
> When we converted the protocol atomic counters such as the orphan
> count and the total socket count deadlocks were introduced due to
> the mismatch in BH status of the spots that used the percpu counter
> operations.
>
> Based on the diagnosis and patch by Peter Zijlstra, this patch
> fixes these issues by disabling BH where we may be in process
> context.
>
> Signed-off-by: Herbert Xu

thanks, will start testing it now.

One small nit: could you please add the Reported-by line for Jeff Kirsher,
who reported the problem originally:

  Reported-by: Jeff Kirsher

	Ingo