From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 1/3 v2] call_function_many: fix list delete vs add race
From: Mike Galbraith 
To: paulmck@linux.vnet.ibm.com
Cc: Milton Miller , Peter Zijlstra , akpm@linux-foundation.org,
	Anton Blanchard , xiaoguangrong@cn.fujitsu.com, mingo@elte.hu,
	jaxboe@fusionio.com, npiggin@gmail.com, JBeulich@novell.com,
	rusty@rustcorp.com.au, torvalds@linux-foundation.org,
	benh@kernel.crashing.org, linux-kernel@vger.kernel.org
In-Reply-To: <20110208193650.GG2146@linux.vnet.ibm.com>
References: <1296145360.15234.234.camel@laptop>
	 <20110201220026.GD2142@linux.vnet.ibm.com>
	 <20110202041740.GB2129@linux.vnet.ibm.com>
	 <1297066374.5739.38.camel@marge.simson.net>
	 <20110208193650.GG2146@linux.vnet.ibm.com>
Content-Type: text/plain; charset="UTF-8"
Date: Sun, 21 Aug 2011 08:17:11 +0200
Message-ID: <1313907431.6394.26.camel@marge.simson.net>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2011-02-08 at 11:36 -0800, Paul E. McKenney wrote:
> On Mon, Feb 07, 2011 at 09:12:54AM +0100, Mike Galbraith wrote:
> > On Tue, 2011-02-01 at 20:17 -0800, Paul E. McKenney wrote:
> > 
> > FWIW, my red headed stepchild (.32 xen cluster beast) with..
> > smp-smp_call_function_many-fix-SMP-race (6dc1989)
> > smp-consolidate-writes-in-smp_call_function_interrupt (225c8e0)
> > smp-smp_call_function_many-fix-list-delete-vs-add-race (V2)
> > smp-smp_call_function_many-handle-concurrent-clearing-of-mask (V2)
> > smp-generic_smp_call_function_interrupt-additional-memory-order-tightening (below)
> > ..has not experienced any IPI problems lately, nor have I been able to
> > trigger anything beating up my box running normal x86_64 kernels.
> > 
> > That's not saying much since my IPI woes were only the concurrent
> > clearing variety, just letting you know that these patches have received
> > (and are receiving) hefty thumpage.
> 
> Very good, I have added your Tested-by to my patch.

I'm still using your barrier changes. Is it safe to drop them? HA
clusters falling over is _not_ something I want to risk resurrecting by
dropping them just because they didn't go anywhere.

> 
> 							Thanx, Paul
> 
> > > ------------------------------------------------------------------------
> > > 
> > > smp_call_function: additional memory-order tightening.
> > > 
> > > The csd_lock() and csd_unlock() interaction guarantees that the
> > > smp_call_function_many() function sees the results of interactions
> > > with prior incarnations of the callback, so the check is not needed.
> > > Instead, tighter memory ordering is required in the companion
> > > generic_smp_call_function_interrupt() function to ensure proper
> > > interaction with partially initialized callbacks.
> > > 
> > > Signed-off-by: Paul E. McKenney 
> > > 
> > > diff --git a/kernel/smp.c b/kernel/smp.c
> > > index 064bb6e..e091905 100644
> > > --- a/kernel/smp.c
> > > +++ b/kernel/smp.c
> > > @@ -182,7 +182,7 @@ void generic_smp_call_function_interrupt(void)
> > >  
> > >  	/*
> > >  	 * Ensure entry is visible on call_function_queue after we have
> > > -	 * entered the IPI. See comment in smp_call_function_many.
> > > +	 * entered the IPI. See comment in smp_call_function_many
> > >  	 * If we don't have this, then we may miss an entry on the list
> > >  	 * and never get another IPI to process it.
> > >  	 */
> > > @@ -209,13 +209,18 @@ void generic_smp_call_function_interrupt(void)
> > >  		if (!cpumask_test_cpu(cpu, data->cpumask))
> > >  			continue;
> > >  
> > > -		smp_rmb();
> > > +		smp_mb();  /* If we see our bit set above, we need to see */
> > > +			   /* all the processing associated with the prior */
> > > +			   /* incarnation of this callback. */
> > >  
> > >  		if (atomic_read(&data->refs) == 0)
> > >  			continue;
> > >  
> > > +		smp_rmb(); /* We need to read ->refs before we read either */
> > > +			   /* ->csd.func and ->csd.info. */
> > > +
> > >  		func = data->csd.func;		/* for later warn */
> > > -		data->csd.func(data->csd.info);
> > > +		func(data->csd.info);
> > >  
> > >  		/*
> > >  		 * If the cpu mask is not still set then func enabled
> > > 