From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758562AbcBXVk0 (ORCPT ); Wed, 24 Feb 2016 16:40:26 -0500 Received: from e19.ny.us.ibm.com ([129.33.205.209]:32967 "EHLO e19.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752373AbcBXVkY (ORCPT ); Wed, 24 Feb 2016 16:40:24 -0500 X-IBM-Helo: d01dlp03.pok.ibm.com X-IBM-MailFrom: paulmck@linux.vnet.ibm.com X-IBM-RcptTo: linux-kernel@vger.kernel.org Date: Wed, 24 Feb 2016 13:40:13 -0800 From: "Paul E. McKenney" To: Mathieu Desnoyers Cc: linux-kernel@vger.kernel.org, Ingo Molnar , Lai Jiangshan , dipankar@in.ibm.com, Andrew Morton , josh@joshtriplett.org, Thomas Gleixner , Peter Zijlstra , rostedt , David Howells , edumazet@google.com, dvhart@linux.intel.com, fweisbec@gmail.com, Oleg Nesterov , bobby prani Subject: Re: [PATCH tip/core/rcu 02/14] documentation: Fix control dependency and identical stores Message-ID: <20160224214013.GF3522@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20160224050021.GA14616@linux.vnet.ibm.com> <1456290047-16654-2-git-send-email-paulmck@linux.vnet.ibm.com> <467632883.7240.1456348324456.JavaMail.zimbra@efficios.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <467632883.7240.1456348324456.JavaMail.zimbra@efficios.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16022421-0057-0000-0000-00000384DB94 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 24, 2016 at 09:12:04PM +0000, Mathieu Desnoyers wrote: > ----- On Feb 24, 2016, at 12:00 AM, Paul E. McKenney paulmck@linux.vnet.ibm.com wrote: > > > The summary of the "CONTROL DEPENDENCIES" section incorrectly states that > > barrier() may be used to prevent compiler reordering when more than one > > leg of the control-dependent "if" statement start with identical stores. > > This is incorrect at high optimization levels. This commit therefore > > updates the summary to match the detailed description. > > > > Reported by: Jianyu Zhan > > Signed-off-by: Paul E. McKenney > > --- > > Documentation/memory-barriers.txt | 10 +++++++--- > > 1 file changed, 7 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/memory-barriers.txt > > b/Documentation/memory-barriers.txt > > index 904ee42d078e..e26058d3e253 100644 > > --- a/Documentation/memory-barriers.txt > > +++ b/Documentation/memory-barriers.txt > > @@ -800,9 +800,13 @@ In summary: > > use smp_rmb(), smp_wmb(), or, in the case of prior stores and > > later loads, smp_mb(). > > > > - (*) If both legs of the "if" statement begin with identical stores > > - to the same variable, a barrier() statement is required at the > > - beginning of each leg of the "if" statement. > > + (*) If both legs of the "if" statement begin with identical stores to > > + the same variable, then those stores must be ordered, either by > > + preceding both of them with smp_mb() or by using smp_store_release() > > + to carry out the stores. Please note that it is -not- sufficient > > + to use barrier() at beginning of each leg of the "if" statement, > > + as optimizing compilers do not necessarily respect barrier() > > + in this case. > > Hrm, I really don't understand this one. > > One caveat, as stated here, would be that optimizing compilers > can reorder instruction with respect to barrier() placed at the > beginning of if/else legs that start with identical stores. > > It goes on stating that "smp_mb() or smp_store_release()" should > be used rather than barrier() in those cases. > > I don't get how, from a compiler optimization perspective, > barrier() is any different from smp_mb(). > > #define barrier() __asm__ __volatile__("": : :"memory") > > vs > > #define mb() asm volatile("mfence":::"memory") > > What the compiler would observe is a "memory" clobber in both > cases. > > Now if the stated cause of this issue would have been > internal reordering of those identical stores within the > processor, I would understand that smp_mb() has an > effect which differs from the compiler barrier, but since > the paragraph begins by stating that this is purely for > compiler optimizations, I'm confused. > > What am I missing there ? > > Thanks, > > Mathieu > > > > > > (*) Control dependencies require at least one run-time conditional > > between the prior load and the subsequent store, and this Let's take the example, replace barrier() with smp_mb(), and see what happens: q = READ_ONCE(a); if (q) { smp_mb(); WRITE_ONCE(b, p); do_something(); } else { smp_mb(); WRITE_ONCE(b, p); do_something_else(); } Given the same compiler transformation: q = READ_ONCE(a); smp_mb(); WRITE_ONCE(b, p); /* BUG: No ordering vs. load from a!!! */ if (q) { /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ do_something(); } else { /* WRITE_ONCE(b, p); -- moved up, BUG!!! */ do_something_else(); } So ordering between the read from "a" and the write to "b" is still preserved. The reason this works is that the smp_mb() does all the ordering, so the fact that the control dependency has been eliminated is irrelevant. Thanx, Paul