From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 26 Feb 2016 13:31:33 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sergey Fedorov <serge.fdrv@gmail.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Documentation/memory-barriers.txt: How can READ_ONCE() and
 WRITE_ONCE() provide cache coherence?
Message-ID: <20160226213133.GI3522@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <56D0C02D.6000905@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56D0C02D.6000905@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Feb 27, 2016 at 12:14:21AM +0300, Sergey Fedorov wrote:
> Hi,
> 
> I just can't understand how this kind of compiler barrier macros may
> provide any form of cache coherence. Sure, such kind of compiler
> barrier is necessary to "reliably" access a variable from multiple
> CPUs. But why it is stated that these macros *provide* cache
> coherence?

Without READ_ONCE(), common sub-expression elimination can cause a later
read of a given variable to see an older value than an earlier read did.
For a (silly) example:

	a = complicated_pure_function(x);
	b = x;
	c = complicated_pure_function(x);

The compiler is within its rights to transform this into the following:

	a = complicated_pure_function(x);
	b = x;
	c = a;

In this case, c is computed from the value of x that was loaded for a,
so the assignment to b might see a newer value of x than did the later
assignment to c.  This violates cache coherence, which states that all
reads from a given variable must agree on the order of values taken on
by that variable.
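
To make the ordering problem concrete, here is a rough single-threaded
sketch of the two versions.  It is only a simulation: load_x() is a
hypothetical helper that pretends another CPU updates x between loads,
complicated_pure_function() is reduced to the identity so that its result
shows which version of x it used, and the struct and function names are
made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a shared variable that another CPU keeps
 * updating: each load observes the next value in the sequence 1, 2, 3. */
static int x_version;

static int load_x(void)
{
	return ++x_version;
}

/* Stand-in for complicated_pure_function(): the identity, so the result
 * shows which version of x was used. */
static int complicated_pure_function(int v)
{
	return v;
}

struct result { int a, b, c; };

/* The code as written: three separate loads of x. */
static struct result as_written(void)
{
	struct result r;

	x_version = 0;
	r.a = complicated_pure_function(load_x());
	r.b = load_x();
	r.c = complicated_pure_function(load_x());
	return r;
}

/* The code after common sub-expression elimination: c reuses a. */
static struct result after_cse(void)
{
	struct result r;

	x_version = 0;
	r.a = complicated_pure_function(load_x());
	r.b = load_x();
	r.c = r.a;	/* derived from the first, oldest load of x */
	return r;
}
```

In as_written() the values seen by a, b, and c never go backwards.
In after_cse(), c reflects an older x than b does, even though c is
assigned later.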

Using READ_ONCE() prevents this violation of cache coherence, albeit
at the price of evaluating complicated_pure_function() twice rather
than once:

	a = complicated_pure_function(READ_ONCE(x));
	b = READ_ONCE(x);
	c = complicated_pure_function(READ_ONCE(x));
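
For anyone wanting to experiment outside the kernel, something like the
following might do as a sketch.  The READ_ONCE() here is a simplified
userspace stand-in built on a volatile cast (an assumption for
illustration, not the kernel's actual definition), it relies on the
GCC/Clang __typeof__ extension, and run_with() and the body of
complicated_pure_function() are hypothetical.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel macro (an assumption, not
 * the kernel's definition): the volatile access forces one real load of
 * x each time the macro is used. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static int x;	/* imagine another CPU storing to this */

static int complicated_pure_function(int v)
{
	return v * v + 1;	/* hypothetical placeholder computation */
}

struct result { int a, b, c; };

/* Hypothetical driver: set x, then perform the three marked reads. */
static struct result run_with(int v)
{
	struct result r;

	x = v;
	r.a = complicated_pure_function(READ_ONCE(x));
	r.b = READ_ONCE(x);
	/* The compiler must load x again here, so it cannot reuse r.a. */
	r.c = complicated_pure_function(READ_ONCE(x));
	return r;
}
```

With nothing else storing to x, all three reads agree.  The point is
only that each READ_ONCE() is a separate load that the compiler may
neither merge with another nor discard.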

Similar examples exist for WRITE_ONCE().
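
One such case, as a sketch: with plain stores, a compiler that can see
that do_work() never reads busy may drop the first store as dead, so
other CPUs would never observe busy == 1.  The WRITE_ONCE() below is
again a simplified volatile-cast stand-in rather than the kernel's
definition, and busy, work_done, do_work(), and run_once() are
hypothetical names.

```c
#include <assert.h>

/* Simplified userspace stand-in for the kernel macro (an assumption, not
 * the kernel's definition): the volatile access forces a real store. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int busy;	/* imagine another CPU polling this */
static int work_done;

static void do_work(void)
{
	work_done++;
}

static int run_once(void)
{
	/* Written as plain "busy = 1;", this store could be eliminated,
	 * because busy is overwritten below without being read in between. */
	WRITE_ONCE(busy, 1);
	do_work();
	WRITE_ONCE(busy, 0);
	return busy;
}
```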

You -want- the compiler to violate cache coherence for normal accesses
to unshared variables, so you have to tell it when cache coherence is
important.

							Thanx, Paul

> From Documentation/memory-barriers.txt:
> >The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
> >optimizations that, while perfectly safe in single-threaded code, can
> >be fatal in concurrent code.  Here are some examples of these sorts
> >of optimizations:
> >
> > (*) The compiler is within its rights to reorder loads and stores
> >     to the same variable, and in some cases, the CPU is within its
> >     rights to reorder loads to the same variable.  This means that
> >     the following code:
> >
> >    a[0] = x;
> >    a[1] = x;
> >
> >     Might result in an older value of x stored in a[1] than in a[0].
> >     Prevent both the compiler and the CPU from doing this as follows:
> >
> >    a[0] = READ_ONCE(x);
> >    a[1] = READ_ONCE(x);
> >
> >     In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
> >     accesses from multiple CPUs to a single variable.
> 
> Thanks,
> Sergey
>