From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56052)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1an7uu-00041W-Os
	for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:57 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1an7uq-0007ZM-Mw
	for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:56 -0400
Received: from mail-lf0-x244.google.com ([2a00:1450:4010:c07::244]:35530)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <paolo.bonzini@gmail.com>) id 1an7uq-0007ZF-8x
	for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:52 -0400
Received: by mail-lf0-x244.google.com with SMTP id c62so24968000lfc.2
	for <qemu-devel@nongnu.org>; Mon, 04 Apr 2016 10:03:52 -0700 (PDT)
Sender: Paolo Bonzini <paolo.bonzini@gmail.com>
References: <1453976119-24372-1-git-send-email-alex.bennee@linaro.org>
	<1453976119-24372-4-git-send-email-alex.bennee@linaro.org>
	<87h9fl12zq.fsf@gmail.com> <57022283.5070400@redhat.com>
	<CAJhHMCA1NZiUvKaUpb5SSXOnLe+a507xmAf9Kuym3i7MVuKcHw@mail.gmail.com>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <57029E74.8070509@redhat.com>
Date: Mon, 4 Apr 2016 19:03:48 +0200
MIME-Version: 1.0
In-Reply-To: <CAJhHMCA1NZiUvKaUpb5SSXOnLe+a507xmAf9Kuym3i7MVuKcHw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to
 __atomic functions
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: mttcg@greensocs.com, Peter Maydell <peter.maydell@linaro.org>, Mark Burton <mark.burton@greensocs.com>, alvise rigo <a.rigo@virtualopensystems.com>, qemu-devel <qemu-devel@nongnu.org>, stefanha@redhat.com, =?UTF-8?Q?Alex_Benn=c3=a9e?= <alex.bennee@linaro.org>, =?UTF-8?B?S09OUkFEIEZyw6lkw6lyaWM=?= <fred.konrad@greensocs.com>


On 04/04/2016 18:26, Pranith Kumar wrote:
> Hi Paolo,
> 
> On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> The issue is that atomic_thread_fence() only affects other atomic
>> operations, while smp_rmb() and smp_wmb() affect normal loads and stores
>> as well.
> 
> That is different from what I understand what atomic_thread_fence()
> does. AFAIK, it should order all loads/stores and not only atomic
> operations.
> 
> Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html:
> 
> "Establishes memory synchronization ordering of non-atomic and relaxed
> atomic accesses"

You can find some information at
https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and
https://retrace.fedoraproject.org/faf/problems/670281/.

I've looked at private email from that time and I was pointed to this
sentence in GCC's manual, which says the opposite:

    "Note that in the C++11 memory model, fences (e.g.,
    ‘__atomic_thread_fence’) take effect in combination with other
    atomic operations on specific memory locations (e.g., atomic loads);
    operations on specific memory locations do not necessarily affect
    other operations in the same way."

Basically, the __atomic fences are essentially a way to break up a
non-relaxed atomic access into a relaxed atomic access and the
ordering-constraining fence.  According to those emails, atomic
load-acquires and store-releases can provide synchronization for plain
loads and stores (not just for relaxed atomic loads and stores).
However, an atomic thread fence cannot do this.

>> In the GCC implementation, atomic operations (even relaxed ones) access
>> memory as if the pointer was volatile.  By doing this, GCC can remove
>> the acquire and release fences altogether on TSO architectures.  We
>> actually observed a case where the compiler subsequently inverted the
>> order of two writes around a smp_wmb().
> 
> That is a GCC bug in that case. GCC should remove smp_wmb() only after
> the optimization passes have run in the codegen phase. Can you please
> point me to the mailing list thread/bug where this was discovered? Or
> is it easily reproducible with reverting that patch? In that case, the
> location of the bug will be helpful to analyse this further.

It depends on the compiler you're using.  With some luck, reverting the
patch will cause accesses across smp_wmb() or smp_rmb() to get
reordered.  In our case it was in thread-pool.c; Kevin Wolf did the
analysis.

>> In principle it could do the same on architectures that are sequentially
>> consistent; even if none exists in practice, keeping the barriers for
>> smp_mb() is consistent with the other barriers.
> 
> I understand one barrier(), but having two on both sides seems
> unnecessary. I would prefer we clean up and have just one. Although, I
> think it is just an annoyance since performance wise there is no
> difference. Two consecutive barrier()'s should behave just like one.

Yes, this is okay to change.

Paolo