From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:56052) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1an7uu-00041W-Os for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1an7uq-0007ZM-Mw for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:56 -0400 Received: from mail-lf0-x244.google.com ([2a00:1450:4010:c07::244]:35530) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1an7uq-0007ZF-8x for qemu-devel@nongnu.org; Mon, 04 Apr 2016 13:03:52 -0400 Received: by mail-lf0-x244.google.com with SMTP id c62so24968000lfc.2 for ; Mon, 04 Apr 2016 10:03:52 -0700 (PDT) Sender: Paolo Bonzini References: <1453976119-24372-1-git-send-email-alex.bennee@linaro.org> <1453976119-24372-4-git-send-email-alex.bennee@linaro.org> <87h9fl12zq.fsf@gmail.com> <57022283.5070400@redhat.com> From: Paolo Bonzini Message-ID: <57029E74.8070509@redhat.com> Date: Mon, 4 Apr 2016 19:03:48 +0200 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Subject: Re: [Qemu-devel] [PATCH v1 3/5] include/qemu/atomic.h: default to __atomic functions List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Pranith Kumar Cc: mttcg@greensocs.com, Peter Maydell , Mark Burton , alvise rigo , qemu-devel , stefanha@redhat.com, =?UTF-8?Q?Alex_Benn=c3=a9e?= , =?UTF-8?B?S09OUkFEIEZyw6lkw6lyaWM=?= On 04/04/2016 18:26, Pranith Kumar wrote: > Hi Paolo, > > On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini wrote: >> >> The issue is that atomic_thread_fence() only affects other atomic >> operations, while smp_rmb() and smp_wmb() affect normal loads and stores >> as well. > > That is different from what I understand what atomic_thread_fence() > does. AFAIK, it should order all loads/stores and not only atomic > operations. > > Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html: > > "Establishes memory synchronization ordering of non-atomic and relaxed > atomic accesses" You can find some information at https://bugzilla.redhat.com/show_bug.cgi?id=1142857 and https://retrace.fedoraproject.org/faf/problems/670281/. I've looked at private email from that time and I was pointed to this sentence in GCC's manual, which says the opposite: "Note that in the C++11 memory model, fences (e.g., ‘__atomic_thread_fence’) take effect in combination with other atomic operations on specific memory locations (e.g., atomic loads); operations on specific memory locations do not necessarily affect other operations in the same way." Basically, the __atomic fences are essentially a way to break up a non-relaxed atomic access into a relaxed atomic access and the ordering-constraining fence. According to those emails, atomic load-acquires and store-releases can provide synchronization for plain loads and stores (not just for relaxed atomic loads and stores). However, an atomic thread fence cannot do this. >> In the GCC implementation, atomic operations (even relaxed ones) access >> memory as if the pointer was volatile. By doing this, GCC can remove >> the acquire and release fences altogether on TSO architectures. We >> actually observed a case where the compiler subsequently inverted the >> order of two writes around a smp_wmb(). > > That is a GCC bug in that case. GCC should remove smp_wmb() only after > the optimization passes have run in the codegen phase. Can you please > point me to the mailing list thread/bug where this was discovered? Or > is it easily reproducible with reverting that patch? In that case, the > location of the bug will be helpful to analyse this further. It depends on the compiler you're using. With some luck, reverting the patch will cause accesses across smp_wmb() or smp_rmb() to get reordered. In our case it was in thread-pool.c; Kevin Wolf did the analysis. >> In principle it could do the same on architectures that are sequentially >> consistent; even if none exists in practice, keeping the barriers for >> smp_mb() is consistent with the other barriers. > > I understand one barrier(), but having two on both sides seems > unnecessary. I would prefer we clean up and have just one. Although, I > think it is just an annoyance since performance wise there is no > difference. Two consecutive barrier()'s should behave just like one. Yes, this is okay to change. Paolo