From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Henderson
Date: Mon, 23 May 2016 09:53:00 -0700
Subject: Re: [Qemu-devel] [PATCH 1/2] atomics: do not use __atomic primitives for RCU atomics
To: "Emilio G. Cota", QEMU Developers, MTTCG Devel
Cc: Alex Bennée, Paolo Bonzini, Sergey Fedorov
References: <1463863336-28760-1-git-send-email-cota@braap.org> <1463863336-28760-2-git-send-email-cota@braap.org>
In-Reply-To: <1463863336-28760-2-git-send-email-cota@braap.org>

On 05/21/2016 01:42 PM, Emilio G. Cota wrote:
> In the process, the atomic_rcu_read/set were converted to implement
> consume/release semantics, respectively. This is inefficient; for
> correctness and maximum performance we only need an smp_barrier_depends
> for reads, and an smp_wmb for writes. Fix it by using the original
> definition of these two primitives for all compilers.

For what host do you think this is inefficient?

In particular, what you've done is going to be less efficient for e.g.
armv8, where the __atomic formulation is going to produce load-acquire
and store-release instructions. Whereas the separate barriers are going
to produce two insns.

As for the common case of x86_64, what you're doing is going to make no
difference at all.

So what are you trying to improve?


r~
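
For reference, a minimal sketch of the two formulations being compared, written against the GCC/Clang __atomic builtins. This is not the QEMU code: struct node, shared, and all function names are hypothetical, __atomic_thread_fence(__ATOMIC_RELEASE) is only an assumed stand-in for smp_wmb, and the compiler-only fence is an assumed stand-in for the dependency barrier on hosts that need nothing stronger. Which instructions each variant actually compiles to depends on the host and compiler, so inspect the generated assembly (e.g. with -S) rather than relying on this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload and shared pointer, for illustration only. */
struct node { int value; };
static struct node *shared;

/* Formulation A: one __atomic builtin per access (consume / release). */
static inline struct node *read_atomic(void)
{
    return __atomic_load_n(&shared, __ATOMIC_CONSUME);
}

static inline void set_atomic(struct node *n)
{
    __atomic_store_n(&shared, n, __ATOMIC_RELEASE);
}

/* Formulation B: relaxed access plus a separate barrier. */
static inline struct node *read_barrier(void)
{
    struct node *n = __atomic_load_n(&shared, __ATOMIC_RELAXED);
    /* Assumed stand-in for the dependency barrier: compiler-only fence. */
    __atomic_signal_fence(__ATOMIC_SEQ_CST);
    return n;
}

static inline void set_barrier(struct node *n)
{
    /* Assumed stand-in for smp_wmb: fence first, then a plain store. */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&shared, n, __ATOMIC_RELAXED);
}

/* Single-threaded sanity check: both formulations publish and observe
 * the same pointer. It says nothing about cross-thread ordering. */
static int formulations_agree(struct node *n)
{
    set_atomic(n);
    assert(read_atomic() == n && read_barrier() == n);
    set_barrier(n);
    assert(read_atomic() == n && read_barrier() == n);
    return 1;
}
```

Compiling each pair of functions separately for the host in question and comparing the output is the direct way to check the instruction counts discussed above.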