From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38267) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UorP9-000643-Fz for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UorP7-0000ZB-OF for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:3725) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UorP7-0000Vl-9p for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:41 -0400 Message-ID: <51C01C0E.5030501@redhat.com> Date: Tue, 18 Jun 2013 10:36:30 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <1371381681-14252-1-git-send-email-pingfanl@linux.vnet.ibm.com> <1371381681-14252-2-git-send-email-pingfanl@linux.vnet.ibm.com> <51BF5C0F.6020209@twiddle.net> In-Reply-To: <51BF5C0F.6020209@twiddle.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 1/2] add a header file for atomic operations List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Richard Henderson Cc: Liu Ping Fan , Anthony Liguori , qemu-devel@nongnu.org Il 17/06/2013 20:57, Richard Henderson ha scritto: > On 06/16/2013 04:21 AM, Liu Ping Fan wrote: >> +#if QEMU_GNUC_PREREQ(4, 8) >> +#ifndef __ATOMIC_RELAXED >> +#define __ATOMIC_RELAXED 0 >> +#endif > > Why all the ifdefs? If __atomic support is present, then __ATOMIC defines will > exist. Then I can just use "#ifdef __ATOMIC_RELAXED" instead of "#if QEMU_GNUC_PREREQ(4, 8)"? >> +/* Compiler barrier */ >> +#define barrier() ({ asm volatile("" ::: "memory"); (void)0; }) >> + >> +#if !QEMU_GNUC_PREREQ(4, 8) > > With C++11 atomic support we should define barrier as > > __atomic_signal_fence(__ATOMIC_ACQ_REL) I don't really understand fences other than SEQ_CST, so I was hesitant to use them. The asms should work anyway. >> #if defined(__powerpc64__) >> -#define smp_rmb() asm volatile("lwsync" ::: "memory") >> +#define smp_rmb() ({ asm volatile("lwsync" ::: "memory"); (void)0; }) >> #else >> -#define smp_rmb() asm volatile("sync" ::: "memory") >> +#define smp_rmb() ({ asm volatile("sync" ::: "memory"); (void)0; }) >> #endif > > Err... lwsync corresponds to ACQ_REL consistency, while sync corresponds to > SEQ_CST consistency. The newly added document says you want SEQ_CST while the > old code implies ACQ_REL. > > Which is true? I have no idea, but I can say which semantics I want: 1) Linux kernel memory barrier semantics for smp_*mb (i.e. express the barriers in terms of read/write/full, not in terms of acq/rel/seqcst); 2) Java volatile semantics for atomic_mb_*. Basically, I cannot claim I understand this stuff 100%, but at least I could use sources I trust to implement it. My rationale was that both have been in use for a while and have proven to be both efficient and useful. They are more easily understood than the C11 memory model, though not as complete. I wanted something that people could figure out relatively easily (tested on myself while working on the RCU stuff). Most important, Java volatile can be described in terms of barriers as in the JSR-133 cookbook, making it easy to convert one to the other. This also reflected in the names. Finally, Java volatile semantics are reasonably fast for all architectures, including ARM and PPC. I think my understanding that Java volatile == SEQ_CST was wrong, though, and I should rewrite the documentation. More on this below. >> +#ifndef smp_mb >> +#define smp_mb() __sync_synchronize() >> +#endif > > Use __atomic_thread_fence here for completeness? > >> +#ifndef atomic_read >> +#define atomic_read(ptr) (*(__typeof__(*ptr) *volatile) (ptr)) >> #endif >> >> +#ifndef atomic_set >> +#define atomic_set(ptr, i) ((*(__typeof__(*ptr) *volatile) (ptr)) = (i)) >> +#endif > > Use > > __atomic_load(..., __ATOMIC_RELAXED) > __atomic_store(..., __ATOMIC_RELAXED) > > ? Same here, I didn't want proliferation of #ifdefs beyond what is actually required. >> + * And for the few ia64 lovers that exist, an atomic_mb_read is a ld.acq, >> + * while an atomic_mb_set is a st.rel followed by a memory barrier. > ... >> + */ >> +#ifndef atomic_mb_read >> +#if QEMU_GNUC_PREREQ(4, 8) >> +#define atomic_mb_read(ptr) ({ \ >> + typeof(*ptr) _val; \ >> + __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST); \ >> + _val; \ >> +}) >> +#else >> +#define atomic_mb_read(ptr) ({ \ >> + typeof(*ptr) _val = atomic_read(ptr); \ >> + smp_rmb(); \ >> + _val; \ > > This latter definition is ACQUIRE not SEQ_CST (except for ia64). Without > load_acquire, one needs barriers before and after the atomic_read in order to > implement SEQ_CST. > > So again I have to ask, what semantics are you actually looking for here? As mentioned above, I wanted Java volatile semantics. I misunderstood that Java volatile == SEQ_CST. The Java memory model FAQ says: http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile Each read of a volatile will see the last write to that volatile by any thread; in effect, they are designated by the programmer as fields for which it is never acceptable to see a "stale" value as a result of caching or reordering. The compiler and runtime are prohibited from allocating them in registers. They must also ensure that after they are written, they are flushed out of the cache to main memory, so they can immediately become visible to other threads. Similarly, before a volatile field is read, the cache must be invalidated so that the value in main memory, not the local processor cache, is the one seen. There are also additional restrictions on reordering accesses to volatile variables. (Since almost all architectures are cache coherent, I take that "cache" really means the store buffers. The JSR-133 cookbook basically says the same). In addition, mentioned by the cookbook but not the FAQ, normal stores cannot be moved after a volatile stores, and volatile loads cannot be moved after a normal load. The FAQ also has an "important note", however: Important Note: Note that it is important for both threads to access the same volatile variable in order to properly set up the happens-before relationship. It is not the case that everything visible to thread A when it writes volatile field f becomes visible to thread B after it reads volatile field g. The release and acquire have to "match" (i.e., be performed on the same volatile field) to have the right semantics. Is this final "important note" the difference between ACQ_REL and SEQ_CST? >> +#define atomic_mb_set(ptr, i) do { \ >> + smp_wmb(); \ >> + atomic_set(ptr, i); \ >> + smp_mb(); \ >> +} while (0) > > Really only a smp_wmb? What about outstanding loads? Probably the same as before. Also, as mentioned in the documentation smp_rmb() + smp_wmb() imply a load-store barrier on all machines. Thanks for reviewing this, you're probably the only person in the community that understands this stuff. Paolo