From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38267)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1UorP9-000643-Fz
	for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:49 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1UorP7-0000ZB-OF
	for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:43 -0400
Received: from mx1.redhat.com ([209.132.183.28]:3725)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1UorP7-0000Vl-9p
	for qemu-devel@nongnu.org; Tue, 18 Jun 2013 04:36:41 -0400
Message-ID: <51C01C0E.5030501@redhat.com>
Date: Tue, 18 Jun 2013 10:36:30 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1371381681-14252-1-git-send-email-pingfanl@linux.vnet.ibm.com>
	<1371381681-14252-2-git-send-email-pingfanl@linux.vnet.ibm.com>
	<51BF5C0F.6020209@twiddle.net>
In-Reply-To: <51BF5C0F.6020209@twiddle.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 1/2] add a header file for atomic
	operations
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>
Cc: Liu Ping Fan <qemulist@gmail.com>, Anthony Liguori <anthony@codemonkey.ws>, qemu-devel@nongnu.org

Il 17/06/2013 20:57, Richard Henderson ha scritto:
> On 06/16/2013 04:21 AM, Liu Ping Fan wrote:
>> +#if QEMU_GNUC_PREREQ(4, 8)
>> +#ifndef __ATOMIC_RELAXED
>> +#define __ATOMIC_RELAXED 0
>> +#endif
> 
> Why all the ifdefs?  If __atomic support is present, then __ATOMIC defines will
> exist.

Then I can just use "#ifdef __ATOMIC_RELAXED" instead of
"#if QEMU_GNUC_PREREQ(4, 8)"?

>> +/* Compiler barrier */
>> +#define barrier()   ({ asm volatile("" ::: "memory"); (void)0; })
>> +
>> +#if !QEMU_GNUC_PREREQ(4, 8)
> 
> With C++11 atomic support we should define barrier as
> 
>   __atomic_signal_fence(__ATOMIC_ACQ_REL)

I don't really understand fences other than SEQ_CST, so I was hesitant to use
them.  The asms should work anyway.

>>  #if defined(__powerpc64__)
>> -#define smp_rmb()   asm volatile("lwsync" ::: "memory")
>> +#define smp_rmb()   ({ asm volatile("lwsync" ::: "memory"); (void)0; })
>>  #else
>> -#define smp_rmb()   asm volatile("sync" ::: "memory")
>> +#define smp_rmb()   ({ asm volatile("sync" ::: "memory"); (void)0; })
>>  #endif
> 
> Err... lwsync corresponds to ACQ_REL consistency, while sync corresponds to
> SEQ_CST consistency.  The newly added document says you want SEQ_CST while the
> old code implies ACQ_REL.
> 
> Which is true?

I have no idea, but I can say which semantics I want:

1) Linux kernel memory barrier semantics for smp_*mb (i.e. express the barriers
in terms of read/write/full, not in terms of acq/rel/seqcst);

2) Java volatile semantics for atomic_mb_*.

Basically, I cannot claim I understand this stuff 100%, but at least I could
use sources I trust to implement it.

My rationale was that both have been in use for a while and have proven to be
both efficient and useful.  They are more easily understood than the C11 memory
model, though not as complete.  I wanted something that people could figure out
relatively easily (tested on myself while working on the RCU stuff).

Most important, Java volatile can be described in terms of barriers as in the
JSR-133 cookbook, making it easy to convert one to the other.  This also reflected
in the names.

Finally, Java volatile semantics are reasonably fast for all architectures,
including ARM and PPC.

I think my understanding that Java volatile == SEQ_CST was wrong, though,
and I should rewrite the documentation.  More on this below.

>> +#ifndef smp_mb
>> +#define smp_mb()    __sync_synchronize()
>> +#endif
> 
> Use __atomic_thread_fence here for completeness?
> 
>> +#ifndef atomic_read
>> +#define atomic_read(ptr)       (*(__typeof__(*ptr) *volatile) (ptr))
>>  #endif
>>  
>> +#ifndef atomic_set
>> +#define atomic_set(ptr, i)     ((*(__typeof__(*ptr) *volatile) (ptr)) = (i))
>> +#endif
> 
> Use
> 
> __atomic_load(..., __ATOMIC_RELAXED)
> __atomic_store(..., __ATOMIC_RELAXED)
> 
> ?

Same here, I didn't want proliferation of #ifdefs beyond what is actually required.

>> + * And for the few ia64 lovers that exist, an atomic_mb_read is a ld.acq,
>> + * while an atomic_mb_set is a st.rel followed by a memory barrier.
> ...
>> + */
>> +#ifndef atomic_mb_read
>> +#if QEMU_GNUC_PREREQ(4, 8)
>> +#define atomic_mb_read(ptr)       ({             \
>> +    typeof(*ptr) _val;                           \
>> +    __atomic_load(ptr, &_val, __ATOMIC_SEQ_CST); \
>> +    _val;                                        \
>> +})
>> +#else
>> +#define atomic_mb_read(ptr)    ({           \
>> +    typeof(*ptr) _val = atomic_read(ptr);   \
>> +    smp_rmb();                              \
>> +    _val;                                   \
> 
> This latter definition is ACQUIRE not SEQ_CST (except for ia64).  Without
> load_acquire, one needs barriers before and after the atomic_read in order to
> implement SEQ_CST.
> 
> So again I have to ask, what semantics are you actually looking for here?

As mentioned above, I wanted Java volatile semantics.  I misunderstood
that Java volatile == SEQ_CST.  The Java memory model FAQ says:
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#volatile

     Each read of a volatile will see the last write to that volatile
     by any thread; in effect, they are designated by the programmer as
     fields for which it is never acceptable to see a "stale" value as
     a result of caching or reordering. The compiler and runtime are
     prohibited from allocating them in registers. They must also ensure
     that after they are written, they are flushed out of the cache to
     main memory, so they can immediately become visible to other threads.
     Similarly, before a volatile field is read, the cache must be
     invalidated so that the value in main memory, not the local processor
     cache, is the one seen. There are also additional restrictions on
     reordering accesses to volatile variables. 

(Since almost all architectures are cache coherent, I take that "cache"
really means the store buffers.  The JSR-133 cookbook basically says the
same).  In addition, mentioned by the cookbook but not the FAQ, normal
stores cannot be moved after a volatile stores, and volatile loads
cannot be moved after a normal load.

The FAQ also has an "important note", however:

    Important Note: Note that it is important for both threads to access
    the same volatile variable in order to properly set up the happens-before
    relationship. It is not the case that everything visible to thread A
    when it writes volatile field f becomes visible to thread B after it
    reads volatile field g. The release and acquire have to "match" (i.e.,
    be performed on the same volatile field) to have the right semantics. 

Is this final "important note" the difference between ACQ_REL and SEQ_CST?

>> +#define atomic_mb_set(ptr, i)  do {         \
>> +    smp_wmb();                              \
>> +    atomic_set(ptr, i);                     \
>> +    smp_mb();                               \
>> +} while (0)
> 
> Really only a smp_wmb?  What about outstanding loads?

Probably the same as before.  Also, as mentioned in the documentation
smp_rmb() + smp_wmb() imply a load-store barrier on all machines.

Thanks for reviewing this, you're probably the only person in the community
that understands this stuff.

Paolo