From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 62F49C2BA2B for ; Mon, 6 Apr 2020 19:14:55 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2478C206C3 for ; Mon, 6 Apr 2020 19:14:55 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="d76FctLn" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2478C206C3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:37342 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jLXCw-0002ct-8o for qemu-devel@archiver.kernel.org; Mon, 06 Apr 2020 15:14:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52434) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jLXBe-00014P-DF for qemu-devel@nongnu.org; Mon, 06 Apr 2020 15:13:36 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jLXBc-0006j9-0a for qemu-devel@nongnu.org; Mon, 06 Apr 2020 15:13:34 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:38937 helo=us-smtp-1.mimecast.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1jLXBb-0006ir-RB for qemu-devel@nongnu.org; Mon, 06 Apr 2020 15:13:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586200411; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=POVW9ePUf/Ib4Kc+laUKo0y2kHQOilu154c/mjCyHmc=; b=d76FctLnEpJo7wNOj15yIQC9WwFlxDsA5N6FOxwrDaq43MEmDlvUyuSuhi4JguIOc3ojoC sLWS/nNq19TnhfzCgiZXXAUsemCbGKlVasFoZFpjYEdeV0r9ZVUiiNFVsG+PBKJVZwtydV 9nR4Zzwe2bZ54B8UgbADRw5euguBQ54= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-208-m2quIq1yM9255sn-sYspUg-1; Mon, 06 Apr 2020 15:13:27 -0400 X-MC-Unique: m2quIq1yM9255sn-sYspUg-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0B5B71005513; Mon, 6 Apr 2020 19:13:26 +0000 (UTC) Received: from virtlab701.virt.lab.eng.bos.redhat.com (virtlab701.virt.lab.eng.bos.redhat.com [10.19.152.228]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8A5C162697; Mon, 6 Apr 2020 19:13:25 +0000 (UTC) From: Paolo Bonzini To: qemu-devel@nongnu.org Subject: [PATCH 2/4] atomics: update documentation for C11 Date: Mon, 6 Apr 2020 15:13:18 -0400 Message-Id: <20200406191320.13371-3-pbonzini@redhat.com> In-Reply-To: <20200406191320.13371-1-pbonzini@redhat.com> References: <20200406191320.13371-1-pbonzini@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 205.139.110.120 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Ying Fang , stefanha@redhat.com Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Deprecate atomic_mb_read and atomic_mb_set; it is not really possible to use them correctly because they do not interoperate with sequentially-consi= stent RMW operations. Signed-off-by: Paolo Bonzini --- docs/devel/atomics.rst | 290 ++++++++++++++++------------------------- 1 file changed, 114 insertions(+), 176 deletions(-) diff --git a/docs/devel/atomics.rst b/docs/devel/atomics.rst index 00f3f7d3ed..e92752a64d 100644 --- a/docs/devel/atomics.rst +++ b/docs/devel/atomics.rst @@ -11,10 +11,15 @@ that is consistent with the expectations of the program= mer. The most basic tool is locking. Mutexes, condition variables and semaphores are used in QEMU, and should be the default approach to synchronization. Anything else is considerably harder, but it's -also justified more often than one would like. The two tools that -are provided by ``qemu/atomic.h`` are memory barriers and atomic operation= s. +also justified more often than one would like; +the most performance-critical parts of QEMU in particular require +a very low level approach to concurrency, involving memory barriers +and atomic operations. The semantics of concurrent memory accesses are go= verned +by the C11 memory model. =20 -Macros defined by ``qemu/atomic.h`` fall in three camps: +QEMU provides a header, ``qemu/atomic.h``, which wraps C11 atomics to +provide better portability and a less verbose syntax. ``qemu/atomic.h`` +provides macros that fall in three camps: =20 - compiler barriers: ``barrier()``; =20 @@ -24,6 +29,14 @@ Macros defined by ``qemu/atomic.h`` fall in three camps: =20 - sequentially consistent atomic access: everything else. =20 +In general, use of ``qemu/atomic.h`` should be wrapped with more easily +used data structures (e.g. the lock-free singly-liked list operations +``QSLIST_INSERT_HEAD_ATOMIC`` and ``QSLIST_MOVE_ATOMIC``) or synchronizati= on +primitives (such as RCU, ``QemuEvent`` or ``QemuLockCnt``). Bare use of +atomic operations and memory barriers should be limited to inter-thread +checking of flags and documented thoroughly. + + =20 Compiler memory barrier =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D @@ -85,36 +98,14 @@ Similar operations return the new value of ``*ptr``:: typeof(*ptr) atomic_or_fetch(ptr, val) typeof(*ptr) atomic_xor_fetch(ptr, val) =20 -Sequentially consistent loads and stores can be done using:: - - atomic_fetch_add(ptr, 0) for loads - atomic_xchg(ptr, val) for stores +These operations operate on any type that is as wide as an int or smaller. =20 -However, they are quite expensive on some platforms, notably POWER and -Arm. Therefore, qemu/atomic.h provides two primitives with slightly -weaker constraints:: +``qemu/atomic.h`` also provides sequentially consistent loads and stores c= an:: =20 typeof(*ptr) atomic_mb_read(ptr) void atomic_mb_set(ptr, val) =20 -The semantics of these primitives map to Java volatile variables, -and are strongly related to memory barriers as used in the Linux -kernel (see below). - -As long as you use atomic_mb_read and atomic_mb_set, accesses cannot -be reordered with each other, and it is also not possible to reorder -"normal" accesses around them. - -However, and this is the important difference between -atomic_mb_read/atomic_mb_set and sequential consistency, it is important -for both threads to access the same volatile variable. It is not the -case that everything visible to thread A when it writes volatile field f -becomes visible to thread B after it reads volatile field g. The store -and load have to "match" (i.e., be performed on the same volatile -field) to achieve the right semantics. - - -These operations operate on any type that is as wide as an int or smaller. +which however are deprecated. =20 =20 Weak atomic access and manual memory barriers @@ -208,135 +199,62 @@ They come in six kinds: dependency and a full read barrier or better is required. =20 =20 -This is the set of barriers that is required *between* two ``atomic_read()= `` -and ``atomic_set()`` operations to achieve sequential consistency: - - +----------------+-----------------------------------------------------= --+ - | | 2nd operation = | - | +------------------+-----------------+----------------= --+ - | 1st operation | (after last) | atomic_read | atomic_set = | - +----------------+------------------+-----------------+----------------= --+ - | (before first) | .. | none | smp_mb_release(= ) | - +----------------+------------------+-----------------+----------------= --+ - | atomic_read | smp_mb_acquire() | smp_rmb() [1]_ | [2]_ = | - +----------------+------------------+-----------------+----------------= --+ - | atomic_set | none | smp_mb() [3]_ | smp_wmb() = | - +----------------+------------------+-----------------+----------------= --+ - - .. [1] Or smp_read_barrier_depends(). - - .. [2] This requires a load-store barrier. This is achieved by - either smp_mb_acquire() or smp_mb_release(). - - .. [3] This requires a store-load barrier. On most machines, the only - way to achieve this is a full barrier. - - -You can see that the two possible definitions of ``atomic_mb_read()`` -and ``atomic_mb_set()`` are the following: - - 1) | atomic_mb_read(p) =3D atomic_read(p); smp_mb_acquire() - | atomic_mb_set(p, v) =3D smp_mb_release(); atomic_set(p, v); smp_mb(= ) - - 2) | atomic_mb_read(p) =3D smp_mb() atomic_read(p); smp_mb_acquire() - | atomic_mb_set(p, v) =3D smp_mb_release(); atomic_set(p, v); - -Usually the former is used, because ``smp_mb()`` is expensive and a progra= m -normally has more reads than writes. Therefore it makes more sense to -make ``atomic_mb_set()`` the more expensive operation. - -There are two common cases in which atomic_mb_read and atomic_mb_set -generate too many memory barriers, and thus it can be useful to manually -place barriers, or use atomic_load_acquire/atomic_store_release instead: - -- when a data structure has one thread that is always a writer - and one thread that is always a reader, manual placement of - memory barriers makes the write side faster. Furthermore, - correctness is easy to check for in this case using the "pairing" - trick that is explained below: - - +---------------------------------------------------------------------= -+ - | thread 1 = | - +-----------------------------------+---------------------------------= -+ - | before | after = | - +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ - | :: | :: = | - | | = | - | (other writes) | = | - | atomic_mb_set(&a, x) | atomic_store_release(&a, x) = | - | atomic_mb_set(&b, y) | atomic_store_release(&b, y) = | - +-----------------------------------+---------------------------------= -+ - - +---------------------------------------------------------------------= -+ - | thread 2 = | - +-----------------------------------+---------------------------------= -+ - | before | after = | - +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ - | :: | :: = | - | | = | - | y =3D atomic_mb_read(&b) | y =3D atomic_load_acquire(&b= ) | - | x =3D atomic_mb_read(&a) | x =3D atomic_load_acquire(&a= ) | - | (other reads) | = | - +-----------------------------------+---------------------------------= -+ - - Note that the barrier between the stores in thread 1, and between - the loads in thread 2, has been optimized here to a write or a - read memory barrier respectively. On some architectures, notably - ARMv7, smp_mb_acquire and smp_mb_release are just as expensive as - smp_mb, but smp_rmb and/or smp_wmb are more efficient. - -- sometimes, a thread is accessing many variables that are otherwise - unrelated to each other (for example because, apart from the current - thread, exactly one other thread will read or write each of these - variables). In this case, it is possible to "hoist" the implicit - barriers provided by ``atomic_mb_read()`` and ``atomic_mb_set()`` outsid= e - a loop. For example, the above definition ``atomic_mb_read()`` gives - the following transformation: - - +-----------------------------------+---------------------------------= -+ - | before | after = | - +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ - | :: | :: = | - | | = | - | n =3D 0; | n =3D 0; = | - | for (i =3D 0; i < 10; i++) | for (i =3D 0; i < 10; i++) = | - | n +=3D atomic_mb_read(&a[i]); | n +=3D atomic_read(&a[i]);= | - | | smp_mb_acquire(); = | - +-----------------------------------+---------------------------------= -+ - - Similarly, atomic_mb_set() can be transformed as follows: - - +-----------------------------------+---------------------------------= -+ - | before | after = | - +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ - | :: | :: = | - | | = | - | | smp_mb_release(); = | - | for (i =3D 0; i < 10; i++) | for (i =3D 0; i < 10; i++) = | - | atomic_mb_set(&a[i], false); | atomic_set(&a[i], false); = | - | | smp_mb(); = | - +-----------------------------------+---------------------------------= -+ - - - The other thread can still use ``atomic_mb_read()``/``atomic_mb_set()``. +Memory barriers and ``atomic_load_acquire``/``atomic_store_release`` are +mostly used when a data structure has one thread that is always a writer +and one thread that is always a reader: + + +----------------------------------+----------------------------------= + + | thread 1 | thread 2 = | + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ + | :: | :: = | + | | = | + | atomic_store_release(&a, x) | y =3D atomic_load_acquire(&b) = | + | atomic_store_release(&b, y) | x =3D atomic_load_acquire(&a) = | + +----------------------------------+----------------------------------= + + +In this case, correctness is easy to check for in this case using the +"pairing" trick that is explained below. + +Sometimes, a thread is accessing many variables that are otherwise +unrelated to each other (for example because, apart from the current +thread, exactly one other thread will read or write each of these +variables). In this case, it is possible to "hoist" the barriers +outside a loop. For example: + + +------------------------------------------+--------------------------= --------+ + | before | after = | + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ + | :: | :: = | + | | = | + | n =3D 0; | n =3D 0; = | + | for (i =3D 0; i < 10; i++) | for (i =3D 0; i < 10;= i++) | + | n +=3D atomic_load_acquire(&a[i]); | n +=3D atomic_read(= &a[i]); | + | | smp_mb_acquire(); = | + +------------------------------------------+--------------------------= --------+ + | :: | :: = | + | | = | + | | smp_mb_release(); = | + | for (i =3D 0; i < 10; i++) | for (i =3D 0; i < 10;= i++) | + | atomic_store_release(&a[i], false); | atomic_set(&a[i], fal= se); | + +------------------------------------------+--------------------------= --------+ =20 The two tricks can be combined. In this case, splitting a loop in -two lets you hoist the barriers out of the loops _and_ eliminate the -expensive ``smp_mb()``: - - +-----------------------------------+---------------------------------= -+ - | before | after = | - +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ - | :: | :: = | - | | = | - | | smp_mb_release(); = | - | for (i =3D 0; i < 10; i++) { | for (i =3D 0; i < 10; i++)= | - | atomic_mb_set(&a[i], false); | atomic_set(&a[i], false); = | - | atomic_mb_set(&b[i], false); | smb_wmb(); = | - | } | for (i =3D 0; i < 10; i++) = | - | | atomic_set(&a[i], false); = | - | | smp_mb(); = | - +-----------------------------------+---------------------------------= -+ +two lets you hoist the barriers out of the loops, and replace a +``smp_mb_release()`` with a (possibly cheaper, and clearer as well) +``smp_wmb()``: + + +------------------------------------------+--------------------------= --------+ + | before | after = | + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ + | :: | :: = | + | | = | + | | smp_mb_release(); = | + | for (i =3D 0; i < 10; i++) { | for (i =3D 0; i < 1= 0; i++) | + | atomic_store_release(&a[i], false); | atomic_set(&a[i], f= alse); | + | atomic_store_release(&b[i], false); | smb_wmb(); = | + | } | for (i =3D 0; i < 10;= i++) | + | | atomic_set(&b[i], f= alse); | + +------------------------------------------+--------------------------= --------+ =20 =20 Memory barrier pairing @@ -347,11 +265,13 @@ always, be paired with another barrier. In the case = of QEMU, however, note that the other barrier may actually be in a driver that runs in the guest! =20 -For the purposes of pairing, ``smp_read_barrier_depends()`` and ``smp_rmb(= )`` -both count as read barriers. A read barrier shall pair with a write -barrier or a full barrier; a write barrier shall pair with a read -barrier or a full barrier. A full barrier can pair with anything. -For example: +For the purposes of pairing, ``smp_read_barrier_depends()``, ``smp_rmb()`` +and ``smp_mb_acquire()`` all count as read barriers; ``smp_wmb()` and +``smp_mb_release()`` both count as write barriers. + +A read barrier shall pair with a write barrier or a full barrier; +a write barrier shall pair with a read barrier or a full barrier. +A full barrier can pair with anything. For example: =20 +--------------------+------------------------------+ | thread 1 | thread 2 | @@ -387,9 +307,6 @@ access and for data dependency barriers: | | z =3D b[y]; | +--------------------+------------------------------+ =20 -``smp_wmb()`` also pairs with ``atomic_mb_read()`` and ``smp_mb_acquire()`= `. -and ``smp_rmb()`` also pairs with ``atomic_mb_set()`` and ``smp_mb_release= ()``. - =20 Comparison with Linux kernel memory barriers =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D @@ -424,24 +341,45 @@ and memory barriers, and the equivalents in QEMU: ``atomic_cmpxchg`` returns the old value of the variable =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 - In QEMU, the second kind does not exist. Currently Linux has - atomic_fetch_or only. QEMU provides and, or, inc, dec, add, sub. + In QEMU, the second kind does not exist. =20 - different atomic read-modify-write operations in Linux imply a different set of memory barriers; in QEMU, all of them enforce - sequential consistency, which means they imply full memory barriers - before and after the operation. - -- Linux does not have an equivalent of ``atomic_mb_set()``. In particular= , - note that ``smp_store_mb()`` is a little weaker than ``atomic_mb_set()``= . - ``atomic_mb_read()`` compiles to the same instructions as Linux's - ``smp_load_acquire()``, but this should be treated as an implementation - detail. + sequential consistency. + +- in QEMU, ``atomic_read()`` and ``atomic_set()`` do not participate in + the total ordering enforced by sequentially-consistent operations. + This is because QEMU uses the C11 memory model. The following example + is correct in Linux but not in QEMU: + + +----------------------------------+--------------------------------= + + | Linux (correct) | QEMU (incorrect) = | + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ + | :: | :: = | + | | = | + | a =3D atomic_fetch_add(&x, 2); | a =3D atomic_fetch_add(&x, = 2); | + | b =3D READ_ONCE(&y); | b =3D atomic_read(&y); = | + +----------------------------------+--------------------------------= + + + because the read of ``y`` can be moved (by either the processor or the + compiler) before the write of ``x``. + + Fixing this requires an ``smp_mb()`` memory barrier between the write + of ``x`` and the read of ``y``. In the common case where only one threa= d + writes ``x``, it is also possible to write it like this: + + +--------------------------------+ + | QEMU (correct) | + +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ + | :: | + | | + | a =3D atomic_read(&x); | + | atomic_set(&x, a + 2); | + | smp_mb(); | + | b =3D atomic_read(&y); | + +--------------------------------+ =20 Sources =3D=3D=3D=3D=3D=3D=3D =20 * Documentation/memory-barriers.txt from the Linux kernel - -* "The JSR-133 Cookbook for Compiler Writers", available at - http://g.oswego.edu/dl/jmm/cookbook.html --=20 2.18.2