From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:36445)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1TYxYU-00038c-0q
	for qemu-devel@nongnu.org; Thu, 15 Nov 2012 06:24:25 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1TYxYQ-0002wu-Un
	for qemu-devel@nongnu.org; Thu, 15 Nov 2012 06:24:21 -0500
Received: from mx1.redhat.com ([209.132.183.28]:32948)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1TYxYQ-0002wl-Mp
	for qemu-devel@nongnu.org; Thu, 15 Nov 2012 06:24:18 -0500
Message-ID: <50A4D0D9.1060801@redhat.com>
Date: Thu, 15 Nov 2012 12:24:09 +0100
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <CAJnKYQnJ=kmDf0uddxe2kT2GrydAhmS4CbyHho5B_EBNhTFqOA@mail.gmail.com>
	<1914486608.9887816.1352801368641.JavaMail.root@redhat.com>
	<CAJnKYQ=f60acM+rw_pbG+tQ0TH=O-3XHTAbim6haADRWq9yCJg@mail.gmail.com>
	<50A368CD.1080002@redhat.com>
	<CAJnKYQ=S=y2_-Z=K9M1uV1MjgKO48q8HAJnAOsdU4gNZjym7iA@mail.gmail.com>
In-Reply-To: <CAJnKYQ=S=y2_-Z=K9M1uV1MjgKO48q8HAJnAOsdU4gNZjym7iA@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v6 1/8] atomic: introduce atomic operations
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: liu ping fan <qemulist@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>, Stefan Hajnoczi <stefanha@gmail.com>, Marcelo Tosatti <mtosatti@redhat.com>, qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>, Anthony Liguori <anthony@codemonkey.ws>, Jan Kiszka <jan.kiszka@siemens.com>

Il 15/11/2012 08:47, liu ping fan ha scritto:
>>>> RCU prototype required pointer-sized access, which you cannot make t=
ype-
>>> But I think that your RCU prototype should rely on atomic of CPU, not
>>> gcc=91s atomic.
>>
>> What's the difference?  gcc's atomic produces the same instructions as
>> hand-written assembly (or should).
>>
> Probably I made a mistake here, in vhost,  log =3D
> __sync_fetch_and_and(from, 0)  is used to fetch 64bits atomically in
> the case  32bits qemu running on 64bits linux.  Right?   But how can
> we read 32bits twice in atomic?

You can use cmpxchg8b.

>>> Otherwise, it could be slow (I guess something like spinlock there).
>>
>> Not sure what you mean.  I'm using two things: 1) write barriers; 2)
>> atomic_xchg of a pointer for fast, wait-free enqueuing of call_rcu
>> callbacks.
>
> Got your main idea. Will go through your prototype. Just one more
> question, why here atomic_xchg needed?  Could not the sequence "read
> old pointer, set new pointer" satisfy your requirement?

No, it would be racy.  Say you have this:

     wrong                     right
  -----------------------------------------
     ref(new)                  ref(new)
     old =3D tail                old =3D new
     tail =3D new                xchg(&old, &tail)
     old->next =3D new           old->next =3D new
     up(&sem)                  up(&sem)

where sem and tail are global, while old and new are local.  There can
be any number of producers as in the above scheme, and one consumer.  In
the consumer, iteration of the list starts from the second element (the
first element is dummy):

    dummy =3D new Node;
    tail =3D dummy;
    for(;;) {
        down(&sem);
        node =3D dummy->next;
        unref(dummy);
        visit node;
        dummy =3D node;
    }

Then the invariants are:

- the number of elements N in the list (starting from the dummy node and
following ->next) is always >1.  Because of this, you never have to
touch head when adding an element.

- the number of excess signals in the semaphore sem is always <=3D N-1.
Because of this, it doesn't matter that there is an instant in time
where tail is not reachable from head.  The element is really in the
list (as far as the consumer goes) only after the semaphore is signaled.

In the wrong version, two threads could read the same pointer:

    x =3D tail                                  |
                         y =3D tail             |
    tail =3D new_x                              |
                         tail =3D new_y         |
    x->next =3D new_x                           |   old_tail->next =3D ne=
w_x
    up(&sem)                                  |
                         y->next =3D new_y      |   old_tail->next =3D ne=
w_x
                         up(&sem)             |

No node points to new_x, so you have 2 nodes reachable from the head
(the dummy node, and new_y) with 2 signals on the semaphore,
contradicting the second invariant.

This instead works:

    x =3D new_x                                 |
                         y =3D new_y            |
    xchg(&x, &tail)                           |   tail =3D x, x =3D old_t=
ail
                         xchg(&y, &tail)      |   tail =3D y, y =3D x
    x->next =3D new_x                           |   old_tail->next =3D ne=
w_x
    up(&sem)                                  |
                         y->next =3D new_y      |   x->next =3D new_y
                         up(&sem)             |

because you have 3 nodes and 2 signals.

Paolo