All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Cc: gshan@redhat.com, eesposit@redhat.com, david@redhat.com,
	stefanha@redhat.com, cohuck@redhat.com, eauger@redhat.com
Subject: Re: [PATCH 5/8] util/async: add smp_mb__after_rmw() around BH enqueue/dequeue
Date: Mon, 6 Mar 2023 10:55:52 +0100	[thread overview]
Message-ID: <2e13c2af-1bee-32ae-a6bf-181ad00a3319@redhat.com> (raw)
In-Reply-To: <ba42506d-4906-3d4b-f934-586356278355@linaro.org>

On 3/5/23 20:32, Richard Henderson wrote:
> On 3/3/23 09:19, Paolo Bonzini wrote:
>>       unsigned old_flags;
>>       /*
>>        * The memory barrier makes sure that:
>>        * 1. any writes needed by the callback are visible from the callback
>>        *    after aio_bh_dequeue() returns bh.
>>        * 2. ctx is loaded before the callback has a chance to execute and bh
>>        *    could be freed.
>>        */
>>       old_flags = qatomic_fetch_or(&bh->flags, BH_PENDING | new_flags);
>> +    smp_mb__after_rmw();
>> +
>>       if (!(old_flags & BH_PENDING)) {
>>           QSLIST_INSERT_HEAD_ATOMIC(&ctx->bh_list, bh, next);
>>       }
>> @@ -107,14 +109,15 @@ static QEMUBH *aio_bh_dequeue(BHList *head, 
>> unsigned *flags)
>>       QSLIST_REMOVE_HEAD(head, next);
>>       /*
>>        * The memory barrier is paired with aio_bh_enqueue() and it
>>        * ensures that the callback sees all writes done by the scheduling
>>        * thread.  It also ensures that the scheduling thread sees the cleared
>>        * flag before bh->cb has run, and thus will call aio_notify again if
>>        * necessary.
>>        */
> 
> Is it really paired with aio_bh_enqueue?
> 
> Seems like the enqueue barrier really is for aio_bh_poll, and the 
> dequeue barrier is for the callback.

The documentation has been quite obsolete since the introduction of
bh_list.  The logic is as follows:

aio_bh_enqueue()
   load bh->ctx
   set PENDING flag                                          // (0)
   if flag was not set
     // bh becomes visible to dequeuing thread here:
     insert into bh_list                                     // (1)

   aio_notify
     // Write bh->flags and bh_list before ctx->notified
     smp_wmb()                                               // (2)
     set notified to true
     // Write notified before reading notify_me
     smp_mb()                                                // (3)
     if notify_me then event_notifier_set()

and on the dequeue side it's tricky to see why all barriers are
needed; it boils down to the busy-wait polling done at the beginning
of aio_poll():

aio_poll()
   compute approximate timeout (*unordered* read of bh_list)

   enable notify_me
   // Write notify_me before reading notified
   smp_mb()                                // synchronizes with (3)
   if notified then timeout = 0
   ctx->fdmon_ops->wait(timeout)

   disable notify_me
   aio_notify_accept()
     set notified to false
     // Write ctx->notified before reading e.g. bh_list
     smp_mb()                              // synchronizes with (2)

   aio_bh_poll()
     QSLIST_MOVE_ATOMIC                    // synchronizes with (1)
     aio_bh_dequeue
       remove from head
       clear PENDING/SCHEDULED/IDLE        // synchronizes with (0)
     if flags were set
       aio_bh_call


That is:

for synchronization point (0)
- the main function here is to ensure that aio_bh_dequeue() removes
   from the list before the PENDING flag is cleared, otherwise enqueuers can
   clobber the list, and vice versa for the enqueuers.  For this, it is enough
   that qatomic_fetch_and() is "release" and qatomic_fetch_or() is "acquire"
   (and they are).

for synchronization point (1)
- the bottom half machinery between aio_bh_enqueue and aio_bh_poll only
   needs release-to-acquire synchronization, and that is provided by (1).
   This is also where ordering ensures that bh->ctx is loaded before the
   callback executes (which could free bh).

- between enqueuers, setting the PENDING flag only needs to be done before
   inserting into bh_list to avoid double insertion (and of course needs to
   be done atomically); for this purpose, QSLIST_INSERT_HEAD_ATOMIC needs to
   be "release" (which it is).

Also the call to aio_notify() in aio_bh_enqueue() is unconditional, so
aio_bh_dequeue() need not protect against missed calls to aio_notify().
aio_notify() only synchronizes with aio_poll() and aio_notify_accept().

for synchronization point (2)
- This cannot be just a smp_rmb() because the timeout that was computed
   earlier was not ordered against either notified or notify_me.

- This is a smp_mb() for generality; as far as bh_list is
   concerned it could be smp_mb__before_rmw().

Synchronization point (3) is pretty mundane in comparison.

So I'm dropping this patch; I have prepared a documentation patch instead.

Paolo



  reply	other threads:[~2023-03-06  9:56 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-03 17:19 [PATCH 0/8] Fix missing memory barriers on ARM Paolo Bonzini
2023-03-03 17:19 ` [PATCH 1/8] qatomic: add smp_mb__before/after_rmw() Paolo Bonzini
2023-03-05 18:57   ` Richard Henderson
2023-03-05 21:00     ` Paolo Bonzini
2023-03-06 13:21   ` David Hildenbrand
2023-03-06 13:22     ` David Hildenbrand
2023-03-03 17:19 ` [PATCH 2/8] qemu-thread-posix: cleanup, fix, document QemuEvent Paolo Bonzini
2023-03-05 19:11   ` Richard Henderson
2023-03-06 13:28   ` David Hildenbrand
2023-03-03 17:19 ` [PATCH 3/8] qemu-thread-win32: " Paolo Bonzini
2023-03-05 19:14   ` Richard Henderson
2023-03-06 13:31   ` David Hildenbrand
2023-03-06 14:20     ` Paolo Bonzini
2023-03-06 14:32       ` David Hildenbrand
2023-03-06 15:17         ` Paolo Bonzini
2023-03-03 17:19 ` [PATCH 4/8] edu: add smp_mb__after_rmw() Paolo Bonzini
2023-03-05 19:14   ` Richard Henderson
2023-03-06 13:31   ` David Hildenbrand
2023-03-06 13:38   ` Peter Maydell
2023-03-06 14:10     ` Paolo Bonzini
2023-03-06 14:24       ` Peter Maydell
2023-03-06 15:06         ` Paolo Bonzini
2023-03-06 15:36           ` Peter Maydell
2023-03-03 17:19 ` [PATCH 5/8] util/async: add smp_mb__after_rmw() around BH enqueue/dequeue Paolo Bonzini
2023-03-05 19:32   ` Richard Henderson
2023-03-06  9:55     ` Paolo Bonzini [this message]
2023-03-03 17:19 ` [PATCH 6/8] aio-wait: switch to smp_mb__after_rmw() Paolo Bonzini
2023-03-05 19:32   ` Richard Henderson
2023-03-06 13:32   ` David Hildenbrand
2023-03-06 14:38   ` Stefan Hajnoczi
2023-03-03 17:19 ` [PATCH 7/8] qemu-coroutine-lock: add smp_mb__after_rmw() Paolo Bonzini
2023-03-05 19:36   ` Richard Henderson
2023-03-06 13:33   ` David Hildenbrand
2023-03-03 17:19 ` [PATCH 8/8] physmem: add missing memory barrier Paolo Bonzini
2023-03-05 19:40   ` Richard Henderson
2023-03-06 13:34   ` David Hildenbrand
2023-03-06 13:35 ` [PATCH 0/8] Fix missing memory barriers on ARM David Hildenbrand
2023-03-06 14:14   ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2e13c2af-1bee-32ae-a6bf-181ad00a3319@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=gshan@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.