Re: [PATCH v2 8/9] async: update documentation of the memory barriers

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Cc: gshan@redhat.com, eesposit@redhat.com, david@redhat.com,
	stefanha@redhat.com, cohuck@redhat.com, eauger@redhat.com
Subject: Re: [PATCH v2 8/9] async: update documentation of the memory barriers
Date: Wed, 8 Mar 2023 19:08:05 +0100	[thread overview]
Message-ID: <b7af7150-af5d-de86-e237-7dafa326b862@redhat.com> (raw)
In-Reply-To: <3c2362c4-1d2f-f749-db1e-201d985e67be@linaro.org>

On 3/8/23 17:47, Richard Henderson wrote:
>> The case that I was imagining for smp_mb__before_rmw() is something 
>> like this:
>>
>>      wake_me = true;
>>      smp_mb__before_rmw();
>>      if (qatomic_xchg(&can_sleep, true)) { ... }
>>
>> where you really need a full barrier.
> 
> What is different about this that doesn't apply to the remove-head case 
> we've been talking about?

For remove-head, nothing is going to change the BH_PENDING flag while 
the code runs.  This would be an okay transformation of the code, at 
both the compiler and the processor level:

   // first part of atomic_fetch_and
   old_flags = LDAXR of bh->flags

   // QSLIST_REMOVE_HEAD ends up between load and store of
   // atomic_fetch_and
   all the loads and stores for remove-head

   // second part of atomic_fetch_and
   new_flags = old_flags & ~(BH_PENDING|BH_SCHEDULED|BH_IDLE);
   (successful) STLXR of new_flags into bh->flags


Instead in the case above, I was thinking you would get a missed wakeup 
without the barriers.  Here is the full pseudocode:

   // this store can move below the load of can_sleep
   qatomic_set(&wake_me, true);
   if (qatomic_xchg(&can_sleep, true)) sleep;

   // this store can move below the load of wake_me
   qatomic_set(&can_sleep, false);
   if (qatomic_xchg(&wake_me, false)) wake them;

The buggy case is where the threads observe can_sleep = true, wake_me = 
false, i.e. the original value of the variables.  Let's look at it with 
CPPMEM.

Like before, the CPPMEM test must use CAS and .readsvalue() to hack 
around the lack of "if"s.  Two .readsvalue() clauses represent the buggy 
case, so we are all good if there is *no* consistent executions.

Unfortunately, it fails:

   // 2 consistent (i.e. buggy) executions
   int main() {
     atomic_int w = 0;
     atomic_int s = 1;
     {{{
       { w.store(1, mo_relaxed);
         // the buggy case is the one in which the threads read the
         // original value of w and s.  So here the CAS writes a
         // dummy value different from both 0 and 1
         cas_strong_explicit(&s, 0, 99, mo_seq_cst, mo_seq_cst);
         s.load(mo_relaxed).readsvalue(1); }
     |||
       { s.store(0, mo_relaxed);
         // same as above
         cas_strong_explicit(&w, 1, 99, mo_seq_cst, mo_seq_cst);
         w.load(mo_relaxed).readsvalue(0); }
     }}}
   }

It works with barriers, which models the extra smp_mb__before_rmw():

   // no consistent executions (i.e. it works)
   int main() {
     atomic_int w = 0;
     atomic_int s = 1;
     {{{
       { w.store(1, mo_relaxed);
         atomic_thread_fence(mo_seq_cst);
         cas_strong_explicit(&s, 0, 99, mo_relaxed, mo_relaxed);
         s.load(mo_relaxed).readsvalue(1); }
     |||
       { s.store(0, mo_relaxed);
         atomic_thread_fence(mo_seq_cst);
         cas_strong_explicit(&w, 1, 99, mo_relaxed, mo_relaxed);
         w.load(mo_relaxed).readsvalue(0); }
     }}}
   }

I think I agree with you that _in practice_ it's going to work at the 
processor level; the pseudo-assembly would be

   r1 = LDAXR(can_sleep);
                                    r2 = LDAXR(wake_me);
                                    STR(can_sleep, false);
                                    STLXR(wake_me, false); // successful
   STR(wake_me, true);
   STLXR(can_sleep, true); // successful (?)
   if r1 == 0 { ... }
                                    if r2 != 0 { ... }

and I can't think of a way in which both store-exclusives (or xchg, or 
cmpxchg) would succeed.  So perhaps smp_mb__before_rmw() could indeed be 
a signal_fence().  But that's quite scary even for the standards of this 
discussion...

Paolo

next prev parent reply	other threads:[~2023-03-08 18:08 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 22:32 [PATCH v2 0/9] Fix missing memory barriers on ARM Paolo Bonzini
2023-03-06 22:32 ` [PATCH v2 1/9] qatomic: add smp_mb__before/after_rmw() Paolo Bonzini
2023-03-06 22:32 ` [PATCH v2 2/9] qemu-thread-posix: cleanup, fix, document QemuEvent Paolo Bonzini
2023-03-06 22:33 ` [PATCH v2 3/9] qemu-thread-win32: " Paolo Bonzini
2023-03-06 22:33 ` [PATCH v2 4/9] edu: add smp_mb__after_rmw() Paolo Bonzini
2023-03-06 22:33 ` [PATCH v2 5/9] aio-wait: switch to smp_mb__after_rmw() Paolo Bonzini
2023-03-07 10:50   ` David Hildenbrand
2023-03-06 22:33 ` [PATCH v2 6/9] qemu-coroutine-lock: add smp_mb__after_rmw() Paolo Bonzini
2023-03-06 22:33 ` [PATCH v2 7/9] physmem: add missing memory barrier Paolo Bonzini
2023-03-06 22:33 ` [PATCH v2 8/9] async: update documentation of the memory barriers Paolo Bonzini
2023-03-06 22:48   ` Stefan Hajnoczi
2023-03-06 23:39   ` Richard Henderson
2023-03-07 10:49     ` Paolo Bonzini
2023-03-07 15:54       ` Richard Henderson
2023-03-07 17:00         ` Paolo Bonzini
2023-03-07 17:26           ` Richard Henderson
2023-03-08 10:49             ` Paolo Bonzini
2023-03-08 16:47               ` Richard Henderson
2023-03-08 18:08                 ` Paolo Bonzini [this message]
2023-03-08 18:41                   ` Richard Henderson
2023-03-06 22:33 ` [PATCH v2 9/9] async: clarify usage of barriers in the polling case Paolo Bonzini
2023-03-06 22:49   ` Stefan Hajnoczi
2023-03-07 10:51   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b7af7150-af5d-de86-e237-7dafa326b862@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eesposit@redhat.com \
    --cc=gshan@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).