From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Paolo Bonzini <pbonzini@redhat.com>, Stefan Weil <sw@weilnetz.de>,
Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>,
Hailiang Zhang <zhanghailiang@xfusion.com>
Cc: "Phil Dennis-Jordan" <phil@philjordan.eu>,
qemu-devel@nongnu.org, devel@daynix.com,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Akihiko Odaki" <akihiko.odaki@daynix.com>
Subject: [PATCH v5 13/13] qemu-thread: Document QemuEvent memory ordering
Date: Thu, 29 May 2025 14:46:02 +0900 [thread overview]
Message-ID: <20250529-event-v5-13-53b285203794@daynix.com> (raw)
In-Reply-To: <20250529-event-v5-0-53b285203794@daynix.com>
Background
----------
I added a variant of QemuEvent for platforms without futex and
refactored qemu_event_set() to find it is difficult to show the
correctness of these changes.
Analyzing the code, I found there are three different levels of memory
ordering constraints:
1. The requirement of QemuEvent users.
2. The constraints that needs to be imposed on memory operations to
satisfy 1.
3. The actual constraints imposed on memory operations.
Distinguishing these levels make it easier to show the correctness of
the mentioned changes. Constraint 1 is not affected by the mentioned
changes, so we only need to focus on contraint 2 and 3. It also allows
sharing constraints 1 and 2 for the variants of QemuEvent
implementations.
Change description
------------------
The code already included some documentation of memory ordering as
inline comments. This change splits the documentation into the three
levels, and place them to appropriate places. Each of the documentations
of constraint 2 and 3 includes references to one of its higher level.
Constraint 1 is now documented in include/qemu/thread.h so that the
users of QemuEvent can refer to it without reading the implementation.
Constraint 2 follows the documentation of the state transitions in
util/event.c.
The inline comments describe constraint 3. The documentation of memory
operations already describe the memory ordering of them well, so they
usually contain only references to constraint 2, which memory operations
are intended to satisfy.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/qemu/thread.h | 19 ++++++++++++
util/event.c | 84 ++++++++++++++++++++++++++++-----------------------
2 files changed, 66 insertions(+), 37 deletions(-)
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index f0302ed01fdb..8fafff73de7d 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -19,6 +19,25 @@ typedef struct QemuThread QemuThread;
* https://learn.microsoft.com/en-us/windows/win32/sync/using-event-objects
*
* QemuEvent is more lightweight than QemuSemaphore when HAVE_FUTEX is defined.
+ *
+ * Memory ordering
+ * ---------------
+ *
+ * The documentation of Win32 manual-reset event objects do not specify
+ * memory ordering. Below describes the memory ordering QemuEvent establishes.
+ *
+ * Assume each of the following two orders is specified in a different thread:
+ * - X -> qemu_event_set()
+ * - qemu_event_reset() or qemu_event_wait() -> Y
+ *
+ * X -> Y will be ensured for the two threads if any of the following is
+ * satisfied:
+ * - qemu_event_set() happens before qemu_event_reset().
+ * - qemu_event_set() happens before qemu_event_wait().
+ * - qemu_event_wait() waits for qemu_event_set().
+ *
+ * Note that this is true even when the value is already set before
+ * qemu_event_set().
*/
typedef struct QemuEvent {
#ifndef HAVE_FUTEX
diff --git a/util/event.c b/util/event.c
index df6d60836041..d0dc6ebc00ff 100644
--- a/util/event.c
+++ b/util/event.c
@@ -5,24 +5,33 @@
/*
* Valid transitions:
- * - free->set, when setting the event
- * - busy->set, when setting the event
- * - set->free, when resetting the event
- * - free->busy, when waiting
+ * - FREE -> SET (qemu_event_set)
+ * - BUSY -> SET (qemu_event_set)
+ * - SET -> FREE (qemu_event_reset)
+ * - FREE -> BUSY (qemu_event_wait)
*
- * With futex, the waking and blocking operations follow busy->set and
- * free->busy, respectively.
+ * With futex, the waking and blocking operations follow
+ * BUSY -> SET and FREE -> BUSY, respectively.
*
- * Without futex, busy->set and free->busy never happen. Instead, the waking
- * operation follows free->set and the blocking operation will happen when
- * waiting if the event is not set.
+ * Without futex, BUSY -> SET and FREE -> BUSY never happen. Instead, the waking
+ * operation follows FREE -> SET and the blocking operation will happen in
+ * qemu_event_wait() if the event is not SET.
*
- * set->busy does not happen (it can be observed from the outside but
- * it really is set->free->busy).
+ * The following orders specified in a thread are preserved for any other thread
+ * accessing the event value:
+ * 1. qemu_event_set: X -> store SET
+ * 2. qemu_event_reset: store FREE -> X
+ * 3. qemu_event_wait: load SET -> X
+ * 4. qemu_event_set: store SET -> wake
*
- * busy->free provably cannot happen; to enforce it, the set->free transition
- * is done with an OR, which becomes a no-op if the event has concurrently
- * transitioned to free or busy.
+ * Different combinations of orders 1, 2, 3, and 4 establish the order visible
+ * to the users of QemuEvent in different situations:
+ * When qemu_event_set() happens before qemu_event_reset(): orders 1 and 2
+ * When qemu_event_set() happens before qemu_event_wait(): orders 1 and 3
+ * when qemu_event_wait() waits for qemu_event_set(): orders 1 and 3
+ *
+ * Order 4 ensures that qemu_event_set() wakes qemu_event_wait() if it is
+ * blocked.
*/
#define EV_SET 0
@@ -55,12 +64,23 @@ void qemu_event_set(QemuEvent *ev)
assert(ev->initialized);
#ifdef HAVE_FUTEX
+ /*
+ * Transitions:
+ * - FREE -> SET
+ * - BUSY -> SET
+ *
+ * Order 1. X -> store SET
+ */
if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
- /* There were waiters, wake them up. */
+ /* Order 4. store SET -> wake */
qemu_futex_wake_all(ev);
}
#else
pthread_mutex_lock(&ev->lock);
+ /*
+ * Transition FREE -> SET
+ * Order 1. X -> store SET
+ */
qatomic_store_release(&ev->value, EV_SET);
pthread_cond_broadcast(&ev->cond);
pthread_mutex_unlock(&ev->lock);
@@ -72,15 +92,14 @@ void qemu_event_reset(QemuEvent *ev)
assert(ev->initialized);
/*
- * If there was a concurrent reset (or even reset+wait),
- * do nothing. Otherwise change EV_SET->EV_FREE.
+ * Transition SET -> FREE
+ *
+ * Ensure that BUSY -> FREE never happens with an OR, which becomes a no-op
+ * if the event has concurrently transitioned to FREE or BUSY.
*/
qatomic_or(&ev->value, EV_FREE);
- /*
- * Order reset before checking the condition in the caller.
- * Pairs with the store-release in qemu_event_set().
- */
+ /* Order 2. store FREE -> X */
smp_mb__after_rmw();
}
@@ -90,28 +109,14 @@ void qemu_event_wait(QemuEvent *ev)
#ifdef HAVE_FUTEX
while (true) {
- /*
- * qemu_event_wait must synchronize with qemu_event_set even if it does
- * not go down the slow path, so this load-acquire is needed that
- * synchronizes with the store-release in qemu_event_set().
- */
+ /* Order 3. load SET -> X */
unsigned value = qatomic_load_acquire(&ev->value);
if (value == EV_SET) {
break;
}
if (value == EV_FREE) {
- /*
- * Leave the event reset and tell qemu_event_set that there are
- * waiters. No need to retry, because there cannot be a concurrent
- * busy->free transition. After the CAS, the event will be either
- * set or busy.
- *
- * This cmpxchg doesn't have particular ordering requirements if it
- * succeeds (moving the store earlier can only cause
- * qemu_event_set() to issue _more_ wakeups), the failing case needs
- * acquire semantics like the load above.
- */
+ /* Order 3. load SET -> X */
if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
break;
}
@@ -120,6 +125,11 @@ void qemu_event_wait(QemuEvent *ev)
qemu_futex_wait(ev, EV_BUSY);
}
#else
+ /*
+ * Order 3. load SET -> X
+ *
+ * qatomic_read() loads SET. ev->lock ensures the order.
+ */
pthread_mutex_lock(&ev->lock);
while (qatomic_read(&ev->value) != EV_SET) {
pthread_cond_wait(&ev->cond, &ev->lock);
--
2.49.0
next prev parent reply other threads:[~2025-05-29 5:48 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-29 5:45 [PATCH v5 00/13] Improve futex usage Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 01/13] futex: Check value after qemu_futex_wait() Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 02/13] futex: Support Windows Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 03/13] qemu-thread: Replace __linux__ with CONFIG_LINUX Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 04/13] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 05/13] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 06/13] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 07/13] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
2025-05-29 14:28 ` Peter Xu
2025-05-29 5:45 ` [PATCH v5 08/13] migration/colo: " Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 09/13] migration/postcopy: " Akihiko Odaki
2025-05-29 5:45 ` [PATCH v5 10/13] hw/display/apple-gfx: " Akihiko Odaki
2025-05-29 5:46 ` [PATCH v5 11/13] qemu-thread: Remove qatomic_read() in qemu_event_set() Akihiko Odaki
2025-05-29 5:46 ` [PATCH v5 12/13] qemu-thread: Document QemuEvent Akihiko Odaki
2025-05-29 5:46 ` Akihiko Odaki [this message]
2025-06-05 13:24 ` [PATCH v5 00/13] Improve futex usage Paolo Bonzini
2025-06-06 9:46 ` Akihiko Odaki
2025-06-06 20:38 ` Paolo Bonzini
2025-06-07 4:35 ` Akihiko Odaki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250529-event-v5-13-53b285203794@daynix.com \
--to=akihiko.odaki@daynix.com \
--cc=berrange@redhat.com \
--cc=devel@daynix.com \
--cc=farosas@suse.de \
--cc=marcandre.lureau@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=phil@philjordan.eu \
--cc=philmd@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=sw@weilnetz.de \
--cc=zhanghailiang@xfusion.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).