qemu-devel.nongnu.org archive mirror
* [PATCH v2 0/9] Improve futex usage
@ 2025-05-10  8:51 Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 1/9] futex: Check value after qemu_futex_wait() Akihiko Odaki
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

In a recent discussion, Phil Dennis-Jordan pointed out a quirk in
QemuEvent destruction caused by the futex-like abstraction, which
prevented the use of QemuEvent in new and existing code[1]. Thinking
about it further, I also found other problems and room for improvement
in the futex usage. Here is a stack of patches to resolve them.

Patch "futex: Check value after qemu_futex_wait()" ensures
qemu_futex_wait() is used in loops as suggested in the man page.

Patch "futex: Support Windows" implements futex functions for Windows.

Patch "qemu-thread: Avoid futex abstraction for non-Linux" and
"qemu-thread: Use futex for QemuEvent on Windows" enable destroying
QemuEvent immediately after qemu_event_wait().

Patch "qemu-thread: Use futex for QemuEvent on Windows" and
"qemu-thread: Use futex if available for QemuLockCnt" make the use of
futex functions added for Windows.

Patches "migration: Replace QemuSemaphore with QemuEvent",
"migration/colo: Replace QemuSemaphore with QemuEvent",
"migration/postcopy: Replace QemuSemaphore with QemuEvent", and
"hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent" replace
some QemuSemaphores with QemuEvents, which can utilize futex. Some of
them rely on the fact that a QemuEvent can be destroyed immediately
after qemu_event_wait().
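
For illustration, the pattern these conversions enable looks roughly
like the following sketch; start_worker() is a hypothetical placeholder
for whatever hands the event to another thread:

    QemuEvent done;

    qemu_event_init(&done, false);
    start_worker(&done);        /* hypothetical; the worker eventually
                                 * calls qemu_event_set(&done) once */
    qemu_event_wait(&done);
    qemu_event_destroy(&done);  /* safe right after the wait with this series */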

[1]: https://lore.kernel.org/r/CAAibmn3HZeDeK8FrYhHa1GGwc+N8rBuB2VvMRm7LCt0mUGmsYQ@mail.gmail.com

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
Changes in v2:
- Rebased.
- Added patch
  "hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent".
- Link to v1: https://lore.kernel.org/r/20241225-event-v1-0-a58c8d63eb70@daynix.com

---
Akihiko Odaki (9):
      futex: Check value after qemu_futex_wait()
      futex: Support Windows
      qemu-thread: Avoid futex abstraction for non-Linux
      qemu-thread: Use futex for QemuEvent on Windows
      qemu-thread: Use futex if available for QemuLockCnt
      migration: Replace QemuSemaphore with QemuEvent
      migration/colo: Replace QemuSemaphore with QemuEvent
      migration/postcopy: Replace QemuSemaphore with QemuEvent
      hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent

 meson.build                       |   2 +
 include/qemu/futex.h              |  43 ++++++++++-
 include/qemu/lockcnt.h            |   2 +-
 include/qemu/thread-posix.h       |   9 ---
 include/qemu/thread-win32.h       |   6 --
 include/qemu/thread.h             |  11 ++-
 migration/migration.h             |  16 ++---
 migration/colo.c                  |  20 +++---
 migration/migration.c             |  29 ++++----
 migration/postcopy-ram.c          |  10 +--
 migration/savevm.c                |   2 +-
 tests/unit/test-aio-multithread.c |   6 +-
 util/event.c                      | 144 +++++++++++++++++++++++++++++++++++++
 util/lockcnt.c                    |   9 +--
 util/qemu-thread-posix.c          | 148 --------------------------------------
 util/qemu-thread-win32.c          | 129 ---------------------------------
 hw/display/apple-gfx.m            |  10 +--
 util/meson.build                  |   3 +-
 18 files changed, 250 insertions(+), 349 deletions(-)
---
base-commit: 4b1f5b73e060971c434e70694d571adfee553027
change-id: 20241031-event-785a2f0dda4a

Best regards,
-- 
Akihiko Odaki <akihiko.odaki@daynix.com>




* [PATCH v2 1/9] futex: Check value after qemu_futex_wait()
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 2/9] futex: Support Windows Akihiko Odaki
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

futex(2) - Linux manual page
https://man7.org/linux/man-pages/man2/futex.2.html
> Note that a wake-up can also be caused by common futex usage patterns
> in unrelated code that happened to have previously used the futex
> word's memory location (e.g., typical futex-based implementations of
> Pthreads mutexes can cause this under some conditions).  Therefore,
> callers should always conservatively assume that a return value of 0
> can mean a spurious wake-up, and use the futex word's value (i.e.,
> the user-space synchronization scheme) to decide whether to continue
> to block or not.
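
In code, that recommendation boils down to re-checking the futex word
around every wait, roughly as follows (an illustrative sketch;
futex_word and expected are placeholders):

    /*
     * Keep waiting until the futex word actually changes; a plain return
     * from qemu_futex_wait() may be a spurious wake-up.
     */
    while (qatomic_read(&futex_word) == expected) {
        qemu_futex_wait(&futex_word, expected);
    }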

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/qemu/futex.h              |  9 +++++++++
 tests/unit/test-aio-multithread.c |  4 +++-
 util/qemu-thread-posix.c          | 28 ++++++++++++++++------------
 3 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index 91ae88966e12..f57774005330 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
     qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
 }
 
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions).  Therefore, callers should
+ * always conservatively assume that it is a spurious wake-up, and use the futex
+ * word's value (i.e., the user-space synchronization scheme) to decide whether
+ * to continue to block or not.
+ */
 static inline void qemu_futex_wait(void *f, unsigned val)
 {
     while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 08d4570ccb14..8c2e41545a29 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -305,7 +305,9 @@ static void mcs_mutex_lock(void)
     prev = qatomic_xchg(&mutex_head, id);
     if (prev != -1) {
         qatomic_set(&nodes[prev].next, id);
-        qemu_futex_wait(&nodes[id].locked, 1);
+        while (qatomic_read(&nodes[id].locked) == 1) {
+            qemu_futex_wait(&nodes[id].locked, 1);
+        }
     }
 }
 
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index b2e26e21205b..eade5311d175 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -428,17 +428,21 @@ void qemu_event_wait(QemuEvent *ev)
 
     assert(ev->initialized);
 
-    /*
-     * qemu_event_wait must synchronize with qemu_event_set even if it does
-     * not go down the slow path, so this load-acquire is needed that
-     * synchronizes with the first memory barrier in qemu_event_set().
-     *
-     * If we do go down the slow path, there is no requirement at all: we
-     * might miss a qemu_event_set() here but ultimately the memory barrier in
-     * qemu_futex_wait() will ensure the check is done correctly.
-     */
-    value = qatomic_load_acquire(&ev->value);
-    if (value != EV_SET) {
+    while (true) {
+        /*
+         * qemu_event_wait must synchronize with qemu_event_set even if it does
+         * not go down the slow path, so this load-acquire is needed that
+         * synchronizes with the first memory barrier in qemu_event_set().
+         *
+         * If we do go down the slow path, there is no requirement at all: we
+         * might miss a qemu_event_set() here but ultimately the memory barrier
+         * in qemu_futex_wait() will ensure the check is done correctly.
+         */
+        value = qatomic_load_acquire(&ev->value);
+        if (value == EV_SET) {
+            break;
+        }
+
         if (value == EV_FREE) {
             /*
              * Leave the event reset and tell qemu_event_set that there are
@@ -452,7 +456,7 @@ void qemu_event_wait(QemuEvent *ev)
              * like the load above.
              */
             if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
-                return;
+                break;
             }
         }
 

-- 
2.49.0




* [PATCH v2 2/9] futex: Support Windows
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 1/9] futex: Check value after qemu_futex_wait() Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

Windows has provided futex-like APIs since Windows 8 and Windows Server
2012.
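
For reference, a rough sketch of how these APIs are used; the names
below are illustrative and not taken from this patch:

    #include <windows.h>    /* WaitOnAddress()/WakeByAddress*(); link
                             * against the Synchronization library */

    static volatile LONG word;

    static void wait_for_change(LONG expected)
    {
        /* Re-check after every return; wake-ups can be spurious. */
        while (word == expected) {
            WaitOnAddress(&word, &expected, sizeof(word), INFINITE);
        }
    }

    static void publish(LONG value)
    {
        InterlockedExchange(&word, value);
        WakeByAddressAll((PVOID)&word); /* wake every waiter on &word */
    }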

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 meson.build                       |  2 ++
 include/qemu/futex.h              | 52 ++++++++++++++++++++++++++++++---------
 tests/unit/test-aio-multithread.c |  2 +-
 util/lockcnt.c                    |  2 +-
 util/qemu-thread-posix.c          |  4 +--
 util/meson.build                  |  2 +-
 6 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/meson.build b/meson.build
index 27f115015285..11d42e4a7d39 100644
--- a/meson.build
+++ b/meson.build
@@ -835,11 +835,13 @@ emulator_link_args = []
 midl = not_found
 widl = not_found
 pathcch = not_found
+synchronization = not_found
 host_dsosuf = '.so'
 if host_os == 'windows'
   midl = find_program('midl', required: false)
   widl = find_program('widl', required: false)
   pathcch = cc.find_library('pathcch')
+  synchronization = cc.find_library('Synchronization')
   socket = cc.find_library('ws2_32')
   winmm = cc.find_library('winmm')
 
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index f57774005330..c22d756021c9 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -1,5 +1,5 @@
 /*
- * Wrappers around Linux futex syscall
+ * Wrappers around Linux futex syscall and similar
  *
  * Copyright Red Hat, Inc. 2017
  *
@@ -11,28 +11,36 @@
  *
  */
 
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions).  Therefore, qemu_futex_wait()
+ * callers should always conservatively assume that it is a spurious wake-up,
+ * and use the futex word's value (i.e., the user-space synchronization scheme)
+ * to decide whether to continue to block or not.
+ */
+
 #ifndef QEMU_FUTEX_H
 #define QEMU_FUTEX_H
 
+#define HAVE_FUTEX
+
+#ifdef CONFIG_LINUX
 #include <sys/syscall.h>
 #include <linux/futex.h>
 
 #define qemu_futex(...)              syscall(__NR_futex, __VA_ARGS__)
 
-static inline void qemu_futex_wake(void *f, int n)
+static inline void qemu_futex_wake_all(void *f)
 {
-    qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+    qemu_futex(f, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
 }
 
-/*
- * Note that a wake-up can also be caused by common futex usage patterns in
- * unrelated code that happened to have previously used the futex word's
- * memory location (e.g., typical futex-based implementations of Pthreads
- * mutexes can cause this under some conditions).  Therefore, callers should
- * always conservatively assume that it is a spurious wake-up, and use the futex
- * word's value (i.e., the user-space synchronization scheme) to decide whether
- * to continue to block or not.
- */
+static inline void qemu_futex_wake_single(void *f)
+{
+    qemu_futex(f, FUTEX_WAKE, 1, NULL, NULL, 0);
+}
 static inline void qemu_futex_wait(void *f, unsigned val)
 {
     while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
@@ -46,5 +54,25 @@ static inline void qemu_futex_wait(void *f, unsigned val)
         }
     }
 }
+#elif defined(CONFIG_WIN32)
+#include <synchapi.h>
+
+static inline void qemu_futex_wake_all(void *f)
+{
+    WakeByAddressAll(f);
+}
+
+static inline void qemu_futex_wake_single(void *f)
+{
+    WakeByAddressSingle(f);
+}
+
+static inline void qemu_futex_wait(void *f, unsigned val)
+{
+    WaitOnAddress(f, &val, sizeof(val), INFINITE);
+}
+#else
+#undef HAVE_FUTEX
+#endif
 
 #endif /* QEMU_FUTEX_H */
diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 8c2e41545a29..0ead6bf34ad1 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -330,7 +330,7 @@ static void mcs_mutex_unlock(void)
     /* Wake up the next in line.  */
     next = qatomic_read(&nodes[id].next);
     nodes[next].locked = 0;
-    qemu_futex_wake(&nodes[next].locked, 1);
+    qemu_futex_wake_single(&nodes[next].locked);
 }
 
 static void test_multi_fair_mutex_entry(void *opaque)
diff --git a/util/lockcnt.c b/util/lockcnt.c
index d07c6cc5cee4..ca27d8e61a5c 100644
--- a/util/lockcnt.c
+++ b/util/lockcnt.c
@@ -106,7 +106,7 @@ static bool qemu_lockcnt_cmpxchg_or_wait(QemuLockCnt *lockcnt, int *val,
 static void lockcnt_wake(QemuLockCnt *lockcnt)
 {
     trace_lockcnt_futex_wake(lockcnt);
-    qemu_futex_wake(&lockcnt->count, 1);
+    qemu_futex_wake_single(&lockcnt->count);
 }
 
 void qemu_lockcnt_inc(QemuLockCnt *lockcnt)
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index eade5311d175..13459e44c768 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -345,7 +345,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
 
 /* Valid transitions:
  * - free->set, when setting the event
- * - busy->set, when setting the event, followed by qemu_futex_wake
+ * - busy->set, when setting the event, followed by qemu_futex_wake_all
  * - set->free, when resetting the event
  * - free->busy, when waiting
  *
@@ -400,7 +400,7 @@ void qemu_event_set(QemuEvent *ev)
         smp_mb__after_rmw();
         if (old == EV_BUSY) {
             /* There were waiters, wake them up.  */
-            qemu_futex_wake(ev, INT_MAX);
+            qemu_futex_wake_all(ev);
         }
     }
 }
diff --git a/util/meson.build b/util/meson.build
index e5cd327e2767..c756558d6763 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -27,7 +27,7 @@ else
   util_ss.add(files('event_notifier-win32.c'))
   util_ss.add(files('oslib-win32.c'))
   util_ss.add(files('qemu-thread-win32.c'))
-  util_ss.add(winmm, pathcch)
+  util_ss.add(winmm, pathcch, synchronization)
 endif
 util_ss.add(when: linux_io_uring, if_true: files('fdmon-io_uring.c'))
 if glib_has_gslice

-- 
2.49.0




* [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 1/9] futex: Check value after qemu_futex_wait() Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 2/9] futex: Support Windows Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10 15:02   ` Paolo Bonzini
  2025-05-10  8:51 ` [PATCH v2 4/9] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

qemu-thread used to abstract pthread primitives into a futex-like
interface for the QemuEvent implementation on POSIX systems other than
Linux. However, this abstraction has one key difference: unlike a futex,
pthread primitives require explicit destruction, and that destruction
must be ordered after the wait and wake operations.
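
Concretely, the hazard is roughly the following interleaving (a sketch,
not an exact trace):

    waiter                              setter
                                        qemu_event_set()
                                          ev->value = EV_SET
    qemu_event_wait()
      sees EV_SET, returns
    qemu_event_destroy()
      pthread_mutex_destroy(&ev->lock)
      pthread_cond_destroy(&ev->cond)
                                          qemu_futex_wake()
                                            pthread_mutex_lock(&ev->lock)
                                            /* lock already destroyed */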

Destruction would be easier if a wait operation ensured that the
corresponding wake operation has finished, as a POSIX semaphore does,
but that requires protecting the state accesses in qemu_event_set() and
qemu_event_wait() with a mutex. A real futex, on the other hand, does
not need such protection but instead needs complex barrier and atomic
operations around the wait and wake operations.

Add special implementations of qemu_event_set() and qemu_event_wait()
using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
finishes, and these functions will avoid complex barrier and atomic
operations.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
 util/qemu-thread-posix.c | 45 ++++++++++++++++++---------------------------
 1 file changed, 18 insertions(+), 27 deletions(-)

diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 13459e44c768..805cac444f15 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -319,28 +319,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
 
 #ifdef __linux__
 #include "qemu/futex.h"
-#else
-static inline void qemu_futex_wake(QemuEvent *ev, int n)
-{
-    assert(ev->initialized);
-    pthread_mutex_lock(&ev->lock);
-    if (n == 1) {
-        pthread_cond_signal(&ev->cond);
-    } else {
-        pthread_cond_broadcast(&ev->cond);
-    }
-    pthread_mutex_unlock(&ev->lock);
-}
-
-static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
-{
-    assert(ev->initialized);
-    pthread_mutex_lock(&ev->lock);
-    if (ev->value == val) {
-        pthread_cond_wait(&ev->cond, &ev->lock);
-    }
-    pthread_mutex_unlock(&ev->lock);
-}
 #endif
 
 /* Valid transitions:
@@ -363,7 +341,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
 
 void qemu_event_init(QemuEvent *ev, bool init)
 {
-#ifndef __linux__
+#ifndef CONFIG_LINUX
     pthread_mutex_init(&ev->lock, NULL);
     pthread_cond_init(&ev->cond, NULL);
 #endif
@@ -376,7 +354,7 @@ void qemu_event_destroy(QemuEvent *ev)
 {
     assert(ev->initialized);
     ev->initialized = false;
-#ifndef __linux__
+#ifndef CONFIG_LINUX
     pthread_mutex_destroy(&ev->lock);
     pthread_cond_destroy(&ev->cond);
 #endif
@@ -386,6 +364,7 @@ void qemu_event_set(QemuEvent *ev)
 {
     assert(ev->initialized);
 
+#ifdef CONFIG_LINUX
     /*
      * Pairs with both qemu_event_reset() and qemu_event_wait().
      *
@@ -403,6 +382,12 @@ void qemu_event_set(QemuEvent *ev)
             qemu_futex_wake_all(ev);
         }
     }
+#else
+    pthread_mutex_lock(&ev->lock);
+    qatomic_set(&ev->value, EV_SET);
+    pthread_cond_broadcast(&ev->cond);
+    pthread_mutex_unlock(&ev->lock);
+#endif
 }
 
 void qemu_event_reset(QemuEvent *ev)
@@ -424,10 +409,9 @@ void qemu_event_reset(QemuEvent *ev)
 
 void qemu_event_wait(QemuEvent *ev)
 {
-    unsigned value;
-
     assert(ev->initialized);
 
+#ifdef CONFIG_LINUX
     while (true) {
         /*
          * qemu_event_wait must synchronize with qemu_event_set even if it does
@@ -438,7 +422,7 @@ void qemu_event_wait(QemuEvent *ev)
          * might miss a qemu_event_set() here but ultimately the memory barrier
          * in qemu_futex_wait() will ensure the check is done correctly.
          */
-        value = qatomic_load_acquire(&ev->value);
+        unsigned value = qatomic_load_acquire(&ev->value);
         if (value == EV_SET) {
             break;
         }
@@ -467,6 +451,13 @@ void qemu_event_wait(QemuEvent *ev)
          */
         qemu_futex_wait(ev, EV_BUSY);
     }
+#else
+    pthread_mutex_lock(&ev->lock);
+    if (qatomic_read(&ev->value) != EV_SET) {
+        pthread_cond_wait(&ev->cond, &ev->lock);
+    }
+    pthread_mutex_unlock(&ev->lock);
+#endif
 }
 
 static __thread NotifierList thread_exit;

-- 
2.49.0




* [PATCH v2 4/9] qemu-thread: Use futex for QemuEvent on Windows
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (2 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 5/9] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

Use the futex-based implementation of QemuEvent on Windows to remove
code duplication and to avoid the overhead of constructing and
destroying the event object.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/qemu/thread-posix.h |   9 ---
 include/qemu/thread-win32.h |   6 --
 include/qemu/thread.h       |  11 +++-
 util/event.c                | 144 ++++++++++++++++++++++++++++++++++++++++++++
 util/qemu-thread-posix.c    | 143 -------------------------------------------
 util/qemu-thread-win32.c    | 129 ---------------------------------------
 util/meson.build            |   1 +
 7 files changed, 155 insertions(+), 288 deletions(-)

diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index 5f2f3d1386bc..758808b705e4 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -32,15 +32,6 @@ struct QemuSemaphore {
     unsigned int count;
 };
 
-struct QemuEvent {
-#ifndef __linux__
-    pthread_mutex_t lock;
-    pthread_cond_t cond;
-#endif
-    unsigned value;
-    bool initialized;
-};
-
 struct QemuThread {
     pthread_t thread;
 };
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index d95af4498fc9..da9e7322990c 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -28,12 +28,6 @@ struct QemuSemaphore {
     bool initialized;
 };
 
-struct QemuEvent {
-    int value;
-    HANDLE event;
-    bool initialized;
-};
-
 typedef struct QemuThreadData QemuThreadData;
 struct QemuThread {
     QemuThreadData *data;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 6f800aad31a9..573f8c9ede20 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -3,13 +3,22 @@
 
 #include "qemu/processor.h"
 #include "qemu/atomic.h"
+#include "qemu/futex.h"
 
 typedef struct QemuCond QemuCond;
 typedef struct QemuSemaphore QemuSemaphore;
-typedef struct QemuEvent QemuEvent;
 typedef struct QemuLockCnt QemuLockCnt;
 typedef struct QemuThread QemuThread;
 
+typedef struct QemuEvent {
+#ifndef HAVE_FUTEX
+    pthread_mutex_t lock;
+    pthread_cond_t cond;
+#endif
+    unsigned value;
+    bool initialized;
+} QemuEvent;
+
 #ifdef _WIN32
 #include "qemu/thread-win32.h"
 #else
diff --git a/util/event.c b/util/event.c
new file mode 100644
index 000000000000..34ac7a08d0bf
--- /dev/null
+++ b/util/event.c
@@ -0,0 +1,144 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+
+/*
+ * Valid transitions:
+ * - free->set, when setting the event
+ * - busy->set, when setting the event, followed by qemu_futex_wake_all
+ * - set->free, when resetting the event
+ * - free->busy, when waiting
+ *
+ * set->busy does not happen (it can be observed from the outside but
+ * it really is set->free->busy).
+ *
+ * busy->free provably cannot happen; to enforce it, the set->free transition
+ * is done with an OR, which becomes a no-op if the event has concurrently
+ * transitioned to free or busy.
+ */
+
+#define EV_SET         0
+#define EV_FREE        1
+#define EV_BUSY       -1
+
+void qemu_event_init(QemuEvent *ev, bool init)
+{
+#ifndef HAVE_FUTEX
+    pthread_mutex_init(&ev->lock, NULL);
+    pthread_cond_init(&ev->cond, NULL);
+#endif
+
+    ev->value = (init ? EV_SET : EV_FREE);
+    ev->initialized = true;
+}
+
+void qemu_event_destroy(QemuEvent *ev)
+{
+    assert(ev->initialized);
+    ev->initialized = false;
+#ifndef HAVE_FUTEX
+    pthread_mutex_destroy(&ev->lock);
+    pthread_cond_destroy(&ev->cond);
+#endif
+}
+
+void qemu_event_set(QemuEvent *ev)
+{
+    assert(ev->initialized);
+
+#ifdef HAVE_FUTEX
+    /*
+     * Pairs with both qemu_event_reset() and qemu_event_wait().
+     *
+     * qemu_event_set has release semantics, but because it *loads*
+     * ev->value we need a full memory barrier here.
+     */
+    smp_mb();
+    if (qatomic_read(&ev->value) != EV_SET) {
+        int old = qatomic_xchg(&ev->value, EV_SET);
+
+        /* Pairs with memory barrier in kernel futex_wait system call.  */
+        smp_mb__after_rmw();
+        if (old == EV_BUSY) {
+            /* There were waiters, wake them up.  */
+            qemu_futex_wake_all(ev);
+        }
+    }
+#else
+    pthread_mutex_lock(&ev->lock);
+    qatomic_set(&ev->value, EV_SET);
+    pthread_cond_broadcast(&ev->cond);
+    pthread_mutex_unlock(&ev->lock);
+#endif
+}
+
+void qemu_event_reset(QemuEvent *ev)
+{
+    assert(ev->initialized);
+
+    /*
+     * If there was a concurrent reset (or even reset+wait),
+     * do nothing.  Otherwise change EV_SET->EV_FREE.
+     */
+    qatomic_or(&ev->value, EV_FREE);
+
+    /*
+     * Order reset before checking the condition in the caller.
+     * Pairs with the first memory barrier in qemu_event_set().
+     */
+    smp_mb__after_rmw();
+}
+
+void qemu_event_wait(QemuEvent *ev)
+{
+    assert(ev->initialized);
+
+#ifdef HAVE_FUTEX
+    while (true) {
+        /*
+         * qemu_event_wait must synchronize with qemu_event_set even if it does
+         * not go down the slow path, so this load-acquire is needed that
+         * synchronizes with the first memory barrier in qemu_event_set().
+         *
+         * If we do go down the slow path, there is no requirement at all: we
+         * might miss a qemu_event_set() here but ultimately the memory barrier
+         * in qemu_futex_wait() will ensure the check is done correctly.
+         */
+        unsigned value = qatomic_load_acquire(&ev->value);
+        if (value == EV_SET) {
+            break;
+        }
+
+        if (value == EV_FREE) {
+            /*
+             * Leave the event reset and tell qemu_event_set that there are
+             * waiters.  No need to retry, because there cannot be a concurrent
+             * busy->free transition.  After the CAS, the event will be either
+             * set or busy.
+             *
+             * This cmpxchg doesn't have particular ordering requirements if it
+             * succeeds (moving the store earlier can only cause
+             * qemu_event_set() to issue _more_ wakeups), the failing case needs
+             * acquire semantics like the load above.
+             */
+            if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
+                break;
+            }
+        }
+
+        /*
+         * This is the final check for a concurrent set, so it does need
+         * a smp_mb() pairing with the second barrier of qemu_event_set().
+         * The barrier is inside the FUTEX_WAIT system call.
+         */
+        qemu_futex_wait(ev, EV_BUSY);
+    }
+#else
+    pthread_mutex_lock(&ev->lock);
+    if (qatomic_read(&ev->value) != EV_SET) {
+        pthread_cond_wait(&ev->cond, &ev->lock);
+    }
+    pthread_mutex_unlock(&ev->lock);
+#endif
+}
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 805cac444f15..ba725444ba63 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -317,149 +317,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
     qemu_mutex_unlock(&sem->mutex);
 }
 
-#ifdef __linux__
-#include "qemu/futex.h"
-#endif
-
-/* Valid transitions:
- * - free->set, when setting the event
- * - busy->set, when setting the event, followed by qemu_futex_wake_all
- * - set->free, when resetting the event
- * - free->busy, when waiting
- *
- * set->busy does not happen (it can be observed from the outside but
- * it really is set->free->busy).
- *
- * busy->free provably cannot happen; to enforce it, the set->free transition
- * is done with an OR, which becomes a no-op if the event has concurrently
- * transitioned to free or busy.
- */
-
-#define EV_SET         0
-#define EV_FREE        1
-#define EV_BUSY       -1
-
-void qemu_event_init(QemuEvent *ev, bool init)
-{
-#ifndef CONFIG_LINUX
-    pthread_mutex_init(&ev->lock, NULL);
-    pthread_cond_init(&ev->cond, NULL);
-#endif
-
-    ev->value = (init ? EV_SET : EV_FREE);
-    ev->initialized = true;
-}
-
-void qemu_event_destroy(QemuEvent *ev)
-{
-    assert(ev->initialized);
-    ev->initialized = false;
-#ifndef CONFIG_LINUX
-    pthread_mutex_destroy(&ev->lock);
-    pthread_cond_destroy(&ev->cond);
-#endif
-}
-
-void qemu_event_set(QemuEvent *ev)
-{
-    assert(ev->initialized);
-
-#ifdef CONFIG_LINUX
-    /*
-     * Pairs with both qemu_event_reset() and qemu_event_wait().
-     *
-     * qemu_event_set has release semantics, but because it *loads*
-     * ev->value we need a full memory barrier here.
-     */
-    smp_mb();
-    if (qatomic_read(&ev->value) != EV_SET) {
-        int old = qatomic_xchg(&ev->value, EV_SET);
-
-        /* Pairs with memory barrier in kernel futex_wait system call.  */
-        smp_mb__after_rmw();
-        if (old == EV_BUSY) {
-            /* There were waiters, wake them up.  */
-            qemu_futex_wake_all(ev);
-        }
-    }
-#else
-    pthread_mutex_lock(&ev->lock);
-    qatomic_set(&ev->value, EV_SET);
-    pthread_cond_broadcast(&ev->cond);
-    pthread_mutex_unlock(&ev->lock);
-#endif
-}
-
-void qemu_event_reset(QemuEvent *ev)
-{
-    assert(ev->initialized);
-
-    /*
-     * If there was a concurrent reset (or even reset+wait),
-     * do nothing.  Otherwise change EV_SET->EV_FREE.
-     */
-    qatomic_or(&ev->value, EV_FREE);
-
-    /*
-     * Order reset before checking the condition in the caller.
-     * Pairs with the first memory barrier in qemu_event_set().
-     */
-    smp_mb__after_rmw();
-}
-
-void qemu_event_wait(QemuEvent *ev)
-{
-    assert(ev->initialized);
-
-#ifdef CONFIG_LINUX
-    while (true) {
-        /*
-         * qemu_event_wait must synchronize with qemu_event_set even if it does
-         * not go down the slow path, so this load-acquire is needed that
-         * synchronizes with the first memory barrier in qemu_event_set().
-         *
-         * If we do go down the slow path, there is no requirement at all: we
-         * might miss a qemu_event_set() here but ultimately the memory barrier
-         * in qemu_futex_wait() will ensure the check is done correctly.
-         */
-        unsigned value = qatomic_load_acquire(&ev->value);
-        if (value == EV_SET) {
-            break;
-        }
-
-        if (value == EV_FREE) {
-            /*
-             * Leave the event reset and tell qemu_event_set that there are
-             * waiters.  No need to retry, because there cannot be a concurrent
-             * busy->free transition.  After the CAS, the event will be either
-             * set or busy.
-             *
-             * This cmpxchg doesn't have particular ordering requirements if it
-             * succeeds (moving the store earlier can only cause qemu_event_set()
-             * to issue _more_ wakeups), the failing case needs acquire semantics
-             * like the load above.
-             */
-            if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
-                break;
-            }
-        }
-
-        /*
-         * This is the final check for a concurrent set, so it does need
-         * a smp_mb() pairing with the second barrier of qemu_event_set().
-         * The barrier is inside the FUTEX_WAIT system call.
-         */
-        qemu_futex_wait(ev, EV_BUSY);
-    }
-#else
-    pthread_mutex_lock(&ev->lock);
-    if (qatomic_read(&ev->value) != EV_SET) {
-        pthread_cond_wait(&ev->cond, &ev->lock);
-    }
-    pthread_mutex_unlock(&ev->lock);
-#endif
-}
-
 static __thread NotifierList thread_exit;
 
 /*
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index a7fe3cc345f0..ca2e0b512e26 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -231,135 +231,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
     }
 }
 
-/* Wrap a Win32 manual-reset event with a fast userspace path.  The idea
- * is to reset the Win32 event lazily, as part of a test-reset-test-wait
- * sequence.  Such a sequence is, indeed, how QemuEvents are used by
- * RCU and other subsystems!
- *
- * Valid transitions:
- * - free->set, when setting the event
- * - busy->set, when setting the event, followed by SetEvent
- * - set->free, when resetting the event
- * - free->busy, when waiting
- *
- * set->busy does not happen (it can be observed from the outside but
- * it really is set->free->busy).
- *
- * busy->free provably cannot happen; to enforce it, the set->free transition
- * is done with an OR, which becomes a no-op if the event has concurrently
- * transitioned to free or busy (and is faster than cmpxchg).
- */
-
-#define EV_SET         0
-#define EV_FREE        1
-#define EV_BUSY       -1
-
-void qemu_event_init(QemuEvent *ev, bool init)
-{
-    /* Manual reset.  */
-    ev->event = CreateEvent(NULL, TRUE, TRUE, NULL);
-    ev->value = (init ? EV_SET : EV_FREE);
-    ev->initialized = true;
-}
-
-void qemu_event_destroy(QemuEvent *ev)
-{
-    assert(ev->initialized);
-    ev->initialized = false;
-    CloseHandle(ev->event);
-}
-
-void qemu_event_set(QemuEvent *ev)
-{
-    assert(ev->initialized);
-
-    /*
-     * Pairs with both qemu_event_reset() and qemu_event_wait().
-     *
-     * qemu_event_set has release semantics, but because it *loads*
-     * ev->value we need a full memory barrier here.
-     */
-    smp_mb();
-    if (qatomic_read(&ev->value) != EV_SET) {
-        int old = qatomic_xchg(&ev->value, EV_SET);
-
-        /* Pairs with memory barrier after ResetEvent.  */
-        smp_mb__after_rmw();
-        if (old == EV_BUSY) {
-            /* There were waiters, wake them up.  */
-            SetEvent(ev->event);
-        }
-    }
-}
-
-void qemu_event_reset(QemuEvent *ev)
-{
-    assert(ev->initialized);
-
-    /*
-     * If there was a concurrent reset (or even reset+wait),
-     * do nothing.  Otherwise change EV_SET->EV_FREE.
-     */
-    qatomic_or(&ev->value, EV_FREE);
-
-    /*
-     * Order reset before checking the condition in the caller.
-     * Pairs with the first memory barrier in qemu_event_set().
-     */
-    smp_mb__after_rmw();
-}
-
-void qemu_event_wait(QemuEvent *ev)
-{
-    unsigned value;
-
-    assert(ev->initialized);
-
-    /*
-     * qemu_event_wait must synchronize with qemu_event_set even if it does
-     * not go down the slow path, so this load-acquire is needed that
-     * synchronizes with the first memory barrier in qemu_event_set().
-     *
-     * If we do go down the slow path, there is no requirement at all: we
-     * might miss a qemu_event_set() here but ultimately the memory barrier in
-     * qemu_futex_wait() will ensure the check is done correctly.
-     */
-    value = qatomic_load_acquire(&ev->value);
-    if (value != EV_SET) {
-        if (value == EV_FREE) {
-            /*
-             * Here the underlying kernel event is reset, but qemu_event_set is
-             * not yet going to call SetEvent.  However, there will be another
-             * check for EV_SET below when setting EV_BUSY.  At that point it
-             * is safe to call WaitForSingleObject.
-             */
-            ResetEvent(ev->event);
-
-            /*
-             * It is not clear whether ResetEvent provides this barrier; kernel
-             * APIs (KeResetEvent/KeClearEvent) do not.  Better safe than sorry!
-             */
-            smp_mb();
-
-            /*
-             * Leave the event reset and tell qemu_event_set that there are
-             * waiters.  No need to retry, because there cannot be a concurrent
-             * busy->free transition.  After the CAS, the event will be either
-             * set or busy.
-             */
-            if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
-                return;
-            }
-        }
-
-        /*
-         * ev->value is now EV_BUSY.  Since we didn't observe EV_SET,
-         * qemu_event_set() must observe EV_BUSY and call SetEvent().
-         */
-        WaitForSingleObject(ev->event, INFINITE);
-    }
-}
-
 struct QemuThreadData {
     /* Passed to win32_start_routine.  */
     void             *(*start_routine)(void *);
diff --git a/util/meson.build b/util/meson.build
index c756558d6763..58033131328a 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -35,6 +35,7 @@ if glib_has_gslice
 endif
 util_ss.add(files('defer-call.c'))
 util_ss.add(files('envlist.c', 'path.c', 'module.c'))
+util_ss.add(files('event.c'))
 util_ss.add(files('host-utils.c'))
 util_ss.add(files('bitmap.c', 'bitops.c'))
 util_ss.add(files('fifo8.c'))

-- 
2.49.0




* [PATCH v2 5/9] qemu-thread: Use futex if available for QemuLockCnt
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (3 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 4/9] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 6/9] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

This makes the futex-based implementation of QemuLockCnt available on
Windows.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 include/qemu/lockcnt.h | 2 +-
 util/lockcnt.c         | 7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/qemu/lockcnt.h b/include/qemu/lockcnt.h
index f4b62a3f7011..5a2800e3f182 100644
--- a/include/qemu/lockcnt.h
+++ b/include/qemu/lockcnt.h
@@ -17,7 +17,7 @@
 typedef struct QemuLockCnt QemuLockCnt;
 
 struct QemuLockCnt {
-#ifndef CONFIG_LINUX
+#ifndef HAVE_FUTEX
     QemuMutex mutex;
 #endif
     unsigned count;
diff --git a/util/lockcnt.c b/util/lockcnt.c
index ca27d8e61a5c..92c9f8ceca88 100644
--- a/util/lockcnt.c
+++ b/util/lockcnt.c
@@ -12,10 +12,11 @@
 #include "qemu/atomic.h"
 #include "trace.h"
 
-#ifdef CONFIG_LINUX
-#include "qemu/futex.h"
+#ifdef HAVE_FUTEX
 
-/* On Linux, bits 0-1 are a futex-based lock, bits 2-31 are the counter.
+/*
+ * When futex is available, bits 0-1 are a futex-based lock, bits 2-31 are the
+ * counter.
  * For the mutex algorithm see Ulrich Drepper's "Futexes Are Tricky" (ok,
  * this is not the most relaxing citation I could make...).  It is similar
  * to mutex2 in the paper.

-- 
2.49.0




* [PATCH v2 6/9] migration: Replace QemuSemaphore with QemuEvent
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (4 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 5/9] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 7/9] migration/colo: " Akihiko Odaki
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

rp_pong_acks tells whether at least one PONG has ever been received.
QemuEvent is better suited for this usage because it represents a
boolean rather than an integer and is not decremented by the wait
operation.

pause_event can utilize qemu_event_reset() to discard events.
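
For illustration, the behavioral difference looks like this (a sketch
using the existing qemu_sem_*/qemu_event_* APIs, not code from this
patch):

    /* Semaphore: every wait consumes one post. */
    qemu_sem_post(&sem);
    qemu_sem_wait(&sem);    /* count back to 0; another wait would block */

    /* Event: once set, it stays set until explicitly reset. */
    qemu_event_set(&ev);
    qemu_event_wait(&ev);   /* returns immediately */
    qemu_event_wait(&ev);   /* still returns immediately */
    qemu_event_reset(&ev);  /* discard the set state, as pause_event does */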

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 migration/migration.h |  6 +++---
 migration/migration.c | 29 +++++++++++++----------------
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index d53f7cad84d8..11dba5f4da77 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -342,11 +342,11 @@ struct MigrationState {
          */
         QemuSemaphore rp_sem;
         /*
-         * We post to this when we got one PONG from dest. So far it's an
+         * We set this when we got one PONG from dest. So far it's an
          * easy way to know the main channel has successfully established
          * on dest QEMU.
          */
-        QemuSemaphore rp_pong_acks;
+        QemuEvent rp_pong_acks;
     } rp_state;
 
     double mbps;
@@ -379,7 +379,7 @@ struct MigrationState {
     QemuSemaphore wait_unplug_sem;
 
     /* Migration is paused due to pause-before-switchover */
-    QemuSemaphore pause_sem;
+    QemuEvent pause_event;
 
     /* The semaphore is used to notify COLO thread that failover is finished */
     QemuSemaphore colo_exit_sem;
diff --git a/migration/migration.c b/migration/migration.c
index 4697732bef91..053c23c58d82 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1630,7 +1630,7 @@ void migration_cancel(void)
         }
         /* If the migration is paused, kick it out of the pause */
         if (old_state == MIGRATION_STATUS_PRE_SWITCHOVER) {
-            qemu_sem_post(&s->pause_sem);
+            qemu_event_set(&s->pause_event);
         }
         migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
     } while (s->state != MIGRATION_STATUS_CANCELLING);
@@ -2342,7 +2342,7 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
                    MigrationStatus_str(s->state));
         return;
     }
-    qemu_sem_post(&s->pause_sem);
+    qemu_event_set(&s->pause_event);
 }
 
 int migration_rp_wait(MigrationState *s)
@@ -2550,7 +2550,7 @@ static void *source_return_path_thread(void *opaque)
         case MIG_RP_MSG_PONG:
             tmp32 = ldl_be_p(buf);
             trace_source_return_path_thread_pong(tmp32);
-            qemu_sem_post(&ms->rp_state.rp_pong_acks);
+            qemu_event_set(&ms->rp_state.rp_pong_acks);
             break;
 
         case MIG_RP_MSG_REQ_PAGES:
@@ -2693,7 +2693,7 @@ static inline void
 migration_wait_main_channel(MigrationState *ms)
 {
     /* Wait until one PONG message received */
-    qemu_sem_wait(&ms->rp_state.rp_pong_acks);
+    qemu_event_wait(&ms->rp_state.rp_pong_acks);
 }
 
 /*
@@ -2911,21 +2911,18 @@ static bool migration_switchover_prepare(MigrationState *s)
         return true;
     }
 
-    /* Since leaving this state is not atomic with posting the semaphore
+    /*
+     * Since leaving this state is not atomic with setting the event
      * it's possible that someone could have issued multiple migrate_continue
-     * and the semaphore is incorrectly positive at this point;
-     * the docs say it's undefined to reinit a semaphore that's already
-     * init'd, so use timedwait to eat up any existing posts.
+     * and the event is incorrectly set at this point so reset it.
      */
-    while (qemu_sem_timedwait(&s->pause_sem, 1) == 0) {
-        /* This block intentionally left blank */
-    }
+    qemu_event_reset(&s->pause_event);
 
     /* Update [POSTCOPY_]ACTIVE to PRE_SWITCHOVER */
     migrate_set_state(&s->state, s->state, MIGRATION_STATUS_PRE_SWITCHOVER);
     bql_unlock();
 
-    qemu_sem_wait(&s->pause_sem);
+    qemu_event_wait(&s->pause_event);
 
     bql_lock();
     /*
@@ -4057,10 +4054,10 @@ static void migration_instance_finalize(Object *obj)
     qemu_mutex_destroy(&ms->qemu_file_lock);
     qemu_sem_destroy(&ms->wait_unplug_sem);
     qemu_sem_destroy(&ms->rate_limit_sem);
-    qemu_sem_destroy(&ms->pause_sem);
+    qemu_event_destroy(&ms->pause_event);
     qemu_sem_destroy(&ms->postcopy_pause_sem);
     qemu_sem_destroy(&ms->rp_state.rp_sem);
-    qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
+    qemu_event_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
 }
@@ -4072,14 +4069,14 @@ static void migration_instance_init(Object *obj)
     ms->state = MIGRATION_STATUS_NONE;
     ms->mbps = -1;
     ms->pages_per_second = -1;
-    qemu_sem_init(&ms->pause_sem, 0);
+    qemu_event_init(&ms->pause_event, false);
     qemu_mutex_init(&ms->error_mutex);
 
     migrate_params_init(&ms->parameters);
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->rp_state.rp_sem, 0);
-    qemu_sem_init(&ms->rp_state.rp_pong_acks, 0);
+    qemu_event_init(&ms->rp_state.rp_pong_acks, false);
     qemu_sem_init(&ms->rate_limit_sem, 0);
     qemu_sem_init(&ms->wait_unplug_sem, 0);
     qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);

-- 
2.49.0




* [PATCH v2 7/9] migration/colo: Replace QemuSemaphore with QemuEvent
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (5 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 6/9] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 8/9] migration/postcopy: " Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 9/9] hw/display/apple-gfx: " Akihiko Odaki
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

colo_exit_sem and colo_incoming_sem represent one-shot events, so they
can be converted into QemuEvent, which is more lightweight.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 migration/migration.h |  6 +++---
 migration/colo.c      | 20 ++++++++++----------
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 11dba5f4da77..eec49bf3f893 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -186,7 +186,7 @@ struct MigrationIncomingState {
 
     /* The coroutine we should enter (back) after failover */
     Coroutine *colo_incoming_co;
-    QemuSemaphore colo_incoming_sem;
+    QemuEvent colo_incoming_event;
 
     /* Optional load threads pool and its thread exit request flag */
     ThreadPool *load_threads;
@@ -381,8 +381,8 @@ struct MigrationState {
     /* Migration is paused due to pause-before-switchover */
     QemuEvent pause_event;
 
-    /* The semaphore is used to notify COLO thread that failover is finished */
-    QemuSemaphore colo_exit_sem;
+    /* The event is used to notify COLO thread that failover is finished */
+    QemuEvent colo_exit_event;
 
     /* The event is used to notify COLO thread to do checkpoint */
     QemuEvent colo_checkpoint_event;
diff --git a/migration/colo.c b/migration/colo.c
index c976b3ff344d..e0f713c837f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -146,7 +146,7 @@ static void secondary_vm_do_failover(void)
         return;
     }
     /* Notify COLO incoming thread that failover work is finished */
-    qemu_sem_post(&mis->colo_incoming_sem);
+    qemu_event_set(&mis->colo_incoming_event);
 
     /* For Secondary VM, jump to incoming co */
     if (mis->colo_incoming_co) {
@@ -195,7 +195,7 @@ static void primary_vm_do_failover(void)
     }
 
     /* Notify COLO thread that failover work is finished */
-    qemu_sem_post(&s->colo_exit_sem);
+    qemu_event_set(&s->colo_exit_event);
 }
 
 COLOMode get_colo_mode(void)
@@ -620,8 +620,8 @@ out:
     }
 
     /* Hope this not to be too long to wait here */
-    qemu_sem_wait(&s->colo_exit_sem);
-    qemu_sem_destroy(&s->colo_exit_sem);
+    qemu_event_wait(&s->colo_exit_event);
+    qemu_event_destroy(&s->colo_exit_event);
 
     /*
      * It is safe to unregister notifier after failover finished.
@@ -651,7 +651,7 @@ void migrate_start_colo_process(MigrationState *s)
     s->colo_delay_timer =  timer_new_ms(QEMU_CLOCK_HOST,
                                 colo_checkpoint_notify_timer, NULL);
 
-    qemu_sem_init(&s->colo_exit_sem, 0);
+    qemu_event_init(&s->colo_exit_event, false);
     colo_process_checkpoint(s);
     bql_lock();
 }
@@ -808,11 +808,11 @@ void colo_shutdown(void)
     case COLO_MODE_PRIMARY:
         s = migrate_get_current();
         qemu_event_set(&s->colo_checkpoint_event);
-        qemu_sem_post(&s->colo_exit_sem);
+        qemu_event_set(&s->colo_exit_event);
         break;
     case COLO_MODE_SECONDARY:
         mis = migration_incoming_get_current();
-        qemu_sem_post(&mis->colo_incoming_sem);
+        qemu_event_set(&mis->colo_incoming_event);
         break;
     default:
         break;
@@ -827,7 +827,7 @@ static void *colo_process_incoming_thread(void *opaque)
     Error *local_err = NULL;
 
     rcu_register_thread();
-    qemu_sem_init(&mis->colo_incoming_sem, 0);
+    qemu_event_init(&mis->colo_incoming_event, false);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COLO);
@@ -923,8 +923,8 @@ out:
     }
 
     /* Hope this not to be too long to loop here */
-    qemu_sem_wait(&mis->colo_incoming_sem);
-    qemu_sem_destroy(&mis->colo_incoming_sem);
+    qemu_event_wait(&mis->colo_incoming_event);
+    qemu_event_destroy(&mis->colo_incoming_event);
 
     rcu_unregister_thread();
     return NULL;

-- 
2.49.0




* [PATCH v2 8/9] migration/postcopy: Replace QemuSemaphore with QemuEvent
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (6 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 7/9] migration/colo: " Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  2025-05-10  8:51 ` [PATCH v2 9/9] hw/display/apple-gfx: " Akihiko Odaki
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

thread_sync_sem is a one-shot event, so it can be converted into
QemuEvent, which is more lightweight.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 migration/migration.h    |  4 ++--
 migration/postcopy-ram.c | 10 +++++-----
 migration/savevm.c       |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index eec49bf3f893..fd45a0f20c0f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -98,9 +98,9 @@ struct MigrationIncomingState {
     void (*transport_cleanup)(void *data);
     /*
      * Used to sync thread creations.  Note that we can't create threads in
-     * parallel with this sem.
+     * parallel with this event.
      */
-    QemuSemaphore  thread_sync_sem;
+    QemuEvent  thread_sync_event;
     /*
      * Free at the start of the main state load, set as the main thread finishes
      * loading state.
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 995614b38c9d..75fd310fb2b0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -90,10 +90,10 @@ void postcopy_thread_create(MigrationIncomingState *mis,
                             QemuThread *thread, const char *name,
                             void *(*fn)(void *), int joinable)
 {
-    qemu_sem_init(&mis->thread_sync_sem, 0);
+    qemu_event_init(&mis->thread_sync_event, false);
     qemu_thread_create(thread, name, fn, mis, joinable);
-    qemu_sem_wait(&mis->thread_sync_sem);
-    qemu_sem_destroy(&mis->thread_sync_sem);
+    qemu_event_wait(&mis->thread_sync_event);
+    qemu_event_destroy(&mis->thread_sync_event);
 }
 
 /* Postcopy needs to detect accesses to pages that haven't yet been copied
@@ -964,7 +964,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
     trace_postcopy_ram_fault_thread_entry();
     rcu_register_thread();
     mis->last_rb = NULL; /* last RAMBlock we sent part of */
-    qemu_sem_post(&mis->thread_sync_sem);
+    qemu_event_set(&mis->thread_sync_event);
 
     struct pollfd *pfd;
     size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
@@ -1716,7 +1716,7 @@ void *postcopy_preempt_thread(void *opaque)
 
     rcu_register_thread();
 
-    qemu_sem_post(&mis->thread_sync_sem);
+    qemu_event_set(&mis->thread_sync_event);
 
     /*
      * The preempt channel is established in asynchronous way.  Wait
diff --git a/migration/savevm.c b/migration/savevm.c
index 006514c3e301..52105dd2f10b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2078,7 +2078,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                                    MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    qemu_sem_post(&mis->thread_sync_sem);
+    qemu_event_set(&mis->thread_sync_event);
     trace_postcopy_ram_listen_thread_start();
 
     rcu_register_thread();

-- 
2.49.0




* [PATCH v2 9/9] hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
  2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
                   ` (7 preceding siblings ...)
  2025-05-10  8:51 ` [PATCH v2 8/9] migration/postcopy: " Akihiko Odaki
@ 2025-05-10  8:51 ` Akihiko Odaki
  8 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-10  8:51 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé,
	Akihiko Odaki

sem in AppleGFXReadMemoryJob is a one-shot event, so it can be converted
into QemuEvent, which is better suited to such a use case.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
 hw/display/apple-gfx.m | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
index 2ff1c90df71a..173fffc86ef1 100644
--- a/hw/display/apple-gfx.m
+++ b/hw/display/apple-gfx.m
@@ -454,7 +454,7 @@ static void set_cursor_glyph(void *opaque)
 /* ------ DMA (device reading system memory) ------ */
 
 typedef struct AppleGFXReadMemoryJob {
-    QemuSemaphore sem;
+    QemuEvent event;
     hwaddr physical_address;
     uint64_t length;
     void *dst;
@@ -470,7 +470,7 @@ static void apple_gfx_do_read_memory(void *opaque)
                         job->dst, job->length, MEMTXATTRS_UNSPECIFIED);
     job->success = (r == MEMTX_OK);
 
-    qemu_sem_post(&job->sem);
+    qemu_event_set(&job->event);
 }
 
 static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
@@ -483,11 +483,11 @@ static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
     trace_apple_gfx_read_memory(physical_address, length, dst);
 
     /* Performing DMA requires BQL, so do it in a BH. */
-    qemu_sem_init(&job.sem, 0);
+    qemu_event_init(&job.event, 0);
     aio_bh_schedule_oneshot(qemu_get_aio_context(),
                             apple_gfx_do_read_memory, &job);
-    qemu_sem_wait(&job.sem);
-    qemu_sem_destroy(&job.sem);
+    qemu_event_wait(&job.event);
+    qemu_event_destroy(&job.event);
     return job.success;
 }
 

-- 
2.49.0




* Re: [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux
  2025-05-10  8:51 ` [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
@ 2025-05-10 15:02   ` Paolo Bonzini
  2025-05-11  4:41     ` Akihiko Odaki
  0 siblings, 1 reply; 12+ messages in thread
From: Paolo Bonzini @ 2025-05-10 15:02 UTC (permalink / raw)
  To: Akihiko Odaki, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé

On 5/10/25 10:51, Akihiko Odaki wrote:
> Add special implementations of qemu_event_set() and qemu_event_wait()
> using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
> finishes, and these functions will avoid complex barrier and atomic
> operations.

Unfortunately not...

> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
> ---
>   util/qemu-thread-posix.c | 45 ++++++++++++++++++---------------------------
>   1 file changed, 18 insertions(+), 27 deletions(-)
> 
> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
> index 13459e44c768..805cac444f15 100644
> --- a/util/qemu-thread-posix.c
> +++ b/util/qemu-thread-posix.c
> @@ -319,28 +319,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
>   
>   #ifdef __linux__
>   #include "qemu/futex.h"
> -#else
> -static inline void qemu_futex_wake(QemuEvent *ev, int n)
> -{
> -    assert(ev->initialized);
> -    pthread_mutex_lock(&ev->lock);
> -    if (n == 1) {
> -        pthread_cond_signal(&ev->cond);
> -    } else {
> -        pthread_cond_broadcast(&ev->cond);
> -    }
> -    pthread_mutex_unlock(&ev->lock);
> -}
> -
> -static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
> -{
> -    assert(ev->initialized);
> -    pthread_mutex_lock(&ev->lock);
> -    if (ev->value == val) {
> -        pthread_cond_wait(&ev->cond, &ev->lock);
> -    }
> -    pthread_mutex_unlock(&ev->lock);
> -}
>   #endif
>   
>   /* Valid transitions:
> @@ -363,7 +341,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
>   
>   void qemu_event_init(QemuEvent *ev, bool init)
>   {
> -#ifndef __linux__
> +#ifndef CONFIG_LINUX
>       pthread_mutex_init(&ev->lock, NULL);
>       pthread_cond_init(&ev->cond, NULL);
>   #endif
> @@ -376,7 +354,7 @@ void qemu_event_destroy(QemuEvent *ev)
>   {
>       assert(ev->initialized);
>       ev->initialized = false;
> -#ifndef __linux__
> +#ifndef CONFIG_LINUX
>       pthread_mutex_destroy(&ev->lock);
>       pthread_cond_destroy(&ev->cond);
>   #endif
> @@ -386,6 +364,7 @@ void qemu_event_set(QemuEvent *ev)
>   {
>       assert(ev->initialized);
>   
> +#ifdef CONFIG_LINUX
>       /*
>        * Pairs with both qemu_event_reset() and qemu_event_wait().
>        *

The user of this code will not have mutexes, so some care is needed.  You
have:

	if (!qatomic_load_acquire(&done)) {
		qemu_event_reset(&ev);
		if (!qatomic_load_acquire(&done))
			qemu_event_wait(&ev);
	}

and on the other side

	qatomic_store_release(&done, 1);
	qemu_event_set(&ev);
	--> qatomic_set(&ev.value, EV_SET);

I don't think this is correct without the memory barrier in 
qemu_event_set(), though I am not actually sure how you'd
add it.

The problem is, I don't see anything that prevents this:

				set done
				qemu_event_set()
				  pthread_mutex_lock()
				  ev.value = SET
	qemu_event_reset()
	  ev.value |= FREE
	  smp_mb__after_rmw()
				  // store buffer not flushed yet,
				  // so other thread doesn't see "done"
	read done
	pthread_mutex_lock()
				  pthread_cond_broadcast()
				  pthread_mutex_unlock()
	while ev.value != SET
		// hang
		pthread_cond_wait()

The barrier in qemu_event_reset() is not enough; you need one on each side.

> +#else
> +    pthread_mutex_lock(&ev->lock);
> +    if (qatomic_read(&ev->value) != EV_SET) {

Apart from the above, this needs to be a while(), because condition 
variables also have spurious wakeups.

Paolo

> +        pthread_cond_wait(&ev->cond, &ev->lock);
> +    }
> +    pthread_mutex_unlock(&ev->lock);
> +#endif
>   }
>   
>   static __thread NotifierList thread_exit;
> 




* Re: [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux
  2025-05-10 15:02   ` Paolo Bonzini
@ 2025-05-11  4:41     ` Akihiko Odaki
  0 siblings, 0 replies; 12+ messages in thread
From: Akihiko Odaki @ 2025-05-11  4:41 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
	Hailiang Zhang
  Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
	Daniel P. Berrangé, Philippe Mathieu-Daudé

On 2025/05/11 0:02, Paolo Bonzini wrote:
> On 5/10/25 10:51, Akihiko Odaki wrote:
>> Add special implementations of qemu_event_set() and qemu_event_wait()
>> using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
>> finishes, and these functions will avoid complex barrier and atomic
>> operations.
> 
> Unfortunately not...
> 
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
>> Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
>> ---
>>   util/qemu-thread-posix.c | 45 ++++++++++++++++++---------------------------
>>   1 file changed, 18 insertions(+), 27 deletions(-)
>>
>> diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
>> index 13459e44c768..805cac444f15 100644
>> --- a/util/qemu-thread-posix.c
>> +++ b/util/qemu-thread-posix.c
>> @@ -319,28 +319,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
>>   #ifdef __linux__
>>   #include "qemu/futex.h"
>> -#else
>> -static inline void qemu_futex_wake(QemuEvent *ev, int n)
>> -{
>> -    assert(ev->initialized);
>> -    pthread_mutex_lock(&ev->lock);
>> -    if (n == 1) {
>> -        pthread_cond_signal(&ev->cond);
>> -    } else {
>> -        pthread_cond_broadcast(&ev->cond);
>> -    }
>> -    pthread_mutex_unlock(&ev->lock);
>> -}
>> -
>> -static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
>> -{
>> -    assert(ev->initialized);
>> -    pthread_mutex_lock(&ev->lock);
>> -    if (ev->value == val) {
>> -        pthread_cond_wait(&ev->cond, &ev->lock);
>> -    }
>> -    pthread_mutex_unlock(&ev->lock);
>> -}
>>   #endif
>>   /* Valid transitions:
>> @@ -363,7 +341,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
>>   void qemu_event_init(QemuEvent *ev, bool init)
>>   {
>> -#ifndef __linux__
>> +#ifndef CONFIG_LINUX
>>       pthread_mutex_init(&ev->lock, NULL);
>>       pthread_cond_init(&ev->cond, NULL);
>>   #endif
>> @@ -376,7 +354,7 @@ void qemu_event_destroy(QemuEvent *ev)
>>   {
>>       assert(ev->initialized);
>>       ev->initialized = false;
>> -#ifndef __linux__
>> +#ifndef CONFIG_LINUX
>>       pthread_mutex_destroy(&ev->lock);
>>       pthread_cond_destroy(&ev->cond);
>>   #endif
>> @@ -386,6 +364,7 @@ void qemu_event_set(QemuEvent *ev)
>>   {
>>       assert(ev->initialized);
>> +#ifdef CONFIG_LINUX
>>       /*
>>        * Pairs with both qemu_event_reset() and qemu_event_wait().
>>        *
> 
> The user of this code will not have mutexes, so some care is needed.  You 
> have:
> 
>      if (!qatomic_load_acquire(&done)) {
>          qemu_event_reset(&ev);
>          if (!qatomic_load_acquire(&done))
>              qemu_event_wait(&ev);
>      }
> 
> and on the other side
> 
>      qatomic_store_release(&done, 1);
>      qemu_event_set(&ev);
>      --> qatomic_set(&ev.value, EV_SET);
> 
> I don't think this is correct without the memory barrier in 
> qemu_event_set(), though I am not actually sure how you'd
> add it.
> 
> The problem is, I don't see anything that prevents this:
> 
>                  set done
>                  qemu_event_set()
>                    pthread_mutex_lock()
>                    ev.value = SET
>      qemu_event_reset()
>        ev.value |= FREE
>        smp_mb__after_rmw()
>                    // store buffer not flushed yet,
>                    // so other thread doesn't see "done"
>      read done
>      pthread_mutex_lock()
>                    pthread_cond_broadcast()
>                    pthread_mutex_unlock()
>      while ev.value != SET
>          // hang
>          pthread_cond_wait()
> 
> The barrier in qemu_event_reset() is not enough; you need one on each side.

This is insightful; I will restore smp_mb() in qemu_event_set().

> 
>> +#else
>> +    pthread_mutex_lock(&ev->lock);
>> +    if (qatomic_read(&ev->value) != EV_SET) {
> 
> Apart from the above, this needs to be a while(), because condition 
> variables also have spurious wakeups.

I will fix it in the next version too.

Regards,
Akihiko Odaki

> 
> Paolo
> 
>> +        pthread_cond_wait(&ev->cond, &ev->lock);
>> +    }
>> +    pthread_mutex_unlock(&ev->lock);
>> +#endif
>>   }
>>   static __thread NotifierList thread_exit;
>>
> 



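
Taken together, the two points above amount to roughly the following shape
for the non-Linux paths: a full barrier in qemu_event_set() pairing with the
smp_mb__after_rmw() in qemu_event_reset(), and a while () around
pthread_cond_wait() to tolerate spurious wakeups.  This is only a sketch of
the direction agreed above, reusing the existing EV_SET constant and the
lock/cond/value fields and omitting any fast path; the next revision may
place the barrier differently.

    void qemu_event_set(QemuEvent *ev)
    {
        assert(ev->initialized);

        pthread_mutex_lock(&ev->lock);
        /*
         * Full barrier so that stores made before qemu_event_set(), such as
         * the caller's "done" flag in the example above, are visible to a
         * waiter that has just reset the event; pairs with
         * smp_mb__after_rmw() in qemu_event_reset().
         */
        smp_mb();
        qatomic_set(&ev->value, EV_SET);
        pthread_cond_broadcast(&ev->cond);
        pthread_mutex_unlock(&ev->lock);
    }

    void qemu_event_wait(QemuEvent *ev)
    {
        assert(ev->initialized);

        pthread_mutex_lock(&ev->lock);
        /* while, not if: pthread_cond_wait() may wake up spuriously. */
        while (qatomic_read(&ev->value) != EV_SET) {
            pthread_cond_wait(&ev->cond, &ev->lock);
        }
        pthread_mutex_unlock(&ev->lock);
    }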


Thread overview: 12+ messages
2025-05-10  8:51 [PATCH v2 0/9] Improve futex usage Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 1/9] futex: Check value after qemu_futex_wait() Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 2/9] futex: Support Windows Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 3/9] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
2025-05-10 15:02   ` Paolo Bonzini
2025-05-11  4:41     ` Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 4/9] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 5/9] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 6/9] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 7/9] migration/colo: " Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 8/9] migration/postcopy: " Akihiko Odaki
2025-05-10  8:51 ` [PATCH v2 9/9] hw/display/apple-gfx: " Akihiko Odaki
