* [PATCH v4 01/11] futex: Check value after qemu_futex_wait()
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 02/11] futex: Support Windows Akihiko Odaki
` (11 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
futex(2) - Linux manual page
https://man7.org/linux/man-pages/man2/futex.2.html
> Note that a wake-up can also be caused by common futex usage patterns
> in unrelated code that happened to have previously used the futex
> word's memory location (e.g., typical futex-based implementations of
> Pthreads mutexes can cause this under some conditions). Therefore,
> callers should always conservatively assume that a return value of 0
> can mean a spurious wake-up, and use the futex word's value (i.e.,
> the user-space synchronization scheme) to decide whether to continue
> to block or not.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/qemu/futex.h | 9 +++++++++
tests/unit/test-aio-multithread.c | 4 +++-
util/qemu-thread-posix.c | 35 +++++++++++++----------------------
3 files changed, 25 insertions(+), 23 deletions(-)
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index 91ae88966e12..f57774005330 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -24,6 +24,15 @@ static inline void qemu_futex_wake(void *f, int n)
qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
}
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions). Therefore, callers should
+ * always conservatively assume that it is a spurious wake-up, and use the futex
+ * word's value (i.e., the user-space synchronization scheme) to decide whether
+ * to continue to block or not.
+ */
static inline void qemu_futex_wait(void *f, unsigned val)
{
while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 08d4570ccb14..8c2e41545a29 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -305,7 +305,9 @@ static void mcs_mutex_lock(void)
prev = qatomic_xchg(&mutex_head, id);
if (prev != -1) {
qatomic_set(&nodes[prev].next, id);
- qemu_futex_wait(&nodes[id].locked, 1);
+ while (qatomic_read(&nodes[id].locked) == 1) {
+ qemu_futex_wait(&nodes[id].locked, 1);
+ }
}
}
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index b2e26e21205b..04fc3bf2298e 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -394,11 +394,7 @@ void qemu_event_set(QemuEvent *ev)
*/
smp_mb();
if (qatomic_read(&ev->value) != EV_SET) {
- int old = qatomic_xchg(&ev->value, EV_SET);
-
- /* Pairs with memory barrier in kernel futex_wait system call. */
- smp_mb__after_rmw();
- if (old == EV_BUSY) {
+ if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
/* There were waiters, wake them up. */
qemu_futex_wake(ev, INT_MAX);
}
@@ -428,17 +424,17 @@ void qemu_event_wait(QemuEvent *ev)
assert(ev->initialized);
- /*
- * qemu_event_wait must synchronize with qemu_event_set even if it does
- * not go down the slow path, so this load-acquire is needed that
- * synchronizes with the first memory barrier in qemu_event_set().
- *
- * If we do go down the slow path, there is no requirement at all: we
- * might miss a qemu_event_set() here but ultimately the memory barrier in
- * qemu_futex_wait() will ensure the check is done correctly.
- */
- value = qatomic_load_acquire(&ev->value);
- if (value != EV_SET) {
+ while (true) {
+ /*
+ * qemu_event_wait must synchronize with qemu_event_set even if it does
+ * not go down the slow path, so this load-acquire is needed that
+ * synchronizes with the first memory barrier in qemu_event_set().
+ */
+ value = qatomic_load_acquire(&ev->value);
+ if (value == EV_SET) {
+ break;
+ }
+
if (value == EV_FREE) {
/*
* Leave the event reset and tell qemu_event_set that there are
@@ -452,15 +448,10 @@ void qemu_event_wait(QemuEvent *ev)
* like the load above.
*/
if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
- return;
+ break;
}
}
- /*
- * This is the final check for a concurrent set, so it does need
- * a smp_mb() pairing with the second barrier of qemu_event_set().
- * The barrier is inside the FUTEX_WAIT system call.
- */
qemu_futex_wait(ev, EV_BUSY);
}
}
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 02/11] futex: Support Windows
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 01/11] futex: Check value after qemu_futex_wait() Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 03/11] qemu-thread: Remove qatomic_read() in qemu_event_set() Akihiko Odaki
` (10 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
Windows supports futex-like APIs since Windows 8 and Windows Server
2012.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
meson.build | 2 ++
include/qemu/futex.h | 53 ++++++++++++++++++++++++++++++---------
tests/unit/test-aio-multithread.c | 2 +-
util/lockcnt.c | 2 +-
util/qemu-thread-posix.c | 4 +--
util/meson.build | 2 +-
6 files changed, 48 insertions(+), 17 deletions(-)
diff --git a/meson.build b/meson.build
index ad2053f968b8..d865a2ff778e 100644
--- a/meson.build
+++ b/meson.build
@@ -837,11 +837,13 @@ emulator_link_args = []
midl = not_found
widl = not_found
pathcch = not_found
+synchronization = not_found
host_dsosuf = '.so'
if host_os == 'windows'
midl = find_program('midl', required: false)
widl = find_program('widl', required: false)
pathcch = cc.find_library('pathcch')
+ synchronization = cc.find_library('Synchronization')
socket = cc.find_library('ws2_32')
winmm = cc.find_library('winmm')
diff --git a/include/qemu/futex.h b/include/qemu/futex.h
index f57774005330..607613eec835 100644
--- a/include/qemu/futex.h
+++ b/include/qemu/futex.h
@@ -1,5 +1,5 @@
/*
- * Wrappers around Linux futex syscall
+ * Wrappers around Linux futex syscall and similar
*
* Copyright Red Hat, Inc. 2017
*
@@ -11,28 +11,37 @@
*
*/
+/*
+ * Note that a wake-up can also be caused by common futex usage patterns in
+ * unrelated code that happened to have previously used the futex word's
+ * memory location (e.g., typical futex-based implementations of Pthreads
+ * mutexes can cause this under some conditions). Therefore, qemu_futex_wait()
+ * callers should always conservatively assume that it is a spurious wake-up,
+ * and use the futex word's value (i.e., the user-space synchronization scheme)
+ * to decide whether to continue to block or not.
+ */
+
#ifndef QEMU_FUTEX_H
#define QEMU_FUTEX_H
+#define HAVE_FUTEX
+
+#ifdef CONFIG_LINUX
#include <sys/syscall.h>
#include <linux/futex.h>
#define qemu_futex(...) syscall(__NR_futex, __VA_ARGS__)
-static inline void qemu_futex_wake(void *f, int n)
+static inline void qemu_futex_wake_all(void *f)
{
- qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+ qemu_futex(f, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
+}
+
+static inline void qemu_futex_wake_single(void *f)
+{
+ qemu_futex(f, FUTEX_WAKE, 1, NULL, NULL, 0);
}
-/*
- * Note that a wake-up can also be caused by common futex usage patterns in
- * unrelated code that happened to have previously used the futex word's
- * memory location (e.g., typical futex-based implementations of Pthreads
- * mutexes can cause this under some conditions). Therefore, callers should
- * always conservatively assume that it is a spurious wake-up, and use the futex
- * word's value (i.e., the user-space synchronization scheme) to decide whether
- * to continue to block or not.
- */
static inline void qemu_futex_wait(void *f, unsigned val)
{
while (qemu_futex(f, FUTEX_WAIT, (int) val, NULL, NULL, 0)) {
@@ -46,5 +55,25 @@ static inline void qemu_futex_wait(void *f, unsigned val)
}
}
}
+#elif defined(CONFIG_WIN32)
+#include <synchapi.h>
+
+static inline void qemu_futex_wake_all(void *f)
+{
+ WakeByAddressAll(f);
+}
+
+static inline void qemu_futex_wake_single(void *f)
+{
+ WakeByAddressSingle(f);
+}
+
+static inline void qemu_futex_wait(void *f, unsigned val)
+{
+ WaitOnAddress(f, &val, sizeof(val), INFINITE);
+}
+#else
+#undef HAVE_FUTEX
+#endif
#endif /* QEMU_FUTEX_H */
diff --git a/tests/unit/test-aio-multithread.c b/tests/unit/test-aio-multithread.c
index 8c2e41545a29..0ead6bf34ad1 100644
--- a/tests/unit/test-aio-multithread.c
+++ b/tests/unit/test-aio-multithread.c
@@ -330,7 +330,7 @@ static void mcs_mutex_unlock(void)
/* Wake up the next in line. */
next = qatomic_read(&nodes[id].next);
nodes[next].locked = 0;
- qemu_futex_wake(&nodes[next].locked, 1);
+ qemu_futex_wake_single(&nodes[next].locked);
}
static void test_multi_fair_mutex_entry(void *opaque)
diff --git a/util/lockcnt.c b/util/lockcnt.c
index d07c6cc5cee4..ca27d8e61a5c 100644
--- a/util/lockcnt.c
+++ b/util/lockcnt.c
@@ -106,7 +106,7 @@ static bool qemu_lockcnt_cmpxchg_or_wait(QemuLockCnt *lockcnt, int *val,
static void lockcnt_wake(QemuLockCnt *lockcnt)
{
trace_lockcnt_futex_wake(lockcnt);
- qemu_futex_wake(&lockcnt->count, 1);
+ qemu_futex_wake_single(&lockcnt->count);
}
void qemu_lockcnt_inc(QemuLockCnt *lockcnt)
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 04fc3bf2298e..4d6f24d705c7 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -345,7 +345,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
/* Valid transitions:
* - free->set, when setting the event
- * - busy->set, when setting the event, followed by qemu_futex_wake
+ * - busy->set, when setting the event, followed by qemu_futex_wake_all
* - set->free, when resetting the event
* - free->busy, when waiting
*
@@ -396,7 +396,7 @@ void qemu_event_set(QemuEvent *ev)
if (qatomic_read(&ev->value) != EV_SET) {
if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
/* There were waiters, wake them up. */
- qemu_futex_wake(ev, INT_MAX);
+ qemu_futex_wake_all(ev);
}
}
}
diff --git a/util/meson.build b/util/meson.build
index 1adff96ebd51..5735f65f1994 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -27,7 +27,7 @@ else
util_ss.add(files('event_notifier-win32.c'))
util_ss.add(files('oslib-win32.c'))
util_ss.add(files('qemu-thread-win32.c'))
- util_ss.add(winmm, pathcch)
+ util_ss.add(winmm, pathcch, synchronization)
endif
util_ss.add(when: linux_io_uring, if_true: files('fdmon-io_uring.c'))
if glib_has_gslice
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 03/11] qemu-thread: Remove qatomic_read() in qemu_event_set()
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 01/11] futex: Check value after qemu_futex_wait() Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 02/11] futex: Support Windows Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 04/11] qemu-thread: Replace __linux__ with CONFIG_LINUX Akihiko Odaki
` (9 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
The pair of smp_mb() and qatomic_read() sometimes allows skipping the
following qatomic_xchg() call, but it is unclear if it improves
performance so remove it.
Commit 374293ca6fb0 ("qemu-thread: use acquire/release to clarify
semantics of QemuEvent") replaced atomic_mb_read() in qemu_event_set()
with a pair of smp_mb() and atomic_read(). atomic_mb_read() was actually
cheaper than atomic_xchg(). include/qemu/atomic.h at that time had the
following comment:
/* atomic_mb_read/set semantics map Java volatile variables. They are
* less expensive on some platforms (notably POWER & ARMv7) than fully
* sequentially consistent operations.
*
* As long as they are used as paired operations they are safe to
* use. See docs/atomic.txt for more discussion.
*/
However, smp_mb() enforces full sequential consistency, so we cannot
use the same reasoning to claim that the pair of it and qatomic_read()
is cheaper than qatomic_xchg(). Therefore remove the pair and simplify
the code instead.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
util/qemu-thread-posix.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 4d6f24d705c7..a1b592d358c3 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -386,18 +386,9 @@ void qemu_event_set(QemuEvent *ev)
{
assert(ev->initialized);
- /*
- * Pairs with both qemu_event_reset() and qemu_event_wait().
- *
- * qemu_event_set has release semantics, but because it *loads*
- * ev->value we need a full memory barrier here.
- */
- smp_mb();
- if (qatomic_read(&ev->value) != EV_SET) {
- if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
- /* There were waiters, wake them up. */
- qemu_futex_wake_all(ev);
- }
+ if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+ /* There were waiters, wake them up. */
+ qemu_futex_wake_all(ev);
}
}
@@ -413,7 +404,7 @@ void qemu_event_reset(QemuEvent *ev)
/*
* Order reset before checking the condition in the caller.
- * Pairs with the first memory barrier in qemu_event_set().
+ * Pairs with the store-release in qemu_event_set().
*/
smp_mb__after_rmw();
}
@@ -428,7 +419,7 @@ void qemu_event_wait(QemuEvent *ev)
/*
* qemu_event_wait must synchronize with qemu_event_set even if it does
* not go down the slow path, so this load-acquire is needed that
- * synchronizes with the first memory barrier in qemu_event_set().
+ * synchronizes with the store-release in qemu_event_set().
*/
value = qatomic_load_acquire(&ev->value);
if (value == EV_SET) {
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 04/11] qemu-thread: Replace __linux__ with CONFIG_LINUX
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (2 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 03/11] qemu-thread: Remove qatomic_read() in qemu_event_set() Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 9:09 ` Philippe Mathieu-Daudé
2025-05-26 5:29 ` [PATCH v4 05/11] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
` (8 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
scripts/checkpatch.pl warns for __linux__ saying "architecture specific
defines should be avoided".
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/qemu/thread-posix.h | 2 +-
util/qemu-thread-posix.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index 5f2f3d1386bc..c412623a9143 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -33,7 +33,7 @@ struct QemuSemaphore {
};
struct QemuEvent {
-#ifndef __linux__
+#ifndef CONFIG_LINUX
pthread_mutex_t lock;
pthread_cond_t cond;
#endif
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index a1b592d358c3..0d6344f4d216 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -317,7 +317,7 @@ void qemu_sem_wait(QemuSemaphore *sem)
qemu_mutex_unlock(&sem->mutex);
}
-#ifdef __linux__
+#ifdef CONFIG_LINUX
#include "qemu/futex.h"
#else
static inline void qemu_futex_wake(QemuEvent *ev, int n)
@@ -363,7 +363,7 @@ static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
void qemu_event_init(QemuEvent *ev, bool init)
{
-#ifndef __linux__
+#ifndef CONFIG_LINUX
pthread_mutex_init(&ev->lock, NULL);
pthread_cond_init(&ev->cond, NULL);
#endif
@@ -376,7 +376,7 @@ void qemu_event_destroy(QemuEvent *ev)
{
assert(ev->initialized);
ev->initialized = false;
-#ifndef __linux__
+#ifndef CONFIG_LINUX
pthread_mutex_destroy(&ev->lock);
pthread_cond_destroy(&ev->cond);
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 04/11] qemu-thread: Replace __linux__ with CONFIG_LINUX
2025-05-26 5:29 ` [PATCH v4 04/11] qemu-thread: Replace __linux__ with CONFIG_LINUX Akihiko Odaki
@ 2025-05-26 9:09 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:09 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 07:29, Akihiko Odaki wrote:
> scripts/checkpatch.pl warns for __linux__ saying "architecture specific
> defines should be avoided".
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> include/qemu/thread-posix.h | 2 +-
> util/qemu-thread-posix.c | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 05/11] qemu-thread: Avoid futex abstraction for non-Linux
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (3 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 04/11] qemu-thread: Replace __linux__ with CONFIG_LINUX Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 06/11] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
` (7 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
qemu-thread used to abstract pthread primitives into futex for the
QemuEvent implementation of POSIX systems other than Linux. However,
this abstraction has one key difference: unlike futex, pthread
primitives require an explicit destruction, and it must be ordered after
wait and wake operations.
It would be easier to perform destruction if a wait operation ensures
the corresponding wake operation finishes as POSIX semaphore does, but
that requires to protect state accesses in qemu_event_set() and
qemu_event_wait() with a mutex. On the other hand, real futex does not
need such a protection but needs complex barrier and atomic operations
to ensure ordering between the two functions.
Add special implementations of qemu_event_set() and qemu_event_wait()
using pthread primitives. qemu_event_wait() will ensure qemu_event_set()
finishes, and these functions will avoid complex barrier and atomic
operations to ensure ordering between them.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Phil Dennis-Jordan <phil@philjordan.eu>
---
util/qemu-thread-posix.c | 41 ++++++++++++++++-------------------------
1 file changed, 16 insertions(+), 25 deletions(-)
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index 0d6344f4d216..be7f25f9b31d 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -319,28 +319,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
#ifdef CONFIG_LINUX
#include "qemu/futex.h"
-#else
-static inline void qemu_futex_wake(QemuEvent *ev, int n)
-{
- assert(ev->initialized);
- pthread_mutex_lock(&ev->lock);
- if (n == 1) {
- pthread_cond_signal(&ev->cond);
- } else {
- pthread_cond_broadcast(&ev->cond);
- }
- pthread_mutex_unlock(&ev->lock);
-}
-
-static inline void qemu_futex_wait(QemuEvent *ev, unsigned val)
-{
- assert(ev->initialized);
- pthread_mutex_lock(&ev->lock);
- if (ev->value == val) {
- pthread_cond_wait(&ev->cond, &ev->lock);
- }
- pthread_mutex_unlock(&ev->lock);
-}
#endif
/* Valid transitions:
@@ -386,10 +364,17 @@ void qemu_event_set(QemuEvent *ev)
{
assert(ev->initialized);
+#ifdef CONFIG_LINUX
if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
/* There were waiters, wake them up. */
qemu_futex_wake_all(ev);
}
+#else
+ pthread_mutex_lock(&ev->lock);
+ qatomic_store_release(&ev->value, EV_SET);
+ pthread_cond_broadcast(&ev->cond);
+ pthread_mutex_unlock(&ev->lock);
+#endif
}
void qemu_event_reset(QemuEvent *ev)
@@ -411,17 +396,16 @@ void qemu_event_reset(QemuEvent *ev)
void qemu_event_wait(QemuEvent *ev)
{
- unsigned value;
-
assert(ev->initialized);
+#ifdef CONFIG_LINUX
while (true) {
/*
* qemu_event_wait must synchronize with qemu_event_set even if it does
* not go down the slow path, so this load-acquire is needed that
* synchronizes with the store-release in qemu_event_set().
*/
- value = qatomic_load_acquire(&ev->value);
+ unsigned value = qatomic_load_acquire(&ev->value);
if (value == EV_SET) {
break;
}
@@ -445,6 +429,13 @@ void qemu_event_wait(QemuEvent *ev)
qemu_futex_wait(ev, EV_BUSY);
}
+#else
+ pthread_mutex_lock(&ev->lock);
+ while (qatomic_read(&ev->value) != EV_SET) {
+ pthread_cond_wait(&ev->cond, &ev->lock);
+ }
+ pthread_mutex_unlock(&ev->lock);
+#endif
}
static __thread NotifierList thread_exit;
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 06/11] qemu-thread: Use futex for QemuEvent on Windows
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (4 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 05/11] qemu-thread: Avoid futex abstraction for non-Linux Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 9:23 ` Philippe Mathieu-Daudé
2025-05-26 5:29 ` [PATCH v4 07/11] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
` (6 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
Use the futex-based implementation of QemuEvent on Windows to
remove code duplication and remove the overhead of event object
construction and destruction.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/qemu/thread-posix.h | 9 ----
include/qemu/thread-win32.h | 6 ---
include/qemu/thread.h | 11 +++-
util/event.c | 122 +++++++++++++++++++++++++++++++++++++++++
util/qemu-thread-posix.c | 121 -----------------------------------------
util/qemu-thread-win32.c | 129 --------------------------------------------
util/meson.build | 1 +
7 files changed, 133 insertions(+), 266 deletions(-)
diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
index c412623a9143..758808b705e4 100644
--- a/include/qemu/thread-posix.h
+++ b/include/qemu/thread-posix.h
@@ -32,15 +32,6 @@ struct QemuSemaphore {
unsigned int count;
};
-struct QemuEvent {
-#ifndef CONFIG_LINUX
- pthread_mutex_t lock;
- pthread_cond_t cond;
-#endif
- unsigned value;
- bool initialized;
-};
-
struct QemuThread {
pthread_t thread;
};
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index d95af4498fc9..da9e7322990c 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -28,12 +28,6 @@ struct QemuSemaphore {
bool initialized;
};
-struct QemuEvent {
- int value;
- HANDLE event;
- bool initialized;
-};
-
typedef struct QemuThreadData QemuThreadData;
struct QemuThread {
QemuThreadData *data;
diff --git a/include/qemu/thread.h b/include/qemu/thread.h
index 6f800aad31a9..573f8c9ede20 100644
--- a/include/qemu/thread.h
+++ b/include/qemu/thread.h
@@ -3,13 +3,22 @@
#include "qemu/processor.h"
#include "qemu/atomic.h"
+#include "qemu/futex.h"
typedef struct QemuCond QemuCond;
typedef struct QemuSemaphore QemuSemaphore;
-typedef struct QemuEvent QemuEvent;
typedef struct QemuLockCnt QemuLockCnt;
typedef struct QemuThread QemuThread;
+typedef struct QemuEvent {
+#ifndef HAVE_FUTEX
+ pthread_mutex_t lock;
+ pthread_cond_t cond;
+#endif
+ unsigned value;
+ bool initialized;
+} QemuEvent;
+
#ifdef _WIN32
#include "qemu/thread-win32.h"
#else
diff --git a/util/event.c b/util/event.c
new file mode 100644
index 000000000000..663b7042b17a
--- /dev/null
+++ b/util/event.c
@@ -0,0 +1,122 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "qemu/osdep.h"
+#include "qemu/thread.h"
+
+/*
+ * Valid transitions:
+ * - free->set, when setting the event
+ * - busy->set, when setting the event, followed by qemu_futex_wake_all
+ * - set->free, when resetting the event
+ * - free->busy, when waiting
+ *
+ * set->busy does not happen (it can be observed from the outside but
+ * it really is set->free->busy).
+ *
+ * busy->free provably cannot happen; to enforce it, the set->free transition
+ * is done with an OR, which becomes a no-op if the event has concurrently
+ * transitioned to free or busy.
+ */
+
+#define EV_SET 0
+#define EV_FREE 1
+#define EV_BUSY -1
+
+void qemu_event_init(QemuEvent *ev, bool init)
+{
+#ifndef HAVE_FUTEX
+ pthread_mutex_init(&ev->lock, NULL);
+ pthread_cond_init(&ev->cond, NULL);
+#endif
+
+ ev->value = (init ? EV_SET : EV_FREE);
+ ev->initialized = true;
+}
+
+void qemu_event_destroy(QemuEvent *ev)
+{
+ assert(ev->initialized);
+ ev->initialized = false;
+#ifndef HAVE_FUTEX
+ pthread_mutex_destroy(&ev->lock);
+ pthread_cond_destroy(&ev->cond);
+#endif
+}
+
+void qemu_event_set(QemuEvent *ev)
+{
+ assert(ev->initialized);
+
+#ifdef HAVE_FUTEX
+ if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+ /* There were waiters, wake them up. */
+ qemu_futex_wake_all(ev);
+ }
+#else
+ pthread_mutex_lock(&ev->lock);
+ qatomic_store_release(&ev->value, EV_SET);
+ pthread_cond_broadcast(&ev->cond);
+ pthread_mutex_unlock(&ev->lock);
+#endif
+}
+
+void qemu_event_reset(QemuEvent *ev)
+{
+ assert(ev->initialized);
+
+ /*
+ * If there was a concurrent reset (or even reset+wait),
+ * do nothing. Otherwise change EV_SET->EV_FREE.
+ */
+ qatomic_or(&ev->value, EV_FREE);
+
+ /*
+ * Order reset before checking the condition in the caller.
+ * Pairs with the store-release in qemu_event_set().
+ */
+ smp_mb__after_rmw();
+}
+
+void qemu_event_wait(QemuEvent *ev)
+{
+ assert(ev->initialized);
+
+#ifdef HAVE_FUTEX
+ while (true) {
+ /*
+ * qemu_event_wait must synchronize with qemu_event_set even if it does
+ * not go down the slow path, so this load-acquire is needed that
+ * synchronizes with the store-release in qemu_event_set().
+ */
+ unsigned value = qatomic_load_acquire(&ev->value);
+ if (value == EV_SET) {
+ break;
+ }
+
+ if (value == EV_FREE) {
+ /*
+ * Leave the event reset and tell qemu_event_set that there are
+ * waiters. No need to retry, because there cannot be a concurrent
+ * busy->free transition. After the CAS, the event will be either
+ * set or busy.
+ *
+ * This cmpxchg doesn't have particular ordering requirements if it
+ * succeeds (moving the store earlier can only cause
+ * qemu_event_set() to issue _more_ wakeups), the failing case needs
+ * acquire semantics like the load above.
+ */
+ if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
+ break;
+ }
+ }
+
+ qemu_futex_wait(ev, EV_BUSY);
+ }
+#else
+ pthread_mutex_lock(&ev->lock);
+ while (qatomic_read(&ev->value) != EV_SET) {
+ pthread_cond_wait(&ev->cond, &ev->lock);
+ }
+ pthread_mutex_unlock(&ev->lock);
+#endif
+}
diff --git a/util/qemu-thread-posix.c b/util/qemu-thread-posix.c
index be7f25f9b31d..ba725444ba63 100644
--- a/util/qemu-thread-posix.c
+++ b/util/qemu-thread-posix.c
@@ -317,127 +317,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
qemu_mutex_unlock(&sem->mutex);
}
-#ifdef CONFIG_LINUX
-#include "qemu/futex.h"
-#endif
-
-/* Valid transitions:
- * - free->set, when setting the event
- * - busy->set, when setting the event, followed by qemu_futex_wake_all
- * - set->free, when resetting the event
- * - free->busy, when waiting
- *
- * set->busy does not happen (it can be observed from the outside but
- * it really is set->free->busy).
- *
- * busy->free provably cannot happen; to enforce it, the set->free transition
- * is done with an OR, which becomes a no-op if the event has concurrently
- * transitioned to free or busy.
- */
-
-#define EV_SET 0
-#define EV_FREE 1
-#define EV_BUSY -1
-
-void qemu_event_init(QemuEvent *ev, bool init)
-{
-#ifndef CONFIG_LINUX
- pthread_mutex_init(&ev->lock, NULL);
- pthread_cond_init(&ev->cond, NULL);
-#endif
-
- ev->value = (init ? EV_SET : EV_FREE);
- ev->initialized = true;
-}
-
-void qemu_event_destroy(QemuEvent *ev)
-{
- assert(ev->initialized);
- ev->initialized = false;
-#ifndef CONFIG_LINUX
- pthread_mutex_destroy(&ev->lock);
- pthread_cond_destroy(&ev->cond);
-#endif
-}
-
-void qemu_event_set(QemuEvent *ev)
-{
- assert(ev->initialized);
-
-#ifdef CONFIG_LINUX
- if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
- /* There were waiters, wake them up. */
- qemu_futex_wake_all(ev);
- }
-#else
- pthread_mutex_lock(&ev->lock);
- qatomic_store_release(&ev->value, EV_SET);
- pthread_cond_broadcast(&ev->cond);
- pthread_mutex_unlock(&ev->lock);
-#endif
-}
-
-void qemu_event_reset(QemuEvent *ev)
-{
- assert(ev->initialized);
-
- /*
- * If there was a concurrent reset (or even reset+wait),
- * do nothing. Otherwise change EV_SET->EV_FREE.
- */
- qatomic_or(&ev->value, EV_FREE);
-
- /*
- * Order reset before checking the condition in the caller.
- * Pairs with the store-release in qemu_event_set().
- */
- smp_mb__after_rmw();
-}
-
-void qemu_event_wait(QemuEvent *ev)
-{
- assert(ev->initialized);
-
-#ifdef CONFIG_LINUX
- while (true) {
- /*
- * qemu_event_wait must synchronize with qemu_event_set even if it does
- * not go down the slow path, so this load-acquire is needed that
- * synchronizes with the store-release in qemu_event_set().
- */
- unsigned value = qatomic_load_acquire(&ev->value);
- if (value == EV_SET) {
- break;
- }
-
- if (value == EV_FREE) {
- /*
- * Leave the event reset and tell qemu_event_set that there are
- * waiters. No need to retry, because there cannot be a concurrent
- * busy->free transition. After the CAS, the event will be either
- * set or busy.
- *
- * This cmpxchg doesn't have particular ordering requirements if it
- * succeeds (moving the store earlier can only cause qemu_event_set()
- * to issue _more_ wakeups), the failing case needs acquire semantics
- * like the load above.
- */
- if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
- break;
- }
- }
-
- qemu_futex_wait(ev, EV_BUSY);
- }
-#else
- pthread_mutex_lock(&ev->lock);
- while (qatomic_read(&ev->value) != EV_SET) {
- pthread_cond_wait(&ev->cond, &ev->lock);
- }
- pthread_mutex_unlock(&ev->lock);
-#endif
-}
-
static __thread NotifierList thread_exit;
/*
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index a7fe3cc345f0..ca2e0b512e26 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -231,135 +231,6 @@ void qemu_sem_wait(QemuSemaphore *sem)
}
}
-/* Wrap a Win32 manual-reset event with a fast userspace path. The idea
- * is to reset the Win32 event lazily, as part of a test-reset-test-wait
- * sequence. Such a sequence is, indeed, how QemuEvents are used by
- * RCU and other subsystems!
- *
- * Valid transitions:
- * - free->set, when setting the event
- * - busy->set, when setting the event, followed by SetEvent
- * - set->free, when resetting the event
- * - free->busy, when waiting
- *
- * set->busy does not happen (it can be observed from the outside but
- * it really is set->free->busy).
- *
- * busy->free provably cannot happen; to enforce it, the set->free transition
- * is done with an OR, which becomes a no-op if the event has concurrently
- * transitioned to free or busy (and is faster than cmpxchg).
- */
-
-#define EV_SET 0
-#define EV_FREE 1
-#define EV_BUSY -1
-
-void qemu_event_init(QemuEvent *ev, bool init)
-{
- /* Manual reset. */
- ev->event = CreateEvent(NULL, TRUE, TRUE, NULL);
- ev->value = (init ? EV_SET : EV_FREE);
- ev->initialized = true;
-}
-
-void qemu_event_destroy(QemuEvent *ev)
-{
- assert(ev->initialized);
- ev->initialized = false;
- CloseHandle(ev->event);
-}
-
-void qemu_event_set(QemuEvent *ev)
-{
- assert(ev->initialized);
-
- /*
- * Pairs with both qemu_event_reset() and qemu_event_wait().
- *
- * qemu_event_set has release semantics, but because it *loads*
- * ev->value we need a full memory barrier here.
- */
- smp_mb();
- if (qatomic_read(&ev->value) != EV_SET) {
- int old = qatomic_xchg(&ev->value, EV_SET);
-
- /* Pairs with memory barrier after ResetEvent. */
- smp_mb__after_rmw();
- if (old == EV_BUSY) {
- /* There were waiters, wake them up. */
- SetEvent(ev->event);
- }
- }
-}
-
-void qemu_event_reset(QemuEvent *ev)
-{
- assert(ev->initialized);
-
- /*
- * If there was a concurrent reset (or even reset+wait),
- * do nothing. Otherwise change EV_SET->EV_FREE.
- */
- qatomic_or(&ev->value, EV_FREE);
-
- /*
- * Order reset before checking the condition in the caller.
- * Pairs with the first memory barrier in qemu_event_set().
- */
- smp_mb__after_rmw();
-}
-
-void qemu_event_wait(QemuEvent *ev)
-{
- unsigned value;
-
- assert(ev->initialized);
-
- /*
- * qemu_event_wait must synchronize with qemu_event_set even if it does
- * not go down the slow path, so this load-acquire is needed that
- * synchronizes with the first memory barrier in qemu_event_set().
- *
- * If we do go down the slow path, there is no requirement at all: we
- * might miss a qemu_event_set() here but ultimately the memory barrier in
- * qemu_futex_wait() will ensure the check is done correctly.
- */
- value = qatomic_load_acquire(&ev->value);
- if (value != EV_SET) {
- if (value == EV_FREE) {
- /*
- * Here the underlying kernel event is reset, but qemu_event_set is
- * not yet going to call SetEvent. However, there will be another
- * check for EV_SET below when setting EV_BUSY. At that point it
- * is safe to call WaitForSingleObject.
- */
- ResetEvent(ev->event);
-
- /*
- * It is not clear whether ResetEvent provides this barrier; kernel
- * APIs (KeResetEvent/KeClearEvent) do not. Better safe than sorry!
- */
- smp_mb();
-
- /*
- * Leave the event reset and tell qemu_event_set that there are
- * waiters. No need to retry, because there cannot be a concurrent
- * busy->free transition. After the CAS, the event will be either
- * set or busy.
- */
- if (qatomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
- return;
- }
- }
-
- /*
- * ev->value is now EV_BUSY. Since we didn't observe EV_SET,
- * qemu_event_set() must observe EV_BUSY and call SetEvent().
- */
- WaitForSingleObject(ev->event, INFINITE);
- }
-}
-
struct QemuThreadData {
/* Passed to win32_start_routine. */
void *(*start_routine)(void *);
diff --git a/util/meson.build b/util/meson.build
index 5735f65f1994..35029380a376 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -35,6 +35,7 @@ if glib_has_gslice
endif
util_ss.add(files('defer-call.c'))
util_ss.add(files('envlist.c', 'path.c', 'module.c'))
+util_ss.add(files('event.c'))
util_ss.add(files('host-utils.c'))
util_ss.add(files('bitmap.c', 'bitops.c'))
util_ss.add(files('fifo8.c'))
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 06/11] qemu-thread: Use futex for QemuEvent on Windows
2025-05-26 5:29 ` [PATCH v4 06/11] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
@ 2025-05-26 9:23 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:23 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 07:29, Akihiko Odaki wrote:
> Use the futex-based implementation of QemuEvent on Windows to
> remove code duplication and remove the overhead of event object
> construction and destruction.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> include/qemu/thread-posix.h | 9 ----
> include/qemu/thread-win32.h | 6 ---
> include/qemu/thread.h | 11 +++-
> util/event.c | 122 +++++++++++++++++++++++++++++++++++++++++
> util/qemu-thread-posix.c | 121 -----------------------------------------
> util/qemu-thread-win32.c | 129 --------------------------------------------
> util/meson.build | 1 +
> 7 files changed, 133 insertions(+), 266 deletions(-)
Great.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 07/11] qemu-thread: Use futex if available for QemuLockCnt
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (5 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 06/11] qemu-thread: Use futex for QemuEvent on Windows Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 08/11] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
` (5 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
This unlocks the futex-based implementation of QemuLockCnt to Windows.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
include/qemu/lockcnt.h | 2 +-
util/lockcnt.c | 7 ++++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/qemu/lockcnt.h b/include/qemu/lockcnt.h
index f4b62a3f7011..5a2800e3f182 100644
--- a/include/qemu/lockcnt.h
+++ b/include/qemu/lockcnt.h
@@ -17,7 +17,7 @@
typedef struct QemuLockCnt QemuLockCnt;
struct QemuLockCnt {
-#ifndef CONFIG_LINUX
+#ifndef HAVE_FUTEX
QemuMutex mutex;
#endif
unsigned count;
diff --git a/util/lockcnt.c b/util/lockcnt.c
index ca27d8e61a5c..92c9f8ceca88 100644
--- a/util/lockcnt.c
+++ b/util/lockcnt.c
@@ -12,10 +12,11 @@
#include "qemu/atomic.h"
#include "trace.h"
-#ifdef CONFIG_LINUX
-#include "qemu/futex.h"
+#ifdef HAVE_FUTEX
-/* On Linux, bits 0-1 are a futex-based lock, bits 2-31 are the counter.
+/*
+ * When futex is available, bits 0-1 are a futex-based lock, bits 2-31 are the
+ * counter.
* For the mutex algorithm see Ulrich Drepper's "Futexes Are Tricky" (ok,
* this is not the most relaxing citation I could make...). It is similar
* to mutex2 in the paper.
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 08/11] migration: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (6 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 07/11] qemu-thread: Use futex if available for QemuLockCnt Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 9:24 ` Philippe Mathieu-Daudé
2025-05-26 5:29 ` [PATCH v4 09/11] migration/colo: " Akihiko Odaki
` (4 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
pause_event can utilize qemu_event_reset() to discard events.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
migration/migration.h | 2 +-
migration/migration.c | 21 +++++++++------------
2 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/migration/migration.h b/migration/migration.h
index d53f7cad84d8..21aa6a3c8fee 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -379,7 +379,7 @@ struct MigrationState {
QemuSemaphore wait_unplug_sem;
/* Migration is paused due to pause-before-switchover */
- QemuSemaphore pause_sem;
+ QemuEvent pause_event;
/* The semaphore is used to notify COLO thread that failover is finished */
QemuSemaphore colo_exit_sem;
diff --git a/migration/migration.c b/migration/migration.c
index 4697732bef91..4098870bce33 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1630,7 +1630,7 @@ void migration_cancel(void)
}
/* If the migration is paused, kick it out of the pause */
if (old_state == MIGRATION_STATUS_PRE_SWITCHOVER) {
- qemu_sem_post(&s->pause_sem);
+ qemu_event_set(&s->pause_event);
}
migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
} while (s->state != MIGRATION_STATUS_CANCELLING);
@@ -2342,7 +2342,7 @@ void qmp_migrate_continue(MigrationStatus state, Error **errp)
MigrationStatus_str(s->state));
return;
}
- qemu_sem_post(&s->pause_sem);
+ qemu_event_set(&s->pause_event);
}
int migration_rp_wait(MigrationState *s)
@@ -2911,21 +2911,18 @@ static bool migration_switchover_prepare(MigrationState *s)
return true;
}
- /* Since leaving this state is not atomic with posting the semaphore
+ /*
+ * Since leaving this state is not atomic with setting the event
* it's possible that someone could have issued multiple migrate_continue
- * and the semaphore is incorrectly positive at this point;
- * the docs say it's undefined to reinit a semaphore that's already
- * init'd, so use timedwait to eat up any existing posts.
+ * and the event is incorrectly set at this point so reset it.
*/
- while (qemu_sem_timedwait(&s->pause_sem, 1) == 0) {
- /* This block intentionally left blank */
- }
+ qemu_event_reset(&s->pause_event);
/* Update [POSTCOPY_]ACTIVE to PRE_SWITCHOVER */
migrate_set_state(&s->state, s->state, MIGRATION_STATUS_PRE_SWITCHOVER);
bql_unlock();
- qemu_sem_wait(&s->pause_sem);
+ qemu_event_wait(&s->pause_event);
bql_lock();
/*
@@ -4057,7 +4054,7 @@ static void migration_instance_finalize(Object *obj)
qemu_mutex_destroy(&ms->qemu_file_lock);
qemu_sem_destroy(&ms->wait_unplug_sem);
qemu_sem_destroy(&ms->rate_limit_sem);
- qemu_sem_destroy(&ms->pause_sem);
+ qemu_event_destroy(&ms->pause_event);
qemu_sem_destroy(&ms->postcopy_pause_sem);
qemu_sem_destroy(&ms->rp_state.rp_sem);
qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
@@ -4072,7 +4069,7 @@ static void migration_instance_init(Object *obj)
ms->state = MIGRATION_STATUS_NONE;
ms->mbps = -1;
ms->pages_per_second = -1;
- qemu_sem_init(&ms->pause_sem, 0);
+ qemu_event_init(&ms->pause_event, false);
qemu_mutex_init(&ms->error_mutex);
migrate_params_init(&ms->parameters);
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 08/11] migration: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 ` [PATCH v4 08/11] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
@ 2025-05-26 9:24 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:24 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 07:29, Akihiko Odaki wrote:
> pause_event can utilize qemu_event_reset() to discard events.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> migration/migration.h | 2 +-
> migration/migration.c | 21 +++++++++------------
> 2 files changed, 10 insertions(+), 13 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 09/11] migration/colo: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (7 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 08/11] migration: Replace QemuSemaphore with QemuEvent Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 9:24 ` Philippe Mathieu-Daudé
2025-05-26 5:29 ` [PATCH v4 10/11] migration/postcopy: " Akihiko Odaki
` (3 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
colo_exit_sem and colo_incoming_sem represent one-shot events so they
can be converted into QemuEvent, which is more lightweight.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.h | 6 +++---
migration/colo.c | 20 ++++++++++----------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/migration/migration.h b/migration/migration.h
index 21aa6a3c8fee..aaec471c00f8 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -186,7 +186,7 @@ struct MigrationIncomingState {
/* The coroutine we should enter (back) after failover */
Coroutine *colo_incoming_co;
- QemuSemaphore colo_incoming_sem;
+ QemuEvent colo_incoming_event;
/* Optional load threads pool and its thread exit request flag */
ThreadPool *load_threads;
@@ -381,8 +381,8 @@ struct MigrationState {
/* Migration is paused due to pause-before-switchover */
QemuEvent pause_event;
- /* The semaphore is used to notify COLO thread that failover is finished */
- QemuSemaphore colo_exit_sem;
+ /* The event is used to notify COLO thread that failover is finished */
+ QemuEvent colo_exit_event;
/* The event is used to notify COLO thread to do checkpoint */
QemuEvent colo_checkpoint_event;
diff --git a/migration/colo.c b/migration/colo.c
index c976b3ff344d..e0f713c837f5 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -146,7 +146,7 @@ static void secondary_vm_do_failover(void)
return;
}
/* Notify COLO incoming thread that failover work is finished */
- qemu_sem_post(&mis->colo_incoming_sem);
+ qemu_event_set(&mis->colo_incoming_event);
/* For Secondary VM, jump to incoming co */
if (mis->colo_incoming_co) {
@@ -195,7 +195,7 @@ static void primary_vm_do_failover(void)
}
/* Notify COLO thread that failover work is finished */
- qemu_sem_post(&s->colo_exit_sem);
+ qemu_event_set(&s->colo_exit_event);
}
COLOMode get_colo_mode(void)
@@ -620,8 +620,8 @@ out:
}
/* Hope this not to be too long to wait here */
- qemu_sem_wait(&s->colo_exit_sem);
- qemu_sem_destroy(&s->colo_exit_sem);
+ qemu_event_wait(&s->colo_exit_event);
+ qemu_event_destroy(&s->colo_exit_event);
/*
* It is safe to unregister notifier after failover finished.
@@ -651,7 +651,7 @@ void migrate_start_colo_process(MigrationState *s)
s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST,
colo_checkpoint_notify_timer, NULL);
- qemu_sem_init(&s->colo_exit_sem, 0);
+ qemu_event_init(&s->colo_exit_event, false);
colo_process_checkpoint(s);
bql_lock();
}
@@ -808,11 +808,11 @@ void colo_shutdown(void)
case COLO_MODE_PRIMARY:
s = migrate_get_current();
qemu_event_set(&s->colo_checkpoint_event);
- qemu_sem_post(&s->colo_exit_sem);
+ qemu_event_set(&s->colo_exit_event);
break;
case COLO_MODE_SECONDARY:
mis = migration_incoming_get_current();
- qemu_sem_post(&mis->colo_incoming_sem);
+ qemu_event_set(&mis->colo_incoming_event);
break;
default:
break;
@@ -827,7 +827,7 @@ static void *colo_process_incoming_thread(void *opaque)
Error *local_err = NULL;
rcu_register_thread();
- qemu_sem_init(&mis->colo_incoming_sem, 0);
+ qemu_event_init(&mis->colo_incoming_event, false);
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COLO);
@@ -923,8 +923,8 @@ out:
}
/* Hope this not to be too long to loop here */
- qemu_sem_wait(&mis->colo_incoming_sem);
- qemu_sem_destroy(&mis->colo_incoming_sem);
+ qemu_event_wait(&mis->colo_incoming_event);
+ qemu_event_destroy(&mis->colo_incoming_event);
rcu_unregister_thread();
return NULL;
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 09/11] migration/colo: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 ` [PATCH v4 09/11] migration/colo: " Akihiko Odaki
@ 2025-05-26 9:24 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:24 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 07:29, Akihiko Odaki wrote:
> colo_exit_sem and colo_incoming_sem represent one-shot events so they
> can be converted into QemuEvent, which is more lightweight.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
> ---
> migration/migration.h | 6 +++---
> migration/colo.c | 20 ++++++++++----------
> 2 files changed, 13 insertions(+), 13 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v4 10/11] migration/postcopy: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (8 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 09/11] migration/colo: " Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 5:29 ` [PATCH v4 11/11] hw/display/apple-gfx: " Akihiko Odaki
` (2 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
thread_sync_sem is an one-shot event so it can be converted into
QemuEvent, which is more lightweight.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
---
migration/migration.h | 4 ++--
migration/postcopy-ram.c | 10 +++++-----
migration/savevm.c | 2 +-
3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/migration/migration.h b/migration/migration.h
index aaec471c00f8..739289de9342 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -98,9 +98,9 @@ struct MigrationIncomingState {
void (*transport_cleanup)(void *data);
/*
* Used to sync thread creations. Note that we can't create threads in
- * parallel with this sem.
+ * parallel with this event.
*/
- QemuSemaphore thread_sync_sem;
+ QemuEvent thread_sync_event;
/*
* Free at the start of the main state load, set as the main thread finishes
* loading state.
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 995614b38c9d..75fd310fb2b0 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -90,10 +90,10 @@ void postcopy_thread_create(MigrationIncomingState *mis,
QemuThread *thread, const char *name,
void *(*fn)(void *), int joinable)
{
- qemu_sem_init(&mis->thread_sync_sem, 0);
+ qemu_event_init(&mis->thread_sync_event, false);
qemu_thread_create(thread, name, fn, mis, joinable);
- qemu_sem_wait(&mis->thread_sync_sem);
- qemu_sem_destroy(&mis->thread_sync_sem);
+ qemu_event_wait(&mis->thread_sync_event);
+ qemu_event_destroy(&mis->thread_sync_event);
}
/* Postcopy needs to detect accesses to pages that haven't yet been copied
@@ -964,7 +964,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
trace_postcopy_ram_fault_thread_entry();
rcu_register_thread();
mis->last_rb = NULL; /* last RAMBlock we sent part of */
- qemu_sem_post(&mis->thread_sync_sem);
+ qemu_event_set(&mis->thread_sync_event);
struct pollfd *pfd;
size_t pfd_len = 2 + mis->postcopy_remote_fds->len;
@@ -1716,7 +1716,7 @@ void *postcopy_preempt_thread(void *opaque)
rcu_register_thread();
- qemu_sem_post(&mis->thread_sync_sem);
+ qemu_event_set(&mis->thread_sync_event);
/*
* The preempt channel is established in asynchronous way. Wait
diff --git a/migration/savevm.c b/migration/savevm.c
index 006514c3e301..52105dd2f10b 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2078,7 +2078,7 @@ static void *postcopy_ram_listen_thread(void *opaque)
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_POSTCOPY_ACTIVE);
- qemu_sem_post(&mis->thread_sync_sem);
+ qemu_event_set(&mis->thread_sync_event);
trace_postcopy_ram_listen_thread_start();
rcu_register_thread();
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v4 11/11] hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (9 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 10/11] migration/postcopy: " Akihiko Odaki
@ 2025-05-26 5:29 ` Akihiko Odaki
2025-05-26 9:27 ` Philippe Mathieu-Daudé
2025-05-26 14:48 ` [PATCH v4 00/11] Improve futex usage Peter Xu
2025-05-26 16:51 ` Paolo Bonzini
12 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-26 5:29 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé,
Akihiko Odaki
sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted
into QemuEvent, which is more specialized for such a use case.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
---
hw/display/apple-gfx.m | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/hw/display/apple-gfx.m b/hw/display/apple-gfx.m
index 8dde1f138dce..174d56ae05bc 100644
--- a/hw/display/apple-gfx.m
+++ b/hw/display/apple-gfx.m
@@ -454,7 +454,7 @@ static void set_cursor_glyph(void *opaque)
/* ------ DMA (device reading system memory) ------ */
typedef struct AppleGFXReadMemoryJob {
- QemuSemaphore sem;
+ QemuEvent event;
hwaddr physical_address;
uint64_t length;
void *dst;
@@ -470,7 +470,7 @@ static void apple_gfx_do_read_memory(void *opaque)
job->dst, job->length, MEMTXATTRS_UNSPECIFIED);
job->success = (r == MEMTX_OK);
- qemu_sem_post(&job->sem);
+ qemu_event_set(&job->event);
}
static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
@@ -483,11 +483,11 @@ static bool apple_gfx_read_memory(AppleGFXState *s, hwaddr physical_address,
trace_apple_gfx_read_memory(physical_address, length, dst);
/* Performing DMA requires BQL, so do it in a BH. */
- qemu_sem_init(&job.sem, 0);
+ qemu_event_init(&job.event, 0);
aio_bh_schedule_oneshot(qemu_get_aio_context(),
apple_gfx_do_read_memory, &job);
- qemu_sem_wait(&job.sem);
- qemu_sem_destroy(&job.sem);
+ qemu_event_wait(&job.event);
+ qemu_event_destroy(&job.event);
return job.success;
}
--
2.49.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 11/11] hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
2025-05-26 5:29 ` [PATCH v4 11/11] hw/display/apple-gfx: " Akihiko Odaki
@ 2025-05-26 9:27 ` Philippe Mathieu-Daudé
2025-05-26 9:29 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:27 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 07:29, Akihiko Odaki wrote:
> sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted
> into QemuEvent, which is more specialized for such a use case.
>
> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> ---
> hw/display/apple-gfx.m | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 11/11] hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
2025-05-26 9:27 ` Philippe Mathieu-Daudé
@ 2025-05-26 9:29 ` Philippe Mathieu-Daudé
2025-05-29 4:49 ` Akihiko Odaki
0 siblings, 1 reply; 27+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-05-26 9:29 UTC (permalink / raw)
To: Akihiko Odaki, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 26/5/25 11:27, Philippe Mathieu-Daudé wrote:
> On 26/5/25 07:29, Akihiko Odaki wrote:
>> sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted
>> into QemuEvent, which is more specialized for such a use case.
BTW it would be nice to document that in "qemu/thread.h" API.
>>
>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>> ---
>> hw/display/apple-gfx.m | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 11/11] hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
2025-05-26 9:29 ` Philippe Mathieu-Daudé
@ 2025-05-29 4:49 ` Akihiko Odaki
0 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-29 4:49 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Paolo Bonzini, Stefan Weil, Peter Xu,
Fabiano Rosas, Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé
On 2025/05/26 18:29, Philippe Mathieu-Daudé wrote:
> On 26/5/25 11:27, Philippe Mathieu-Daudé wrote:
>> On 26/5/25 07:29, Akihiko Odaki wrote:
>>> sem in AppleGFXReadMemoryJob is an one-shot event so it can be converted
>>> into QemuEvent, which is more specialized for such a use case.
>
> BTW it would be nice to document that in "qemu/thread.h" API.
I will add a documentation with the next version.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (10 preceding siblings ...)
2025-05-26 5:29 ` [PATCH v4 11/11] hw/display/apple-gfx: " Akihiko Odaki
@ 2025-05-26 14:48 ` Peter Xu
2025-05-27 2:09 ` Akihiko Odaki
2025-05-26 16:51 ` Paolo Bonzini
12 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2025-05-26 14:48 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Paolo Bonzini, Stefan Weil, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On Mon, May 26, 2025 at 02:29:10PM +0900, Akihiko Odaki wrote:
> Akihiko Odaki (11):
> futex: Check value after qemu_futex_wait()
> futex: Support Windows
> qemu-thread: Remove qatomic_read() in qemu_event_set()
> qemu-thread: Replace __linux__ with CONFIG_LINUX
> qemu-thread: Avoid futex abstraction for non-Linux
> qemu-thread: Use futex for QemuEvent on Windows
> qemu-thread: Use futex if available for QemuLockCnt
> migration: Replace QemuSemaphore with QemuEvent
> migration/colo: Replace QemuSemaphore with QemuEvent
> migration/postcopy: Replace QemuSemaphore with QemuEvent
In case it makes things easier.. I queued the three migration patches;
AFAIU they look like standalone to go even without prior patches, meanwhile
it shouldn't be an issue if they're queued in two pulls.
I am still not sure whether patch 1 is needed at all, but I'll leave that
to others to decide.
Thanks!
> hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent
--
Peter Xu
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-26 14:48 ` [PATCH v4 00/11] Improve futex usage Peter Xu
@ 2025-05-27 2:09 ` Akihiko Odaki
2025-05-27 13:46 ` Peter Xu
0 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-27 2:09 UTC (permalink / raw)
To: Peter Xu
Cc: Paolo Bonzini, Stefan Weil, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On 2025/05/26 23:48, Peter Xu wrote:
> On Mon, May 26, 2025 at 02:29:10PM +0900, Akihiko Odaki wrote:
>> Akihiko Odaki (11):
>> futex: Check value after qemu_futex_wait()
>> futex: Support Windows
>> qemu-thread: Remove qatomic_read() in qemu_event_set()
>> qemu-thread: Replace __linux__ with CONFIG_LINUX
>> qemu-thread: Avoid futex abstraction for non-Linux
>> qemu-thread: Use futex for QemuEvent on Windows
>> qemu-thread: Use futex if available for QemuLockCnt
>> migration: Replace QemuSemaphore with QemuEvent
>> migration/colo: Replace QemuSemaphore with QemuEvent
>> migration/postcopy: Replace QemuSemaphore with QemuEvent
>
> In case it makes things easier.. I queued the three migration patches;
> AFAIU they look like standalone to go even without prior patches, meanwhile
> it shouldn't be an issue if they're queued in two pulls.
>
> I am still not sure whether patch 1 is needed at all, but I'll leave that
> to others to decide.
The migration patches shouldn't be applied before patches "futex: Check
value after qemu_futex_wait()" and "qemu-thread: Avoid futex abstraction
for non-Linux" as they fix QemuEvent. Merging migration patches earlier
can trigger bugs just like one we faced with hw/display/apple-gfx*
Regarding patch 1 ("futex: Check value after qemu_futex_wait()"), I can
argue that we should have it to comply the generic "upstream-first"
principle; the upstream (man page) says to make a loop, so the
downstream (QEMU) should follow that until the upstream says otherwise.
But I think it's a good idea to spend efforts to understand the
reasoning behind the man page since it's so important that it affects
not only QEMU but also any userspace program that uses libpthread (i.e.,
practically everything).
Regards,
Akihiko Odaki
*
https://lore.kernel.org/qemu-devel/a8b4472c-8741-4f80-87e5-b406c56d1acd@daynix.com/
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-27 2:09 ` Akihiko Odaki
@ 2025-05-27 13:46 ` Peter Xu
2025-05-28 3:30 ` Akihiko Odaki
0 siblings, 1 reply; 27+ messages in thread
From: Peter Xu @ 2025-05-27 13:46 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Paolo Bonzini, Stefan Weil, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On Tue, May 27, 2025 at 11:09:08AM +0900, Akihiko Odaki wrote:
> On 2025/05/26 23:48, Peter Xu wrote:
> > On Mon, May 26, 2025 at 02:29:10PM +0900, Akihiko Odaki wrote:
> > > Akihiko Odaki (11):
> > > futex: Check value after qemu_futex_wait()
> > > futex: Support Windows
> > > qemu-thread: Remove qatomic_read() in qemu_event_set()
> > > qemu-thread: Replace __linux__ with CONFIG_LINUX
> > > qemu-thread: Avoid futex abstraction for non-Linux
> > > qemu-thread: Use futex for QemuEvent on Windows
> > > qemu-thread: Use futex if available for QemuLockCnt
> > > migration: Replace QemuSemaphore with QemuEvent
> > > migration/colo: Replace QemuSemaphore with QemuEvent
> > > migration/postcopy: Replace QemuSemaphore with QemuEvent
> >
> > In case it makes things easier.. I queued the three migration patches;
> > AFAIU they look like standalone to go even without prior patches, meanwhile
> > it shouldn't be an issue if they're queued in two pulls.
> >
> > I am still not sure whether patch 1 is needed at all, but I'll leave that
> > to others to decide.
>
> The migration patches shouldn't be applied before patches "futex: Check
> value after qemu_futex_wait()" and "qemu-thread: Avoid futex abstraction for
> non-Linux" as they fix QemuEvent. Merging migration patches earlier can
> trigger bugs just like one we faced with hw/display/apple-gfx*
I didn't see anything like it mentioned in either cover letter or the
apple-gfx patch. Could you provide a pointer?
>
> Regarding patch 1 ("futex: Check value after qemu_futex_wait()"), I can
> argue that we should have it to comply the generic "upstream-first"
> principle; the upstream (man page) says to make a loop, so the downstream
> (QEMU) should follow that until the upstream says otherwise. But I think
> it's a good idea to spend efforts to understand the reasoning behind the man
> page since it's so important that it affects not only QEMU but also any
> userspace program that uses libpthread (i.e., practically everything).
IMHO it's not how we define upstream/downstream, but maybe it's me who
missed something.
It's fine - as long as someone agrees with you (Paolo? :), it could likely
be that I was wrong and I just didn't realize that.
Thanks for the update anyway, I will drop the migration patches regardless.
--
Peter Xu
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-27 13:46 ` Peter Xu
@ 2025-05-28 3:30 ` Akihiko Odaki
0 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-28 3:30 UTC (permalink / raw)
To: Peter Xu
Cc: Paolo Bonzini, Stefan Weil, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On 2025/05/27 22:46, Peter Xu wrote:
> On Tue, May 27, 2025 at 11:09:08AM +0900, Akihiko Odaki wrote:
>> On 2025/05/26 23:48, Peter Xu wrote:
>>> On Mon, May 26, 2025 at 02:29:10PM +0900, Akihiko Odaki wrote:
>>>> Akihiko Odaki (11):
>>>> futex: Check value after qemu_futex_wait()
>>>> futex: Support Windows
>>>> qemu-thread: Remove qatomic_read() in qemu_event_set()
>>>> qemu-thread: Replace __linux__ with CONFIG_LINUX
>>>> qemu-thread: Avoid futex abstraction for non-Linux
>>>> qemu-thread: Use futex for QemuEvent on Windows
>>>> qemu-thread: Use futex if available for QemuLockCnt
>>>> migration: Replace QemuSemaphore with QemuEvent
>>>> migration/colo: Replace QemuSemaphore with QemuEvent
>>>> migration/postcopy: Replace QemuSemaphore with QemuEvent
>>>
>>> In case it makes things easier.. I queued the three migration patches;
>>> AFAIU they look like standalone to go even without prior patches, meanwhile
>>> it shouldn't be an issue if they're queued in two pulls.
>>>
>>> I am still not sure whether patch 1 is needed at all, but I'll leave that
>>> to others to decide.
>>
>> The migration patches shouldn't be applied before patches "futex: Check
>> value after qemu_futex_wait()" and "qemu-thread: Avoid futex abstraction for
>> non-Linux" as they fix QemuEvent. Merging migration patches earlier can
>> trigger bugs just like one we faced with hw/display/apple-gfx*
>
> I didn't see anything like it mentioned in either cover letter or the
> apple-gfx patch. Could you provide a pointer?
The previous email had a URL at the bottom:
https://lore.kernel.org/qemu-devel/a8b4472c-8741-4f80-87e5-b406c56d1acd@daynix.com/
This thread discussed a bug of QemuEvent and reached to a conclusion to
use QemuSemaphore instead to avoid it. This patch series intends to fix it.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-26 5:29 [PATCH v4 00/11] Improve futex usage Akihiko Odaki
` (11 preceding siblings ...)
2025-05-26 14:48 ` [PATCH v4 00/11] Improve futex usage Peter Xu
@ 2025-05-26 16:51 ` Paolo Bonzini
2025-05-27 3:00 ` Akihiko Odaki
12 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2025-05-26 16:51 UTC (permalink / raw)
To: Akihiko Odaki, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On 5/26/25 07:29, Akihiko Odaki wrote:
> Changes in v4:
> - Added patch "qemu-thread: Remove qatomic_read() in qemu_event_set()".
Hi Akihiko,
I'm not so confident about putting this patch before the other changes;
I'm referring basically to this hunk:
diff --git a/util/event.c b/util/event.c
index 366c77c90cf..663b7042b17 100644
--- a/util/event.c
+++ b/util/event.c
@@ -48,22 +48,9 @@ void qemu_event_set(QemuEvent *ev)
assert(ev->initialized);
#ifdef HAVE_FUTEX
- /*
- * Pairs with both qemu_event_reset() and qemu_event_wait().
- *
- * qemu_event_set has release semantics, but because it *loads*
- * ev->value we need a full memory barrier here.
- */
- smp_mb();
- if (qatomic_read(&ev->value) != EV_SET) {
- int old = qatomic_xchg(&ev->value, EV_SET);
-
- /* Pairs with memory barrier in kernel futex_wait system call. */
- smp_mb__after_rmw();
- if (old == EV_BUSY) {
- /* There were waiters, wake them up. */
- qemu_futex_wake_all(ev);
- }
+ if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+ /* There were waiters, wake them up. */
+ qemu_futex_wake_all(ev);
}
#else
pthread_mutex_lock(&ev->lock);
... feel free to resubmit that separately, also because it's missing
a smp_mb__before_rmw().
Also, I'm not sure what was your opinion of the more optimized version
of qemu_event_reset:
diff --git a/util/event.c b/util/event.c
index 366c77c90cf..663b7042b17 100644
--- a/util/event.c
+++ b/util/event.c
@@ -78,6 +78,7 @@ void qemu_event_reset(QemuEvent *ev)
{
assert(ev->initialized);
+#ifdef HAVE_FUTEX
/*
* If there was a concurrent reset (or even reset+wait),
* do nothing. Otherwise change EV_SET->EV_FREE.
@@ -86,6 +87,28 @@ void qemu_event_reset(QemuEvent *ev)
*/
smp_mb__after_rmw();
+#else
+ /*
+ * If futexes are not available, there are no EV_FREE->EV_BUSY
+ * transitions because wakeups are done entirely through the
+ * condition variable. Since qatomic_set() only writes EV_FREE,
+ * the load seems useless but in reality, the acquire synchronizes
+ * with qemu_event_set()'s store release: if qemu_event_reset()
+ * sees EV_SET here, then the caller will certainly see a
+ * successful condition and skip qemu_event_wait():
+ *
+ * done = 1; if (done == 0)
+ * qemu_event_set() { qemu_event_reset() {
+ * lock();
+ * ev->value = EV_SET -----> load ev->value
+ * ev->value = old value | EV_FREE
+ * cond_broadcast()
+ * unlock(); }
+ * } if (done == 0)
+ * // qemu_event_wait() not called
+ */
+ qatomic_set(&ev->value, qatomic_load_acquire(&ev->value) | EV_FREE);
+#endif
}
void qemu_event_wait(QemuEvent *ev)
Do you think it's incorrect? I'll wait for your answer before sending
out the actual pull request.
Paolo
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-26 16:51 ` Paolo Bonzini
@ 2025-05-27 3:00 ` Akihiko Odaki
2025-05-27 15:01 ` Paolo Bonzini
0 siblings, 1 reply; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-27 3:00 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Weil, Peter Xu, Fabiano Rosas,
Hailiang Zhang
Cc: Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On 2025/05/27 1:51, Paolo Bonzini wrote:
> On 5/26/25 07:29, Akihiko Odaki wrote:
>> Changes in v4:
>> - Added patch "qemu-thread: Remove qatomic_read() in qemu_event_set()".
>
> Hi Akihiko,
>
> I'm not so confident about putting this patch before the other changes;
> I'm referring basically to this hunk:
>
> diff --git a/util/event.c b/util/event.c
> index 366c77c90cf..663b7042b17 100644
> --- a/util/event.c
> +++ b/util/event.c
> @@ -48,22 +48,9 @@ void qemu_event_set(QemuEvent *ev)
> assert(ev->initialized);
>
> #ifdef HAVE_FUTEX
> - /*
> - * Pairs with both qemu_event_reset() and qemu_event_wait().
> - *
> - * qemu_event_set has release semantics, but because it *loads*
> - * ev->value we need a full memory barrier here.
> - */
> - smp_mb();
> - if (qatomic_read(&ev->value) != EV_SET) {
> - int old = qatomic_xchg(&ev->value, EV_SET);
> -
> - /* Pairs with memory barrier in kernel futex_wait system call. */
> - smp_mb__after_rmw();
> - if (old == EV_BUSY) {
> - /* There were waiters, wake them up. */
> - qemu_futex_wake_all(ev);
> - }
> + if (qatomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
> + /* There were waiters, wake them up. */
> + qemu_futex_wake_all(ev);
> }
> #else
> pthread_mutex_lock(&ev->lock);
>
>
> ... feel free to resubmit that separately, also because it's missing
> a smp_mb__before_rmw().
I'd like to submit it with "[PATCH v4 05/11] qemu-thread: Avoid futex
abstraction for non-Linux" because it aligns the implementations of
Linux and non-Linux versions to rely on a store-release of EV_SET in
qemu_event_set().
smp_mb__before_rmw() is removed with "[PATCH v4 01/11] futex: Check
value after qemu_futex_wait()" because the fact mentioned with the
comment "Pairs with memory barrier in kernel futex_wait system call" is
no longer relevant; the patch makes QEMU always rely on
qatomic_load_acquire() or qatomic_cmpxchg() to perform a load-acquire of
EV_SET for ordering.
In either case, I can simply that say the ordering between
qemu_event_set() and qemu_event_wait() are ensured by load-acquire and
store-release operations of EV_SET, but to make the correctness
absolutely sure, I recommend to look at my past explanation:
https://lore.kernel.org/qemu-devel/ab6b66d7-fa8c-4049-9a3b-975f7f9c06ab@daynix.com
The explanation is long but so comprehensive that it lists all memory
accesses, ordering requirements, and features of atomic primitives and
futex employed to satisfy the requirements.
>
>
> Also, I'm not sure what was your opinion of the more optimized version
> of qemu_event_reset:
>
> diff --git a/util/event.c b/util/event.c
> index 366c77c90cf..663b7042b17 100644
> --- a/util/event.c
> +++ b/util/event.c
> @@ -78,6 +78,7 @@ void qemu_event_reset(QemuEvent *ev)
> {
> assert(ev->initialized);
>
> +#ifdef HAVE_FUTEX
> /*
> * If there was a concurrent reset (or even reset+wait),
> * do nothing. Otherwise change EV_SET->EV_FREE.
> @@ -86,6 +87,28 @@ void qemu_event_reset(QemuEvent *ev)
> */
> smp_mb__after_rmw();
> +#else
> + /*
> + * If futexes are not available, there are no EV_FREE->EV_BUSY
> + * transitions because wakeups are done entirely through the
> + * condition variable. Since qatomic_set() only writes EV_FREE,
> + * the load seems useless but in reality, the acquire synchronizes
> + * with qemu_event_set()'s store release: if qemu_event_reset()
> + * sees EV_SET here, then the caller will certainly see a
> + * successful condition and skip qemu_event_wait():
> + *
> + * done = 1; if (done == 0)
> + * qemu_event_set() { qemu_event_reset() {
> + * lock();
> + * ev->value = EV_SET -----> load ev->value
> + * ev->value = old value | EV_FREE
> + * cond_broadcast()
> + * unlock(); }
> + * } if (done == 0)
> + * // qemu_event_wait() not called
> + */
> + qatomic_set(&ev->value, qatomic_load_acquire(&ev->value) | EV_FREE);
> +#endif
> }
>
> void qemu_event_wait(QemuEvent *ev)
>
>
> Do you think it's incorrect? I'll wait for your answer before sending
> out the actual pull request.
It's correct, but I don't think it's worthwhile.
This code path is only used by platforms without a futex wrapper.
Currently we only have one for Linux and this series adds one for
Windows, but FreeBSD[1] and OpenBSD[2] have their own futex. macOS also
gained one with version 14.4.[3] We can add wrappers for them too if
their performance really matters.
So the only platforms listed in docs/about/build-platforms.rst that
require the non-futex version are macOS older than 14.4 and NetBSD.
macOS older than 14.4 will not be supported after June 5 since macOS 14
was released June 5, 2023 and docs/about/build-platforms.rst says:
> Support for the previous major version will be dropped 2 years after
> the new major version is released or when the vendor itself drops
> support, whichever comes first.
> Within each major release, only the most recent minor release is
> considered.
There are too few relevant platforms to justify the effort potentially
needed for quality assurance.
Moreover, qemu_event_reset() is often followed by qemu_event_wait() or
other barriers so probably relaxing ordering here does not affect the
overall ordering constraint (and performance) much.
Regards,
Akihiko Odaki
[1]
https://man.freebsd.org/cgi/man.cgi?query=_umtx_op&apropos=0&sektion=2&manpath=FreeBSD+14.2-RELEASE&arch=default&format=html
[2] https://man.openbsd.org/futex
[3]
https://developer.apple.com/documentation/os/synchronization?language=objc#Futex-Conditional-Wait-Primitives
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-27 3:00 ` Akihiko Odaki
@ 2025-05-27 15:01 ` Paolo Bonzini
2025-05-28 3:26 ` Akihiko Odaki
0 siblings, 1 reply; 27+ messages in thread
From: Paolo Bonzini @ 2025-05-27 15:01 UTC (permalink / raw)
To: Akihiko Odaki
Cc: Stefan Weil, Peter Xu, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On Tue, May 27, 2025 at 5:01 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
> I'd like to submit it with "[PATCH v4 05/11] qemu-thread: Avoid futex
> abstraction for non-Linux" because it aligns the implementations of
> Linux and non-Linux versions to rely on a store-release of EV_SET in
> qemu_event_set().
Ok, I see what you mean - you would like the xchg to be an
xchg_release essentially.
There is actually one case in which skipping the xchg has an effect.
If you have the following:
- one side does
s.foo = 1;
qemu_event_set(&s.ev);
- the other side never reaches the qemu_event_reset(&s.ev)
then skipping the xchg might allow the cacheline for ev to remain
shared. This is unlikely to *make* a difference, though it does
*exist* as a difference, so I will review the patch, but I really
prefer to place it last. It's safer to take a known-working
algorithm, apply it to all OSes (or at least Linux and Windows), and
only then you refine it. It also makes my queue shorter.
> > Do you think it's incorrect? I'll wait for your answer before sending
> > out the actual pull request.
>
> It's correct, but I don't think it's worthwhile.
>
> This code path is only used by platforms without a futex wrapper.
> Currently we only have one for Linux and this series adds one for
> Windows, but FreeBSD[1] and OpenBSD[2] have their own futex. macOS also
> gained one with version 14.4.[3] We can add wrappers for them too if
> their performance really matters.
> So the only platforms listed in docs/about/build-platforms.rst that
> require the non-futex version are macOS older than 14.4 and NetBSD.
> macOS older than 14.4 will not be supported after June 5 since macOS 14
> was released June 5, 2023 and docs/about/build-platforms.rst says:
>
> There are too few relevant platforms to justify the effort potentially
> needed for quality assurance.
Ok, nice. So it's really just NetBSD in the end.
> Moreover, qemu_event_reset() is often followed by qemu_event_wait() or
> other barriers so probably relaxing ordering here does not affect the
> overall ordering constraint (and performance) much.
Understood. For me it wasn't really about performance, but more about
understanding exactly which reorderings can happen and what
synchronizes with what. Load-acquire/store-release are simpler to
understand in that respect, especially since this use of condvar,
without the mutex in reset, is different from everything else that
I've ever seen.
Paolo
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v4 00/11] Improve futex usage
2025-05-27 15:01 ` Paolo Bonzini
@ 2025-05-28 3:26 ` Akihiko Odaki
0 siblings, 0 replies; 27+ messages in thread
From: Akihiko Odaki @ 2025-05-28 3:26 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Weil, Peter Xu, Fabiano Rosas, Hailiang Zhang,
Phil Dennis-Jordan, qemu-devel, devel, Marc-André Lureau,
Daniel P. Berrangé, Philippe Mathieu-Daudé
On 2025/05/28 0:01, Paolo Bonzini wrote:
> On Tue, May 27, 2025 at 5:01 AM Akihiko Odaki <akihiko.odaki@daynix.com> wrote:
>> I'd like to submit it with "[PATCH v4 05/11] qemu-thread: Avoid futex
>> abstraction for non-Linux" because it aligns the implementations of
>> Linux and non-Linux versions to rely on a store-release of EV_SET in
>> qemu_event_set().
>
> Ok, I see what you mean - you would like the xchg to be an
> xchg_release essentially.
>
> There is actually one case in which skipping the xchg has an effect.
> If you have the following:
>
> - one side does
>
> s.foo = 1;
> qemu_event_set(&s.ev);
>
> - the other side never reaches the qemu_event_reset(&s.ev)
>
> then skipping the xchg might allow the cacheline for ev to remain
> shared. This is unlikely to *make* a difference, though it does
> *exist* as a difference, so I will review the patch, but I really
> prefer to place it last. It's safer to take a known-working
> algorithm, apply it to all OSes (or at least Linux and Windows), and
> only then you refine it. It also makes my queue shorter.
Compilers and processors are free to make such an optimization so the
contract between us and them does not change in that sense.
On the other hand, the removal of smb_mb() does change the contract; now
compilers and processors are free to reorder loads and stores specified
before qemu_event_set() after the load of ev->value, which was not
possible in the past.
I'm confident that the change of the contract is safe, but it makes
sense to pay extra caution. I'll reorder the patches with the next version.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 27+ messages in thread