* [PATCH 0/2] fix record/replay on MacOS
From: Pierrick Bouvier @ 2025-04-10 22:55 UTC (permalink / raw)
To: qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan, Pierrick Bouvier
Recently, it was found that record/replay (rr) tests fail on macOS with a
replay_mutex_unlock() assertion. This is a recent regression related to running
the QEMU main event loop in a separate thread, as the first commit explains.

We first fix the regression by handling the QEMU replay mutex the same way we
handle the BQL.
Then, we re-enable the disabled test.

Pierrick Bouvier (2):
system/main: transfer replay mutex ownership from main thread to main
loop thread
tests/functional/test_aarch64_replay: reenable on macos
system/main.c | 4 ++++
tests/functional/test_aarch64_replay.py | 2 --
2 files changed, 4 insertions(+), 2 deletions(-)
--
2.39.5
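
A minimal C sketch of the failure mode described above. It assumes, for
illustration only, that the replay mutex tracks its owner in a thread-local
flag and rejects an unlock from any other thread. demo_lock_acquire(),
demo_lock_release() and demo_pre_fix() are hypothetical names, not QEMU
functions, and the sketch returns false where QEMU would instead fail an
assertion. The init thread keeps the lock while the main loop runs on a
second thread, so the release attempted from that second thread is rejected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the replay mutex: a plain mutex plus a
 * thread-local flag recording whether the calling thread owns it. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool demo_locked;

static void demo_lock_acquire(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_locked = true;
}

/* Returns false where QEMU would fail an assertion: the calling
 * thread does not own the lock. */
static bool demo_lock_release(void)
{
    if (!demo_locked) {
        return false;
    }
    demo_locked = false;
    pthread_mutex_unlock(&demo_lock);
    return true;
}

static bool demo_loop_release_ok = true;

/* The main-loop thread tries to drop a lock it never took. */
static void *demo_loop_thread(void *opaque)
{
    (void)opaque;
    demo_loop_release_ok = demo_lock_release();
    return NULL;
}

/* Pre-fix shape: the init thread still owns the lock while the loop runs
 * elsewhere. Returns 1 if the loop thread could release the lock, 0 if the
 * ownership check failed, -1 if the thread could not be created. */
static int demo_pre_fix(void)
{
    pthread_t tid;

    demo_lock_acquire();                 /* taken during "qemu_init()" */
    if (pthread_create(&tid, NULL, demo_loop_thread, NULL) != 0) {
        demo_lock_release();
        return -1;
    }
    pthread_join(tid, NULL);
    demo_lock_release();                 /* finally released by the owner */
    return demo_loop_release_ok ? 1 : 0;
}
```

Calling demo_pre_fix() returns 0: the loop thread's release fails the
ownership check, and the lock is only dropped by the thread that took it.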

* [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Pierrick Bouvier @ 2025-04-10 22:55 UTC (permalink / raw)
To: qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan, Pierrick Bouvier
On macOS, the UI event loop has to run in the main thread of the process.
Because of that restriction, on this platform the QEMU main event loop
runs on another thread [1].

This breaks the record/replay feature, which expects the replay mutex to
be held by the thread that ran qemu_init() (where the mutex is initialized
and acquired), and in turn breaks the associated functional tests on
macOS.

Thus, as a generalization, and similar to how the BQL is handled, we
release the replay mutex after init and reacquire it before entering the
main event loop. This avoids a special case when a separate thread is
used.

Tested on macOS with:
$ meson test -C build --setup thorough --print-errorlogs \
func-x86_64-x86_64_replay func-arm-arm_replay func-aarch64-aarch64_replay
$ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=record,rrfile=replay.log
$ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=replay,rrfile=replay.log
[1] https://gitlab.com/qemu-project/qemu/-/commit/f5ab12caba4f1656479c1feb5248beac1c833243
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2907
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
system/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/system/main.c b/system/main.c
index ecb12fd397c..1c022067349 100644
--- a/system/main.c
+++ b/system/main.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 #include "qemu-main.h"
 #include "qemu/main-loop.h"
+#include "system/replay.h"
 #include "system/system.h"
 
 #ifdef CONFIG_SDL
@@ -44,10 +45,12 @@ static void *qemu_default_main(void *opaque)
 {
     int status;
 
+    replay_mutex_lock();
     bql_lock();
     status = qemu_main_loop();
     qemu_cleanup(status);
     bql_unlock();
+    replay_mutex_unlock();
 
     exit(status);
 }
@@ -67,6 +70,7 @@ int main(int argc, char **argv)
 {
     qemu_init(argc, argv);
     bql_unlock();
+    replay_mutex_unlock();
     if (qemu_main) {
         QemuThread main_loop_thread;
         qemu_thread_create(&main_loop_thread, "qemu_main",
--
2.39.5
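
The handoff made by this patch can be modelled with two pthread mutexes.
This is a hedged sketch, not QEMU code: the demo_* functions are
hypothetical stand-ins for replay_mutex_lock()/replay_mutex_unlock(),
bql_lock()/bql_unlock(), qemu_default_main() and main(). The check that the
replay mutex is taken before the BQL is an assumption about QEMU's lock
ordering, mirrored from the order used in the diff. The point is that
demo_default_main() takes both locks itself, so the same code path works
whether it is called directly or on a separate thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the two locks, each with a per-thread owner flag. */
static pthread_mutex_t demo_replay = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_bql = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool demo_replay_held;
static _Thread_local bool demo_bql_held;
static int demo_loop_runs;

static void demo_replay_lock(void)
{
    assert(!demo_bql_held);   /* assumed ordering: replay mutex before BQL */
    pthread_mutex_lock(&demo_replay);
    demo_replay_held = true;
}

static void demo_replay_unlock(void)
{
    assert(demo_replay_held); /* must be called by the owning thread */
    demo_replay_held = false;
    pthread_mutex_unlock(&demo_replay);
}

static void demo_bql_lock(void)
{
    pthread_mutex_lock(&demo_bql);
    demo_bql_held = true;
}

static void demo_bql_unlock(void)
{
    assert(demo_bql_held);
    demo_bql_held = false;
    pthread_mutex_unlock(&demo_bql);
}

/* Same body whether it runs on the main thread or on a dedicated one. */
static void *demo_default_main(void *opaque)
{
    (void)opaque;
    demo_replay_lock();
    demo_bql_lock();
    demo_loop_runs++;         /* stands in for qemu_main_loop() */
    demo_bql_unlock();
    demo_replay_unlock();
    return NULL;
}

static int demo_main(bool separate_loop_thread)
{
    demo_replay_lock();       /* "qemu_init()" runs with both locks held */
    demo_bql_lock();
    demo_bql_unlock();        /* release in reverse order after init */
    demo_replay_unlock();

    if (separate_loop_thread) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, demo_default_main, NULL) != 0) {
            return -1;
        }
        pthread_join(tid, NULL);
    } else {
        demo_default_main(NULL);
    }
    return demo_loop_runs;
}
```

Calling demo_main(false) and then demo_main(true) returns 1 and then 2 (the
counter is shared), and no ownership check fires on either path.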

* [PATCH 2/2] tests/functional/test_aarch64_replay: reenable on macos
From: Pierrick Bouvier @ 2025-04-10 22:55 UTC (permalink / raw)
To: qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan, Pierrick Bouvier
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
tests/functional/test_aarch64_replay.py | 2 --
1 file changed, 2 deletions(-)
diff --git a/tests/functional/test_aarch64_replay.py b/tests/functional/test_aarch64_replay.py
index 029fef3cbf8..bd6609d9149 100755
--- a/tests/functional/test_aarch64_replay.py
+++ b/tests/functional/test_aarch64_replay.py
@@ -16,8 +16,6 @@ class Aarch64Replay(ReplayKernelBase):
          'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz'),
         '7e1430b81c26bdd0da025eeb8fbd77b5dc961da4364af26e771bd39f379cbbf7')
 
-    # Failing on Darwin: https://gitlab.com/qemu-project/qemu/-/issues/2907
-    @skipIfOperatingSystem('Darwin')
     def test_aarch64_virt(self):
         self.set_machine('virt')
         self.cpu = 'cortex-a53'
--
2.39.5

* Re: [PATCH 0/2] fix record/replay on MacOS
From: Philippe Mathieu-Daudé @ 2025-04-11 13:42 UTC (permalink / raw)
To: Pierrick Bouvier, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 11/4/25 00:55, Pierrick Bouvier wrote:
> Pierrick Bouvier (2):
> system/main: transfer replay mutex ownership from main thread to main
> loop thread
> tests/functional/test_aarch64_replay: reenable on macos
Series:
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Nicholas Piggin @ 2025-04-12 5:30 UTC (permalink / raw)
To: Pierrick Bouvier, qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On Fri Apr 11, 2025 at 8:55 AM AEST, Pierrick Bouvier wrote:
> On MacOS, UI event loop has to be ran in the main thread of a process.
> Because of that restriction, on this platform, qemu main event loop is
> ran on another thread [1].
>
> This breaks record/replay feature, which expects thread running qemu_init
> to initialize hold this lock, breaking associated functional tests on
> MacOS.
>
> Thus, as a generalization, and similar to how BQL is handled, we release
> it after init, and reacquire the lock before entering main event loop,
> avoiding a special case if a separate thread is used.
>
> Tested on MacOS with:
> $ meson test -C build --setup thorough --print-errorlogs \
> func-x86_64-x86_64_replay func-arm-arm_replay func-aarch64-aarch64_replay
> $ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=record,rrfile=replay.log
> $ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=replay,rrfile=replay.log
>
> [1] https://gitlab.com/qemu-project/qemu/-/commit/f5ab12caba4f1656479c1feb5248beac1c833243
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2907
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> system/main.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/system/main.c b/system/main.c
> index ecb12fd397c..1c022067349 100644
> --- a/system/main.c
> +++ b/system/main.c
> @@ -25,6 +25,7 @@
> #include "qemu/osdep.h"
> #include "qemu-main.h"
> #include "qemu/main-loop.h"
> +#include "system/replay.h"
> #include "system/system.h"
>
> #ifdef CONFIG_SDL
> @@ -44,10 +45,12 @@ static void *qemu_default_main(void *opaque)
> {
> int status;
>
> + replay_mutex_lock();
> bql_lock();
> status = qemu_main_loop();
> qemu_cleanup(status);
> bql_unlock();
> + replay_mutex_unlock();
>
> exit(status);
> }
> @@ -67,6 +70,7 @@ int main(int argc, char **argv)
> {
> qemu_init(argc, argv);
> bql_unlock();
> + replay_mutex_unlock();
> if (qemu_main) {
> QemuThread main_loop_thread;
> qemu_thread_create(&main_loop_thread, "qemu_main",
Do we actually need to hold the replay mutex (or even the BQL) over
qemu_init()? Both should be dropped before we return here. But as a
simple fix, I guess this is okay.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Pierrick Bouvier @ 2025-04-12 17:24 UTC (permalink / raw)
To: Nicholas Piggin, qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 4/11/25 22:30, Nicholas Piggin wrote:
> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
> Both should get dropped before we return here. But as a simple fix, I
> guess this is okay.
>
For the BQL, I don't know the exact reason.
For the replay lock, we need to hold it because the clock gets saved as
soon as the devices are initialized, which happens before the end of
qemu_init().
> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Philippe Mathieu-Daudé @ 2025-04-14 10:25 UTC (permalink / raw)
To: Pierrick Bouvier, Nicholas Piggin, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 12/4/25 19:24, Pierrick Bouvier wrote:
> On 4/11/25 22:30, Nicholas Piggin wrote:
>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>> Both should get dropped before we return here. But as a simple fix, I
>> guess this is okay.
>>
>
> For the bql, I don't know the exact reason.
> For replay lock, we need to hold it as clock gets saved as soon as the
> devices are initialized, which happens before end of qemu_init.
Could be worth adding a comment with that information.
>
>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Paolo Bonzini @ 2025-04-14 10:57 UTC (permalink / raw)
To: Pierrick Bouvier, Nicholas Piggin, qemu-devel
Cc: philmd, Stefan Hajnoczi, Alex Bennée, Phil Dennis-Jordan
On 4/12/25 19:24, Pierrick Bouvier wrote:
> On 4/11/25 22:30, Nicholas Piggin wrote:
>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>> Both should get dropped before we return here. But as a simple fix, I
>> guess this is okay.
>
> For the bql, I don't know the exact reason.
In general it's better to assume that there can be other threads (and
therefore that you need the BQL).
Also, Rust code panics if you access a BqlCell or BqlRefCell without
holding the lock.
Paolo
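
The BqlCell point can be illustrated with a rough C analogue. This is only a
sketch: DemoBqlCell and its accessors are hypothetical, a thread-local flag
stands in for "this thread holds the BQL", and the accessors return false
where the Rust types would panic. It shows why init code touching such
state has to run with the BQL held, even if no other thread exists yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread "do I hold the BQL?" flag. */
static _Thread_local bool demo_bql_held;

/* Hypothetical C analogue of a BQL-checked cell. */
typedef struct {
    int value;
} DemoBqlCell;

/* Returns false where the Rust BqlCell would panic. */
static bool demo_bql_cell_get(const DemoBqlCell *cell, int *out)
{
    if (!demo_bql_held) {
        return false;
    }
    *out = cell->value;
    return true;
}

/* Returns false where the Rust BqlCell would panic. */
static bool demo_bql_cell_set(DemoBqlCell *cell, int value)
{
    if (!demo_bql_held) {
        return false;
    }
    cell->value = value;
    return true;
}
```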

* Re: [PATCH 0/2] fix record/replay on MacOS
From: Stefan Hajnoczi @ 2025-04-14 15:14 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, philmd, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On Thu, Apr 10, 2025 at 03:55:48PM -0700, Pierrick Bouvier wrote:
> Recently, it was found that rr tests fail on MacOS, with a replay_mutex_unlock()
> assertion. This is a recent regression, related to running qemu main event loop
> in a separate thread, like first commit explain.
>
> We first fix the regression, by handling the qemu replay mutex in the same way
> we deal with BQL.
> Then, we reenable the disabled test.
>
> Pierrick Bouvier (2):
> system/main: transfer replay mutex ownership from main thread to main
> loop thread
> tests/functional/test_aarch64_replay: reenable on macos
>
> system/main.c | 4 ++++
> tests/functional/test_aarch64_replay.py | 2 --
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> --
> 2.39.5
>
Thanks, applied to my staging tree:
https://gitlab.com/stefanha/qemu/commits/staging
Stefan

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Pierrick Bouvier @ 2025-04-14 15:24 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Nicholas Piggin, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 4/14/25 03:25, Philippe Mathieu-Daudé wrote:
> On 12/4/25 19:24, Pierrick Bouvier wrote:
>> On 4/11/25 22:30, Nicholas Piggin wrote:
>>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>>> Both should get dropped before we return here. But as a simple fix, I
>>> guess this is okay.
>>>
>>
>> For the bql, I don't know the exact reason.
>> For replay lock, we need to hold it as clock gets saved as soon as the
>> devices are initialized, which happens before end of qemu_init.
>
> Could be worth adding a comment with that information.
>
In case someone is curious, changing the initial state of the lock
answers why it's needed: QEMU crashes immediately on an assertion.
>>
>>> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>>
>

* Re: [PATCH 0/2] fix record/replay on MacOS
From: Pierrick Bouvier @ 2025-04-14 15:25 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, philmd, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 4/14/25 08:14, Stefan Hajnoczi wrote:
> Thanks, applied to my staging tree:
> https://gitlab.com/stefanha/qemu/commits/staging
>
Thank you, Stefan.
> Stefan

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Nicholas Piggin @ 2025-04-15 2:41 UTC (permalink / raw)
To: Pierrick Bouvier, Philippe Mathieu-Daudé, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On Tue Apr 15, 2025 at 1:24 AM AEST, Pierrick Bouvier wrote:
> On 4/14/25 03:25, Philippe Mathieu-Daudé wrote:
>> On 12/4/25 19:24, Pierrick Bouvier wrote:
>>> On 4/11/25 22:30, Nicholas Piggin wrote:
>>>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>>>> Both should get dropped before we return here. But as a simple fix, I
>>>> guess this is okay.
>>>>
>>>
>>> For the bql, I don't know the exact reason.
>>> For replay lock, we need to hold it as clock gets saved as soon as the
>>> devices are initialized, which happens before end of qemu_init.
>>
>> Could be worth adding a comment with that information.
>>
>
> In case someone is curious about it, changing default state of lock can
> answer why it's needed, as it crashes immediately on an assert.
That all sounds reasonable enough and is good info. I'm not suggesting we
remove the lock from qemu_init() on the assumption that init is single
threaded (I agree it's good practice to keep locking consistent).

My point was rather that we should move the locks tighter around the
operations that require them, i.e. move the unlock into qemu_init().

Commit f5ab12caba4f1 didn't introduce this problem: cocoa_main() already
called bql_unlock() immediately, so effectively the issue was already
there. I guess the original design, before cocoa, was that qemu_init()
would initialize things in the same critical section in which
qemu_main_loop() is then called, which is reasonable and conservative.
It would have been good to see this BQL split get its own patch
explaining why the lock is not needed across qemu_init() and
qemu_main_loop(), but no big deal now.

The patch is fine as a fix; could I suggest another patch that narrows
the lock scope and perhaps adds a few words of comment?
Thanks,
Nick

* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
From: Pierrick Bouvier @ 2025-04-15 18:31 UTC (permalink / raw)
To: Nicholas Piggin, Philippe Mathieu-Daudé, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 4/14/25 19:41, Nicholas Piggin wrote:
> On Tue Apr 15, 2025 at 1:24 AM AEST, Pierrick Bouvier wrote:
>> On 4/14/25 03:25, Philippe Mathieu-Daudé wrote:
>>> On 12/4/25 19:24, Pierrick Bouvier wrote:
>>>> On 4/11/25 22:30, Nicholas Piggin wrote:
>>>>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>>>>> Both should get dropped before we return here. But as a simple fix, I
>>>>> guess this is okay.
>>>>>
>>>>
>>>> For the bql, I don't know the exact reason.
>>>> For replay lock, we need to hold it as clock gets saved as soon as the
>>>> devices are initialized, which happens before end of qemu_init.
>>>
>>> Could be worth adding a comment with that information.
>>>
>>
>> In case someone is curious about it, changing default state of lock can
>> answer why it's needed, as it crashes immediately on an assert.
>
> That all sounds reasonable enough and good info. I'm not suggesting to
> remove the lock from qemu_init() by assuming we are in init and init is
> single threaded (I agree it's good practice to keep locking consistent).
>
> My question was more that we should move the locks tighter around
> the operations that require them. Move the unlock into qemu_init().
>
> Commit f5ab12caba4f1 didn't introduce this problem, cocoa_main()
> already immediatey called bql_unlock() so effectively the issue is
> still there. The original design before cocoa I guess was that qemu_init
> would init things under the same critical section as qemu_main_loop() is
> then called, which is reasonable and conservative. It would have been
> good to see this bql split get a specific patch to epxlain why it's not
> needed across qemu_init and qemu_main_loop, but no big deal now.
>
Looking more closely, bql_lock() ensures vCPUs don't start executing
anything before init is completed. So we really want to hold the lock
throughout qemu_init().
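
A minimal pthread sketch of why holding the BQL across init keeps vCPUs
from running early, under the assumption that a vCPU thread must take the
BQL before executing. demo_bql, demo_vcpu_thread() and demo_init_then_run()
are hypothetical names. Because the vCPU thread is created while init still
holds the lock, it can only observe the state after init has finished.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_bql = PTHREAD_MUTEX_INITIALIZER;
static bool demo_init_done;          /* protected by demo_bql */
static bool demo_vcpu_saw_init_done; /* read by the creator after join */

/* Hypothetical vCPU thread: must take the "BQL" before doing anything. */
static void *demo_vcpu_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&demo_bql);   /* blocks until init drops the lock */
    demo_vcpu_saw_init_done = demo_init_done;
    pthread_mutex_unlock(&demo_bql);
    return NULL;
}

static bool demo_init_then_run(void)
{
    pthread_t vcpu;

    pthread_mutex_lock(&demo_bql);   /* held for the whole of "init" */
    if (pthread_create(&vcpu, NULL, demo_vcpu_thread, NULL) != 0) {
        pthread_mutex_unlock(&demo_bql);
        return false;
    }
    /* ... device and machine setup would happen here ... */
    demo_init_done = true;
    pthread_mutex_unlock(&demo_bql); /* end of init: vCPUs may now run */

    pthread_join(vcpu, NULL);
    return demo_vcpu_saw_init_done;
}
```

demo_init_then_run() returns true regardless of how the two threads are
scheduled.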

Concerning the replay lock: during init, icount_configure() calls
qemu_clock_get_ns(), which calls replay_save_clock(), which expects the
lock to be held. Thus, we should hold the lock at least during icount
configuration.
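
The call chain above can be sketched as follows, assuming the only thing
that matters is whether the calling thread owns the replay mutex. Every
demo_* function is a hypothetical stand-in for the QEMU function named in
its comment, and the clock value is arbitrary. Instead of asserting,
demo_save_clock() counts violations so both outcomes can be checked: with
the lock held during init nothing is flagged, while starting init without
it trips the check, matching the assertion failure mentioned earlier in
the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static _Thread_local bool demo_replay_locked;
static int demo_missing_lock_errors;
static int64_t demo_saved_clock = -1;

/* Stand-in for replay_save_clock(): expects the replay mutex to be held. */
static void demo_save_clock(int64_t ns)
{
    if (!demo_replay_locked) {
        demo_missing_lock_errors++;   /* QEMU fails an assertion instead */
        return;
    }
    demo_saved_clock = ns;            /* "write the clock to the log" */
}

/* Stand-in for qemu_clock_get_ns(): reading the clock also saves it. */
static int64_t demo_clock_get_ns(void)
{
    int64_t now = 12345;              /* arbitrary fixed value */
    demo_save_clock(now);
    return now;
}

/* Stand-in for icount_configure(), which runs inside qemu_init(). */
static void demo_icount_configure(void)
{
    (void)demo_clock_get_ns();
}

/* Run "init" with or without the replay lock held by this thread and
 * return the cumulative number of violations seen so far. */
static int demo_init(bool hold_replay_lock)
{
    demo_replay_locked = hold_replay_lock;
    demo_icount_configure();
    demo_replay_locked = false;       /* released once init is over */
    return demo_missing_lock_errors;
}
```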
> The patch is fine for a fix, could I suggest another patch that
> moves the lock narrower and perhaps adds a few words of comment?
>
We would still need to acquire locks in qemu_default_main() anyway.
For bql, we definitely want to hold it throughout init, so its scope
ends with init.
For replay_lock, it could be narrowed to the parts that expect it during
initialization, but what would be the benefit, considering only one
thread is running during init?
Moving locks narrower is usually done to allow more concurrency, at the
price of increased complexity. In the init phase, only one thread runs
anyway, so there is no benefit to changing anything here.
What we could eventually do is move those unlocks to the end of
qemu_init, but IMHO, it's more readable to see the lock/unlock scheme in
a single place, in system/main.c.
Also, I think it's better to have a single code path for lock/unlock,
whether we use a background thread or not (vs. adding a bool parameter
to qemu_default_main() saying whether we are in the same thread or a
different one).
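The ownership hand-off argued for above can be modelled with plain pthreads. This is a hedged sketch, not the QEMU implementation: `model_lock`, `model_unlock`, the thread-local `model_held` flag and `run_main` are hypothetical, chosen to mimic a mutex that tracks ownership per thread and refuses an unlock from a thread that never locked it (the situation behind the replay_mutex_unlock() assertion).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mutex wrapper that tracks ownership per thread. */
static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool model_held;   /* true only in the owning thread */

static bool model_locked(void) { return model_held; }

static void model_lock(void)
{
    pthread_mutex_lock(&model_mutex);
    model_held = true;
}

/* Returns false where a real implementation would assert: the caller
 * does not own the mutex, so it must not unlock it. */
static bool model_unlock(void)
{
    if (!model_held) {
        return false;
    }
    model_held = false;
    pthread_mutex_unlock(&model_mutex);
    return true;
}

/* Old scheme: the main loop assumes it inherited the lock from init. */
static void *main_loop_old(void *result)
{
    *(bool *)result = model_unlock();
    return NULL;
}

/* Patched scheme: the main loop takes the lock itself, then releases it. */
static void *main_loop_fixed(void *result)
{
    model_lock();
    *(bool *)result = model_unlock();
    return NULL;
}

/* Returns true if the main loop could release the lock cleanly. */
bool run_main(bool fixed, bool separate_thread)
{
    void *(*loop)(void *) = fixed ? main_loop_fixed : main_loop_old;
    bool result = false;

    model_lock();               /* init runs with the lock held */
    if (fixed) {
        model_unlock();         /* hand-off: release once init is done */
    }
    if (separate_thread) {
        pthread_t t;
        pthread_create(&t, NULL, loop, &result);
        pthread_join(t, NULL);
    } else {
        loop(&result);
    }
    if (model_locked()) {
        model_unlock();         /* old scheme on a thread: init still owns it */
    }
    return result;
}
```

In this model the old scheme only fails when the loop runs on a separate thread (the macOS configuration), while the patched scheme succeeds with one code path in both configurations.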
> Thanks,
> Nick
* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
2025-04-15 18:31 ` Pierrick Bouvier
@ 2025-04-16 3:16 ` Nicholas Piggin
2025-04-16 18:54 ` Pierrick Bouvier
0 siblings, 1 reply; 15+ messages in thread
From: Nicholas Piggin @ 2025-04-16 3:16 UTC (permalink / raw)
To: Pierrick Bouvier, Philippe Mathieu-Daudé, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On Wed Apr 16, 2025 at 4:31 AM AEST, Pierrick Bouvier wrote:
> On 4/14/25 19:41, Nicholas Piggin wrote:
>> On Tue Apr 15, 2025 at 1:24 AM AEST, Pierrick Bouvier wrote:
>>> On 4/14/25 03:25, Philippe Mathieu-Daudé wrote:
>>>> On 12/4/25 19:24, Pierrick Bouvier wrote:
>>>>> On 4/11/25 22:30, Nicholas Piggin wrote:
>>>>>> On Fri Apr 11, 2025 at 8:55 AM AEST, Pierrick Bouvier wrote:
>>>>>>> On MacOS, the UI event loop has to be run in the main thread of a process.
>>>>>>> Because of that restriction, on this platform, the qemu main event loop is
>>>>>>> run on another thread [1].
>>>>>>>
>>>>>>> This breaks the record/replay feature, which expects the thread running
>>>>>>> qemu_init to hold this lock, breaking associated functional tests on
>>>>>>> MacOS.
>>>>>>>
>>>>>>> Thus, as a generalization, and similar to how BQL is handled, we release
>>>>>>> it after init, and reacquire the lock before entering main event loop,
>>>>>>> avoiding a special case if a separate thread is used.
>>>>>>>
>>>>>>> Tested on MacOS with:
>>>>>>> $ meson test -C build --setup thorough --print-errorlogs \
>>>>>>>     func-x86_64-x86_64_replay func-arm-arm_replay func-aarch64-aarch64_replay
>>>>>>> $ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=record,rrfile=replay.log
>>>>>>> $ ./build/qemu-system-x86_64 -nographic -icount shift=auto,rr=replay,rrfile=replay.log
>>>>>>>
>>>>>>> [1] https://gitlab.com/qemu-project/qemu/-/commit/f5ab12caba4f1656479c1feb5248beac1c833243
>>>>>>>
>>>>>>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/2907
>>>>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>>>> ---
>>>>>>> system/main.c | 4 ++++
>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/system/main.c b/system/main.c
>>>>>>> index ecb12fd397c..1c022067349 100644
>>>>>>> --- a/system/main.c
>>>>>>> +++ b/system/main.c
>>>>>>> @@ -25,6 +25,7 @@
>>>>>>> #include "qemu/osdep.h"
>>>>>>> #include "qemu-main.h"
>>>>>>> #include "qemu/main-loop.h"
>>>>>>> +#include "system/replay.h"
>>>>>>> #include "system/system.h"
>>>>>>> #ifdef CONFIG_SDL
>>>>>>> @@ -44,10 +45,12 @@ static void *qemu_default_main(void *opaque)
>>>>>>> {
>>>>>>> int status;
>>>>>>> + replay_mutex_lock();
>>>>>>> bql_lock();
>>>>>>> status = qemu_main_loop();
>>>>>>> qemu_cleanup(status);
>>>>>>> bql_unlock();
>>>>>>> + replay_mutex_unlock();
>>>>>>> exit(status);
>>>>>>> }
>>>>>>> @@ -67,6 +70,7 @@ int main(int argc, char **argv)
>>>>>>> {
>>>>>>> qemu_init(argc, argv);
>>>>>>> bql_unlock();
>>>>>>> + replay_mutex_unlock();
>>>>>>> if (qemu_main) {
>>>>>>> QemuThread main_loop_thread;
>>>>>>> qemu_thread_create(&main_loop_thread, "qemu_main",
>>>>>>
>>>>>> Do we actually need to hold replay mutex (or even bql) over qemu_init()?
>>>>>> Both should get dropped before we return here. But as a simple fix, I
>>>>>> guess this is okay.
>>>>>>
>>>>>
>>>>> For the bql, I don't know the exact reason.
>>>>> For replay lock, we need to hold it as clock gets saved as soon as the
>>>>> devices are initialized, which happens before end of qemu_init.
>>>>
>>>> Could be worth adding a comment with that information.
>>>>
>>>
>>> In case someone is curious about it, changing default state of lock can
>>> answer why it's needed, as it crashes immediately on an assert.
>>
>> That all sounds reasonable enough and good info. I'm not suggesting to
>> remove the lock from qemu_init() by assuming we are in init and init is
>> single threaded (I agree it's good practice to keep locking consistent).
>>
>> My question was more that we should move the locks tighter around
>> the operations that require them. Move the unlock into qemu_init().
>>
>> Commit f5ab12caba4f1 didn't introduce this problem, cocoa_main()
>> already immediately called bql_unlock(), so effectively the issue is
>> still there. The original design before cocoa I guess was that qemu_init
>> would init things under the same critical section as qemu_main_loop() is
>> then called, which is reasonable and conservative. It would have been
>> good to see this bql split get a specific patch to explain why it's not
>> needed across qemu_init and qemu_main_loop, but no big deal now.
>>
>
> Looking more closely, bql_lock ensures vCPUs don't start executing
> anything before init is completed. So we really want to hold the lock
> throughout qemu_init().
>
> Concerning replay_lock, during init, icount_configure calls
> qemu_clock_get_ns, which calls replay_save_clock, which expects the
> lock to be held. Thus, we should hold the lock, at least during icount
> configuration.
Sounds reasonable.
>> The patch is fine for a fix, could I suggest another patch that
>> moves the lock narrower and perhaps adds a few words of comment?
>>
>
> We would still need to acquire locks in qemu_default_main() anyway.
>
> For bql, we definitely want to hold it throughout init, so its scope
> ends with init.
> For replay_lock, it could be narrowed to the parts that expect it during
> initialization, but what would be the benefit, considering only one
> thread is running during init?
>
> Moving locks narrower is usually done to allow more concurrency, at the
> price of increased complexity. In the init phase, only one thread runs
> anyway, so there is no benefit to changing anything here.
>
> What we could eventually do is move those unlocks to the end of
> qemu_init, but IMHO, it's more readable to see the lock/unlock scheme in
> a single place, in system/main.c.
> Also, I think it's better to have a single code path for lock/unlock,
> whether we use a background thread or not (vs. adding a bool parameter
> to qemu_default_main() saying whether we are in the same thread or a
> different one).
I think the benefit is just code clarity. A function that returns with
a different lock state than it was called with is generally not
desirable if it can be avoided.
Having the caller of qemu_init() release the locks immediately doesn't
really serve a purpose. A comment can be added to say that when
qemu_init() returns, bql is not held and CPUs may be running. This would
be the same whether or not a background thread is used.
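The alternative contract described above (init takes and releases the locks itself, so the caller has nothing to undo) might look roughly like this. It is only an illustrative sketch with hypothetical names (`system_init`, `main_loop`, `run_main_loop`, `take`, `drop`); plain pthread mutexes stand in for the replay mutex and the BQL, taken in the same order as in the patch (replay mutex first).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the replay mutex and the BQL. */
static pthread_mutex_t replay_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int locks_held;   /* how many of the two this thread holds */

static void take(pthread_mutex_t *m) { pthread_mutex_lock(m); locks_held++; }
static void drop(pthread_mutex_t *m) { locks_held--; pthread_mutex_unlock(m); }

/*
 * Init acquires and releases its own locks. Documented postcondition:
 * on return, neither lock is held (in a real system, vCPUs may now run),
 * whichever thread ends up running the main loop.
 */
int system_init(void)
{
    take(&replay_mtx);      /* replay mutex first, as in the patch */
    take(&big_lock);
    /* ... device creation, icount configuration, etc. ... */
    drop(&big_lock);
    drop(&replay_mtx);
    return locks_held;      /* 0: the caller has nothing to release */
}

/* The main loop acquires what it needs on whatever thread runs it. */
static void *main_loop(void *arg)
{
    int *held_inside = arg;

    take(&replay_mtx);
    take(&big_lock);
    *held_inside = locks_held;   /* 2 while the loop would be running */
    drop(&big_lock);
    drop(&replay_mtx);
    return NULL;
}

/* Runs the main loop inline or on a separate thread; same lock handling. */
int run_main_loop(bool separate_thread)
{
    int held_inside = 0;

    if (separate_thread) {
        pthread_t t;
        pthread_create(&t, NULL, main_loop, &held_inside);
        pthread_join(t, NULL);
    } else {
        main_loop(&held_inside);
    }
    return held_inside;
}
```

The trade-off matches the discussion: the lock scope now lives inside the init function, at the cost of no longer seeing the whole lock/unlock scheme in one place.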
I don't want to bikeshed it too much, if you prefer not to move it. It is
only one specialized case. Adding some comments about the purpose of the
locks is more important than whether you release them here or in the callee.
Thanks,
Nick
* Re: [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread
2025-04-16 3:16 ` Nicholas Piggin
@ 2025-04-16 18:54 ` Pierrick Bouvier
0 siblings, 0 replies; 15+ messages in thread
From: Pierrick Bouvier @ 2025-04-16 18:54 UTC (permalink / raw)
To: Nicholas Piggin, Philippe Mathieu-Daudé, qemu-devel
Cc: Stefan Hajnoczi, Alex Bennée, Paolo Bonzini,
Phil Dennis-Jordan
On 4/15/25 20:16, Nicholas Piggin wrote:
> I don't want to bikeshed it too much, if you prefer not to move it. It is
> only one specialized case. Adding some comments about the purpose of the
> locks is more important than whether you release them here or in the callee.
>
I sent a patch adding these comments:
https://lore.kernel.org/qemu-devel/20250416185218.1654157-1-pierrick.bouvier@linaro.org/T/#u
Pierrick
> Thanks,
> Nick
Thread overview: 15+ messages
2025-04-10 22:55 [PATCH 0/2] fix record/replay on MacOS Pierrick Bouvier
2025-04-10 22:55 ` [PATCH 1/2] system/main: transfer replay mutex ownership from main thread to main loop thread Pierrick Bouvier
2025-04-12 5:30 ` Nicholas Piggin
2025-04-12 17:24 ` Pierrick Bouvier
2025-04-14 10:25 ` Philippe Mathieu-Daudé
2025-04-14 15:24 ` Pierrick Bouvier
2025-04-15 2:41 ` Nicholas Piggin
2025-04-15 18:31 ` Pierrick Bouvier
2025-04-16 3:16 ` Nicholas Piggin
2025-04-16 18:54 ` Pierrick Bouvier
2025-04-14 10:57 ` Paolo Bonzini
2025-04-10 22:55 ` [PATCH 2/2] tests/functional/test_aarch64_replay: reenable on macos Pierrick Bouvier
2025-04-11 13:42 ` [PATCH 0/2] fix record/replay on MacOS Philippe Mathieu-Daudé
2025-04-14 15:14 ` Stefan Hajnoczi
2025-04-14 15:25 ` Pierrick Bouvier