* [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
@ 2025-06-03 12:54 Vladimir Lukianov
2025-06-06 9:46 ` Alex Bennée
2025-10-07 8:14 ` Alex Bennée
0 siblings, 2 replies; 4+ messages in thread
From: Vladimir Lukianov @ 2025-06-03 12:54 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée, Paolo Bonzini, Vladimir Lukianov
Ensure EVENT_INSTRUCTION is written to replay.bin before EVENT_SHUTDOWN_HOST_QMP_QUIT
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
Signed-off-by: Vladimir Lukianov <1844144@gmail.com>
---
During the record pass, test_reverse_debug writes a sequence of
events to replay.bin. Presumably due to a race condition or details of
the host's asynchronous implementation, the resulting file looks like:
...
12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
13: EVENT_INSTRUCTION(0) + 59 -> 44298
14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)
16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286
17: EVENT_SHUTDOWN_HOST_SIGNAL(14)
18: EVENT_END(39)
Reached 162 of 162 bytes
Here, SHUTDOWN_HOST_QMP_QUIT is written before the last instruction
event. During the replay pass, QUIT is executed before the last
instruction, which causes the VM to shut down. As a result, the QMP
and GDB connections are broken, and the test cannot execute its final
steps.
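The misordering described above can be spotted mechanically. Below is a
minimal sketch of a checker for such a dump; the line format is inferred
only from the excerpt above, and the function name and regular
expression are hypothetical, not part of any QEMU tool.

```python
import re

# Matches dump lines such as "15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)".
# Assumption: the pattern is inferred from the excerpt, not from the dump tool.
EVENT_RE = re.compile(r"^\s*(\d+):\s+(EVENT_[A-Z0-9_]+)\((\d+)\)")


def instructions_after_shutdown(dump_lines):
    """Return indexes of EVENT_INSTRUCTION entries logged after a shutdown event."""
    seen_shutdown = False
    late = []
    for line in dump_lines:
        match = EVENT_RE.match(line)
        if match is None:
            continue  # skip "...", "Reached N of N bytes", and similar lines
        index, name = int(match.group(1)), match.group(2)
        if name.startswith("EVENT_SHUTDOWN"):
            seen_shutdown = True
        elif name == "EVENT_INSTRUCTION" and seen_shutdown:
            late.append(index)
    return late


dump = """\
12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
13: EVENT_INSTRUCTION(0) + 59 -> 44298
14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)
16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286
17: EVENT_SHUTDOWN_HOST_SIGNAL(14)
18: EVENT_END(39)
Reached 162 of 162 bytes
""".splitlines()

print(instructions_after_shutdown(dump))  # [16]
```

An empty result would mean that no instruction event was logged after a
shutdown event in the dump.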
Calling replay_save_instructions() at the start of
replay_shutdown_request() ensures that EVENT_INSTRUCTION is written
before EVENT_SHUTDOWN_HOST_QMP_QUIT.
Tested on my arm64 machine. This does not fix the bug on x86_64; the
x86_64 case seems similar, but slightly different.
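To illustrate why flushing first changes the event order, here is a toy
model in Python. It is only a sketch: ToyRecorder and its methods are
hypothetical stand-ins, not QEMU code, and it assumes the recorder keeps
a count of executed instructions that is written out lazily.

```python
class ToyRecorder:
    """Toy stand-in for the record side; hypothetical, not QEMU code."""

    def __init__(self):
        self.log = []       # events in the order they reach the file
        self.executed = 0   # instructions executed so far
        self.recorded = 0   # instructions already accounted for in the log

    def run(self, count):
        self.executed += count

    def save_instructions(self):
        # Write the pending instruction count, if any, as one event.
        diff = self.executed - self.recorded
        if diff > 0:
            self.log.append(("EVENT_INSTRUCTION", diff))
            self.recorded += diff

    def shutdown_request(self, cause, flush_first):
        if flush_first:
            self.save_instructions()
        self.log.append(("EVENT_SHUTDOWN_" + cause, None))


def record(flush_first):
    rec = ToyRecorder()
    rec.run(59)
    rec.save_instructions()
    rec.run(5587988)          # instructions executed but not yet logged
    rec.shutdown_request("HOST_QMP_QUIT", flush_first)
    rec.save_instructions()   # a later flush, e.g. at the next event
    return [name for name, _ in rec.log]


print(record(flush_first=False))
print(record(flush_first=True))
```

Without the flush, the model logs the shutdown event before the last
instruction event, matching the dump above; with the flush, the
instruction event comes first.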
replay/replay.c | 2 ++
tests/functional/test_aarch64_reverse_debug.py | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/replay/replay.c b/replay/replay.c
index a3e24c96..b2121788 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -263,6 +263,8 @@ bool replay_has_interrupt(void)
 
 void replay_shutdown_request(ShutdownCause cause)
 {
+    replay_save_instructions();
+
     if (replay_mode == REPLAY_MODE_RECORD) {
         g_assert(replay_mutex_locked());
         replay_put_event(EVENT_SHUTDOWN + cause);
diff --git a/tests/functional/test_aarch64_reverse_debug.py b/tests/functional/test_aarch64_reverse_debug.py
index 58d45328..0ac1ccb0 100755
--- a/tests/functional/test_aarch64_reverse_debug.py
+++ b/tests/functional/test_aarch64_reverse_debug.py
@@ -26,7 +26,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
          'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz'),
         '7e1430b81c26bdd0da025eeb8fbd77b5dc961da4364af26e771bd39f379cbbf7')
 
-    @skipFlakyTest("https://gitlab.com/qemu-project/qemu/-/issues/2921")
     def test_aarch64_virt(self):
         self.set_machine('virt')
         self.cpu = 'cortex-a53'
--
2.34.1
* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
From: Alex Bennée @ 2025-06-06 9:46 UTC (permalink / raw)
To: Vladimir Lukianov; +Cc: qemu-devel, Paolo Bonzini
Vladimir Lukianov <1844144@gmail.com> writes:
> Ensure EVENT_INSTRUCTION is written to replay.bin before EVENT_SHUTDOWN_HOST_QMP_QUIT
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
> Signed-off-by: Vladimir Lukianov <1844144@gmail.com>
> ---
> During the record pass, test_reverse_debug writes a sequence of
> events to replay.bin. Presumably due to a race condition or details of
> the host's asynchronous implementation, the resulting file looks like:
>
> ...
> 12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
> 13: EVENT_INSTRUCTION(0) + 59 -> 44298
> 14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
> 15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)
> 16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286
> 17: EVENT_SHUTDOWN_HOST_SIGNAL(14)
> 18: EVENT_END(39)
> Reached 162 of 162 bytes
>
> Here, SHUTDOWN_HOST_QMP_QUIT is written before the last instruction
> event. During the replay pass, QUIT is executed before the last
> instruction, which causes the VM to shut down. As a result, the QMP
> and GDB connections are broken, and the test cannot execute its final
> steps.
Seems reasonable to me.
>
> Calling replay_save_instructions() at the start of
> replay_shutdown_request() ensures that EVENT_INSTRUCTION is written
> before EVENT_SHUTDOWN_HOST_QMP_QUIT.
>
> Tested on my arm64 machine. This does not fix the bug on x86_64; the
> x86_64 case seems similar, but slightly different.
Hmm, I can't run the functional tests due to missing avocado bits. How
did you run the tests?
>
> replay/replay.c | 2 ++
> tests/functional/test_aarch64_reverse_debug.py | 1 -
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/replay/replay.c b/replay/replay.c
> index a3e24c96..b2121788 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -263,6 +263,8 @@ bool replay_has_interrupt(void)
>  
>  void replay_shutdown_request(ShutdownCause cause)
>  {
> +    replay_save_instructions();
> +
>      if (replay_mode == REPLAY_MODE_RECORD) {
>          g_assert(replay_mutex_locked());
>          replay_put_event(EVENT_SHUTDOWN + cause);
> diff --git a/tests/functional/test_aarch64_reverse_debug.py b/tests/functional/test_aarch64_reverse_debug.py
> index 58d45328..0ac1ccb0 100755
> --- a/tests/functional/test_aarch64_reverse_debug.py
> +++ b/tests/functional/test_aarch64_reverse_debug.py
> @@ -26,7 +26,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
>           'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz'),
>          '7e1430b81c26bdd0da025eeb8fbd77b5dc961da4364af26e771bd39f379cbbf7')
>  
> -    @skipFlakyTest("https://gitlab.com/qemu-project/qemu/-/issues/2921")
>      def test_aarch64_virt(self):
>          self.set_machine('virt')
>          self.cpu = 'cortex-a53'
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
From: Владимир Л. @ 2025-06-06 13:15 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, Paolo Bonzini
Hi, thanks for responding!

My usual flow was:

    source build/pyvenv/bin/activate
    export PYTHONBREAKPOINT="ipdb.set_trace"
    export QEMU_TEST_QEMU_BINARY=/home/lukvladimir/dev/qemu/build/qemu-system-aarch64

If Python packages or avocado are missing:

    pip install -e python/
    pip install avocado-framework

Then:

    QEMU_TEST_FLAKY_TESTS=1 avocado -V run tests/functional/test_aarch64_reverse_debug.py

Or:

    QEMU_TEST_FLAKY_TESTS=1 tests/functional/test_aarch64_reverse_debug.py
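
For scripting the direct invocation, the environment from the steps
above can be assembled in Python. This is only a sketch: the helper name
and the relative binary path are hypothetical, and launching the test
file through the current interpreter (instead of executing it directly)
is an assumption.

```python
import os
import sys


def build_test_invocation(qemu_binary, test_path, flaky=True):
    """Return (argv, env) for running one functional test file directly."""
    env = dict(os.environ)
    env["QEMU_TEST_QEMU_BINARY"] = qemu_binary
    if flaky:
        # Same switch as in the shell commands: enables tests marked flaky.
        env["QEMU_TEST_FLAKY_TESTS"] = "1"
    # Assumption: the test file can be launched with the active interpreter.
    return [sys.executable, test_path], env


argv, env = build_test_invocation(
    "build/qemu-system-aarch64",  # hypothetical path, adjust to your build
    "tests/functional/test_aarch64_reverse_debug.py",
)
print(argv[1], env["QEMU_TEST_FLAKY_TESTS"])
```

From the QEMU source tree, with the virtual environment active, the
result could then be passed to subprocess.run(argv, env=env, check=True).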
* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
From: Alex Bennée @ 2025-10-07 8:14 UTC (permalink / raw)
To: Vladimir Lukianov; +Cc: qemu-devel, Paolo Bonzini
Vladimir Lukianov <1844144@gmail.com> writes:
> Ensure EVENT_INSTRUCTION is written to replay.bin before EVENT_SHUTDOWN_HOST_QMP_QUIT
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
> Signed-off-by: Vladimir Lukianov <1844144@gmail.com>
Queued to pr/031025-10.2-maintainer-1, thanks.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro