* [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
@ 2025-06-03 12:54 Vladimir Lukianov
  2025-06-06  9:46 ` Alex Bennée
  2025-10-07  8:14 ` Alex Bennée
  0 siblings, 2 replies; 4+ messages in thread
From: Vladimir Lukianov @ 2025-06-03 12:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alex Bennée, Paolo Bonzini, Vladimir Lukianov

Ensure EVENT_INSTRUCTION is written to replay.bin before EVENT_SHUTDOWN_HOST_QMP_QUIT.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
Signed-off-by: Vladimir Lukianov <1844144@gmail.com>
---
During the record pass, test_reverse_debug writes a sequence of
events to replay.bin. Presumably due to a race condition or the
host's asynchronous implementation details, the resulting file looks like:

...
12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data  
13: EVENT_INSTRUCTION(0) + 59 -> 44298  
14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data  
15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)  
16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286  
17: EVENT_SHUTDOWN_HOST_SIGNAL(14)  
18: EVENT_END(39)  
Reached 162 of 162 bytes

Here, SHUTDOWN_HOST_QMP_QUIT is written before the last instruction
event. During the replay pass, QUIT is executed before the last
instruction, which causes the VM to shut down. As a result, the QMP
and GDB connections are broken, and the test cannot execute its final
steps.
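
For illustration, the mis-ordering described above can be checked
mechanically. This is a minimal Python sketch, assuming dump lines in
the textual format shown above. The names parse_events and
shutdown_before_last_instruction are hypothetical and are not part of
QEMU's replay tooling.

```python
import re

# Matches dump lines such as "15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)".
EVENT_RE = re.compile(r"^\s*(\d+):\s+(EVENT_[A-Z0-9_]+)\((\d+)\)")


def parse_events(dump_text):
    """Return a list of (index, event_name) tuples found in a textual dump."""
    events = []
    for line in dump_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


def shutdown_before_last_instruction(events):
    """Return True if any EVENT_SHUTDOWN_* precedes the last EVENT_INSTRUCTION."""
    positions = [i for i, (_, name) in enumerate(events)
                 if name == "EVENT_INSTRUCTION"]
    if not positions:
        return False
    last_instruction = positions[-1]
    return any(name.startswith("EVENT_SHUTDOWN")
               for _, name in events[:last_instruction])


# The tail of the dump quoted above, copied as plain text.
DUMP = """\
12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
13: EVENT_INSTRUCTION(0) + 59 -> 44298
14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data
15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)
16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286
17: EVENT_SHUTDOWN_HOST_SIGNAL(14)
18: EVENT_END(39)
"""

if __name__ == "__main__":
    print(shutdown_before_last_instruction(parse_events(DUMP)))
```

On the dump above the check reports the problem, because event 15
(the QMP quit) comes before event 16 (the last instruction event).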

Adding replay_save_instructions ensures EVENT_INSTRUCTION is written
before EVENT_SHUTDOWN_HOST_QMP_QUIT.
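
To show why the extra flush changes the order, here is a toy Python
model. It assumes that executed instructions are only counted and that
the count reaches the log when it is explicitly flushed. ToyRecorder
and its methods are hypothetical stand-ins for illustration, not QEMU's
actual data structures or API.

```python
class ToyRecorder:
    """Toy model of a log in which executed instructions are counted lazily."""

    def __init__(self):
        self.log = []      # events in the order they reach the log
        self.pending = 0   # instructions executed but not yet logged

    def execute(self, count):
        # Executing guest code only bumps a counter; nothing is logged yet.
        self.pending += count

    def save_instructions(self):
        # Flush the pending count as a single instruction event.
        if self.pending:
            self.log.append(("INSTRUCTION", self.pending))
            self.pending = 0

    def shutdown_request(self, cause, flush_first):
        # flush_first=True mirrors the ordering this patch aims for.
        if flush_first:
            self.save_instructions()
        self.log.append(("SHUTDOWN", cause))


def record(flush_first):
    """Replay the scenario from the dump above on the toy recorder."""
    rec = ToyRecorder()
    rec.execute(59)
    rec.save_instructions()
    rec.execute(5587988)
    rec.shutdown_request("HOST_QMP_QUIT", flush_first)
    rec.save_instructions()  # a later flush, e.g. at the end of recording
    return rec.log


if __name__ == "__main__":
    print(record(flush_first=False))
    print(record(flush_first=True))
```

Without the flush, the shutdown event lands between the two
instruction events, as in the dump above. With it, both instruction
events precede the shutdown event.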

Tested on my arm64 host. This does not fix the bug on x86_64; the
x86_64 case seems similar but differs slightly.

 replay/replay.c                                | 2 ++
 tests/functional/test_aarch64_reverse_debug.py | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/replay/replay.c b/replay/replay.c
index a3e24c96..b2121788 100644
--- a/replay/replay.c
+++ b/replay/replay.c
@@ -263,6 +263,8 @@ bool replay_has_interrupt(void)
 
 void replay_shutdown_request(ShutdownCause cause)
 {
+    replay_save_instructions();
+
     if (replay_mode == REPLAY_MODE_RECORD) {
         g_assert(replay_mutex_locked());
         replay_put_event(EVENT_SHUTDOWN + cause);
diff --git a/tests/functional/test_aarch64_reverse_debug.py b/tests/functional/test_aarch64_reverse_debug.py
index 58d45328..0ac1ccb0 100755
--- a/tests/functional/test_aarch64_reverse_debug.py
+++ b/tests/functional/test_aarch64_reverse_debug.py
@@ -26,7 +26,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
          'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz'),
         '7e1430b81c26bdd0da025eeb8fbd77b5dc961da4364af26e771bd39f379cbbf7')
 
-    @skipFlakyTest("https://gitlab.com/qemu-project/qemu/-/issues/2921")
     def test_aarch64_virt(self):
         self.set_machine('virt')
         self.cpu = 'cortex-a53'
-- 
2.34.1




* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
  2025-06-03 12:54 [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug Vladimir Lukianov
@ 2025-06-06  9:46 ` Alex Bennée
  2025-06-06 13:15   ` Владимир Л.
  2025-10-07  8:14 ` Alex Bennée
  1 sibling, 1 reply; 4+ messages in thread
From: Alex Bennée @ 2025-06-06  9:46 UTC (permalink / raw)
  To: Vladimir Lukianov; +Cc: qemu-devel, Paolo Bonzini

Vladimir Lukianov <1844144@gmail.com> writes:

> Ensures EVENT_INSTRUCTION written to replay.bin before EVENT_SHUTDOWN_HOST_QMP
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
> Signed-off-by: Vladimir Lukianov <1844144@gmail.com>
> ---
> During the record pass, test_reverse_debug writes a sequence of
> instructions to replay.bin. Presumably due to a race condition or
> host's async implementation details, the resulting file looks like:
>
> ...
> 12: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data  
> 13: EVENT_INSTRUCTION(0) + 59 -> 44298  
> 14: EVENT_CP_CLOCK_WARP_ACCOUNT(31) no additional data  
> 15: EVENT_SHUTDOWN_HOST_QMP_QUIT(12)  
> 16: EVENT_INSTRUCTION(0) + 5587988 -> 5632286  
> 17: EVENT_SHUTDOWN_HOST_SIGNAL(14)  
> 18: EVENT_END(39)  
> Reached 162 of 162 bytes
>
> Here, SHUTDOWN_HOST_QMP_QUIT is written before the last instruction
> event. During the replay pass, QUIT is executed before the last
> instruction, which causes the VM to shut down. As a result, the QMP
> and GDB connections are broken, and the test cannot execute its final
> steps.

Seems reasonable to me.

>
> Adding replay_save_instructions ensures EVENT_INSTRUCTION is written
> before EVENT_SHUTDOWN_HOST_QMP_QUIT.
>
> Tested on my arm64. This does not fix the bug on x86_64. The x86_64
> case seems similar, but slightly different.

Hmm, I can't run the functional tests due to missing avocado bits. How
did you run the tests?

>
>  replay/replay.c                                | 2 ++
>  tests/functional/test_aarch64_reverse_debug.py | 1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/replay/replay.c b/replay/replay.c
> index a3e24c96..b2121788 100644
> --- a/replay/replay.c
> +++ b/replay/replay.c
> @@ -263,6 +263,8 @@ bool replay_has_interrupt(void)
>  
>  void replay_shutdown_request(ShutdownCause cause)
>  {
> +    replay_save_instructions();
> +
>      if (replay_mode == REPLAY_MODE_RECORD) {
>          g_assert(replay_mutex_locked());
>          replay_put_event(EVENT_SHUTDOWN + cause);
> diff --git a/tests/functional/test_aarch64_reverse_debug.py b/tests/functional/test_aarch64_reverse_debug.py
> index 58d45328..0ac1ccb0 100755
> --- a/tests/functional/test_aarch64_reverse_debug.py
> +++ b/tests/functional/test_aarch64_reverse_debug.py
> @@ -26,7 +26,6 @@ class ReverseDebugging_AArch64(ReverseDebugging):
>           'releases/29/Everything/aarch64/os/images/pxeboot/vmlinuz'),
>          '7e1430b81c26bdd0da025eeb8fbd77b5dc961da4364af26e771bd39f379cbbf7')
>  
> -    @skipFlakyTest("https://gitlab.com/qemu-project/qemu/-/issues/2921")
>      def test_aarch64_virt(self):
>          self.set_machine('virt')
>          self.cpu = 'cortex-a53'

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
  2025-06-06  9:46 ` Alex Bennée
@ 2025-06-06 13:15   ` Владимир Л.
  0 siblings, 0 replies; 4+ messages in thread
From: Владимир Л. @ 2025-06-06 13:15 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, Paolo Bonzini

Hi, thanks for responding!

My usual flow was:

    source build/pyvenv/bin/activate
    export PYTHONBREAKPOINT="ipdb.set_trace"
    export QEMU_TEST_QEMU_BINARY=/home/lukvladimir/dev/qemu/build/qemu-system-aarch64

If Python packages or avocado are missing:

    pip install -e python/
    pip install avocado-framework

Then run the test through avocado:

    QEMU_TEST_FLAKY_TESTS=1 avocado -V run tests/functional/test_aarch64_reverse_debug.py

Or run the test script directly:

    QEMU_TEST_FLAKY_TESTS=1 tests/functional/test_aarch64_reverse_debug.py
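
The direct invocation can also be scripted, which may help when
repeating the run many times to catch a flaky failure. This is only a
sketch: build_test_env and run_test are hypothetical helper names, the
binary path is whatever your build produced, and it assumes the
pyvenv from the build directory is already active so that
sys.executable has the required packages.

```python
import os
import subprocess
import sys

TEST_PATH = "tests/functional/test_aarch64_reverse_debug.py"


def build_test_env(qemu_binary, base_env=None):
    """Return a copy of the environment with the variables used above set."""
    env = dict(os.environ if base_env is None else base_env)
    env["QEMU_TEST_QEMU_BINARY"] = qemu_binary
    env["QEMU_TEST_FLAKY_TESTS"] = "1"
    return env


def run_test(qemu_binary, test_path=TEST_PATH):
    """Run the test script with the current interpreter; return its exit code."""
    result = subprocess.run(
        [sys.executable, test_path],
        env=build_test_env(qemu_binary),
        check=False,  # a failing test is reported through the return code
    )
    return result.returncode
```

For example, from the top of the source tree,
run_test("build/qemu-system-aarch64") returns the exit code of a
single run.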



* Re: [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug
  2025-06-03 12:54 [PATCH] record/replay: fix race condition on test_aarch64_reverse_debug Vladimir Lukianov
  2025-06-06  9:46 ` Alex Bennée
@ 2025-10-07  8:14 ` Alex Bennée
  1 sibling, 0 replies; 4+ messages in thread
From: Alex Bennée @ 2025-10-07  8:14 UTC (permalink / raw)
  To: Vladimir Lukianov; +Cc: qemu-devel, Paolo Bonzini

Vladimir Lukianov <1844144@gmail.com> writes:

> Ensures EVENT_INSTRUCTION written to replay.bin before EVENT_SHUTDOWN_HOST_QMP
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2921
> Signed-off-by: Vladimir Lukianov <1844144@gmail.com>

Queued to pr/031025-10.2-maintainer-1, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

